AGIV CRAB Import

From OpenStreetMap Wiki

About

NOTE: this project has stalled, because we are now importing buildings with addresses instead.

This import project page is about the AGIV CRAB Adressenlijst database import. The dataset was released in 2013 under the new Flemish OpenData License.

This dataset contains a list with all addresses in the Flemish region (over 3 million addresses), together with approximate coordinates. Since addresses are constantly changing (new houses built, streets renamed, ...) and importing the addresses will take a very long time, this page also talks about the maintenance that will happen during and after the import.

Import Plan Outline

The import happens in three main steps:

  • Convert the AGIV CRAB Adressenlijst into manageable JSON chunks per postal code (~municipality) and per street. (repeated every few weeks or months to guarantee up-to-date CRAB data)
  • Compare the addresses in OSM with the addresses in the JSON files and report in various formats. (done automatically every time a mapper wants to compare a street or entire postal code, to guarantee the latest OSM data)
  • Load the report for one street in JOSM, and merge the data with existing OSM data.

Goals

The goal of this import is to use the high-quality dataset to steadily improve the addresses available in OSM. This will not be a blind import; all data will be edited by local mappers.

Schedule

It's impossible to set a schedule for this. It will be an ongoing job where every mapper works individually at their own pace.

Currently, tools are being developed and tested by OSM veterans to guarantee a smooth experience. After the testing, more people will hopefully participate, but fixing bugs in the tools will remain possible.

Import Data

Background

Address format in Belgium

An address in Belgium is determined by its postal code, streetname, housenumber (and possibly postbox or apartment number).

In most cases, the postal code boundaries are the same as the municipal boundaries. Only some bigger cities are split into multiple postal codes, to keep the number of streets per postal code manageable. There are also some very special organisations (NATO, public broadcasting, ...) that get their own postal code.

A streetname is considered unique within a postal code. Likewise, a housenumber is a unique identifier for a building or site within a street.

Housenumbers include bis-numbers. Those are noted either with a capital-letter suffix (7A is a bis-number) or with a suffix number or description ("10 bis" and "10 ter" are bis-numbers, and so are "10/1", "10/2", ...). Bis-numbers usually arise when a new house is built between existing houses with consecutive housenumbers. For example, when a house is built between numbers 7 and 9, the new house will most likely get number 7A or 7 bis (since even numbers are reserved for the other side of the street).

Apartment and bus (mailbox) numbers are similar in concept. They distinguish between postboxes or doors inside a larger building. They do not identify a building but a subdivision of it, so they are not part of the housenumber. There's no real difference between apartment and bus numbers, except that, according to the CRAB maintainers, apartment numbers carry some meaning (i.e. they encode the floor of the building in the number), while bus numbers don't. The CRAB maintainers are also considering dropping the distinction and using a single name for both. The lack of standardisation in the numbering means that this part of the data also contains more mistakes, which makes it hard to import.
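The notations above can be separated into a base number and a bis-part with a small helper. The following is a minimal Python sketch, not part of the import tools: the function name `split_housenumber` and the regular expression are assumptions made for illustration, and real CRAB data may contain notations (such as ranges like "7-9") that it does not handle.

```python
import re

# Hypothetical pattern: leading digits, an optional "/" separator, then the
# bis-part ("A", "bis", "1", ...). Not taken from the import scripts.
_HOUSENUMBER_RE = re.compile(r"^\s*(\d+)\s*/?\s*(.*?)\s*$")


def split_housenumber(housenumber):
    """Return (base_number, bis_part), or None if there is no leading number."""
    match = _HOUSENUMBER_RE.match(housenumber)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

The parity of the base number (`base % 2`) then indicates the side of the street, which is why a house between 7 and 9 gets 7A rather than 8.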

Legal

Data source site: https://download.vlaanderen.be/Producten/Detail?id=447&title=CRAB_Adressenlijst
Data license: https://download.vlaanderen.be/Producten/GetGebruiksvoorwaardenPDF?id=447
Type of license (if applicable): Flemish OpenData License (translation: AGIV_CRAB_Import/Free_open_data_licence_Flanders)
Link to permission (if required): According to the license, permission did not need to be obtained, but it was explicitly confirmed by Laura D'heer at AGIV.
OSM attribution (if required): https://wiki.openstreetmap.org/wiki/Contributors#AGIV
ODbL Compliance verified: yes

Import Type

This import will not be a machine import. The entire address dataset is divided into small tasks per street/postalcode and merged with OpenStreetMap data manually.

Data Preparation

Data Reduction & Simplification

The first script, which transforms the CRAB database into JSON files, does the data reduction and simplification. The database is one big, unsorted list of addresses, detailed down to the apartment number. The script splits the list per postal code and per street to simplify it. It also merges all addresses that differ only in apartment number. Most address points should be imported to OSM as separate objects, but flats share the same front door and should therefore be recorded as a tag on the same object.
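The splitting and merging step can be sketched in Python as follows. This is an illustration under assumptions, not the actual script: the row keys (`pcode`, `street`, `housenumber`, `apptnr`, `busnr`) and the function name `group_addresses` are hypothetical.

```python
from collections import defaultdict


def group_addresses(rows):
    """Group flat address rows into pcode -> street -> housenumber."""
    grouped = defaultdict(lambda: defaultdict(dict))
    for row in rows:
        street = grouped[row["pcode"]][row["street"]]
        entry = street.setdefault(
            row["housenumber"], {"apptnrs": [], "busnrs": []}
        )
        # Rows that differ only in apartment or bus number share one entry.
        if row.get("apptnr"):
            entry["apptnrs"].append(row["apptnr"])
        if row.get("busnr"):
            entry["busnrs"].append(row["busnr"])
    return grouped
```

Two rows for housenumber 7 with different apartment numbers end up as a single entry with two values in `apptnrs`, which matches the idea of one OSM object carrying the flats as a tag.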

Every address in the list also has what AGIV calls "the most precise known location". This varies from frontdoor precision (for a few thousand addresses), to the municipality center. But most addresses are precise up to building or parcel level. We can't automatically achieve a better precision than this.

The CRAB data source also lists a housenumber label. This label merges the housenumbers of overlapping nodes. For example, when housenumbers 7 and 9 are linked to the same CRAB feature, the CRAB database has them as separate addresses, each with its own housenumber, but their housenumber label is "7-9". Sometimes this is a mistake in CRAB (for example, the precision was only up to street position, which causes overlapping); other times, "7-9" is the valid housenumber for that one address and should appear like this in OSM. Knowing which case applies often requires a survey, so these addresses are presented to the mapper as a separate job.

Tagging Plans

Tags will be imported on nodes from the import tool, and then mappers need to move those tags to the correct building outline (either to an existing outline, or to a new outline). The tags should appear on the "most descriptive" object. If the address refers to a single house, the address tag should be on that house. If the address refers to an industrial site, the address should be on that site object.

Mandatory tags

Only the streetname and housenumber are mandatory tags, as those are the only tags needed to have a complete and non-ambiguous address (together with the boundaries that should already be present in the data, and can be corrected or improved at any time).

  • addr:housenumber=*: The housenumber corresponding to the housenumber in the source data.

Source: Huisnummer

  • addr:street=*: The streetname corresponding to the streetname in the source data.

Source: Straatnm

Optional tags, provided by the tool

Mappers also get a lot of freedom to adapt the tagging to their own workflow or preferences. For the following tags, it's up to the individual mapper to decide whether or not to add them. The import tool provides checkboxes to select which of the following tags to include.

  • addr:flats=*: A list of flats derived from all addresses with different flat numbers at this position. This data is optional because:
    • There's no uniform notation, which makes the data not so usable.
    • The CRAB data is more likely to contain mistakes, so nearly every apartment requires a survey.
    • The data isn't needed to find an address (since all numbers should have the same entrance, and postboxes next to each other).

Source: Appartementnummer / Busnummer

  • addr:postcode=* and addr:city=*: The postal code and municipality of the address. This data is optional because:
    • It isn't needed since the boundaries are present
    • Not all mappers agree to put the municipality inside the addr:city=* tag, some prefer to put the name of the postal-code zone there, which is the name of the part-municipality in some cases.
    • It's duplicated data, making it harder to maintain.
    • Sometimes, it's handy to add the postal code and municipality to clarify border cases, or to use filters and queries in JOSM.
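How the mandatory and optional tags could be assembled from one address record is sketched below in Python. The function name `build_tags`, its flags, and the dictionary keys are assumptions for illustration; joining flats with ";" is also an assumption, since there is no uniform notation for flat lists.

```python
def build_tags(address, include_flats=False, include_postcode_city=False):
    """Return OSM tags for one address; optional tags only when requested."""
    tags = {
        "addr:street": address["street"],
        "addr:housenumber": address["housenumber"],
    }
    if include_flats:
        # Only one of the two lists is expected to be present.
        flats = address.get("apptnrs") or address.get("busnrs") or []
        if flats:
            tags["addr:flats"] = ";".join(flats)
    if include_postcode_city:
        tags["addr:postcode"] = address["pcode"]
        tags["addr:city"] = address["municipality"]
    return tags
```

Keeping the optional tags behind explicit flags mirrors the choice left to each mapper.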

Optional tags, not provided by the tool

  • AssociatedStreet relation: Exact tagging for the relation is decided by the mapper. This is optional because:
    • We can't provide the relation in the import tool. The relation might already be partially present, and all members would have to be swapped (since we map building outlines and not nodes).
    • The data can already be derived from other tags and boundaries
    • Some users might want to add the data, to clarify border cases, or to use filters and queries in JOSM.

Forbidden tags

The following tags can be downloaded through the import tool, but they only serve to inspect the quality of CRAB data (in order to determine where a mistake might be made). They may never appear in the OSM database.

  • CRAB:herkomst=*: This tag denotes the source AGIV gives for the location (in Dutch). It ranges from front-door precision to municipality center (with several interpolations in-between). It's handy to see this quality description, but it should never appear in OSM like this, because we alter the quality (and hopefully improve it) by adding the address to a building.
  • CRAB:hnrlabels=*: This tag lists the housenumber labels for that address. Housenumber labels appear when different CRAB addresses overlap (due to lack of precision, or because they belong on the same physical object). This should not get in OSM because either the precision should be improved (move addresses to the right buildings), or a double housenumber should be put in the addr:housenumber=* key.
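A mapper or tool author who wants to be sure that these inspection tags never reach OSM could filter them out before upload. A minimal Python sketch follows; the function name `strip_inspection_tags` is hypothetical, and it assumes that every forbidden key starts with the `CRAB:` prefix, as the two tags above do.

```python
def strip_inspection_tags(tags):
    """Return a copy of the tags without any CRAB:* inspection keys."""
    return {
        key: value
        for key, value in tags.items()
        if not key.startswith("CRAB:")
    }
```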

Changeset Tags

The source tags are documented on the import site. These should be:

  • source=Agiv CRAB: We use the CRAB database from Agiv, and we use Agiv aerial imagery to draw the outlines. No other data is used in most cases.
  • source:date=yyyy-mm-dd: The extraction date of the data from CRAB to the import tool. This is needed because by the time the data is added to OSM, the CRAB data may already have changed.
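The two changeset tags can be generated from the extraction date, as in this small Python sketch. The function name `changeset_tags` is an assumption; `date.isoformat()` yields the yyyy-mm-dd form required above.

```python
from datetime import date


def changeset_tags(extraction_date):
    """Build the changeset source tags for a given CRAB extraction date."""
    return {
        "source": "Agiv CRAB",
        "source:date": extraction_date.isoformat(),  # yyyy-mm-dd
    }
```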

First data transformation

In the first step, the data will be converted from one huge list to manageable chunks.

There is a Python script that extracts basic address information from the AGIV CRAB dataset and outputs the result. The entire script can be found here:

https://github.com/aptum/aptum.github.io/

The script finishes in about an hour. The Belgian Lambert 72 X-Y coordinates are transformed into WGS84 coordinates, and everything is grouped in a logical structure of pcode -> streetname -> housenumber.


First Data Transformation Results

The first result is a list of streets per postal code:

example: https://github.com/aptum/aptum.github.io/blob/master/data/2920.json

This list contains useful info for processing a particular postal code. Per street, it contains the bounding box of all addresses in that street (in the l, r, t and b keys) and the name of the street (ASCII-encoded, with Unicode escape sequences for other characters). The sanitized name of the street (the name converted to lowercase and filtered to ASCII alphanumeric characters) is useful when you want to use the streetname as a variable or filename on a wide range of systems and languages.
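The sanitized name and the bounding box described above can be computed as in the following Python sketch. It is an illustration under assumptions: non-ASCII characters are simply dropped (the real script might treat them differently), and l, r, t and b are taken to mean left, right, top and bottom in longitude/latitude terms.

```python
import string

_ALLOWED = set(string.ascii_lowercase + string.digits)


def sanitize_street_name(name):
    """Lowercase the name and keep only ASCII letters and digits."""
    return "".join(ch for ch in name.lower() if ch in _ALLOWED)


def bounding_box(addresses):
    """Bounding box of a street's addresses, using the l, r, t, b keys."""
    lats = [a["lat"] for a in addresses]
    lons = [a["lon"] for a in addresses]
    return {"l": min(lons), "r": max(lons), "t": max(lats), "b": min(lats)}
```

With this sketch, "Roeselarestraat" becomes `roeselarestraat`, which matches the filename in the per-street example URL.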

Then, per street, there is a JSON with all addresses: https://github.com/aptum/aptum.github.io/blob/master/data/8840/roeselarestraat.json

The filename of this JSON matches the sanitized street name listed in the postal code JSON.

The JSON contains per address:

  • housenumber: the housenumber that will most likely be directly imported to OSM
  • huisnrlabel: the housenumber label, this is equal to the housenumber in most cases, but differs when different addresses are assigned to the same position
  • lat, lon: the WGS84 coordinates of the best position known by AGIV
  • apptnrs, busnrs: arrays with the different apartment and bus numbers on this address. Normally, only one of these lists is present. If both lists are present, it's an error in the CRAB database.
  • source: the description of the AGIV precision (ranging from front-door precision to municipality center; building and parcel level precision are the most common)
  • street, municipality and pcode: name of the street and municipality and the pcode as they appear in AGIV. Again ASCII-encoded with escaped Unicode characters.
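Reading one such address record in Python could look like the sketch below. The sample record is made up for illustration and not taken from CRAB; the function name `parse_address` and the added `flat_lists_conflict` key are hypothetical. The standard `json` module decodes the escaped Unicode characters automatically.

```python
import json

# Made-up record for illustration; the values are not taken from CRAB.
SAMPLE = r'''
{"housenumber": "7", "huisnrlabel": "7-9", "lat": 51.0, "lon": 4.5,
 "busnrs": ["1", "2"], "street": "Belgi\u00eblaan", "pcode": "2920"}
'''


def parse_address(text):
    """Parse one address record and flag the apptnrs/busnrs conflict."""
    record = json.loads(text)
    has_appt = bool(record.get("apptnrs"))
    has_bus = bool(record.get("busnrs"))
    # Both lists being present is described as an error in the CRAB database.
    record["flat_lists_conflict"] = has_appt and has_bus
    return record
```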

Second data transformation

At the second data transformation, the JSON files described above are read, compared live with OSM data (via Overpass), and converted to the OSM XML format. This transformation happens on the fly, every time a mapper requests it.

The script is available here: https://github.com/aptum/aptum.github.io/blob/master/loadStreets.js

The data in CRAB is maintained by the municipalities. The municipalities also place the streetname signs and determine the official spelling of streetnames. However, a number of common differences (such as abbreviations) can appear between CRAB and OSM. The comparison script ignores them (i.e. the names are considered the same, and the numbers are compared with each other).
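To fetch the OSM side of the comparison, an Overpass QL query restricted to a street's bounding box is one option. The Python sketch below only builds the query text. It is an assumption about how such a request could look, not a copy of what the actual script sends, and the function name `overpass_address_query` is hypothetical.

```python
def overpass_address_query(bbox):
    """Build an Overpass QL query for all objects with a housenumber in bbox.

    bbox uses the l, r, t, b keys of the per-postal-code JSON, assumed here
    to mean left (west), right (east), top (north) and bottom (south).
    """
    # Overpass expects bounding boxes as (south, west, north, east).
    area = "({b},{l},{t},{r})".format(**bbox)
    return (
        "[out:json][timeout:25];"
        "("
        'node["addr:housenumber"]{area};'
        'way["addr:housenumber"]{area};'
        'relation["addr:housenumber"]{area};'
        ");"
        "out center;"
    ).format(area=area)
```

Using `out center;` gives a single coordinate for ways and relations, which is convenient when addresses sit on building outlines rather than nodes.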
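Ignoring such spelling differences amounts to normalising both names before comparing them. A minimal Python sketch follows, with a hypothetical abbreviation table. The entries are examples only, and the list used by the real comparison script may differ.

```python
import re

# Hypothetical abbreviation table; the real script may use other entries.
ABBREVIATIONS = {"st": "sint", "dr": "dokter", "burg": "burgemeester"}


def normalise_street_name(name):
    """Lowercase, drop punctuation and expand known abbreviations."""
    words = re.findall(r"[^\W\d_]+|\d+", name.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def same_street(name_a, name_b):
    """True when two spellings normalise to the same street name."""
    return normalise_street_name(name_a) == normalise_street_name(name_b)
```

Under these assumptions, "St.-Jansstraat" and "Sint-Jansstraat" compare as equal, after which only the housenumbers need to be compared.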

Second Data Transformation Results

This second data transformation creates an HTML table and a map that allow you to easily load the missing addresses in JOSM.

The HTML page can be seen here: http://aptum.github.io/import.html

Enter a postal code (like 2920 - this one shows you some unicode support), check whether you want to compare with OSM data, or if you only want to see the raw CRAB data, and click on "update".

The page has other options to filter certain streets (to make the loading faster), and to mark addresses as non-matching when they are too far from each other (useful for post-import quality control).

The generated table has 5 columns. Clicking on an element loads the data in JOSM (this requires the most recent JOSM to be running):

  • Streetname: loads the data from OSM in the relevant BBOX to the current layer
  • Total: loads all addresses from CRAB to OSM for that street
  • Missing: loads only the addresses that are missing in OSM, and do not overlap in CRAB
  • Missing overlapping: loads only the addresses that are missing in OSM, but overlap in CRAB. These require some extra work, potentially a survey.
  • Wrong: Shows all points that are in OSM, but could not be matched to an address in CRAB. This may be a mistake in CRAB, or a mistake in OSM.
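The Missing, Missing overlapping and Wrong columns can be derived roughly as in the Python sketch below. It assumes that CRAB addresses carry `housenumber`, `huisnrlabel`, `lat` and `lon`, that OSM addresses are given as a mapping from housenumber to a (lat, lon) pair, and that an address overlaps when its label differs from its housenumber. The names `compare_street`, `distance_m` and `max_distance_m` are hypothetical, and the distance check is only one possible reading of the "too far from each other" option.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Approximate distance in metres (equirectangular, for short ranges)."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return 6371000 * math.hypot(dx, dy)


def compare_street(crab, osm, max_distance_m=None):
    """Return (missing, missing_overlapping, wrong) housenumber lists."""
    missing, missing_overlapping, matched = [], [], set()
    for addr in crab:
        number = addr["housenumber"]
        pos = osm.get(number)
        found = pos is not None
        if found and max_distance_m is not None:
            # Treat a far-away OSM address as not matching.
            found = distance_m(addr["lat"], addr["lon"], *pos) <= max_distance_m
        if found:
            matched.add(number)
        elif addr["huisnrlabel"] != number:
            missing_overlapping.append(number)
        else:
            missing.append(number)
    # OSM addresses without a CRAB match may be wrong in OSM or in CRAB.
    wrong = [number for number in osm if number not in matched]
    return missing, missing_overlapping, wrong
```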

The table can be sorted on every column, so you can use the table to find streets with many missing addresses.

The map contains the same info, with the same links, but can be used to get a better idea of the data quality.

Data Merge Workflow

Team Approach

The tools will be available to everyone, with specific guidelines in order to achieve optimal quality. The dataset will be used more as an additional source than as a dataset to dump directly into OSM (comparable to using Bing imagery to map things).

Workflow

See WikiProject Belgium/Using AGIV Crab data/Working with AGIV Crab Data in JOSM for the workflow documentation.

Dedicated upload account

Since mappers will be mapping much more than just the addresses provided in the source dataset (building outlines will also be mapped), and in some cases, surveying is part of the job, this cannot be considered a normal import. It's more comparable to mapping stuff based on background imagery. Here the housenumbers are used as a background to map the buildings. Many users will also map things next to the housenumbers in the same session (because they surveyed something, or because they notice something on the imagery).

As such, we consider the requirement for a dedicated user account as a limitation for the contributors.

Conflation

The tools provide information about the available addresses, but individual mappers must decide whether to draw new building outlines or merge the addresses with existing building outlines.

QA

There will be continuous QA through the comparison tools. Every mapper will map and check the region they know. Besides the comparison between OSM and CRAB, other tools such as Osmose and keep right! will also be used from time to time.


When mistakes in CRAB are found, they can be reported through the tools AGIV provides, so that they get corrected and the differences between OSM and CRAB disappear in the next data update. The reaction time depends on the municipalities, but it's usually a few weeks.