AGIV CRAB Import

From OpenStreetMap Wiki

About

This import project page is about the AGIV CRAB Adressenlijst database import. The dataset was released in 2013 under the new Flemish OpenData License.

This dataset contains a list of all addresses in the Flemish region (over 3 million addresses), together with approximate coordinates. Since addresses are constantly changing (new houses are built, streets are renamed, ...) and importing the addresses will take a very long time, this page also covers the maintenance that will happen during and after the import.

Import Plan Outline

The import happens in 3 main steps:

  • Convert the AGIV CRAB Adressenlijst into manageable JSON chunks per postal code (~municipality) and per street. (repeated every few weeks or months to guarantee updated CRAB data)
  • Compare the addresses in OSM with the addresses in the JSON files and report the differences in various formats. (done automatically every time a mapper wants to compare a street or entire postal code, to guarantee the latest OSM data)
  • Load the report for one street in JOSM, and merge the data with existing OSM data.

Goals

The goal of this import is to use the high-quality dataset to steadily improve the addresses available in OSM. This will not be a blind import; all data will be edited by local mappers.

Schedule

It's impossible to set a schedule for this. It will be an ongoing job where every mapper works individually at their own pace.

Currently, tools are being developed and tested by OSM veterans to guarantee a smooth experience. After the testing, more people will hopefully participate, but fixing bugs in the tools will stay possible.

Import Data

Background

Address format in Belgium

An address in Belgium is determined by its postal code, streetname, housenumber (and possibly postbox or apartment number).

In most cases, the postal code boundaries are the same as the municipal boundaries. Only some bigger cities are split into multiple postal codes, to keep the number of streets per postal code manageable. There are also some very special organisations (NATO, public broadcasting, ...) that get their own postal code.

A streetname is considered unique within a postal code. A housenumber is likewise unique within a street.

Housenumbers include bis-numbers. Those are usually noted with a suffix capital letter (7A is a bis-number), and sometimes with a suffix number or description ("10 bis" and "10 ter" are bis-numbers, "10/1", "10/2", ... are also bis-numbers). Bis-numbers usually arise when a new house is built between existing houses with consecutive housenumbers. For example, when a house is built between numbers 7 and 9, the new house will most likely get number 7A (since even numbers are reserved for the other side of the street).
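
The housenumber forms described above can be told apart with a small parser. The following is a minimal sketch, not part of the import tools: the names `HouseNumber` and `parse_housenumber` are hypothetical, and the pattern only covers the forms listed in this section ("7", "7A", "10 bis", "10 ter", "10/1").

```python
import re
from typing import NamedTuple, Optional


class HouseNumber(NamedTuple):
    base: int    # numeric part, e.g. 7 for "7A"
    suffix: str  # bis-part ("A", "bis", "/1"), empty when absent


# Assumption: only the forms mentioned in the text are supported.
_PATTERN = re.compile(r"^\s*(\d+)\s*(?:/\s*(\d+)|([A-Za-z]+))?\s*$")


def parse_housenumber(text: str) -> Optional[HouseNumber]:
    """Split a housenumber into its base number and bis-suffix.

    Returns None for anything that does not fit the supported forms,
    such as a range label like "7-9".
    """
    match = _PATTERN.match(text)
    if match is None:
        return None
    base, numeric_suffix, word_suffix = match.groups()
    if numeric_suffix is not None:
        suffix = "/" + numeric_suffix
    elif word_suffix is not None:
        suffix = word_suffix
    else:
        suffix = ""
    return HouseNumber(int(base), suffix)


def is_bis_number(text: str) -> bool:
    """True when the housenumber carries a bis-suffix."""
    parsed = parse_housenumber(text)
    return parsed is not None and parsed.suffix != ""
```

Because the result is a tuple of base number and suffix, sorting parsed housenumbers places 7A between 7 and 9, which matches the way bis-numbers are assigned.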

Apartment numbers and bus-numbers (postbox numbers) are not part of the housenumber. The numbering of flats or postboxes in an apartment building is not standardised and may take any alphanumeric form.

Legal

Data source site: https://download.agiv.be/Producten/Detail?id=447
Data license: http://agiv.be/gis/producten/?artid=2101
Type of license (if applicable): Flemish OpenData License (translation: AGIV_CRAB_Import/Free_open_data_licence_Flanders)
Link to permission (if required): According to the license, permission did not need to be obtained, but it was explicitly confirmed by Laura D'heer at AGIV.
OSM attribution (if required): https://wiki.openstreetmap.org/wiki/Contributors#AGIV
ODbL Compliance verified: yes

Import Type

This import will not be a machine import. The entire address dataset is divided into small tasks per street/postal code and merged with OpenStreetMap data manually.

Data Preparation

Data Reduction & Simplification

The first script, which transforms the CRAB database into JSON files, does the data reduction and simplification. The database is one big, unsorted list of addresses down to apartment-number level. The first script splits the list per postal code and per street to simplify it. It also merges all addresses where only the apartment-number is different. Most address points should be imported to OSM as separate objects, but flats share the same front door, so their numbers should be mentioned as a tag on the same object.
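
The merge of addresses that differ only in their apartment-number could look roughly like this. It is a minimal sketch under stated assumptions, not the actual script: the function name `merge_flats` and the record keys `pcode`, `street`, `housenumber` and `flat` are hypothetical.

```python
def merge_flats(records):
    """Merge records that differ only in their flat number.

    Each record is a dict with the hypothetical keys "pcode", "street",
    "housenumber" and "flat" (None or "" when there is no flat number).
    Returns one entry per (pcode, street, housenumber), with all flat
    numbers collected in a "flats" list.
    """
    merged = {}
    for record in records:
        key = (record["pcode"], record["street"], record["housenumber"])
        entry = merged.setdefault(key, {
            "pcode": record["pcode"],
            "street": record["street"],
            "housenumber": record["housenumber"],
            "flats": [],
        })
        flat = record.get("flat")
        # Skip missing flat numbers and avoid listing a flat twice.
        if flat and flat not in entry["flats"]:
            entry["flats"].append(flat)
    return list(merged.values())
```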

Every address in the list also has what AGIV calls "the most precise known location". This varies from frontdoor precision (for a few thousand addresses), to the municipality center. But most addresses are precise up to building or parcel level. We can't automatically achieve a better precision than this.

The CRAB datasource also lists a housenumberlabel. This label is the merge of overlapping nodes. For example, when housenumbers 7 and 9 are linked to the same CRAB feature, the CRAB database will have them both as separate addresses, each with its own housenumber, but their housenumberlabel will be "7-9". This can sometimes be a mistake in CRAB (for example, the precision was only up to street position, which causes overlapping). Other times, "7-9" is the valid housenumber for that one address, and should be available like this in OSM. Knowing which one is valid often needs a survey, so these addresses will be presented to the mapper as a separate job.

Tagging Plans

We will be adding the address to the building outline.

We will only tag the street, housenumber, and the list of flats/postboxes available at that address.

  • addr:housenumber : The housenumber corresponding to the housenumber in the source-data.

Source: Huisnummer

  • addr:street : The street corresponding to the streetname in the source-data.

Source: Straatnm

  • addr:flats : A list of flats derived from all addresses with different flat numbers at this position

Source: Appartementnummer / Busnummer
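
As an illustration of the tagging plan above, the mapping from one merged address to OSM tags can be sketched as follows. The function name `address_tags` and the input keys are hypothetical, and joining the flats with a semicolon is an assumption; this page does not specify how the list in addr:flats is written.

```python
def address_tags(address):
    """Build the OSM tags for one address.

    `address` is a dict with the hypothetical keys:
      "housenumber" (from Huisnummer),
      "street"      (from Straatnm),
      "flats"       (list from Appartementnummer / Busnummer, may be empty).
    """
    tags = {
        "addr:housenumber": address["housenumber"],
        "addr:street": address["street"],
    }
    flats = address.get("flats") or []
    if flats:
        # Assumption: semicolon-separated list of flat numbers.
        tags["addr:flats"] = ";".join(flats)
    return tags
```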

Changeset Tags

source = AGIV CRAB

First data transformation

In the first step, the data will be converted from one huge list to manageable chunks.

There is a python script that extracts basic address information from the AGIV CRAB dataset and outputs the result. The entire script can be found here:

https://github.com/aptum/aptum.github.io/

The script finishes in about an hour. The Belgian Lambert 72 X-Y coordinates are transformed into WGS84 coordinates, and everything is grouped in a logical structure of pcode -> streetname -> housenumber.


First Data Transformation Results

The first result is a list of streets per postal code:

example: https://github.com/aptum/aptum.github.io/blob/master/data/2920.json

This list contains useful info for processing a particular postal code. Per street, it contains the bounding box of all addresses in that street (noted with the l, r, t and b keys) and the name of the street (ASCII-encoded, with unicode escape sequences for other characters). The sanitized name of the street (the name, converted to lowercase and filtered to ASCII alphanumeric characters) is useful when you want to treat the streetname as a variable or filename on a wide range of systems and languages.
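
The sanitized name and the bounding box can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of the actual script: the function names are hypothetical, non-ASCII characters are assumed to be dropped rather than transliterated, and l/r/t/b are assumed to mean minimum longitude, maximum longitude, maximum latitude and minimum latitude.

```python
import json


def sanitize_streetname(name: str) -> str:
    """Lowercase the name and keep only ASCII letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isascii() and ch.isalnum())


def bounding_box(points):
    """Bounding box of (lat, lon) pairs, using the l, r, t and b keys."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {"l": min(lons), "r": max(lons), "t": max(lats), "b": min(lats)}


def ascii_json(value) -> str:
    """Serialize to JSON with non-ASCII characters as \\uXXXX escapes.

    This is the default behaviour of json.dumps (ensure_ascii=True).
    """
    return json.dumps(value)
```

With this sketch, "Roeselarestraat" becomes "roeselarestraat", which matches the filename of the per-street example linked below.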

Then, per street, there is a JSON with all addresses: https://github.com/aptum/aptum.github.io/blob/master/data/8840/roeselarestraat.json

The filename of this JSON is the same as the sanitized name listed for that street in the previous JSON.

The JSON contains per address

  • housenumber: the housenumber that will most likely be directly imported to OSM
  • huisnrlabel: the housenumber label, this is equal to the housenumber in most cases, but differs when different addresses are assigned to the same position
  • lat, lon: the WGS84 coordinates of the best position known by AGIV
  • TODO: flat and bus numbers
  • source: the description of the AGIV precision (ranging from front-door precision to municipality center, building and parcel level precision are the most common)
  • street: name of the street as it appears in AGIV

Second data transformation

In the second data transformation, the JSON files described above are read, compared live with OSM data (via Overpass), and converted to the OSM XML format. This transformation happens on the fly, every time a mapper requests it.
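
The conversion to OSM XML can be illustrated with a short Python sketch. It covers only the conversion step, not the Overpass comparison, and it is not the actual script (which is written in JavaScript). The function name, the `generator` value and the input keys are hypothetical; the negative node ids follow the JOSM convention for objects that do not exist on the server yet.

```python
import xml.etree.ElementTree as ET


def addresses_to_osm_xml(addresses) -> str:
    """Convert address dicts into an OSM XML document of new nodes.

    Each address is a dict with the keys "lat", "lon", "housenumber"
    and "street" (assumed to mirror the per-street JSON fields).
    """
    root = ET.Element("osm", version="0.6", generator="crab-sketch")
    for index, address in enumerate(addresses, start=1):
        node = ET.SubElement(
            root,
            "node",
            id=str(-index),  # negative id: new, not yet uploaded object
            lat=str(address["lat"]),
            lon=str(address["lon"]),
        )
        ET.SubElement(node, "tag", k="addr:housenumber", v=address["housenumber"])
        ET.SubElement(node, "tag", k="addr:street", v=address["street"])
    return ET.tostring(root, encoding="unicode")
```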

The script is available here. https://github.com/aptum/aptum.github.io/blob/master/loadStreets.js

AGIV does not claim to have the official spelling of all streets. That spelling can only be found in the municipal council reports. As such, the script ignores common spelling differences such as abbreviations, or the difference between a hyphen and a space.
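
A tolerant streetname comparison of this kind can be sketched as follows. This is an illustration in Python, not the logic of loadStreets.js: the function names are hypothetical and the abbreviation table is a made-up example, so the real script may recognise a different set.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "st": "sint",
    "dr": "dokter",
    "burg": "burgemeester",
}


def normalize_streetname(name: str) -> str:
    """Reduce a streetname to a form that ignores common spelling variants.

    Hyphens and spaces are treated alike, case is ignored, and known
    abbreviations (with or without trailing dot) are expanded.
    """
    words = re.split(r"[\s\-]+", name.lower())
    cleaned = []
    for word in words:
        word = word.rstrip(".")
        if word:
            cleaned.append(ABBREVIATIONS.get(word, word))
    return " ".join(cleaned)


def same_street(crab_name: str, osm_name: str) -> bool:
    """True when both names are equal after normalization."""
    return normalize_streetname(crab_name) == normalize_streetname(osm_name)
```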

Second Data Transformation Results

This second data transformation creates an HTML table that allows you to easily load the missing addresses in JOSM.

The HTML page can be seen here: http://aptum.github.io/import.html

Enter a postal code (like 2920 - this one shows you some unicode support), check whether you want to compare with OSM data, or if you only want to see the raw CRAB data, and click on "update".

The page has other options to filter certain streets (to make the loading faster), and to mark addresses as non-matching when they are too far from each other (useful for post-import quality control).
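
The distance check used to mark addresses as non-matching can be sketched with the haversine formula. The function names and the 50 m default threshold are assumptions chosen for illustration; the page lets the mapper choose the actual distance.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def positions_match(crab, osm, max_distance_m=50.0):
    """True when the CRAB and OSM positions are within the threshold.

    Both arguments are dicts with "lat" and "lon" keys; the default
    threshold is a hypothetical value.
    """
    return distance_m(crab["lat"], crab["lon"], osm["lat"], osm["lon"]) <= max_distance_m
```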

The generated table has 5 columns. Clicking on an element loads the corresponding data in JOSM (requires the most recent JOSM to be running):

  • Streetname: loads the data from OSM in the relevant BBOX to the current layer
  • Total: loads all addresses from CRAB to OSM for that street
  • Missing: loads only the addresses that are missing in OSM, and do not overlap in CRAB
  • Missing overlapping: loads only the addresses that are missing in OSM, but overlap in CRAB. These require some extra work, potentially a survey.
  • Wrong: Shows all points that are in OSM, but could not be matched to an address in CRAB. This may be a mistake in CRAB, or a mistake in OSM.
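
The split into these columns can be sketched as follows. This is a simplified illustration for a single street, not the logic of the actual script: the function name is hypothetical, an address is treated as overlapping when its huisnrlabel differs from its housenumber, and an OSM housenumber is assumed to match when it equals either the CRAB housenumber or the CRAB label.

```python
def classify_street(crab_addresses, osm_housenumbers):
    """Sort the addresses of one street into the table columns.

    crab_addresses: list of dicts with "housenumber" and "huisnrlabel".
    osm_housenumbers: set of addr:housenumber values found in OSM.
    """
    missing = []
    missing_overlapping = []
    known = set()
    for address in crab_addresses:
        number = address["housenumber"]
        label = address["huisnrlabel"]
        known.update((number, label))
        if number in osm_housenumbers or label in osm_housenumbers:
            continue  # already present in OSM
        if label != number:
            missing_overlapping.append(number)
        else:
            missing.append(number)
    # OSM housenumbers that match nothing in CRAB.
    wrong = sorted(osm_housenumbers - known)
    return {
        "total": len(crab_addresses),
        "missing": missing,
        "missing_overlapping": missing_overlapping,
        "wrong": wrong,
    }
```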

Data Merge Workflow

Team Approach

The tools will be available for everyone, with specific guidelines in order to achieve optimal quality. The dataset will be used more as an additional source, than as a direct dump dataset (comparable to using Bing imagery to map things).

Workflow

This section documents a workflow to achieve optimal results with the final webpage mentioned above:

Without houses present in OSM

When the houses are not pre-mapped in OSM, drawing the houses with their housenumber works best as follows:

Make sure you have the terracer plugin, the buildings tools plugin and the AGIV orthophoto layer installed (the AGIV orthophoto layer is more precise, more up-to-date and clearer than Bing), and that your JOSM instance is open.

Under data -> Set Building size (CTRL+ALT+B), make sure to enable "use address nodes under buildings".

  • Load the OSM data of the street in JOSM by clicking on the streetname
  • Load the missing addresses in a new layer by clicking on the number
  • For each address node:
    • If it's unclear, or you just don't want to handle that house now, delete the address node. The comparison script will see that they're not imported, so you can just do it later on, after a survey, or when you have more time.
    • If you see (in the OSM data layer) that some odd house of that street has already been mapped, copy the address node and delete it from the address layer. Paste the address data on the house in the data layer, and improve the house layout as needed.
    • In case of a single unmapped house:
      • Move the address node so it's on top of the building (in the case it wasn't already)
      • Draw the general rectangle using the buildings tool plugin (b)
      • Now the building should have the right tags already.
      • Use the extrude tool to adjust the building parts that stick out of the main rectangle. (x)
    • In case of a row of unmapped houses:
      • Move the address nodes on top of the houses
      • Draw one big rectangle over a straight row of houses
      • Select the rectangle together with all address nodes, and a corner of the rectangle marking the lowest number
      • Terrace the houses using the terracing tool (Shift+t)
      • Make sure the spacing of the houses is correct by moving boundary nodes of the houses around
      • Use the extrude tool to add more detail to the buildings only after the terracing
    • In case it doesn't belong on a house:
      • Try to find out what it belongs to. It might belong to a parcel, so just tag it to a node. It might belong to a farmyard or industrial site, then tag it on the outline of the site, or on a site relation.
  • Search for all remaining address nodes (Query: "type:node 'addr:housenumber'=*") to make sure that no nodes are remaining in the layer. If nodes are remaining, either map them to a house, or delete them.
  • Merge the address layer with the data layer
  • Upload the data, and start your next street

With houses already present in OSM

When the houses are already mapped, the task becomes quite different.

Make sure you have the AGIV orthophoto layer and the conflate plugin installed.

  • Make sure your current layer is either empty, or there are no data layers.
  • Load the OSM data of the street in JOSM by clicking on the streetname
  • Load the missing addresses in a new layer by clicking on the number
  • For every building next to the street you're working on:
    • Improve the accuracy using the selection and extrusion tools
    • Make sure that the building is split per address node in the missing addresses layer (i.e. in a row of houses, every house is a separate closed way)
  • For every address node in the missing addresses layer
    • Move the address node on the matching building centroid if that wasn't the case already, or delete the address node when you can't find a matching building
  • Select all buildings in the data layer (Query: "building=*")
  • Select all address nodes in the missing addresses layer (CTRL+A)
  • Enable the conflation dialog, and press "Configure"
  • Click "Freeze" in the "Reference" panel with the missing addresses layer active.
  • Click "Freeze" in the "Subject" panel with the data layer active.
  • Click "Generate matches"
  • Make sure that the "Reference only" tab does not contain any data. Otherwise, there is some node that couldn't be merged.
  • Pan over the new "Conflation" layer, and check that the arrows are connecting the right address node to the right building.
  • In the "Matches" tab, select all matches (click on a match and hit CTRL+A)
  • Make sure the data layer is active to see which point it's conflating
  • Click "Conflate"
  • For every conflation point, you'll see the tags shared by both objects (in white), the new tags (in green) and the conflicting tags (in red). Check the tags and click "apply" when everything looks good.
  • Delete the "missing addresses" layer and the "conflation" layer without uploading
  • Upload the data layer, and move on to the next street.


Mapping the difficult cases

  • TODO: find a good toolchain to export the info into a tool that can be used while surveying (walking papers, smartphone app, ...)


OUTDATED: A more detailed description, which can hopefully also be used as a tutorial, can be found here: WikiProject_Belgium/Using_AGIV_Crab_data.

Dedicated upload account

Since mappers will be mapping much more than just the addresses provided in the source dataset (building outlines will also be mapped), and in some cases, surveying is part of the job, this cannot be considered a normal import. It's more comparable to mapping stuff based on background imagery. Here the housenumbers are used as a background to map the buildings. Many users will also map things next to the housenumbers in the same session (because they surveyed something, or because they notice something on the imagery).

As such, we consider the requirement for a dedicated user account as a limitation for the contributors.

Conflation

The tools provide information about the available addresses, but individual mappers must decide whether to draw new building outlines or merge the addresses with existing building outlines.

QA

There will be continuous QA through the comparison tools. Every mapper will map and check the region they know. Next to the comparison between OSM and CRAB, other tools such as Osmose and keep right! will also be used from time to time.


We have an unofficial agreement to report errors in the AGIV CRAB reference data when we find them. So in the end, both CRAB and OSM should be complete and correct databases.