WikiProject Belgium/Building and address import/AIV GRB building import/Import plan

Goals

In Flanders, we have an approved, ongoing import of addresses from an open data government database called the CRAB (Central Reference Address Database). However, that database does not contain building outlines. The GRB import is now also approved.

The goal of this project is to complete all building outlines, together with their addresses, in Flanders in a high-quality fashion. To do this, we will use another open data government database alongside the CRAB, namely the GRB (Large-scale Reference Database). You can read more about the GRB at AIV_GRB_building_import/Background. We will import outlines and addresses from the GRB, and then check the addresses against the CRAB. We will not simply import all data in large chunks but drip-feed it, preserving as much OSM object history as possible and combining the GRB data with other sources and our own judgement.

So this is a manually verified, "assisted mapping" import, with a strong focus on data quality.

If you want to join, be sure to read about the whole project, then go to AIV_GRB_building_import/Instructions to see how it works.

Schedule

The import aims to be a model import: the objective is not to load the data into OSM as fast as possible, but with the highest possible quality. As such there is no end date; we will take as long as we need. This may be until the end of 2018, 2019, or even later. Once all buildings have been added, we will update them whenever the GRB is updated.


Data preparation

Data import preparation is complete and can be re-run in an automated manner to sync with both OSM and government datasets.

Communication with the local community has taken place on the Riot chat channel and has been summarized on the Belgian subsection of the OSM forums: https://forum.openstreetmap.org/viewtopic.php?id=61597

The data can be previewed here: http://dataviewer.grbosm.site/ , by clicking 'enable info window' on the top left and hovering over buildings to see the underlying data taken into account.

As a 'proof of concept' of what the end result could look like, this link http://tiles.grbosm.site/slide/app/index.html#16/50.9523/3.1193 gives a slide-over comparison: moving the slider at the top from left to right shows the difference between the data with and without the GRB buildings.

Community buy-in

The import has been discussed very extensively on the Belgian Riot chat channel and on the talk-be mailing list. The import has also been discussed on the imports mailing list [1]. Most of the criticism there has been at least partially addressed.

Data import

Data import is ongoing and will be for a long time.

Data validation

Since this step is not automated, the work here is mainly writing a 'how-to' guideline for mappers. This work can continue in parallel with the data import.

Import Plan Outline

The import happens in 3 main steps:

  • Data preparation
  • Data insertion
  • Data validation

This means: take all the building data from the GRB dataset (layers 'GBG' and 'GBA'), add height data from the 3D GRB LOD1 dataset, compare the data with the current OpenStreetMap data (taking into account the existing tagging of the building in OSM and the landuse currently mapped at that location), and suggest which tags the building should get. This step is done 'behind the scenes' for the whole of Flanders, so that end users have access to the result. It is not recalculated on the fly, but updated periodically.

A front-end tool will allow the mapper performing the import to select a (limited) area for which they wish to add the missing buildings. The data will be pulled from the pre-processed data of the first step and inserted into OSM.

While the data is carefully prepared, it is still the importing mapper's responsibility to VERIFY it. To make proper validation possible, data insertion must be limited to chunks small enough to be checked properly by a human.

Import data

Background

Data sources: GRB, CRAB and 3D GRB LOD1 DHMVII
Data licence: Flemish Gratis Open Data Licentie. GRB
Type of license (if applicable): custom, requires attribution (see #Licence below)
OSM attribution: Contributors#AIV, source on each changeset
ODbL Compliance verified: yes (done for separate CRAB import, this is the same licence)

The subsets to be imported from the GRB (the GRB calls them entities) are Gbg (buildings), Knw (man-made objects like bridges) and Gba (building attachments). A description of the GRB data is also available on this wiki page [2]

Licence

The GRB and CRAB are released under the Flemish Open Data Licence. It's a public domain with attribution style licence that is designed to be compatible with the UK Open Government Licence, the French Licence Ouverte, CC BY 3.0 and the Open Data Commons Attribution licence 1.0.

For the CRAB import it was established that this licence is compatible with the ODbL.

GRB

Licence grant on the old AGIV site (archived)

Licence grant on the Flemish government's website (archived). Unofficial translation by us:

Every natural person, each legal person and each group can use the GRB free of charge under the license “Free of charge open data license Flanders v1.02” (Gratis open data licentie Vlaanderen v1.02).
The sole condition is a requirement of attribution to the data set and its owner upon sharing or distributing (publishing etc.) the data. In concreto this means you should mention ‘Source: Large Scale Reference Database Flanders, AGIV[sic]’. You can read all conditions in the “Free of charge open data license Flanders”.

CRAB

Licence grant on the Flemish government's website (archived). Unofficial translation by us:

Every natural person, each legal person and each group can use the CRAB free of charge under the license “Free of charge open data license Flanders v1.0” (Gratis open data licentie Vlaanderen v1.0).
The sole condition is a requirement of attribution to the data set and its owner upon sharing or distributing (publishing etc.) the data. In concreto this means you should mention ‘Source: Informatie Vlaanderen’. You can read all conditions in the “Free of charge open data license Flanders”.

OSM data

The OSM data is created on the fly by the import tool built by Belgian community member Glenn Plas.

An example of the data returned by this tool can be obtained in GeoJSON by executing this command:

   curl 'http://grbtiles.byteless.net/postgis_geojson.php?bbox=352711.8458126,6581269.4005653,353825.55720112,6581655.7658565' --compressed
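The bbox values in the command above look like Web Mercator (EPSG:3857) metres in the order min x, min y, max x, max y; that reading is our assumption, not documented behaviour of the endpoint. A minimal sketch of building such a request from longitude/latitude and counting the returned GeoJSON features (the helper names are hypothetical):

```python
import json
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)
ENDPOINT = "http://grbtiles.byteless.net/postgis_geojson.php"


def to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Project WGS 84 longitude/latitude (degrees) to EPSG:3857 metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def bbox_url(lon_min: float, lat_min: float, lon_max: float, lat_max: float) -> str:
    """Build a request URL; the bbox order (min x, min y, max x, max y) is assumed."""
    x_min, y_min = to_web_mercator(lon_min, lat_min)
    x_max, y_max = to_web_mercator(lon_max, lat_max)
    return f"{ENDPOINT}?bbox={x_min},{y_min},{x_max},{y_max}"


def count_features(geojson_text: str) -> int:
    """Count the features in a GeoJSON FeatureCollection returned by the tool."""
    return len(json.loads(geojson_text).get("features", []))
```

The URL can then be fetched with any HTTP client; the curl command above adds --compressed so a gzip-encoded response is accepted.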

Import type

This is a semi-automated import: we want the data to be verified by a human and merged with existing data.

JOSM and the Replace Geometry tool from its UtilsPlugin2 play a crucial role in our workflow.

The data will be updatable thanks to the IDs that are added to each imported building.

Using the web tool we exclude buildings that are already imported (using the IDs added to imported objects) and take diffs in the web browser by combining JOSM features with Overpass API.
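Excluding already-imported buildings comes down to a set lookup on the IDs. A minimal sketch, assuming the source:geometry:ref format described under Source tags, OSM ways shaped like Overpass JSON elements (a "tags" dict), and GRB features as hypothetical dicts with "entity" and "oidn" keys:

```python
def not_yet_imported(grb_features, osm_ways):
    """Return the GRB features whose ref does not appear on any OSM way."""
    # Collect every GRB reference already present in OSM.
    imported = {
        tags["source:geometry:ref"]
        for tags in (way.get("tags", {}) for way in osm_ways)
        if "source:geometry:ref" in tags
    }
    # Keep only features whose "<entity>/<oidn>" ref is not in OSM yet.
    return [f for f in grb_features if f"{f['entity']}/{f['oidn']}" not in imported]
```

Because the OIDN is only unique per layer, the entity has to be part of the lookup key.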

Data preparation

A member of the Belgian community, Glenn (Glenn Plas on OSM), created a dedicated web platform to prepare GRB data. See the mapper instructions at GRBimport/Instructions.

The toolset is split into a data parser and frontends. There is a development frontend (written without a framework) and a production frontend, still in development, based on the Laravel framework. The data-handling side runs independently and currently uses Terraform to launch a data cruncher on Google Cloud. The repository is here: https://github.com/gplv2/grb-postgis . All repositories will eventually be migrated to the OSM-BE GitHub account. This tool has two branches: one does the actual data conversion; the other sets up a tile-serving PostGIS database that removes all buildings in OSM and replaces them with GRB buildings, so that it can generate map tiles showing what the final result would look like. Mappers are not expected to run this themselves, but the code is open so that anyone can inspect and reuse it.

Other parts are: the JSON API that exports the data from PostGIS, the addressing tool that applies the .dbf address data directly in the database using update queries, and the interfaces of the dev site and the production site.

The source code of the GRB tools is available at gplv2/grb2pgsql and gplv2/grbtool (deployed at https://staging.grbosm.site/#/). Code for the CRAB tool is at gplv2/aptum.github.io (deployed at http://crab-import.osm.be/).


About the data

To do: Refer to Background

Data processing

The data processing combines data from three sources:

  1. OpenStreetMap (all building contours, landuses and their attributes)
  2. GRB data (from layers Gbg, Gba and Knw + the OIDN fields and the source date)
  3. 3D GRB data (data H_DTM_MIN, H_DTM_GEM, H_DSM_MAX, H_DSM_P99, HN_MAX and HN_P99)

In the context of the 3D GRB data, DTM stands for "Digital Terrain Model" and DSM for "Digital Surface Model". The difference between DSM and DTM gives the building height. The data source lists both the maximum value and the value below which 99% of the points lie; together, these two make it possible to distinguish 'flat' from 'pointy' roof structures.
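Which fields the tool actually combines is not spelled out here. As an illustration only, this sketch assumes height = H_DSM_P99 minus H_DTM_GEM, and labels a roof 'pointy' when HN_MAX exceeds HN_P99 by more than a hypothetical 1 m tolerance:

```python
def building_height(h_dsm_p99: float, h_dtm_gem: float) -> float:
    """Height above ground: surface model (99th percentile) minus mean terrain height."""
    return h_dsm_p99 - h_dtm_gem


def roof_profile(hn_max: float, hn_p99: float, tolerance_m: float = 1.0) -> str:
    """Classify a roof as 'flat' or 'pointy' from the gap between max and P99 height."""
    # On a flat roof almost all points sit at the same height, so the maximum is
    # barely above the 99th percentile; a small protruding element (a spire, a
    # sharp peak) pushes the maximum well above it.
    return "flat" if hn_max - hn_p99 <= tolerance_m else "pointy"
```

The tolerance would need tuning against real GRB data; the value here is a placeholder.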

Data reduction and simplification

In the text below we will use the term "export" to mean loading data from the GRB tool in JOSM for integrating with existing OSM data. "The tool removes" will mean that the tool doesn't export certain data.

Over-noding (excess nodes) in the GRB source is tackled in the web-based tool. Importers can simplify further with the Simplify Area plugin for JOSM.
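The algorithm the web tool uses is not documented here. As an illustration of the idea, this sketch repeatedly drops nodes of a closed ring that lie (almost) on the straight line between their neighbours; the 5 cm tolerance and the assumption of coordinates in metres are ours:

```python
import math


def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    # |cross product| / base length = height of the triangle (a, b, p)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def drop_redundant_nodes(ring, tolerance=0.05):
    """Simplify a closed ring (first point == last point) without changing its shape."""
    pts = ring[:-1]  # work on the open ring (a copy, the input is untouched)
    changed = True
    while changed and len(pts) > 3:
        changed = False
        for i in range(len(pts)):
            prev, cur, nxt = pts[i - 1], pts[i], pts[(i + 1) % len(pts)]
            if _distance_to_line(cur, prev, nxt) < tolerance:
                del pts[i]  # node adds no shape information
                changed = True
                break
    return pts + [pts[0]]
```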

When objects from different GRB layers are exported, they are automatically "glued" together with shared nodes where appropriate.
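One simple way to glue ways is to give vertices that coincide within a small tolerance the same node ID. This is a hypothetical sketch, not the tool's code; the 1 cm tolerance is an assumption, and it only merges coinciding vertices (it does not insert nodes where a vertex touches the middle of another way's edge):

```python
import math


def glue_nodes(ways, tolerance=0.01):
    """ways: {name: [(x, y), ...]} in metres. Returns (nodes, {name: [node_id, ...]})."""
    nodes = []  # node_id is the index into this list
    way_refs = {}
    for name, coords in ways.items():
        refs = []
        for x, y in coords:
            for node_id, (nx, ny) in enumerate(nodes):
                if math.hypot(x - nx, y - ny) <= tolerance:
                    break  # reuse the existing node
            else:
                node_id = len(nodes)  # no close node: create a new one
                nodes.append((x, y))
            refs.append(node_id)
        way_refs[name] = refs
    return nodes, way_refs
```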

Decision tree for building types

Building on the 'extended dataset' created by combining the data from three sources, there is a decision tree in place to figure out which tag should be suggested for building=*

For reference, the decision tree is as follows (in pseudocode):

To do: INSERT DECISION TREE

The result of that analysis can be previewed at http://grbtiles.byteless.net/grb.html by toggling 'Enable Info Window' in the top bar and hovering over buildings.

Tagging plan

The import platform removes objects that clash with their OSM counterparts. GRB itself uses just two classifications of buildings: main buildings and non-main buildings, so the tool uses a heuristic to guess the building type from the landuse the building lies in. To do: elaborate on which building tag is exported by default. The importing mapper is expected to check and correct the building type using common sense, aerial imagery and street-level imagery [3] [4].
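To show the shape of such a landuse-based heuristic, here is a hypothetical sketch. The lookup table is invented for illustration and is NOT the tool's decision tree, which is still to be documented above:

```python
from typing import Optional

# Hypothetical landuse -> building=* table; NOT the tool's actual decision tree.
MAIN_BUILDING_BY_LANDUSE = {
    "residential": "house",
    "industrial": "industrial",
    "retail": "retail",
    "commercial": "commercial",
}


def suggest_building_tag(is_main_building: bool, landuse: Optional[str]) -> str:
    """Suggest a building=* value from the GRB class and the enclosing landuse."""
    if not is_main_building:
        # Non-main buildings get no specific guess in this sketch.
        return "yes"
    return MAIN_BUILDING_BY_LANDUSE.get(landuse or "", "yes")
```

Whatever the tool suggests, the mapper still checks the value against imagery.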

The web tool exports objects with the following tags:

Importing mappers add other tags manually where needed. In particular, building passages have to be mapped manually, using the tag tunnel=building_passage on the way that passes underneath.

Mandatory tags

Only the streetname and housenumber are mandatory tags, as those are the only tags needed to have a complete and non-ambiguous address (together with the boundaries that should already be present in the data, and can be corrected or improved at any time).

  • addr:housenumber=*: The house number corresponding to the house number in the source data.

Source: Huisnummer

  • addr:street=*: The street name corresponding to the street name in the source data.

Source: Straatnm

Optional tags, provided by the tool

Mappers also get a lot of freedom to adapt the tagging to their own workflow or preferences. For the following tags, it's up to the individual mapper to decide whether or not to add them: the import tool has checkboxes to include or exclude each of them.

  • addr:flats=*: A list of flats derived from all addresses with different flat numbers at this position. This data is optional because:
    • There's no uniform notation, which makes the data less usable.
    • The CRAB flat data is more likely to contain mistakes, so almost every apartment building requires a survey.
    • The data isn't needed to find an address (all flats at one address should share the same entrance, with their letterboxes next to each other).

Source: Appartementnummer / Busnummer

  • addr:postcode=* and addr:city=*: The postal code and municipality of the address. This data is optional because:
    • It isn't needed, since the boundaries are present.
    • Not all mappers agree on putting the municipality in the addr:city=* tag; some prefer the name of the postal-code zone, which in some cases is the name of the sub-municipality.
    • It's duplicated data, making it harder to maintain.
    • On the other hand, it's sometimes handy to add the postal code and municipality to clarify border cases, or to use filters and queries in JOSM.
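As noted above, there's no uniform notation for addr:flats=*. This sketch of deriving it from CRAB rows assumes a semicolon-separated list (the usual OSM multi-value separator) and hypothetical row dicts with "street", "housenumber" and "flat" keys:

```python
from collections import defaultdict


def _flat_sort_key(flat: str):
    # Numeric flat numbers sort numerically (2 before 10), others after them alphabetically.
    return (0, int(flat), "") if flat.isdigit() else (1, 0, flat)


def flats_by_address(crab_rows):
    """Group CRAB rows into {(street, housenumber): 'f1;f2;...'} for addr:flats=*."""
    flats = defaultdict(set)
    for row in crab_rows:
        if row.get("flat"):  # skip addresses without an apartment/bus number
            flats[(row["street"], row["housenumber"])].add(row["flat"])
    return {key: ";".join(sorted(values, key=_flat_sort_key)) for key, values in flats.items()}
```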

Optional tags, not provided by the tool
  • AssociatedStreet relation: Exact tagging for the relation is decided by the mapper. This is optional because:
    • We can't provide the relation in the import tool. The relation might already be partially present, and all members would have to be swapped (since we map building outlines and not nodes).
    • The data can already be derived from other tags and boundaries
    • Some users might want to add the data, to clarify border cases, or to use filters and queries in JOSM.

Source tags

On the changesets, mappers are required to include the item GRB in the source tag.

We decided not to put a source=* tag on each individual object. It doesn't respect the history of the object, and when the importing mapper has done their job well, they have combined GRB with other sources, so source=GRB would not be correct.

The tool does however export some tags to allow coupling the GRB and OSM objects. This is absolutely necessary to maintain a stable link between the two. Using them we can track which buildings have been imported. Additionally, when the GRB is updated, the web tool can easily see whether the corresponding OSM objects have been updated.

After careful consideration and many debates, the tags were chosen with a source:geometry namespace:

All of those values are required to uniquely identify a version of an object in the GRB. What they mean in the GRB is explained in the sections below.

source:geometry:date

As far as we could see, this date represents the last time an object was updated. We don't know whether this is the date of measurement or of the actual update itself. What we do know for sure is that this will change if buildings are being re-measured and/or the structure is being expanded or modified. GRB has regular updates so we can use this to distil a list of buildings that have been updated. The good thing about this tag is that it's human readable.

It helps future mappers that look at the object to get an idea of how recent the data is. It can help to avoid edits based on older aerial pictures.
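Comparing this date with the date of the current GRB version is enough to list buildings that changed in the GRB since they were imported. A minimal sketch, assuming ISO dates as in the example tags and a hypothetical {ref: date} lookup built from the latest GRB release:

```python
from datetime import date


def updated_refs(osm_tags_list, grb_dates):
    """Return refs whose GRB version is newer than the date stored in OSM."""
    stale = []
    for tags in osm_tags_list:
        ref = tags.get("source:geometry:ref")
        osm_date = tags.get("source:geometry:date")
        if ref is None or osm_date is None or ref not in grb_dates:
            continue  # not an imported building, or no longer in the GRB
        if grb_dates[ref] > date.fromisoformat(osm_date):
            stale.append(ref)
    return stale
```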

source:geometry:ref

This ref uniquely identifies an object in GRB. It's a concatenation of the GRB entity (the GIS layer), and the OIDN. The entity is essential as the OIDN is only unique per layer in the GRB data set.
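A sketch of building and parsing the ref, assuming the entity/OIDN format shown in the example tags below and restricting entities to the three layers this import uses (Gbg, Gba, Knw); the helper names are hypothetical:

```python
VALID_ENTITIES = {"Gbg", "Gba", "Knw"}  # layers used by this import


def make_ref(entity: str, oidn: int) -> str:
    """Build a source:geometry:ref value such as 'Gbg/2155715'."""
    if entity not in VALID_ENTITIES:
        raise ValueError(f"unknown GRB entity: {entity!r}")
    return f"{entity}/{int(oidn)}"


def parse_ref(ref: str) -> tuple[str, int]:
    """Split a ref back into (entity, OIDN); the entity is needed because OIDN is per layer."""
    entity, sep, oidn = ref.partition("/")
    if not sep or entity not in VALID_ENTITIES or not oidn.isdigit():
        raise ValueError(f"not a GRB ref: {ref!r}")
    return entity, int(oidn)
```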

Automated edits to retag old work to new data model

After feedback from the imports mailing list, we decided to simplify the data model of the tags referring back to the GRB data. Before:

source:geometry:date=2009-12-07
source:geometry:entity=Gbg
source:geometry:oidn=2155715
source:geometry:uidn=2440819

After:

source:geometry:date=2009-12-07
source:geometry:ref=Gbg/2155715

This is being done in a series of corrections based on this script. In accordance with the automated edits guidelines, this was discussed on talk-be and on the GRB Matrix channel (see also the first real edit). The automated edit also removes any source=GRB tags from objects, because this is implied by the other tags and is often factually incorrect: often not ALL the data on the building is GRB-sourced!
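The linked script is not reproduced here. As an independent sketch of the same tag transformation (before/after as shown above), assuming tags are a plain dict and that only the exact value source=GRB is dropped:

```python
OLD_KEYS = ("source:geometry:entity", "source:geometry:oidn", "source:geometry:uidn")


def migrate_tags(tags: dict) -> dict:
    """Rewrite old-style GRB reference tags to source:geometry:ref; returns a new dict."""
    new = dict(tags)  # leave the input untouched
    entity = new.get("source:geometry:entity")
    oidn = new.get("source:geometry:oidn")
    if entity and oidn:
        new["source:geometry:ref"] = f"{entity}/{oidn}"
        for key in OLD_KEYS:
            new.pop(key, None)
    # source=GRB is implied by the source:geometry:* tags and often not accurate.
    if new.get("source") == "GRB":
        del new["source"]
    return new
```

Old keys are only removed when a ref could be built, so objects with incomplete tags keep their information for manual review.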

CRAB

To do: move to mapper instructions or QA

After using the GRB tool, the mapper should check the imported addresses with the CRAB tool. That tool uses Overpass to retrieve existing OSM addresses and displays the ones that are missing or wrongly positioned according to the CRAB. The CRAB contains higher-quality addresses, but they are points that are not linked to GRB buildings.
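The core of such a check is a comparison of two address sets. A minimal sketch, assuming both sides are keyed by (street, housenumber) with projected coordinates in metres; the 50 m threshold for "wrongly positioned" is a hypothetical value, not the CRAB tool's:

```python
import math


def compare_addresses(osm, crab, max_offset_m=50.0):
    """osm, crab: {(street, housenumber): (x, y)}. Returns (missing, misplaced)."""
    missing = sorted(key for key in crab if key not in osm)
    misplaced = sorted(
        key for key in crab
        if key in osm and math.dist(osm[key], crab[key]) > max_offset_m
    )
    return missing, misplaced
```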

The CRAB tool is open and available at http://crab-import.osm.be/. This import was previously approved.

Data merge workflow

Team approach

All experienced mappers can join the import effort. They will be monitored, see the section #Revert plans.

We will organize meetups to guide users in person on how to do the import correctly.

References

To do: what do they mean by "List all factors that will be evaluated in the import"?

Workflow

See the AIV_GRB_building_import/Instructions page.

Revert plans

A designated team of experienced mappers will monitor each mapper's first 32 GRB import changesets and do spot checks on later ones. They will intervene promptly: immediately reverting when the import's rules aren't followed, and banning people from the tool if they don't follow the import's rules.
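Selecting which changesets to review follows directly from this rule. A minimal sketch, assuming changesets arrive in upload order as (id, user) pairs; the spot-check rate is a hypothetical parameter, since the rate is not fixed here:

```python
import random
from collections import defaultdict


def changesets_to_review(changesets, initial=32, spot_check_rate=0.1, seed=None):
    """Return changeset IDs to review: each user's first `initial`, plus random spot checks."""
    rng = random.Random(seed)
    seen = defaultdict(int)  # changesets seen so far per user
    review = []
    for cs_id, user in changesets:
        seen[user] += 1
        if seen[user] <= initial or rng.random() < spot_check_rate:
            review.append(cs_id)
    return review
```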

Bad changesets will be reverted as soon as they are detected, using Frederik Ramm's revert scripts.

Conflation

Using the Replace Geometry tool from JOSM's UtilsPlugin2, the way history of existing buildings will be preserved.

Dedicated upload account

Since mappers will be mapping much more than just the addresses provided in the source dataset (building outlines will also be mapped), and in some cases surveying is part of the job, this cannot be considered a normal import. It's more comparable to mapping features based on background imagery. Here the house numbers are used as a background to map the buildings. Many users will also map other things alongside the house numbers in the same session (because they surveyed something, or because they notice something on the imagery).

As such, we consider requiring a dedicated user account to be a limitation for contributors.

QA

There will be continuous QA through the comparison tools. Every mapper will map and check the region they know. Besides the comparison between OSM and CRAB, other tools such as Osmose and Keep Right will also be used from time to time. Importers receive clear instructions on how to deliver quality work, and are also instructed to run JOSM's validator before uploading.

The GRB web tool applies rate limiting (to do: add specifics). People who fail to comply with the import's rules will be banned and their work reverted, as outlined in #Revert plans.

The end result will be better than both GRB and current OSM data.

When mistakes in source data are found, the AIV provides tools to notify them of those mistakes, so the mistakes can get corrected, and in the next data update, the differences between OSM and CRAB/GRB will be gone. The reaction time is dependent on the municipalities, but it's usually a few weeks.