GRBimport

From OpenStreetMap Wiki

This is a draft page. Please don't take it as a reference yet!

The /Background subpage contains more background, such as more information about the various entities mentioned and the main issues we had to solve.

Instructions for mappers are on a separate page, /Instructions.

The subpage /The import guidelines applied has a table showing how the import guidelines are fulfilled.

You can find some meta-stuff about this page on its Talk page.

Goals

In Flanders, there is an approved, ongoing import of addresses from an open data government database called the CRAB (Central Reference Address Database). However, that database does not contain building outlines.

The goal of this project is to complete all building outlines, together with their addresses, in Flanders in a high-quality fashion. To do this, we will use another open data government database alongside the CRAB, namely the GRB (Large-scale Reference Database). You can read more about the GRB at /Background#GRB. We will import outlines and addresses from the GRB, and then check the addresses against the CRAB. We will not simply import all data in large chunks, but drip-feed it, preserving as much OSM object history as possible and combining the GRB data with other sources and our own judgement.

So this is a manually verified, "assisted mapping" import, with a strong focus on data quality.

Schedule

The import aims to be a model import: the objective is not to load the data into OSM as fast as possible, but at the highest possible quality. As such, there is no fixed end date; we will take as long as we need. This may be until the end of 2018, 2019, or even later. Once all buildings have been added, we will update them when the GRB is updated.

Import data

Background

Data sources: GRB and CRAB
Data licence: Flemish Gratis Open Data Licentie (for both GRB and CRAB)
Type of license (if applicable): custom, requires attribution (see #Licence below)
OSM attribution: Contributors#AGIV, source on each changeset
ODbL Compliance verified: yes (done for separate CRAB import, this is the same licence)

The subsets to be imported from the GRB (the GRB calls them entities) are Gbg (buildings), Knw (man-made objects like bridges) and Gba (building attachments).

Licence

The GRB and CRAB are released under the Flemish Open Data Licence. It is a "public domain with attribution" style licence, designed to be compatible with the UK Open Government Licence, the French Licence Ouverte, CC BY 3.0 and the Open Data Commons Attribution licence 1.0.

For the CRAB import it was established that this licence is compatible with the ODbL.

GRB

Licence grant on the old AGIV site (archived)

Licence grant on the Flemish government's website (archived). Unofficial translation by us:

Every natural person, each legal person and each group can use the GRB free of charge under the license “Free of charge open data license Flanders v1.02” (Gratis open data licentie Vlaanderen v1.02).
The sole condition is a requirement of attribution to the data set and its owner upon sharing or distributing (publishing etc.) the data. In concreto this means you should mention ‘Source: Large Scale Reference Database Flanders, AGIV[sic]’. You can read all conditions in the “Free of charge open data license Flanders”.

CRAB

Licence grant on the Flemish government's website (archived). Unofficial translation by us:

Every natural person, each legal person and each group can use the CRAB free of charge under the license “Free of charge open data license Flanders v1.0” (Gratis open data licentie Vlaanderen v1.0).
The sole condition is a requirement of attribution to the data set and its owner upon sharing or distributing (publishing etc.) the data. In concreto this means you should mention ‘Source: Informatie Vlaanderen’. You can read all conditions in the “Free of charge open data license Flanders”.

OSM data

The OSM data is created on the fly by the import tool built by Belgian community member Glenn Plas.

An example of the data returned by this tool can be obtained in GeoJSON by executing this command:

   curl 'http://grbtiles.byteless.net/postgis_geojson.php?bbox=352711.8458126,6581269.4005653,353825.55720112,6581655.7658565' --compressed
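The bbox values in the command above appear to be Web Mercator (EPSG:3857) coordinates in metres, in the order minx,miny,maxx,maxy. That is our reading of the numbers, not documented behaviour of the tool. Under that assumption, a minimal Python sketch for building such a request from WGS84 longitude/latitude could look like this (the function names are hypothetical):

```python
import math
from urllib.parse import urlencode

R = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857), in metres
BASE_URL = "http://grbtiles.byteless.net/postgis_geojson.php"


def lonlat_to_3857(lon, lat):
    """Project WGS84 longitude/latitude (degrees) to Web Mercator metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def mercator_to_lonlat(x, y):
    """Inverse projection: Web Mercator metres back to degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def build_grb_url(min_lon, min_lat, max_lon, max_lat, base=BASE_URL):
    """ASSUMPTION: bbox is minx,miny,maxx,maxy in EPSG:3857 metres."""
    minx, miny = lonlat_to_3857(min_lon, min_lat)
    maxx, maxy = lonlat_to_3857(max_lon, max_lat)
    bbox = ",".join(f"{v:.7f}" for v in (minx, miny, maxx, maxy))
    # keep the commas literal, as in the curl example
    return f"{base}?{urlencode({'bbox': bbox}, safe=',')}"
```

Converting the example bbox back with mercator_to_lonlat gives roughly 3.17° E, 50.77° N, southwest of Kortrijk, which is what makes the EPSG:3857 reading plausible. The resulting URL can be fetched with any HTTP client, just like the curl command.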

Import type

This is a semi-automated import: we want the data to be verified by a human and merged with existing data.

JOSM and the Replace Geometry tool from its UtilsPlugin2 play a crucial role in our workflow.

The data will be updatable through IDs that are added to each imported building.

The web tool excludes buildings that have already been imported (using the IDs added to imported objects) and computes diffs in the web browser by combining JOSM features with the Overpass API.
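The exclusion step can be pictured as a set lookup on the coupling tags described under #Source tags. The sketch below is a hypothetical illustration, not the grbtool code: it assumes GRB features arrive as GeoJSON with entity and oidn properties (those property names are our assumption) and that the existing OSM objects come from an Overpass query such as the one built here.

```python
def overpass_query(south, west, north, east):
    """Overpass QL for objects in a bbox that already carry a GRB oidn tag."""
    return (
        "[out:json][timeout:60];\n"
        f'nwr["source:geometry:oidn"]({south},{west},{north},{east});\n'
        "out tags;"
    )


def imported_keys(osm_elements):
    """Collect (entity, oidn) pairs from Overpass JSON 'elements'."""
    keys = set()
    for element in osm_elements:
        tags = element.get("tags", {})
        entity = tags.get("source:geometry:entity")
        oidn = tags.get("source:geometry:oidn")
        if entity and oidn:
            keys.add((entity, oidn))
    return keys


def not_yet_imported(grb_features, osm_elements):
    """Keep only GRB GeoJSON features whose (entity, oidn) is not yet in OSM.

    ASSUMPTION: each feature has 'entity' and 'oidn' properties.
    OSM tag values are strings, so the oidn is compared as a string.
    """
    done = imported_keys(osm_elements)
    return [
        f for f in grb_features
        if (f["properties"]["entity"], str(f["properties"]["oidn"])) not in done
    ]
```

Keying on the pair rather than on the oidn alone matters because the same oidn can occur in different GRB layers.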

Data preparation

Import website

GRB

A member of our community, Glenn (Glenn Plas on OSM), created a dedicated web platform to prepare GRB data. A huge thanks to Glenn for being the frontman of the import and for creating the tool.

Beta versions of this web-based tool have been open for testing by the general public for over a year. It is located at http://grbtiles.byteless.net/. Its internal documentation is located at http://grbtiles.byteless.net/docs/, but that's slightly outdated.

The GRB tool clearly stated that the import wasn't yet authorized. In July 2017 we discovered that some people had been ignoring that warning and, even worse, that some were removing OSM buildings and literally dumping GRB data into OpenStreetMap. This was particularly severe in Ghent. General access to the tool was promptly closed and we started cleaning up, which was a massive amount of work. If the import is approved, the web platform will reopen with better monitoring tools to prevent this (see the #QA section below).

CRAB

After using the GRB tool, the mapper should check the imported addresses using the CRAB tool. That tool uses Overpass to retrieve existing OSM addresses and displays those that are missing or wrongly positioned according to the CRAB. The CRAB contains higher-quality addresses, but they are point data and not linked to the GRB.

The CRAB tool is open and available at http://crab-import.osm.be/. The CRAB import was approved previously.
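The check performed by the CRAB tool can be sketched as follows. This is not the CRAB tool's code: we assume both datasets have already been reduced to a mapping from (street, housenumber) to a latitude/longitude point, and the 50 m threshold for "wrongly positioned" is a hypothetical value.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371008.8  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def compare_addresses(crab, osm, max_dist_m=50.0):
    """Compare CRAB addresses with OSM addresses.

    crab, osm: dicts mapping (street, housenumber) -> (lat, lon).
    max_dist_m is a HYPOTHETICAL threshold for "wrongly positioned".
    Returns (missing, misplaced) lists of keys.
    """
    missing, misplaced = [], []
    for key, (lat, lon) in crab.items():
        if key not in osm:
            missing.append(key)
        elif haversine_m(lat, lon, *osm[key]) > max_dist_m:
            misplaced.append(key)
    return missing, misplaced
```

In practice, street names would need normalizing before they can serve as keys; the sketch compares them verbatim.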

Data reduction and simplification

In the text below, "export" means loading data from the GRB tool into JOSM for integration with existing OSM data. "The tool removes" means that the tool doesn't export certain data.

Overnoding (superfluous nodes) in the GRB source data is handled by the web-based tool. Importers can simplify further with the Simplify Area plugin for JOSM.
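Overnoding typically shows up as extra nodes on straight wall segments. A minimal Python sketch of removing such nodes follows. It is a simple collinearity filter, not the algorithm used by the web tool or by the Simplify Area plugin, and it assumes projected coordinates in metres with a hypothetical default tolerance of 5 cm.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / length


def remove_redundant_nodes(ring, tolerance=0.05):
    """Drop nodes of a closed ring (first == last) that lie on the straight
    line between their neighbours, within `tolerance` metres."""
    pts = ring[:-1]  # open copy of the ring; the input is not modified
    changed = True
    while changed and len(pts) > 3:
        changed = False
        for i in range(len(pts)):
            prev_pt, cur, next_pt = pts[i - 1], pts[i], pts[(i + 1) % len(pts)]
            if point_line_distance(cur, prev_pt, next_pt) <= tolerance:
                del pts[i]
                changed = True
                break  # restart the scan with the updated neighbours
    return pts + [pts[0]]
```

Removing one node at a time and rescanning keeps the sketch simple; it is quadratic, which is acceptable for single building outlines.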

When exported, objects from different GRB layers are automatically "glued" together with shared nodes where appropriate.
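One simple way to picture the gluing step is snapping vertices to a small grid, so that vertices from different layers that (nearly) coincide map to the same node. This is a hypothetical illustration, not the tool's method. Plain grid rounding can miss two vertices that straddle a cell boundary, which a real implementation would have to handle.

```python
def glue(ways, tolerance=0.01):
    """Assign shared node ids to coinciding vertices.

    ways: dict name -> list of (x, y) in metres (closed rings repeat the first point).
    Returns (nodes, way_refs): node id -> (x, y), and name -> list of node ids.
    Vertices falling in the same tolerance-sized grid cell share one node.
    """
    nodes = {}
    cell_to_id = {}
    way_refs = {}
    for name, coords in ways.items():
        refs = []
        for x, y in coords:
            cell = (round(x / tolerance), round(y / tolerance))
            if cell not in cell_to_id:
                node_id = -(len(nodes) + 1)  # negative ids, as for new objects in JOSM
                cell_to_id[cell] = node_id
                nodes[node_id] = (x, y)
            refs.append(cell_to_id[cell])
        way_refs[name] = refs
    return nodes, way_refs
```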

Tagging plan

The import platform removes objects that clash with their OSM counterparts. As an example, the GRB uses just two classifications of buildings: main buildings and non-main buildings. The tool uses a heuristic to guess the building type based on the landuse area the building lies in. To do: elaborate on which building tag is exported by default. The importing mapper is expected to check and correct the building type using common sense, aerial imagery and street-level imagery.
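The landuse heuristic could work roughly like this: find the landuse polygon containing a point of the building (for example its centroid) with a ray-casting test, then map the landuse value to a building tag. The mapping table is entirely hypothetical and is not the tool's actual default (see the to-do above), and the function names are ours as well.

```python
# HYPOTHETICAL mapping, for illustration only; NOT the GRB tool's actual defaults.
LANDUSE_TO_BUILDING = {
    "residential": "house",
    "farmyard": "farm",
    "industrial": "industrial",
    "retail": "retail",
}


def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def landuse_at(pt, landuse_areas):
    """landuse_areas: list of (landuse value, polygon); first match wins."""
    for value, poly in landuse_areas:
        if point_in_polygon(pt, poly):
            return value
    return None


def guess_building_tag(is_main_building, pt, landuse_areas):
    """Guess a building=* value from the GRB main/non-main class and the landuse."""
    landuse = landuse_at(pt, landuse_areas)
    if not is_main_building:
        # HYPOTHETICAL: non-main buildings in residential areas guessed as sheds
        return "shed" if landuse == "residential" else "yes"
    return LANDUSE_TO_BUILDING.get(landuse, "yes")
```

Falling back to building=yes when no landuse matches leaves the decision to the importing mapper, in line with the manual check described above.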

The web tool exports objects with the following tags:

Importing mappers add other tags manually where needed. In particular, building passages have to be mapped manually by adding tunnel=building_passage to the way that runs underneath the building.

Source tags

Mappers are required to include GRB in the source tag of their changesets.

We decided not to put a source=* tag on each individual object. It doesn't honour the history of the object, and when the importing mapper has done their job well, they will have combined the GRB with other sources, so source=GRB would not be correct.

The tool does however export some tags to allow coupling the GRB and OSM objects. This is absolutely necessary to maintain a stable link between the two. Using them we can track which buildings have been imported. Additionally, when the GRB is updated, the web tool can easily see whether the corresponding OSM objects have been updated.

After careful consideration and many debates, we chose the following tags in a source:geometry namespace: source:geometry:date, source:geometry:entity and source:geometry:oidn.

All of those values are required to uniquely identify a version of an object in the GRB. What they mean in the GRB is explained in the sections below.

We are also considering combining the entity and oidn tags into one, so that only two tags would be needed for coupling.
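The coupling rules above can be expressed as a small value type: entity and oidn identify a GRB object, and adding the date identifies a version of it. This is a sketch that assumes the tag values are read as plain strings. The entity/oidn separator in combined_ref is a hypothetical format for the combined tag under consideration, not an agreed one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrbRef:
    """Coupling tags of one imported object (values kept as strings)."""
    entity: str  # GRB layer: "Gbg", "Knw" or "Gba"
    oidn: str    # object id, only unique within its entity
    date: str    # source:geometry:date

    @classmethod
    def from_osm_tags(cls, tags):
        return cls(tags["source:geometry:entity"],
                   tags["source:geometry:oidn"],
                   tags["source:geometry:date"])

    def object_key(self):
        """Identifies the GRB object across versions."""
        return (self.entity, self.oidn)

    def combined_ref(self):
        """HYPOTHETICAL value for a merged entity+oidn tag; the format is not agreed."""
        return f"{self.entity}/{self.oidn}"


def same_object(a, b):
    return a.object_key() == b.object_key()


def same_version(a, b):
    # frozen dataclasses compare all fields: entity, oidn and date
    return a == b
```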

source:geometry:date

As far as we could see, this date represents the last time an object was updated. We don't know whether it is the date of measurement or of the update itself. What we do know for sure is that it changes when a building is re-measured or the structure is expanded or modified. The GRB is updated regularly, so we can use this date to derive a list of buildings that have been updated. An advantage of this tag is that it is human-readable.

It gives future mappers looking at the object an idea of how recent the data is, which can help them avoid edits based on older satellite or aerial imagery or other outdated data sources.
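Deriving the list of updated buildings then comes down to a date comparison per coupled object. This is a minimal sketch that assumes source:geometry:date holds an ISO 8601 date (YYYY-MM-DD) and that the current GRB release is available as a mapping from (entity, oidn) to its date. Both input shapes are hypothetical.

```python
from datetime import date


def parse_grb_date(value):
    """ASSUMPTION: source:geometry:date is an ISO 8601 date, YYYY-MM-DD."""
    return date.fromisoformat(value)


def updated_since_import(osm_objects, grb_dates):
    """Return (entity, oidn) keys whose GRB date is newer than the OSM tag.

    osm_objects: list of OSM tag dicts.
    grb_dates: dict (entity, oidn) -> date string from the current GRB release.
    """
    updated = []
    for tags in osm_objects:
        key = (tags.get("source:geometry:entity"), tags.get("source:geometry:oidn"))
        osm_date = tags.get("source:geometry:date")
        if key not in grb_dates or osm_date is None:
            continue  # not coupled, or not present in this GRB release
        if parse_grb_date(grb_dates[key]) > parse_grb_date(osm_date):
            updated.append(key)
    return updated
```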

source:geometry:entity

The entity is the GRB layer the object comes from (Gbg, Knw or Gba). It is essential because the oidn is only unique within a layer of the GRB data set.

source:geometry:oidn

Together with the entity, the oidn uniquely identifies an object in the GRB.

Data transformation

As mentioned in the section #Import website, a member of our community developed a web platform to transform the GRB data into OSM data. It is not foolproof; the importing mapper has to check everything.

Afterwards, the mapper should check the addresses using the CRAB tool, which displays missing or wrongly positioned addresses according to the CRAB. The CRAB contains higher-quality addresses, but they are point data and not linked to the GRB.

The source code of the GRB tools is available on GitHub at gplv2/grb2pgsql and gplv2/grbtool (deployed at http://grbtiles.byteless.net/). Code for the CRAB tool is at gplv2/aptum.github.io (deployed at http://crab-import.osm.be/).

Data merge workflow

Team approach

All experienced mappers can join the import effort. Their work will be monitored; see the section #Revert plans.

We will organize meetups to guide users in person on how to import correctly.

References

To do: what is meant by "List all factors that will be evaluated in the import"?

Workflow

See the /Instructions page.

Revert plans

A designated team of experienced mappers will monitor each mapper's first 32 GRB import changesets and spot-check later ones. They will intervene promptly: changesets that don't follow the import's rules will be reverted immediately, and people who don't follow the rules will be banned from the tool.

Bad changesets will be reverted as soon as they are detected, using Frederik Ramm's revert scripts.
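The review policy can be sketched as a selection function over changesets in upload order. The 32-changeset threshold comes from this plan. The 10% spot-check rate, the input format and the semicolon-separated reading of the source tag are our assumptions. The sketch also flags any changeset whose source tag lacks GRB, as required under #Source tags.

```python
import random
from collections import defaultdict

FULL_REVIEW_COUNT = 32  # from the plan: every mapper's first 32 changesets
SPOT_CHECK_RATE = 0.1   # HYPOTHETICAL sampling rate for later changesets


def cites_grb(tags):
    """True if the changeset source tag lists GRB (assumes ';'-separated values)."""
    return any(part.strip().upper() == "GRB"
               for part in tags.get("source", "").split(";"))


def select_for_review(changesets, spot_check_rate=SPOT_CHECK_RATE, rng=None):
    """Return ids of changesets to review.

    changesets: iterable of dicts with 'id', 'user' and 'tags', in upload order
    (a hypothetical input format).
    """
    rng = rng or random.Random()
    count = defaultdict(int)  # changesets seen so far per user
    selected = []
    for cs in changesets:
        count[cs["user"]] += 1
        if (count[cs["user"]] <= FULL_REVIEW_COUNT
                or not cites_grb(cs["tags"])
                or rng.random() < spot_check_rate):
            selected.append(cs["id"])
    return selected
```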

Conflation

Using the Replace Geometry tool from the JOSM UtilsPlugin2, the way history of existing buildings will be preserved.

QA

Importers receive clear instructions on how to deliver quality work, and are also instructed to use JOSM's validator before upload.

The GRB web tool applies rate limiting (to do: add specifics). People failing to comply with the import's rules will be banned and their work reverted, as outlined in #Revert plans.

We expect the end result to be of higher quality than either the GRB or the current OSM data on its own.