AltoAdige - Südtirol/OpenGISData House Number Import
This page is outdated. See OpenGisData House Number Import 2 for its successor.
The data to be imported originates from the Autonomous Province of Bolzano/Bozen, Italy. The data set contains about 130,000 addresses and covers the entire province.
History of this import
Unfortunately, this import got off to a bad start: both the communication and the documentation of the project went wrong.
Here is a short chronology of the import:
Chronology of events
- In February 2013, the project first claimed that it would hand geographical data over to the OpenStreetMap community: “[The geographical data] is now being passed on by TIS to the OpenStreetMap community” (translated from German)
- In May, the project held a meeting at which it stated its actual intent to import house numbers into OSM. On this occasion, the project was told that respecting OSM's import guidelines is crucial.
- In June, there was a second meeting concerning the import.
- On 9 August, the project created the account OpenGISData and started importing the whole dataset, ignoring the import guidelines.
- The project was informed immediately and, after a short discussion, it was decided to revert the import and restart it with proper discussion and documentation.
- In November, opengisdata.eu started a second attempt of discussing an import of the data: OpenGisData House Number Import 2.
The first import attempt
The import had to be reverted because of several shortcomings in the data, most of which were caused by ignoring the import guidelines. It only makes sense to import data into OpenStreetMap if the project as a whole is in a better state after the import. Sheer quantity of data is unimportant if its (poor) quality renders it unmaintainable or ties the community up with clean-up tasks for a long time.
Data issues with the first import
This is a list of some deficits in the data of the first import. The list may not be complete, and whether some of these issues would be tolerable for the OSM community is left aside.
- The imported data did not have proper source tagging
- The import added duplicates for each of the ~6500 already existing addresses in OSM
- The import itself added a significant amount of duplicated address nodes (up to 30% of the total amount of data were duplicates)
- Names (such as street names) were all in upper case (e.g. "MEIERN" instead of "Meiern")
- In the case of non-localized (mostly German and Italian) names, the generic name is duplicated (e.g. "MEIERN - MEIERN")
- There is no differentiation between street-based addresses and, for example, hamlet-based addresses (e.g. "Meiern" is actually a hamlet, not a street; appropriate tagging for this address should use addr:place=* instead of addr:street=*)
- Each address got a cryptic project-ID that is practically unmaintainable by a mapper
- The order of languages in the generic name tags does not respect the local naming convention.
- Sometimes addresses contain abbreviated street names (e.g. "… STR." instead of "… Straße")
- No special handling of street names in Ladin
- Postcodes were not imported although they appear in the original data
- The import changesets were not properly tagged
- The import software that was used created empty changesets
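Some of the naming issues listed above (upper-case names, duplicated generic names such as "MEIERN - MEIERN", and a trailing "STR.") could be cleaned up mechanically before an import. The following is only a minimal sketch of such a cleanup, not the method used by the project: the function names are hypothetical, Python's `str.title()` is a crude stand-in for proper capitalization (it would, for example, capitalize Italian prepositions incorrectly), and a "ß" lost to upper-casing ("STRASSE") cannot be restored this way.

```python
import re


def collapse_duplicate(name: str) -> str:
    """Turn 'MEIERN - MEIERN' into 'MEIERN' when both halves are identical."""
    parts = [p.strip() for p in name.split(" - ")]
    if len(parts) == 2 and parts[0].casefold() == parts[1].casefold():
        return parts[0]
    return name.strip()


def expand_abbreviation(name: str) -> str:
    """Expand a trailing 'Str.' / 'str.' (assumes the name is already title-cased)."""
    # Separate word: "Meraner Str." -> "Meraner Straße"
    name = re.sub(r"(?<=\s)Str\.$", "Straße", name)
    # Compound word: "Hauptstr." -> "Hauptstraße"
    name = re.sub(r"(?<=\w)str\.$", "straße", name)
    return name


def normalize_name(raw: str) -> str:
    """Hypothetical cleanup pipeline: de-duplicate, fix case, expand 'Str.'."""
    name = collapse_duplicate(raw)
    name = name.title()  # crude: every word gets a capital first letter
    return expand_abbreviation(name)


if __name__ == "__main__":
    for raw in ["MEIERN - MEIERN", "MERANER STR.", "HAUPTSTR."]:
        print(raw, "->", normalize_name(raw))
```

Results of such a cleanup would still need manual review, since automatic capitalization and abbreviation expansion cannot be correct for every name.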
The data to be imported is the same as that found in the GeoBrowser of the province of Bolzano/Bozen (layer "Adressen"/"Indirizzi").
As far as is known, the data is provided and maintained by the individual municipalities and only aggregated and published by the province.
The data set contains the following information:
|frazione||Subdivision of the municipality|
All name fields are localized (German, Italian and Ladin where applicable).
Not all fields are set for all addresses. For example, the street name field is empty for place-based addresses (as it should be). Sometimes, however, fields do not contain the appropriate information: the value for a place-based address may be given in the street field, although no street with that name exists.
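The distinction between street-based and place-based addresses could be made during data preparation roughly as follows. This is a sketch under assumptions: the record keys `street`, `place` and `housenumber` are hypothetical (the actual field names of the data set are not given here), and `known_streets` stands for some independently obtained set of street names that really exist, for example taken from existing OSM data.

```python
def address_tags(record: dict, known_streets: set) -> dict:
    """Build OSM address tags, choosing addr:street or addr:place.

    `record` uses hypothetical keys: 'street', 'place', 'housenumber'.
    """
    tags = {"addr:housenumber": record["housenumber"]}
    street = (record.get("street") or "").strip()
    place = (record.get("place") or "").strip()

    if street and street in known_streets:
        # A street with this name exists: street-based address
        tags["addr:street"] = street
    elif street:
        # The street field holds a name without a matching street:
        # treat it as a place-based address (e.g. a hamlet)
        tags["addr:place"] = street
    elif place:
        tags["addr:place"] = place
    return tags


if __name__ == "__main__":
    record = {"street": "Meiern", "housenumber": "12"}
    print(address_tags(record, known_streets=set()))
```

Names that fall into the second branch are good candidates for manual review, because a missing street in the reference set does not prove that the street does not exist.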
The licensing of the data is unclear.
Currently, all the data provided by the province is only available under a restrictive non-open license. The province has stated that it is willing to publish most of its data as open data (CC0). Apparently, the OpenGisData.eu project has some kind of permission to import the address data into OSM under the ODbL: this data has already been in OSM for a short time (the contributor accepted the CTs, and the data was only removed because of quality issues).
The OpenGISData.eu project has developed a custom import tool for this import.
It was said that this tool would be open-sourced, but it has not been published so far.
Further information about the functionality of this tool would be appreciated.
Data reduction & simplification
Because of the issues with the raw data, some substantial data preparation has to be done.
This is mainly still to be discussed, but some ideas can be found here.
Special care should be taken with how multilingual names are treated: the original data has names in German and Italian. For name=*, a concatenated name "<Italian name> - <German name>" should be used in cities and towns where Italian is dominant, and "<German name> - <Italian name>" elsewhere (as outlined here). Names that are the same in both languages are used in non-concatenated form. Ladin names would have to be handled separately.
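The naming rule described above can be sketched as follows. The function name, its parameters and the example street names are illustrative assumptions. How to decide whether Italian is dominant in a given municipality is left to the caller, and Ladin names are deliberately not handled.

```python
def build_name(name_de: str, name_it: str, italian_dominant: bool) -> str:
    """Build the value for name=* from a German and an Italian name.

    `italian_dominant` is a hypothetical flag that the caller sets per
    city or town; Ladin names need separate handling.
    """
    name_de = (name_de or "").strip()
    name_it = (name_it or "").strip()

    # Only one language available: use it as-is
    if not name_de:
        return name_it
    if not name_it:
        return name_de
    # Same name in both languages: no concatenation
    if name_de == name_it:
        return name_de
    # Dominant language first
    if italian_dominant:
        return f"{name_it} - {name_de}"
    return f"{name_de} - {name_it}"


if __name__ == "__main__":
    # Example values for illustration only
    print(build_name("Meraner Straße", "Via Merano", italian_dominant=False))
    print(build_name("Meraner Straße", "Via Merano", italian_dominant=True))
    print(build_name("Meiern", "Meiern", italian_dominant=False))
```

The individual language versions can still be kept alongside the combined value in the language-specific tags name:de=* and name:it=*.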
Data transformation results
A dedicated user account, OpenGISData, has already been created for this import. (The account still lacks further explanation and documentation.)
Data merge workflow