Import/Catalogue/Milan addresses import


About

This page describes the import of address data provided by the Municipality of Milan (Italy) into OpenStreetMap.

The Municipality of Milan released its complete address data for State of the Map 2018. More information is available in the article "Arriva State of the Map a Milano, il Comune rilascia oltre 60.000 numeri civici come open data" (in Italian).

The import has been discussed in this Italian OSM mailing list thread. This wiki page is the result of consensus there.

Import Data

Background

Address format

House numbering follows the European scheme. An address is identified by its street name and housenumber; each housenumber is unique within its street.

Housenumbers can include:

  • subordinates, noted with suffix letters (e.g. "a" in "7a"); subordinates usually arise when a new house is built between existing houses with consecutive housenumbers
  • extensions, noted with a slash "/" followed by an integer; most cases occur when a single entrance is shared by different buildings.

Legal

Data source site: https://dati.comune.milano.it/dataset/ds634-numeri-civici-coordinate

Data license: https://geoportale.comune.milano.it/sit/toponomastica/

Type of license: CC-BY-2.5-IT

Waiver: https://geoportale.comune.milano.it/sit/toponomastica/

Addendum to CC BY 2.5 IT Licence with respect to following datasets: “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi”.

In case of reuse of “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi” datasets, the attribution by OpenStreetMap and its users through http://wiki.openstreetmap.org/wiki/Contributors is sufficient to provide attribution to Comune di Milano (City of Milano) in a manner that is “reasonable to the medium or means” in accordance with Section 4(b) of the CC BY 2.5 IT license.

In case of reuse of “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi” datasets, OpenStreetMap’s method of providing references to the original dataset and original license terms through http://wiki.openstreetmap.org/wiki/Contributors satisfies the requirements of Section 4(b) of the CC BY 2.5 IT license. OpenStreetMap users satisfy the requirements of Section 4(b) of the CC BY 2.5 IT license by referencing http://wiki.openstreetmap.org/wiki/Contributors in accordance with OpenStreetMap’s attribution requirements. Comune di Milano (City of Milano) waives any limitation in Section 4(a) of the CC BY 2.5 IT license on OpenStreetMap and its users using effective technological measures on OpenStreetMap data with the understanding that the Open Database License OdBL 1.0 requires open access or parallel distribution of OpenStreetMap data. In every case, this waiver has no impact on Comune di Milano (City of Milano)’s right or ability to distribute or license the above-mentioned datasets on any terms it wishes.

ODbL Compliance verified: yes

As stated above, attribution on the Contributors page is acceptable to the data owner. It is enough to add the following statement to the Contributors page: “Contains data provided by Comune di Milano released under CC-BY-2.5 IT license.”

Source data

The dataset, identified by the string "ds634", was downloaded from the "Numeri civici con coordinate geografiche" page on the Municipality of Milan website.

Import Type

The dataset will be cleaned and converted to OSM format with OpenRefine; it will then be conflated with OSM Conflator and published in a shared audit map prior to upload.

Data Preparation

The operations applied to the original dataset are listed in this operations file. Because the dataset is large (about 60k nodes), the import will be split on the "MUNICIPIO" dataset field, which matches the OSM admin_level=10 boundaries.

Tagging

The CSV file is a collection of point features, one for each housenumber.

The following fields will be evaluated:

  • NUMERO: housenumber (e.g. 11 in 11A)
  • LETTERA: subordinate (e.g. A in 11A)
  • BARRA: numeric extension (e.g. 1 in 11/1)
  • BARRA2: numeric extension (e.g. 1 in 11/1)
  • NUMEROCOMPLETO: complete housenumber assembled with previous fields
  • STATOCIVICO: for pruning rows (2=present; 4=only in the database; 99=suppressed)
  • DATA_SOPPRESSIONE: for pruning rows (address suppression date)
  • LONG_WGS84
  • LAT_WGS84
  • CAP: postal code, used for addr:postcode
  • MUNICIPIO: used for splitting the import and, optionally, for setting the addr:district tag
  • IDMASTER: official housenumber id, used for conflation and optionally for OSM loc_ref tag
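
The pruning and splitting described above can be sketched in Python. This is only an illustration, not the OpenRefine operations actually used: the rule "keep only STATOCIVICO=2 with an empty DATA_SOPPRESSIONE", the ";" delimiter and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Assumption: only rows marked "2" (present) are kept; the page does not
# state explicitly which STATOCIVICO values are pruned.
KEEP_STATUS = {"2"}


def is_active(row):
    """Return True for rows that are present and have no suppression date."""
    return (row["STATOCIVICO"].strip() in KEEP_STATUS
            and not row["DATA_SOPPRESSIONE"].strip())


def split_by_municipio(rows):
    """Group the active rows by their MUNICIPIO value."""
    groups = defaultdict(list)
    for row in rows:
        if is_active(row):
            groups[row["MUNICIPIO"].strip()].append(row)
    return dict(groups)


# Invented sample rows; the ";" delimiter is an assumption about the CSV.
SAMPLE_CSV = (
    "IDMASTER;NUMEROCOMPLETO;STATOCIVICO;DATA_SOPPRESSIONE;MUNICIPIO\n"
    "1001;11A;2;;5\n"
    "1002;12;99;2015-03-01;5\n"
    "1003;7;2;;9\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV), delimiter=";"))
groups = split_by_municipio(rows)
ids_by_municipio = {k: [r["IDMASTER"] for r in v] for k, v in groups.items()}
print(ids_by_municipio)
```

With the invented sample, the suppressed row 1002 is dropped and the remaining rows are grouped under Municipio 5 and Municipio 9.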

Housenumber

addr:housenumber is built by lowercasing the NUMEROCOMPLETO field.

Sample: "94n01", "93p01", "93p02", "90/10", "88p01", "94", "73a", "90/15".
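
As a minimal sketch of this rule, assuming that NUMEROCOMPLETO arrives in upper case (as in the "11A" example in the field list) and that surrounding whitespace can be discarded, the conversion could look like this. The function name is hypothetical.

```python
def to_osm_housenumber(numerocompleto: str) -> str:
    """Build an addr:housenumber value from NUMEROCOMPLETO.

    Only trimming and lowercasing are applied; slashes and digits
    are left untouched.
    """
    return numerocompleto.strip().lower()


# Assumed upper-case inputs corresponding to some of the samples above.
examples = ["94N01", "93P01", "90/10", "94", "73A"]
converted = [to_osm_housenumber(value) for value in examples]
print(converted)
```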

Changeset Tags

Changesets will be tagged with:

  • source=Comune di Milano
  • source:license=CC-BY-2.5
  • type=import
  • url=https://wiki.openstreetmap.org/w/index.php?title=Import/Catalogue/Milan_addresses_import

This way, people will know that the data was imported following the guidelines, and they can find this page for details.

Data Transformation

After the data preparation process, the following workflow has been performed on a subset (MUNICIPIO=5):

  • the pruned dataset records were converted to a JSON file;
  • the JSON file was processed through OSM Conflator, using this profile;
  • the preview of the conflated data was uploaded to an audit map for shared review.
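
The first step of this workflow can be sketched as follows. It assumes that the conflator reads a JSON array of objects with "id", "lat", "lon" and "tags" keys, and that coordinates use a dot as decimal separator; both points should be checked against the OSM Conflator documentation and the actual file. The sample row is invented.

```python
import json


def to_conflator_item(row):
    """Convert one pruned dataset row into a JSON-ready dictionary.

    The id/lat/lon/tags layout is an assumption about the input
    format expected by the conflator.
    """
    return {
        "id": str(row["IDMASTER"]),
        "lat": float(row["LAT_WGS84"]),
        "lon": float(row["LONG_WGS84"]),
        "tags": {
            "addr:housenumber": row["NUMEROCOMPLETO"].strip().lower(),
            "addr:postcode": row["CAP"].strip(),
        },
    }


# Invented row, for illustration only.
sample_row = {
    "IDMASTER": "1001",
    "NUMEROCOMPLETO": "11A",
    "CAP": "20141",
    "LAT_WGS84": "45.44",
    "LONG_WGS84": "9.19",
}

items = [to_conflator_item(sample_row)]
json_text = json.dumps(items, ensure_ascii=False, indent=2)
print(json_text)
```

The resulting text would be written to a file such as municipio5.json before running the conflator.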

Data Transformation Results

After completion of the audit process, the OSM XML upload candidate file will be available here TODO

Data Merge Workflow

Non-node objects

Address data in Italy must be placed exclusively on nodes, because the housenumber identifies the external access that leads from the street to the housing units (houses, stores, offices, etc.). Please read https://wiki.openstreetmap.org/wiki/IT:Addresses#Regole_specifiche_per_l.27Italia (in Italian) for more details. At the time of writing, a query for housenumbers applied to polygons or multipolygons returns 1134 matches. The distance between a dataset node and the polygon centroid is often greater than the 10-meter conflation radius; such cases are tagged with the fixme "suppressed or wrong position: please check" and will need inspection during post-import QA.
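
To illustrate why polygon centroids often fall outside the conflation radius, the following sketch computes the great-circle distance between two points and compares it with the 10-meter radius. It is not the conflator's own distance routine; the haversine formula and the coordinates are chosen only for the example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius
MAX_DISTANCE_M = 10.0       # conflation radius mentioned above


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(lat1, lon1, lat2, lon2, radius_m=MAX_DISTANCE_M):
    """True when the two points are close enough to be conflated."""
    return haversine_m(lat1, lon1, lat2, lon2) <= radius_m


# Invented coordinates: an entrance node and two candidate centroids.
entrance = (45.44, 9.19)
near_centroid = (45.44005, 9.19)  # roughly 5.6 m to the north
far_centroid = (45.4403, 9.19)    # roughly 33 m to the north

print(within_radius(*entrance, *near_centroid))
print(within_radius(*entrance, *far_centroid))
```

A building only a few tens of meters deep is enough to push its centroid beyond the radius, which is consistent with the number of cases expected to need manual review.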

Conflation

Conflation is performed by OSM Conflator. Objects tagged with addr:housenumber will be extracted from OSM for the area covered by the source dataset. The conflator output will be used to generate a public audit map for visual review.

OSM objects to be conflated

The following query gathers OSM objects for "Municipio 5" Milan district:

[out:xml][timeout:25];
area[name="Municipio 5"]["old_name"="Zona 5"]["admin_level"=10]->.searchArea;
(
  nwr["addr:housenumber"](area.searchArea);
);
out meta qt center;
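
Since the import is split by district, the same query has to be repeated for each Municipio. The helper below is a hypothetical sketch that only builds the query text; it assumes that every district follows the same name="Municipio N" / old_name="Zona N" pattern as Municipio 5, which should be verified in OSM before use.

```python
def build_query(municipio: int, timeout: int = 25) -> str:
    """Return the Overpass QL text for one Milan district.

    Assumption: all districts are tagged like Municipio 5
    (name="Municipio N", old_name="Zona N", admin_level=10).
    """
    return (
        f"[out:xml][timeout:{timeout}];\n"
        f'area[name="Municipio {municipio}"]'
        f'["old_name"="Zona {municipio}"]["admin_level"=10]->.searchArea;\n'
        "(\n"
        '  nwr["addr:housenumber"](area.searchArea);\n'
        ");\n"
        "out meta qt center;\n"
    )


query_m5 = build_query(5)
print(query_m5)
```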

As of March 2020, about 24k addresses are already present in OpenStreetMap. The Municipio 5 subset contains about 1k of them; the data exported by the query above (export.osm) is passed to the conflator.

Addresses and tags already present are merged by the conflator, treating the dataset's addr:housenumber and addr:street as authoritative. Existing OSM addresses without a match are kept, so that other useful tags (amenities, shops, etc.) are not removed.

Matching addrs

Any match between an input dataset record and an OSM element within a given range (defined in profile.py) is considered, and a proposed change is displayed in the audit map as a blue pin.

New addrs

Any input dataset address with no OSM match within that range generates a proposal for a new OSM address, displayed in the audit map as a green pin.

Not in dataset

Existing OSM elements without a match in the input dataset generate a proposal for a fixme tag with the text 'this addr is missing from source dataset: please check'. They are displayed in the audit map as a blue pin.
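
The three cases (matching, new, not in dataset) can be summarized with a small sketch. It does not reproduce the conflator's matching logic: the matches are taken as a given mapping from dataset id to OSM id, and all identifiers are invented.

```python
FIXME_TEXT = "this addr is missing from source dataset: please check"


def classify(dataset_ids, osm_ids, matches):
    """Return (id, proposal, pin colour) tuples for the audit map.

    `matches` maps a dataset id to the OSM id it was matched with;
    how the match is found is outside the scope of this sketch.
    """
    proposals = []
    matched_osm = set(matches.values())
    for ds_id in dataset_ids:
        if ds_id in matches:
            proposals.append((ds_id, "update", "blue"))   # matching addr
        else:
            proposals.append((ds_id, "create", "green"))  # new addr
    for osm_id in osm_ids:
        if osm_id not in matched_osm:
            proposals.append((osm_id, "fixme", "blue"))   # not in dataset
    return proposals


# Invented identifiers, for illustration only.
result = classify(
    dataset_ids=["1001", "1002"],
    osm_ids=["n1", "n2"],
    matches={"1001": "n1"},
)
print(result)
```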

Conflator output example

pi@raspberrypi:~/OSM $ conflate -i municipio5.json --osm export.osm -v -c previewM5.json profile.py
08:37:53 Found 421 duplicates in the dataset
08:37:53 Read 4876 items from the dataset
08:37:53 Downloaded 1085 objects from OSM
08:38:13 Matched 790 points
08:38:13 Removed 401 unmatched duplicates
08:38:13 Adding 3685 unmatched dataset points
08:38:14 Deleted 0 and retagged 295 unmatched objects from OSM
pi@raspberrypi:~/OSM $
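
The figures in this log appear to balance, which can be checked with a few lines of arithmetic. The interpretation of each counter is an assumption drawn from the log messages, not from the conflator source code.

```python
# Counters copied from the log above.
read_items = 4876          # "Read 4876 items from the dataset"
downloaded = 1085          # "Downloaded 1085 objects from OSM"
matched = 790              # "Matched 790 points"
removed_duplicates = 401   # "Removed 401 unmatched duplicates"
added = 3685               # "Adding 3685 unmatched dataset points"
deleted = 0                # "Deleted 0 ..."
retagged = 295             # "... retagged 295 unmatched objects from OSM"

# Dataset side: items that were neither matched nor dropped as
# duplicates should be the ones proposed as new addresses.
dataset_balance_ok = read_items - matched - removed_duplicates == added

# OSM side: downloaded objects that were neither matched nor deleted
# should be the ones retagged with a fixme.
osm_balance_ok = downloaded - matched - deleted == retagged

print(dataset_balance_ok, osm_balance_ok)
```

Under this reading, 4876 - 790 - 401 = 3685 new addresses and 1085 - 790 - 0 = 295 retagged OSM objects.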

Conflator re-run

Once the audit is completed, the audit data is downloaded from the conflator project page (example) and the conflator is run again.

pi@raspberrypi:~/OSM $ conflate -i municipio5.json -a audit_MI-M5.json -o M5.osm profile.py
[some echoes...]
pi@raspberrypi:~/OSM $

Candidates

Municipio | Audit published | Post audit conflator run | File
9         | 2020-06-01      | 2021-05-18               | M9.osm

Team Approach

This import is managed and supervised by:

During the upload process, the subset import will be evaluated; the batching criterion will possibly be the municipal district (Municipio, in Italian).


Reverting

In case of import anomalies, changeset(s) will be reverted using OSM reverter scripts or, if possible, the JOSM Reverter Plugin.

Post-import QA

Street names

After the import, addr:street values may differ slightly from the current street names.

These differences should be caught using OSM Inspector (map already centered on Milan).

Unmarked streets

The result can be used to locate areas where streets are missing.

Missing roads will be created in JOSM using PCN 2012 aerial imagery.

Unnamed streets

The result can be used to derive street names for unnamed streets when all the nodes along the street have the same addr:street value.
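
The rule "all the nodes along the street have the same addr:street value" can be sketched as a small helper. How the address nodes along a street are collected is left out, and the street names are invented.

```python
def unanimous_street_name(addr_streets):
    """Return the common addr:street value, or None if the nodes disagree.

    Empty or missing values are ignored; an empty input yields None.
    """
    names = {name.strip() for name in addr_streets if name and name.strip()}
    return names.pop() if len(names) == 1 else None


# Invented values collected from address nodes along an unnamed way.
agreeing = ["Via Esempio", "Via Esempio", "Via Esempio "]
conflicting = ["Via Esempio", "Via Altra"]

print(unanimous_street_name(agreeing))
print(unanimous_street_name(conflicting))
```

A name derived this way is only a candidate and should still be checked by a mapper before the street is tagged.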

Missing road names will be identified using the OpenStreetMap NoName Map Overlay: tms:http://tile3.poole.ch/noname/{zoom}/{x}/{y}.png

OSM Inspector can also be used to find these streets.

Non-node objects

Since several polygon and multipolygon OSM address objects will be flagged as being in the wrong place, they will have to be adapted or deleted manually.

See also

The email to the Imports mailing list was sent on 2020-04-04 and can be found in the imports mailing list archives.