Import/Catalogue/Address import for Milan

From OpenStreetMap Wiki

About

This page describes importing addresses using the data provided by the Municipality of Milan (Italy).

The Municipality of Milan released its complete address data for State of the Map 2018. More info at Arriva State of the Map a Milano, il Comune rilascia oltre 60.000 numeri civici come open data ("State of the Map comes to Milan: the Municipality releases over 60,000 house numbers as open data", in Italian).

The import has been discussed on the Italian OSM mailing list. This wiki page is the result of consensus there. TO DO.

Import Data

Background

Address format

House numbering follows the European scheme.

An address is determined by its streetname and housenumber.

A housenumber is also unique per street.

Housenumbers can include a subordinate, written as a suffix letter (e.g. in "7a", "a" is the subordinate). Subordinates usually arise when a new house is built between existing houses with consecutive housenumbers. For example, when a house is built between numbers 7 and 9, the new house will most likely get number 7a (since even numbers are reserved for the other side of the street).
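The following Python sketch illustrates this format by splitting a housenumber such as "7a" into its numeric part and its letter subordinate. The function name and the regular expression are hypothetical. They only cover the plain "number + optional letter" form, so slash subordinates (e.g. "11/1") are out of scope here.

```python
import re

# Hypothetical pattern: digits followed by at most one letter subordinate.
_HOUSENUMBER_RE = re.compile(r"^(\d+)([A-Za-z]?)$")


def split_housenumber(value):
    """Return (number, subordinate); subordinate is '' when absent."""
    match = _HOUSENUMBER_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported housenumber format: {value!r}")
    return int(match.group(1)), match.group(2).lower()


# A house built between 7 and 9 on the odd side becomes "7a";
# sorting on the parsed (number, subordinate) tuple keeps it between them.
ordered = sorted(["9", "7a", "7"], key=split_housenumber)
print(ordered)  # ['7', '7a', '9']
```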

Postal codes are missing in the source data.

Legal

Data source site: https://geoportale.comune.milano.it/ATOM/SIT/Toponomastica/NumeriCivici_Service.xml

Data license: https://geoportale.comune.milano.it/sit/toponomastica/

Type of license: CC-BY-2.5-IT

Waiver: https://geoportale.comune.milano.it/sit/toponomastica/

Addendum to CC BY 2.5 IT Licence with respect to following datasets: “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi”.

In case of reuse of “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi” datasets, the attribution by OpenStreetMap and its users through http://wiki.openstreetmap.org/wiki/Contributors is sufficient to provide attribution to Comune di Milano (City of Milano) in a manner that is “reasonable to the medium or means” in accordance with Section 4(b) of the CC BY 2.5 IT license.

In case of reuse of “Numeri Civici”, “Toponimi (Viario)”, “Centroidi toponimi” datasets, OpenStreetMap’s method of providing references to the original dataset and original license terms through http://wiki.openstreetmap.org/wiki/Contributors satisfies the requirements of Section 4(b) of the CC BY 2.5 IT license. OpenStreetMap users satisfy the requirements of Section 4(b) of the CC BY 2.5 IT license by referencing http://wiki.openstreetmap.org/wiki/Contributors in accordance with OpenStreetMap’s attribution requirements. Comune di Milano (City of Milano) waives any limitation in Section 4(a) of the CC BY 2.5 IT license on OpenStreetMap and its users using effective technological measures on OpenStreetMap data with the understanding that the Open Database License OdBL 1.0 requires open access or parallel distribution of OpenStreetMap data. In every case, this waiver has no impact on Comune di Milano (City of Milano)’s right or ability to distribute or license the above-mentioned datasets on any terms it wishes.

OSM attribution: TO DO

ODbL Compliance verified: yes

As stated in the waiver above, attribution on the Contributors page is sufficient for the data owner. It will be enough to add the following statement to the Contributors page: “Contains data provided by Comune di Milano released under CC-BY-2.5 IT license.”

Import Type

HOW TO SPLIT THE DATA SET FOR CONFLATION?

The dataset will be loaded into JOSM and manually conflated with existing OpenStreetMap data before the upload.

Data Preparation

Tagging Plans

The data is provided as a CSV file named "Civici_20180718.csv" inside a ZIP file called "OpenData_Civici.zip".

It is available at: https://geoportale.comune.milano.it/ATOM/SIT/Toponomastica/NumeriCivici_Service.xml

The CSV file consists of a collection of point features, one for each housenumber.

Each record has the following fields:

  • CODICE_VIA: official street id
  • NUMERO: housenumber (e.g. 11 in 11A)
  • LETTERA: subordinate (e.g. A in 11A)
  • BARRA: subordinate for numbers (e.g. 1 in 11/1)
  • BARRA2: additional subordinate, a 3-character code made of a letter followed by a sequential number: vehicle entrances (P01), shops (N01), metro station (MM) mezzanines (M01), kiosks (C01) and gardens (G01)
  • NUMEROCOMPLETO: housenumber with all the subordinates
  • RESIDENZIALE: 1 if it is a residential building; 0 otherwise
  • STATOCIVICO: status of the address plate (2=present; 4=only in the database; 99=suppressed)
  • DATA_APPLICAZIONE: address plate application date
  • DATA_ATTIVAZIONE: address insertion date in the Municipal database
  • DATA_SOPPRESSIONE: address suppression date
  • ULTIMA_MODIFICA: latest update
  • GAUSSB_X: Gauss–Boaga projection easting (presumably EPSG:3003, Monte Mario / Italy zone 1; to be verified)
  • GAUSSB_Y: Gauss–Boaga projection northing
  • WGS84_X: WGS 84 / UTM zone 32N (EPSG:32632) easting
  • WGS84_Y: WGS 84 / UTM zone 32N northing
  • WEBMERCATOR_X: Web Mercator projection (EPSG:3785) easting
  • WEBMERCATOR_Y: Web Mercator projection northing
  • LATIT: WGS 84 geographic (EPSG:4326) latitude
  • LONGIT: WGS 84 geographic longitude
  • DATA_MODFINE: control field
  • IDMASTER: official housenumber id
  • PASSOCARRAIO: 1 if it is a vehicle entrance; 0 otherwise
  • LIVELLO: address level (0=ground level; 1=above ground; -1=underground)
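Below is a minimal Python sketch of reading the records and decoding the coded fields. It assumes a semicolon-separated file with a header row using the column names listed above, and a dot as decimal separator. Neither the delimiter nor the number format has been checked against the actual file, and the sample row uses made-up values.

```python
import csv
import io

# Code tables as documented in the field list above.
STATOCIVICO = {"2": "present", "4": "only in the database", "99": "suppressed"}
LIVELLO = {"0": "ground level", "1": "above ground", "-1": "underground"}


def read_civici(lines, delimiter=";"):
    """Yield one dict per address record, with decoded status and level.

    The ';' delimiter is an assumption; check it against the real header.
    """
    for row in csv.DictReader(lines, delimiter=delimiter):
        yield {
            "idmaster": row["IDMASTER"],
            "codice_via": row["CODICE_VIA"],
            "numerocompleto": row["NUMEROCOMPLETO"],
            "status": STATOCIVICO.get(row["STATOCIVICO"], "unknown"),
            "level": LIVELLO.get(row["LIVELLO"], "unknown"),
            "lat": float(row["LATIT"]),   # assumes '.' as decimal separator
            "lon": float(row["LONGIT"]),
            "vehicle_entrance": row["PASSOCARRAIO"] == "1",
        }


# Hypothetical sample with only a subset of the real columns.
sample = io.StringIO(
    "IDMASTER;CODICE_VIA;NUMEROCOMPLETO;STATOCIVICO;LIVELLO;LATIT;LONGIT;PASSOCARRAIO\n"
    "12345;9999;11A;2;0;45.4642;9.1900;0\n"
)
records = list(read_civici(sample))
```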

Please refer to "Istruzioni tecniche per l'utilizzo dei dati_CIVICI.pdf" inside the ZIP file for a complete description (in Italian).

The CSV file will be converted to a shapefile using QGIS. After that it will be converted to OSM XML using ogr2osm.

The tags that will be used in the final upload are loc_ref, addr:housenumber, addr:street, addr:city and fixme.

The tags will be filled in as follows:

  • loc_ref will contain IDMASTER. This is an official reference and will be used to conflate data in future imports. In Milan, highways already carry a loc_ref tag used for the same purpose.
  • addr:housenumber will contain NUMEROCOMPLETO converted to lowercase (for subordinates).
  • addr:street will contain the street name corresponding to CODICE_VIA. This match will be found using the loc_ref tag already present on highways in Milan in OpenStreetMap. If CODICE_VIA is not found, a fixme tag will be added for later inspection.
  • addr:city will contain Milano. Since administrative boundaries in Italy are not always accurate, the city is always specified explicitly.
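The tagging plan can be summarised by the following hypothetical Python function. It assumes the street directory has already been loaded into a dict mapping CODICE_VIA to the OSM street name. Two details are illustrative assumptions, not necessarily what civici.py does: the fixme text, and omitting addr:street when the code is unknown.

```python
def build_tags(row, street_dir):
    """Map one address record to OSM tags following the tagging plan.

    row: dict with IDMASTER, CODICE_VIA and NUMEROCOMPLETO keys.
    street_dir: dict CODICE_VIA -> street name as used in OSM
    (a hypothetical in-memory form of the street directory).
    """
    tags = {
        "loc_ref": row["IDMASTER"],
        # Lowercase so that subordinates read "11a" rather than "11A".
        "addr:housenumber": row["NUMEROCOMPLETO"].lower(),
        "addr:city": "Milano",
    }
    street = street_dir.get(row["CODICE_VIA"])
    if street:
        tags["addr:street"] = street
    else:
        # Unknown CODICE_VIA: leave a marker for later manual inspection.
        tags["fixme"] = f"unknown CODICE_VIA {row['CODICE_VIA']}"
    return tags


street_dir = {"9999": "Via Esempio"}  # hypothetical code and name
tags = build_tags(
    {"IDMASTER": "12345", "CODICE_VIA": "9999", "NUMEROCOMPLETO": "11A"}, street_dir
)
orphan = build_tags(
    {"IDMASTER": "12346", "CODICE_VIA": "0000", "NUMEROCOMPLETO": "3"}, street_dir
)
```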

Changeset Tags

Changeset will be tagged with:

  • source=Comune di Milano
  • source:license=CC-BY-2.5
  • type=import
  • url=https://wiki.openstreetmap.org/wiki/Import/Catalogue/Address_import_for_Milan

This way, mappers will know that the data was imported following the import guidelines and can find this page for details.

Data Transformation

ogr2osm will be used to convert the shapefile to OSM XML format using the above tagging plan.

ogr2osm translation file can be found at https://github.com/musuruan/osm_imports/blob/master/milano/civici.py

Data Transformation Results

OSM XML file: https://github.com/musuruan/osm_imports/blob/master/milano/Civici_20180718.osm

Data Merge Workflow

Addresses already in OSM will be extracted using the following Overpass query:


<osm-script>
<query into="comune" type="area">
  <has-kv k="admin_level" v="8"/>
  <has-kv k="name" v="Milano"/>
</query>
<union>
  <query type="node">
    <area-query from="comune"/>
    <has-kv k="addr:housenumber"/>
  </query>
  <query type="way">
    <area-query from="comune" />
    <has-kv k="addr:housenumber"/>
  </query>
  <item/>
  <recurse type="down"/>
</union>
<print mode="meta" />
</osm-script>

There are about 30,500 addresses already present in OpenStreetMap.

Address data in Italy must be placed exclusively on nodes because the housenumber identifies the external access that leads from the street to the housing units (houses, stores, offices, etc.). Please read https://wiki.openstreetmap.org/wiki/IT:Addresses#Regole_specifiche_per_l.27Italia (in Italian) for more details.

Thus about a thousand addresses placed on buildings or other areas will be removed.
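To estimate how many addresses sit on buildings or other areas, the Overpass output (OSM XML) can be scanned for address tags on ways. The sketch below uses only the Python standard library. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_by_element_type(osm_xml):
    """Return (nodes, areas): (type, id) pairs of address-bearing nodes and of
    ways/relations carrying addr:housenumber, which must be moved to a node."""
    root = ET.fromstring(osm_xml)
    nodes, areas = [], []
    for element in root:
        if element.tag not in ("node", "way", "relation"):
            continue  # skip <note>, <meta> and similar header elements
        keys = {tag.get("k") for tag in element.findall("tag")}
        if "addr:housenumber" not in keys:
            continue  # e.g. way member nodes pulled in by <recurse type="down"/>
        target = nodes if element.tag == "node" else areas
        target.append((element.tag, element.get("id")))
    return nodes, areas


sample = """<osm version="0.6">
  <node id="1" lat="45.46" lon="9.19"><tag k="addr:housenumber" v="7"/></node>
  <node id="2" lat="45.46" lon="9.19"/>
  <way id="10"><nd ref="2"/><tag k="building" v="yes"/><tag k="addr:housenumber" v="9"/></way>
</osm>"""
nodes, areas = split_by_element_type(sample)
```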

Addresses already present will be merged manually. Generally, existing addresses will be kept because they are of better quality than the ones provided by the City of Milan (TO BE VERIFIED).

Team Approach

This import is managed and supervised by:

A mapper should choose a municipal district (Municipio, in Italian), conflate the Open Data and the OpenStreetMap data, and then import the result.

This is a list of people who are working on the import, along with each of their import usernames:

Workflow

Prerequisites

Inside "OpenData_Civici.zip" there is a file called "Viario_20180718.csv", a street directory of Milan. None of its fields contains the highway name as used in OSM (even though one field is named OPENSTREETMAP). OSM street names must follow these rules: https://wiki.openstreetmap.org/wiki/IT:Editing_Standards_and_Conventions#Nomi_delle_strade

CODICE_VIA is a unique identifier of the street. The same id is also used for the street part of the address in the Civici_20180718.csv file.

Currently, highways in Milan carry a loc_ref tag in OSM. It contains the AMAT road segment code and has the syntax vvvv_aaaaa, where vvvv is the CODICE_VIA and aaaaa is the road segment code. The loc_ref tags were added during the integration project of Milan street data: https://wiki.openstreetmap.org/wiki/Agenzia_mobilit%C3%A0_ambiente_territorio

Therefore we can build a new street directory containing highway names currently used in OSM.

Step by step instructions:

  1. Download the latest OSM data files for Milan from https://osm-estratti.wmflabs.org/estratti/Lombardia/Milano/Milano
  2. Build a CSV file containing all streets in Milan, keeping only the CODICE_VIA part of loc_ref:
     $ osmfilter --keep="highway=* and loc_ref=* and name=*" 015146---Milano.osm | osmconvert - --csv="name loc_ref" --csv-separator=";" | awk -F"_" '{print $1}' | sort | uniq >> Stradario_OSM.csv
     $ echo "loc_ref;name" > Stradario_OSM_20180810.csv && awk -F";" '{print $2";"$1}' Stradario_OSM.csv | sort | uniq >> Stradario_OSM_20180810.csv
  3. Run a python program (https://github.com/musuruan/osm_imports/blob/master/milano/build_street_dir.py) that merges the two street directories: $ ./build_street_dir.py
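For readers without osmfilter/osmconvert at hand, step 2 can be approximated in Python. This is a sketch, not a replacement for the commands above. It assumes input lines shaped like the osmconvert CSV output ("name;loc_ref"). Its sort order is plain code-point order, which may differ from the locale-dependent order of sort. Unlike awk -F"_", it splits on ";" first, so a street name containing "_" is not truncated.

```python
def build_osm_street_directory(csv_lines):
    """Reproduce step 2: from 'name;loc_ref' lines, keep only the CODICE_VIA
    part of loc_ref and return sorted, de-duplicated 'loc_ref;name' lines."""
    pairs = set()
    for line in csv_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, _, loc_ref = line.partition(";")
        codice_via = loc_ref.split("_", 1)[0]  # 'vvvv_aaaaa' -> 'vvvv'
        pairs.add((codice_via, name))
    return ["loc_ref;name"] + [f"{code};{name}" for code, name in sorted(pairs)]


# Hypothetical input: two segments of the same street and one other street.
lines = ["Via Esempio;9999_00001", "Via Esempio;9999_00002", "Piazza Prova;8888_00001"]
result = build_osm_street_directory(lines)
```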

Merging is performed using the CODICE_VIA identifier. The same CODICE_VIA may be present on streets with different names in OSM. These are likely errors. To choose the best match among street names, a fuzzy match is performed. Match ratio is also added to the resulting file.
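The fuzzy match can be sketched with difflib from the Python standard library. This is a stand-in: the actual build_street_dir.py may use a different similarity measure, so the ratios computed here (scaled to 0-100) will not necessarily match those in the published file.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity between two names, on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_osm_name(official_name, osm_names):
    """Among the OSM names sharing one CODICE_VIA, pick the one closest to
    the official name. Returns (name, ratio), or (None, 0) if there are none."""
    if not osm_names:
        return None, 0
    ratio, name = max((similarity(official_name, c), c) for c in osm_names)
    return name, ratio


# Hypothetical names: one CODICE_VIA found on two differently named highways.
match = best_osm_name("VIA ESEMPIO", ["Via Prova", "Via Esempio"])
```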

The output file can be found at: https://github.com/musuruan/osm_imports/blob/master/milano/Viario_OSM_20180718.csv

Issues that need to be addressed before starting this import:

  • 6 streets with STATO=3 (suppressed street name). It is likely that these streets are not up-to-date in OSM.
  • 164 streets with STATO=2 (active street name) but without a name in OSM. Either CODICE_VIA or street name is missing in OSM.
  • Streets with RATIO < 76 need to be reviewed because the name is probably wrong: either the CODICE_VIA or the highway name is likely mistaken.
  • Street names should be reviewed to verify they follow the guidelines: https://wiki.openstreetmap.org/wiki/IT:Editing_Standards_and_Conventions#Nomi_delle_strade
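The checks above can be expressed as a small filter over the merged directory. The column names STATO, OSM_NAME and RATIO are assumptions about the layout of Viario_OSM_20180718.csv and should be adapted to the real header. The threshold of 76 comes from the list above. The last check (naming guidelines) is manual and is not covered.

```python
RATIO_THRESHOLD = 76  # below this, the name match is considered unreliable


def classify_street(row):
    """Return the list of issues for one merged street-directory row.

    Column names STATO, OSM_NAME and RATIO are hypothetical.
    """
    issues = []
    stato = row.get("STATO")
    osm_name = row.get("OSM_NAME") or ""
    if stato == "3":
        issues.append("suppressed street name: OSM may be outdated")
    elif stato == "2" and not osm_name:
        issues.append("active street without OSM name: CODICE_VIA or name missing")
    if osm_name and int(row.get("RATIO") or 0) < RATIO_THRESHOLD:
        issues.append("low match ratio: CODICE_VIA or highway name likely wrong")
    return issues
```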

Step by step instructions

TO BE DEFINED: How to choose an area to work on

Step by step instructions:

  1. Run ogr2osm to export the data in OSM XML: ogr2osm.py -e 32632 -t civici.py -f Civici_20180718.shp
  2. Open this file in JOSM
  3. Run the Overpass query above to load the existing addresses into another layer
  4. Merge these addresses with the help of the JOSM Conflation Plugin
  5. Download OSM data for the same area
  6. Run the JOSM validator and fix the reported issues
  7. Upload the changeset to OSM

Each changeset will be small enough to be uploaded in one go.

If problems arise during the import, the changeset will be reverted using the JOSM Reverter Plugin.

Conflation

See #Data Merge Workflow.

QA

Street names

After the import, addr:street values may differ slightly from the current highway names.

These differences should be caught using OSM Inspector (map already centered on Milan).

Unmarked streets

The result can be used to locate areas where streets are missing.

Missing roads will be created in JOSM using the PCN 2012 aerial imagery.

Unnamed streets

The result can be used to derive names for unnamed streets when all the address nodes along a street have the same addr:street value.
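The rule "all address nodes along the street agree" can be written as a tiny helper. Selecting which nodes lie along a given street is a geometric step not shown here. The sketch assumes that list of addr:street values is already available, and the function name is hypothetical.

```python
def derive_street_name(addr_streets):
    """Return the common addr:street value if every address node along an
    unnamed street agrees; otherwise None (needs manual review)."""
    # Ignore empty values and surrounding whitespace before comparing.
    values = {v.strip() for v in addr_streets if v and v.strip()}
    return values.pop() if len(values) == 1 else None


# Hypothetical values collected from nodes along one unnamed highway.
name = derive_street_name(["Via Esempio", "Via Esempio ", ""])
```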

Missing road names will be identified using the OpenStreetMap NoName map overlay (imagery URL: tms:http://tile3.poole.ch/noname/{zoom}/{x}/{y}.png).

OSM Inspector can also be used to find these streets.

See also

The email to the Imports mailing list was sent on YYYY-MM-DD and can be found in the archives of the mailing list at [1].