Import/Catalogue/Veneto House Numbers Import

About

This page describes the import of house numbers from the data published by municipalities in Regione Veneto (Italy). So far, four major cities have released their data: Vicenza, Verona, Venezia and Treviso.

The import has been discussed on the regional OSM mailing list: October, November

The Italian wiki page

Goals

The goal of this import is to use the datasets provided by local municipalities to improve the addresses available in OSM. It will not be a completely blind import: wherever possible, the data will be checked by local mappers. The quality of the data has also been verified (see #Quality of the data).

Schedule

Imports will be performed after the local community has reviewed both the dataset and the existing data in OSM.

Vicenza house numbers will be the pilot dataset to be imported. Progress will be trackable in this table.

Import Data

Background

Address format

House numbering follows the Southern Europe scheme. An address in Regione Veneto is determined by its street name, house number and postal code; where present, subordinate and in-house values are included. The subordinate is mostly written as a suffix letter, but can be any alphanumeric string (e.g. "A", "A3", "Z7"). Subordinates usually arise when a new house is built between existing houses with consecutive house numbers. The in-house value is mostly written as a suffix number.
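As an illustration of the structure described above, the following sketch splits a raw house number into its numeric part and an optional suffix. The function name `split_housenumber` and the accepted separators ("/" and "-") are assumptions made for this example; the actual datasets may encode subordinate and in-house values differently.

```python
import re

# Hypothetical pattern: leading digits, an optional separator, then an
# optional alphanumeric suffix (a subordinate such as "A", "A3", "Z7",
# or an in-house number).
_HOUSENUMBER = re.compile(r"^\s*(\d+)\s*[/\-]?\s*([A-Za-z0-9]*)\s*$")


def split_housenumber(raw):
    """Return (number, suffix), or None if the value does not match."""
    match = _HOUSENUMBER.match(raw)
    if match is None:
        return None
    number, suffix = match.groups()
    return int(number), suffix.upper()
```

Values that do not fit the pattern return None so that they can be reviewed by hand rather than silently reformatted.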

Legal

Import Type

The dataset will be imported one municipality at a time.

Quality of the data

An overview of the dataset has been done with QGIS. The data is solid and consistent: existing OSM addresses (mapped by OSM users) fall very close to the corresponding addresses in the dataset (the offset is usually less than 5 meters).
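The overview itself was done in QGIS. A comparable offset check could be sketched in Python as below; this is not the workflow that was used, only an illustration. The function names and the input shape (dictionaries keyed by street and house number, with WGS84 latitude and longitude as values) are assumptions.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def offsets(osm_points, dataset_points):
    """Map each shared (street, housenumber) key to its offset in metres."""
    return {
        key: haversine_m(*osm_points[key], *dataset_points[key])
        for key in osm_points.keys() & dataset_points.keys()
    }
```

Addresses whose offset is well above the usual few metres are the ones worth inspecting manually.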

Data Preparation

Several steps have to be executed to make the data consistent, which reduces the amount of QA work to be done after the import.

Name Standardization of the highways

The name standardization must follow the ISTAT guidelines (these guidelines aim to standardize names all across Italy). This process is necessary to keep the name=* used on the highways consistent with the addr:street=* used on the address nodes. Some examples of this process:

  • wrong names: Via Fratelli Chiodi -> Via Fioralpino Chiodi
  • abbreviation expansion: Via O. M. Pagani -> Via Orazio Maria Pagani
  • missing parts: Via Faedo -> Via Alessandro Faedo
  • dates: Viale 10 Giugno -> Viale Dieci Giugno
  • mistypes: Via Francesco Gucciardini -> Via Francesco Guicciardini
  • wrong DUG: Contra' Paolo Lioy; Via Paolo Lioy
  • accents and apostrophe checks: Via Niccolo' Tommaseo -> Via Niccolò Tommaseo
  • correct names not following the guidelines (for a person's name, the order should be DUG, i.e. the street-type designator such as Via or Viale, + NAME + SURNAME): Via Pasqualigo Francesco -> Via Francesco Pasqualigo

Since all the cities listed above (#Legal) have also released their correct odonyms (data license: IODL 2.0, ODbL Compliance verified: yes), the name standardization procedure will be semi-automatic. All the scripts used are being developed in Python and will be released once their bugs have been fixed. The procedure is:

  • download all the highways of the considered municipality into a .osm file through the following Overpass query:
<osm-script output="xml">
  <id-query {{nominatimArea:<NAME OF THE CITY>}} into="area"/>
  <query type="way">
    <has-kv k="highway"/>
    <has-kv k="name"/>
    <area-query from="area"/>
  </query>
  <union>
    <item />
      <recurse type="way-node"/>   
  </union>
  <print mode="meta"/>
</osm-script>
  • extraction of the highway names;
  • creation of a key-value dictionary: each key is the name of one highway extracted from OSM, and the corresponding value is the name extracted from the odonyms of the considered city. The script uses approximate string matching techniques to create the association and its score (a high score indicates a likely match; the maximum score is 1, which indicates a perfect match). Each entry of the dictionary is manually checked to correct any association issues;
  • substitution of the value of name=* where necessary in the .osm file containing the highways;
  • upload these edits into a changeset (see Changeset Tags).
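The extraction and dictionary steps above could be sketched as follows. The import scripts have not been released, so this is not their implementation: `difflib.SequenceMatcher` from the Python standard library is used here only as one possible approximate string matching technique whose score is 1 for a perfect match, and the function names are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def highway_names(osm_xml):
    """Collect the distinct name=* values of ways tagged highway=*."""
    names = set()
    for way in ET.fromstring(osm_xml).iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        if "highway" in tags and "name" in tags:
            names.add(tags["name"])
    return names


def build_dictionary(osm_names, odonyms):
    """Map each OSM name to (best official odonym, score in [0, 1])."""
    result = {}
    for name in sorted(osm_names):
        scored = [
            (difflib.SequenceMatcher(None, name.lower(), o.lower()).ratio(), o)
            for o in odonyms
        ]
        # With no odonyms at all, the entry is (None, 0.0).
        score, best = max(scored, default=(0.0, None))
        result[name] = (best, score)
    return result
```

Entries with a score below 1 are the ones that most need the manual check; an entry such as "Via Francesco Gucciardini" would be paired with "Via Francesco Guicciardini" at a score just under 1.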

PS: a pilot test of this procedure has already been executed in Vicenza: Link to the changeset, Link to the log files.

House numbers datasets preparation

The data is provided as a shapefile consisting of a collection of point features, one for each house number. The projection can be Gauss-Boaga or WGS84.

The field names differ from one dataset to another, depending on the municipality. This is not a problem: each dataset provides one field containing the street name and one field containing the house number. Subordinate and in-house values can be stored in the latter field or in a separate field.

Street names in the shapefiles are written in capital letters (e.g. VIALE ROMA) and in some cases use abbreviations. Therefore, prior to the OSM XML conversion, a dictionary has to be constructed through a procedure very similar to the one explained above: the key is the street name used in the shapefile, and the value is the name extracted from the odonyms of the considered city. Once the dictionary has been carefully checked, the conversion to OSM XML will be performed with the ogr2osm utility: the translation file will be used to change the street name to the normalized one and to transform the house number into the discussed format (#Address format).
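The per-record logic of such a translation could look like the sketch below. It is written as a plain function so that it runs on its own; in an ogr2osm translation file similar logic would be called from the tag-filtering hook, whose exact name and signature depend on the ogr2osm version and are not shown here. The field names `VIA` and `CIVICO`, the function name, and the sample dictionary entry are hypothetical.

```python
# Hypothetical field names; each municipality uses its own.
STREET_FIELD = "VIA"
NUMBER_FIELD = "CIVICO"

# Manually checked dictionary: shapefile street name -> official odonym.
STREET_DICTIONARY = {
    "VIALE ROMA": "Viale Roma",
}


def filter_tags(attrs, city, postcode):
    """Turn one shapefile record into the tags of an OSM address node."""
    raw_street = attrs.get(STREET_FIELD, "").strip()
    number = attrs.get(NUMBER_FIELD, "").strip()
    if not raw_street or not number:
        return {}  # skip records that cannot form an address
    # Unknown streets raise KeyError so that they are noticed, not guessed.
    street = STREET_DICTIONARY[raw_street]
    return {
        "addr:housenumber": number,
        "addr:street": street,
        "addr:postcode": postcode,
        "addr:city": city,
        "addr:country": "IT",
    }
```

Looking the street up in the checked dictionary, rather than title-casing the capitalized name, keeps abbreviations and accents under manual control.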

Data Integration

The following Overpass query will be used to download the existing house numbers into a new osm file:

<osm-script>
  <query into="<Name of the City>" type="area">
    <has-kv k="admin_level" v="8"/>
    <has-kv k="name" v="<Name of the City>"/>
  </query>
  <union>
    <query type="node">
      <area-query from="<Name of the City>"/>
      <has-kv k="addr:housenumber"/>
    </query>
    <query type="way">
      <area-query from="<Name of the City>" />
      <has-kv k="addr:housenumber"/>
    </query>
    <query type="way">
      <area-query from="<Name of the City>" />
      <has-kv k="addr:interpolation"/>
    </query>
    <item/>
    <recurse type="down"/>
  </union>
  <print mode="meta" />
</osm-script>

All existing address nodes will be kept. To standardize the current addr:street=* values, the dictionary procedure (presented above) will be applied to this file. Missing tags listed in #Tagging Plan will be added where necessary. JOSM's Conflation plugin will be used to conflate the existing data with the new dataset. At this point, everything is ready to be imported into OSM.

If a problem occurs during the import, the changeset will be reverted using the JOSM Reverter Plugin. After the import, OSM Inspector will be used to locate missing street names or missing roads. Missing roads will be created in JOSM using PCN 2012 aerial images.

Tagging Plan

The following tags will be used on each address node:

  • addr:housenumber: formatted according to #Address format;
  • addr:street: formatted according to #Name Standardization of the highways;
  • addr:postcode: CAP. This number is constant for each city, except Verona (see the PS note below);
  • addr:city: name of the considered city;
  • addr:country: IT

PS: Verona is a non-trivial addr:postcode=* case: the postcode is not constant, and each street name has its own value. The dataset contains all the values to be used. If any of the existing OSM addresses does not have an addr:postcode=*, it will be given the correct one during the #Data Integration procedure.
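Filling in the missing tags on existing address nodes, including the per-street postcode case, could be sketched as follows. The function name and its parameters are hypothetical, and any street-to-postcode mapping passed to it would have to come from the municipality's dataset.

```python
def complete_tags(tags, city, postcode_by_street=None, default_postcode=None):
    """Return a copy of tags with the missing address tags filled in."""
    completed = dict(tags)
    # setdefault never overwrites a value that a mapper already entered.
    completed.setdefault("addr:city", city)
    completed.setdefault("addr:country", "IT")
    if "addr:postcode" not in completed:
        street = completed.get("addr:street")
        # Per-street lookup first (Verona-style), then the city-wide value.
        postcode = (postcode_by_street or {}).get(street, default_postcode)
        if postcode is not None:
            completed["addr:postcode"] = postcode
    return completed
```

For a city with a constant postcode only `default_postcode` is passed; for Verona the per-street mapping is passed instead, and a node whose street is not in the mapping is left without a postcode for manual review.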

Changeset Tags

The changesets regarding the correction of the name=* used on highways will be tagged with:

The changesets regarding the house numbers import will be tagged with:

Dedicated upload account

The account Veneto_Civici_Import has been created to upload all the changesets.

Team Approach

Given the large amount of work to be done, this will hopefully become a team effort. The import will be managed by the following OSM users:

QA

Unmarked streets

The result of the import can be used to locate areas where streets are missing.

Missing roads will be created in JOSM using PCN 2012 aerial images.

Unnamed streets

The result of the import can be used to derive street names for unnamed streets when all the nodes along the street have the same addr:street value. Missing road names will be identified using the OpenStreetMap Inspector, with the Highways view.
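The rule for deriving a name for an unnamed street can be expressed as a small check. This is a minimal sketch: the function name is hypothetical, and how the address nodes "along the street" are selected is left out and would have to be decided separately.

```python
def derive_street_name(addr_street_values):
    """Return the shared addr:street value, or None if nodes disagree.

    A missing or empty value on any node also yields None, since in
    that case not all nodes carry the same value.
    """
    values = list(addr_street_values)
    if values and all(values) and len(set(values)) == 1:
        return values[0]
    return None
```

A None result means the street name cannot be derived automatically and the street needs a manual check.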