Import/Catalogue/AddressImport RAFVG

From OpenStreetMap Wiki

About

This page describes the import of address data provided by Regione Autonoma Friuli Venezia-Giulia (RAFVG), Italy. The dataset covers 196 municipalities and roughly 433,000 nodes.

The import has been discussed on the regional OSM mailing list. This wiki page is the result of consensus there.

Goals

The goal of this import is to use the dataset provided by RAFVG to improve the addresses available in OSM. It will not be a completely blind import: wherever possible, data will be checked by local mappers.

Schedule

Imports will be performed after the local community has reviewed the shared .osm files linked in the Elenco dei Comuni table on the Italian wiki page; progress can be tracked there.

Two pilot imports have been performed. The municipalities were chosen because of their limited size.

Stregna

Changeset 25763431; approx. 300 nodes.

Issues were raised because some highway names are missing on the ground. Using the OSM tag addr:place instead of addr:street is being discussed as a solution.

Sacile

Changeset 28443255; approx. 5,000 nodes.

OSM Inspector raised an issue caused by upper/lowercase differences. Nominatim can handle these anyway, but the QA tool requires an exact, case-sensitive match.

Another issue was caused by an error in the wiki and a missing discussion: as a result, the tag value separator ";" was used instead of "/". The wiki has been corrected; TODO: replace the separator in the imported data.

Import Data

Background

Address format

House numbering follows the European scheme.

An address in the RAFVG dataset is identified by its street name and house number; where present, subordinate and in-house values are included.

A subordinate is mostly written as a suffix letter, but can be any alphanumeric string (e.g. "A", "A3", "Z7"). Subordinates usually arise when a new house is built between existing houses with consecutive house numbers. In-house values are mostly written as suffix numbers.

The table structure of the RAFVG dataset is detailed on the Italian wiki page.

Postal codes (codici di avviamento postale, CAP) are not included in the RAFVG dataset. Each address will inherit the postcode of its municipality, taken from the national Indirizzi della PA dataset (IODL 2.0); the filtered extract is quoted on the Italian wiki page. The sole exception is the municipality of Trieste, which spans more than one postcode and will be imported in a separate process.

Legal

Import Type

The dataset will be imported on a municipality base.

Due to upload constraints, high-density areas will be split where needed [TBD 50K node constraint?].
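
One possible way to split a dense municipality is sketched below. It assumes each address is a (lon, lat, tags) tuple and takes the 50,000-node figure (still marked TBD above) as a default. The function name split_for_upload and the west-to-east strip strategy are illustrative assumptions, not the agreed procedure.

```python
def split_for_upload(nodes, max_nodes=50_000):
    """Split address nodes into west-to-east strips of at most max_nodes each.

    nodes: iterable of (lon, lat, tags) tuples (hypothetical layout).
    """
    if max_nodes < 1:
        raise ValueError("max_nodes must be positive")
    # Sort by longitude, then latitude, so each chunk is a contiguous strip.
    ordered = sorted(nodes, key=lambda n: (n[0], n[1]))
    return [ordered[i:i + max_nodes] for i in range(0, len(ordered), max_nodes)]
```

Sorting before chunking keeps each upload spatially compact, which should make manual review of a single chunk easier.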

Prior to upload, each .osm file will be published on the Italian wiki page to be manually checked by the local team.

Data Preparation

The data is provided as a shapefile consisting of a collection of point features, one for each house number. The projection is Gauss-Boaga.

Before conversion to OSM XML, some issues require intermediate shapefiles to be generated. The actions needed are:

  • re-projection to WGS84 CRS (ogr2ogr)
  • geometry type from multi to single (ogr2ogr)
  • municipality extraction (ogr2ogr)
  • record standardization (TBD SQL tool)
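
The three ogr2ogr actions above could be combined into a single call, sketched here as a Python helper that only builds the argument list. Several things are assumptions: the source CRS EPSG:3004 (Monte Mario / Italy zone 2, i.e. the eastern Gauss-Boaga zone), the file names, the placeholder ISTAT code, and that COD_ISTAT is stored as text. If the shapefile ships with a .prj file, the -s_srs option may be unnecessary.

```python
import shlex


def build_ogr2ogr_command(src, dst, istat_code, source_srs="EPSG:3004"):
    """Return an ogr2ogr argument list for extracting one municipality."""
    return [
        "ogr2ogr",
        "-f", "ESRI Shapefile",
        "-s_srs", source_srs,      # assumed Gauss-Boaga zone, see lead-in
        "-t_srs", "EPSG:4326",     # re-projection to WGS84
        "-explodecollections",     # multi-part geometries -> single parts
        "-nlt", "POINT",
        "-where", f"COD_ISTAT = '{istat_code}'",  # municipality extraction
        dst,                       # ogr2ogr expects the destination first
        src,
    ]


# Hypothetical file names and placeholder ISTAT code.
cmd = build_ogr2ogr_command("civici_rafvg.shp", "comune_wgs84.shp", "030999")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) on a machine where GDAL is installed.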

Record standardization

To minimize conflation work caused by mismatches with existing OSM odonyms, values are replaced before conversion. The most common replacements are:

  • first name expansion (e.g. "P. DIACONO" or "DIACONO P." > "PAOLO DIACONO")
  • abbreviation expansion (e.g. "P.LE" > "PIAZZALE")
  • Roman numeral conversion (e.g. "VII" > "SETTE")
  • accent and apostrophe checking

The replacements have been compiled in advance by the team and concern both the SPECIE and DENOMINAZI fields. The replacement table will be published.
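
A minimal sketch of how such a replacement table might be applied is given below. The three dictionaries stand in for the team's table, which is not yet published: only the entries quoted in the list above come from this page, and the "XX" entry, the dictionary names and the function name standardize are hypothetical. Accent and apostrophe checking is not covered.

```python
# Stand-ins for the (not yet published) replacement table.
SPECIE_EXPANSIONS = {"P.LE": "PIAZZALE"}
NAME_REPLACEMENTS = {
    "P. DIACONO": "PAOLO DIACONO",
    "DIACONO P.": "PAOLO DIACONO",
}
ROMAN_TO_WORDS = {"VII": "SETTE", "XX": "VENTI"}  # "XX" is a hypothetical entry


def standardize(specie, denominazi):
    """Return (SPECIE, DENOMINAZI) with known replacements applied."""
    specie = specie.strip().upper()
    specie = SPECIE_EXPANSIONS.get(specie, specie)

    # Collapse repeated whitespace so lookups match the table keys.
    name = " ".join(denominazi.upper().split())
    name = NAME_REPLACEMENTS.get(name, name)

    # Replace Roman numerals only when they are whole tokens.
    tokens = [ROMAN_TO_WORDS.get(tok, tok) for tok in name.split(" ")]
    return specie, " ".join(tokens)
```

Matching whole tokens avoids rewriting letters inside ordinary words, but a token such as "I" can also be an Italian article, so a real table would need manual review of ambiguous cases.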

Tagging Plan

Each source record has the following fields (those used by the import appear in the tag mapping below):

  • COD_ISTAT: municipality code, as defined by ISTAT (www.istat.it)
  • NOME_COMUN: municipality name
  • ID_STRADA: street ID
  • SPECIE: Toponym type (AKA "Denominazione Urbanistica Generica")
  • DENOMINAZI: street name
  • NUM_CIV: house number
  • BARRATO: house number, subordinate
  • INTERNO: house number, in-house
  • DATA_AGG: record modification date
  • DATA_INS: record creation date
  • X: WGS84/ETRS89 longitude
  • Y: WGS84/ETRS89 latitude
  • ID1: unique id (used for indexing)

The intermediate shapefile will be converted to OSM XML using ogr2osm. The ogr2osm translation file will handle:

  • selective capitalization (e.g. "VIA DEL LAVORO" > "Via del Lavoro", "VIA GIACOMO LEOPARDI" > "Via Giacomo Leopardi")
  • tag mapping
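
The selective capitalization step might look like the sketch below. The list of words kept in lowercase is an assumption (common Italian articles and prepositions), as is the function name; the actual translation script may use a different list. Names containing apostrophes are not handled here.

```python
# Assumed list of words kept lowercase when they are not the first word.
LOWERCASE_WORDS = {
    "di", "del", "della", "dello", "dei", "degli", "delle",
    "da", "al", "alla", "ai", "e", "in", "su", "per",
}


def selective_capitalize(name):
    """Capitalize each word except articles/prepositions after the first."""
    words = name.lower().split()
    result = []
    for index, word in enumerate(words):
        if index > 0 and word in LOWERCASE_WORDS:
            result.append(word)
        else:
            result.append(word.capitalize())
    return " ".join(result)
```

With this word list, the two examples from the list above yield "Via del Lavoro" and "Via Giacomo Leopardi".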

Tag mapping for final upload will assign:

  • addr:housenumber < NUM_CIV
  • addr:street < SPECIE | DENOMINAZI
  • addr:postcode (taken from the Elenco dei Comuni table)
  • addr:city < NOME_COMUN
  • source = "RAFVG"
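
The mapping above can be expressed as a small function, sketched here independently of any particular ogr2osm version (how it is hooked into the translation file depends on the ogr2osm release in use). The function name map_tags and the sample values are invented for illustration, and the input is assumed to be already standardized and capitalized. BARRATO and INTERNO are left out because the mapping above does not list them.

```python
def map_tags(record, postcode):
    """Build OSM address tags from one source record.

    record: dict keyed by the shapefile field names.
    postcode: municipality postcode looked up separately.
    """
    street_parts = (
        str(record.get("SPECIE", "")).strip(),
        str(record.get("DENOMINAZI", "")).strip(),
    )
    tags = {
        "addr:housenumber": str(record.get("NUM_CIV", "")).strip(),
        "addr:street": " ".join(part for part in street_parts if part),
        "addr:postcode": postcode,
        "addr:city": str(record.get("NOME_COMUN", "")).strip(),
        "source": "RAFVG",
    }
    # Drop empty values so no blank tags reach the .osm file.
    return {key: value for key, value in tags.items() if value}


# Invented sample record and postcode.
sample = {"NOME_COMUN": "Sacile", "SPECIE": "Via", "DENOMINAZI": "Roma", "NUM_CIV": 12}
print(map_tags(sample, "00000"))
```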

Conflation

Conflation will be performed in JOSM. Existing OpenStreetMap data will be merged using a semi-automatic conflation plugin. The procedure is detailed on the Italian wiki page.

Dedicated upload account

The account RAFVG import will be used to upload revised .osm files.

Changeset Tags

Changeset will be tagged with:

Data Translation

Ogr2osm will be used to convert the shapefile to OSM XML format using the above tagging plan.

Source scripts for ogr2osm will be stored at https://github.com/rafvgimport/translations

Data Transformation Results

OSM XML files repository: https://github.com/rafvgimport/osm

Data Merge Workflow

Addresses already in OSM will be extracted using the Overpass query herein defined.

Addresses already present will be kept.
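
The Overpass query itself is not reproduced on this page. As an illustration only, a query of this kind could be generated as sketched below; selecting the municipality by its name and admin_level=8, the 180-second timeout and the function name are assumptions, and the team's actual query may differ.

```python
def build_address_query(municipality, timeout=180):
    """Return Overpass QL selecting nodes and ways that carry a house number."""
    return f"""[out:xml][timeout:{timeout}];
area["name"="{municipality}"]["admin_level"="8"]->.comune;
(
  node["addr:housenumber"](area.comune);
  way["addr:housenumber"](area.comune);
);
(._; >;);
out meta;
"""


print(build_address_query("Sacile"))
```

The `(._; >;);` step pulls in the member nodes of matching ways, and `out meta` includes the version information JOSM needs to edit the downloaded objects.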

Team Approach

Import will be managed by the following OSM users:

  • Cascafico
  • marcodena
  • Stefano Salvador
  • Marco_T
  • damjang
  • Bredy

Workflow

Step by step instructions:

  1. Run ogr2ogr to reproject and extract nodes inside municipality
  2. Perform record standardization (QGIS, SQLITE or similar)
  3. Run ogr2osm to export the data in OSM XML
  4. Run overpass query to export the existing addresses
  5. Merge the addresses from steps 3 and 4 in JOSM
  6. Upload the changeset in OSM

The changeset should be small enough to be uploaded at once.

In case of import problems, the changeset will be reverted using the JOSM Reverter Plugin.

Conflation

See #Data Merge Workflow.

QA

Street names

After the import, addr:street values may differ slightly from the names of the corresponding streets.

These differences should be caught using OSM Inspector.
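
A local pre-check of the same kind can be sketched as follows. It is not how OSM Inspector works internally; it simply compares addr:street values against a list of highway names and separates case-only differences from names with no match at all. The function name and input layout are assumptions.

```python
def classify_street_mismatches(addr_streets, highway_names):
    """Return (case_only, unmatched) for addr:street values.

    case_only maps an addr:street value to the highway name it matches
    when case is ignored; unmatched holds values with no match at all.
    """
    exact = set(highway_names)
    folded = {name.casefold(): name for name in highway_names}
    case_only = {}
    unmatched = set()
    for street in set(addr_streets):
        if street in exact:
            continue  # exact, case-sensitive match: nothing to report
        candidate = folded.get(street.casefold())
        if candidate is not None:
            case_only[street] = candidate
        else:
            unmatched.add(street)
    return case_only, unmatched
```

Case-only differences are the kind Nominatim tolerates but a case-sensitive QA check flags, so they can usually be fixed in bulk, while unmatched names need a manual look.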

Unmarked streets

The result can be used to locate areas where streets are missing.

Missing roads will be created in JOSM using PCN 2012 aerial imagery.

Unnamed streets

The result can be used to derive street names for unnamed streets when all the nodes along the street have the same addr:street value.
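
The unanimity rule can be sketched as below. How the address nodes "along" a street are selected (for example by distance) is left out and would have to be decided separately; the function name is hypothetical.

```python
def derive_street_name(addr_street_values):
    """Return the shared addr:street value, or None if the nodes disagree."""
    values = {v.strip() for v in addr_street_values if v and v.strip()}
    # A name is proposed only when every node carries the same value.
    return values.pop() if len(values) == 1 else None
```

Returning None for mixed or empty input keeps the rule conservative: a street gets a name suggestion only when there is no conflicting evidence.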

Missing road names will be identified using the OpenStreetMap NoName Map Overlay:
tms:http://tile3.poole.ch/noname/{zoom}/{x}/{y}.png

OSM Inspector can also be used to find these streets.