Import/Catalogue/ItalyFuelStations

From OpenStreetMap Wiki

About

This page is about importing fuel stations published by the Ministero dello Sviluppo Economico (MISE). The dataset covers the Italian territory and contains roughly 20K nodes. The official dataset name is Prezzi praticati e anagrafica degli impianti.

The dataset will be adapted to generate OSM files suitable for import into planet.osm. It will not be a completely blind import: source data will be checked in sample areas by mappers through an audit support map.

The import is being discussed on the national OSM mailing list. This wiki page is the result of consensus there.

Goals

This import aims to provide a complete and up-to-date set of Italian fuel stations. The MISE source issues daily updates (described there as "Quotidianamente vengono pubblicate in questa sezione le informazioni in vigore alle ore 8 del giorno precedente a quello di pubblicazione.", i.e. each day the section publishes the information in force at 8:00 on the day before publication), so planning regular updates of the OSM files is advisable.

Schedule

The first import will be performed after a community audit on the shared osm changes map. Progress can be tracked on the project page.

A pilot import will be performed. Because of its limited size, the Friuli Venezia Giulia region has been chosen.

Import Data

Background

Source dataset quality is good and, in all sampled locations, spatially accurate. Fuel station names are often inconsistent for non-motorway stations, so import of this field is temporarily excluded.

Record format

The MISE dataset table structure is detailed on the dedicated MISE page and summarized below:

Field | Name | Description:it | Description:en | Example
1 | Id impianto | codice numerico progressivo attribuito dal sistema per l'identificazione dell'impianto | automatically assigned sequential code | 12092
2 | Gestore | la ragione sociale dell'impresa che gestisce il punto vendita | business name | PIPPO FRANCO SRL
3 | Bandiera | l'insegna del distributore (può essere il marchio generico "Pompe Bianche" ad indicare non legato alle maggiori brand) | station banner (may be "Pompe Bianche", AKA Independent) | Shell
4 | Tipo Impianto | contraddistingue la tipologia di strada sulla quale è collocato il distributore: le tipologie individuate sono tre: autostrada, strada statale, altro (al quale appartengono tutte le altre tipologie di strade) | three station types: motorway, national road, any other | ALTRO
5 | Nome Impianto | è il nome indicato dal gestore per identificare il suo impianto | name set by operator to identify his/her station | AGICAM2000
6 | Indirizzo | indirizzo, numero civico e CAP | address, housenumber, postcode | VIA MAZZINI 18 33040
7 | Comune | nome del Comune in cui è collocato l'impianto. E' il secondo criterio di ordinamento dei dati contenuti nel file | municipality name where station is located (admin level 8) | PORPETTO
8 | Provincia | nome della Provincia in cui è collocato l'impianto. E' il primo criterio di ordinamento dei dati contenuti nel file | province name where station is located (admin level 6) | GORIZIA
9 | Latitudine* | coordinata corrispondente espressa in gradi decimali | latitude in decimal degrees | 45.838383
10 | Longitudine* | coordinata corrispondente espressa in gradi decimali | longitude in decimal degrees | 13.55575

(*) Note: coordinates are added by fuel station operator and are not always verified.

Legal

Import Type

The dataset will be imported on a regional basis (admin level 4). Prior to upload, each osm changes file will be published and linked on the Italian wiki page so that local teams can check it manually.

Data Preparation

The data is published as a semicolon-separated csv file: a collection of point features, one for each fuel station. The projection is not defined.

Prior to OSM XML conversion, some issues require intermediate csv and json files to be generated. The actions needed are:

  • csv - void values removal
  • csv - field sorting
  • csv - add source date - optional (will be assigned to changesets)
  • csv - add headers
  • csv - original semicolon separator replaced by comma (for umap compatibility) - optional
  • csv to json tool for conflation feeding
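The preparation steps listed above can be sketched in Python as follows. This is only a minimal illustration: the column names in HEADERS are hypothetical labels derived from the record format table, the source file is assumed to have no header row, and the JSON layout is a plain list of objects rather than the exact shape required by any particular conflation tool.

```python
import csv
import io
import json

# Hypothetical column labels, one per field of the record format table.
HEADERS = ["id", "gestore", "bandiera", "tipo", "nome",
           "indirizzo", "comune", "provincia", "lat", "lon"]


def prepare_rows(raw_text):
    """Parse semicolon-separated text, attach headers, drop unusable rows."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=";")
    rows = []
    for fields in reader:
        if len(fields) != len(HEADERS):
            continue  # malformed line: wrong number of fields
        record = {key: value.strip() for key, value in zip(HEADERS, fields)}
        try:
            record["lat"] = float(record["lat"])
            record["lon"] = float(record["lon"])
        except ValueError:
            continue  # void or non-numeric coordinates cannot be placed
        # Void values removal: keep only fields that carry a value.
        rows.append({k: v for k, v in record.items() if v != ""})
    return rows


def to_comma_csv(rows):
    """Write the rows back as comma-separated csv with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing (void) fields are written as empty
    return out.getvalue()


def to_json(rows):
    """Serialize the rows as a json list of objects."""
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

Using the csv module rather than a plain string replace of ";" with "," keeps fields that themselves contain commas correctly quoted in the output.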

String mods

Strings will be processed with a "title" function (first character of each word in uppercase). Some minor adjustments may be needed (e.g. "F.Lli" > "F.lli").
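A minimal sketch of this string processing, assuming Python's built-in str.title() is an acceptable "title" function. The FIXES table is hypothetical and would need to be extended as further exceptions are found during the audit.

```python
# Hypothetical exception table: title-cased form -> preferred form.
FIXES = {"F.Lli": "F.lli"}


def normalize_name(value):
    """Collapse whitespace, title-case the string, then apply known fixes."""
    text = " ".join(value.split()).title()
    for wrong, right in FIXES.items():
        text = text.replace(wrong, right)
    return text
```

Note that str.title() capitalizes the letter after any non-letter character, which is what produces "F.Lli" in the first place and why a small exception table is needed.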

Preliminary csv check

The check was initially done through a regional (admin level 4) umap linked on the Italian wiki page, with the following layers: Mapbox aerial imagery, existing amenity=fuel (Overpass-turbo query), and to-be-imported amenity=fuel. Such a umap can optionally still be used for manual checks on sample areas and/or to highlight evident coordinate errors.

The check is now accomplished through the osm changes map.

Tagging Plan

OSM addr:postcode and OSM addr:city will be extracted from the Indirizzo and Comune fields, respectively. Extraction of OSM addr:street and OSM addr:housenumber will be attempted from the Indirizzo field where possible; otherwise the value is assigned to the OSM description tag.
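One possible way to attempt this extraction is sketched below. The regular expressions are assumptions about how typical Indirizzo values look (street, house number, then a five-digit postcode, as in the "VIA MAZZINI 18 33040" example); real data will contain irregular addresses, such as kilometre markers, that this heuristic will misread or send to the description fallback.

```python
import re

# Assumed layout: optional trailing five-digit postcode (CAP).
POSTCODE_RE = re.compile(r"\s*\b(\d{5})\s*$")
# Assumed layout: street text followed by a trailing house number.
HOUSENUMBER_RE = re.compile(
    r"^(?P<street>.*?\D)[\s,]+(?P<number>\d+[A-Za-z]?(?:/[A-Za-z0-9]+)?)$"
)


def split_address(indirizzo):
    """Split an Indirizzo value into addr:* tags, falling back to description."""
    tags = {}
    text = " ".join(indirizzo.split())
    match = POSTCODE_RE.search(text)
    if match:
        tags["addr:postcode"] = match.group(1)
        text = text[:match.start()].rstrip(" ,")
    match = HOUSENUMBER_RE.match(text)
    if match:
        tags["addr:street"] = match.group("street").rstrip(" ,")
        tags["addr:housenumber"] = match.group("number")
    elif text:
        # Street and house number could not be separated reliably.
        tags["description"] = text
    return tags
```

For example, split_address("VIA MAZZINI 18 33040") yields addr:street "VIA MAZZINI", addr:housenumber "18" and addr:postcode "33040", while a value without a trailing number ends up in description.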

Each node is defined by the following keys (used fields in bold):

  • idImpianto: ref:mise (unique)
  • Gestore: operator
  • Bandiera: brand
  • Tipo Impianto: n/a
  • Nome Impianto: entries are not standardized; possibly mapped as alt_name
  • Indirizzo: addr:postcode (extracted, not used)
  • Comune: addr:city (not used)
  • Provincia: n/a (for pre-filtering, not used)
  • Latitude: used by conflator
  • Longitude: used by conflator
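As an illustration of the key mapping above, the following sketch builds the tag set for one station from the three fields that map directly to OSM keys. The record field names ("id", "gestore", "bandiera") are hypothetical labels for the dataset columns, and the amenity=fuel tag is taken from the to-be-imported layer described earlier on this page.

```python
# Hypothetical record field name -> OSM key, following the list above.
FIELD_TO_TAG = {"id": "ref:mise", "gestore": "operator", "bandiera": "brand"}


def station_tags(record):
    """Build OSM tags for one dataset record, skipping void values."""
    tags = {"amenity": "fuel"}
    for field, key in FIELD_TO_TAG.items():
        value = str(record.get(field, "")).strip()
        if value:
            tags[key] = value
    return tags
```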

Conflation

Fuel stations already in OSM will be extracted with an Overpass query for amenity=fuel, using a region-based bounding box obtained from the Geofabrik online calculator.
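For reference, a query of this kind can be assembled from a bounding box as sketched below. The function name and parameters are illustrative; the query layout simply mirrors the one that appears in the conflator log in the "Conflator output sample" section of this page.

```python
def overpass_fuel_query(south, west, north, east, timeout=120):
    """Build an Overpass QL query for amenity=fuel inside a bounding box."""
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:xml][timeout:{timeout}];("
        f'node["amenity"="fuel"]({bbox});'
        f'way["amenity"="fuel"]({bbox});'
        f'relation["type"="multipolygon"]["amenity"="fuel"]({bbox});'
        "); out meta qt center;"
    )
```

Overpass bounding boxes are given in south, west, north, east order, so the values from the bounding box calculator may need to be reordered before use.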

Conflation is performed by OSM Conflator. Existing OpenStreetMap data within a given range of a dataset point is merged, and tags are replaced according to the conflator parameter file.

Fuel stations already present in OSM but out of range are not affected. After the initial imports, tagging them as "disused:amenity=fuel" will be evaluated.
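To make the idea of range-based matching concrete, here is a simplified greedy nearest-neighbour sketch. It is not OSM Conflator's actual algorithm, and the 100 m default for max_distance_m is an arbitrary assumption chosen for illustration; points are plain dicts with hypothetical "id", "lat" and "lon" keys.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def match_points(dataset, osm, max_distance_m=100.0):
    """Pair each dataset point with the closest free OSM point within range.

    Returns (matched pairs, unmatched dataset ids, unmatched OSM ids).
    """
    candidates = []
    for d in dataset:
        for o in osm:
            dist = haversine_m(d["lat"], d["lon"], o["lat"], o["lon"])
            if dist <= max_distance_m:
                candidates.append((dist, d["id"], o["id"]))
    candidates.sort(key=lambda c: c[0])  # closest pairs are assigned first

    matched, used_d, used_o = [], set(), set()
    for _, d_id, o_id in candidates:
        if d_id in used_d or o_id in used_o:
            continue
        matched.append((d_id, o_id))
        used_d.add(d_id)
        used_o.add(o_id)

    unmatched_d = [d["id"] for d in dataset if d["id"] not in used_d]
    unmatched_o = [o["id"] for o in osm if o["id"] not in used_o]
    return matched, unmatched_d, unmatched_o
```

In this sketch the unmatched dataset ids correspond to stations that would be added as new nodes, and the unmatched OSM ids to existing stations that are out of range of any dataset point.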

Conflator output sample

pi@raspberrypi:~/OSM $ conflate -i convertedcsv.json -v --changes changes.json -o result.osm -c preview.json profile-fuelfvg.py
14:34:05 Loading profile <_io.TextIOWrapper name='profile-fuelfvg.py' mode='r' encoding='UTF-8'>
14:34:05 Dataset points duplicate each other: 39713 and 5847
14:34:05 Dataset points duplicate each other: 40559 and 37930
14:34:05 Dataset points are too similar: 40654 and 36026
14:34:05 Dataset points duplicate each other: 25713 and 41030
14:34:05 Dataset points are too similar: 16229 and 40858
14:34:05 Found 5 duplicates in the dataset
14:34:05 Read 486 items from the dataset
14:34:05 Overpass query: [out:xml][timeout:120];(node["amenity"="fuel"](45.5809,12.3214,46.648,13.9187);way["amenity"="fuel"](45.5809,12.3214,46.648,13.9187);relation["type"="multipolygon"]["amenity"="fuel"](45.5809,12.3214,46.648,13.9187);); out meta qt center;
14:34:15 Downloaded 825 objects from OSM
14:34:17 Matched 467 points
14:34:17 Removed 4 unmatched duplicates
14:34:17 Adding 15 unmatched dataset points
14:34:17 Deleted 0 and retagged 358 unmatched objects from OSM

Dedicated upload account

The account Attila-import will be used to upload the regional .osm files revised by the community.

Changeset Tags

Changeset will be tagged with:

Team Approach

Import will be managed by the following OSM users:

  • Cascafico

Workflow

Step by step operations:

  1. wget MISE csv dataset
  2. perform record standardization (TBD) (awk, sed or similar)
    1. trim field spaces
    2. First Char Uppercase
    3. quotes (") removal
    4. commas (,) removal
    5. postcode split from irregular addresses
  3. regional filtering, province based (admin level 6)
  4. convert csv to json via online converter or similar script
  5. run conflator
  6. publish osm changes map for audit
  7. publish osm files (further check)
  8. Upload changeset in OSM
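Steps 2 and 3 above could be scripted in Python instead of awk or sed, as in the following sketch. It assumes records are dicts with a hypothetical "provincia" key holding the full province name, as in the record format example; the province set shown is the one for the Friuli Venezia Giulia pilot region.

```python
# Provinces of Friuli Venezia Giulia (admin level 6), used for the pilot.
FVG_PROVINCES = {"GORIZIA", "PORDENONE", "TRIESTE", "UDINE"}


def clean_field(value):
    """Trim spaces and remove double quotes and commas (steps 2.1, 2.3, 2.4)."""
    return " ".join(value.replace('"', "").replace(",", " ").split())


def filter_region(rows, provinces=FVG_PROVINCES):
    """Keep only the records whose province belongs to the given set (step 3)."""
    wanted = {p.strip().upper() for p in provinces}
    return [r for r in rows if r.get("provincia", "").strip().upper() in wanted]
```

Commas are replaced with a space rather than deleted outright so that values such as "VIA ROMA,12" do not collapse into a single token.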

Regional changesets should be small enough to be uploaded at once.

If an import problem occurs, the changesets involved will be reverted using a suitable revert tool.

QA

TBD