Import/Catalogue/ItalyFuelStations


About

This page is about importing the fuel stations published by the Ministero dello Sviluppo Economico (MISE). The dataset covers the whole Italian territory and contains roughly 20K nodes. Its official name is Prezzi praticati e anagrafica degli impianti.

The dataset will be adapted to generate OSM files suitable for import into planet.osm. It will not be a completely blind import: source data will be checked in sample areas by mappers through an audit support map.

The import is being discussed on the national OSM mailing list. This wiki page is the result of consensus there.

Goals

This import aims at a complete and up-to-date set of Italian fuel stations. The MISE source issues daily updates (described there as "Quotidianamente vengono pubblicate in questa sezione le informazioni in vigore alle ore 8 del giorno precedente a quello di pubblicazione.", i.e. "Every day this section publishes the information in force at 8:00 on the day before publication."), so planning regular updates of the OSM files is advisable.

Schedule

The first import will be performed after a community audit on the shared osm changes map. Audit progress will be trackable on the project page.

A pilot import will be performed. Region Friuli Venezia Giulia has been chosen because of its limited size.

Import Data

Background

Source dataset quality is good and, in all sampled locations, spatially accurate. Fuel station name (Nome Impianto) and address (Indirizzo) are often not homogeneous, so import of these fields is temporarily excluded.

Record format

The MISE dataset table structure is detailed in the MISE specific page and summarized below:

MISE Anagrafica Carburanti - record format
Field | Name | Description:it | Description:en | Example
1 | Id impianto | codice numerico progressivo attribuito dal sistema per l'identificazione dell'impianto | automatically assigned sequential code | 12092
2 | Gestore | la ragione sociale dell'impresa che gestisce il punto vendita | business name | PIPPO FRANCO SRL
3 | Bandiera | l'insegna del distributore (può essere il marchio generico "Pompe Bianche" ad indicare non legato alle maggiori brand) | station banner (may be "Pompe Bianche", AKA Independent) | Shell
4 | Tipo Impianto | contraddistingue la tipologia di strada sulla quale è collocato il distributore: le tipologie individuate sono tre: autostrada, strada statale, altro (al quale appartengono tutte le altre tipologie di strade) | three station types: motorway, national road, any other | ALTRO
5 | Nome Impianto | è il nome indicato dal gestore per identificare il suo impianto | name set by operator to identify his/her station | AGICAM2000
6 | Indirizzo | indirizzo, numero civico e CAP | address, housenumber, postcode | VIA MAZZINI 18 33040
7 | Comune | nome del Comune in cui è collocato l'impianto. E' il secondo criterio di ordinamento dei dati contenuti nel file | municipality name where station is located; second sort key of the file (admin level 8) | PORPETTO
8 | Provincia | nome della Provincia in cui è collocato l'impianto. E' il primo criterio di ordinamento dei dati contenuti nel file | province name where station is located; first sort key of the file (admin level 6) | GORIZIA
9 | Latitudine* | coordinata corrispondente espressa in gradi decimali | latitude in decimal degrees | 45.838383
10 | Longitudine* | coordinata corrispondente espressa in gradi decimali | longitude in decimal degrees | 13.55575

(*) Note: coordinates are entered by the fuel station operator and are not always verified.

Legal

Import Type

The dataset will be imported on a regional basis (admin level 4). Prior to upload, each osm changes file will be published and linked in the Italian wiki page to be manually checked by local teams.

Data Preparation

The data is published as CSV: a collection of point elements, one for each fuel station. The original field separator is a semicolon. Projection is not defined.

Prior to OSM XML conversion, some issues require intermediate CSV and JSON files to be generated. The actions needed are:

  • csv - void values removal
  • csv - field sorting
  • csv - add source date - optional (will be assigned to changesets)
  • csv - add headers
  • csv - original semicolon separator replaced by comma (for umap compatibility) - optional
  • csv to json tool for conflation feeding
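
The preparation steps above could be sketched in Python roughly as follows. This is a minimal illustration, not the script actually used for the import. The header names, the function names and the assumption that the raw file has no header row and carries the ten fields of the record format in order are all hypothetical. Field sorting is not covered.

```python
import csv
import io
import json

# Hypothetical header names, following the order of the record format table.
HEADERS = ["id", "gestore", "bandiera", "tipo", "nome",
           "indirizzo", "comune", "provincia", "lat", "lon"]


def prepare_records(raw_text, source_date=None):
    """Parse semicolon-separated rows, add headers, drop void values."""
    records = []
    for row in csv.reader(io.StringIO(raw_text), delimiter=";"):
        if len(row) != len(HEADERS):
            continue  # skip malformed rows
        # Pair each value with its header and drop empty (void) values.
        record = {k: v.strip() for k, v in zip(HEADERS, row) if v.strip()}
        if "lat" not in record or "lon" not in record:
            continue  # a point without coordinates cannot be conflated
        if source_date:
            record["source_date"] = source_date  # optional step
        records.append(record)
    return records


def to_comma_csv(records):
    """Write the records back as comma-separated CSV with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=HEADERS + ["source_date"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# First row: the example values of the record format table.
# Second row: an invented record with void coordinates.
sample = ("12092;PIPPO FRANCO SRL;Shell;ALTRO;AGICAM2000;"
          "VIA MAZZINI 18 33040;PORPETTO;GORIZIA;45.838383;13.55575\n"
          "99999;;;ALTRO;;;;;;\n")
records = prepare_records(sample, source_date="2018-08-28")
print(json.dumps(records, indent=2))  # JSON for conflation feeding
print(to_comma_csv(records))          # comma-separated CSV for umap
```

Python's csv module quotes any value that contains a comma, so the separator replacement stays safe even when an address includes one.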

String mods

Strings will be processed with a "title" function (first character of each word in uppercase). Some minor adjustments could be needed (e.g. "F.Lli" > "F.lli").
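
As an illustration only, the string processing could look like this in Python, using the built-in str.title() and a small correction table. The table name FIXES, the function name and the sample business name "F.LLI ROSSI SRL" are invented for this sketch; only the "F.Lli" > "F.lli" correction comes from this page.

```python
# Hypothetical table of corrections applied after title-casing.
FIXES = {"F.Lli": "F.lli"}


def normalize_name(value):
    """Apply title case, then undo known bad capitalizations."""
    result = value.strip().title()
    for wrong, right in FIXES.items():
        result = result.replace(wrong, right)
    return result


print(normalize_name("F.LLI ROSSI SRL"))   # invented sample value
print(normalize_name("PIPPO FRANCO SRL"))  # example from the record format
```

str.title() capitalizes every letter that follows a non-letter, which is why "F.LLI" first becomes "F.Lli" and needs the extra correction.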

Preliminary csv check

It has been accomplished through a regional (admin level 4) umap, linked in the Italian wiki page, whose layers are: Mapbox aerial imagery, existing amenity=fuel (Overpass-turbo query) and to-be-imported amenity=fuel. Such a umap can optionally be used for manual checks on sample areas and/or to highlight evident coordinate errors.

It is accomplished through the osm changes map.

Tagging Plan

Extraction of OSM addr:postcode and addr:city from the Indirizzo and Comune fields is skipped, since the non-homogeneous entries need a refinement that is not yet planned. The same applies to the extraction of addr:street and addr:housenumber from the Indirizzo field.

Each object will be tagged by the following keys (used fields in bold):

  • idImpianto: ref:mise (unique)
  • Gestore: operator
  • Bandiera: brand (special case "Pompe Bianche" defined below)
  • Tipo Impianto: n/a
  • Nome Impianto: entries are not standardized; possibly to be mapped as alt_name
  • Indirizzo: addr:postcode (extracted, not used)
  • Comune: addr:city (not used)
  • Provincia: n/a (for pre-filtering, not used)
  • Latitude: used by conflator
  • Longitude: used by conflator

Pompe Bianche

The source institution (MISE) assigns the brand "Pompe Bianche" to operators that are not franchised to any national or international trademark. Following this tagging discussion, the tags below have been rejected:

  • brand=Pompe Bianche (source data)
  • description=Pompe Bianche
  • nobrand=yes
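
A minimal sketch of the tagging plan, assuming each source record is available as a Python dict keyed by the MISE field names. The function name is hypothetical. The handling of "Pompe Bianche" (the brand tag is simply omitted) is an assumption based on the three rejected tags above, not a documented rule.

```python
def record_to_tags(record):
    """Build the OSM tags for one MISE record (dict keyed by field name)."""
    tags = {
        "amenity": "fuel",
        "ref:mise": record["Id impianto"],
    }
    if record.get("Gestore"):
        tags["operator"] = record["Gestore"]
    brand = record.get("Bandiera", "")
    # Assumption: the generic "Pompe Bianche" banner is not tagged at all,
    # since brand=, description= and nobrand= were all rejected.
    if brand and brand.strip().lower() != "pompe bianche":
        tags["brand"] = brand
    # Tipo Impianto, Nome Impianto, Indirizzo, Comune and Provincia are
    # deliberately not converted into tags.
    return tags


branded = {"Id impianto": "12092", "Gestore": "Pippo Franco Srl",
           "Bandiera": "Shell"}
independent = {"Id impianto": "12093", "Gestore": "Pippo Franco Srl",
               "Bandiera": "Pompe Bianche"}  # invented record
print(record_to_tags(branded))
print(record_to_tags(independent))
```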

Conflation

Fuel stations already in OSM will be extracted with an Overpass query for amenity=fuel, whose bounding box is region based and obtained with the Geofabrik online calculator.
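
For reference, a query of this kind can be assembled from a bounding box as sketched below. The query text mirrors the one visible in the conflator output example on this page; the function name and its parameters are hypothetical.

```python
def build_overpass_query(south, west, north, east, timeout=120):
    """Return an Overpass QL query for amenity=fuel inside a bounding box."""
    bbox = f"({south},{west},{north},{east})"
    return (
        f"[out:xml][timeout:{timeout}];("
        f'node["amenity"="fuel"]{bbox};'
        f'way["amenity"="fuel"]{bbox};'
        f'relation["type"="multipolygon"]["amenity"="fuel"]{bbox};'
        "); out meta qt center;"
    )


# Bounding box taken from the Friuli Venezia Giulia conflator run.
query = build_overpass_query(45.5809, 12.3214, 46.648, 13.9187)
print(query)
```

Overpass expects the bounding box in south, west, north, east order.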

Conflation is performed by OSM Conflator. Existing OpenStreetMap objects within a given range of a dataset point are merged, and their tags are replaced according to the conflator parameter file.

Fuel stations already present in OSM but out of range are not involved. After the initial imports, it will be evaluated whether to tag them as "disused:amenity=fuel".
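
The idea of range-based matching can be illustrated with a simplified, self-contained sketch. This is not OSM Conflator's actual algorithm: it is a plain greedy nearest-neighbour match, and the 50 m default, the function names and all sample points except the first dataset coordinate are assumptions made for the example.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def match_points(dataset, osm, max_distance_m=50.0):
    """Greedy nearest-neighbour matching within max_distance_m.

    dataset and osm are lists of (id, lat, lon) tuples. Returns
    (matched pairs, unmatched dataset ids, unmatched OSM ids).
    """
    candidates = []
    for d_id, d_lat, d_lon in dataset:
        for o_id, o_lat, o_lon in osm:
            dist = haversine_m(d_lat, d_lon, o_lat, o_lon)
            if dist <= max_distance_m:
                candidates.append((dist, d_id, o_id))
    candidates.sort()  # closest pairs are assigned first
    matched, used_d, used_o = [], set(), set()
    for _dist, d_id, o_id in candidates:
        if d_id not in used_d and o_id not in used_o:
            matched.append((d_id, o_id))
            used_d.add(d_id)
            used_o.add(o_id)
    new_points = [d[0] for d in dataset if d[0] not in used_d]
    out_of_range = [o[0] for o in osm if o[0] not in used_o]
    return matched, new_points, out_of_range


dataset = [("d1", 45.838383, 13.55575), ("d2", 46.0, 13.0)]
osm = [("n1", 45.8384, 13.5558), ("n2", 45.9, 13.2)]
matched, new_points, out_of_range = match_points(dataset, osm)
print(matched, new_points, out_of_range)
```

In this sketch "new_points" corresponds to dataset points that would be added, and "out_of_range" to existing OSM stations that are left alone.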

Conflator output example

pi@raspberrypi:~/OSM $ conflate -i convertedcsv.json -v --changes changes.json -o result.osm -c preview.json profile-noaddr.py
14:34:05 Loading profile <_io.TextIOWrapper name='profile-fuelfvg.py' mode='r' encoding='UTF-8'>
14:34:05 Dataset points duplicate each other: 39713 and 5847
14:34:05 Dataset points duplicate each other: 40559 and 37930
14:34:05 Dataset points are too similar: 40654 and 36026
14:34:05 Dataset points duplicate each other: 25713 and 41030
14:34:05 Dataset points are too similar: 16229 and 40858
14:34:05 Found 5 duplicates in the dataset
14:34:05 Read 486 items from the dataset
14:34:05 Overpass query: [out:xml][timeout:120];(node["amenity"="fuel"](45.5809,12.3214,46.648,13.9187);way["amenity"="fuel"](45.5809,12.3214,46.648,13.9187);relation["type"="multipolygon"]["amenity"="fuel"](45.5809,12.3214,46.648,13.9187);); out meta qt center;
14:34:15 Downloaded 825 objects from OSM
14:34:17 Matched 467 points
14:34:17 Removed 4 unmatched duplicates
14:34:17 Adding 15 unmatched dataset points
14:34:17 Deleted 0 and retagged 358 unmatched objects from OSM

Dedicated upload account

The account attilaimport will be used to upload community revised .osm files.

Changeset Tags

Changeset will be tagged with:

Team Approach

Import will be managed by the following OSM users:

  • Cascafico

Workflow

Step by step operations:

  1. wget MISE csv dataset
  2. perform record standardization (TBD) (awk, sed or similar)
    1. trim field spaces
    2. First Char Uppercase
    3. quotes (") removal
    4. commas (,) removal
    5. postcode split from irregular addresses
  3. regional filtering, province based (admin level 6)
  4. convert csv to json via online converter or similar script
  5. run conflator
  6. publish osm changes map for audit
  7. publish osm files (further check)
  8. Upload changeset(s) in OSM
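
Step 3 (regional filtering, province based) could be sketched as below for the pilot region. The four provinces are those of Friuli Venezia Giulia. The assumptions are that records are dicts keyed by the MISE field names and that Provincia holds the province name as in the record format example; the function name and the non-pilot sample records are invented.

```python
# Provinces (admin level 6) of Friuli Venezia Giulia (admin level 4).
FVG_PROVINCES = {"GORIZIA", "PORDENONE", "TRIESTE", "UDINE"}


def filter_by_province(records, provinces):
    """Keep only the records whose Provincia is in the given set."""
    wanted = {p.strip().upper() for p in provinces}
    return [r for r in records
            if r.get("Provincia", "").strip().upper() in wanted]


records = [
    {"Id impianto": "12092", "Provincia": "GORIZIA"},
    {"Id impianto": "20001", "Provincia": "Milano"},   # invented record
    {"Id impianto": "20002", "Provincia": " udine "},  # invented record
]
regional = filter_by_province(records, FVG_PROVINCES)
print([r["Id impianto"] for r in regional])
```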

Priority has been assigned to uploading new objects (fuel stations not matched by the conflator), then to subsets on a regional scale or by other criteria.

In case of import problems, the changesets involved will be reverted using a proper reverter.

Uploaded

changeset | objects | JOSM selection criteria | notes
61927015 | 4,435 | new and -brand="Pompe Bianche" | New objects only, with brand not equal to "Pompe Bianche"
61948858 | 1,181 | new and brand="Pompe Bianche" | New objects only, with brand="Pompe Bianche" removed
61983827 | 10,000 | | issues with creation of thousands of duplicate fuel stations, uploaded with wrong osm user account
61984835 | | | uploaded with wrong osm user account
61985256 | | | uploaded with wrong osm user account

QA

Some problems have been detected after upload:

Widespread:

  • new nodes imported twice (duplicate selection query)
  • redundant tag "source:date", already assigned as changeset(s) tag
  • rejected "nobrand=yes" tag imported anyway
  • rejected "name=Pompe Bianche" tag imported anyway
  • brand changed from "IP" (verifiable on the ground) to "Api-Ip"
  • inconsistent changes (operator changed, name containing old operator not changed)

Limited:

  • OSM edits corrupted during the interval between audit start and freeze
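
A check for the first widespread problem (nodes imported twice) could be sketched as follows, assuming the uploaded objects are available as (OSM id, tags) pairs, for instance parsed from an Overpass extract. The function name and the sample objects are hypothetical.

```python
from collections import defaultdict


def find_duplicate_refs(objects):
    """Return {ref:mise value: [osm ids]} for refs used more than once."""
    by_ref = defaultdict(list)
    for osm_id, tags in objects:
        ref = tags.get("ref:mise")
        if ref:
            by_ref[ref].append(osm_id)
    return {ref: ids for ref, ids in by_ref.items() if len(ids) > 1}


# Invented sample: the same station uploaded twice, plus two other objects.
objects = [
    (1001, {"amenity": "fuel", "ref:mise": "12092"}),
    (1002, {"amenity": "fuel", "ref:mise": "12092"}),
    (1003, {"amenity": "fuel", "ref:mise": "33288"}),
    (1004, {"amenity": "fuel"}),  # pre-existing object without ref:mise
]
duplicates = find_duplicate_refs(objects)
print(duplicates)
```

Because ref:mise is meant to be unique, any value listed in the result points at objects that need a manual review.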

Source dataset sync

The possibility to synchronize source dataset with OSM is being evaluated.

Test cases

ref:mise | 2018-04-17 | 2018-08-28 | survey
33288 | ok | removed | ok
xxxxx

Adding gas fuel keys

MISE issues daily updates on fuel prices by publishing the "Prezzo alle 8" csv file. Two sample records are shown below.

ref:mise | fuel type | price | self | report date
30709 | Benzina | 1.627 | 0 | 25/08/2018 12:07:14
30721 | Metano | 0.888 | 0 | 29/08/2018 07:35:21

Keywords "metano" and "gpl" are used to set fuel:cng and fuel:lpg respectively. Audit has been set for resulting 824 POIs.
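
The keyword rule can be sketched as below. The assumptions are that price rows are reduced to (ref:mise, fuel type) pairs, that keywords are matched case-insensitively as substrings of the fuel type, and that the resulting tag value is "yes"; the function name is hypothetical.

```python
# Keyword in the MISE fuel type -> OSM key to set.
FUEL_KEYS = {"metano": "fuel:cng", "gpl": "fuel:lpg"}


def fuel_tags_by_station(price_rows):
    """Map each ref:mise to the gas fuel tags derived from its price rows.

    price_rows is an iterable of (ref_mise, fuel_type) pairs.
    """
    result = {}
    for ref, fuel_type in price_rows:
        for keyword, key in FUEL_KEYS.items():
            if keyword in fuel_type.lower():
                result.setdefault(ref, {})[key] = "yes"
    return result


# The first two rows come from the sample records; the third is invented.
rows = [("30709", "Benzina"), ("30721", "Metano"), ("30721", "GPL")]
fuel_tags = fuel_tags_by_station(rows)
print(fuel_tags)
```

Stations that only sell other fuels, such as 30709 in the sample, do not appear in the result and are left untouched.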