OpenData Puglia Import


OpenData Puglia Import is an import of several CSV datasets published by the Apulia region (Regione Puglia) in Italy, covering places of cultural and tourist interest. As of Feb 12, 2018, the import is only planned and has not been executed yet.

The import task was discussed on the talk-it@openstreetmap.org list (Nov 2017 - Dec 2017).

Goals

The goal of this project is to import data of interest into OSM. The data relate only to the Apulia region in Italy. The datasets contain data from two major websites, Apulia Digital Library (DL) and viaggiareinpuglia.it (VIP), and mainly represent churches, manor farms, tourist attractions, paintings, etc.

Schedule

Jan 2018 - Ongoing

The import will be performed with a dedicated account, User: InnoPuglia_Import

Import Data

Background

Data source site: Dataset

Data license: CC0 v1.0

Type of license (if applicable): CC0 v1.0.

ODbL Compliance verified: -

OSM Data Files

When conversion for a dataset is done, the list will be updated.


Dataset 8: masserie_clean_csv2osm.osm

Import Type

Bulk import of data missing from OSM. Checking for redundant (duplicate) data is performed with JOSM.

Data Preparation

Data Reduction & Simplification

After a manual clean-up that removes incomplete and wrong data from the dataset, the result will be processed by a Python script.

This script will process the dataset row by row and extract the data.

Not all columns will be extracted, only the relevant ones containing data, such as name, description, category, Lat, Lon and website (a reference to the original DL or VIP web portal).
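
The row-by-row extraction described above could look roughly like the following sketch. It is not the project's actual script: the comma delimiter, the function name `extract_rows` and the rule of skipping rows without coordinates are assumptions made for illustration, and the header names are borrowed from Dataset 1.

```python
import csv
import io

# Assumed subset of columns to keep; header names follow Dataset 1.
RELEVANT_COLUMNS = [
    "nomeAttrattore",
    "risorsaTerritoriale",
    "latitudine",
    "longitudine",
    "sitoWeb",
]


def extract_rows(csv_file, columns=RELEVANT_COLUMNS):
    """Yield one dict per CSV row, keeping only relevant, non-empty columns."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        kept = {}
        for col in columns:
            value = (row.get(col) or "").strip()
            if value:
                kept[col] = value
        # Assumption: a row without coordinates cannot be imported, so skip it.
        if "latitudine" in kept and "longitudine" in kept:
            yield kept


# Made-up sample data, used only to exercise the function.
sample = io.StringIO(
    "nomeAttrattore,risorsaTerritoriale,latitudine,longitudine,sitoWeb,note\n"
    "Torre Example,Torri,40.1,18.2,http://example.org/torre,ignored\n"
    "No Coordinates,Musei,,,http://example.org/museo,ignored\n"
)
rows = list(extract_rows(sample))
```

Here the second sample row is dropped because it has no coordinates, and the `note` column never reaches the output.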

Tagging Plans

This section lists, for each dataset, the subset of data that is processed for OSM import.


Dataset 1: Luoghi di interesse turistico, culturale, naturalistico

Common Tags are:

Headers in dataset -----> OSM matching key(s)

nomeAttrattore --> name

risorsaTerritoriale --> *

latitudine --> latitude

longitudine --> longitude

sitoWeb --> website



Dataset 2: Uffici Informazione e Accoglienza Turistica

Common Tags are:

Headers in dataset -----> OSM matching key(s)

nome --> name

latitudine --> latitude

longitudine --> longitude

sitoWeb --> website

email --> email


Additional tags: information = office, tourism = information


Dataset 3: Strutture Ricettive 2016

Common Tags are:

Headers in dataset -----> OSM matching key(s)

denominazione --> name

tipologia --> *

latitudine --> latitude

longitudine --> longitude

sitoWeb --> website

email --> email


Dataset 4: Digital Library - Collezione "Opuscoli Biblioteca comunale Barletta"

Common Tags are:

Headers in dataset -----> OSM matching key(s)

Titolo del bene rappresentato --> name

Categoria --> *

Latitudine --> latitude

Longitudine --> longitude

Scheda Puglia Digital Library --> website


Dataset 5: Collezione "Cinema 150 anni"

Common Tags are:

Headers in dataset -----> OSM matching key(s)

Titolo del bene rappresentato --> name

Categoria_1 --> *

Categoria_2 --> *

Latitudine --> latitude

Longitudine --> longitude

Scheda Puglia Digital Library --> website


Dataset 6: Digital Library - Collezione "Fondo manoscritti biblioteca comunale Barletta"

Common Tags are:

Headers in dataset -----> OSM matching key(s)

Titolo del bene rappresentato --> name

Categoria_1 --> *

Categoria_2 --> *

Latitudine --> latitude

Longitudine --> longitude

Scheda Puglia Digital Library --> website


Dataset 7: Digital Library - Collezione "Habitus percorsi tra costume e architettura"

Common Tags are:

Headers in dataset -----> OSM matching key(s)

Titolo del bene rappresentato --> name

Categoria_1 --> *

Categoria_2 --> *

Latitudine --> latitude

Longitudine --> longitude

Scheda Puglia Digital Library --> website


Dataset 8: Digital Library - Collezione "Masserie di Puglia"

Common Tags are:

Headers in dataset -----> OSM matching key(s)

Titolo del bene rappresentato --> name

Categoria_1 --> *

Categoria_2 --> *

Latitudine --> latitude

Longitudine --> longitude

Scheda Puglia Digital Library --> website

. . .


More datasets coming soon (when available)


(*) This column cannot be mapped one-to-one to an OSM key. For each value, an appropriate tag is looked up in a dictionary (see the Data Transformation section).

Changeset Tags

source = Apulia Open Data

source:website = http://www.dataset.puglia.it/

source:date = *

comment = Semi-automatic import of different points of interest** related to Apulia region, Italy

website: https://wiki.openstreetmap.org/wiki/OpenData_Puglia_Import


(*) The date of the dataset is used, together with its last update.

(**) This is replaced with the main content of the dataset.

Data Transformation

A Python script will process the dataset (a CSV refined using OpenRefine) and export it in several output formats, such as OSM XML or a CSV formatted for the csv2osm script.

Some examples of data transformation on the risorsaTerritoriale\Categoria*\tipologia column:


Masserie --> place = hamlet

Torri --> man_made = tower

Chiese e cattedrali --> building = church, building = cathedral

musei --> tourism = museum

Basiliche e santuari --> amenity = place_of_worship

Bed & breakfast --> guest_house = bed_and_breakfast

Affittacamere --> guest_house = bed_and_breakfast

Case e appartamenti vacanza --> tourism = apartment

Alloggi agrituristici --> guest_house = agritourism

Alberghi --> tourism = hotel

Campeggi --> tourism = camp_site

Case per ferie --> tourism = apartment

Residenze tur. alberghiere --> tourism = hotel

Villaggi turistici --> place = village, tourism = *
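
The dictionary lookup behind these examples might be sketched as follows. This is only an illustration, not the actual script: the names `CATEGORY_TAGS` and `tags_for_category` are hypothetical, only a few of the mappings listed above are included, and the case-insensitive matching is an assumption.

```python
# Hypothetical lookup table built from a few of the mappings listed above.
CATEGORY_TAGS = {
    "masserie": {"place": "hamlet"},
    "torri": {"man_made": "tower"},
    "musei": {"tourism": "museum"},
    "alberghi": {"tourism": "hotel"},
}


def tags_for_category(category):
    """Return the OSM tags for a dataset category, or {} if it is unknown."""
    # Assumption: categories are matched ignoring case and surrounding spaces.
    key = category.strip().lower()
    # Return a copy so callers cannot modify the lookup table by accident.
    return dict(CATEGORY_TAGS.get(key, {}))


tower_tags = tags_for_category(" Torri ")
unknown_tags = tags_for_category("Categoria sconosciuta")
```

Returning an empty dict for unknown categories leaves it to the caller to decide whether such rows are skipped or reviewed by hand.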


The following refers to dataset 8, Digital Library - Collezione "Masserie di Puglia".

This dataset contains a collection of pictures related to different physical points/attractions.

After cleaning with OpenRefine we get a "clean" dataset such as this one: Dataset_masserie_clean.csv

Note that the refined dataset contains only the physical point ("masseria - name of the attraction") but at the same time keeps the references to all the "wiki-sheets" related to it (HTML links separated by -|-).
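
As a small illustration of how such a multi-link field can be handled, the sketch below splits a value on the -|- separator. The function name `split_links` and the example URLs are made up; how the real script treats this field is not documented here.

```python
def split_links(field, separator="-|-"):
    """Split a multi-link field into a list of non-empty, trimmed links."""
    return [part.strip() for part in field.split(separator) if part.strip()]


# Made-up URLs, used only to show the expected shape of the result.
links = split_links("http://example.org/sheet/1 -|- http://example.org/sheet/2")
```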

At this point some data in the dataset must be transformed so that the information matches OSM tags.

For this purpose a custom script is used: jcsv2osm*

The output produced is either a CSV ready to be processed with csv2osm or, directly, an OSM file.


(*) For usage of the script, please read the readme.md in the GitHub repo.
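
To give an idea of what the direct OSM output looks like, here is a minimal sketch that writes OSM XML with Python's standard library. It is not the jcsv2osm code: the function name `rows_to_osm`, the input structure and the `generator` value are assumptions. Negative ids follow the JOSM file-format convention for objects that have not been uploaded yet.

```python
import xml.etree.ElementTree as ET


def rows_to_osm(rows):
    """Build an OSM XML string with one new node per row.

    Each row is assumed to be a dict with 'lat', 'lon' and a 'tags' dict.
    """
    root = ET.Element("osm", version="0.6", generator="example-sketch")
    for index, row in enumerate(rows, start=1):
        # Negative ids mark nodes that do not exist on the server yet.
        node = ET.SubElement(
            root,
            "node",
            id=str(-index),
            lat=str(row["lat"]),
            lon=str(row["lon"]),
        )
        # Sort the tags so the output is stable between runs.
        for key, value in sorted(row["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


# Made-up example row.
osm_xml = rows_to_osm(
    [{"lat": 40.1, "lon": 18.2, "tags": {"name": "Masseria Example"}}]
)
```

The resulting string can be saved with an .osm extension and opened in JOSM for review before any upload.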

Data Transformation Results

CSV ready to be processed with csv2osm: Dataset-masserie_csv2osm.csv

which, converted with csv2osm, produces this OSM file: masserie_clean_csv2osm.osm

or the OSM file obtained directly: masserie_clean.csv.osm

Note: csv2osm was customized for this case: csv2osm_custom

Data Merge Workflow

Team Approach

The import is carried out by a single person.

Workflow

* Clean the datasets and convert them to OSM files as described above.

* Use JOSM to merge the POIs (conflation).

* Update the wiki page with new datasets and planned tags, and with the progress of earlier uploads.

* Inform the community that the upload is done.

Conflation

This step is done with the JOSM conflation plugin. If the data already exists in OSM, only the website information is taken from the import.
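
The rule for existing objects can be illustrated with the sketch below. The actual merge is done interactively in JOSM, so this only shows the intent. The function name `merge_existing` is hypothetical, and it is an assumption that an existing website tag is kept rather than overwritten.

```python
def merge_existing(existing_tags, imported_tags):
    """Return the tags of an existing OSM object after conflation.

    Only the imported 'website' value is considered; all other imported
    tags are ignored because the object is already mapped.
    """
    merged = dict(existing_tags)
    website = imported_tags.get("website")
    # Assumption: never overwrite a website that mappers have already set.
    if website and "website" not in merged:
        merged["website"] = website
    return merged


# Made-up tags for an object that is already in OSM.
merged = merge_existing(
    {"name": "Masseria Example", "place": "farm"},
    {"name": "MASSERIA EXAMPLE", "place": "hamlet", "website": "http://example.org/m"},
)
```

In this example the existing name and place values are untouched and only the website is added.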

QA