Import/Catalogue/Grotte-RAFVG

About

This page documents the import of the cave entrances dataset published by Regione Autonoma Friuli Venezia Giulia (RAFVG), Italy.

The dataset shall be adapted to generate OSM files suitable for import into planet.osm. It shall not be a blind import: source data shall be checked by mappers through audit support maps.

The import is being discussed on the regional OSM mailing list. The import will be the result of consensus there.

Goals

This import aims to provide an updated set of cave entrances for the regional territory. Cave entrances shall be filtered by the presence of a physical marker in the immediate surroundings.

Schedule

This import shall be performed on a regional (admin_level=4) basis. Audit progress will be trackable on the project page. Given the import size (~2500 POIs), it should take about 8 weeks to complete.

Import Data

Background

The source dataset contains 8744 records, each defining a cave entrance. Each record has a reference number assigned by Catasto Speleologico Regionale (CSR); a reference number can be shared by several entrances if they give access to the same cave. A sample of records suggests that the geographic coordinates are accurate. During the audit process a few minor spatial errors may be detected; since OSM objects are considered authoritative in the conflation process, any mismatch shall be manually recorded in the object's fixme tag.

Metadata

Legal

  • Licence definition page: Uso dati
  • Data license: IODLv2 as defined in data source page above, addenda Allegato A.

Record format and tagging plan (draft)

According to the wiki, the following tagging shall be applied.

RAFVG Ingressi Grotte - record format
Name | Description:it | Description:en | Example | Notes | Tagged as
CATASTO_RE | codice CSR | CSR id | 3478 | | cave:ref
NOME_PRINC | nome della grotta | cave's name | Mala Jama | | name
NOME_INGR | nome dell'ingresso (se più di uno) | entrance name (if more than one) | Ingresso 2 | | appended to cave:ref
QUOTA | quota dell'ingresso | entrance elevation | 108.00 | Trailing zeroes will be removed | ele
SVIL_PLAN | sviluppo planimetrico | planimetric development | 403.00 | Trailing zeroes will be removed | cave:length_plan (TBD: cave:size)
DISLIVELLO | dislivello | elevation gain/drop | 24.00 | Trailing zeroes will be removed | cave:depth (TBD: cave:size)
TIPO_INGRE | tipo di ingresso | entrance type | Orizzontale | | description
MORF_INGRE | morfologia ingresso | entrance morphology | Galleria | | description
TARGHETTA | presenza targhetta | marker presence | yes | | cave:plate
url | url scheda del catasto | url to item in datasource archive | https://catastogrotte.regione.fvg.it/scheda/10-abc | derived from json dataset version | url
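
The tagging plan above can be illustrated with a minimal Python sketch. It is not the OpenRefine template actually used for this import: the function names `strip_trailing_zeroes` and `record_to_tags` are hypothetical, and joining NOME_INGR to cave:ref with a space is an assumption, since the plan only says the entrance name is "appended". The TBD cave:size tag is left out, and description is not built here.

```python
def strip_trailing_zeroes(value: str) -> str:
    """Turn '108.00' into '108' and '24.50' into '24.5'."""
    text = value.strip()
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


def record_to_tags(record: dict) -> dict:
    """Map one source record (field name -> string) to draft OSM tags."""
    tags = {"natural": "cave_entrance"}

    ref = record.get("CATASTO_RE", "").strip()
    entrance = record.get("NOME_INGR", "").strip()
    if ref:
        # Assumption: the entrance name is appended after a single space.
        tags["cave:ref"] = f"{ref} {entrance}" if entrance else ref

    name = record.get("NOME_PRINC", "").strip()
    if name:
        tags["name"] = name

    # Numeric fields lose their trailing zeroes, as stated in the Notes column.
    numeric_fields = (
        ("QUOTA", "ele"),
        ("SVIL_PLAN", "cave:length_plan"),
        ("DISLIVELLO", "cave:depth"),
    )
    for field, key in numeric_fields:
        raw = record.get(field, "").strip()
        if raw:
            tags[key] = strip_trailing_zeroes(raw)

    # Fields copied as they are.
    for field, key in (("TARGHETTA", "cave:plate"), ("url", "url")):
        raw = record.get(field, "").strip()
        if raw:
            tags[key] = raw
    return tags
```

Fed with the example values from the table, the sketch yields ele=108, cave:length_plan=403 and cave:depth=24.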

Import Type

The dataset will be imported on a regional basis (OSM admin_level=4). Prior to upload, an osm preview file will be published and linked on this page so that local teams can check it manually.

Data Preparation

The data is provided as an ESRI shapefile (shp) containing a collection of point elements, one for each cave entrance. The input dataset shall be converted from shp to csv with QGIS and fed to OpenRefine.

Refining

Prior to the JSON conversion, some issues require refining operations in OpenRefine, documented herein. A summary of the actions performed through OpenRefine:

  • description built concatenating TIPO_INGRE, MORF_INGRE
  • some char case fixing and decimal removal
  • local id (required by conflation process)
  • filtering by TARGHETTA field
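
For readers who prefer to see the logic outside OpenRefine, three of the actions listed above (description building, local id, TARGHETTA filter) might look like the following Python sketch. It is not the OpenRefine operation history used for this import. The function name `refine`, the `local_id` key, the ", " separator and the sequential numbering are all assumptions, as is treating the value "yes" as "marker present" (taken from the example in the record format table). Case fixing and decimal removal are not covered.

```python
def refine(rows):
    """Keep marked entrances, build a description, assign a local id.

    `rows` is an iterable of dicts keyed by the source field names.
    """
    refined = []
    for row in rows:
        # Assumption: 'yes' (in any letter case) flags an entrance with a plate.
        if row.get("TARGHETTA", "").strip().lower() != "yes":
            continue

        # Concatenate entrance type and morphology, skipping empty parts.
        parts = [row.get(f, "").strip() for f in ("TIPO_INGRE", "MORF_INGRE")]
        out = dict(row)
        out["description"] = ", ".join(p for p in parts if p)

        # Assumption: the local id is a simple 1-based sequence.
        out["local_id"] = len(refined) + 1
        refined.append(out)
    return refined
```

The same function can be applied to rows read with `csv.DictReader` from the csv file exported by QGIS.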

Exporting

The conflator requires input in JSON format. The dataset is converted to JSON through an OpenRefine template, documented herein. The output JSON files can be further validated with jsonlint (npm -g install jsonlint).
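
As a complement to jsonlint, a structural check can be sketched with Python's standard json module. The item layout checked here (a list of objects carrying `id`, `lat`, `lon` and `tags`) is an assumption about the exported file, not something stated on this page, and `check_items` is a hypothetical helper name.

```python
import json

# Assumed layout of each exported item; adjust to the real template output.
REQUIRED_KEYS = ("id", "lat", "lon", "tags")


def check_items(text: str) -> list:
    """Return a list of problems found in a JSON export (empty if none)."""
    try:
        items = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(items, list):
        return ["top-level value is not a list"]

    problems = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not an object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
    return problems
```

An empty result only means the file parses and every item has the assumed keys; it says nothing about the tag values themselves.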

Up to version 2.8, OpenRefine doesn't handle null values; as a workaround, remove the lines containing nulls:

pi@raspberrypi:~/OSM $ sed -i -e '/ : null/d' <Openrefine-output-file>.json
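
A line-based deletion can leave a dangling comma when the removed line was the last member of an object. As a possible alternative, and assuming the export is already valid JSON with literal null values, the nulls could be dropped after parsing. This is only a sketch; `drop_nulls` and `clean_file` are hypothetical names.

```python
import json


def drop_nulls(value):
    """Recursively remove dict keys whose value is None (JSON null)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


def clean_file(path: str) -> None:
    """Rewrite a JSON file in place without null-valued keys."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(drop_nulls(data), handle, ensure_ascii=False, indent=2)
```

Because the file is parsed and re-serialised, the output stays valid JSON whatever the position of the removed keys.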

Conflation

Conflation is performed by OSM Conflator. Objects tagged as "natural"="cave_entrance" will be extracted from OSM by a specific overpass-turbo query. OSM objects that match a dataset point within a given range are merged, and tags are added or proposed for change according to the conflator parameter file. Non-matching OSM objects will be marked, for future surveys, with the note tag: "this cave entrance has not matches with RAFVG dataset within a 20m radius".

The JSON file resulting from conflation shall be reviewed by the community on an audit map. Upon audit completion, an osm file shall be generated by a further conflator run.
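
To make the idea of matching "within a radius" concrete, here is a small Python sketch based on the haversine great-circle distance. It does not reproduce how OSM Conflator matches points internally. The helper names are hypothetical, and the 20 m threshold is simply the radius quoted in the note tag.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres
MAX_DISTANCE_M = 20.0       # radius quoted in the note tag


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_within(point, candidates, max_distance=MAX_DISTANCE_M):
    """Return (candidate, distance) for the closest candidate in range, else None.

    `point` and each candidate are (lat, lon) tuples.
    """
    best = None
    for candidate in candidates:
        distance = haversine_m(point[0], point[1], candidate[0], candidate[1])
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (candidate, distance)
    return best
```

A dataset point for which `nearest_within` returns None would correspond to an unmatched point in the conflator log.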

Conflator output example

pi@rpi3: conflate -i 2021-05-04.json -v -c preview-2021-05-04.json --osm cave_entrance.osm profile.py 
10:30:46 Loading profile <_io.TextIOWrapper name='profile.py' mode='r' encoding='UTF-8'>
10:30:47 Dataset points duplicate each other: 178 and 179
10:30:48 Dataset points are too similar: 365 and 366
10:30:48 Dataset points are too similar: 408 and 409
10:30:48 Dataset points are too similar: 478 and 479
10:30:48 Dataset points are too similar: 787 and 788
10:30:51 Found 14 duplicates in the dataset
10:30:51 Read 2573 items from the dataset
10:30:51 Downloaded 1106 objects from OSM
10:31:02 Matched 638 points
10:31:02 Removed 13 unmatched duplicates
10:31:02 Adding 1922 unmatched dataset points
10:31:03 Deleted 0 and retagged 468 unmatched objects from OSM
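
The counts in the log above can be cross-checked with a few lines of Python. This is a sanity check on the reported figures, not part of the conflator. It assumes that every dataset item ends up matched, removed as an unmatched duplicate, or added, and that every downloaded OSM object is either matched, retagged or deleted.

```python
# Figures copied from the conflator log above.
read_items = 2573
matched = 638
removed_duplicates = 13
added = 1922
osm_downloaded = 1106
retagged = 468
deleted = 0

# Dataset side: every item read should fall into exactly one bucket.
dataset_balance = read_items - (matched + removed_duplicates + added)

# OSM side: every downloaded object is matched, retagged or deleted.
osm_balance = osm_downloaded - (matched + retagged + deleted)

print(dataset_balance, osm_balance)
```

Both balances come out as 0, so the figures in this run are consistent with each other under the assumptions above.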

Conflator re-run after audit

pi@rpi3: conflate -i 2021-05-04.json -a audit_FVG-CAVES.json -o caves-ready.osm --osm cave_entrance.osm profile.py
TBD

Upload

Dedicated upload account

The account cascafico will be used to upload community revised .osm files.

Changeset Tags

Changeset will be tagged with:

Team Approach

Import will be managed by the following OSM users:

  • Cascafico

Workflow

Step by step operations:

  1. dataset download
  2. shp to csv conversion (QGIS)
  3. OpenRefine operations
  4. OpenRefine json export
  5. run conflator
  6. audit map announcement & publication
  7. wait for community validation
  8. conflation re-run
  9. OSM candidate publication
  10. Upload changeset in OSM

In case of import problems, the changesets involved will be reverted using a proper reverter.

OSM Candidate file

TBD.osm

QA

If problems are detected after upload:

Widespread:

  • TBD

Limited:

  • TBD