Madrid Drinking Water Import

From OpenStreetMap Wiki
Jump to: navigation, search

Goals

The goal is to import drinking fountains from the Madrid city council database, see: Import_Ayuntamiento_Madrid (in spanish)

In Madrid there are 562 drinking fountains already in OSM. As the nodes to import have no useful information other than the coordinates, it makes no sense to merge them. So we'll not import those.

Schedule

  1. July 4 2016: We started the preparation and the discussion. See the discussion in the talk-es list (in spanish)
  2. July 29: Sent the proposal to the imports list.
  3. August 1: Import started
  4. September 25: QA finished
  5. September 28: Report sent to talk-es and to Madrid city council open data site.

Import Data

Data description

The original dataset is csv format. It includes a total of 1,553 spots.

We'll use the csv file for processing. You can download the data here.

Background

ODbL Compliance verified: YES

The license can be consulted here (in Spanish). It requires attribution of the source, so all nodes and the changeset will include source=Ayuntamiento de Madrid. The Madrid City Council (Ayuntamiento de Madrid) is already on the contributors list.

After sending the report with the problems and errors found during the drinking water import, including a section about the license, got a response from the "Subdirección General de Transparencia del Ayuntamiento de Madrid" (the data owner), and they confirmed me that we comply with their attribution clause using their data in OSM if we clearly specify the source and the source:date, the way it was done on the Madrid_Drinking_Water_Import. And they do like the work we are doing :) Kresp0 (talk) 13:27, 16 October 2016 (UTC)

Compatibility with the ODbL was already discussed in the imports and talk-es mailing lists.

Cumplimiento de la cláusula de atribución de los datos abiertos del Ayto. de Madrid en OSM.pdf

Import Type

The import will be done manually, using the Tasks Manager, so it can be assigned to different volunteers.

Data Preparation

Data Reduction & Simplification

As mentioned before, the data file we have chosen is in csv format. Some corrections are part of the data conversion script. Some of the fields aren't relevant and will be ignored.

Tagging Plans

Here are the original fields, their meaning and how they will be converted to the resulting OSM file:

Tagging Conversion
Original Fields Meaning OSM tag Comments
DISTRITO City district
CODIGO Internal reference number
ZONA VERDE/VIA PUBLICA If the fountain is on the street or a park. We need to know if it is "PARQUE HISTORICO SINGULAR O FORESTAL" because then a different coordinate system is used.
DIRECCION Street, number. Sometimes contains only the street. Would be addr:street=* and addr:housenumber=* or add:full=*
COMPLEMENTO DE DIRECCIÓN Street, number. Sometimes contains only the street. Would be addr:street=* and addr:housenumber=*. or add:full=*
COORD. X X coordinates (EPSG:25830 and EPSG:23030) X those with the tag "PARQUE HISTORICO SINGULAR O FORESTAL" uses EPSG:25830 while the rest uses EPSG:23030. conversion to EPSG:4326 needed
COORD. Y Y coordinates (EPSG:25830 and EPSG:23030) Y those with the tag "PARQUE HISTORICO SINGULAR O FORESTAL" uses EPSG:25830 while the rest uses EPSG:23030. conversion to EPSG:4326 needed

To all the nodes, we will add the following tags:

Changeset Tags

We will use the following changeset tags:

Data Transformation

Data is in csv format. We already:

  1. Downloaded the data
  2. Splitted the original csv file into 2, each with its own coordinate system, using grep: grep "PARQUE HISTORICO" fuentes.csv > fuentes-25830.csv ; grep -v "PARQUE HISTORICO" fuentes.csv > fuentes-23030.csv
  3. Reprojected from the 2 different coordinate systems (EPSG:25830 and EPSG:23030) to EPSG:4326 using ogr2ogr: ogr2ogr -s_srs "+init=epsg:23030 +nadgrids=./peninsula.gsb +wktext" -t_srs EPSG:4326 destino.shp origen.shp
  4. Converted to OSM format with the opendata plugin in JOSM
  5. Removed all the original tags, added the OSM tags (see Madrid_Drinking_Water_Import#Tagging Plans) using JOSM
  6. Generated a polygon from the nodes using the convex hull tool in QGIS and used it to generate the working area on the TM
  7. Filtered the nodes to import that are already near a mapped node in OSM using the osmsplitdup script
  8. Manually confirmed that all nodes marked as duplicated nodes are indeed duplicated using JOSM with the ToDo plugin.
  9. Splitted the file with the possibly non-duplicated nodes to have one file for each task on the TM using the osmboxes script

Here is the code used to download, split and reproject the data:

# Descarga el archivo csv con los datos de las fuentes de agua potable
  wget http://datos.madrid.es/egob/catalogo/300051-0-fuentes.csv -O "Inventario Fuentes.csv"

# Separa las fuentes según el sistema de coordenadas usado:
  echo "DISTRITO;CODIGO;ZONA VERDE/VIA PUBLICA;name;COMPLEMENTO DE DIRECCION;x;y" > fuentes-25830.csv
  cp fuentes-25830.csv fuentes-23030.csv
  grep "PARQUE HISTORICO" "Inventario Fuentes.csv" >> fuentes-25830.csv
  grep -v "PARQUE HISTORICO" "Inventario Fuentes.csv" | grep -v "DIRECCI" >> fuentes-23030.csv

# Convierte la codificación de caracteres a UTF-8:
  iconv -f ISO-8859-15 -t UTF-8 fuentes-25830.csv > f ; mv f fuentes-25830.csv
  iconv -f ISO-8859-15 -t UTF-8 fuentes-23030.csv > f ; mv f fuentes-23030.csv

# Descarga y descomprime la rejilla para cambio de Datum entre ED50 y ETRS89 para la península del Instituto Geográfico Nacional:
  wget https://www.ign.es/ign/resources/herramientas/PENR2009.zip
  unzip PENR2009.zip

# Prepara la reproyección de EPSG:25830 a EPSG:4326:
  echo '<OGRVRTDataSource>
  <OGRVRTLayer name="fuentes-25830">
  <SrcDataSource>fuentes-25830.csv</SrcDataSource>
  <GeometryType>wkbPoint</GeometryType>
  <LayerSRS>+init=epsg:25830 +wktext</LayerSRS>             
  <GeometryField encoding="PointFromColumns" x="x" y="y"/>
  <Field name="name" src="name" />
  </OGRVRTLayer>
  </OGRVRTDataSource>' > fuentes-25830.vrt

# Prepara la reproyección de EPSG:23030 a EPSG:4326 con la rejilla:
  echo '<OGRVRTDataSource> 
  <OGRVRTLayer name="fuentes-23030"> 
  <SrcDataSource>fuentes-23030.csv</SrcDataSource> 
  <GeometryType>wkbPoint</GeometryType> 
  <LayerSRS>+init=epsg:23030 +nadgrids=./PENR2009.gsb +wktext</LayerSRS> 
  <GeometryField encoding="PointFromColumns" x="x" y="y"/> 
  <Field name="name" src="name" />
  </OGRVRTLayer>
  </OGRVRTDataSource>' > fuentes-23030.vrt

# Reproyecta usando ogr2ogr
  ogr2ogr -lco GEOMETRY=AS_XY -overwrite -f CSV -t_srs EPSG:4326 fuentes-23030-reproyectado.csv fuentes-23030.vrt
  ogr2ogr -lco GEOMETRY=AS_XY -overwrite -f CSV -t_srs EPSG:4326 fuentes-25830-reproyectado.csv fuentes-25830.vrt

  cp fuentes-23030-reproyectado.csv fuentes.csv
  grep -v 'X,Y,name,' fuentes-25830-reproyectado.csv >> fuentes.csv

  echo '<OGRVRTDataSource>
  <OGRVRTLayer name="Fuentes de Madrid">
  <SrcDataSource relativeToVRT="1">.</SrcDataSource>
  <SrcLayer>fuentes</SrcLayer>
  <GeometryType>wkbPoint</GeometryType>
  <LayerSRS>WGS84</LayerSRS>
  <GeometryField encoding="PointFromColumns" x="x" y="y"/>
  </OGRVRTLayer>
  </OGRVRTDataSource>' > fuentes.vrt

  ogr2ogr -f "ESRI Shapefile" . fuentes.csv && ogr2ogr -f "ESRI Shapefile" . fuentes.vrt

Data Import Workflow

Team Approach

Import will be undertaken by experienced OSM volunteers, following a strict workflow.

References

The import will be discussed in the Talk-Es list and in the Imports list.

Workflow

As most (if not all) of the volunteers will be Spanish speaking, they can follow the detailed workflow in Spanish language on the TM.

The workflow will be as follows:

  • 1. Install the TODO list JOSM plugin if the volunteer don't have it.
  • 2. Create, if we don't have one yet, an import specific user account, like username_madrid_drinking_water_import (you will need a different email account from the account you used for your regular OSM account). Change the OSM username to that specific account in the JOSM preferences.
  • 3. Configure JOSM to allow remote access.
  • 4. Login into the Tasks Manager.
  • 5 Select one square to work on.
  • 6 Open the OSM data and the drinking_water data by clicking on the JOSM button and the link on the extra instructions.
  • 7 Select all nodes on the drinking_water layer and add them to the ToDo list.For each node we first check its correctness, correcting any errors or typos we may still encounter. If the node is clearly wrong or suspicious of being wrong, it won't be imported in the first place, and it will be added to the comments, so it can be checked afterwards by other mappers to take a decision about it. The mapper will delete that node to be sure it won't be imported.
  • 6. If the fountain is already on OSM, we'll not import it. We already manually reviewed and filtered 467 duplicates.

Reverse plan

In case of any trouble, JOSM reverter will be used.

After the import

Updates

Each 3 months I'll download the dataset, do a diff and update the data to OSM manually. Until I get bored. Kresp0 (talk) 07:53, 24 July 2016 (UTC)

Report with problems and errors

I made a report with all the problems and errors detected and sent to the Madrid city council Open Data administrators. Kresp0 (talk) 09:20, 28 September 2016 (UTC)

Read the full report in spanish: Problemas y errores en el conjunto de datos "Fuentes de agua potable" del Ayto. de Madrid

It includes:

  • Introduction: explaining the import to OSM
  • Coordinate systems: 2 of them mixed with no documentation
  • Incorrect coordinates
  • Drinking water tagged as non potable in OSM (drinking_water=no) and ornamental only (amenity=fountain)
  • Nonexistent fountain
  • Drinking water points mapped on OSM but missing on the city council dataset
  • License: would be nice to have a signed letter with explicit permission
  • Conclusion: thanks to the Transparency Project (city council) and to OSM volunteers. Some ideas to use OSM in the city council.