User:Yopaseopor

From OpenStreetMap Wiki
Jump to: navigation, search

Goals

The goal is to import the housenumbers from the Barcelona city council database.

In Barcelona there are 32782 already in OSM. We will merge to these nodes all the useful information, deleting duplicities.

Schedule

  1. December 12 2016: We started the preparation , the mixture of the two files and the discussion. See the discussion in the talk-es list (in spanish)
  2. XXXX: Sent the proposal to the imports list.
  3. XXXX: Import started
  4. XXXX: QA finished
  5. XXXX: Report sent to talk-es and to Barcelona city council open data site.

Import Data

Data description

The original dataset is a mixture of two files. Each file has the important data. The final goal is the 171,642 housenumbers with the name of the 4620 names of the streets.

-Carrerer de la ciutat de Barcelona: http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/taulacarrers/

-Direccions postales elementales: http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/tauladirele/

We'll use the csv for processing. You can download the data here.

Background

ODbL Compliance verified: YES

The license can be consulted here (in Spanish). It requires attribution of the source, so all nodes and the changeset will include source=Ajuntament de Barcelona. The Barcelona City Council (Ajuntament de Barcelona) is already on the contributors list.

As the ask for use OSM in a council project (decidim.Barcelona) , we will ask for a explicit permission to the data owner (Ajuntament de Barcelona) and they confirmed me that we comply with their attribution clause using their data in OSM if we clearly specify the source and the source:date, the way it will be done.

Compatibility with the ODbL was already discussed in the imports and talk-es mailing lists.

Import Type

The import will be done manually,for making the conflation between OSM data and AJ BCN data, using the Tasks Manager, so it can be assigned to different volunteers.

Data Preparation

Data Reduction & Simplification

As mentioned before, the data file we have chosen is in csv format. Some corrections are part of the data conversion script. Some of the fields aren't relevant and will be ignored.

Tagging Plans

Here are the original fields, their meaning and how they will be converted to the resulting OSM file:

TAULA_DIRELE

Tagging Conversion
Original Fields Meaning OSM tag Comments
CODI_VIA City street code ajbcn:street_id We will import this code for updating processes.
NUMPOST Internal reference number addr:housenumber=*
LLEPOST Complement with letter addr:housenumber=* When It is complementary from NUMPOST
TIPUSNUM Type of number Not importable
DTE District Not importable
DIST_POST Internal council postal district addr:postcode=*
SECC_CENS Not importable
SECC_EST Not importable
BARRI Neighbourhood Not importable
ED50_COORD_X Not importable
ED50_COORD_Y Not importable
ETRS89_COORD_X X coordinates (EPSG:25831) X Not importable
ETRS89_COORD_Y Y coordinates (EPSG:25831) Y Not importable
ORD Not importable

CARRERER

Tagging Conversion
Original Fields Meaning OSM tag Comments
CODI_CARRER City street code ajbcn:street_id We will import this code for updating processes.
CODI_CARRER_INE Internal reference number from the INE (Spanish National Stats Institute) Not importable
TIPUS_VIA Type of street Not importable
NOM_OFICIAL Official name of the street addr:street=*
NOM_CURT Short name Not importable
NRE_MIN Minimum number of housenumber in this street Not importable
NRE_MAX Maximum number of housenumber in this street Not importable

To all the nodes, we will add the following tags:

Changeset Tags

We will use the following changeset tags:

Data Transformation

Data is in csv format. We will:

  1. Download the data with this script https://github.com/kresp0/direccionesbcn2osm.sh
    #!/bin/bash
    # Descarga, transforma y reproyecta las direcciones de Barcelona
    # a partir los datos abiertos del Ajuntament al formato XML de OSM.
    # Santiago Crespo 2016 
    # https://creativecommons.org/publicdomain/zero/1.0/
    
    OUT_FILE=direcciones-BCN.osm
    TMPDIR=/tmp/direcciones-bcn
    ORIG_PWD="$PWD"
    
    rm -rf $TMPDIR
    mkdir $TMPDIR
    cd $TMPDIR
    
    # Download the rdf with the source:date information:
    wget "http://opendata.bcn.cat/opendata/es/catalog/SECTOR_PUBLIC/data-catalog/0/RDF" -O rdf
    FECHA_CALLES=`grep -A 3 CARRERER rdf | grep "dct:modified" | awk -F '>' '{print $2}' | awk -F 'T' '{print $1}'`
    FECHA_DIRECCIONES=`grep -A3 INFRAESTRUCTURES/TAULA_DIRELE rdf | grep "dct:modified" | awk -F '>' '{print $2}' | awk -F 'T' '{print $1}'`
    
    if [ "$a" != "$b" ]; then
      echo "ERROR: FECHA_CALLES y FECHA_DIRECCIONES no coinciden!"
      echo "FECHA_CALLES = $FECHA_CALLES"
      echo "FECHA_DIRECCIONES = $FECHA_DIRECCIONES"
      echo "No sé que poner en source:date"
      exit 1
    fi
    
    # Download the csv file with the addresses
    wget http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/tauladirele/ -O direcciones.html
    wget `grep csv direcciones.html | grep http | awk -F '"' '{print "http://opendata.bcn.cat"$2}'` -O TAULA_DIRELE.csv
    
    perl -pe 's/ETRS89_COORD_X/x/g' TAULA_DIRELE.csv | perl -pe 's/ETRS89_COORD_Y/y/g' > t ; mv t TAULA_DIRELE.csv
    
    # Reproject from EPSG:25831 to EPSG:4326:
    echo '<OGRVRTDataSource>
      <OGRVRTLayer name="TAULA_DIRELE">
      <SrcDataSource>TAULA_DIRELE.csv</SrcDataSource>
      <GeometryType>wkbPoint</GeometryType>
      <LayerSRS>+init=epsg:25831 +wktext</LayerSRS>             
      <GeometryField encoding="PointFromColumns" x="x" y="y"/>
      </OGRVRTLayer>
      </OGRVRTDataSource>' > direcciones-bcn.vrt
    
    ogr2ogr -lco GEOMETRY=AS_XY -overwrite -f CSV -t_srs EPSG:4326 DIRECCIONES-BCN.csv direcciones-bcn.vrt
    
    # Download the csv file with the complete street names
    wget http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/taulacarrers/ -O calles.html
    wget `grep csv calles.html | grep http | awk -F '"' '{print "http://opendata.bcn.cat"$2}'` -O CARRERER.csv
    
    # Remove the first line
    tail -n +2 DIRECCIONES-BCN.csv > t ; mv t DIRECCIONES-BCN.csv
    tail -n +2 CARRERER.csv > t ; mv t CARRERER.csv
    
    # Add headers
    echo '<?xml version="1.0" encoding="UTF-8"?>' > $OUT_FILE
    echo '<osm version="0.6" generator="direccionesbcn2osm.sh 1.0">' >> $OUT_FILE
    
    COUNTER=0
    
    while IFS=$';' read -r -a VIA; do
      echo "Procesando: ${VIA[3]}"
    
      while IFS=$',' read -r -a DIRECCIONES; do
    # Si CODI_VIA es = CODI_CARRER
        if [ "${VIA[0]}" = "${DIRECCIONES[2]}" ]; then
          let COUNTER=COUNTER-1
          echo '  <node id="'$COUNTER'" lat="'${DIRECCIONES[1]}'" lon="'${DIRECCIONES[0]}'">' >> $OUT_FILE
          echo '    <tag k="ajbcn:street_id" v="'${DIRECCIONES[2]}'"/>' >> $OUT_FILE
    ########## TODO: JUNTAR NÚMERO Y LETRA SI TIENE LETRA
          echo '    <tag k="addr:street" v="'${VIA[3]}'"/>' >> $OUT_FILE
          NUMERO=$(echo ${DIRECCIONES[3]} | sed 's/^0*//') # Remove leading zeroes
          echo $NUMERO
    	  echo '    <tag k="addr:housenumber" v="'$NUMERO'"/>' >> $OUT_FILE
    	  echo '    <tag k="addr:postcode" v="080'${DIRECCIONES[7]}'"/>' >> $OUT_FILE
          echo '    <tag k="source" v="Ajuntament de Barcelona"/>' >> $OUT_FILE
          echo '    <tag k="source:date" v="'$FECHA_DIRECCIONES'"/>' >> $OUT_FILE
    #      echo '    <tag k="source" v="Infraestructura de dades espacials de l\'Ajuntament de Barcelona - Geoportal"/>' >> $OUT_FILE
    #      echo '    <tag k="source" v="Carto BCN / Ajuntament de Barcelona"/>' >> $OUT_FILE
          echo '  </node>' >> $OUT_FILE
        fi
      done < $TMPDIR/DIRECCIONES-BCN.csv
    
    done < $TMPDIR/CARRERER.csv
    
    echo '</osm>' >> $OUT_FILE
    
  2. Removed all the original tags, added the OSM tags using JOSM
  3. Generated a polygon from the nodes using the convex hull tool in QGIS and used it to generate the working area on the TM
  4. Filtered the nodes to import that are already near a mapped node in OSM using the osmsplitdup script
  5. Manually confirmed that all nodes marked as duplicated nodes are indeed duplicated using JOSM with the ToDo plugin.
  6. Splitted the file with the possibly non-duplicated nodes to have one file for each task on the TM using the osmboxes script

Data Import Workflow

Team Approach

Import will be undertaken by experienced OSM volunteers, following a strict workflow.

References

The import will be discussed in the Talk-Es list and in the Imports list.

Workflow

As most (if not all) of the volunteers will be Spanish speaking, they can follow the detailed workflow in Spanish language on the TM.

The workflow will be as follows:

  • 1. Install the TODO list JOSM plugin if the volunteer don't have it.
  • 2. Create, if we don't have one yet, an import specific user account, like username_bcn_housenumbers_import (you will need a different email account from the account you used for your regular OSM account). Change the OSM username to that specific account in the JOSM preferences.
  • 3. Configure JOSM to allow remote access.
  • 4. Login into the Tasks Manager.
  • 5. Select one square to work on.
  • 6. Open the OSM data and the drinking_water data by clicking on the JOSM button and the link on the extra instructions.
  • 7. Select all nodes on the drinking_water layer and add them to the ToDo list.For each node we first check its correctness, correcting any errors or typos we may still encounter. If the node is clearly wrong or suspicious of being wrong, it won't be imported in the first place, and it will be added to the comments, so it can be checked afterwards by other mappers to take a decision about it. The mapper will delete that node to be sure it won't be imported. Also we will conflate the node data with the existing if there is already one node or create a new one if not. Don't replace existing hand mapped information without on-the-ground verification.
  • 8. If the housenumber is already on OSM, we'll not import it but we will check the extra information to add to this node.
  • 9. After the import is done, there will be a review process using the TM.

Updates

Each year I'll download the dataset, do a diff and update the data to OSM manually. Until I get bored. yopaseopor (talk) 23:29, 11 December 2016 (UTC)

Reverse plan

In case of any trouble, JOSM reverter will be used.

After the import

Report with problems and errors

I will make made a report with all the problems and errors detected and sent to the Barcelona city council Open Data administrators. yopaseopor (talk) 23:29, 11 December 2016 (UTC)