BCN Addresses Import

From OpenStreetMap Wiki
Jump to navigation Jump to search

Goals

The goal is to import the addresses from the Barcelona city council database (addr:housenumber, addr:street, addr:postcode).

In Barcelona there are 32782 addresses already in OSM, we'll merge them.

Schedule

  1. December 12 2016: We started the preparation and the discussion. See the discussion in the talk-es list (in spanish)
  2. XXXX: Proposal sent to the imports list.
  3. XXXX: Import started.
  4. XXXX: QA finished.
  5. XXXX: Report sent to talk-es and to Barcelona city council open data site.

Import Data

Data description

We'll mix two files: one to get the street names and the other to get the housenumbers and postcodes. There are 171,637 housenumbers with 4,099 different street names.

-Carrerer de la ciutat de Barcelona: http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/taulacarrers/

-Direccions postales elementales: http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/tauladirele/

We'll use the csv files for processing.


Addresses from the Barcelona city council.
Addresses already in OSM (Dec 17, 2016).





Background

ODbL Compliance verified: YES

The license is CC BY 3.0 (in Spanish). It requires attribution of the source, so all nodes and the changeset will include source=Ajuntament de Barcelona. The Barcelona City Council (Ajuntament de Barcelona) is already on the contributors list.

The city council asked for help using OSM in a council project (decidim.Barcelona). We'll ask for a explicit permission to the data owner (Ajuntament de Barcelona) to get the confirmation that we comply with their attribution clause using their data in OSM if we clearly specify the source and the source:date.

Compatibility with the ODbL was already discussed in talk-es mailing list.

Import Type

The import will be done manually. The conflation between OSM data and AJ BCN data will be done using the Tasks Manager, so it can be assigned to different volunteers.

Data Preparation

Data Reduction & Simplification

As mentioned before, the data files we have chosen is in csv format. Some of the fields aren't relevant and will be ignored.

Tagging Plans

Here are the original fields, their meaning and how they will be converted to the resulting OSM file:

Tagging Conversion for the TAULA_DIRELE.csv file
Original Fields Meaning OSM tag Comments
CODI_VIA Street ID ajbcn:street_id=* We will import this code for the updating processes.
NUMPOST Housenumber addr:housenumber=*
LLEPOST Housenumber complement with letter addr:housenumber=* When it is complementary from NUMPOST
TIPUSNUM Type of number Not importable
DTE District Not importable
DIST_POST Postal district addr:postcode=* We'll add the first 3 digits, that are the same for all the city (080)
SECC_CENS Not importable
SECC_EST Not importable
BARRI Neighbourhood Not importable
ED50_COORD_X Not importable
ED50_COORD_Y Not importable
ETRS89_COORD_X X coordinates (EPSG:25831) X Reprojection to EPSG:4326 needed
ETRS89_COORD_Y Y coordinates (EPSG:25831) Y Reprojection to EPSG:4326 needed
ORD Not importable


Tagging Conversion for the CARRERER.csv file
Original Fields Meaning OSM tag Comments
CODI_CARRER Street ID ajbcn:street_id=* We will import this code for the updating processes.
CODI_CARRER_INE Internal reference number from the INE (Spanish National Stats Institute) Not importable
TIPUS_VIA Type of street Not importable
NOM_OFICIAL Official name of the street addr:street=*
NOM_CURT Short name Not importable
NRE_MIN Minimum number of housenumber in this street Not importable
NRE_MAX Maximum number of housenumber in this street Not importable

To all the nodes, we will add the following tags:

Changeset Tags

We will use the following changeset tags:

Data Transformation

Data is in csv format. We will:

  1. Download, mix, transform and reproject the data with this script https://github.com/kresp0/direccionesbcn2osm.sh
#!/bin/bash
# Descarga, transforma y reproyecta las direcciones de Barcelona
# a partir los datos abiertos del Ajuntament al formato XML de OSM.
# Santiago Crespo 2016 
# https://creativecommons.org/publicdomain/zero/1.0/

OUT_FILE=direcciones-BCN.osm
TMPDIR=/tmp/direcciones-bcn
ORIG_PWD="$PWD"

rm -rf $TMPDIR
mkdir $TMPDIR
cd $TMPDIR

# Download the rdf with the source:date information:
wget "http://opendata.bcn.cat/opendata/es/catalog/SECTOR_PUBLIC/data-catalog/0/RDF" -O rdf
FECHA_CALLES=`grep -A 3 CARRERER rdf | grep "dct:modified" | awk -F '>' '{print $2}' | awk -F 'T' '{print $1}'`
FECHA_DIRECCIONES=`grep -A3 INFRAESTRUCTURES/TAULA_DIRELE rdf | grep "dct:modified" | awk -F '>' '{print $2}' | awk -F 'T' '{print $1}'`

if [ "$a" != "$b" ]; then
  echo "ERROR: FECHA_CALLES y FECHA_DIRECCIONES no coinciden!"
  echo "FECHA_CALLES = $FECHA_CALLES"
  echo "FECHA_DIRECCIONES = $FECHA_DIRECCIONES"
  echo "No sé que poner en source:date"
  exit 1
fi

# Download the csv file with the addresses
wget http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/tauladirele/ -O direcciones.html
wget `grep csv direcciones.html | grep http | awk -F '"' '{print "http://opendata.bcn.cat"$2}'` -O TAULA_DIRELE.csv

perl -pe 's/ETRS89_COORD_X/x/g' TAULA_DIRELE.csv | perl -pe 's/ETRS89_COORD_Y/y/g' > t ; mv t TAULA_DIRELE.csv

# Reproject from EPSG:25831 to EPSG:4326:
echo '<OGRVRTDataSource>
  <OGRVRTLayer name="TAULA_DIRELE">
  <SrcDataSource>TAULA_DIRELE.csv</SrcDataSource>
  <GeometryType>wkbPoint</GeometryType>
  <LayerSRS>+init=epsg:25831 +wktext</LayerSRS>             
  <GeometryField encoding="PointFromColumns" x="x" y="y"/>
  </OGRVRTLayer>
  </OGRVRTDataSource>' > direcciones-bcn.vrt

ogr2ogr -lco GEOMETRY=AS_XY -overwrite -f CSV -t_srs EPSG:4326 DIRECCIONES-BCN.csv direcciones-bcn.vrt

# Download the csv file with the complete street names
wget http://opendata.bcn.cat/opendata/es/catalog/URBANISME_I_INFRAESTRUCTURES/taulacarrers/ -O calles.html
wget `grep csv calles.html | grep http | awk -F '"' '{print "http://opendata.bcn.cat"$2}'` -O CARRERER.csv

# Remove the first line
tail -n +2 DIRECCIONES-BCN.csv > t ; mv t DIRECCIONES-BCN.csv
tail -n +2 CARRERER.csv > t ; mv t CARRERER.csv

# Add headers
echo '<?xml version="1.0" encoding="UTF-8"?>' > $OUT_FILE
echo '<osm version="0.6" generator="direccionesbcn2osm.sh 1.0">' >> $OUT_FILE

COUNTER=0

while IFS=$';' read -r -a VIA; do
  echo "Procesando: ${VIA[3]}"

  while IFS=$',' read -r -a DIRECCIONES; do
# Si CODI_VIA es = CODI_CARRER
    if [ "${VIA[0]}" = "${DIRECCIONES[2]}" ]; then
      let COUNTER=COUNTER-1
      echo '  <node id="'$COUNTER'" lat="'${DIRECCIONES[1]}'" lon="'${DIRECCIONES[0]}'">' >> $OUT_FILE
      echo '    <tag k="ajbcn:street_id" v="'${DIRECCIONES[2]}'"/>' >> $OUT_FILE
########## TODO: JUNTAR NÚMERO Y LETRA SI TIENE LETRA
      echo '    <tag k="addr:street" v="'${VIA[3]}'"/>' >> $OUT_FILE
      NUMERO=$(echo ${DIRECCIONES[3]} | sed 's/^0*//') # Remove leading zeroes
      echo $NUMERO
	  echo '    <tag k="addr:housenumber" v="'$NUMERO'"/>' >> $OUT_FILE
	  echo '    <tag k="addr:postcode" v="080'${DIRECCIONES[7]}'"/>' >> $OUT_FILE
      echo '    <tag k="source" v="Ajuntament de Barcelona"/>' >> $OUT_FILE
      echo '    <tag k="source:date" v="'$FECHA_DIRECCIONES'"/>' >> $OUT_FILE
#      echo '    <tag k="source" v="Infraestructura de dades espacials de l\'Ajuntament de Barcelona - Geoportal"/>' >> $OUT_FILE
#      echo '    <tag k="source" v="Carto BCN / Ajuntament de Barcelona"/>' >> $OUT_FILE
      echo '  </node>' >> $OUT_FILE
    fi
  done < $TMPDIR/DIRECCIONES-BCN.csv

done < $TMPDIR/CARRERER.csv

echo '</osm>' >> $OUT_FILE
cp $OUT_FILE $ORIG_PWD/ && echo "Creado el archivo $ORIG_PWD/$OUT_FILE :)"
  1. Generate a polygon from the nodes using the convex hull tool in QGIS and use it to generate the working area on the TM
  2. Filter the nodes to import that are already near a mapped node in OSM using the osmsplitdup script
  3. Split the file with the possibly non-duplicated nodes to have one file for each task on the TM using the osmboxes script

Data Import Workflow

Team Approach

Import will be undertaken by experienced OSM volunteers, following a strict workflow.

References

The import will be discussed in the Talk-Es list and in the Imports list.

Workflow

As most (if not all) of the volunteers will be Spanish speaking, they can follow the detailed workflow in Spanish language on the TM.

The workflow will be as follows:

  • 1. Install the TODO list JOSM plugin if the volunteer don't have it.
  • 2. Create, if we don't have one yet, an import specific user account, like username_bcn_housenumbers_import (you will need a different email account from the account you used for your regular OSM account). Change the OSM username to that specific account in the JOSM preferences.
  • 3. Configure JOSM to allow remote access.
  • 4. Login into the Tasks Manager.
  • 5. Select one square to work on.
  • 6. Download the OSM data and the housenumbers data by clicking on the JOSM button and the link on the extra instructions.
  • 7. Select all nodes on the addr:housenumber layer and add them to the ToDo list.For each node we first check its correctness, correcting any errors or typos we may still encounter. If the node is clearly wrong or suspicious of being wrong, it won't be imported in the first place, and it will be added to the comments, so it can be checked afterwards by other mappers to take a decision about it. The mapper will delete that node to be sure it won't be imported. Also we will conflate the node data with the existing if there is already one node or create a new one if not. Don't replace existing hand mapped information without on-the-ground verification.
  • 8. If the housenumber is already on OSM, we'll not import it but we will check the extra information to add to this node.
  • 9. After the import is done, there will be a review process using the TM.

Updates

Each year I'll download the dataset, do a diff and update the data to OSM manually. Until I get bored. yopaseopor (talk) 23:29, 11 December 2016 (UTC)

Reverse plan

In case of any trouble, JOSM reverter will be used.

After the import

Report with problems and errors

I will make made a report with all the problems and errors detected and sent to the Barcelona city council Open Data administrators. yopaseopor (talk) 23:29, 11 December 2016 (UTC)