Madrid Pharmacies Import

Goals

The goal is to import 1,701 pharmacies from the Madrid city council (Ayuntamiento de Madrid) database. There are some hundreds of pharmacies already in OSM, most of them poorly tagged, that would be merged as part of the import, substituting them by the new ones, and transfering any useful tags from the old ones to the new ones.

Schedule

Preparation, discussion - due to start the 1st of August.
Import - expected to start any time after the community has solved any issues, doubts or concerns about this import.

Import Data

Data description

The original dataset is in csv format. It includes a total of 1,807 pharmacies, but 106 aren't geolocalized and, therefore, won't be imported.

The dataset is in several formats, but we use the csv file for processing. You can download the data here.

Background

ODbL Compliance verified: YES

The license can be consulted here (in Spanish). It requires attribution of the source, so all nodes and the changeset will include source=Ayuntamiento de Madrid. The Madrid City Council (Ayuntamiento de Madrid) is already on the contributors list.

After sending the report with the problems and errors found during the drinking water import, including a section about the license, got a response from the "Subdirección General de Transparencia del Ayuntamiento de Madrid" (the data owner), and they confirmed me that we comply with their attribution clause using their data in OSM if we clearly specify the source and the source:date, the way it was done on the Madrid_Drinking_Water_Import. And they do like the work we are doing :) Kresp0 (talk) 13:25, 16 October 2016 (UTC)

Compatibility with the ODbL was already discussed in the imports and talk-es mailing lists.

Import Type

The import will be done manually, with the data divided in sets of 50 nodes, so it can be assigned to different volunteers.

Data Preparation

Data Reduction & Simplification

As mentioned before, the data file we have chosen is in csv format. Some typos have been corrected manually in a Spreadsheet, producing a modified csv file, but some other corrections are part of the data convertion awk script. Some of the fields aren't relevant and will be ignored, while some others have been concatenated in one.

Tagging Plans

Here are the original fields, their meaning and how they will be converted to the resulting csv/OSM file:

**Tagging Conversion**
Original Fields	Meaning	OSM tag	Comments
PK	Internal reference number	Not used
NOMBRE	Name of the pharmacy	`name=*`
DESCRIPCION-ENTIDAD	There is only one node with it. It gives extra info to the name.	It will be appended to the `name=*` tag.
HORARIO	Opening hours	All nodes have this field empty, so we ignore it.
EQUIPAMIENTO	Some pharmacies give extra services, like optics, clinic analysis, etc.	It will be part of the `description=*` tag	See `description=*` further down
TRANSPORTE	It gives info on buses, subway and trains to reach the pharmacy	It will be part of the `description=*` tag	See `description=*` further down
DESCRIPCION	It's filled for all nodes with "Datos cedidos por el Colegio Oficial de Farmaceúticos de Madrid"	Not used.
ACCESIBILIDAD		Not used. It's always 0.
CONTENT-URL	Website URL for info about the pharmacy	`contact:website=*`	In fact, it's actually the URL to the Madrid council website with info about the pharmacy.
NOMBRE-VIA	Name of the street	`addr:street=*`	It is only the name, like Velázquez or Doctor Arce for Calle Velázquez or Avenida Doctor Arce
CLASE-VIAL	Type of street, like Calle (Street), Paseo (Boulevard), Avenida (Avenue), etc.	Will be appended at the beginning of `addr:street=*`
TIPO-NUM	Values are V, NUM, T, etc.	Not used.
NUM	House number	`addr:housenumber=*`
PLANTA	Level	Not used	It would be `level=*`, but all nodes have this field empty.
PUERTA	Door	Not used	It would be `addr:door=*`, but all nodes have this field empty.
ESCALERAS	Unit inside a building complex	Not used	It would be `addr:unit=*`, but all nodes have this field empty.
ORIENTACION	It gives extra information on how to get to the pharmacy.	It will be part of the `description=*` tag	See `description=*` further down
LOCALIDAD	City	Not used	We can get it from the city boundaries
PROVINCIA	Province	Not used	We can get it from the province boundaries
CODIGO-POSTAL	Post Code	`addr:postcode=*`
BARRIO	Could be translated as subdistrict, ward, quarter...	`addr:ward=*`
DISTRITO	Could be translated as district or suburb	Not used	We can get it from the district boundaries (admin level 9)
COORDENADA-X		Not used
COORDENADA-Y		Not used
LATITUD	Latitude in WGS84	Will be the lat of the OSM node
LONGITUD	Longitude in WGS84	Will be the lon of the OSM node
TELEFONO	Telephone number, with spaces every 3 ciphers, without ES country number	`contact:phone=*`	We'll delete the spaces, and will append +34 followed by a space at the beginning.
FAX	Fax number, with spaces every 3 ciphers, without ES country number	`contact:fax=*`	We'll delete the spaces, and will append +34 followed by a space at the beginning.
EMAIL	Contact email.	`contact:email=*`
TIPO	All contain the following: "/contenido/entidadesYorganismos/Farmacias"	Not used

To all the nodes, we will add the 2 following tags:

source=Ayuntamiento de Madrid
amenity=pharmacy

The description=* tag will be as follows:

description=Acceso: TRANSPORTE , ORIENTACION , Servicios: EQUIPAMIENTO

Changeset Tags

We will use the following changeset tags:

comment=Madrid Pharmacies Import, subset NUMBER_OF_SUBSET (Where NUMBER_OF_SUBSET is the serial number assigned to the 50-nodes subset. For example: comment=Madrid Pharmacies Import, subset 17 for the import of the 17th subset)
created_by=JOSM/version (This tag is created automatically by JOSM)
source=Ayuntamiento de Madrid
source:date=2014-07-08
import=yes
url=https://wiki.openstreetmap.org/wiki/Madrid_Pharmacies_Import

Data Transformation

Data is in csv format. After correcting some typos manually with a spreadsheet, we process the csv file and convert it to another csv file with a gawk script (importFarmacias.awk). We open the resulting file with JOSM+Open Data plugin, save the osm file and divide it in chunks of 50 nodes for importing.

Data Import Workflow

Team Approach

Import will be undertaken by experienced OSM volunteers, following a strict workflow. Each volunteer will choose a file with 50 nodes and will import them to OSM, while merging them with the old ones they may encounter.

References

The import has been discussed in the Talk-Es list and in the Imports list.

Workflow

As most (if not all) of the volunteers will be Spanish speaking, they can follow this detailed workflow in Spanish language, with screenshots.

The workflow will be as follows:

0. Before we do anything, we strongly recommend to install the ToDo List JOSM plugin (if we don't have it installed in our JOSM yet).

1. Create, if we don't have one yet, an import specific user account, like username_import (you will need a different email account from the account you used for your regular OSM account). Change the OSM username to that specific account in the JOSM preferences.

2. Download one of the files with a subset of 50 nodes that hasn't been imported yet, and write your username and mark that subset as Work in progress in the table of this wiki. Open that file in JOSM and download the OSM data for that area. If both data are in two different layers, merge both layers into one.

3. Now we set the filters amenity="pharmacy" and source="Ayuntamiento de Madrid" to add the 50 new nodes to the ToDo List using the ToDo List JOSM plugin. After doing that, in order to see all pharmacies, we drop the filter source="Ayuntamiento de Madrid", keeping only the amenity="pharmacy" filter active.

4 We go through all the 50 nodes, one by one, using the ToDo list. For each node we first check its correctness, correcting any typos we may still encounter, like lack of accents. If the node is clearly wrong or suspicious of being wrong, it won't be imported in the first place, and it will be added to the comments in the table of this import wiki, so it can be checked afterwards by other mappers to take a decision about it. The mapper will delete that node to make sure it won't be imported.

5. After checking the correctness of the node, we will proceed with the conflation of this new node with the old nodes, in case it is duplicated. To do so, we delete the tags that aren't needed or wrong, we change or correct those tags that need so and finally we select both the new and old nodes and we merge them with the Merge tool (M), so the old node moves to the new node location, keeping the id and history of the old node, and getting the new node tags added. Keep in mind that the conflation has to be done following the Conflation rules.

6. Once we have finished with all the 50 nodes, we proceed to upload the new data to OSM with our specific import OSM account, using the Import Changeset Tags already explained.

7. Now we go to the wiki and we mark the Status for the subset as Done.

Reverse plan

In case of any trouble, JOSM reverter will be used.

Conflation

There are hundreds of pharmacies already in the OSM database for the Madrid area. Some of them are independent of the 1,701 new ones, and therefore we will leave them in the map unchanged. But many others are a duplicate of one of the new nodes. In this last case we will use the JOSM Merge tool (M), with which we will keep the node id of the old one (and therefore its history), transfer any useful data from the old one (like opening_hours=* for example) and change its location to the new one. As the M tool merges all data, before doing so we will manually delete any tag that isn't needed or it's wrong in the old node, like addr:country=* or name=Farmacia for example.

Tags of the old pharmacies

Here we show what to do with the tags of the old pharmacy nodes:

**Tags of the OSM pharmacies to be transfered to the imported ones**
Tag	What we should do	Comments
`addr:city=*`	We ignore it	Duplicated
`addr:country=*`	We ignore it	Not needed
`addr:housename=*`	We ignore it.	There are only 2, and are both wrong
`addr:housenumber=*`	Should be the same as the new imported node
`addr:postcode=*`	Should be the same as the new imported node
`addr:state=*`	We ignore it	Not needed
`addr:street=*`	Should be the same as the new imported node
`alt_name=*`	There is only one with this tag. See `name=*` further down
`amenity=pharmacy`	We obviously ignore this
`contact:phone=*`	Should be the same as the new imported node
`date_off=*`	We ignore it	This tag is deprecated, and it's actually for use with the `access=*` tag
`date_on=*`	We ignore it	This tag is deprecated, and it's actually for use with the `access=*` tag
`description=*`	Only one node has this tag. It says it's open 24 hours, so transfer that info adding the following tag to the new one: `opening_hours=24/7`
`dispensing=*`	We'll keep only the ones with `dispensing=yes` or `dispensing=no`, and ignore the rest.
`email=*`	We'll change the tag key to `contact:email=*` and transfer their values.	There are only 2 nodes with this tag
`is_in:city=*`	We ignore it	Deprecated and duplicated
`level=*`	We will copy it to the new one
`name=*`	Some has `name=Farmacia`, so we ignore those. Some other have a name that duplicates the one of the new counterpart. In this case we also ignore it. In case the value is a possible alternative name, we create a new `alt_name=*` in the imported node and we transfer that value there.	There are more than 150 old pharmacies with this tag, so you will find them often during this conflation process
`old_name=*`	We will copy it to the new one	Only one node has this tag
`opening_hours=*`	We will copy it to the new one. Please, correct any errors	There are only 12 nodes with this tag
`operator=*`	We ignore it	There is only one node with this tag, and it's wrong. It's `operator=Lda. Consolacion Pinto Oca`, so that could go, probably, to `alt_name=*`
`phone=*`	Should be the same as the `contact:phone=*` in the new imported node
`shop=*`	We ignore it	It doesn't make any sense. Only one old node has this tag
`source:date=*`	We ignore it
`source:name=*`	You can append its value to the value in the `source=*` tag of the new node.
`source=*`	You can append its value to the value in the `source=*` tag of the new node.
`time_off=*`	We ignore it	Doesn't make any sense in this context
`time_on=*`	We ignore it	Doesn't make any sense in this context
`website=*`	We can check the url, and substitute the `contact:website=*` of the new node by that value	Only 2 old nodes have this tag
`wheelchair:description=*`	We will transfer it to the new one
`wheelchair=*`	We will surely transfer it to the new one

As explained before, if in doubt don't do anything with the node and report the problem in the comments field in the table of the Import Progress wiki.

Madrid Pharmacies Import

Contents

Goals

Schedule

Import Data

Data description

Background

Import Type

Data Preparation

Data Reduction & Simplification

Tagging Plans

Changeset Tags

Data Transformation

Data Import Workflow

Team Approach

References

Workflow

Reverse plan

Conflation

Tags of the old pharmacies

Navigation menu

Madrid Pharmacies Import

Goals

Schedule

Import Data

Data description

Background

Import Type

Data Preparation

Data Reduction & Simplification

Tagging Plans

Changeset Tags

Data Transformation

Data Import Workflow

Team Approach

References

Workflow

Reverse plan

Conflation

Tags of the old pharmacies

Navigation menu

Search