Import Borno eHealth Africa Smaller Settlements

From OpenStreetMap Wiki

Goals

The goal is to import 4,587 place names, mostly hamlets, villages and isolated dwellings, surveyed in the field by eHealth Africa data collectors across a large part of Borno State, North East Nigeria, over the last year.

This import will complement the ongoing Nigeria eHealth Africa Places import for Borno State (400 nodes). Together, the two imports help build a good Common Operational Dataset (COD) for Nigeria's Borno, Yobe and Adamawa states in OSM.

Schedule

  1. Preparation, discussion - Started on March 14th, 2015.
  2. Import - expected to start once the community has resolved any issues, doubts or concerns about this import.

Import Data

Data description

The datasets were collected and generated by eHealth Africa in the course of their mapping activities in Northern Nigeria. The organisation has carried out data collection across the entire area, together with remote tracing from aerial imagery. In Borno State, some areas are still off-limits due to the ongoing North East Nigeria Crisis, so unfortunately the data doesn't cover the whole state yet. More data is expected to be collected in those areas as soon as security improves.

The original dataset consists of two files: Rural_Village.osm and Village_Placename.osm.

The first file, Rural_Village.osm, contains 2,241 place nodes. After checking 100 random nodes, we found that 62 would fall under place=hamlet, 33 under place=village, 3 under place=isolated_dwelling and 2 under place=neighbourhood, and 1 would fall in an uninhabited place. We can therefore expect a few (10-20) nodes that we won't be able to import directly (but that we can report back to eHealth Africa staff for correction).

The second file, Village_Placename.osm, contains 2,346 place nodes. A similar check of 100 random nodes gave 84 place=hamlet, 11 place=isolated_dwelling, 4 place=village and 1 place=neighbourhood. No wrong nodes were seen among those 100 nodes.

The dataset contains several field attributes. The only one of interest for this import is Settleme_1, which contains the name of the settlement.

As we can see, the first set contains bigger settlements than the second one, but neither file is limited to a single OSM place type, so the correct type will have to be set manually, node by node, using Bing and MapBox aerial imagery.

Background

ODbL Compliance verified: YES
eHealth Africa has given full authorization for the use of these data with the standard authorization document of the Humanitarian OpenStreetMap Team (HOT). A scan of the document can be found here.

Import Type

The import will be done manually through a single HOT Tasking Manager (TM) project. Each task of the job will contain only the place nodes that lie within the task tile, in a similar way to the Central African Republic UNICEF import. For each task, we will first check the eHealth nodes to be imported, and then compare the data already in the OSM database with the eHealth Africa data in order to merge the eHealth data into the OSM database. The OSM mappers who contribute to this import job will follow a detailed workflow to accomplish this.
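The per-task split described above could be sketched as follows. This is only an illustration, not the tooling actually used for the TM project: it assumes the task grid follows the standard slippy-map tile scheme, and the zoom level, the function names and the sample coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def tile_for(lat, lon, zoom):
    """Return the (x, y) slippy-map tile that contains a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def group_by_tile(nodes, zoom=12):
    """Group (name, lat, lon) tuples by tile; zoom=12 is an assumed value."""
    tasks = defaultdict(list)
    for name, lat, lon in nodes:
        tasks[tile_for(lat, lon, zoom)].append(name)
    return dict(tasks)


# Made-up sample points, for illustration only.
sample_nodes = [
    ("A", 11.8300, 13.1500),
    ("B", 11.8301, 13.1501),
    ("C", 12.5000, 14.0000),
]
tasks = group_by_tile(sample_nodes)
```

Each key of `tasks` identifies one tile, and its value lists the nodes a mapper would see when opening that task.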

Data Preparation

Data Reduction & Simplification

Both original data files (Rural_Village.osm and Village_Placename.osm) are in OSM format. No reduction or simplification was applied to the original data.

Tagging Plans

Each place node will be tagged as follows:

eHealth Africa tag | OSM tag | Remarks
Settleme_1=* | name=* |
ALL | place=hamlet/village/isolated_dwelling/neighbourhood... | Users will be presented with the place=* tag set initially as place=unknown, and asked to change it appropriately, node by node, using Bing and MapBox aerial imagery.
ALL | source=ehealthafrica.org |
ALL 1st SET NODES | fixme=Bigger places set. Please, set place tag to hamlet/village/isolated_dwelling/neighbourhood | This set has roughly 2/3 of hamlets and 1/3 of villages. Users will remove this tag after setting the correct type of place for the node.
ALL 2nd SET NODES | fixme=Smaller places set. Please, set place tag to hamlet/village/isolated_dwelling/neighbourhood | This set has mostly hamlets and some villages and isolated_dwellings. Users will remove this tag after setting the correct type of place for the node.

Changeset Tags

For each task in the Tasking Manager job, we will use the following changeset tags:

Data Transformation

The original data is in OSM format. We processed each of the two files with a simple awk script that changes the tagging according to the Tagging Plans. Then both resulting files were merged into one file.
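The awk script itself is not reproduced on this page. As a rough illustration of the same retagging step, here is a minimal Python sketch (Python is swapped in for awk here). It assumes that every original attribute other than Settleme_1 is dropped; the function name `retag`, the set labels "bigger"/"smaller", the sample node and its `Other_Fld` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

PLACE_TYPES = "hamlet/village/isolated_dwelling/neighbourhood"
FIXME_TEXT = {
    "bigger": "Bigger places set. Please, set place tag to " + PLACE_TYPES,
    "smaller": "Smaller places set. Please, set place tag to " + PLACE_TYPES,
}

# Made-up input node; Other_Fld stands in for the unused attributes.
SAMPLE = """<osm version="0.6">
  <node id="-1" lat="11.83" lon="13.15">
    <tag k="Settleme_1" v="Kurje"/>
    <tag k="Other_Fld" v="x"/>
  </node>
</osm>"""


def retag(osm_xml, which_set):
    """Apply the Tagging Plans to every node of an OSM XML string."""
    root = ET.fromstring(osm_xml)
    for node in root.iter("node"):
        name = None
        for tag in list(node.findall("tag")):
            if tag.get("k") == "Settleme_1":
                name = tag.get("v")
            node.remove(tag)  # assumption: no other attribute is kept
        new_tags = {
            "place": "unknown",  # mappers set the real type by hand
            "source": "ehealthafrica.org",
            "fixme": FIXME_TEXT[which_set],
        }
        if name:
            new_tags["name"] = name
        for key, value in new_tags.items():
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


retagged = retag(SAMPLE, "bigger")
```

Running `retag` once per file with the matching set label, and then merging the two outputs, would mirror the two-script approach described above.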

Data Merge Workflow

Team Approach

The import will be undertaken by experienced OSM mappers, using an import-specific OSM user account.

References

The import will be discussed on the imports list, the Talk-Ng list and the HOT list.

Workflow

You can see the workflow here.

Reverse plan

In case of any trouble, the JOSM reverter will be used.

Conflation

The location of the eHealth nodes is generally correct, so in general we won't change their original position. In the very rare cases where a node isn't located on any settlement, we won't import the node and will report the case for further checking and eventual correction.

First of all, we will check that the name of the place is spelled correctly and respects cartographic writing conventions. With very few exceptions, each word of the name must have a capital initial letter followed by lower-case letters, as in Kurje, Unguwar Abdu and Gwarmai Cikin Gari. In case of doubt, we won't make any change to the name and will leave a remark in the comments window that pops up when marking the task as done in the Tasking Manager. Bear in mind that some names use abbreviations that may need to be left in capital letters, as in the ATC Quarters neighbourhood.

Sometimes (in around 5% of the cases in the first set of nodes, and even fewer in the second one) we find two (or occasionally three) nodes over the same settlement. In some of these cases they really are two settlements, each with a different name, and we should import both nodes as they are. In other cases they are two nodes with similar alternate names, and should therefore be merged into one, with one of the names stored as alt_name. And of course, if they share the same name, we will merge the nodes into a single one.
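The capitalisation convention above can be expressed as a small helper that flags names for a mapper to look at. This is a sketch only: the abbreviation list is a hypothetical placeholder seeded with the one example given in the text, and since the convention has exceptions, the result is a suggestion rather than an automatic fix.

```python
# Hypothetical list of abbreviations to keep in capitals.
KEEP_UPPER = {"ATC"}


def normalise_name(name, keep_upper=KEEP_UPPER):
    """Capitalise the first letter of each word, lower-case the rest."""
    words = []
    for word in name.split():
        if word.upper() in keep_upper:
            words.append(word.upper())
        else:
            words.append(word[:1].upper() + word[1:].lower())
    return " ".join(words)


def needs_review(name):
    """True when the name differs from its normalised form."""
    return name != normalise_name(name)
```

For example, `normalise_name("UNGUWAR ABDU")` returns `"Unguwar Abdu"`, while `"ATC Quarters"` is left unchanged and `needs_review("ATC Quarters")` is `False`.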

GNS nodes

Most of the place nodes we will encounter in the OSM database are from the GNS 2009 import.

In 2009 a huge set of GNS data was imported into OSM, an import that is now regarded as poor and problematic. That dataset contains many duplicated, misspelled and badly located nodes, with many of them sitting in the middle of nowhere or in the uninhabited area between several settlements. Worse, many of the nodes bear no relation to the village where they are located (if they are located over a settlement at all), nor to any of the villages nearby. Except for the name, we aren't interested in the other GNS tags, which shouldn't have been imported into OSM in the first place.

If we find a place node already in the database and quite near the eHealth Africa node (in almost all cases it will be a node from the 2009 GNS import), we will proceed as follows:

1. If the name of the GNS node is the same as the eHealth one, we will add the GNS source to the source tag of the eHealth Africa node (source=ehealthafrica.org;GNS), delete the GNS:dsg_code, GNS:dsg_string, fixme and is_in tags of the GNS node, and then merge the GNS node with the eHealth Africa one, so that we keep the node history.

2. If the GNS node name is similar to its eHealth counterpart, we will change the GNS name=* tag to alt_name=* (for example, alt_name=Sansani) and source=GNS to source:alt_name=GNS, and then merge the GNS and eHealth nodes into one.

3. If the name of the GNS node is too different from the eHealth one, we will just delete the GNS node. The boundary between a name similar enough to keep as an alternative name and one different enough to dismiss is difficult to define. In case of doubt, we suggest proceeding as in point 2 and leaving a comment in the comments box when you save the task in the Tasking Manager, so that it will be carefully reviewed during the validation process.
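The three cases above can be summarised as a small decision helper. The import itself relies on the mapper's judgement, so this is only a sketch: it uses a generic string-similarity ratio from Python's difflib, the 0.7 cut-off is an arbitrary assumption, names are compared ignoring case (also an assumption), and the function names and returned labels are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off between "similar" and "too different".
SIMILAR_THRESHOLD = 0.7


def name_similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def gns_action(ehealth_name, gns_name, threshold=SIMILAR_THRESHOLD):
    """Pick which of the three cases applies to a nearby GNS node."""
    if ehealth_name.casefold() == gns_name.casefold():
        return "merge_same_name"      # case 1
    if name_similarity(ehealth_name, gns_name) >= threshold:
        return "merge_as_alt_name"    # case 2
    return "delete_gns_node"          # case 3
```

A pair scoring close to the threshold is exactly the doubtful situation mentioned in point 3, where the mapper would proceed as in point 2 and leave a comment for the validator.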

GNS nodes that are far enough from any of the eHealth Africa nodes being imported, and that at the same time lie over or near a settlement or group of small hamlets, will be left where they are. They will eventually be replaced by future imports of eHealth Africa data or by anyone's own survey.

Nodes inside big towns and cities often have to be retagged manually as place=neighbourhood. Again, in case of doubt, we won't import them and will leave a comment in the TM task.

For GNS nodes inside big towns and cities, we will proceed as in rural areas. The general goal is to merge the GNS nodes with the eHealth Africa nodes.