Import eHealth Africa Kaduna State Roads
Goals
eHealth Africa Foundation has released a new data set, this time for Kaduna State. It consists of 3 files covering roads, water lines (rivers/streams) and water bodies (riverbanks, lakes, etc.), with a total of 9,149,553 objects.
The goal is to import all the objects of the roads file, conflating them with the existing highways in the OSM database.
This import can dramatically improve the map of Kaduna State, as around 75% of the state's land area has almost no roads in the OSM database (see next image).
Schedule
- Preparation, discussion - due to start on the 15th of August.
- Import - expected to start any time after the community has resolved any issues, doubts or concerns about this import.
Import Data
Data description
The original data is in shapefile format, with only one tag that states the surface of each highway segment: Paved, Unpaved or In Construction.
The eHealth roads file consists of 4,585,146 objects, of which 3,993,771 are nodes and 591,375 are ways (587,582 are unpaved, 3,793 are paved and 1 is in construction). All the segments together make a total length of 83,370 km.
Because the original file is so large, it has been split into 176 pieces to make the import easier.
Background
ODbL Compliance verified: YES
eHealth Africa has given full authorization for the use of their data, using the standard authorization document of the Humanitarian OpenStreetMap Team (HOT). A scan of the document can be found at Media:IMG_20140501_115305.jpg.
Import Type
The import will be done manually by 2 or 3 trained users, who will pick one piece of data at a time and process, tag and conflate it with the existing OSM data before uploading it to the OSM database.
These OSM mappers will follow a detailed workflow to accomplish this.
Data Preparation
Data Reduction & Simplification
The data is originally in shapefile format, in a single file for the whole of Kaduna State. You can download it here.
Tagging Plans
Each road segment carries information only about its surface. There are 3 cases: Paved (3,793 segments), Unpaved (587,582 segments) and In Construction (only one segment). We retagged the Paved and Unpaved segments as highway=road + surface=paved or surface=unpaved, respectively. We tagged the In Construction segment as highway=construction (no surface).
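As an illustration only, the retagging rule can be written as a small lookup table. This is a sketch, not the procedure actually used (the retagging was done in JOSM); the function name and the exact spelling of the source values are assumptions.

```python
# Hypothetical lookup table: shapefile surface value -> OSM tags.
# The exact spelling of the source values is an assumption.
SURFACE_TO_TAGS = {
    "Paved": {"highway": "road", "surface": "paved"},
    "Unpaved": {"highway": "road", "surface": "unpaved"},
    "In Construction": {"highway": "construction"},  # no surface tag
}


def osm_tags_for(surface_value):
    """Return a copy of the OSM tags for one segment's surface value."""
    key = surface_value.strip()
    if key not in SURFACE_TO_TAGS:
        # Fail loudly so unexpected values are reviewed by hand.
        raise ValueError(f"unexpected surface value: {surface_value!r}")
    return dict(SURFACE_TO_TAGS[key])


print(osm_tags_for("Unpaved"))
```

Raising an error on unknown values, rather than guessing a default, keeps any unexpected segment visible to the person doing the import.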
Changeset Tags
We will use the following changeset tags:
- comment=eHealth Africa Foundation Kaduna State Roads import. Subset #XXX
- created_by=JOSM/version
- source=ehealthafrica.org
- source:date=2014
- import=yes
- url=https://wiki.openstreetmap.org/wiki/Import_eHealth_Africa_Kaduna_State_Roads
Here XXX is replaced by the number of the file being imported (001 to 176).
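To keep the changeset tags identical across importers, they could be generated from the subset number instead of being typed by hand. The helper below is a hypothetical sketch; `changeset_tags` and its `josm_version` parameter are illustrative names, and in practice JOSM fills in created_by itself.

```python
def changeset_tags(subset, josm_version):
    """Build the changeset tags for one of the 176 subset files.

    `subset` is the file number (1 to 176); `josm_version` is a
    placeholder for whatever version string JOSM reports.
    """
    if not 1 <= subset <= 176:
        raise ValueError(f"subset must be between 1 and 176, got {subset}")
    return {
        "comment": (
            "eHealth Africa Foundation Kaduna State Roads import. "
            f"Subset #{subset:03d}"  # zero-padded: 001 to 176
        ),
        "created_by": f"JOSM/{josm_version}",
        "source": "ehealthafrica.org",
        "source:date": "2014",
        "import": "yes",
        "url": "https://wiki.openstreetmap.org/wiki/"
               "Import_eHealth_Africa_Kaduna_State_Roads",
    }


for key, value in changeset_tags(7, "version").items():
    print(f"{key}={value}")
```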
Data Transformation
The original data is in shapefile format. We opened it with JOSM and the Open Data plugin, and changed the tags as stated above. We have divided the data into 176 pieces to make the import more manageable. You can download those pieces from here. The files are named KDroadsXXX_"Something".osm
Finally, those files will be further modified with a script that joins most of the segments into longer ways. This transformation will be part of the import workflow.
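The joining script itself is not reproduced on this page. As a rough idea of what such a step can look like, the sketch below joins ways that meet end to end, under assumptions of our own: two ways are joined only if they have identical tags and the shared end node is used by no other way, so junctions are left alone. All names are hypothetical, and this is not the script used for the import.

```python
from collections import Counter


def _try_join(a, b, usage):
    """Return the joined way if a and b meet end to end, else None."""
    (a_nodes, a_tags), (b_nodes, b_tags) = a, b
    if a_tags != b_tags:
        return None
    # Each candidate: (end of a, end of b, how to build the joined node list).
    # Reversing a way is harmless here because the tags carry no direction.
    candidates = [
        (a_nodes[-1], b_nodes[0], lambda: a_nodes + b_nodes[1:]),
        (a_nodes[-1], b_nodes[-1], lambda: a_nodes + b_nodes[::-1][1:]),
        (a_nodes[0], b_nodes[-1], lambda: b_nodes + a_nodes[1:]),
        (a_nodes[0], b_nodes[0], lambda: b_nodes[::-1] + a_nodes[1:]),
    ]
    for end_a, end_b, build in candidates:
        # usage == 2 means only these two way ends touch the node.
        if end_a == end_b and usage[end_a] == 2:
            usage[end_a] -= 1  # the node now appears once, inside one way
            return build(), a_tags
    return None


def join_ways(ways):
    """ways: list of (node_id_list, tag_dict). Returns a new joined list."""
    ways = [(list(nodes), dict(tags)) for nodes, tags in ways]
    usage = Counter(n for nodes, _ in ways for n in nodes)
    merged = True
    while merged:  # repeat until no pair can be joined any more
        merged = False
        for i in range(len(ways)):
            for j in range(i + 1, len(ways)):
                joined = _try_join(ways[i], ways[j], usage)
                if joined is not None:
                    ways[i] = joined
                    del ways[j]
                    merged = True
                    break
            if merged:
                break
    return ways
```

The pairwise search is quadratic per pass, which is acceptable for a sketch but would need an endpoint index to be practical on files with thousands of segments.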
Data Merge Workflow
Team Approach
The import will be undertaken by 2-3 experienced OSM mappers who will be trained for this task. They will use an import-specific OSM user account and follow a detailed workflow, illustrated with many screenshots, that shows every step, so that their work stays consistent throughout the whole process.
References
This import will be discussed in the import list, in the Talk-Ng list and in the HOT list.
Workflow
The full workflow wiki can be checked here.
There are two main issues to consider for this import:
- Roads have to be tagged manually. We are talking about nearly 600,000 segments.
- Conflating with the existing OSM highways.
To be able to handle the import of such an amount of data, the original file has been divided into 176 files of fewer than 4,000 segments each.
The best idea is to start with the areas with no existing or almost no existing highways in the OSM database, leaving the most heavily mapped areas for the end.
The workflow will follow these steps:
- One of the importing users chooses one of the 176 files. Optionally (and strongly recommended), we can cut the file into 2 or more pieces and follow the workflow for each of those smaller files, one after another. For simplicity, we will call that file ehealth.osm from now on.
- We transform that file into a new one that will have the way segments joined. For this we use an ad-hoc script.
- We open the Bing imagery and, unless we have enough GPS traces for the area (quite unusual) we realign the ehealth.osm data to Bing. Bing imagery is the most used source of data in OSM for all that area, by far, and it seems, generally, quite well aligned.
- Next we download, in a different layer, the OSM data for the area covered by the ehealth.osm file, without making any changes to the OSM data for the time being.
- We then correct the errors in the ehealth.osm file, like duplicated nodes, crossing ways and ways not joined to each other. Those ways that are wrong will be corrected or simply deleted and not imported.
- Now it's time to check which segments of the ehealth.osm file will have to be conflated with the OSM ones. When we find two overlapping roads from both layers, unless the quality of the OSM road is poor, we will keep the OSM road and delete the corresponding segments of the ehealth.osm file. We won't merge the eHealth segments with the OSM dataset yet. In case we find a segment that is more accurate in ehealth.osm than in the OSM database, we would substitute the OSM segment with the ehealth.osm one, but, preferably, maintaining the original way id, so we keep the history of the original way. We will bear in mind that many of the trunk and primary roads are already part of one or more route relations, that we would have to reconstruct eventually. If we have to replace an OSM segment with the eHealth one, we will have to do it later on, when merging the ehealth data with the OSM data.
- Once we have corrected the errors in the ehealth.osm file and deleted the duplicated segments, we will tag the ways according to the Nigeria Roads tagging rules, using the existing Bing imagery as background.
- Now we can merge the ehealth.osm layer with the OSM data layer and proceed to join and combine ways to finish the job.
- Before we consider the task finished, we will check once again for any errors and warnings in the merged layer, at least those related to roads. New errors that may appear include missing bridges or fords at river crossings. There are also (only) 22 roads tagged as highway=road in OSM for the whole of Kaduna State, which will have to be retagged accordingly.
- Last, we upload the data with the special OSM import account. If we divided the file into 2 or more pieces, we will proceed with the rest of them, one by one, until all are finished, and only then will we report that the chosen file has been imported. This will be done in Import_eHealth_Africa_Kaduna_State_Roads_Progress.
Reverse plan
In case of any trouble, the JOSM Reverter plugin will be used.
Conflation
Conflation has already been explained in the workflow section above (steps 6 and 9).