Import eHealth Africa Kaduna State Roads Workflow
In this wiki we will:
- Present the import rules in OSM, then the data itself, and the tools we will use to make the import.
- Describe step-by-step a workflow to make the import, with screenshots and tips.
- Last but not least, show the final steps for uploading the eHealth Africa data, including the required changeset tags, with your OSM import account.
Data and Tools for the import
eHealth Data Stats
The original file, once converted to OSM format, consists of 4,585,146 objects, of which 3,993,771 are nodes and 591,375 are ways (587,582 are unpaved, 3,793 are paved and 1 is under construction). The total length is 83,370 km.
Quality of the eHealth Data
- There are gaps in roads that have to be filled manually using imagery. Bridges, for example, are absent from the original data in many cases.
- Some roads (mostly paths) aren't visible in Bing imagery, so some of them shouldn't be imported (we delete them from the file). The same happens with some ways inside built-up areas.
- Most ways are divided into segments that have to be joined. An awk script will join most of the segments and speed up the process.
- Some ways (mainly service roads and paths, but not only those) are traced over buildings, so they have to be corrected.
As of 11th August 2014, except for an area of about 50 x 100 km in the North East, a similar area in the South plus the capital city, most of Kaduna State - around 75% - has few roads mapped in OSM (see next picture).
Many segments will first be joined with a script to make the import easier and faster; the ways will then be tagged manually against Bing imagery.
We'll use JOSM with some plugins and also one ad-hoc awk script to join road segments.
It's not an easy import. Two main issues to consider:
a. Roads have to be tagged and corrected manually, and sometimes split and combined too. We are talking about nearly 600,000 segments in the original file!
b. We have to conflate them with the existing ones.
To make the import of such a massive amount of data manageable, it has been divided into 176 files of less than 4,000 segments each.
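As a quick sanity check of the figures above, the following Python sketch works out the average number of segments per file and the minimum number of files needed to stay under the 4,000-segment limit. The constants are taken from the text; the variable names are only illustrative.

```python
import math

TOTAL_WAYS = 591_375   # ways in the converted file (from the stats above)
NUM_FILES = 176        # number of subset files
MAX_PER_FILE = 4_000   # each file holds fewer than this many segments

# Average load per file if the ways were spread evenly.
average_per_file = TOTAL_WAYS / NUM_FILES

# Fewest files that keep every file strictly below the limit.
min_files_needed = math.ceil(TOTAL_WAYS / (MAX_PER_FILE - 1))

print(f"average segments per file: {average_per_file:.0f}")
print(f"minimum files to stay under {MAX_PER_FILE}: {min_files_needed}")
```

The average of roughly 3,360 segments per file leaves some margin below the limit, which is consistent with files that follow geographic boundaries rather than an even split.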
The import should be carried out by several experienced mappers, each with a specific OSM import account. Suggestion: Create accounts eHealthKDroads_import01, eHealthKDroads_import02, eHealthKDroads_import03...
We would start with the empty or almost empty areas, leaving the ones that are more heavily mapped for the end.
Throughout the import process we will make heavy use of JOSM filters. Among the filters, we should have the following ones ready:
1. type:node untagged
The workflow would be as follows:
1) As the files are still quite big, I strongly advise picking one of the 176 files to work with (we will call it ehealth.osm from now on) and cutting it into 3 or 4 pieces. To do that, open the chosen file in JOSM, make sure filter 2 is enabled (very important to get rid of all the nodes) and select an area of the data. Copy it (CTRL+C), open a new layer (CTRL+N), go to that layer and paste the copied data into it (CTRL+V). Save the file as ehealth01.osm (CTRL+SHIFT+S), or whatever name you like, in a folder of your choice on your computer, and then delete that layer from JOSM. Return to the ehealth.osm layer and delete the selection (ALT+E -> Delete); be careful not to click on the working area before you delete the data. Now proceed to cut it further, to produce ehealth02.osm, ehealth03.osm, etc.
2) We transform each file using the joinRoadSegments0.3.awk script, so most of the segments will be correctly joined into their respective ways, which saves a lot of time.
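The joinRoadSegments0.3.awk script itself is not reproduced here. To give an idea of what joining segments involves, here is a minimal Python sketch under stated assumptions: ways are represented as lists of node ids with a tag dictionary, and two ways are joined only when they share an end node, carry identical tags, and no third way touches that node. The function name and data layout are hypothetical, and the real script may use different rules.

```python
from collections import defaultdict

def join_segments(ways):
    """Greedily join ways that share an end node and carry identical tags.

    `ways` is a list of (node_id_list, tags_dict) pairs. A shared node is
    only used as a join point when exactly two ways touch it, so real
    junctions stay split. Ways may be reversed to line up their ends.
    """
    ways = [(list(nodes), dict(tags)) for nodes, tags in ways]
    changed = True
    while changed:
        changed = False
        # Count how many ways touch each node, at any position.
        usage = defaultdict(int)
        for nodes, _ in ways:
            for n in set(nodes):
                usage[n] += 1
        # Index ways by their end nodes.
        ends = defaultdict(list)
        for i, (nodes, _) in enumerate(ways):
            ends[nodes[0]].append(i)
            if nodes[-1] != nodes[0]:
                ends[nodes[-1]].append(i)
        for node, idxs in ends.items():
            if len(idxs) != 2 or usage[node] != 2:
                continue  # dead end or junction: leave as is
            a, b = idxs
            nodes_a, tags_a = ways[a]
            nodes_b, tags_b = ways[b]
            if tags_a != tags_b:
                continue
            # Orient the ways so that a ends at `node` and b starts there.
            if nodes_a[-1] != node:
                nodes_a = nodes_a[::-1]
            if nodes_b[0] != node:
                nodes_b = nodes_b[::-1]
            merged = (nodes_a + nodes_b[1:], tags_a)
            ways = [w for k, w in enumerate(ways) if k not in (a, b)]
            ways.append(merged)
            changed = True
            break  # indexes are stale now, so rebuild them
    return ways

# Three segments in a row plus a side road branching off at node 3.
example = [
    ([1, 2], {"highway": "road"}),
    ([2, 3], {"highway": "road"}),
    ([3, 4], {"highway": "road"}),
    ([3, 5], {"highway": "road"}),
]
for nodes, tags in join_segments(example):
    print(nodes, tags)
```

In this example the first two segments are joined at node 2, while node 3 is left alone because three ways meet there. Deciding which branches belong together at such junctions is part of the manual tagging described in the later steps.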
3) We open the Bing imagery and, unless we have enough GPS traces for the area (quite unusual), we realign the ehealth.osm data to Bing. Bing imagery is by far the most used data source in OSM for that area (surely for most of the highways), and it generally seems quite well aligned.
To realign the data, disable all filters, select all objects, and drag them until they are well aligned with the Bing imagery. This is quite tricky. Pick some ways as a reference and move the objects until they look well aligned at all those points.
When you move the objects, a warning window will appear:
Just hit the Move them button and click on the working area to clear the selection.
4) Next we download, in a different layer, the OSM data for the area covered by the ehealth.osm file, without making any changes to the OSM data for the time being. Let's call this layer OSMlayer.osm
5) We then correct the errors in the ehealth.osm file, like duplicated nodes, crossing ways and ways not joined to each other. For this purpose, we first have to make sure that we have disabled all JOSM filters.
We run the JOSM validator. Common errors and warnings are the following:
a) Highway duplicated nodes - Duplicated nodes: Click on "Highway duplicated nodes - Duplicated nodes" and then on the Fix button at the bottom of the Validator window.
b) Crossing ways: These can be two ways crossing, or a way that is not joined to another segment. We have to fix this by hand.
c) Way end node near other highway: This type of error indicates that, most probably, one road is not joined to another, when they were meant to be joined. It's also very important to fix, and must be fixed manually too.
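To illustrate what the duplicated-nodes check is looking for, here is a small Python sketch that groups the nodes of an .osm XML document by position. It is not the JOSM validator's implementation; the function name and the sample data are hypothetical, and coordinates are rounded to 7 decimal places, the precision at which OSM stores them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_duplicate_nodes(osm_xml):
    """Return lists of node ids that sit at exactly the same position."""
    root = ET.fromstring(osm_xml)
    by_position = defaultdict(list)
    for node in root.iter("node"):
        # Round so that "10.5" and "10.5000000" count as the same place.
        key = (round(float(node.get("lat")), 7),
               round(float(node.get("lon")), 7))
        by_position[key].append(node.get("id"))
    return [ids for ids in by_position.values() if len(ids) > 1]

# Hypothetical sample: the first two nodes share one position.
SAMPLE = """<osm version="0.6">
  <node id="-1" lat="10.5000000" lon="7.4000000"/>
  <node id="-2" lat="10.5" lon="7.4"/>
  <node id="-3" lat="10.6" lon="7.4"/>
</osm>"""

for group in find_duplicate_nodes(SAMPLE):
    print("duplicated nodes:", group)
```

Duplicated nodes like these are typically the ends of two segments that should share a single node, which is why merging them with the validator's Fix button is safe to do in bulk.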
6) Now it's time to check which segments of the ehealth.osm file will have to be conflated with the OSM ones. For this purpose, we set filters 1 and 3, so we will see only the highways and no untagged nodes (we will use filter 2 instead of filter 1 if we don't want to see any node at all):
When we find two overlapping roads from both layers, unless the quality of the OSM road is poor, we will keep the OSM road and delete the corresponding segments of the ehealth.osm file. In the following screenshot, we can see that the eHealth highway (highlighted in red) overlaps an OSM road (grey color) for part of its length:
In our example, if we zoom in, we can see that the eHealth data, although good, is not as accurate as the OSM counterpart:
So we cut the segment of the eHealth Road that overlaps the OSM road, and we delete it, leaving only the OSM road:
In the following screenshot, you can see in red colour many segments to be deleted:
We won't merge the eHealth segments with the OSM dataset yet. If we find a segment that is more accurate in ehealth.osm than in OSMlayer.osm, we would substitute the OSM segment with the ehealth.osm one, preferably keeping the original way id so that the history of the original way is preserved. To do this, we would use the Replace Geometry (Ctrl+Shift+G) tool (you need to install the utilsplugin2 plugin in JOSM).
We will bear in mind that many of the trunk and primary roads are already part of one or more route relations, that we would have to reconstruct eventually. If we have to replace an OSM segment with the eHealth one, we will have to do it later on (when merging the ehealth data with the OSM data). We go on dealing with all duplicated segments, until we finish with all of them.
7) Once we have corrected the errors in the ehealth.osm file and deleted the duplicated segments, we will proceed to tag the ways according to the Nigeria Roads tagging rules and the Highway Tag Africa: unclassified, tertiary and secondary in rural areas, residential and service in built-up areas, towns and cities, etc. You need to disable filter 1, but keep filter 3. Among the tools we will use are join (j), merge (m), cut (p) and unglue (g).
In this link you have three files as an example of joining and tagging segments. They are ehealth.osm (the original file), ehealthJoinedSegments.osm (the resulting file when we run the awk script against the ehealth.osm file) and ehealthPartiallyProcessed.osm (a file that has part of the roads already tagged and modified). Please open the three files in JOSM together with the OSM data for that area, so you can see how to realign the data with the Bing imagery, and how the ways are tagged, which ones are corrected, deleted, etc.
Here you can see what this tagging task looks like:
IMPORTANT: Those ways that are wrong will be corrected or simply deleted and not imported.
a) There are gaps in roads that have to be filled manually using imagery. Bridges and fords, for example, are absent from the original data in many cases.
b) Some roads (mostly paths) aren't visible in Bing imagery, so some of them shouldn't be imported (we delete them from the file). The same happens with some ways inside built-up areas.
c) Some ways (mainly service roads and paths, but not only those) are traced over buildings, so they have to be corrected.
The following two screenshots show segments that have to be deleted:
Finally, if you find an area covered by clouds or by low-resolution Bing imagery, do your best.
These first 7 steps are the first part of the task. If you can't continue for the moment, you can save the ehealth.osm file with a name like ehealthModified.osm, and continue later or the next day. We don't care about the OSM data layer, as we haven't made any changes to it, so you can safely delete it.
8) Now it's time to merge the ehealth.osm layer with the OSMlayer.osm one. But first, we delete the OSM data layer and download it again. The reason is that we have spent a long time on the first 7 steps, and downloading fresh data minimizes the chance of conflicts.
9) Second, we disable all filters, select all objects in the ehealth.osm file (CTRL+A), and write down the number of objects (ways and nodes) shown in the Selection window:
If you don't have the Selection window active, activate it by clicking on this button:
Write the number of nodes and ways in the Comments column for that subset file in the Progress wiki.
We will use these numbers to keep the statistics of the whole import.
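If you prefer to cross-check the numbers shown in the Selection window, the saved file can also be counted outside JOSM. The following Python sketch is one way to do it; the function name and sample data are hypothetical, and it assumes that objects marked with action="delete" in a JOSM-saved file should not be counted.

```python
import io
import xml.etree.ElementTree as ET

def count_nodes_and_ways(source):
    """Count <node> and <way> elements in an .osm XML file.

    `source` is a file path or a binary file object. Elements marked
    action="delete" are skipped (assumption: they are not part of the
    data that will be uploaded as new objects).
    """
    nodes = ways = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way") and elem.get("action") != "delete":
            if elem.tag == "node":
                nodes += 1
            else:
                ways += 1
        elem.clear()  # free memory; subset files hold thousands of objects
    return nodes, ways

# Hypothetical sample standing in for a saved ehealth.osm subset.
SAMPLE = b"""<osm version="0.6">
  <node id="-1" lat="10.5" lon="7.4"/>
  <node id="-2" lat="10.6" lon="7.4"/>
  <node id="-3" lat="10.7" lon="7.4"/>
  <way id="-4">
    <nd ref="-1"/><nd ref="-2"/><nd ref="-3"/>
    <tag k="highway" v="road"/>
  </way>
</osm>"""

nodes, ways = count_nodes_and_ways(io.BytesIO(SAMPLE))
print(f"{nodes} nodes, {ways} ways")
```

For a real subset, pass the path of the saved file instead of the in-memory sample, for example `count_nodes_and_ways("ehealthModified.osm")`.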
10) Now we can merge the ehealth.osm layer with the OSMlayer.osm layer:
... and proceed to join and combine ways to finish the job. We should do this in one sitting, so be sure you have enough time to finish it.
11) Before we consider the task finished, we will check once again for any errors and warnings in the merged layer, at least those related to roads. New errors that may appear are missing bridges or fords at river crossings. There are also (only) 22 roads tagged as highway=road in OSM for the whole of Kaduna State, which have to be retagged appropriately.
12) Upload the data with the import account.
We will use the following changeset tags:
- comment=eHealth Africa Foundation Kaduna State Roads import. Subset #XXX
Where XXX will be substituted by the number of the file that is being imported (001 to 176)
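To avoid typos in the subset number, the comment can be generated rather than typed. This is a minimal Python sketch; the function name is hypothetical, and the three-digit zero padding follows the "001 to 176" range given above.

```python
def changeset_comment(subset_number):
    """Build the changeset comment for one of the 176 subset files."""
    if not 1 <= subset_number <= 176:
        raise ValueError("subset number must be between 1 and 176")
    # :03d pads the number with zeros to three digits, e.g. 7 -> 007.
    return ("eHealth Africa Foundation Kaduna State Roads import. "
            f"Subset #{subset_number:03d}")

print(changeset_comment(7))
```

The resulting string is what goes into the comment field of the JOSM upload dialog.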
The upload will usually contain far more objects than you are used to, so it will take a long time: half an hour, an hour or even more, depending on network speed and OSM server load at that moment. Please be patient!
13) Mark the file as imported in the progress wiki.