2015 Sierra Leone village mapping data
Setting up a local village mapping exercise using ODK and OSMAND in Sierra Leone
In January 2015, whilst visiting the Tonkolili district in Sierra Leone to observe and help with MSF’s Ebola operations, Ivan Gayton (MSF) began an initiative to map the district using only GPS-enabled phones.
Using 9 mapping teams (each consisting of one motorcycle driver and one passenger, who acted as the surveyor using their Android phone), they were able to map 91 villages within three days. In addition to location, the mapping teams collected population estimates, as well as contact details for chiefs and health workers.
The data collection was facilitated by installing OpenDataKit (ODK) onto each phone, creating a basic form of questions and using this to record the information. In addition, OSMAND was installed to provide offline mapping to gather location information. More details on how this was done can be found below.
Overall, the surveying was a success. There are issues on quality control that are discussed later on in this wiki, but these can be easily resolved during data processing and cleaning. The next step is to get the relevant data onto OpenStreetMap; the process behind this is explained below.
If successful, it is hoped that the motorbike mapping initiative could be expanded beyond the Tonkolili district, and cover the rest of Sierra Leone and potentially further afield. This may be done as part of future Missing Maps endeavors.
Technical Set-Up for Data Collection
A major priority for mapping was to ensure that it was low cost and repeatable. Local people who had GPS-enabled phones were asked to help, and the initial test for selecting surveyors was to check that they owned the phone and could operate it.
To install the software and maps onto each phone, a prepaid SIM card was bought from Airtel. $100 of credit was put on the SIM and a 7 GB data package was activated on the phone. The Android application packages (APKs) for ODK and OSMAND were downloaded. For OSMAND, the F-Droid build was chosen as it is free and doesn't prompt to buy the paid version (we're all for paying for OSMAND and do so on our own devices, but it's not helpful to prompt a bunch of African villagers to pay for software by credit card).
One ‘master’ phone (with the data packaged SIM) was used to download the map files for Sierra Leone and the World Basemap (the latter is useful to avoid download prompts). The map files (in OBF format) were then copied from the phone to the laptop using a USB cable. Using the ‘master’ phone, a mobile wifi hotspot was set up providing data access to the laptop and other phones used.
The survey was created in XLSForm format, uploaded onto Formhub and then downloaded onto the ‘master’ phone. The XML was edited to tweak the GPS accuracy threshold; the default is 5 metres, which leads to a lot of standing around and is not needed for village mapping. The threshold was relaxed to 8 metres by adding an accuracyThreshold attribute to the location input in the XForm:
<input ref="/Magburaka_area_village_mapping_V0-1-6/LOCATION" accuracyThreshold="8">
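If many forms need the same tweak, the edit could in principle be scripted rather than made by hand. The following is only a minimal sketch using Python's standard library: the function name `set_accuracy_threshold` and the second input (`VILLAGE_NAME`) are hypothetical, and the sample is a stripped-down fragment rather than a full XForm.

```python
import xml.etree.ElementTree as ET


def set_accuracy_threshold(xform_xml: str, ref_suffix: str, metres: int) -> str:
    """Return the XML with accuracyThreshold set on matching <input> elements."""
    root = ET.fromstring(xform_xml)
    for elem in root.iter():
        # Strip any XML namespace so both "input" and "{ns}input" match.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "input" and elem.get("ref", "").endswith(ref_suffix):
            elem.set("accuracyThreshold", str(metres))
    return ET.tostring(root, encoding="unicode")


# Simplified fragment; VILLAGE_NAME is a made-up field for illustration.
sample = (
    "<body>"
    '<input ref="/Magburaka_area_village_mapping_V0-1-6/LOCATION"/>'
    '<input ref="/Magburaka_area_village_mapping_V0-1-6/VILLAGE_NAME"/>'
    "</body>"
)

print(set_accuracy_threshold(sample, "/LOCATION", 8))
```

A real XForm declares XML namespaces, and ElementTree rewrites namespace prefixes on output unless they are registered first, so the result should be checked on a phone before it is distributed.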
People Set-Up for Data Collection
Once a few people were set up with the phone, and the most tech-savvy few had received lots of instruction on how to set up the phones, the software started propagating naturally. A lot of people showed up looking for work, and when we started training people to use the survey software and OSMAND, the audience swelled rather quickly. Within a few hours, there were way too many phones around to install anything, so people started sharing the 5-file package by bluetooth from one phone to another, and setting up the software themselves (this actually really started happening late in the day, and the following morning a lot of phones showed up with the software running).
By the morning of day two, there were far more people set up with working phones than could be organized. Eight surveyors were picked, plus a supervisor, with 9 motorcycles and their drivers making up the final mapping team.
The surveyors were first made to complete a few fake surveys to work out a few of the obvious kinks. People are clever and can usually work the software as long as nothing goes wrong, but the moment there's a distraction (phone calls being a big one) or issue, they can become quickly confused or lost. For example, people confuse the ‘Back’ button with the ‘Home’ button or don't realize that they have to hit ‘Save’. They may also not understand the difference between scrolling back and forth in the survey versus using the Back button to get out of a survey, and so forth. In the end, training is essential; fortunately, by the time there's a critical mass of people who understand a fair bit of what's going on they start helping each other (albeit in a bit of a shouty fashion). It would probably not be a good idea to let them do any real surveying until they've gotten through a good number of practice surveys.
Dealing with ethics head-on
Before sending them out, the ambassadorial nature of their duties was emphasized. Surveyors were told not to shout at villagers, imperiously demand information from village chiefs, or otherwise alienate people who are already unsettled by or suspicious of Ebola operations. The mapping teams were asked to be respectful of the villagers and chiefs, to explain what they were doing, and to respond to any suspicion or hostility by being conciliatory or, if necessary, simply leaving.
Other technical issues
In addition to the software, hardware constraints also had to be considered, in particular the battery life of the phones. To keep the phones charged, a pile of cheap 12-volt chargers was bought and wired to the motorcycle batteries by a local electrician (this idea in itself was a big hit, as a lot of Sierra Leoneans have phones that they have trouble keeping charged). With a wire properly attached to the battery, run up along the body and zip-tied to the handlebars, the charger was robust. Without the electrician's work, the chargers would have been hooked up in a very fragile fashion and would probably have broken within hours.
Issues during collection
One of the surveyors collected all information excepting the actual GPS points (he excused himself by saying that he was having difficulty with his phone, and was confused when asked what value he figured there would be in mapping data without actual locations). Another had trouble saving forms, and therefore overwrote the same survey several times, replacing the GPS point each time. But by and large people got it right away, and even the aforementioned mistakes were quickly corrected.
Data Collection Output
After a week of mapping, a total of around 740 villages were mapped (give or take a few due to duplication). The forms were consolidated into one spreadsheet file, ready to be processed for updating.
Initial Data Cleaning and Processing
Initial data cleaning has been conducted on the original spreadsheet. This included looking for:
- No entries / Empty cells
- E.g. surveys that have been started accidentally and thus have no actual data.
- Consistency in spelling, typos, and formatting
- E.g. Tonkolili vs. tonkolili, Kholifa Rowala vs Kholifa Rowalla
- Consistency in names and data
- E.g. Tonkolili vs Tonkolili District vs Poor Tonkolili
The spreadsheet was then further refined, deleting data columns not required or relevant to OpenStreetMap, including healthcare worker name, nearest healthcare centre, or whether the village has a school or not.
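The cleaning steps above were done by hand in a spreadsheet, but the same checks can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the column names, the placeholder village names, and the choice of which spelling counts as canonical.

```python
import csv
import io

# Hypothetical column names; the real survey export will differ.
KEEP_COLUMNS = ["village", "chiefdom", "district"]

# Known variants mapped to one agreed form. Which spelling is treated as
# canonical here is an assumption made only for this example.
CANONICAL = {
    "kholifa rowala": "Kholifa Rowalla",
    "kholifa rowalla": "Kholifa Rowalla",
    "tonkolili": "Tonkolili",
    "tonkolili district": "Tonkolili",
}


def normalise(value):
    """Collapse stray whitespace, then apply the agreed spelling if known."""
    collapsed = " ".join((value or "").split())
    return CANONICAL.get(collapsed.lower(), collapsed)


def clean_rows(rows):
    """Keep only the wanted columns and drop surveys with no data at all."""
    cleaned = []
    for row in rows:
        values = {col: normalise(row.get(col)) for col in KEEP_COLUMNS}
        if any(values.values()):
            cleaned.append(values)
    return cleaned


# Placeholder data, not taken from the survey.
raw = io.StringIO(
    "village,chiefdom,district,health_worker\n"
    "Village A,Kholifa Rowala,tonkolili,Someone\n"
    ",,,\n"
    "Village B,Kholifa Rowalla,Tonkolili District,\n"
)
rows = clean_rows(csv.DictReader(raw))
for row in rows:
    print(row)
```

A lookup table like `CANONICAL` only catches variants that someone has already spotted, so a manual pass over the distinct values in each column is still needed.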
The spreadsheet was then imported to QGIS for further processing. The main aim of the processing was to separate the surveyed villages into those that already exist on OSM and those that are new node points.
This was conducted by using a 500 m buffer around existing OSM nodes (tagged as villages, hamlets and towns) and identifying villages within the survey data that intersect with the buffer. Those that intersected were extracted into a new shapefile/csv data layer. Those that did not intersect with the buffer were also then extracted (using the difference tool within QGIS) into a new shapefile/csv data layer.
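The buffer-and-intersect step was carried out in QGIS. As a rough equivalent for anyone who wants to script it, the sketch below splits surveyed points by great-circle distance to the nearest existing node. The function names and the coordinates are illustrative assumptions, not project data.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_by_buffer(surveyed, osm_nodes, buffer_m=500.0):
    """Return (existing, new): surveyed points within/outside the buffer."""
    existing, new = [], []
    for point in surveyed:
        near = any(
            haversine_m(point["lat"], point["lon"], node["lat"], node["lon"]) <= buffer_m
            for node in osm_nodes
        )
        (existing if near else new).append(point)
    return existing, new


# Illustrative coordinates only.
osm_nodes = [{"lat": 8.7200, "lon": -11.9500}]
surveyed = [
    {"name": "Village A", "lat": 8.7210, "lon": -11.9500},  # roughly 110 m away
    {"name": "Village B", "lat": 8.7300, "lon": -11.9500},  # roughly 1.1 km away
]

existing, new = split_by_buffer(surveyed, osm_nodes)
print([p["name"] for p in existing], [p["name"] for p in new])
```

Comparing every surveyed point against every node is slow for large extracts; a spatial index, or QGIS itself, would be the better choice once the dataset grows.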
The processes of initial data cleaning and validating are described in the following wikis:
Secondary Data Cleaning
The two final shapefile/csv data layers required one final step of cleaning, potentially the most important step for reducing errors in the data. Because the data is spatial, it is much easier to spot errors when it is plotted on a map than when it is viewed as a CSV. As a result, a secondary data cleaning process has been established that looks for errors or duplications by inspecting the data in QGIS. The process is as follows:
Focusing on one of the shapefiles, visually scan the data to check for any POIs sitting very close to one another (or even on top!). Investigate the POIs in question and identify whether they are:
- duplications (same names, alt names);
- separate villages, using satellite and aerial imagery to help distinguish the spatial boundaries of each village (in Sierra Leone, neighbouring villages can be within metres of each other);
- errors (check the rest of the data: is anything missing?).
The process involves using grid squares (akin to the HOT OSM Tasking Manager) to split up areas for checking. A separate CSV file was edited for each data layer to avoid overwriting or losing data, and to allow users to go back to the previous dataset. Both shapefiles were cleaned in this way, with the following assumptions / instructions:
- Don’t delete the data if you can’t come to an educated guess;
- Often Sierra Leone villages are within metres of each other;
- Villages may also be named something similar to one another, or with an added 1 or 2 at the end;
- Don’t change the data, e.g. the names or spelling;
- Rely on the ground data being more accurate than the imagery, but use your common sense. Remember the imagery isn’t necessarily up to date: both Bing and Google have some images from 2010!
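The visual scan can be supported by a script that lists candidate pairs for a person to review. The sketch below only flags points that sit close together and, in line with the instructions above, never deletes or renames anything. The 50 m threshold, the function names and the sample points are assumptions for illustration.

```python
import math
from itertools import combinations


def distance_m(a, b):
    """Approximate distance in metres (equirectangular; fine over short ranges)."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    return 6371000.0 * math.hypot(dx, dy)


def flag_close_pairs(points, threshold_m=50.0):
    """List pairs of points closer than threshold_m, for manual review only."""
    flagged = []
    for a, b in combinations(points, 2):
        d = distance_m(a, b)
        if d <= threshold_m:
            same_name = a["name"].strip().lower() == b["name"].strip().lower()
            flagged.append((a["name"], b["name"], round(d), same_name))
    return flagged


# Illustrative points only.
points = [
    {"name": "Village A", "lat": 8.7200, "lon": -11.9500},
    {"name": "Village A", "lat": 8.7201, "lon": -11.9500},
    {"name": "Village A 2", "lat": 8.7203, "lon": -11.9500},
    {"name": "Village B", "lat": 8.8000, "lon": -11.9000},
]

flagged = flag_close_pairs(points)
for name_a, name_b, metres, same_name in flagged:
    print(name_a, "|", name_b, "|", metres, "m | same name:", same_name)
```

A pair with the same name is a likely duplicate, while a pair such as "Village A" and "Village A 2" may well be two real, neighbouring villages, so the decision stays with whoever is checking against the imagery.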
Outputs ready for OSM update
The next step is to “upload” the two sets of data onto OSM. To do this, we are proposing two different methods, one for each set of data. We would appreciate input from the HOT and OSM community on the proposed methods, particularly with regard to alternative 'automated' solutions.
For new villages
For new villages, we will open the existing shapefile in JOSM using the OpenData plugin. The shapefile will then be saved as an .osm file ready to be merged with OSM.
Due to the strict import guidelines within OSM, we would like to have input/help from the community to ensure that the data import is conducted acceptably. We therefore have the following questions:
- Would it be better to use ogr2osm to convert the file rather than the OpenData plugin? If so, is there guidance the community could provide to ensure we use the tool appropriately?
- Could we also have guidance on how to use the test database?
- From your experience, is this the simplest and most efficient method to add new data to OSM from a CSV file?
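To make the questions above more concrete, here is one possible shape for a CSV-to-.osm conversion, written with Python's standard library. It is a sketch for discussion, not the agreed import method: the column names, the `TAG_KEYS` mapping, the `place=village` tag and the sample row are all assumptions, and the final tagging needs to be confirmed with the community. New nodes are given negative ids, which is how not-yet-uploaded objects are conventionally marked in .osm files opened in JOSM.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed mapping from survey columns to OSM tag keys; to be confirmed.
TAG_KEYS = {"name": "name", "alt_name": "alt_name"}


def rows_to_osm_xml(rows):
    """Build an OSM XML document with one new (negative-id) node per row."""
    root = ET.Element("osm", version="0.6", generator="village-csv-sketch")
    for index, row in enumerate(rows, start=1):
        node = ET.SubElement(
            root, "node", id=str(-index), lat=row["lat"], lon=row["lon"]
        )
        # Assumption: every surveyed point is tagged as a village.
        ET.SubElement(node, "tag", k="place", v="village")
        for column, key in TAG_KEYS.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells rather than writing empty tags
                ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


# Placeholder row, not taken from the survey.
raw = io.StringIO(
    "name,alt_name,lat,lon\n"
    "Village A,,8.7200,-11.9500\n"
)
osm_xml = rows_to_osm_xml(csv.DictReader(raw))
print(osm_xml)
```

The resulting file would still need to be opened and reviewed in an editor, checked against the test database, and uploaded in line with the import guidelines.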
For existing villages
At the moment, we plan to manually update existing villages by editing the existing nodes within OSM and using the CSV/SHP as a reference sheet for the data (i.e. the ‘Field Papers’ method seen in the Lubumbashi mapping). The user will identify the existing node and edit the existing tags (and add new tags) as required; this will happen for each existing village.
As this is likely to be relatively labour and time intensive, it is proposed that this is made a task for a future mapathon or an alternative method of updating found.
Data source, accuracy and licensing: Data sourced from GPS mapping conducted as explained above. No licensing issues.
Import/Software you plan to use: Help sought from community to answer this.
Exactly how data will be translated from another format into OSM format: Help sought from community to answer this.
How the resulting data will look.: Point data (new villages).
Exact tags being used.: Name, Alt. Name, Chiefdom, District, Ward, Constituency, No. of Households
Link to sample data imported on the test database.: TBC
User name of the account performing the import, and other details of how the changesets will be tagged: TBC
Link to example data imported on the live database.: TBC
It is likely that this project may scale quickly to cover the whole of Sierra Leone and beyond. As a result, we will be dealing with datasets much larger than this. In addition, these datasets are likely to 'flow in', which requires an active process of cleaning, processing and updating. If you have any thoughts on how our current process can be improved to help cope with the volume of data we are likely to receive, it would be greatly appreciated.