2015 Sierra Leone village mapping data - data validation

From OpenStreetMap Wiki

Work so far

The data you’ll be working with has undergone several steps of processing and cleaning. First, the original CSV file was cleaned to ensure consistent spelling and data entry. It was then refined to remove fields not relevant to OpenStreetMap.

The CSV was then converted into a shapefile by importing it into QGIS. Each village is represented by a data point.

Using GIS, the village data was sorted into two separate shapefiles containing: 1) villages new to OpenStreetMap; and 2) villages already existing within OpenStreetMap. This was carried out by:

● using the QuickOSM plugin to download current OSM data for villages, hamlets and towns.
● running a 200m buffer around each of the OSM data points.
● using the ‘select by location’ tool to select data points within the village survey data that intersected with the OSM data points.
● extracting these points to form the ‘existing villages’ shapefile.
● using the ‘difference’ tool to subtract the ‘existing villages’ shapefile from the village survey data, creating the ‘new villages’ shapefile.
● more information on precisely how this was done (step by step) can be found here: https://wiki.openstreetmap.org/wiki/2015_Sierra_Leone_village_mapping_data_-_data_processing
  • Note: during processing, the data was reprojected to a different coordinate system (AZM) so that the buffer distance could be given in metres (QGIS buffers in degrees when the CRS is WGS84). The AZM files are kept in their own folder; however, all required data has now been reprojected back into WGS84. The buffers themselves remain in AZM, but are not needed for the rest of the task.
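
For readers who want to see the logic of the split outside QGIS, the buffer, ‘select by location’ and ‘difference’ steps above can be sketched in plain Python. This is only an illustration, not the procedure that was actually run: it swaps the AZM reprojection and buffer for a great-circle (haversine) distance test on WGS84 coordinates, and the function names and the "lat"/"lon" keys are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_survey(survey, osm_places, radius_m=200.0):
    """Split survey points into (existing, new).

    A survey point counts as 'existing' if any OSM place lies within
    radius_m of it, which mirrors a 200 m buffer plus an intersect test.
    The dictionary keys "lat" and "lon" are hypothetical.
    """
    existing, new = [], []
    for village in survey:
        near_osm = any(
            haversine_m(village["lat"], village["lon"], p["lat"], p["lon"]) <= radius_m
            for p in osm_places
        )
        (existing if near_osm else new).append(village)
    return existing, new
```

The loop compares every survey point with every OSM point, which is fine for a sketch but slow for large datasets; QGIS uses spatial indexes for the same job.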

During this processing, it became apparent that there are errors within the data (e.g. duplicated data points). The data therefore requires a second round of cleaning and validation, as explained in the task below.

The data has been split into gridded squares (akin to the HOT OSM tasking manager). A pre-scan of the grid has been done, and squares that require further cleaning have been identified.

The task

The task is to investigate and clean the two created shapefiles using the files provided.

The Excel spreadsheet, Grids_to_check, contains a list of squares that require further cleaning.

It is recommended that you focus on cleaning one of the files, i.e. new villages or existing villages. Please also make sure you edit the CSV file, and not the shapefile (editing the latter tends to be extremely tedious). The recommended way to clean the data is as follows:

1) Open the ‘Grids_to_check’ spreadsheet in Google Sheets (this is important to ensure others can see what you’re doing!). Choose a set of 10 squares that are currently unassigned and incomplete. Add your name against these squares and take note of which ones you’ll be working on.
2) Within the validation folder, download ‘Data_validation_grid.qgs’. Open QGIS and open this project file.
- Download and add the data shown on the screenshot.
- You will also need to install the OpenLayers plugin to add the ‘Bing Aerial’ maps.
3) Once you’ve got your data ready, make sure it is styled so that your shapefile (either new or existing villages) is easy to distinguish from the pre-existing OSM data. For example, all OSM data is coloured red, new villages are yellow, and existing villages are purple.
Note: Although you will be concentrating on either new or existing villages, it may still be worthwhile to have the other shapefile visible as above to help with any comparisons - just don’t try to edit both.
4) Depending on whether you are assessing new or existing villages, open up the respective CSV file found within the data validation folder (i.e. MGB_village_data_new_WGS84.csv or MGB_village_data_alreadyinOSM.csv). This is what you will be editing rather than the shapefile itself.

The rest of the task is made from simple repeatable steps:

5) Navigate to your first square. Note: if the grid labels are not visible, right-click on the grid shapefile, go to Properties and ensure that the labels are displayed.
6) Analyse the data. Why has this square been flagged? What’s wrong with the data? What do I need to do? Look at the data!
A few examples of duplications are included below. You’re mainly looking out for duplication within the data, or for irregular additions (e.g. lone villages in the middle of forests!). Be aware, however, that the aerial imagery may be outdated and thus not necessarily reflect what’s now on the ground. You will need to use common sense and your best guess.
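
If you would like a quick list of points worth a closer look before inspecting a square by eye, a short script can flag pairs of points that sit very close together. The sketch below is only an aid to the visual check, not a replacement for it: the 100 m threshold, the function names and the "name", "lat" and "lon" keys are all assumptions, and a flagged pair may still be two genuinely separate villages.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def approx_distance_m(a, b):
    """Equirectangular approximation, adequate over a few hundred metres."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    return EARTH_RADIUS_M * math.hypot(dx, dy)


def candidate_duplicates(villages, max_distance_m=100.0):
    """Return (name_a, name_b, distance_m, same_name) for each close pair.

    max_distance_m is a hypothetical threshold chosen for illustration.
    Names are compared case-insensitively with surrounding spaces removed.
    """
    pairs = []
    for a, b in combinations(villages, 2):
        distance = approx_distance_m(a, b)
        if distance <= max_distance_m:
            same_name = a["name"].strip().lower() == b["name"].strip().lower()
            pairs.append((a["name"], b["name"], round(distance), same_name))
    return pairs
```

Pairs flagged with the same name are the most likely duplicates; pairs with different names still need the judgement described in the examples below.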

Duplication Examples

a. Firstly, visually check the data. Are the points in the same village, or are they simply close to one another? Unlike what is typical in the UK, separate villages may exist in very close proximity to each other (e.g. 10 metres apart!).
b. Secondly, using the ‘i’ tool within QGIS, check the village names and details.
i. If the names are the same but there is a separation between the village/household, you will need to choose which point best reflects the village.
ii. If the names are different but they are within the same village, you will have to use your best guess. Using other data available for each village e.g. number of households may help you make an educated guess.
c. To delete the data, identify the data point within the CSV file you have open (e.g. MGB_village_data_alreadyinOSM.csv), and edit the data point(s) by updating or deleting the row.
- You may want to use CTRL-F and search the data by name. Please note, there may be more than one village with the same name so make sure you have the right one by checking against other values within the data (e.g. number of households).
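
The name search and cross-check in step c can also be done with a few lines of Python instead of CTRL-F. This is a minimal sketch under stated assumptions: the column names "name" and "households" are hypothetical (check the real header row of your CSV), and the helper functions are invented for this example. The removal step refuses to act unless exactly one row matches, so that a shared village name cannot cause the wrong row to be deleted.

```python
import csv


def load_rows(path):
    """Read the CSV into a list of dictionaries, one per village."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def find_rows(rows, name):
    """Return (index, row) for every row whose name matches, ignoring case."""
    wanted = name.strip().lower()
    return [(i, r) for i, r in enumerate(rows) if r["name"].strip().lower() == wanted]


def drop_row(rows, name, households):
    """Return a copy of rows without the single row matching both values."""
    matches = [
        i for i, r in find_rows(rows, name)
        if r["households"].strip() == str(households)
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    index = matches[0]
    return rows[:index] + rows[index + 1:]


def save_rows(path, rows, fieldnames):
    """Write the cleaned rows back out with the original column order."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Saving to a new file name rather than overwriting the shared CSV makes it easier to review a change before it replaces the original.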


Irregular Addition Examples

a. Visually check the data. Explore the neighbouring area. Download other externally hosted mapping layers (e.g. Google Streets) to find evidence that the village exists.
b. Make your best educated guess. If in doubt, leave the data in by default.

Repeat the above process for each square you’re responsible for. Once you’ve completed each square, make sure you sign it off as complete on the Grids_to_check document.

Keep going with a set of new squares until they are all complete.

Next steps

The next step will be to update OSM with the two CSVs. Due to the nature of the data, the ‘new villages’ should be a simple case of merging the current OSM data with an .osm file version of the updated CSV. The ‘existing villages’ may require the OSM data to be updated manually within iD or JOSM. Stand by for further instructions.
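
To give a rough idea of what an .osm file version of the CSV might look like, the sketch below writes village rows out as OSM XML nodes using Python’s standard library. It is not the agreed import procedure: the "name", "lat" and "lon" keys, the place=village tag and the generator string are all assumptions, and the output should be checked in JOSM before anything is uploaded. Negative node ids follow the usual convention for objects that do not yet exist in OSM.

```python
import xml.etree.ElementTree as ET


def rows_to_osm_xml(rows):
    """Build an OSM XML string with one new node per village row."""
    root = ET.Element("osm", version="0.6", generator="village-csv-sketch")
    for offset, row in enumerate(rows, start=1):
        node = ET.SubElement(
            root,
            "node",
            id=str(-offset),  # negative id marks a not-yet-uploaded object
            visible="true",
            lat=str(row["lat"]),
            lon=str(row["lon"]),
        )
        # The tagging below is an assumption made for illustration only.
        ET.SubElement(node, "tag", k="place", v="village")
        ET.SubElement(node, "tag", k="name", v=row["name"])
    return ET.tostring(root, encoding="unicode")
```

Any further attributes from the survey would need an agreed mapping to OSM tags before being added in the same way.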