Microgrants/Microgrants 2020/Proposal/Road Completion project/Report


Final Report for the Road Completion project

Status and Report Type

type = final
status = submitted

This is the final report for the Road Completion project. Given the limited amount of time (10 working days), I did not write an interim report.

Methods and Activities

Tell us about what happened during the planning, execution and follow-up stages of your project. Please share stories of both your triumphs and your challenges.

The Road Completion project has been on OpenStreetMap Belgium's mind for quite a while. Everything started with a diary post written in 2016 by Joost Schouppe, one of our board members.
Ben Abelshausen, another of our board members, presented the idea at the FOSS4G Belgium (2017) and State of the Map (2018) conferences. The main goal is to have "a verified, quality checked and complete road network in Belgium in OpenStreetMap in a sustainable way" by comparing the official road network datasets with the road network in OpenStreetMap. The plan was that, once the road network is (close to) complete in OpenStreetMap, we would set up a partnership with the official data sources to improve both our data and theirs.

In 2018, we convinced the Brussels Region administration to invest some money in building a prototype for the Road Completion project during Open Summer of Code (organized by Open Knowledge Belgium). The result was a prototype of the data process and a user interface to display its results. That allowed us to validate our idea for the data process, which is based on the work done by Matt Greene for Mapbox.

Unfortunately, the data process required a lot of processing power and took a long time to run, and we did not have the time to keep maintaining it.

In 2020, when the OSMF opened the 2020 microgrants, we saw the opportunity to finally revive the Road Completion project.
The goal was to run the data process for the 3 datasets available in Belgium and, in doing so, to make the data process generic enough to run on any road network dataset in the world. Based on the work done during Open Summer of Code, we knew that the data process worked but needed to be improved. We also decided to focus on the data process alone and to rely on existing tools for the mapping part: it does not make much sense to spend time building a new tool when an existing one does the job. We decided to go for MapRoulette.
The prototype data process took hours to run. My main focus was to reduce that time as much as possible, and also to make the process run on GitHub Actions so that we do not have to manage a server for it.
And it worked: the data process is now optimized (for example, the data process for Flanders, Belgium, which used to take several hours, now runs in 30 minutes). The process uses GitHub Actions to run whenever the GitHub repository is updated and also on a schedule (once a week).
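To make the idea of the comparison concrete, here is a minimal sketch of a geometry-only check. It is not the code from the repository: the function names, the 10 m tolerance, and the 50% threshold are assumptions chosen for illustration, and it expects coordinates that are already projected to a planar system in metres. An official road is flagged as "missing" when at least half of its vertices are farther than the tolerance from every OpenStreetMap segment.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_network(p, network):
    """Smallest distance from p to any segment of any line in the network."""
    return min(
        (
            point_segment_distance(p, line[i], line[i + 1])
            for line in network
            for i in range(len(line) - 1)
        ),
        default=math.inf,  # an empty network is infinitely far away
    )


def find_missing(official, osm, tolerance=10.0, min_share=0.5):
    """Official lines with at least min_share of vertices beyond tolerance."""
    missing = []
    for line in official:
        if not line:
            continue
        far = sum(1 for p in line if distance_to_network(p, osm) > tolerance)
        if far / len(line) >= min_share:
            missing.append(line)
    return missing


# Hypothetical data: two official roads, one of which has an OSM way 3 m away.
official_roads = [[(0, 0), (100, 0)], [(0, 200), (100, 200)]]
osm_roads = [[(0, 3), (100, 3)]]
missing_roads = find_missing(official_roads, osm_roads)
```

This brute-force version checks every vertex against every segment, which is fine for a sketch but far too slow for a regional dataset; a real process would need a spatial index or tiling to keep the run time manageable.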

Now that the data process is running and updates itself on a regular basis, the "missing" roads have to be mapped in OpenStreetMap. A MapRoulette challenge was created for each dataset, and the OpenStreetMap Belgium community took over to map the missing roads. Each missing road is a task in the MapRoulette challenge (road geometry + road properties).
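As an illustration of "one missing road, one task", the sketch below splits a GeoJSON FeatureCollection into one single-feature collection per line. To my understanding MapRoulette accepts such line-by-line GeoJSON as a challenge source, but whether the project uses this format is an assumption; the function name and the sample properties are hypothetical.

```python
import json


def to_line_by_line(feature_collection):
    """Serialize each feature as its own one-line FeatureCollection."""
    lines = []
    for feature in feature_collection.get("features", []):
        task = {"type": "FeatureCollection", "features": [feature]}
        # Compact separators keep every task on a single line.
        lines.append(json.dumps(task, separators=(",", ":")))
    return "\n".join(lines)


# Hypothetical difference file with two missing roads.
difference = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Example street A"},
            "geometry": {"type": "LineString", "coordinates": [[4.35, 50.85], [4.36, 50.85]]},
        },
        {
            "type": "Feature",
            "properties": {"name": "Example street B"},
            "geometry": {"type": "LineString", "coordinates": [[4.40, 50.80], [4.41, 50.81]]},
        },
    ],
}
tasks_text = to_line_by_line(difference)
```

Each line of `tasks_text` carries the geometry and the properties of exactly one road, which is what a mapper needs to see in a task.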

Outcome

What is the most important and valuable result of your project? What did it change, solve or accomplish? If the goal was a mapping goal, please provide before and after images. If this was a software project, please provide a link to the project.

Our data process is now available on GitHub, and anyone can use it to compare any road network dataset to the OpenStreetMap road network.
Since the goal is to make sure we have all the roads in OpenStreetMap, the most valuable result is the file containing the difference between the official datasets and OpenStreetMap. The difference is available as a GeoJSON file (example: Flanders, Belgium), and a MapRoulette challenge is linked directly to that GeoJSON file, so we can update the challenge very easily. The administration of Flanders, Belgium, has already shown interest in the anomalies we find in their datasets (roads that no longer exist, roads that are missing from their datasets, ...).

Link to the GitHub repository: https://github.com/osmbe/road-completion
Link to a MapRoulette challenge: https://maproulette.org/browse/challenges/14645

Detail Report

Please report on your original project targets, use the below table to:

  • List each of your original targets from your project plan.
  • List the actual outcome that was achieved.
  • Explain how your outcome compares with the original target. Did you reach your targets? Why or why not?
Target outcome: Build a generic road network comparison tool
Achieved outcome: Tool available for any road network dataset
Explanation: Since we have 3 official datasets to cover the whole of Belgium (1 per region), I can already confirm that the tool is generic enough to work on any dataset. The 3 official datasets for Belgium indeed have completely different structures.
Source code is available in our GitHub repository: https://github.com/osmbe/road-completion

Target outcome: Compare OSM road network to Brussels official data
Achieved outcome: Fewer than 100 "missing" roads/paths/...
MapRoulette challenge created.
Explanation: Some false positives (outdated data in the official dataset); the other roads have meanwhile been fixed.
Documentation, process, and result are available in our GitHub repository: https://github.com/osmbe/road-completion/tree/master/data/belgium/brussels

Target outcome: Compare OSM road network to Flanders official data
Achieved outcome: Around 9000 "missing" roads/paths/...
MapRoulette challenge created.
Explanation: Work in progress!
Documentation, process, and result are available in our GitHub repository: https://github.com/osmbe/road-completion/tree/master/data/belgium/flanders

Target outcome: Compare OSM road network to Wallonia official data
Achieved outcome: Around 3000 "missing" roads/paths/...
MapRoulette challenge created.
Explanation: Work in progress!
Documentation, process, and result are available in our GitHub repository: https://github.com/osmbe/road-completion/tree/master/data/belgium/wallonia

Learning

Projects do not always go according to plan. Sharing what you learned can help you and others plan similar projects in the future. Help the movement learn from your experience by answering the following questions:

  • What worked well?
  • What did not work so well?
  • What would you do differently next time?

To be honest, I'm really happy with the current status of the data process.
It's not perfect (it currently compares only geometry, not properties), but I was quite happy when I realized that my process was so much faster than the prototype built earlier.
I also consider it an achievement that the process runs on GitHub Actions: no server to maintain (and pay for), an automatic run every time I update the process, scheduled updates, and result files automatically uploaded to the repository.

I'm also really happy that the OpenStreetMap Belgium community jumped on the MapRoulette challenges. The process is automated, and I'm confident the missing roads will be mapped quite quickly.

On a more negative note, working with external datasets is not always easy: you need to find where to download the data, find the documentation (if there is any), and then understand the data. We are quite lucky in Belgium, since each region has its own open-data portal where you can download the datasets and find documentation.
That being said, 1 of the 3 regions only offers its data behind a request form (with manual validation), so we could only automate the full process for 2 regions out of 3.

The quality of the data is also "fluctuating".
We had to filter out all the roads without a name from the official datasets to get a manageable number of roads to check (for instance, 32000 "roads" missing in Wallonia compared to 3000 if we only take the roads that have a name).
Another issue is that some driveways carry the name of the main street (and so appear as missing roads in the process).
We're lucky to have high-quality open data available in Belgium, but every dataset needs to be checked carefully before being used.
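The name filter described above can be sketched as follows. This is only an illustration under assumptions: the property key `name` and the function name are hypothetical, and each official dataset is likely to store street names under its own attribute.

```python
def keep_named(feature_collection, name_key="name"):
    """Keep only the features whose name property is a non-empty string."""
    named = []
    for feature in feature_collection.get("features", []):
        properties = feature.get("properties") or {}
        name = properties.get(name_key)
        # Treat missing, null, and whitespace-only names as "no name".
        if isinstance(name, str) and name.strip():
            named.append(feature)
    return {"type": "FeatureCollection", "features": named}


# Hypothetical input: one named road, three roads without a usable name.
roads = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Example street"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "  "}, "geometry": None},
        {"type": "Feature", "properties": {"name": None}, "geometry": None},
        {"type": "Feature", "properties": None, "geometry": None},
    ],
}
named_roads = keep_named(roads)
```

A filter like this does not catch the driveway problem, since those segments do carry a name; they still need a manual check or an additional rule.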

Grant funds used

Please describe how much grant money you spent for approved expenses, and tell us what you spent it on.

The 5000 EUR from the microgrant was used for 10 days of work to build the data process, run it for the 3 Belgian regions, create the 3 MapRoulette challenges, and update the documentation.