User:Danysan/Sandbox/Opinionated Planet.osm

This article is a stub. You can help OpenStreetMap by expanding it.

This page aims to be a brainstorming space for an opinionated distribution of Planet.osm.

Why

The openness, freedom and community focus of OSM are its strengths, but they can make life harder for data consumers:

  • Inexperienced (and sometimes experienced) mappers often introduce errors into the map unintentionally
  • It's easy to sneak in vandalism, and while other mappers usually find and fix such edits quickly, the exceptions can be very problematic
  • Deprecated features are usually not immediately mass-migrated to their suggested replacement, forcing consumers to check multiple tags to find the same data
  • OSM's loose schema can make life harder for consumers, forcing them to check multiple undocumented and often inconsistent tags
  • Good practice rules are suggested but not always enforced, which can lead to inconsistent data

It would be useful to simplify data consumers' lives by making available a distribution of the data that has been checked, cleaned, schema-normalised (and possibly enhanced) through opinionated filters and transformations.

Who

This proposal was born from this OSM community thread and takes inspiration from the Planet file of Meta's Daylight Map Distribution. Given that having a clean and safe dataset derived from OSM is in the best interest not only of Meta but of the whole OSM community and all OSM data consumers, this proposal aims to explore the feasibility, opportunities and obstacles of an in-house opinionated distribution of OSM data, where all stakeholders of such a project could join forces.

What

Brainstorming of possible operations to perform on the data:

Wrong element removal

Element editing to fix up tagging errors

  • remove tags with values that are clearly impossible (e.g. if the maximum legal speed in a country is 120 km/h, maxspeed=300 on a highway=residential is most likely a typo with an extra 0); see the sketch after this list
  • remove broken links in website=*, wikipedia=*, wikidata=* and wikimedia_commons=*
  • remove wikidata=* links that are clearly wrong because they point to a person (the mapper likely used wikidata=* instead of subject:wikidata=* or something similar), a tree species, …
  • fix coastlines to prevent the “flooding” effect when they get broken
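
As an illustration of the first item, here is a minimal sketch of such a rule using pyosmium (the Python bindings of the Osmium library mentioned under How below). The 150 km/h threshold, file names and class name are placeholders; a real implementation would need per-country thresholds and unit handling (e.g. mph values).

    import osmium

    MAX_PLAUSIBLE_KMH = 150  # placeholder; real rules would vary per country and road class

    class ImplausibleMaxspeedFixer(osmium.SimpleHandler):
        """Copies ways to the output, dropping maxspeed values above a plausibility threshold."""
        def __init__(self, writer):
            super().__init__()
            self.writer = writer

        def way(self, w):
            tags = {t.k: t.v for t in w.tags}
            ms = tags.get('maxspeed', '')
            if ms.isdigit() and int(ms) > MAX_PLAUSIBLE_KMH:
                del tags['maxspeed']  # e.g. "300" on a highway=residential: likely an extra 0
                self.writer.add_way(w.replace(tags=tags))
            else:
                self.writer.add_way(w)

    writer = osmium.SimpleWriter('cleaned.osm.pbf')
    ImplausibleMaxspeedFixer(writer).apply_file('planet.osm.pbf')
    # nodes and relations would need analogous pass-through handlers to produce a complete file
    writer.close()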

Schema normalization

Data enhancement

  • restore elements removed by changesets that are highly likely to be vandalism
    • an algorithm would be needed to decide how long a changeset should be quarantined before it is applied
    • this is a very complex task: it would mean selectively reverting changesets; tools like osmium apply-changes would help, but it would still be complex and computationally expensive
  • integration of data from Wikidata into OSM's schema for elements where wikidata=* is available (Wikidata entities are CC0-licensed, compatible with the ODbL); see the sketch after this list
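
For the last item, a minimal sketch of pulling Wikidata data for a single element, assuming the per-entity JSON export (Special:EntityData) is used; the bulk access options are discussed under How below. The User-Agent string and the choice of which statements to merge into OSM tags are placeholders.

    import requests

    def fetch_wikidata_entity(qid: str) -> dict:
        # Special:EntityData returns the full entity (labels, claims, sitelinks) as CC0 JSON
        url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
        resp = requests.get(url, headers={"User-Agent": "opinionated-planet-sketch/0.1"})
        resp.raise_for_status()
        return resp.json()["entities"][qid]

    # Example: fetch the entity referenced by an element's wikidata=* tag
    entity = fetch_wikidata_entity("Q64")  # Q64 = Berlin
    print(entity["labels"]["en"]["value"])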

How

Most basic rule-based checks could be executed with libraries like Osmium.
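
For example, a read-only pyosmium check that flags malformed wikipedia=* tags might look like this; the regular expression is a deliberate simplification of the documented "language:Article title" format, and the file name is a placeholder:

    import re
    import osmium

    # simplified version of the documented "language:Article title" format
    WIKIPEDIA_RE = re.compile(r"^[a-z]{2,3}(-[a-z]+)?:.+")

    class WikipediaTagCheck(osmium.SimpleHandler):
        """Reports nodes whose wikipedia=* tag does not follow the expected format."""
        def node(self, n):
            wp = n.tags.get('wikipedia')
            if wp and not WIKIPEDIA_RE.match(wp):
                print(f"node/{n.id}: malformed wikipedia tag: {wp!r}")

    WikipediaTagCheck().apply_file('extract.osm.pbf')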

For more efficient handling of the computing load, a parallel MapReduce approach could be more appropriate, for example with libraries for Apache Spark such as Atlas (https://github.com/osmlab/atlas).
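
As a sketch of what this could look like with Spark's Python API, assuming the planet has first been converted to Parquet with one row per way and a map<string,string> tags column (e.g. with a tool like osm-parquetizer), the implausible-maxspeed rule above could be expressed as a parallel job; the schema, threshold and file names are all hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("opinionated-planet-sketch").getOrCreate()

    # hypothetical input: one row per way, with a map<string,string> "tags" column
    ways = spark.read.parquet("planet-ways.parquet")

    cleaned = ways.withColumn(
        "tags",
        F.when(
            F.col("tags").getItem("maxspeed").cast("int") > 150,  # placeholder threshold
            F.map_filter("tags", lambda k, v: k != "maxspeed"),   # drop the implausible value
        ).otherwise(F.col("tags")),
    )
    cleaned.write.mode("overwrite").parquet("planet-ways-cleaned.parquet")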

For some of the above tasks rule-based processing will not be enough and AI-powered tools will be needed (machine-learning classification, NLP models, ...). The Daylight Map Distribution has publicly described some details of its ML-powered vandalism prevention pipeline (see its wiki page for details); given Meta's involvement in the OSMF, it would be great to see its participation in this project.

For tasks that require intersecting OSM data with Wikidata or other resources, other libraries will be needed (hypothesis: wikibrain). In general, Wikidata data can be accessed in one of three ways:

  • Download a dump of the DB and do anything you want with it [1]
    • high client cost (requires a lot of disk space, more than an OSM planet dump), high availability, high bandwidth (once downloaded, access is extremely fast)
  • Wikidata Query Service (WDQS), Wikidata's own SPARQL endpoint [2]
    • very powerful query language, low client cost (no need to download the full DB), high server cost, low availability, low bandwidth (unfeasible for very large quantities of data; pagination would be needed); see the sketch after this list
  • Linked Data Fragments (LDF) endpoint [3] [4]
    • somewhere in between the other two options: low client cost (no need to download the full DB dump), extremely basic query language, high bandwidth
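
As a sketch of the second option, a WDQS query can be issued with a plain HTTP request; the query below (fetching the official website, P856, of Berlin, Q64) is only an example of the kind of lookup that data enhancement would need, and the User-Agent string is a placeholder:

    import requests

    WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

    # example query: official website (P856) of Berlin (Q64)
    query = """
    SELECT ?website WHERE {
      wd:Q64 wdt:P856 ?website .
    }
    """
    resp = requests.get(
        WDQS_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "opinionated-planet-sketch/0.1"},
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        print(row["website"]["value"])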

A proof-of-concept implementation can be found at https://github.com/Danysan1/opinionated-planet.

When

TBD

Where

OSM infrastructure, details TBD

Notes