Wikidata RDF database

From OpenStreetMap Wiki

This page documents how to use an RDF database that contains both Wikidata and OpenStreetMap data, accessible with SPARQL queries.

All Wikidata SPARQL documentation is at Wikidata Query Help.

How OSM data is stored

All data is stored in a triplestore as subject-predicate-object statements. For example, a statement could say that relation #123 has a tag "name" set to "value". The subject (first) and predicate (second) parts of a statement must always be complete URIs, e.g. <https://www.openstreetmap.org/way/42> (way #42) or <https://wiki.openstreetmap.org/wiki/Key:wikidata> (tag wikidata). To make URIs more readable, we use prefixes, e.g. osmway:42 and osmt:wikidata. The object (third) part of a statement can be either a value (string, number, boolean, geo coordinate, ...) or, like the first two parts, a URI, so one statement's object can be another statement's subject. Each statement must end with a period, but when several statements share the same subject, the subject is written once and the predicate-object pairs are separated with semicolons.

prefix osmnode: <https://www.openstreetmap.org/node/>
prefix osmway: <https://www.openstreetmap.org/way/>
prefix osmrel: <https://www.openstreetmap.org/relation/>
prefix osmt: <https://wiki.openstreetmap.org/wiki/Key:>
prefix osmm: <https://www.openstreetmap.org/meta/>

osmnode:1234 osmm:type      'n' ;
             osmm:version   42 ;
             osmm:loc       'Point(32.1 44.5)'^^geo:wktLiteral ;  # longitude/latitude
             osmt:name      'node name tag' ;
             osmt:name:en   'node name:en tag' ;
             osmt:wikipedia <https://en.wikipedia.org/wiki/Article_name> ;
             osmt:wikidata  wd:Q34 .

osmway:2345  osmm:type      'w' ;
             osmm:version   42 ;
             osmm:isClosed  true ; # is this way an area or a line?
             osmt:name      'way name tag' ;
             osmt:name:en   'way name:en tag' ;
             osmt:wikipedia <https://en.wikipedia.org/wiki/Article_name> ;
             osmt:wikidata  wd:Q34 .

osmrel:3456  osmm:type      'r' ;
             osmm:version   42 ;
             osmm:has       osmway:2345 ;  # relation contains a way with blank label
             osmm:has:_     osmnode:1234 ; # relation contains a node with a non-ascii label
             osmm:has:inner osmrel:4567 ;  # relation contains a relation labelled as "inner"
             osmt:name      'relation name tag' ;
             osmt:name:en   'relation name:en tag' ;
             osmt:wikipedia <https://en.wikipedia.org/wiki/Article_name> ;
             osmt:wikidata  wd:Q34 .
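
To make the prefix mechanism concrete, here is a minimal Python sketch that expands a prefixed name into its full URI and back. The prefix table is copied from the declarations above; the helper names `expand` and `abbreviate` are hypothetical and are not part of any OSM or Wikidata tooling.

```python
# Prefix table copied from the prefix declarations on this page.
PREFIXES = {
    "osmnode": "https://www.openstreetmap.org/node/",
    "osmway": "https://www.openstreetmap.org/way/",
    "osmrel": "https://www.openstreetmap.org/relation/",
    "osmt": "https://wiki.openstreetmap.org/wiki/Key:",
    "osmm": "https://www.openstreetmap.org/meta/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'osmway:42' into a full URI.

    Only the first colon separates the prefix from the local part,
    so 'osmt:name:en' keeps 'name:en' as its local part.
    """
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


def abbreviate(uri: str) -> str:
    """Shorten a full URI with a known prefix; return it unchanged otherwise."""
    for prefix, base in PREFIXES.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri


if __name__ == "__main__":
    print(expand("osmway:42"))
    print(expand("osmt:name:en"))
    print(abbreviate("https://www.openstreetmap.org/relation/3456"))
```

The same substitution is what the query engine performs when it reads the prefix lines at the top of each query below.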

Simple Queries

Get started with this simple query, which lists OSM objects by type or tag. It is roughly comparable to an Overpass query.


prefix osmt: <https://wiki.openstreetmap.org/wiki/Key:> 
prefix osmm: <https://www.openstreetmap.org/meta/>  

# List all OSM objects with a place tag  

SELECT * WHERE {   
  # Limit to subjects that have an OSM type ('n', 'r', 'w').   
  # Replace ?osmType with a string 'r' to show only relations.   
  ?osmId osmm:type ?osmType .    
  
  # Limit to subjects that have an OSM tag `place`   
  # Replace ?place with a string 'city' to filter the tag value to `place=city`   
  ?osmId osmt:place ?place  . 

  # This will limit the results to places which do not have a `name:en` tag:
  # FILTER NOT EXISTS { ?osmId osmt:name:en ?nameen . }

} LIMIT 10
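
The comments in the query above describe two substitutions: replacing ?osmType or ?place with a quoted string, and enabling the FILTER NOT EXISTS line. The following Python sketch generates the query text with those substitutions applied. The function name `build_place_query` and its parameters are hypothetical; it only builds a string and does not contact any endpoint.

```python
def sparql_string(value: str) -> str:
    """Quote a Python string as a single-quoted SPARQL literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_place_query(osm_type=None, place=None, missing_name_en=False, limit=10):
    """Build the 'objects with a place tag' query from this page.

    osm_type: 'n', 'w' or 'r' to fix the object type, or None to keep ?osmType.
    place: a value such as 'city' to fix the tag value, or None to keep ?place.
    missing_name_en: if True, keep only objects without a name:en tag.
    """
    if osm_type is not None and osm_type not in ("n", "w", "r"):
        raise ValueError("osm_type must be 'n', 'w', 'r' or None")
    type_term = sparql_string(osm_type) if osm_type else "?osmType"
    place_term = sparql_string(place) if place else "?place"
    lines = [
        "prefix osmt: <https://wiki.openstreetmap.org/wiki/Key:>",
        "prefix osmm: <https://www.openstreetmap.org/meta/>",
        "",
        "SELECT * WHERE {",
        f"  ?osmId osmm:type {type_term} .",
        f"  ?osmId osmt:place {place_term} .",
    ]
    if missing_name_en:
        lines.append("  FILTER NOT EXISTS { ?osmId osmt:name:en ?nameen . }")
    lines.append(f"}} LIMIT {int(limit)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Relations tagged place=city that have no name:en tag.
    print(build_place_query(osm_type="r", place="city", missing_name_en=True))
```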

Quality Control Queries

Disambiguation Pages

prefix osmt: <https://wiki.openstreetmap.org/wiki/Key:>
prefix osmm: <https://www.openstreetmap.org/meta/>

SELECT ?osmId ?wdLabel ?osmType ?wd ?wpTag WHERE {
  # Limit to subjects that have an OSM type ('n', 'r', 'w').
  # Replace ?osmType with a string 'r' to show only relations.
  ?osmId osmm:type ?osmType .

  # Limit to subjects that have a tag called "wikidata"
  ?osmId osmt:wikidata ?wd .

  # Include Wikipedia tag if it exists
  OPTIONAL { ?osmId osmt:wikipedia ?wpTag . }
  
  # Optionally, find pl:* wikipedia tags (point to Polish wiki)
  # For performance, remove the "OPTIONAL {" and "}" part above
  #  FILTER( STRSTARTS(STR(?wpTag), 'https://pl.wikipedia')) .

  # Or, instead, only show Wikidata items that have a Polish WP article
  # You may also want to add ?article to the list of fields returned by the SELECT statement
  #  ?article schema:about ?wd .
  #  ?article schema:isPartOf <https://pl.wikipedia.org/>.

  # Optionally, restrict OSM objects to those that have a specific tag (and value)
  #  ?osmId osmt:place 'city' .        # exact string matching
  #  ?osmId osmt:name:en ?nameen .     # unless filtered, matches all objects with this tag
  #  FILTER( regex(?nameen, "A.b") )    # filter name:en to match a regex. Not very efficient

  # ?wd must be an "instance of" (P31) a disambiguation page (Q4167410), or of an item that is a direct or indirect subclass (P279) of it.
  ?wd wdt:P31/wdt:P279* wd:Q4167410 .

  # Pick the first available language for the wikidata entry (creates ?wdLabel value)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,fr,it,pl,ru,es,sv,nl" . }
}
LIMIT 10
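
The property path wdt:P31/wdt:P279* in the query above means "one instance-of step, followed by zero or more subclass-of steps". The Python sketch below mimics that matching rule on a tiny in-memory graph. Apart from Q4167410, all identifiers in the toy data are made up for illustration, and `matches_path` is a hypothetical helper, not how the query engine is implemented.

```python
def matches_path(item, target, instance_of, subclass_of):
    """Return True if item -P31-> c and c -P279*-> target for some class c.

    instance_of and subclass_of map an identifier to a list of identifiers.
    """
    frontier = list(instance_of.get(item, ()))  # the single required P31 step
    seen = set()
    while frontier:
        cls = frontier.pop()
        if cls == target:
            return True
        if cls in seen:  # guard against subclass cycles
            continue
        seen.add(cls)
        frontier.extend(subclass_of.get(cls, ()))  # zero or more P279 steps
    return False


# Toy data: every identifier except Q4167410 is hypothetical.
INSTANCE_OF = {
    "item_direct": ["Q4167410"],
    "item_indirect": ["class_child"],
    "item_other": ["class_unrelated"],
}
SUBCLASS_OF = {
    "class_child": ["class_mid"],
    "class_mid": ["Q4167410"],
}

if __name__ == "__main__":
    for item in ("item_direct", "item_indirect", "item_other"):
        print(item, matches_path(item, "Q4167410", INSTANCE_OF, SUBCLASS_OF))
```

Note that the instance-of step is mandatory: an item is not matched merely because it is itself the target class.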

Find nodes located too far from their Wikidata item

This query shows nodes located more than 50 km from the coordinates of the corresponding Wikidata item. See also the distance function.
NOTE: The database only has a location (osmm:loc) for recently changed nodes. A full refresh is needed to regenerate older data.

prefix osmt: <https://wiki.openstreetmap.org/wiki/Key:>
prefix osmm: <https://www.openstreetmap.org/meta/>

SELECT ?osmId ?wd ?wdLabel ?dist WHERE {
  ?osmId osmm:type 'n' .
  ?osmId osmm:loc ?osmLoc .
  ?osmId osmt:wikidata ?wd .
  ?wd wdt:P625 ?wdLoc .
  BIND(geof:distance(?wdLoc, ?osmLoc) as ?dist) 
  FILTER(?dist > 50)
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,fr,it,pl,ru,es,sv,nl" . }
}
ORDER BY DESC(?dist)
LIMIT 10
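
To sanity-check a single result outside the database, the distance test can be approximated locally. The sketch below parses two WKT points in the 'Point(longitude latitude)' form shown earlier and computes a great-circle (haversine) distance in kilometres. This is an approximation under stated assumptions: a spherical Earth with a mean radius of 6371 km, and helper names (`parse_wkt_point`, `distance_km`, `is_too_far`) that are hypothetical. The value computed by geof:distance may differ slightly.

```python
import math
import re

# Matches 'Point(lon lat)' with plain decimal numbers only.
_POINT_RE = re.compile(
    r"^\s*Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def parse_wkt_point(wkt: str):
    """Return (longitude, latitude) from a WKT point literal."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def distance_km(wkt_a: str, wkt_b: str) -> float:
    """Haversine distance in kilometres between two WKT points."""
    lon1, lat1 = parse_wkt_point(wkt_a)
    lon2, lat2 = parse_wkt_point(wkt_b)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_too_far(osm_loc: str, wd_loc: str, threshold_km: float = 50.0) -> bool:
    """Mirror the FILTER(?dist > 50) condition of the query."""
    return distance_km(wd_loc, osm_loc) > threshold_km


if __name__ == "__main__":
    print(distance_km("Point(32.1 44.5)", "Point(32.1 45.5)"))
```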

Places without a translation on OSM or Wikidata

prefix osmt: <https://wiki.openstreetmap.org/wiki/Key:>
prefix osmm: <https://www.openstreetmap.org/meta/>

# Find all OSM objects with place and wikidata tags that have no name:en tag and whose Wikidata item has no English label

SELECT ?osmId ?osmType ?place ?wd WHERE {
  ?osmId osmm:type ?osmType .
  ?osmId osmt:place ?place .
  ?osmId osmt:wikidata ?wd .
  FILTER NOT EXISTS { ?osmId osmt:name:en ?nameen . }

  OPTIONAL { ?wd rdfs:label ?label FILTER(lang(?label) = "en") }
  FILTER(!BOUND(?label))
}
LIMIT 10

Other quality control queries

Current limitations

  • Only OSM objects that have at least one tag, or at least one member (for relations), are included.
  • Only tags whose keys consist solely of Latin letters, digits and the symbols - : _ are included.
  • OSM geometry is not imported (e.g. no center point or bounding box), except for the osmm:isClosed (true/false) property on ways. Nodes with tags have an osmm:loc value, but it still needs to be backfilled for nodes that have not changed recently.
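
When a query for a particular tag returns nothing, it can help to check whether the tag could have been imported at all. The sketch below encodes one reading of the character restriction above as a regular expression. It assumes the rule applies to the tag key and that "Latin letters" means ASCII a-z and A-Z; the actual import filter may differ, and `is_supported_key` is a hypothetical helper.

```python
import re

# Assumption: ASCII letters, digits, and the symbols - : _ are the only
# characters allowed in an imported tag key.
_KEY_RE = re.compile(r"[A-Za-z0-9_:\-]+")


def is_supported_key(key: str) -> bool:
    """Return True if the tag key uses only the allowed characters."""
    return _KEY_RE.fullmatch(key) is not None


if __name__ == "__main__":
    for key in ("name:en", "addr:street", "name:ру", "my key"):
        print(key, is_supported_key(key))
```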