Wikidata+OSM SPARQL query service

From OpenStreetMap Wiki
Jump to: navigation, search
See short tutorial video
About this image

Short video on getting started

This page documents how to improve OSM data by cross-referencing it with Wikidata. The service is an RDF database that contains both Wikidata and OpenStreetMap data, accessible with SPARQL queries. All Wikidata SPARQL documentation is at Wikidata Query Help.

See the short getting-started video (right) for basic concepts

Simple Queries

Get started with this simple query to list OSM objects by a particular type or tag. (Compare it with a similar Overpass Query).

# List all OSM objects with a place tag  

  # Limit to subjects that have an OSM type ('n', 'r', 'w').   
  # Replace ?osmType with a string 'r' to show only relations.   
  ?osmId osmm:type ?osmType .    
  # Limit to subjects that have an OSM tag `place`   
  # Replace ?place with a string 'city' to filter the tag value to `place=city`   
  ?osmId osmt:place ?place  . 

  # This will limit the results to places which do not have a `name:en` tag:
  # FILTER NOT EXISTS { ?osmId osmt:name:en ?nameen . }

} LIMIT 10
Run it

How OSM data is stored

All data is stored in a Triplestore as subject predicate object. statements. For example, a statement relation #123 has a tag "name" set to "value" can be represented as subject (relation #123), predicate (a tag "name"), and object ("value"). Both the subject and the predicate parts of the statement must always be a complete URI, e.g. <> (node #42), or <> (tag wikidata). To make the URI more readable, we use prefixes to shorten it, e.g. osmway:42 and osmt:wikidata. The object part of the statement can be either a value (string/number/boolean/geo coordinate/...), or, just like the first two parts, a URI. This way one statement's object could be another statement's subject, creating a linked graph. In SPARQL, each statement must end with a period, but if multiple statements have the same subject, we can separate them with a semicolon. The prefixes are defined in the engine, but could be specified manually for clarity.

Note: Both Wikidata and OSM data is stored together, in a giant virtual "subject/predicate/object" table. So the subject could be osmnode:1234 and wd:Q42. The easiest way to filter just the OSM objects is to use osmm:type predicate - all OSM subjects will have one.

prefix osmnode: <>
prefix osmway: <>
prefix osmrel: <>
prefix osmt: <>
prefix osmm: <>

osmnode:1234 osmm:type      'n' ;
             osmm:version   42 ;
             osmm:loc       'Point(32.1 44.5)'^^geo:wktLiteral ;  # longitude/latitude
             osmt:name      'node name tag' ;
             osmt:name:en   'node name:en tag' ;
             osmt:wikipedia <> ;
             osmt:wikidata  wd:Q34 .

osmway:2345  osmm:type      'w' ;
             osmm:version   42 ;
             osmm:loc       'Point(32.1 44.5)'^^geo:wktLiteral ;  # way's centroid
             osmm:isClosed  true ; # is this way an area or a line?
             osmt:name      'way name tag' ;
             osmt:name:en   'way name:en tag' ;
             osmt:wikipedia <> ;
             osmt:wikidata  wd:Q34 .

osmrel:3456  osmm:type      'r' ;
             osmm:version   42 ;
             osmm:has       osmway:2345 ;  # relation contains a way with blank label
             osmm:has:_     osmnode:1234 ; # relation contains a node with a non-ascii label
             osmm:has:inner osmrel:4567 ;  # relation contains a relation labelled as "inner"
             osmt:name     'way name tag' ;
             osmt:name:en  'way name:en tag' ;
             osmt:wikipedia <> ;
             osmt:wikidata  wd:Q34 .

Other tags:

  • osmm:badkey "@Sammelmütze" - any unsupported tag names will be stored here as "osmm:badkey" objects, without their own values. For a tag to be supported, it must match /^[0-9a-zA-Z_]([-:0-9a-zA-Z_]*[0-9a-zA-Z_])?$/ regex -- contain only Latin letters, digits, and underscore. All characters except the first and last may also be "-" and ":".
  • osmm:loc:error "Error: ..." - whenever node or way's location could not be calculated, this property contains relevant error info.

Quality Control Queries

Here are a few example queries to find potential errors in OSM data.

Disambiguation Pages

edit this query
SELECT ?osmId ?wdLabel ?wd ?wpTag WHERE {
  # Limit to subjects that have an OSM type ('n', 'r', 'w').
  # Replace ?osmType with a string 'r' to show only relations.
  ?osmId osmm:type ?osmType .

  # Limit to subjects that have a tag called "wikidata"
  ?osmId osmt:wikidata ?wd .

  # Include Wikipedia tag if it exists
  OPTIONAL { ?osmId osmt:wikipedia ?wpTag . }
  # Optionally, find pl:* wikipedia tags (point to Polish wiki)
  # For performance, remove the "OPTIONAL {" and "}" part above
  #  FILTER( STRSTARTS(STR(?wpTag), 'https://pl.wikipedia')) .

  # Or, instead, only show Wikidata items that have a Polish WP article
  # You may also want to add ?article to the list of fields returns by SELECT statement
  #  ?article schema:about ?wd .
  #  ?article schema:isPartOf <>.

  # Optionally, restrict OSM objects to those that have a specific tag (and value)
  #  ?osmId osmt:place 'city' .        # exact string matching
  #  ?osmId osmt:name:en ?nameen .     # unless filtered, matches all objects with this tag
  #  FILTER( regex(?nameen, "A.b") )    # filter name:en to match a regex. Not very efficient

  # ?wd must be "instance of" disambig, or instance of an item which is a subclass(es) of it.
  ?wd wdt:P31/wdt:P279* wd:Q4167410 .

  # Pick the first available language for the wikidata entry (creates ?wdLabel value)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,de,fr,it,pl,ru,es,sv,nl" . }
Run it

Find nodes located too far from Wikidata's

This query shows nodes whose location is more than 50km from where corresponding Wikidata item is located. See also distance function.
NOTE: Database only has location (osmm:loc) for the recently changed nodes. A full refresh is needed to regenerate older data.

SELECT ?osmId ?wd ?wdLabel ?dist WHERE {
  ?osmId osmm:type 'n' .       # Nodes only
  ?osmId osmm:loc ?osmLoc .    # With location value
  ?osmId osmt:wikidata ?wd .   # With "wikidata" tag
  ?wd wdt:P625 ?wdLoc .        # Corresponding Wikidata must also have location
  BIND(geof:distance(?wdLoc, ?osmLoc) as ?dist)   # Calculate distance between OSM's and WD's
  FILTER(?dist > 50)           # Filter by minimum difference
  # For the Wikidata entry, get its label in current user's language, or in the first available
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,de,fr,it,pl,ru,es,sv,nl" . }
Run it

Places without a translation on OSM or Wikidata

# Find all OSM objects with wikidata tag that do not have the name:en tag, and no English label on Wikidata item

SELECT ?osmId ?osmType ?place ?wd WHERE {
  ?osmId osmm:type ?osmType .
  ?osmId osmt:place ?place .
  ?osmId osmt:wikidata ?wd .
  FILTER NOT EXISTS { ?osmId osmt:name:en ?nameen . }

  OPTIONAL { ?wd rdfs:label ?label FILTER(lang(?label) = "en") }
Run it

Places with incorrect Wikipedia/wikidata tags

Shows OSM objects that have an invalid wikidata tag.

# List OSM objects whose Wikipedia tag does not match Wikidata tag
SELECT ?osmId ?wdLabel ?wpFromOSM ?wpFromWd ?wd WHERE {
  ?osmId osmm:type ?osmType .        # osm object must have an OSM type (n/w/r)
  ?osmId osmt:wikidata ?wd .         # osm object must have a "wikidata" tag
  ?osmId osmt:wikipedia ?wpFromOSM . # osm object must have a "wikipedia" tag

  # wikipedia tag link must not be listed in the corresponding wikidata object
  FILTER NOT EXISTS { ?wpFromOSM schema:about ?wd . }

    # if Wikidata has another link in the same domain, show it
    ?wpFromWd schema:about ?wd .
    FILTER( SUBSTR( STR(?wpFromOSM), 0, 25 ) = SUBSTR( STR(?wpFromWd), 0, 25 ) )
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
Run it

Other quality control queries

Current limitations

  • Only includes OSM objects that have at least one tag or at least one member (for relations)
  • The database Only stores values for simple tag keys - those that only English letters, digits and symbols - : _. Tag keys that don't satisfy this criteria, e.g. those that contain spaces or accented characters, are stored as osmm:badkey "bad tag key", without tag values.
  • OSM geometry info is not imported, e.g. no center point or bounding box, except for osmm:isClosed (true/false) property for ways. Nodes with tags have an osmm:loc value, but it needs to be backfilled for nodes that haven't changed recently.