Wikidata+OSM SPARQL query service

From OpenStreetMap Wiki
(Redirected from Wikidata RDF database)
Jump to: navigation, search


See short tutorial video
About this image

Short video on getting started

Wikidata+OSM database allows users to query data from both OpenStreetMap and Wikidata simultaneously, using a SQL-like language. The service also allows users to edit the map based on a query result. The service is an RDF database, accessible with SPARQL queries. All Wikidata SPARQL documentation is at Wikidata Query Help. Similarly to Overpass-Turbo, this service can be used directly from JOSM (install Wikipedia plugin).

The short getting-started video (right) provides a quick intro. Try it at http://88.99.164.208/wikidata/ - use the examples button.

Current Challenges

See Wikipedia Improvement Tasks

Hello (OSM SPARQL) World

Here is a simple example of using SPARQL, the Wikibase RDF format and OSM RDF format. This query lists a few OSM place objects, and can be executed by clicking "run it" underneath it. (Compare it with a similar Overpass Query)
There are many more query examples on how to cross-reference OSM data with Wikidata, use Wikipedia page popularity, perform quality control queries, and other.

# A sample query to list all OSM objects with a place tag.
# The * will show all variables that are used in the WHERE clause
SELECT * WHERE {
  # The first expression matches the three columns of the Triple Store with:
  #   "find any subject that has an OSM 'place' tag, and any value of that tag"
  #
  # Question mark represents a variable. ?osmId becomes a URI for an OSM object, e.g.
  #    <http://www.openstreetmap.org/node/2681940767>  (Everest)
  # The same URI may also be written with a prefix - osmnode:2681940767
  # All OSM tags use osmt: prefix. Using osmt:place constant only
  # matches OSM object with a place tag. The ?place variable will
  # become the value of the place tag. You can use a constant instead of a variable,
  # e.g. substitute ?place with a constant string 'city'.

  ?osmId osmt:place ?place .

  # osmm: prefix means "meta values", e.g. OSM object type, location (loc), version, ...
  # Uncomment to limit to a specific type of an OSM object ('n', 'r', 'w').
  #   ?osmId osmm:type 'r' .

  # osmm:loc is a centroid point for most OSM objects. If included, the results
  # can be shown on a map by switching to the map view (right under the query)
  #   ?osmId osmm:loc ?loc .

  # The FILTER expressions allow complex filtering beyond a simple "must match".
  #   FILTER (?place = 'town' || regex(?place, '[a-z][0-9]', 'i'))

  # This will limit results to places which do not have a `name:en` tag.
  #   FILTER NOT EXISTS { ?osmId osmt:name:en ?nameen . }

  # For OSM objects with the wikidata tag, find the English name of that object in Wikidata
  # if it exists. Also, get what type of the object it is (P31 = instance of),
  # and the label for that type, also if exists in English. For labels and descriptions,
  # you may instead use a service that picks first available language based on a list.
  #   ?osmId osmt:wikidata ?wd .
  #   ?wd wdt:P31 ?type .
  #   OPTIONAL { ?wd rdfs:label ?label .  FILTER (lang(?label) = "en") }
  #   OPTIONAL { ?type rdfs:label ?typelabel .  FILTER (lang(?typelabel) = "en") }
}
# Similar to SQL, queries can use aggregations, subqueries, and limit the results
LIMIT 10
Run it (edit query)

Using from JOSM

This service can be used directly from JOSM editor.

  • Install Wikipedia plugin and enable expert mode (in the view menu)
  • Use File / Download data / Download from Wikosm API tab.

JOSM will download all OSM object IDs it finds in the query result. All other values will be ignored, and should not be requested to reduce server load. There are several magic keywords to help with the query:

  • {{boxParams}}: use it with wikibase:box service. Your map selection will be converted to two lines for cornerWest and cornerEast parameters.
  • {{center}}: the center of the map selection will be inserted as a point constant.

Examples:

OSM objects are not humans, but humans could be their subject, so subject:wikidata should be used. This query finds objects tagged with human in wikidata tag within the selected area. This query is very slow if used on a large area.
# Find objects tagged as humans in a selected area (very slow on large areas)
SELECT ?osmid WHERE {
  SERVICE wikibase:box {
    ?osmid osmm:loc ?location .
    {{boxParams}}
  }
  ?osmid osmt:wikidata ?wd .
  ?wd wdt:P31/wdt:P279* wd:Q5 .
}
This query use s a simple calculation and filter to find objects that form a ring around the selection's center point.
# Find educational places at least 2, and at most 3 km from the center of the selection
SELECT ?osmid WHERE {
  VALUES ?amenity { "kindergarten" "school" "university" "college" }
  ?osmid osmt:amenity ?amenity ;
         osmm:loc ?loc .
  BIND(geof:distance({{center}}, ?loc) as ?distance)
  FILTER(?distance > 2 && ?distance < 3)
}

Same as above, but using "wikibase:around" service.

# Find educational places at least 2, and at most 3 km from the center of the selection
SELECT ?osmid WHERE {
  VALUES ?amenity { "kindergarten" "school" "university" "college" }
  ?osmid osmt:amenity ?amenity .
  SERVICE wikibase:around { 
      ?osmid osmm:loc ?loc . 
      bd:serviceParam wikibase:center {{center}} . 
      bd:serviceParam wikibase:radius "3" . 
      bd:serviceParam wikibase:distance ?distance .
  } 
  FILTER(?distance > 2)
}

How OSM data is stored

All data is stored in a Triplestore as subject predicate object. statements. For example, a statement relation #123 has a tag "name" set to "value" can be represented as subject (relation #123), predicate (a tag "name"), and object ("value"). Both the subject and the predicate parts of the statement must always be a complete URI, e.g. <https://www.openstreetmap.org/way/42> (node #42), or <https://wiki.openstreetmap.org/wiki/Key:wikidata> (tag wikidata). To make the URI more readable, we use prefixes to shorten it, e.g. osmway:42 and osmt:wikidata. The object part of the statement can be either a value (string/number/boolean/geo coordinate/...), or, just like the first two parts, a URI. This way one statement's object could be another statement's subject, creating a linked graph. In SPARQL, each statement must end with a period, but if multiple statements have the same subject, we can separate them with a semicolon. The prefixes are defined in the engine, but could be specified manually for clarity.

Note: Both Wikidata and OSM data is stored together, in a giant virtual "subject/predicate/object" table. So the subject could be osmnode:1234, wd:Q42, sitelinks, or many other types of URIs. The easiest way to get just the OSM objects is to use osmm:type predicate - all OSM subjects will have one. But for performance reasons, don't use more filters than you need - if you are querying for any OSM objects with a osmt:wikipedia, you don't need to specify the osmm:type, because osmt:wikipedia already implies an OSM object.

# These prefixes are defined in the DB, and do not need to be added to your queries
prefix osmnode: <https://www.openstreetmap.org/node/>
prefix osmway: <https://www.openstreetmap.org/way/>
prefix osmrel: <https://www.openstreetmap.org/relation/>
prefix osmt: <https://wiki.openstreetmap.org/wiki/Key:>
prefix osmm: <https://www.openstreetmap.org/meta/>
prefix pageviews: <https://dumps.wikimedia.org/other/pageviews/>

osmnode:1234 osmm:type      'n' ;
             osmm:loc       'Point(32.1 44.5)'^^geo:wktLiteral ;  # longitude/latitude
             osmt:name      'node name tag' ;
             osmt:name:en   'node name:en tag' ;
             osmt:wikipedia <https://en.wikipedia.org/wiki/Article_name> ;
             osmt:wikidata  wd:Q34 .

osmway:2345  osmm:type      'w' ;
             osmm:loc       'Point(32.1 44.5)'^^geo:wktLiteral ;  # way's centroid
             osmm:isClosed  true ; # is this way an area or a line?
             osmt:name      'way name tag' ;
             osmt:name:en   'way name:en tag' ;
             osmt:wikipedia <https://en.wikipedia.org/wiki/Article_name> ;
             osmt:wikidata  wd:Q34 .

osmrel:3456  osmm:type      'r' ;
             osmm:has       osmway:2345 ;  # relation contains way #2345
             osmway:2345    "inner" ;      # way #2345 has a "inner" role
             osmm:loc       'Point(32.1 44.5)'^^geo:wktLiteral ;  # centroid of all members
             osmt:name     'way name tag' ;
             osmt:name:en  'way name:en tag' ;
             osmt:wikipedia <https://en.wikipedia.org/wiki/Article_name> ;
             osmt:wikidata  wd:Q34 .


# Other common values:
     osmm:version    42         # Last object's version
     osmm:changeset  1234567    # Last edit of this object in this changeset
     osmm:timestamp  '2017-08-23T01:02:03+00:00'^^xsd:dateTime    # Last edit time
     osmm:user       'UserName' # Name of the user - could change

Other tags:

  • osmm:badkey "@Sammelmütze" - any unsupported tag names will be stored here as "osmm:badkey" objects, without their own values. For a tag to be supported, it must match /^[0-9a-zA-Z_]([-:0-9a-zA-Z_]*[0-9a-zA-Z_])?$/ regex -- contain only Latin letters, digits, and underscore. All characters except the first and last may also be "-" and ":".
  • osmm:loc:error "Error: ..." - whenever node or way's location could not be calculated, this property contains relevant error info.

PageViews:

  • <https://en.wikipedia.org/wiki/Universe> pageviews: 12345 - most popular pages have a pageview counter, making it possible to order results by popularity in a specific wiki.

Quick-fix Editor

This service can also be used to edit OSM data directly to allow quick fixes.

Map Editor is one of the SPARQL query result viewers. It allows users to save or reject suggested changes. User will be able to review, and either accept or reject each change one by one. Place #defaultView:Editor comment at the top to automatically use this feature, and ensure your query returns all the required columns.

Mandatory Query Columns

All queries must always produce the following columns:

  • id - OSM object URI, e.g. osmway:123
  • loc - marker location. You could get it from osmm:loc(OSM) or wdt:P625 (Wikidata), or construct it on the fly
  • comment - text string that will be used for the change set comment. Use CONCAT() function to dynamically build the text string.
  • any number of t* and v* columns to specify object modifications, e.g. t1 - v1, t2 - v2. Every t* must have a corresponding v*. The * must be a number.
    • The t* columns represent tag names. Tag can either be a string literal or a osmt:*value.
    • The v* column represents the corresponding new value. If the value is unbound, it will be deleted.

Enabling "Reject" button

Reject functionality requires community consensus. The rejections are stored in the OSM objects themselves as an extra tag, so your community must agree to allow it. An alternative storage for rejections is being worked on.

Place #rejectTag:... and #queryId:... comments at the top of your query, replacing dots with proper values (only letters/digits/underscores allowed). The reject tag specifies the field to keep rejections. The query id describes the goal of your query. If a user rejects the proposed change, it will add a (rejectTag)=(queryId) tag to the object, and prevent another user from accidentally accepting it when using the same query. (rejectTag) can store more than one value (semicolon-separated).

Editor Example

This query allows users to quickly convert amenity=sanatorium into two tags: leisure=resort + resort=sanatorium. The query needs to output 3 tag pairs - t1/v1 to delete existing amenity tag, and t2/v2, t3/v3 to generate two new tags. Click "run it" at the bottom. All rejected objects will get an _autoreject=amenity_sanatorium tag to prevent others from editing it.

# amenity=sanatorium -> leisure=resort + resort=sanatorium
#defaultView:Editor
#queryId:amenity_sanatorium
SELECT
  # Generate output fields - id of the object, mark location, and the changeset comment
  ?id
  ?loc 
  ('amenity=sanatorium -> leisure=resort + resort=sanatorium' as ?comment)

  # Delete existing amenity tag by using an unbound (unset) ?v1 variable
  (osmt:amenity as ?t1)
  ?v1

  # Set leisure tag to the value 'resort'
  (osmt:leisure as ?t2)
  ('resort' as ?v2)

  # Set resort tag to the value 'sanatorium'
  (osmt:resort as ?t3)
  ('sanatorium' as ?v3)

WHERE {
  # Must have amenity=sanatorium and a location.
  ?id osmt:amenity 'sanatorium' ;
      osmm:loc ?loc .

  # Ignore if leisure tag already exists, and not set to 'resort'
  OPTIONAL { ?id osmt:leisure ?leisure }
  FILTER (!BOUND(?leisure) || ?leisure = 'resort')

  # Ignore if resort tag already exists, and not set to 'sanatorium'
  OPTIONAL { ?id osmt:resort ?resort }
  FILTER (!BOUND(?resort) || ?resort = 'sanatorium')
}
Run it (edit query)

Editor limitations

  • At the moment the editor only allows one change per change set.

Other Queries

  • The query service uses a community-maintained query example list - directly accessible and searchable from the query interface under the "examples" button.

Current limitations

  • Only includes OSM nodes that have at least one tag. All ways and relations, even empty ones, are included.
  • The database only stores values for simple tag keys - those that only English letters, digits and symbols - : _. Tag keys that don't satisfy this criteria, e.g. those that contain spaces or accented characters, are stored as osmm:badkey "bad tag key", without their values.
  • OSM geometries are not imported, but most objects have osmm:loc (center point), and ways have osmm:isClosed (true/false).