Wikipedia Link Improvement Project

From OpenStreetMap Wiki

This page documents ongoing tasks to fix Wikipedia- and Wikidata-related tags. Most queries here use the Wikidata+OSM SPARQL query service. See also Quick fixes.

Wikipedia links in the "website"/"url" key

Query
List of all objects whose url or website tag points to Wikipedia
#defaultView:Map
SELECT ?osmId (IRI(?url) as ?wp) ?loc WHERE {
  
  { SELECT ?osmId ?url ?loc WHERE {
    ?osmId osmt:url ?url ;
           osmm:loc ?loc .
    FILTER( contains(str(?url), 'wikipedia.org') )
  } }
UNION
  { SELECT ?osmId ?url ?loc WHERE {
    ?osmId osmt:website ?url ;
           osmm:loc ?loc .
    FILTER( contains(str(?url), 'wikipedia.org') )
  } }
}

Users often add Wikipedia links to the website and url tags. These links should be moved to the wikipedia and wikidata tags instead. To fix:

  • Use the above query to view and fix each object
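
When moving a link out of website or url, the URL has to be turned into a wikipedia tag value of the form lang:Title. The following is a minimal Python sketch of that conversion, not an official tool. The function name wikipedia_url_to_tag is hypothetical, and the sketch assumes article URLs of the usual https://<lang>.wikipedia.org/wiki/<Title> form (optionally with the mobile "m" subdomain). Any section fragment after "#" is ignored.

```python
from urllib.parse import urlparse, unquote


def wikipedia_url_to_tag(url):
    """Return a 'lang:Title' value for a Wikipedia article URL, or None.

    Hypothetical helper: only handles <lang>.wikipedia.org/wiki/<Title>
    and the mobile form <lang>.m.wikipedia.org/wiki/<Title>.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    labels = host.split(".")
    # The host must end in wikipedia.org and have 3 or 4 labels
    if labels[-2:] != ["wikipedia", "org"] or len(labels) not in (3, 4):
        return None
    # With 4 labels, the extra one must be the mobile "m" subdomain
    if len(labels) == 4 and labels[1] != "m":
        return None
    lang = labels[0]
    if lang == "www" or not parsed.path.startswith("/wiki/"):
        return None
    # Decode percent-escapes and turn underscores back into spaces
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    if not title:
        return None
    return f"{lang}:{title}"
```

For example, wikipedia_url_to_tag("https://en.wikipedia.org/wiki/Eiffel_Tower") gives "en:Eiffel Tower". URLs that do not match the assumed form return None and need to be handled by hand.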

Missing Wikidata tags

Query
List of all nodes with Wikipedia but without Wikidata tags, excluding anchors.
#defaultView:Map
SELECT ?osmId ?wp ?loc WHERE {
  # Limit to nodes that have a tag called "wikipedia", and get its location
  ?osmId osmt:wikipedia ?wp ;
         osmm:loc ?loc ;
         osmm:type 'n' .

  # At the moment, the "#" symbol is incorrectly encoded as %23.  It will not be encoded in the future
  FILTER( !contains(str(?wp), '%23') )
  
  # Must not have Wikidata tag
  MINUS { ?osmId osmt:wikidata ?wd . }
}

The iD editor automatically adds the wikidata tag when a user fills in the wikipedia field. In JOSM, the wikidata tag can be added with the Data/Fetch Wikidata IDs command of the Wikipedia plugin. Objects that have a wikipedia tag but no wikidata tag can be found with Overpass turbo using the [wikipedia][!wikidata] query. There are several reasons why the wikidata tag may be missing:

  • In iD, the user added the wikipedia tag in the "tags" section instead of the "fields" section. In JOSM, the user forgot to use Fetch IDs.
    • In JOSM, use the Fetch Wikidata IDs command in the Data menu.
  • The wikipedia tag is incorrect, either because the title was entered incorrectly or because the article was deleted.
    • Find a Wikipedia article about the object, possibly in a different language edition, or delete the wikipedia tag.
  • The Wikipedia page exists, but there is no corresponding Wikidata entry.
    • Check whether another Wikipedia language edition has an article about this exact object. If one exists, link both Wikipedia articles using "edit links" in the list of languages on the left, and re-fetch.
    • If not, create a new Wikidata entry. Always add at least one label, a description, an "instance of" statement, and a link to the Wikipedia article. Save and re-fetch.
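
To check by hand whether a Wikipedia article has a Wikidata entry, one option is the MediaWiki Action API, which exposes the linked Wikidata ID as the wikibase_item page property. The sketch below only builds the request URL. It is not a description of how iD or JOSM perform the lookup, and the function name wikidata_lookup_url is hypothetical.

```python
from urllib.parse import urlencode


def wikidata_lookup_url(wikipedia_tag):
    """Build a MediaWiki API URL that asks for the Wikidata ID of an article.

    wikipedia_tag is a value in the 'lang:Title' form used by the
    wikipedia key. Hypothetical helper, for illustration only.
    """
    lang, _, title = wikipedia_tag.partition(":")
    if not lang or not title:
        raise ValueError(f"expected 'lang:Title', got {wikipedia_tag!r}")
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # page property holding the Wikidata ID
        "redirects": "1",           # follow redirect pages to the real article
        "titles": title,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"
```

Opening the resulting URL in a browser returns JSON. If the response contains no wikibase_item property for the page, the article is either missing or not yet connected to a Wikidata entry, which corresponds to the last two cases in the list above.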

Mismatching wikidata and OSM name tags

Any OSM feature linked to a Wikidata item should ideally have the same or a very similar name, since both refer to the same geographical feature. A mismatch in the name might indicate an incorrect wikidata tag. Recently added wikidata tags whose names do not match the OSM name can be reviewed using OSMCha.


Mismatching wikidata and wikipedia tags

The wikipedia and wikidata tags should consistently point to the same thing. The wikidata tag must always point to the Wikidata entry that links to the same Wikipedia title as stored in the wikipedia tag, except when Wikidata has a more precise entry that corresponds better to the OSM object than the Wikipedia article does. In some cases, the wikipedia tag points to a "redirect page" whose target is in turn part of the correct Wikidata entry. While this is OK, the OSM SPARQL service does not store such information, and so reports these cases as errors. It is better to fix wikipedia tags to point to the actual articles, which helps with quick verification.
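
The consistency rule can be sketched as a small Python check. This is an illustration under stated assumptions, not the logic of any existing tool: the function name tags_agree is hypothetical, and sitelinks is assumed to be a plain mapping from Wikipedia language code to article title for the Wikidata entry named in the wikidata tag.

```python
def tags_agree(wikipedia_tag, sitelinks):
    """True if the wikipedia tag names an article linked from the Wikidata entry.

    sitelinks is a hypothetical mapping of language code to article title,
    e.g. {"en": "Eiffel Tower"}, for the entry named in the wikidata tag.
    """
    lang, _, title = wikipedia_tag.partition(":")
    # MediaWiki treats underscores and spaces in titles as equivalent
    normalized = title.replace("_", " ").strip()
    return sitelinks.get(lang) == normalized
```

Like the SPARQL service, this check does not resolve redirects, so a wikipedia tag that points to a redirect page is reported as a mismatch. A mismatch also occurs in the allowed case where Wikidata has a more precise entry, so results still need manual review.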

Links to Wikipedia pages about multiple objects

Frequently, there is no Wikipedia article about the specific OSM object, e.g. a church, yet there is a Wikipedia page that mentions the object. This page could be a table or list of all churches in the area, or a page about a town with a section dedicated to the church. In some cases, it could be a list of different concepts with the same name (a disambiguation page, see the disambiguation section below). In any of these cases, do not use the wikipedia tag. Instead, use related:wikipedia (TBD!), and no wikidata tag at all.

Links to disambiguation pages

Query
This query shows a map of all objects that point to Wikipedia disambiguation pages.
#defaultView:Map
SELECT ?osmId ?wdLabel ?wd ?wp ?loc WHERE {
  # Limit to subjects that have a tag called "wikidata", and show its location
  ?osmId osmt:wikidata ?wd ;
         osmm:loc ?loc .

  # ?wd must be an "instance of" a disambiguation page, or an instance
  # of some type, which itself is a (sub-)*subclass of a disambig page.
  ?wd wdt:P31/wdt:P279* wd:Q4167410 .

  OPTIONAL { ?osmId osmt:wikipedia ?wp . }
            
  # Pick the first available language for the wikidata entry (creates ?wdLabel value)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,de,fr,it,pl,ru,es,sv,nl" . }
}

A disambiguation page is a page that lists multiple meanings of the same term. The wikipedia and wikidata tags should never link to such pages. These items can be found using the ?wdId wdt:P31/wdt:P279* wd:Q4167410 query pattern. In some rare cases, a Wikidata entry might have been incorrectly marked as a disambiguation page, and should be fixed (set the proper "instance of", and remove a few main disambig descriptions). In all other cases, either find the right Wikipedia/Wikidata values, or remove the tags if there is no such entry. A link to a disambiguation page has no value, with the possible exception of the related:wikipedia tag as described above.

Links to list pages

Query
This query shows a map of all objects that point to Wikipedia list pages.
#defaultView:Map
SELECT ?osmId ?wdLabel ?wd ?wp ?loc WHERE {
  # Limit to subjects that have a tag called "wikidata", and show its location
  ?osmId osmt:wikidata ?wd ;
         osmm:loc ?loc .

  # ?wd must be an "instance of" a list page, or an instance
  # of some type, which itself is a (sub-)*subclass of a list page.
  ?wd wdt:P31/wdt:P279* wd:Q13406463 .

  OPTIONAL { ?osmId osmt:wikipedia ?wp . }
            
  # Pick the first available language for the wikidata entry (creates ?wdLabel value)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,de,fr,it,pl,ru,es,sv,nl" . }
}

Similar to disambiguation pages, lists can be found using the ?wdId wdt:P31/wdt:P279* wd:Q13406463 query pattern. Such links should be changed to use the related:wikipedia tag, with no wikidata tag.

Links to page sections using a hash symbol

If the wikipedia tag contains a "#" (a link to a page section), the object most likely should not use the wikipedia tag at all. Use the related:wikipedia tag instead, and no wikidata tag.

Links to Concepts, Brands, Subjects, Networks

Brands

Objects where wikidata should probably be brand:wikidata
This query uses existing brand:wikidata tags to determine which Wikidata IDs are likely to be brands, and shows all OSM objects with wikidata tag set to those IDs.
#defaultView:Map
SELECT ?osmId ?location ?bwd ?bwdLabel ?bwdDescription WHERE {

  # Subquery finds brand:wikidata IDs used more than 10 times
  {
    SELECT ?bwd (count(*) as ?count) WHERE {
      ?o osmt:brand:wikidata ?bwd .
    }
    group by ?bwd
    having (?count > 10)
  }

  # Find OSM objects where wikidata tag is one of the common brand:wikidata IDs
  ?osmId osmt:wikidata ?bwd ;
         osmm:loc ?location .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,fr,ru,es,de,zh,ja". }
}
Editor query to move WP/WD to brand:WP/WD
#defaultView:Editor
SELECT
  ?id ?loc 
  (CONCAT('Moving ',
          if(!bound(?bwdLabel), '', ?bwdLabel),
          ' from wikipedia to brand:wikipedia') as ?comment)

  (osmt:wikipedia as ?t1)
  ?v1   # unbound, which means it will be deleted

  (osmt:wikidata as ?t2)
  ?v2   # unbound, which means it will be deleted

  (osmt:brand:wikipedia as ?t3)
  (if(bound(?existingBwp), ?existingBwp, ?existingWp) as ?v3)

  (osmt:brand:wikidata as ?t4)
  (?bwd as ?v4)

WHERE {

  # Uncomment the VALUES line below to restrict the search to just a few brands
  # VALUES ?bwd {wd:Q65310 wd:Q24933790 wd:Q1684639}

  # Subquery finds brand:wikidata IDs used more than 10 times
  {
    SELECT ?bwd (count(*) as ?count) WHERE {
      ?o osmt:brand:wikidata ?bwd .
    }
    group by ?bwd
    having (?count > 10)
  }

  # Find OSM objects where wikidata
  # is one of the common brand:wikidata IDs
  ?id osmt:wikidata ?bwd ;
         osmm:loc ?loc .
  
  OPTIONAL { ?id osmt:wikipedia ?existingWp. }
  OPTIONAL { ?id osmt:brand:wikipedia ?existingBwp. }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,fr,ru,es,de,zh,ja".
    ?bwd rdfs:label ?bwdLabel .
  }
}

As described in the Wikidata proposal, there are many cases where a Wikipedia article or Wikidata entry is about the general concept, and not the specific object. For example, a McDonald's restaurant should not link to the Wikipedia McDonald's article, because the article is about the brand, not this specific restaurant. Instead, the OSM object should use the brand:wikipedia and brand:wikidata tags. The brand:wiki... tags should also be used for anything else that is brand-related, such as a supermarket or an ATM. Similarly, a statue of Einstein should use subject:wiki... tags, unless there is an article about the statue itself. subject:wiki* applies to many other cases, such as memorial boards and graves.
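
The tag changes made by the editor query above can be sketched in Python as follows. This is an illustration only: the function name move_to_brand is hypothetical, tags are modelled as a plain dictionary, and the tag values in the usage note are just sample data.

```python
def move_to_brand(tags):
    """Return a copy of tags with wikipedia/wikidata moved to brand:* keys.

    Hypothetical helper that mirrors the editor query: wikipedia and
    wikidata are removed, an existing brand:wikipedia value is kept, and
    brand:wikidata is set to the old wikidata value.
    """
    result = dict(tags)
    wikipedia = result.pop("wikipedia", None)
    wikidata = result.pop("wikidata", None)
    # Keep an existing brand:wikipedia; otherwise reuse the old wikipedia value
    if "brand:wikipedia" not in result and wikipedia is not None:
        result["brand:wikipedia"] = wikipedia
    if wikidata is not None:
        result["brand:wikidata"] = wikidata
    return result
```

Given {"amenity": "fast_food", "wikipedia": "en:McDonald's", "wikidata": "Q38076"}, the function returns the same tags with the two wiki values stored under brand:wikipedia and brand:wikidata. The input dictionary is left unchanged. Whether an object really is a brand outlet still has to be decided by a mapper.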

Linking to Humans

Direct links to Humans, instead of using subject/artist/name:etymology
#defaultView:Editor{"taskId":"wikipedia_human_links", "comment": "Fix wp/wd link to a human, instead of a more specific subject/artist/...", "labels":{"a":"subject","b":"artist","c":"name:etymology"}, "vote":true }
SELECT
  ?id ?loc

  (osmt:wikidata as ?tag_a1)                 (false as ?val_a1)
  (osmt:subject:wikidata as ?tag_a2)         (?wd as ?val_a2)
  (if(?isWpAboutWd, osmt:wikipedia, false) as ?tag_a3)                 (false as ?val_a3)
  (if(?isWpAboutWd, osmt:subject:wikipedia, false) as ?tag_a4)         (?wp as ?val_a4)
    
  (osmt:wikidata as ?tag_b1)                 (false as ?val_b1)
  (osmt:artist:wikidata as ?tag_b2)          (?wd as ?val_b2)
  (if(?isWpAboutWd, osmt:wikipedia, false) as ?tag_b3)                 (false as ?val_b3)
  (if(?isWpAboutWd, osmt:artist:wikipedia, false) as ?tag_b4)          (?wp as ?val_b4)

  (osmt:wikidata as ?tag_c1)                 (false as ?val_c1)
  (osmt:name:etymology:wikidata as ?tag_c2)  (?wd as ?val_c2)
  (if(?isWpAboutWd, osmt:wikipedia, false) as ?tag_c3)                 (false as ?val_c3)
  (if(?isWpAboutWd, osmt:name:etymology:wikipedia, false) as ?tag_c4)  (?wp as ?val_c4)

WHERE {
  # Limit to subjects that have a tag called "wikidata"
  ?id osmt:wikidata ?wd ;
      osmm:loc ?loc .

  # ?wd must be an "instance of" a human, or an instance
  # of some type, which itself is a (sub-)*subclass of a human
  ?wd wdt:P31/wdt:P279* wd:Q5 .

  # Check if wikipedia tag exists, and if it matches the wikidata tag
  OPTIONAL { ?id osmt:wikipedia ?wp }
  BIND( EXISTS{ ?wp schema:about ?wd } as ?isWpAboutWd)
}

OpenStreetMap represents objects, not human beings. When an OSM object links to a human being, there is a very good chance that it is a mistake. The link usually belongs in a subject, artist, or name:etymology tag instead.

Linking to Fictional Humans

Direct links to a Fictional Human, instead of using subject
#defaultView:Editor{"taskId":"wikipedia_fictional_human_links", "comment": "Fix wp/wd link to a fictional human, instead of using subject:wikidata" }
SELECT
  ?id ?loc

  (osmt:wikidata as ?tag_1)                 (false as ?val_1)
  (osmt:subject:wikidata as ?tag_2)         (?wd as ?val_2)
  (if(?isWpAboutWd, osmt:wikipedia, false) as ?tag_3)                 (false as ?val_3)
  (if(?isWpAboutWd, osmt:subject:wikipedia, false) as ?tag_4)         (?wp as ?val_4)

WHERE {
  # Limit to subjects that have a tag called "wikidata"
  ?id osmt:wikidata ?wd ;
      osmm:loc ?loc .

  # ?wd must be an "instance of" a fictional human, or an instance
  # of some type, which itself is a (sub-)*subclass of a fictional human
  ?wd wdt:P31/wdt:P279* wd:Q15632617 .

  # Check if wikipedia tag exists, and if it matches the wikidata tag
  OPTIONAL { ?id osmt:wikipedia ?wp }
  BIND( EXISTS{ ?wp schema:about ?wd } as ?isWpAboutWd)
}

OpenStreetMap represents objects, not human beings. When an OSM object links to a human being, there is a very good chance that it is a mistake. For fictional human beings, the intended tag was most likely subject:wikidata, which answers the question "Who does this feature represent?"

subject:wikidata pointing to a sculptor

Shows sculptors who are the subjects, not the artists (some might be correct)
Unless the sculpture depicts a sculptor, most sculptors would be the artist, not the subject
#defaultView:Map
SELECT
 ?osmId
 (SAMPLE(?wdLabel) as ?label)
 (SAMPLE(?wd) as ?wd)
 (GROUP_CONCAT(DISTINCT(?occupation); separator=", ") as ?occupation)
 (SAMPLE(?loc) AS ?loc)

WHERE {  
  # Get OSM elements with "subject:wikidata" tag and its location.
  ?osmId osmt:subject:wikidata ?wd ;
         osmm:loc ?loc .
  
  # The subject:wikidata must have occupation="sculptor",
  # or a subclass of sculptor.  Get all occupations of that person.
  ?wd wdt:P106/wdt:P279* wd:Q1281618 ;
      wdt:P106 ?occ .

  # Get labels for the Wikidata entry, and for all occupations
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,de,fr,it,pl,ru,es,sv,nl" .
    ?wd rdfs:label ?wdLabel .
    ?occ rdfs:label ?occupation
  }

} GROUP BY ?osmId

Duplicate tags in wikipedia & brand:wikipedia

Frequently, the same value is set on both wikipedia and brand:wikipedia (or subject:wikipedia, ...). Only one of them should be set. The same applies to *:wikidata.
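
No query is listed for this case. As an illustration of what to look for, the following Python sketch finds prefixed keys that repeat the value of the plain wikipedia or wikidata tag. The function name duplicated_wiki_values is hypothetical, and tags are modelled as a plain dictionary.

```python
def duplicated_wiki_values(tags):
    """List (plain_key, prefixed_key) pairs that carry the same value.

    Hypothetical helper: compares wikipedia and wikidata with every
    *:wikipedia and *:wikidata key on the same object.
    """
    duplicates = []
    for plain in ("wikipedia", "wikidata"):
        value = tags.get(plain)
        if value is None:
            continue
        for key, other in sorted(tags.items()):
            # Prefixed variants such as brand:wikipedia or subject:wikidata
            if key.endswith(":" + plain) and other == value:
                duplicates.append((plain, key))
    return duplicates
```

An empty list means no duplicates were found. For each reported pair, a mapper still has to decide which of the two keys is the right one to keep.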

Duplicate tags on a relation and its members

Query
This query looks for all relations that do not contain other relations or nodes, and whose way members either have no wikidata tags, or wikidata tags are the same as on the relation itself.
#defaultView:Map
SELECT
  ?rel
  (SAMPLE(?location) as ?location)
  (sum(?failed) as ?failCount)
  (count(?mwd) as ?memberWithWdCount)
  (count(?member) as ?memberCount)
  ((count(?member) - count(?mwd)) as ?diffCount)

WHERE {
  # Find relations with wikidata tag and at least one member
  ?rel osmm:type 'r';
       osmt:wikidata ?wd;
       osmm:loc ?location;
       osmm:has ?member .

  # Get member's type
  ?member osmm:type ?mtype .

  # Get member's wikidata tag if it exists
  OPTIONAL { ?member osmt:wikidata ?mwd }

  # If any of the conditions are met, set ?failed to 1.
  # The sum of ?failed must be 0 for the relation to be shown
  BIND (if((?mtype='r' || ?mtype='n' || (bound(?mwd) && ?mwd!=?wd)), 1, 0) as ?failed)
}
GROUP BY ?rel
HAVING (?memberWithWdCount > 0 && ?failCount = 0)
ORDER BY DESC(?memberCount)

As described in Key:wikipedia, the tag should only be set on the relation, not on its members. In general, most common tags, such as multilingual and international names, wikipedia, and wikidata, should be moved to the relation. The name tag should remain on each member to simplify identification.
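
The selection logic of the query above can be restated as a short Python sketch, which may help when reading the BIND and HAVING clauses. The function name and the (type, wikidata) member representation are assumptions for illustration. The type codes 'n' and 'r' come from the query, and 'w' for ways is assumed.

```python
def relation_has_redundant_member_tags(relation_wikidata, members):
    """Mirror the query: True if the relation would be listed.

    members is a list of (type, wikidata_or_None) pairs, where type is
    'n', 'w' or 'r'. Hypothetical data shape, for illustration only.
    """
    tagged = 0
    for member_type, member_wikidata in members:
        # Relations containing nodes or other relations are skipped
        if member_type in ("r", "n"):
            return False
        if member_wikidata is not None:
            # A member pointing to a different item may be tagged on purpose
            if member_wikidata != relation_wikidata:
                return False
            tagged += 1
    # At least one way must repeat the relation's wikidata value
    return tagged > 0
```

A relation is listed only when every member is a way, no member carries a different wikidata value, and at least one member repeats the value of the relation. Those repeated member tags are the ones to remove.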