User talk:Yurik

From OpenStreetMap Wiki

SPARQL question

I see that you created SPARQL examples. I have two questions:

Is it feasible to generate a list of Wikidata entries that meet all of the following?

  • have interesting (long or featured) articles on Polish or English Wikipedia
  • have coordinates or are otherwise described as a mappable object
  • have no wikipedia/wikidata tag in OSM
  • are located in Poland or around some specific location

To avoid the XY problem: I want to add wikidata tags, but I am bored by processing a bunch of substubs without any interesting content at https://osm.wikidata.link/. Also, trying to match major articles may help to detect missing OSM data.

Is it possible to get query results from http://88.99.164.208 using an API? https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Wikidata_Query_Help/Result_Views mentions only manual download.

Mateusz Konieczny (talk) 08:32, 9 September 2017 (UTC)

Hi Mateusz, yes to all of the above :) First, you may want to take a look at the main WDQS (click "Examples" there) - it has a lot of Wikidata-only examples and info, plus some help links. You may use the API directly (look in the browser debugger at the request it sends, and do the same). It's a simple GET request. Or you can use my python code, with the http://88.99.164.208/bigdata/sparql endpoint. I will post the queries here in a bit. --Yurik (talk) 01:06, 10 September 2017 (UTC)
P.S. Mateusz, I wrote a query per above and added it to examples. --Yurik (talk) 02:44, 10 September 2017 (UTC)
P.P.S. Wikidata does not store article length, but it has page view counts and the number of wiki languages that have an article on the topic. Both are very good indicators of which objects should be shown first.
Is it possible to exclude events? Note that I am not interested in limiting to subclasses of Q618123 (that should be easier, but many entries in Wikidata lack "instance of"). I tried this in the Wikidata Query Service:
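The "simple GET request" mentioned above could look roughly like the sketch below. It is only an illustration: the helper names (`build_request`, `rows_from_json`, `run_query`) are made up here, and it assumes that the endpoint follows the standard SPARQL protocol, i.e. it takes the query text in a `query` parameter and returns SPARQL JSON results when asked for `application/sparql-results+json`.

```python
import json
import urllib.parse
import urllib.request

# Endpoint named in the reply above; everything else here is illustrative.
ENDPOINT = "http://88.99.164.208/bigdata/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Return a GET request carrying the SPARQL text in the `query` parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Assumption: the server honours the standard SPARQL JSON media type.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_json(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=60):
    """Send the query over the network and return the flattened rows."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows_from_json(json.load(response))
```

Calling `run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 1")` would then return a list of dictionaries, one per result row, provided the endpoint is reachable and answers within the timeout.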
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P625 ?location.
  MINUS { ?location wdt:P31/wdt:P279* wd:Q1190554. }  # excludes events
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10

but it times out for some reason. I looked at the examples but failed to find one that excludes items that are subclasses of something, and once I tried to adapt others I ended up with the query above, which for some reason has performance problems. What is the CPU-safe way of excluding events without excluding objects that are missing "instance of"? I prefer false positives; I have no problem with adding a missing "instance of".

Modifying the query at SPARQL examples tinyurl y9ma6e3f or tinyurl 993o5sy (no direct URL; Wikidata for some reason uses a public link shortener typically used to hide spam) resulted in 504 Gateway Time-out.

Mateusz Konieczny (talk) 06:08, 13 September 2017 (UTC)

You are using the wrong subject - it should be ?item, or in my example ?wd: FILTER NOT EXISTS { ?wd wdt:P31/wdt:P279* wd:Q1190554 . } But yes, it does take too long. It might work faster if you replace the circle service with the coordinate filter, like I did here (last commented portion). --Yurik (talk) 06:45, 13 September 2017 (UTC)
Thanks! I tried bbox, but it is failing even in the simplest case:
SELECT ?osmId ?wdLabel WHERE {
   ?osmId osmm:loc ?loc .
   BIND( geof:longitude(?loc) as ?longitude )
   BIND( geof:latitude(?loc) as ?latitude )
   FILTER( ?longitude > 19 && ?longitude < 20 && ?latitude > 50 && ?latitude < 51)
} LIMIT 10

Is it a hardware limitation or is something wrong with my query? And thanks for the featured articles query, I have already used it to add some links and read interesting Wikipedia articles. Mateusz Konieczny (talk) 08:30, 13 September 2017 (UTC)

I think it's failing because there are so many points, and that query requires a sequential scan through the whole DB. That's why Wikidata developed the box and circle geo services. Sequential filtering works well once the results are already small enough. The service, on the other hand, seems to be doing some geo-indexing, but it does it first, before the filtering. Need to look at it more. --Yurik (talk) 10:41, 13 September 2017 (UTC)
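For comparison with the FILTER-based bounding box above, here is a rough sketch of how a query using the box service could be put together. The function name `box_query` is made up for illustration. The sketch selects Wikidata coordinates (wdt:P625) and assumes the endpoint predeclares the wikibase:, bd:, wdt: and geo: prefixes as the main WDQS does; whether the service also indexes osmm:loc on the OSM endpoint is not something this thread confirms.

```python
def box_query(west, south, east, north, limit=10):
    """Build a query that lets the wikibase:box service pick items by bounding box.

    Corners are WKT points, written as "Point(longitude latitude)".
    """
    return f"""SELECT ?item ?location WHERE {{
  SERVICE wikibase:box {{
    ?item wdt:P625 ?location .
    bd:serviceParam wikibase:cornerSouthWest "Point({west} {south})"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerNorthEast "Point({east} {north})"^^geo:wktLiteral .
  }}
}} LIMIT {limit}"""


# Same area as the FILTER attempt: longitude 19..20, latitude 50..51.
query = box_query(19, 50, 20, 51)
print(query)
```

The idea is that the service narrows the candidates through its geo index first, so any further FILTER clauses only have to run over the items inside the box.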

SPARQL question II

Sorry for bothering you, but I have another query that fails for an unknown reason.

I wanted to find human settlements in Poland on Wikidata that have a TERYT code, excluding those already linked from OSM, to check whether a Wikidata import that uses TERYT ids for matching may be useful:

SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q486972.
  FILTER EXISTS {
    ?item wdt:P4046 ?teryt
  }
  # There must not be an OSM object with this wikidata id
  FILTER NOT EXISTS { ?osm1 osmt:wikidata ?wd . }

  # There must not be an OSM object with this wikipedia link
  FILTER NOT EXISTS { ?osm2 osmt:wikipedia ?sitelink . }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
} LIMIT 10

The query returns 0 elements, which is surprising given that there are elements that should match.

For example https://www.openstreetmap.org/node/1589969137#map=19/53.97112/14.54503 and https://osm.wikidata.link/Q673875 (I checked the lack of matches with http://overpass-turbo.eu/s/rGW ).

I used the following query, adapted from one of the examples, to check that both the OSM node representing Grodno and its Wikidata entry are in the database:

SELECT ?marketLoc ?marketName (?amenity as ?layer) ?osmid WHERE {
   VALUES ?place { "hamlet" }
   ?osmid osmt:place ?place ;
         osmt:name ?marketName ;
         osmm:loc ?marketLoc .
   # Get the location of Grodno from Wikidata
   wd:Q673875 wdt:P625 ?myLoc .
   # Calculate the distance,
   # and filter to just those within 5km
   BIND(geof:distance(?myLoc, ?marketLoc) as ?dist)
   FILTER(?dist < 5)
 }

So it seems that there is a bug in my TERYT query. Is it something obvious? Mateusz Konieczny (talk) 16:35, 13 September 2017 (UTC)

Mateusz, the first query is a bit wrong - you used different variable names - ?item and ?wd, instead of the same one. Also, you don't need "Filter exists" - you can simply list both statements. No need to filter out sitelinks because they are not connected to the rest of the query. And lastly, you don't want just the "instance of a human settlement", you want "instance of a human settlement or anything that is a sub-sub-sub... class of a human settlement".
SELECT ?item ?teryt ?itemLabel WHERE {
   ?item wdt:P31/wdt:P279* wd:Q486972 .
   ?item wdt:P4046 ?teryt .
   FILTER NOT EXISTS { ?osm1 osmt:wikidata ?item . }
 
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
} LIMIT 10
--Yurik (talk) 22:00, 13 September 2017 (UTC)