Talk:Wikidata


Tools

Is there a tool to add Wikidata tags to existing Wikipedia tags? --LA2 (talk) 10:37, 6 August 2014 (UTC)

This should be quite easy to do. If an object has the tag wikipedia:{lang}={article}, it suffices to fetch the URL http://www.wikidata.org/wiki/Special:ItemByTitle/{lang}wiki/{article} and read the corresponding Wikidata code. However, I think it is premature to do any bulk addition of wikidata tags. I personally would oppose doing that, precisely because the wikipedia -> wikidata translation is such a trivial thing to do (it can be done on the fly by any application). Augusto S (talk) 20:53, 25 August 2014 (UTC)
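As a sketch of that on-the-fly translation (the function name is hypothetical; following the HTTP redirect to the actual Qnnn item page is not shown here):

```python
def wikidata_lookup_url(lang: str, article: str) -> str:
    """Build the Special:ItemByTitle URL for an object tagged
    wikipedia:{lang}={article} (spaces become underscores in wiki titles)."""
    return ("https://www.wikidata.org/wiki/Special:ItemByTitle/"
            f"{lang}wiki/{article.replace(' ', '_')}")

print(wikidata_lookup_url("en", "Douglas Adams"))
# → https://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Douglas_Adams
```

An application would then request this URL and take the Qnnn identifier from the redirect target.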

Progress

How far have we come?

Date        Wikipedia keys   Wikidata keys   Wikidata keys as % of all objects   Wikidata keys as % of Wikipedia tags
2014-08-06  355 026          18 676          0.00                                5.26
--LA2 (talk) 10:37, 6 August 2014 (UTC)

OSM's IDs

Why are OSM's IDs not stable? --Pastakhov (talk) 17:48, 16 September 2014 (UTC)

One case: a feature may be mapped as a node at first, but is later changed to an area for the sake of more geographical precision, and consequently changes ID.
Another case: A new user is having some trouble changing the current geography of an area, so he simply deletes it and recreates it.
Another case: A vandal silently deletes an object. Later a good mapper sees the feature is missing, but is not aware the previous object was deleted and creates a new one instead.
There probably are other specific cases, but the main takeaway is that in OSM there is no guarantee a feature will always be represented by the same object. --Jgpacker (talk) 18:12, 16 September 2014 (UTC)
Do you have any idea how to get around this problem? --Pastakhov (talk) 03:13, 17 September 2014 (UTC)
I see only one solution to preserve the integrity of the data: use triggers, and use a key like wikidataid for linking from Wikidata to OSM. The trigger should not allow deleting objects whose wikidataid key is used in Wikidata (the vandal case), unless the transaction contains some other object which receives this wikidataid key (the other cases). Is it theoretically possible? --Pastakhov (talk) 03:52, 17 September 2014 (UTC)
My impression is that the OSM community is too conservative to do something like this. So far, they have avoided putting restrictions on users as much as possible. I believe it could be possible, but the benefits to OSM in doing this don't seem to be worth it (I might be wrong). --Jgpacker (talk) 12:28, 17 September 2014 (UTC)
Well, there is another way: make a selection from the history of edits and find wikidataid keys that were deleted or changed. Then we need to check whether those keys still exist in the current OSM database and in Wikidata. The filtered wikidataid keys are given to the community for recovery. Alternatively, instead of the wikidataid key we could use the existing wikidata key, if it is possible to link back from Wikidata to OSM by the wikidata key. What do you think about this? --Pastakhov (talk) 13:59, 17 September 2014 (UTC)
I think it's a good idea to have an updated list of changed/removed wikidata ids for review. I'm not sure how to do this, but I think it's possible. Yes, as said on the page, we can link both from OSM to Wikidata, and from Wikidata to OSM, and I believe some people already do this. --Jgpacker (talk) 14:14, 17 September 2014 (UTC)
Can someone explain to me what problem we are trying to solve here? Wikidata has stable IDs, so we can create the connection by linking from OSM to WIkidata. What else do we need? --Tordanik 14:18, 17 September 2014 (UTC)
I'm trying to understand the meaning of this idea, and I can not understand anything except "Testing Wikibase to a scale of 60x its current size". I can not find any benefit from it, or see how it can work with unstable identifiers. I thought there was a problem that the idea is trying to solve, but it seems there is not. Thank you for your time. --Pastakhov (talk) 04:53, 18 September 2014 (UTC)
That grant proposal has nothing to do with Wikidata tagging in OSM, or with the links between OSM and Wikidata. They are trying to convert the OSM database into the Wikidata format and import it into their own instance of the Wikidata software (Wikibase). They then want to allow people to edit that OSM data using the editing tools from the Wikidata community. In the end, the changes should be sent back to the main OSM database, as with other editor software. During all this, they only use OSM identifiers and content; no Wikidata content is involved.
If you ask me, though, the project still does not solve a real problem, as specialized editing software for OSM is already available. At best it might be interesting for a niche audience (e.g. Wikidata users who can start editing OSM without a learning curve). --Tordanik 13:10, 18 September 2014 (UTC)
Thanks, the grant's goal is more understandable now. Maybe I'm wrong, but in this case it looks like using a jackhammer to crack a nut. Storing and handling such an enormous data volume only for the user interface... really? Likely I should ask this in the grant proposal... --Pastakhov (talk) 15:07, 18 September 2014 (UTC)

Wikidata queries for administrative objects to compare with OSM hierarchies?

For quite some time I have been trying to dig into the Wikidata model generally, just to understand how Wikidata works and how we can use it. In doing so, I found some similarities to the OSM data structures when dealing with administrative boundary relations inside OSM data.

For example, in OSM we can easily query for all sub-districts inside an upper district via the Wizard mode of overpass-turbo.eu ... try entering in the wizard: boundary=administrative and type:relation in "Landkreis Cuxhaven"
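For reference, the wizard compiles such an expression into an Overpass QL query roughly like the one embedded below (a sketch; the exact query overpass-turbo generates may differ in detail, e.g. in how it resolves the region name to a search area):

```python
# A sketch of the Overpass QL roughly corresponding to the wizard expression
# boundary=administrative and type:relation in "Landkreis Cuxhaven".
query = """\
[out:json];
area["name"="Landkreis Cuxhaven"]->.searchArea;
relation["boundary"="administrative"](area.searchArea);
out body;
"""
print(query)
```

Such a query can be POSTed to an Overpass API endpoint to get the matching relations as JSON.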

How can I do such a query in Wikidata?

Well, https://wdq.wmflabs.org is an external tool to build such queries, and I managed to find out the following (by entering the English expressions in its auto-complete search feature). Two conditions are needed: instance of [P31]: municipality of Germany [Q262166] AND located in the administrative territorial entity [P131]: Landkreis Cuxhaven [Q5897]

Finally we need the string CLAIM[31:262166] AND CLAIM[131:5897] there. Hopefully the link to the Autolist webservice is updated correctly, you can click to start a query there.
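That CLAIM string can be assembled mechanically from (property, item) number pairs; a minimal Python sketch (wdq_claims is a hypothetical helper name):

```python
def wdq_claims(*claims):
    """Build a WDQ expression from (property, item) numeric pairs,
    e.g. (31, 262166) stands for P31 (instance of) = Q262166."""
    return " AND ".join(f"CLAIM[{p}:{q}]" for p, q in claims)

print(wdq_claims((31, 262166), (131, 5897)))
# → CLAIM[31:262166] AND CLAIM[131:5897]
```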

So, general question: is this the right way to find out what administrative objects and hierarchies are "tagged" in Wikidata, and to verify whether they are correct? --Stephan75 (talk) 09:36, 3 January 2015 (UTC)

How to link from a wikidata item to a way

Wikidata can already link to a point, using coordinates. Wikidata would also really like to link Wikidata items like countries, towns, motorways, and rivers to a 'way' which defines the borders (including ways for historic/obsolete borders) of these linear and spatial objects. There has been talk of having a datatype for ways in Wikidata, but the general feeling is that OSM is a much more sensible place to create, edit and maintain such geographical objects rather than Wikidata trying to duplicate your efforts somehow. The problem is that we keep being told that OSM doesn't have any stable ID for such ways.

Can anyone think of a workaround to make this work, based on existing OSM practice or based on something that OSM could be persuaded to adopt? Filceolaire (talk) 00:25, 3 August 2015 (UTC)

Ok, let me first describe the challenges we have to tackle when connecting OSM to Wikidata:
  • As you said, OSM IDs were not designed with stability in mind. They are not directly visible to the user when editing data, and regular editing operations sometimes affect IDs. In some cases (e.g. representing a feature with a closed way that was previously represented as a single point), it is even impossible to keep the ID intact.
  • Not all Wikidata items map 1:1 to an OSM feature. For example, roads are sometimes split into multiple segments, and only the most important ones (e.g. motorways) have relations collecting the segments.
  • There is sometimes no clean semantic separation of entities in OSM - e.g. the attributes for a restaurant and for the building it occupies might be on the same way. This shorthand is generally only accepted if it doesn't cause problems, though, so you could simply fix any problematic instances of this practice when trying to link with Wikidata
Now for the possible solutions. The easiest approach would probably be to link from OSM to Wikidata instead, as the Wikidata IDs seem to be more stable. The wikidata key can be used for this. Of course that would be harder to integrate into query tools and other Wikidata infrastructure, but there is something to be said for not duplicating the work. Adding these tags is already pretty much accepted in OSM, so you wouldn't have to do any persuasion on our side.
Alternatively, you could just accept the flaws of OSM IDs and use them anyway, as long as you are prepared for link breakage and the other issues above. It would be wise to at least set up some automated testing of the IDs in that case, though.
Some other ideas have surfaced during discussions within the OSM community in the past. Some have suggested storing queries (e.g. with the Overpass API) instead of, or in addition to, IDs. Others have suggested adding permanent, unique IDs as attributes to OSM elements, and making it mappers' obligation to preserve these across editing operations. There hasn't been any real conclusion so far, though. --Tordanik 11:46, 4 August 2015 (UTC)
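As a rough illustration of the automated ID testing mentioned above, a script could periodically verify that each stored OSM ID still resolves against the public OSM API v0.6 (a sketch; deleted elements answer with HTTP 410 Gone, unknown IDs with 404):

```python
import urllib.error
import urllib.request

API = "https://api.openstreetmap.org/api/0.6"

def element_url(element_type: str, element_id: int) -> str:
    # element_type is "node", "way" or "relation"
    return f"{API}/{element_type}/{element_id}"

def element_exists(element_type: str, element_id: int) -> bool:
    """True if the element still exists; deleted/missing elements raise
    an HTTPError (410 Gone or 404 Not Found) and yield False."""
    try:
        with urllib.request.urlopen(element_url(element_type, element_id)) as r:
            return r.status == 200
    except urllib.error.HTTPError:
        return False
```

A Wikidata-side bot could run element_exists over all stored OSM links and flag the broken ones for human review.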
Thanks Tordanik. I think Wikidata would mainly be interested in linking to relations. For something like a Relation:boundary, Wikidata could want to link one Wikidata item to multiple different boundaries (i.e. the current boundary and various historic boundaries). OSM could presumably have "start time=" and "end time=" tags in addition to the wikidata key, so we can distinguish between the various boundaries for the same Wikidata item; however, those sorts of time properties would seem to be the sort of thing Wikidata would do better.
If we want the infobox for a Wikipedia article to show the boundaries, then Wikidata needs to be able to tell Wikipedia where to find the boundary relation. Having a key on the OSM relation won't help much here; we need a link in the other direction.
Sounds like the best bet is to just link Wikidata items to OSM relations and accept that these have to be updated regularly.
Is there any other forum on OSM where I can ask about this? Filceolaire (talk) 21:29, 6 August 2015 (UTC)
Considering that there has been a solution for a few years to display boundaries (and many other items) on Wikipedia articles based on keys of OSM ways and relations ("WIWOSM"), I'm surprised that you don't consider that option.
As for using only relations, keep in mind that relations are generally used on a "only use if necessary" basis in OSM. Most linkable objects won't have a relation, but only, say, a way. You may be able to link to country boundaries with relations, but not to e.g. footprints of historic buildings or roads in cities. Historic information is likewise a bit contentious, with many considering it outside of the scope of OSM and suggesting the use of a specialized DB (using the OSM data model) such as Open Historical Map.
If you want to discuss this with the broader community, it's probably best to join the talk mailing list. --Tordanik 08:57, 7 August 2015 (UTC)

September 2015 discussion

Yesterday, a discussion on using Wikidata tags in OSM took place, as part of Wikidata's regular "office hours", on IRC.

The discussion log starts at https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-09-23-17.01.log.html#l-109

-- Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 12:31, 24 September 2015 (UTC)

Wikidata API and wrappers for various programming languages

I guess I am not the only one who has problems finding this out. I am looking for a not-too-complicated API, or wrappers, to query information from Wikidata, such as a list of name:*=* tags from the Wikidata entry based on the qID. The best thing I have found so far is reading the Wikidata page of the entry and parsing the entire page. --Skippern (talk) 15:55, 25 November 2016 (UTC)

Wikidata has a REST API that can simply return JSON data, instead of returning plain HTML (or the internal wiki syntax used in its pages, which is not directly editable with its UI: the wiki editor is completely disabled in the main namespace storing "Qnnn" data items and "Pnnn" property items, and replaced by the data editor).
E.g. w:d:Special:EntityData/Q42.json (this retrieves the full dataset for Q42, in JSON format, without any properties filter)
E.g. w:d:Special:EntityData/Q42.php (same request, but the returned JSON is reformatted into a "serialized" PHP array)
E.g. w:d:Special:EntityData/Q42 (same request, but the returned JSON is reformatted as HTML with the usual editable Wikidata UI)
The internal REST API on Wikidata is still the MediaWiki REST API (i.e. the standard "action" API of MediaWiki), where you can set the JSON output format with the "format=json" parameter (in fact, wiki pages for "Qnnn" items and "Pnnn" properties are just standard internal wiki redirects to the "Special:EntityData" page of Wikidata, which itself creates a request to this REST API).
See examples on https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities
Only the "json" and "php" formats are supported for now by this MediaWiki REST API ("html" returns the usual editable Wikidata UI; "xml" and other documented formats are still not supported by "action=wb*" API requests), so:
E.g. w:d:Special:EntityData/Q42.xml (same request, but it still fails, as "xml" is not a supported format). Some wiki developers have suggested the addition of a "py" format for Python, or other formats for Ruby, or some other standardized wrapping formats. For now everyone seems to live with the "json" format (even if they need a JSON parser in their preferred programming language, something that most languages already have).
The "php" format was added because it is far better performing than using a JSON parser in PHP. If an additional format is added, it will probably be "xml" first.
Some simple raw text formats would also be useful ("csv"?), but the queries would need to be more selective (using filters) to be flattened with a less complex tabular 2D structure.
Wikidata also comes with a separate "Wikibase library" (directly installable in other wikis); however, it is based on MediaWiki "modules" requiring the support of Lua, and some internal authorizations must be set up to allow performing requests to another site/database through Lua. Basically, this Lua library is made to allow integration within MediaWiki templates and pages (without needing any client-side JavaScript, unlike the "Taglist" template recently introduced on this OSM wiki): clients still view standard HTML pages, they can edit them in the wiki syntax or with the Visual Editor, and clients won't perform any direct connections to the external Wikidata server. The local wiki makes this connection itself via the library; the Lua module parses the results and presents them in standard wiki syntax, which is then embeddable in wiki templates and pages, and finally formatted to HTML by the MediaWiki server. But this library is still not usable on the OSM wiki, as it currently has no support for Lua modules. If you think about developing a JavaScript extension (similar to Taglist), you had better use the REST API to perform JSON data requests directly to the Wikidata server; you don't need that Wikibase library.
There's also an external "WDQ" tool server (https://wdq.wmflabs.org/api_documentation.html) which can process the data and perform data queries with a specific request syntax, and some more advanced capabilities to perform "joins" for recursive traversal through linked Qnnn elements or Pnnn properties (this tool still internally uses the MediaWiki REST API). This WDQ API is also a REST API.
More information in w:d:Wikidata:Data access if you want more specific filters — Verdy_p (talk) 16:58, 25 November 2016 (UTC)
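To illustrate the shape of the Special:EntityData JSON, here is a minimal Python sketch that extracts the multilingual labels (the sample dict is an abbreviated, hand-written excerpt of the real response shape for Q42; entity_labels is a hypothetical helper name):

```python
# Abbreviated, hand-written sample of the Special:EntityData/Q42.json shape.
sample = {
    "entities": {
        "Q42": {
            "labels": {
                "en": {"language": "en", "value": "Douglas Adams"},
                "fr": {"language": "fr", "value": "Douglas Adams"},
            }
        }
    }
}

def entity_labels(data: dict, qid: str) -> dict:
    """Return a {language: label} mapping for one entity."""
    labels = data["entities"][qid].get("labels", {})
    return {lang: entry["value"] for lang, entry in labels.items()}

print(entity_labels(sample, "Q42"))
# → {'en': 'Douglas Adams', 'fr': 'Douglas Adams'}
```

In a real application the sample dict would be replaced by json.loads() of the fetched Special:EntityData response.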
Thanks for the answer, the Special:EntityData returning JSON should allow me to do the queries I was looking for. --Skippern (talk) 17:14, 25 November 2016 (UTC)
If you just want the translated "labels" and not all the other properties, use the MediaWiki REST API directly (instead of the special page, which has no filters at all):
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q42&props=labels
You can experiment this query in the API sandbox of Wikidata (the filter used is "props=labels", you can set here a list of property names separated by vertical bars):
https://www.wikidata.org/wiki/Special:ApiSandbox#action=wbgetentities&format=json&ids=Q42&props=labels
For example you can query the label in a single language (parameter "languages=fr") with an additional parameter to use fallbacks if there's no label currently in that language (parameter "languagefallback=1"):
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q42&props=labels&languages=fr&languagefallback=1
See how much the volume of data returned is reduced with these "props" and "languages" filters!
Another very useful API parameter is "utf8=1" (to avoid many characters being escaped as sequences of hexadecimal UTF-16 code units "\uNNNN", i.e. all non-ASCII characters and some ASCII characters; however, the delimiting quotes of JSON string literals are still escaped). Here also it generally reduces the volume of data returned (notably when you query properties in non-Latin languages such as Russian, Arabic or Chinese).
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q42&props=labels&languages=ru&languagefallback=1
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&utf8=1&ids=Q42&props=labels&languages=ru&languagefallback=1
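The query URLs above can be assembled with standard URL encoding; a minimal sketch using the wbgetentities parameters shown above (labels_query is a hypothetical helper name):

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def labels_query(qid: str, language: str) -> str:
    """Build a wbgetentities URL fetching only the labels of one item,
    restricted to one language with fallback enabled."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "utf8": 1,
        "ids": qid,
        "props": "labels",
        "languages": language,
        "languagefallback": 1,
    }
    return API + "?" + urlencode(params)

print(labels_query("Q42", "ru"))
```

The resulting URL can be fetched with any HTTP client and the JSON parsed as shown earlier.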
An additional API parameter you may need in your application (if it uses concurrent requests, possibly from multiple users) is the possibility to define your own query ID via an extra parameter, which will be echoed in the returned response; that way you can handle responses asynchronously and map them to the state of your initial concurrent queries, instead of using blocking threads (this will increase the performance of your application if it is hosted on a server with many users).
But be aware of the Wikidata API usage policy: your server application should use a reasonable cache to avoid "spamming" the Wikidata server with too many repeated requests for the same data. If your queries send data massively to Wikidata, you'll need to authenticate with a Wikimedia user account authorized to run on Wikidata as a "bot" user (and you'll also need to specify your current bot authorization token in the documented parameter for some very privileged actions, according to the general Wikidata policy about bots and the security requirements for using some very restricted privileges).
The API sandbox on Wikidata provides a very friendly UI to help you build your queries, with help provided for many options. Just click the proposed options or fill in their values; other panels may appear on the right, which you can select to view additional parameters you can set. Then execute your query: in the results tab you'll see the generated URL along with the returned data shown just below. Copy-paste this URL as an example you can reuse in your app.
Final note: this API may be used with either a GET method (with URL-encoded query strings) or a POST method (with parameters attached as web form data in the request body). For security or privacy reasons, some queries require you to use POST, notably those requiring user authentication, or for editing data if you need an edit token, or if you use CORS requests requiring another token; the documentation informs you when POST is required, because POST requests are normally not cacheable. But for most read-only data requests not containing private user data in the query itself or its response (which should then be cacheable and reusable independently of users), you should use the GET method (all examples above use the GET method and are fully cacheable).
Verdy_p (talk) 18:20, 25 November 2016 (UTC)