User:Pigsonthewing/Wikipedia

From OpenStreetMap Wiki

This page describes a proposed bot to add Wikipedia/Wikidata tags to OSM features. It was discussed with several colleagues at State of the Map 2013.

Functionality

A bot would scan Wikipedia category trees (starting with en.Wikipedia, perhaps others later) that list entities with coordinates (e.g. "churches", "schools", "bridges", "hospitals"), and/or Wikidata.

For each such entity, it would:

  • Check the coordinates on OSM
  • Look for a matching entity (e.g. a church) within a defined radius (say, 50m)
  • If a single matching entity is found
    • Perform name matching
    • If the names match within a defined allowance (e.g. "St John" vs "Saint John's"; allowing for Wikipedia's disambiguation), add a wikipedia=* and/or the proposed wikidata=* tag to OSM
    • If the match is possible rather than probable (say below rather than over 80%)
      • Add to a list for manual checking
  • If more than one matching entity is found
    • Add to a list for manual checking
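The matching steps above could be sketched in Python roughly as follows. This is only an illustration, resting on assumptions that are not part of the proposal: OSM candidates are fetched with an Overpass API query (`amenity=place_of_worship` stands in for "church"), distances use the haversine formula, and names are compared with the ratio from Python's `difflib.SequenceMatcher`, which is one of many possible similarity measures. `Candidate`, `ABBREVIATIONS`, `classify` and the other names are hypothetical.

```python
import math
import re
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real bot would need a much fuller one.
ABBREVIATIONS = {"st": "saint"}


@dataclass
class Candidate:
    """An OSM feature already filtered to the right type (e.g. churches)."""
    osm_id: str
    name: str
    lat: float
    lon: float


def overpass_query(lat, lon, radius_m=50, key="amenity", value="place_of_worship"):
    """Overpass QL for nodes/ways/relations tagged key=value within radius_m."""
    return (f'[out:json][timeout:25];'
            f'nwr["{key}"="{value}"](around:{radius_m},{lat},{lon});'
            f'out center;')


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth (R = 6,371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def normalise(name):
    """Heuristic normalisation, allowing for Wikipedia disambiguation."""
    name = name.split(",")[0]                     # "King's Head Inn, Aylesbury"
    name = re.sub(r"\s*\([^)]*\)\s*$", "", name)  # "St John's Church (Oxford)"
    name = name.lower().replace("'s", "").replace("\u2019s", "")
    words = re.findall(r"[a-z0-9]+", name)
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def similarity(a, b):
    """Similarity of the normalised names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def classify(title, lat, lon, candidates, radius_m=50.0, threshold=0.8):
    """Return (decision, candidates), where decision is "tag", "manual" or "none"."""
    nearby = [c for c in candidates
              if haversine_m(lat, lon, c.lat, c.lon) <= radius_m]
    if not nearby:
        return "none", []
    if len(nearby) > 1:              # more than one matching entity: ask a human
        return "manual", nearby
    if similarity(title, nearby[0].name) >= threshold:
        return "tag", nearby         # probable match: add wikipedia=* / wikidata=*
    return "manual", nearby          # merely possible match: ask a human
```

With these heuristics, "St John" and "Saint John's" both normalise to `saint john` and are treated as a match. Sending the Overpass query and writing tags back to OSM are deliberately left out.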

Optional extras:

  • Perform the check on more than one language Wikipedia at once. Flag any entities whose names and coordinates are similar, but whose coordinates are not the same, for manual checking.
  • Add links (or hidden HTML comments) to Wikipedia and/or Wikidata (n.b. this will require bot approval at the target wiki).
  • Write a tool, app or game where people work through the list of uncertain matches and select a "yes" or "no" button.
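The cross-language check in the first optional extra might look something like this sketch. It assumes the articles in the different language editions have already been identified as the same entity (for example through interlanguage links or a shared Wikidata item); the function name and the 1 m tolerance are hypothetical choices.

```python
import math
from itertools import combinations


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth (R = 6,371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def coordinate_disagreements(coords_by_lang, tolerance_m=1.0):
    """coords_by_lang maps a language code (e.g. "en", "de") to the (lat, lon)
    that edition gives for the same entity. Returns (lang_a, lang_b, metres)
    for every pair of editions that disagree by more than tolerance_m."""
    flagged = []
    for (lang_a, pos_a), (lang_b, pos_b) in combinations(coords_by_lang.items(), 2):
        distance = haversine_m(*pos_a, *pos_b)
        if distance > tolerance_m:
            flagged.append((lang_a, lang_b, round(distance, 1)))
    return flagged
```

Any entity with a non-empty result would go on the list for manual checking.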

Updates

Mailing list discussions:

Discussion

Great idea. One request however, and this follows discussions I had at the conference. I think it would be better if we could start creating tools rather than Bots. In my book a Bot adds information under its own username and is the sole responsibility of the Bot-author. A tool is used by a named contributor who chooses to add information to OSM, based on advice or recommendations given to them by the tool they are using. Potlatch and JOSM could of course be considered to be tools, and this could be one as well. As such, I would like to use a WikipediaTagTool to add information in my area. I would ask it for recommendations, I would review the recommendations, weed out or tweak recommendations I thought were inappropriate and then press submit. The edits would then appear in my edit history, and the associated comment would include any text I suggested followed by the text 'using WikiTagAdded v1.1' (or similar). In this way, users can vote with their feet to use good tools, and take heat away from the tool-maker. A bad tool will probably not get any serious take-up.

-- PeterIto (talk) 17:40, 13 September 2013 (UTC)

I think the original proposal points towards a split between obvious cases (-> bot) and fuzzy situations (-> tool), and I prefer that distinction. There are now so many tools, bug report sites, quality checkers and so on that even a worthwhile tool will only get used rarely - look at how many problems reported by OSMI or even the JOSM validator remain uncorrected, for instance. Therefore I do not think that burdening contributors with easily automated tasks is a good idea. --Tordanik 11:44, 14 September 2013 (UTC)
As Tordanik indicates, I was at pains to devise a suggested algorithm (improvements welcome) that distinguishes between ambiguous cases, which need human intervention, and unambiguous ones, where a bot can safely be applied. We already have tools for tagging with Wikipedia links (e.g. JOSM Wikipedia-Plugin; Add-tags (which is slow)), but these are clearly not going to resolve, in a timely manner, the volume of cases that need to be addressed. My experience of working with the operators of such bots on Wikipedia shows that the automation of the task is achievable. Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 11:59, 14 September 2013 (UTC)
I see some problems for bots:
For WIWOSM we want the OSM object with the highest complexity: for a building or a country we don't want the node with the name tag, we want the polygon. Another problem can be that you are looking for a church but find the square in front of the church with the same name.
In other cases a bot could run fine; for streets, for example, I see no big problems and good matching between OSM names and WP names. I also see the argument that the amount of linking is really big: we have 300,000 Wikipedia tags in OSM but 3 million coordinates in Wikipedia, so we are now at about 10%.
Please don't start with wikidata tags as long as a Wikipedia article exists for all objects.
I could speed up the "Addtags" tool by using the Overpass API and moving it to WMF Tool Labs. So in a lot of cases tools will be the better way, but I would give bots a chance. The bot maintainer should start with semi-automatic tools to learn what can go wrong, and switch to full automation later. --Kolossos (talk) 20:54, 14 September 2013 (UTC)
Hi Kolossos, thanks for joining us. I think your first point is addressed by my "Look for a matching entity (e.g. a church)" bullet-point. That should prevent us from mistakenly linking a square or other feature to a church. If only a node is present, then we should tag it - a later editor can copy the tags when they draw the polygon. Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 14:46, 15 September 2013 (UTC)

Adding Wikipedia tags to OSM: what the Italian community has done so far

Hi, Groppo created a script that harvests mappable articles from Wikipedia and provides helper links for adding the tags in OSM (using JOSM); see the result here. The code is on GitHub. This project was published during OSMit (the "State of the Map Italy", 4-6 October 2013). -- CristianCantoro (talk) 12:11, 6 November 2013 (UTC)

Frequency and regions

Very good idea. How often would you want to run this? Or would it be possible for a volunteer to select a region and run it (again)? Ter-burg (talk) 10:30, 8 June 2014 (UTC)

Good questions, thank you. I was thinking of a one-off run, but a regular follow-up would be a good idea. I suppose the frequency and method are up to the community; or at least to those willing to code bots. Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 10:55, 8 June 2014 (UTC)

Tool vs bots

I see that a tool (not a bot) is being developed: http://sotm-eu.org/en/slots/69 Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 14:48, 8 June 2014 (UTC)

How to query Wikipedia

Well, here's the documentation for their query API. ...ah poo. I found this MediaWiki feature which would be very helpful, but it is not yet enabled on Wikipedia. If there's a way you can advocate for that feature on Wikipedia, it would simplify your job a lot, I guess.

Otherwise, iterating through requires something like...

  1. Ask for pages in a particular category like this: https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Public_houses_in_the_United_Kingdom - but then we have to iterate down to find the sub-sub-sub-category members that are actually pubs.
  2. Then here's the query to retrieve the lat/lon for a single item: https://en.wikipedia.org/w/api.php?action=query&prop=coordinates&titles=King%27s_Head_Inn,_Aylesbury
  3. After that, we can do all the work in OSM-land. That does NOT make the rest of the job trivial; it just means that a good OSMer might be able to handle the search around a point with their existing abilities.
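Steps 1 and 2 might be scripted as below: a minimal Python sketch using only the standard library. It walks the category tree breadth-first with `list=categorymembers` (namespace 14 is Category, 0 is articles), follows the API's `continue` parameters, and then asks `prop=coordinates` for up to 50 titles per request. The depth limit, the User-Agent string and all function names are assumptions; the `fetch` parameter exists only so the logic can be tried without network access.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API = "https://en.wikipedia.org/w/api.php"
# Wikimedia asks API clients for a descriptive User-Agent; this one is a placeholder.
USER_AGENT = "osm-wikipedia-tag-sketch/0.1 (hypothetical)"


def fetch_json(params):
    """One GET request to the MediaWiki Action API, decoded from JSON."""
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def query_all(params, fetch=fetch_json):
    """Yield each 'query' result, following the API's 'continue' parameters."""
    base = {**params, "action": "query", "format": "json", "formatversion": "2"}
    cont = {}
    while True:
        data = fetch({**base, **cont})
        yield data.get("query", {})
        if "continue" not in data:
            return
        cont = data["continue"]


def articles_in_tree(root, max_depth=5, fetch=fetch_json):
    """Titles of articles in a category and its sub-(sub-...)categories."""
    seen, titles = set(), {}
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        if cat in seen:              # category graphs can contain loops
            continue
        seen.add(cat)
        params = {"list": "categorymembers", "cmtitle": cat,
                  "cmtype": "page|subcat", "cmlimit": "max"}
        for batch in query_all(params, fetch):
            for member in batch.get("categorymembers", []):
                if member["ns"] == 14 and depth < max_depth:
                    queue.append((member["title"], depth + 1))
                elif member["ns"] == 0:
                    titles[member["title"]] = None  # keeps order, drops duplicates
    return list(titles)


def coordinates(titles, fetch=fetch_json):
    """Map title -> (lat, lon) of each page's primary coordinates."""
    found = {}
    for i in range(0, len(titles), 50):
        params = {"prop": "coordinates", "colimit": "max",
                  "titles": "|".join(titles[i:i + 50])}
        for batch in query_all(params, fetch):
            for page in batch.get("pages", []):
                for c in page.get("coordinates", []):
                    found[page["title"]] = (c["lat"], c["lon"])
    return found
```

For example, `coordinates(articles_in_tree("Category:Public_houses_in_the_United_Kingdom"))`. A tree of that size means many requests, so a real bot would need throttling, and it would still have to filter out articles in the tree that are not actually pubs.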

See also/updates