Xybot

This user submits data to OpenStreetMap under the name xybot.

xybot is a collection of robot scripts that do janitorial jobs. The robot is maintained by xylome. It is written in Perl (the first choice for pattern matching) and was based on Frederik Ramm's fixbot script, which has since received many extensions and additional sanity checks. xybot is currently run several times a week on the Germany data and occasionally on the European data. It has also made edits at 10,10 for testing purposes on the new API 0.6 [1].

Erroneous modifications

Although I try to make only beneficial changes, there is no guarantee that I will not miss a circumstance where the data should not be altered.

Please report any erroneous alterations made by the robot (please include the node, way or relation ID), so that I can avoid these alterations in the future or revert the changes made.

So what does the FixTypo ruleset do exactly

    • removing surrounding whitespace from keys and values. Most API browsers do not show it and it is easily overlooked in the editor software, but as a consequence many things are not rendered.
    • removing keys with empty values.
    • correcting obvious and unambiguous misspellings of keys (e.g. buidling=>building, bycicle=>bicycle)
    • correcting misspellings of tracktype=gradeX (e.g. trcktypr=3 => tracktype=grade3). I think this is the tag with the most misspellings.
    • changing key/value pairs (e.g. landuse=water => natural=water, amenity=spielplatz => leisure=playground)
    • removing the '+' sign from values of layer=*, and removing blank layer=* tags (but not layer=0)
    • adding religion=christian when a Christian denomination=* is already set
    • correcting obvious misspellings of denominations (e.g. portestan=>protestant)
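A few of the rules above can be sketched in a handful of lines. This is only an illustration in Python, not xybot's actual Perl code: the function name `fix_typos` and the table `KEY_TYPOS` are made up for this example, and the table holds just the two key misspellings mentioned in the list.

```python
# Illustrative sketch only; xybot itself is written in Perl.
# KEY_TYPOS and fix_typos are hypothetical names, not part of xybot.
KEY_TYPOS = {"buidling": "building", "bycicle": "bicycle"}


def fix_typos(tags):
    """Return a cleaned copy of a tag dictionary."""
    cleaned = {}
    for key, value in tags.items():
        key, value = key.strip(), value.strip()  # remove surrounding whitespace
        if not value:                            # drop keys with empty values
            continue
        key = KEY_TYPOS.get(key, key)            # correct known key misspellings
        if key == "layer" and value.startswith("+"):
            value = value[1:]                    # layer=+1 becomes layer=1
        cleaned[key] = value
    return cleaned


print(fix_typos({" buidling ": "yes", "layer": "+1", "note": "  "}))
```

Note that layer=0 passes through untouched, because its value is not empty.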

So what does the FixKarlsruheSchema ruleset do exactly

  • rules for the currently used FixKarlsruheSchema ruleset.
    • change misspellings of the Karlsruhe schema (e.g. add:house_nummer=>addr:housenumber)
    • change values of addr:country to valid uppercase ISO 3166-1 alpha-2 codes according to [2]

So what does the FixStrasseDeAT ruleset do exactly

  • rules for the currently used FixStrasseDeAT ruleset.
    • applied only to the German and Austrian data extracts
    • changes name=* and addr:street=* tags containing "Str." or "Strasse" to "Straße"
    • fixes faulty Potlatch umlauts: Ã¤ => ä, Ã¶ => ö, Ã¼ => ü, Ã„ => Ä, Ã– => Ö, Ãœ => Ü, ÃŸ => ß
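The umlaut repair can be sketched as a plain replacement table. This assumes that the faulty sequences are UTF-8 bytes that were read as Windows-1252, which matches the "Ãœ => Ü" pair above but is still a guess about what Potlatch did. The names `MOJIBAKE` and `repair_umlauts` are made up for this example, and the table is built in code so that the broken sequences do not have to be typed by hand.

```python
# Assumption: the faulty sequences are UTF-8 bytes decoded as Windows-1252.
# MOJIBAKE and repair_umlauts are hypothetical names, not part of xybot.
UMLAUTS = "äöüÄÖÜß"
MOJIBAKE = {ch.encode("utf-8").decode("cp1252"): ch for ch in UMLAUTS}


def repair_umlauts(value):
    """Replace each broken two-character sequence with the intended letter."""
    for broken, fixed in MOJIBAKE.items():
        value = value.replace(broken, fixed)
    return value


# Build a broken value the same way the assumed bug would have produced it.
broken_name = "Überseestraße".encode("utf-8").decode("cp1252")
print(repair_umlauts(broken_name))
```

A value that is already correct contains none of the broken sequences and is returned unchanged.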

So what does the FixStrasseCh ruleset do exactly

  • rules for the currently used FixStrasseCh ruleset.
    • applied only to the Swiss data extract
    • changes name=* and addr:street=* tags containing "Str." or "Straße" to "Strasse"
    • fixes faulty Potlatch umlauts: Ã¤ => ä, Ã¶ => ö, Ã¼ => ü, Ã„ => Ä, Ã– => Ö, Ãœ => Ü, ÃŸ => ß
    • changes any 'ß' to 'ss'
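The street-name rule of the two rulesets above differs only in the target spelling, so both can be sketched with one function. This is an illustration, not the bot's real matching logic: the function name `normalize_street`, the `country` parameter and the word-boundary check (which keeps words like "Strassenbahn" untouched) are assumptions of this example.

```python
import re

# Matches "Str.", "Strasse" or "Straße" (any case of the first letter),
# but only when "asse"/"aße" ends the word. The boundary check is an assumption.
STREET = re.compile(r"([Ss])tr(?:\.|asse\b|aße\b)")


def normalize_street(name, country):
    """country 'CH' gives the Swiss spelling, anything else the German/Austrian one."""
    swiss = country == "CH"
    suffix = "trasse" if swiss else "traße"
    name = STREET.sub(lambda m: m.group(1) + suffix, name)
    if swiss:
        name = name.replace("ß", "ss")  # Swiss rule: change any 'ß' to 'ss'
    return name


print(normalize_street("Hauptstr.", "DE"))
print(normalize_street("Bahnhofstraße", "CH"))
```

The function would be applied to the values of name=* and addr:street=* only.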

So what does the FixRussianAddress ruleset do exactly

The FixEscapes ruleset

  • rules for the currently used FixEscapes ruleset.
    • applied worldwide
    • in some values, HTML entities for characters like & < > ' " are wrongly (doubly) escaped
    • there are also wrongly escaped semicolons (\s), equals signs (\e) and backslashes (\\) appearing in Osmosis output. Sometimes they are expanded to things like \\\\\\\\\\s and \\\\\\\\\\\\
  • the FixEscapes ruleset tries to fix this. The initial main run can be found here: Changeset 1293006
  • this ruleset will be run about once a month.
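A rough sketch of how such escapes could be undone follows. It is not the actual ruleset: `unescape_fully` and `unescape_osmosis` are made-up names, and treating any run of backslashes before "s" or "e" as one escaped character is an assumption based on the examples above. Escaped backslashes themselves are left alone here.

```python
import html
import re

# Assumption: any run of backslashes followed by 's' or 'e' is one escaped
# semicolon or equals sign, however often it was escaped again.
ESCAPED = re.compile(r"\\+([se])")


def unescape_fully(value, max_rounds=10):
    """Undo repeated HTML escaping by unescaping until the value stops changing."""
    for _ in range(max_rounds):
        decoded = html.unescape(value)
        if decoded == value:
            break
        value = decoded
    return value


def unescape_osmosis(value):
    """Turn \\s into ';' and \\e into '=' (hypothetical helper)."""
    return ESCAPED.sub(lambda m: ";" if m.group(1) == "s" else "=", value)


print(unescape_fully("Tom &amp;amp; Jerry"))
print(unescape_osmosis(r"bus\s tram"))
```

Looping until nothing changes handles values that were escaped twice or more, which a single unescape pass would only partly repair.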

The FixPotlatchDiacritic ruleset

  • rules for the currently used FixPotlatchDiacritic ruleset.
    • applied worldwide
    • Potlatch running on Unix-like operating systems generates some strange Roman characters with diacritical marks (actually due to Flash bugs)
  • the FixPotlatchDiacritic ruleset tries to fix this. An example of one of the bot's runs can be found here: Changeset 1350856
  • this ruleset will be run about once a month.

The FixRomanianDiacritics ruleset

Hey wait, xybot touched an object, but I can't see any change

xybot also removes whitespace characters surrounding keys and values that exist in the database. Unfortunately, these are not visible when browsing the objects through the API (which seems to omit them), but they are present in the database dumps and in the country extracts. You can see some of them (if xybot hasn't removed them yet) in Dirk Stöcker's Tagwatch, where they show up as red markers around a key or value.

Why are the changesets spanning the whole planet and why is it spamming the history tab

With API 0.6, a history tab was added to the OpenStreetMap main page. When the current map view intersects the bounding box of a changeset, this changeset is shown, even when there is no edit within the current view. I do not think the current implementation of the history tab is well thought out.
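The effect can be reproduced with a few lines. This is only a sketch of the bounding-box test as described here, not the code of the OpenStreetMap website; `BBox`, `bbox_of` and `intersects` are names made up for this example, and the coordinates are rough.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def bbox_of(points):
    """Smallest box containing all (lon, lat) points of a changeset."""
    lons = [lon for lon, lat in points]
    lats = [lat for lon, lat in points]
    return BBox(min(lons), min(lats), max(lons), max(lats))


def intersects(a, b):
    """True when the two boxes overlap or touch."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


# Two edits, roughly in Hamburg and Vienna, in one changeset.
changeset = bbox_of([(10.0, 53.5), (16.4, 48.2)])
# A map view around Prague, where nothing was edited.
view = BBox(14.3, 50.0, 14.6, 50.2)
print(intersects(changeset, view))
```

The view contains neither edit, yet it lies inside the box spanned by the two, so the changeset is listed anyway. A bot run with edits all over the planet does the same for nearly every view.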

xybot downloads the worldwide changes of the last 24 hours. This data is piped through the xybot scripts and obvious errors are corrected. A way or relation has no direct spatial information; its location is given only indirectly by the nodes it refers to. Splitting the changesets into chunks covering smaller areas would cause other problems:

  • Checking this spatial data would be very expensive for the OpenStreetMap infrastructure.
  • What is the "right" size of a changeset? 10m², 100m², 1km², 10km², 100km², 1000km²?
  • With the history tab implemented the way it is right now, the only way to really stop the spamming would be to put each change in its own changeset. But that is what we had in API 0.5, it would reduce changesets to absurdity, and I strongly believe a single bot run should result in a single changeset.

In my view there are three possible solutions to the problem of history spamming:

  • Evaluate the "bot=yes" tag of changesets in the history tab. xybot sets this tag in all its changesets, and consequently all bots should set it. The drawback is that malicious bot changes could go unnoticed because people have set this filter!
  • Add a bot flag to bot accounts and filter those changesets in the history. Same drawback as above.
  • Replace the history tab with something like the ito-world osmmapper (including the RSS feed of changes).
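The first suggestion amounts to a very small filter. In this sketch a changeset is a plain dictionary with a "tags" entry, which is an assumption made for the example and not the data model of the OpenStreetMap website; `hide_bot_changesets` is a made-up name, and the second changeset ID is invented.

```python
def hide_bot_changesets(changesets):
    """Keep only changesets that do not carry the bot=yes tag."""
    return [cs for cs in changesets
            if cs.get("tags", {}).get("bot") != "yes"]


history = [
    {"id": 1293006, "tags": {"bot": "yes", "created_by": "xybot"}},
    {"id": 42, "tags": {"comment": "added a bench"}},  # invented example entry
]
print([cs["id"] for cs in hide_bot_changesets(history)])
```

The drawback named above is visible here too: anything tagged bot=yes disappears, whether the bot behaves well or not.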

Suggestions

If you have any ideas for what xybot could do in your area, please feel free to suggest them on the discussion page. The suggestions should look like this:

  • when a key is misspelled:
"buidling" => "building",
"bycicle"  => "bicycle",
  • when a value is wrong or is a foreign-language value that exactly matches one of the defined values:
"landuse|farm_yard"  => "landuse|farmyard",
"leisure|spielplatz" => "leisure|playground",
  • when a key/value pair is wrong or misspelled:
"landuse|wasser"      => "natural|water",
  • when a key/value pair is wrong or misspelled and should be expanded to multiple key/value pairs (2-n):
"denomination|kirche" => "amenity|place_of_worship#building|church",