From OpenStreetMap Wiki
Sophox maxspeed example (try it live)

Sophox is a tool to find and clean up OSM database issues using the expertise of the entire community. Sophox can be used in several distinct ways: it can function as a challenge manager, similar to Osmose and MapRoulette; it can let power users do complex query-driven search-and-replace tasks; and it can serve as a bot evaluation platform.

  • The developer view of the tool is located at
  • The Quick fixes page contains a large collection of tasks, some of which might not have been reviewed by the community yet.

Sophox design goals

  • Allow anyone to propose new cleanup tasks, share and discuss them on the OSM wiki.
  • Allow contributors with the local expertise to efficiently review proposed changes.
  • Simplify bad edit reverts - all changes made as part of a task are tracked with the task-specific changeset tags, allowing for an easy recovery in case of an error.
  • For unproven tasks, let users vote on changes before saving them to the OSM database.
  • Once the task has been shown to have no false positives, provide a safe and bug-free migration path for some bot operator to run it automatically.
  • Allow multiple choice changes in addition to the accept/reject tasks.

Monitoring and Bad Edit Recovery

Despite everyone's best efforts, mistakes sometimes slip in, and we need an efficient way to find and fix them. Sophox changes are very easy to revert because all changesets are marked with the task_id tag. So if we need to revert quickly, simply look for changesets tagged with generator=Sophox, created_by='Sophox <version>', and task_id=my_bad_task.
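The lookup described above can be sketched as a simple filter. This is a minimal illustration, assuming the changeset tags have already been fetched into plain dictionaries; the function name and the sample data are hypothetical, and no OSM API call is made here.

```python
def find_task_changesets(changesets, task_id):
    """Return the ids of changesets tagged as Sophox edits for the given task."""
    return [
        cs["id"]
        for cs in changesets
        if cs["tags"].get("generator") == "Sophox"
        and cs["tags"].get("task_id") == task_id
    ]

# Hypothetical sample data: only the first changeset belongs to the bad task.
changesets = [
    {"id": 1, "tags": {"generator": "Sophox", "task_id": "my_bad_task"}},
    {"id": 2, "tags": {"generator": "Sophox", "task_id": "other_task"}},
    {"id": 3, "tags": {"created_by": "JOSM"}},
]
print(find_task_changesets(changesets, "my_bad_task"))
```

The resulting list of changeset ids is what a revert tool would then be pointed at.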

Creating Tasks

Sophox tasks are defined using SPARQL queries. Similar to SQL, a SPARQL query produces a table of proposed changes, one per row. Each row must include ?id (the OSM feature ID) and ?loc (the location of the object). The other columns specify the needed changes: tag names and their desired values.

For example, let’s say we would like to replace identical maxspeed:forward=* and maxspeed:backward=* tags with the maxspeed=* tag. First, we need to find all features that have both of these tags, and whose value is the same for both. In SPARQL, this would be done with two statements. Because we used the same variable name for both statements, the results will only match when both maxspeeds are equal.

?id  osmt:maxspeed:forward   ?maxspeed .
?id  osmt:maxspeed:backward  ?maxspeed .

Additionally, we should check whether a maxspeed=* tag already exists and, if so, whether its value differs. Our result should only include features that either don’t have a maxspeed=* tag (the ?opt_maxspeed variable is unbound) or have one with the same value as the other two tags.

OPTIONAL { ?id osmt:maxspeed ?opt_maxspeed }
FILTER( !BOUND(?opt_maxspeed) || ?opt_maxspeed = ?maxspeed )
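For readers less familiar with SPARQL, the matching logic so far can be restated in Python. This is only a sketch of the query's semantics, assuming a feature's tags are available as a plain dictionary; the function name is hypothetical and this is not how Sophox itself evaluates queries.

```python
def proposed_maxspeed(tags):
    """Return the value to propose for maxspeed, or None if the feature does not match."""
    forward = tags.get("maxspeed:forward")
    backward = tags.get("maxspeed:backward")
    # Both directional tags must exist and agree (the shared ?maxspeed variable).
    if forward is None or forward != backward:
        return None
    existing = tags.get("maxspeed")  # plays the role of ?opt_maxspeed
    # Mirrors FILTER( !BOUND(?opt_maxspeed) || ?opt_maxspeed = ?maxspeed )
    if existing is None or existing == forward:
        return forward
    return None

print(proposed_maxspeed({"maxspeed:forward": "50", "maxspeed:backward": "50"}))
print(proposed_maxspeed({"maxspeed:forward": "50", "maxspeed:backward": "50", "maxspeed": "30"}))
```

The first call matches and yields the shared value; the second is filtered out because the existing maxspeed disagrees.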

Lastly, we need to include the feature's location on the map. Note that unlike tags, which use the osmt: prefix, the location is part of the feature metadata, so it uses the osmm: prefix.

?id  osmm:loc  ?loc .

By now we have ?id and ?loc to identify and position the feature, plus the value in the ?maxspeed variable. Now we need to tell the engine how tags should be changed. The tag name is given by the ?tag_* variables, e.g. ?tag_1, ?tag_2, etc. (?tag_00 -- ?tag_99). So to add a new tag, we set the value of ?tag_1 to the constant value osmt:maxspeed:

(osmt:maxspeed as ?tag_1)

Because we have used ?tag_1, there must be a corresponding tag value, specified with ?val_1. We alias our existing ?maxspeed value here:

(?maxspeed as ?val_1)

This change will be ignored if the feature already has a maxspeed tag with the given value. To delete a tag, set its value to false:

(osmt:maxspeed:forward as ?tag_2)  (false as ?val_2)
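The effect of the ?tag_N / ?val_N columns on a feature can be sketched as follows. This is an illustration of the rules described above, not Sophox's actual implementation: it assumes a result row is a dictionary keyed by column name, uses plain tag keys instead of osmt: URIs, and the function name is hypothetical.

```python
import re

def apply_row(tags, row):
    """Apply the tag_N / val_N pairs of one result row to a copy of the feature's tags."""
    result = dict(tags)
    for column, tag_name in row.items():
        match = re.fullmatch(r"tag_(\d+)", column)
        if not match:
            continue  # skip id, loc, and the val_N columns themselves
        # Every tag_N must have a matching val_N; a missing one raises KeyError.
        value = row["val_" + match.group(1)]
        if value is False:
            result.pop(tag_name, None)  # false means "delete this tag"
        else:
            result[tag_name] = value    # add the tag, or keep it if already equal
    return result

row = {
    "id": "way/1",
    "tag_1": "maxspeed", "val_1": "50",
    "tag_2": "maxspeed:forward", "val_2": False,
    "tag_3": "maxspeed:backward", "val_3": False,
}
before = {"highway": "primary", "maxspeed:forward": "50", "maxspeed:backward": "50"}
print(apply_row(before, row))
```

Working on a copy keeps the original tags available for comparison, which is convenient when showing a before/after view to a reviewer.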

Additionally, Sophox needs some metadata to operate. Each task should have a taskId, so that if a user votes on or rejects a change, the decision can be associated with the query. The task also needs a comment to add to the changeset. These and other values can be set using a JSON object. Setting "vote" to true requires users to vote on a change before it is saved (two-person agreement). NOTE: In the query, this text must be on a single line; here we break it up for readability.

{
  "taskId":"josm_same_maxspeed_fwd_bk",
  "vote":true,
  "comment":"Replacing identical maxspeed:forward and maxspeed:backward with maxspeed"
}

Putting it all together, we end up with this query. Click "Run it" to see the results, or "edit query" to view it inside the tool.

#defaultView:Editor{"taskId":"josm_same_maxspeed_fwd_bk", "vote":true, "comment":"Replacing identical maxspeed:forward and maxspeed:backward with maxspeed"}
SELECT
  ?id  ?loc
  # Add maxspeed, then delete the two directional tags
  (osmt:maxspeed as ?tag_1)           (?maxspeed as ?val_1)
  (osmt:maxspeed:forward as ?tag_2)   (false as ?val_2)
  (osmt:maxspeed:backward as ?tag_3)  (false as ?val_3)
WHERE {
  # Both directional tags must carry the same value
  ?id osmt:maxspeed:forward ?maxspeed .
  ?id osmt:maxspeed:backward ?maxspeed .
  ?id osmm:loc ?loc .
  # Skip features whose existing maxspeed disagrees
  OPTIONAL { ?id osmt:maxspeed ?opt_maxspeed }
  FILTER( !BOUND(?opt_maxspeed) || ?opt_maxspeed = ?maxspeed )
}
Run it (edit query)
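Since the configuration at the top of the query must stay on a single line, it can be convenient to generate it rather than type it by hand. A minimal sketch, assuming the configuration is kept as a Python dictionary; the helper name is hypothetical.

```python
import json

def editor_header(config):
    """Build the single-line configuration comment placed at the top of a task query."""
    # json.dumps without an indent argument never emits newlines,
    # which satisfies the single-line requirement.
    return "#defaultView:Editor" + json.dumps(config)

config = {
    "taskId": "josm_same_maxspeed_fwd_bk",
    "vote": True,
    "comment": "Replacing identical maxspeed:forward and maxspeed:backward with maxspeed",
}
print(editor_header(config))
```

The printed line can be pasted as the first line of the query, followed by the SPARQL text itself.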

Multiple Choice Tasks

Sophox multiple choice example (try it live)

In some cases, we cannot describe the change in terms of accept vs. reject. Instead, we have a set of well-known choices, and allow users to research and choose the correct one, or reject them all. For example, sport=diving has been deprecated and needs to be replaced with either sport=scuba_diving or sport=cliff_diving.

To enable this mode, change the tag/value column names from the ?tag_1 + ?val_1 style to ?tag_a1 + ?val_a1. The letter indicates the group of choices (a-z), so if the user picks group B, all ?tag_b* / ?val_b* columns will be applied. For our example, the task has two choices, so we can use groups A and B: ?tag_a1 + ?val_a1 will set sport=scuba_diving, and ?tag_b1 + ?val_b1 will set sport=cliff_diving.

Lastly, we must define what we want to call groups A and B using the "labels" parameter inside the editor configuration at the top.

#defaultView:Editor{"taskId":"depr_sport_diving", "vote":true, "labels": {"a":"Scuba diving", "b": "Cliff diving"}, "comment":"Replace sport=diving with either sport=scuba_diving or sport=cliff_diving"}
SELECT
  ?id  ?loc
  # Group A and group B each propose a different replacement value
  (osmt:sport as ?tag_a1)  ('scuba_diving' as ?val_a1)
  (osmt:sport as ?tag_b1)  ('cliff_diving' as ?val_b1)
WHERE {
  ?id osmt:sport 'diving' .
  ?id osmm:loc ?loc .
}
Run it (edit query)
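The grouping of multiple choice columns by letter can be sketched like this. It is an illustration of the naming convention only, assuming a result row is a dictionary keyed by column name and using plain tag keys instead of osmt: URIs; the function name is hypothetical.

```python
import re

def choices_for_row(row, labels):
    """Group tag_<letter><n> / val_<letter><n> pairs by choice letter, keyed by label."""
    groups = {}
    for column, tag_name in row.items():
        match = re.fullmatch(r"tag_([a-z])(\d+)", column)
        if match:
            letter, number = match.groups()
            change = (tag_name, row[f"val_{letter}{number}"])
            groups.setdefault(letter, []).append(change)
    # Pair each group with its human-readable name from the "labels" parameter.
    return {labels[letter]: changes for letter, changes in sorted(groups.items())}

row = {
    "tag_a1": "sport", "val_a1": "scuba_diving",
    "tag_b1": "sport", "val_b1": "cliff_diving",
}
labels = {"a": "Scuba diving", "b": "Cliff diving"}
print(choices_for_row(row, labels))
```

Each label maps to the list of tag changes that would be applied if the user picks that choice.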


Rejections and votes are stored in the same RDF database as OSM features, which means your task can access this information as part of a regular query. When a user votes, the following data is added. The vote value is one of: osmm:pick_no (change rejected), osmm:pick_yes (change accepted), or osmm:pick_a .. osmm:pick_z (multiple choice vote).

# taskURI is <>
# userURI is <>

osmnode:123  osmm:task    <taskURI> .
<taskURI>    osmm:taskId  "my_task_id" .
<taskURI>    <userURI>    osmm:pick_yes .
<taskURI>    <userURI>    <date> .

For example, you can add these extra conditions to the WHERE clause of the maxspeed query above to show only the rejected changes:

?id    osmm:task    ?task .
?task  osmm:taskId  "josm_same_maxspeed_fwd_bk" .
?task  ?user        osmm:pick_no .
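To make the triple pattern above concrete, the same match can be walked through in Python over an in-memory list of (subject, predicate, object) triples. This is a sketch only: the task: and user: identifiers are hypothetical placeholders for the real URIs, and the function name is an assumption.

```python
def rejected_features(triples, task_id):
    """Return features that have a task with the given taskId and a pick_no vote."""
    # ?task osmm:taskId "<task_id>"
    tasks = {s for s, p, o in triples if p == "osmm:taskId" and o == task_id}
    # ?task ?user osmm:pick_no
    rejected = {s for s, p, o in triples if s in tasks and o == "osmm:pick_no"}
    # ?id osmm:task ?task
    return sorted(s for s, p, o in triples if p == "osmm:task" and o in rejected)

triples = [
    ("osmnode:123", "osmm:task", "task:1"),
    ("task:1", "osmm:taskId", "josm_same_maxspeed_fwd_bk"),
    ("task:1", "user:alice", "osmm:pick_no"),
    ("osmnode:456", "osmm:task", "task:2"),
    ("task:2", "osmm:taskId", "josm_same_maxspeed_fwd_bk"),
    ("task:2", "user:bob", "osmm:pick_yes"),
]
print(rejected_features(triples, "josm_same_maxspeed_fwd_bk"))
```

Only the feature whose task record carries an osmm:pick_no vote is returned, mirroring what the three extra conditions select.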