Overpass API/Augmented Diffs
The augmented diffs extend the ordinary minute diffs. They make it possible to keep even specialized databases up to date minute by minute. They are based on an idea by Matt Amos, which Frederik took up again in a talk given in German.
This feature is still experimental. On the one hand, this means that the service may change or be suspended at any time. On the other hand, suggestions are welcome, because the format and content of the augmented diffs will be improved in response to good suggestions.
The augmented diffs can be downloaded from http://overpass-api.de/augmented_diffs/
Overview
The augmented diffs contain three kinds of data:
- The new data that comprises the usual minute diffs
- The old data that is either replaced by new data or explicitly deleted
- Context data: ways and relations that have modified objects as members and, for every way and relation contained in the diff, all of its nodes and members
This allows in particular the following two applications:
- Proper minute updates on datasets restricted by a bounding box
- A wayback mechanism: You can start with a planet file or other database and walk back instead of forward through time
While the use of the former is immediately clear, the latter needs some explanation: several examples from the talks mentioned above, such as the counting application, need to know what has been deleted in order to decrease the counter.
Contained data
first version
Each diff is structured in three sections, one for each type of element. Within a section, the elements are ordered by id. If an element appears in multiple versions, these appearances are ordered by version. Each element is wrapped in erase, keep, or insert, depending on whether the element is added (insert only), deleted (erase only), changed (erase with the old version and insert with the new version), or not changed (keep only).
In the preamble, the file contains the end date of the diff.
<?xml version="1.0" encoding="UTF-8"?>
<osmAugmentedDiff version="0.6" generator="Overpass API">
  <note>... </note>
  <meta osm_base="2012-08-26T20:24:02Z"/>
  <!-- Elements are ordered as: nodes first, then ways, then relations.
       Within each class of elements they are ordered by id -->
  <erase>
    <node ... />
    <!-- contains the nodes that are either explicitly deleted or the old version of nodes that are replaced by a new version. -->
  </erase>
  <keep>
    <node ... />
    <!-- contains the nodes that belong to changed ways or relations including ways that contain a changed node or to ways that are members of a changed relation. -->
  </keep>
  <insert>
    <node ... />
    <!-- contains the nodes that are updated in their newest version. -->
  </insert>
  <erase>
    <way ... />
    <!-- contains the ways that are either explicitly deleted or the old version of ways that are replaced by a new version. -->
  </erase>
  <keep>
    <way ... />
    <!-- contains the ways that contain a changed node or ways that are members of a changed relation. -->
  </keep>
  <insert>
    <way ... />
    <!-- contains the ways that are updated in their newest version. -->
  </insert>
  <erase>
    <relation ... />
    <!-- contains the relations that are either explicitly deleted or the old version of relations that are replaced by a new version. -->
  </erase>
  <keep>
    <relation ... />
    <!-- contains the relations that contain a changed node or way (including ways changed only by changing their underlying nodes). -->
  </keep>
  <insert>
    <relation ... />
    <!-- contains the relations that are updated in their newest version. -->
  </insert>
</osmAugmentedDiff>
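The wrapping rules of the first version can be turned into a per-element status with a few lines of Python. This is only a minimal sketch: the sample document, its ids and versions, and the function names `read_sections` and `status_by_element` are made up for illustration and are not part of the Overpass API.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature diff in the first-version layout.
SAMPLE = """<osmAugmentedDiff version="0.6" generator="Overpass API">
  <meta osm_base="2012-08-26T20:24:02Z"/>
  <erase><node id="1" version="1" lat="50.0" lon="7.0"/></erase>
  <keep><node id="2" version="3" lat="50.1" lon="7.1"/></keep>
  <insert><node id="1" version="2" lat="50.0" lon="7.2"/></insert>
  <erase><way id="10" version="4"><nd ref="1"/><nd ref="2"/></way></erase>
</osmAugmentedDiff>"""


def read_sections(xml_text):
    """Return the osm_base timestamp and the (type, id) keys per wrapper."""
    root = ET.fromstring(xml_text)
    sections = {"erase": set(), "keep": set(), "insert": set()}
    for wrapper in root:
        if wrapper.tag in sections:
            for element in wrapper:
                sections[wrapper.tag].add((element.tag, int(element.get("id"))))
    return root.find("meta").get("osm_base"), sections


def status_by_element(sections):
    """Map each (type, id) to added, deleted, changed, or unchanged."""
    status = {}
    for key in sections["erase"] | sections["insert"]:
        if key in sections["erase"] and key in sections["insert"]:
            status[key] = "changed"
        elif key in sections["erase"]:
            status[key] = "deleted"
        else:
            status[key] = "added"
    for key in sections["keep"]:
        status.setdefault(key, "unchanged")
    return status


osm_base, sections = read_sections(SAMPLE)
status = status_by_element(sections)
print(osm_base, status)
```

In this sample, node 1 appears in both erase and insert and is therefore reported as changed, way 10 appears only in erase and is reported as deleted, and node 2 is context data.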
second version
Discussed and developed during the Karlsruhe Hack Weekend October 2012. Still work in progress...
<osmAugmentedDiff version="XXX" generator="Overpass API">
  <note>... </note>
  <meta osm_base="2012-08-26T20:24:02Z"/>
  <!-- contains one explicitly deleted node, way or relation -->
  <action type="delete">
    <old>
      <node ... />
    </old>
    <new>
      <node ... visible="false" ... />
    </new>
  </action>
  <!-- newly created nodes, ways or relations -->
  <action type="create">
    <node ... /> <!-- or way or relation -->
  </action>
  <!-- contains nodes, ways or relations with changed tags or geom or members -->
  <action type="modify" reason="changed tags">
    <!-- reasons for nodes and ways: changed tags, changed geom
         reasons for relations: changed tags, changed geom, changed members -->
    <old>
      <node ... /> <!-- or way or relation -->
    </old>
    <new>
      <node ... /> <!-- or way or relation -->
    </new>
  </action>
</osmAugmentedDiff>
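The counting application mentioned in the overview can be sketched against this action layout. The following Python example is an illustration under assumptions: the sample data, the tag amenity=post_box, and the function names `has_tag` and `counter_delta` are invented here, and the format itself is still described as work in progress.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the second-version layout.
SAMPLE = """<osmAugmentedDiff version="XXX" generator="Overpass API">
  <meta osm_base="2012-08-26T20:24:02Z"/>
  <action type="delete">
    <old><node id="1" version="2" lat="49.0" lon="8.4"><tag k="amenity" v="post_box"/></node></old>
    <new><node id="1" version="3" visible="false"/></new>
  </action>
  <action type="create">
    <node id="2" version="1" lat="49.1" lon="8.5"><tag k="amenity" v="post_box"/></node>
  </action>
  <action type="create">
    <node id="3" version="1" lat="49.2" lon="8.6"><tag k="amenity" v="post_box"/></node>
  </action>
  <action type="modify" reason="changed geom">
    <old><node id="4" version="1" lat="49.3" lon="8.7"><tag k="amenity" v="post_box"/></node></old>
    <new><node id="4" version="2" lat="49.4" lon="8.7"><tag k="amenity" v="post_box"/></node></new>
  </action>
</osmAugmentedDiff>"""

OSM_TYPES = ("node", "way", "relation")


def has_tag(element, key, value):
    """True if the element carries the tag key=value."""
    return any(t.get("k") == key and t.get("v") == value
               for t in element.findall("tag"))


def counter_delta(xml_text, key, value):
    """Net change in the number of objects tagged key=value."""
    delta = 0
    for action in ET.fromstring(xml_text).iter("action"):
        old, new = action.find("old"), action.find("new")
        old_elements = list(old) if old is not None else []
        if new is not None:
            new_elements = list(new)
        else:
            # create actions hold the new element directly
            new_elements = [c for c in action if c.tag in OSM_TYPES]
        delta += sum(has_tag(e, key, value) for e in new_elements)
        delta -= sum(has_tag(e, key, value) for e in old_elements)
    return delta


delta = counter_delta(SAMPLE, "amenity", "post_box")
print(delta)
```

Because every delete and modify action carries the old version, the same delta can be subtracted instead of added when walking backwards through time.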
All way and relation elements contain an additional bounds element:
<way ... >
  <bounds minlat="..." minlon="..." maxlat="..." maxlon="..."/>
  ...
</way>
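One possible use of the bounds element is a quick test of whether a way or relation touches an area of interest, without resolving any node references. The sketch below assumes the attribute names shown above; the function name `bounds_intersect_bbox` and the sample way are hypothetical, and bounding boxes that cross the antimeridian are not handled.

```python
import xml.etree.ElementTree as ET


def bounds_intersect_bbox(element, min_lon, min_lat, max_lon, max_lat):
    """True if the element's bounds overlap the given bounding box."""
    bounds = element.find("bounds")
    if bounds is None:
        return False
    return (float(bounds.get("minlon")) <= max_lon
            and float(bounds.get("maxlon")) >= min_lon
            and float(bounds.get("minlat")) <= max_lat
            and float(bounds.get("maxlat")) >= min_lat)


# Hypothetical way with a bounds element.
way = ET.fromstring(
    '<way id="10" version="2">'
    '<bounds minlat="50.0" minlon="7.0" maxlat="50.2" maxlon="7.2"/>'
    '<nd ref="1"/><nd ref="2"/>'
    '</way>'
)

overlapping = bounds_intersect_bbox(way, 7.1, 50.1, 7.5, 50.5)
disjoint = bounds_intersect_bbox(way, 8.0, 51.0, 8.5, 51.5)
print(overlapping, disjoint)
```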
id-sorted
The id-sorted version contains one additional action type named info. This version makes it possible to combine two id-sorted files into a new id-sorted file without loading both files completely into memory.
<!-- contains the nodes that belong to changed ways -->
<action type="info" reason="used by way">
  <node ... />
</action>
<!-- contains the nodes or ways that belong to changed relations -->
<action type="info" reason="used by relation">
  <node ... /> <!-- or way -->
</action>
<!-- contains the nodes that belong to member ways of a changed relation -->
<action type="info" reason="used indirectly by relation">
  <node ... />
</action>
<!-- contains the nodes or ways with more than one of the above reasons -->
<action type="info" reason="used by multiple">
  <node ... /> <!-- or way -->
</action>
All nd elements inside of way elements contain additional lat and lon attributes:
<way id="..." ...>
  <bounds ... />
  <tag ... />
  ...
  <nd ref="..." lat="..." lon="..." />
  <nd ref="..." lat="..." lon="..." />
  <nd ref="..." lat="..." lon="..." />
  ...
</way>
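With coordinates on every nd element, the geometry of a way can be read directly from the way itself. A minimal sketch in Python, where the sample way and the function name `way_geometry` are illustrative only:

```python
import xml.etree.ElementTree as ET


def way_geometry(way):
    """Return the way's node refs and (lat, lon) pairs in order."""
    return [
        (int(nd.get("ref")), float(nd.get("lat")), float(nd.get("lon")))
        for nd in way.findall("nd")
    ]


# Hypothetical way in the id-sorted layout.
way = ET.fromstring(
    '<way id="10" version="2">'
    '<bounds minlat="50.0" minlon="7.0" maxlat="50.2" maxlon="7.2"/>'
    '<tag k="highway" v="residential"/>'
    '<nd ref="1" lat="50.0" lon="7.0"/>'
    '<nd ref="2" lat="50.1" lon="7.1"/>'
    '<nd ref="3" lat="50.2" lon="7.2"/>'
    '</way>'
)

geometry = way_geometry(way)
print(geometry)
```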
geo-sorted
The other version is not strictly sorted by id. This allows elements to be nested, so you do not need memory to look up referenced objects. Nesting can lead to redundant information inside one file, but the files are compressed anyway.
All nd elements inside of way elements are replaced by full node elements:
<way id="..." ... >
  <bounds ... />
  <tag ... />
  ...
  <node id="..." version="..." ... changeset="..." lat="..." lon="..." />
  <node id="..." version="..." ... changeset="..." lat="..." lon="..." />
  <node id="..." version="..." ... changeset="..." lat="..." lon="..." />
  ...
</way>
The same applies to ways and nodes that are members of relations:
<relation id=".." ... changeset="...">
  <bounds ... />
  <way id="..." ... role="...">
    <bounds ... />
    <tag ... />
    ...
    <node id="..." version="..." ... changeset="..." lat="..." lon="..." />
    <node id="..." version="..." ... changeset="..." lat="..." lon="..." />
    <node id="..." version="..." ... changeset="..." lat="..." lon="..." />
    ...
  </way>
  <node id="..." version="..." ... changeset="..." lat="..." lon="..." role="..."/>
  <node id="..." version="..." ... changeset="..." lat="..." lon="..." role="...">
    <tag ... />
    ...
  </node>
  <member type="relation" ref="..." role="..."/> <!-- open question: or <rel ref="..." role="..."/>? -->
  <tag ... />
</relation>
Nodes that are created together with a new way appear only inside the way and do not get their own action[@type='create'] block.
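The nested layout means a relation's members and their coordinates can be collected in a single pass over the relation element. The sketch below is an assumption-laden illustration: the sample data and the function name `relation_members` are invented, and it assumes that relation members are written as member elements, which the format description still leaves open.

```python
import xml.etree.ElementTree as ET

# Hypothetical relation in the geo-sorted layout.
SAMPLE = """<relation id="100" version="5" changeset="9">
  <bounds minlat="50.0" minlon="7.0" maxlat="50.2" maxlon="7.3"/>
  <way id="10" version="2" role="outer">
    <bounds minlat="50.0" minlon="7.0" maxlat="50.2" maxlon="7.2"/>
    <tag k="highway" v="residential"/>
    <node id="1" version="1" changeset="3" lat="50.0" lon="7.0"/>
    <node id="2" version="1" changeset="3" lat="50.2" lon="7.2"/>
  </way>
  <node id="3" version="4" changeset="8" lat="50.1" lon="7.3" role="stop"/>
  <member type="relation" ref="200" role="subarea"/>
  <tag k="type" v="route"/>
</relation>"""


def coords(node):
    """(lat, lon) of a full node element."""
    return (float(node.get("lat")), float(node.get("lon")))


def relation_members(relation):
    """List (type, id, role, coordinates) for each member, in file order."""
    members = []
    for child in relation:
        if child.tag == "way":
            points = [coords(n) for n in child.findall("node")]
            members.append(("way", int(child.get("id")), child.get("role"), points))
        elif child.tag == "node":
            members.append(("node", int(child.get("id")), child.get("role"), [coords(child)]))
        elif child.tag == "member":
            # Nested relations are only referenced, so no geometry is available.
            members.append((child.get("type"), int(child.get("ref")), child.get("role"), []))
    return members


members = relation_members(ET.fromstring(SAMPLE))
print(members)
```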
Time slices and numbering
Basically, an augmented diff is produced once per minute, like the minute diffs. As with the minute diffs, a single diff may cover more than one minute to allow the server to catch up. Furthermore, there is no one-to-one match between minute diffs and augmented diffs. An augmented diff may contain more than one minute diff, but a minute diff is never broken up and distributed over more than one augmented diff.
The augmented diffs have a numbering scheme similar to the numbering scheme of the minute diffs but have their own counter.
The augmented_diff API call
overpass-api.de offers an API call that filters the Augmented Diffs already on the server. The base URL is
http://overpass-api.de/api/augmented_diff?
It accepts the following three arguments:
- id= is mandatory and is the id of the Augmented Diff to process
- bbox= is optional and restricts the result to a bounding box around the area of interest, given as min_lon,min_lat,max_lon,max_lat
- info= defaults to yes. With info=no, all blocks of action type info are omitted.
Setting info=no drastically reduces the file size, yet the result still contains the information needed to show way geometry: the coordinates are replicated in the nd elements of the ways. On the other hand, minor changes to country boundary relations happen quite often, and info=no then sends only the geometry of the affected way, not the geometry of the entire relation.
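A request URL for this call can be assembled from the three arguments as follows. This is a sketch only: the helper name `augmented_diff_url`, the diff id 12345, and the coordinates are hypothetical, and nothing is sent to the server.

```python
from urllib.parse import urlencode

BASE_URL = "http://overpass-api.de/api/augmented_diff?"


def augmented_diff_url(diff_id, bbox=None, info=True):
    """Build the request URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {"id": diff_id}
    if bbox is not None:
        params["bbox"] = ",".join(str(value) for value in bbox)
    if not info:
        # info defaults to yes, so it is only sent when switched off
        params["info"] = "no"
    # safe="," keeps the commas of the bbox readable in the URL
    return BASE_URL + urlencode(params, safe=",")


url = augmented_diff_url(12345, bbox=(7.0, 50.6, 7.3, 50.8), info=False)
print(url)
```

The resulting string can then be fetched with any HTTP client.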
The id of the last produced Augmented Diff can be obtained from
http://overpass-api.de/augmented_diffs/state.txt
The id of the Augmented Diff belonging to a certain date can be obtained via
http://overpass-api.de/api/augmented_state_by_date?
It takes osm_base as its only parameter. This must be an OSM-formatted timestamp. The call returns the id of the newest Augmented Diff that is older than the given date.
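The lookup URL can be built from a Python datetime, as sketched below. It assumes that the expected timestamp has the same form as the osm_base attribute in the diff files (for example 2012-08-26T20:24:02Z); the helper name `state_by_date_url` is hypothetical, and no request is made.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "http://overpass-api.de/api/augmented_state_by_date?"


def state_by_date_url(moment):
    """Build the lookup URL for a timezone-aware datetime."""
    # Assumption: timestamps are UTC in the form YYYY-MM-DDTHH:MM:SSZ
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # safe=":" leaves the colons of the timestamp unescaped
    return BASE_URL + urlencode({"osm_base": stamp}, safe=":")


url = state_by_date_url(datetime(2012, 8, 26, 20, 24, 2, tzinfo=timezone.utc))
print(url)
```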
Applications
Software and Services using Augmented Diffs:
- osmconvert (since 0.7E): The program can now read Augmented Diffs and convert them to .osc. It can also process them and update an existing .osm or .pbf. All this is highly experimental, you should expect one or two bugs... :-)
- Osm-watch: osm-watch is a near-real-time monitoring tool for OSM contributions, with filter-based alerts that you can create and receive by RSS or email.
- achavi: Augmented Change Viewer - visualizes updates to OpenStreetMap
- Show Me The Way on OSMLab