OSM Protocol Version 0.6
During the Monitoring and Rollback Hack-a-thon in London it was determined that really solving a number of problems would require some extra fields. While we could accommodate older clients, the complexity of doing so would be greater than simply requiring clients to make some minor modifications.
- People making clients, see #Summary of changes required in clients
- People interested in the DB see #Related database improvements
- There's also a #Todo List
Preliminary 0.6 Sketches
- Support for changesets, including creating and closing, tracking comments and user agents
- API to return current version number of every object
- Any object change to contain old version number (optimistic locking) - update fails if version does not match
- Any object change to contain changeset id
- The user field moves from the individual objects to the changeset
- DELETE to get payload as result of above requirements (does not have payload in 0.5)!
- Support for diff uploads with transactional support
- Improvements to allow the possibility of HTTPS in the future
- Support for transactional database updates
- Limiting trackpoints age [1]
- Can I suggest that limiting the age of trackpoints is not always desirable. It is very useful to be able to average out errors over two or more tracks. Some areas are still very sparsely populated with trackpoints. Some older tracks will still be perfectly valid. I agree, however, that in certain areas it would be useful to mark trackpoints as out of date - e.g. if the road layout changes. Richard B 14:33, 7 May 2008 (UTC)
- This doesn't require an API change. --GabrielEbner 09:06, 8 May 2008 (UTC)
Changesets
To make it easier to identify related changes we invented the concept of a changeset. The idea is that you create a changeset, make all your changes in the changeset, and finally close it. The changeset will group all the changes and store information related to the user agent, user and any other relevant tags.
Changesets are specifically *not* atomic. Given how many changes might be uploaded in one step, that is not feasible. Instead we opted for optimistic (client-side) locking. Anything submitted to the server in a single request will be applied atomically, hence the need for diff uploads. It also means a changeset cannot be rolled back as a single transaction.
Changesets facilitate the implementation of rollbacks. By providing insight into the changes committed by a single person it becomes easier to identify the changes made, rather than just rolling back a whole region. Direct support for rollback will not be in the API; instead rollbacks will be a form of reverse merging, where the client can download the changeset, examine the changes and then manipulate the API to obtain the desired results.
To support easier usage, the server will store a bounding box for each changeset and allow users to query changesets in an area. This will be calculated by the server, since it needs to look up the relevant nodes anyway. As an optimisation the server will create a buffer slightly larger than the objects to avoid having to update the bounding box too often. Clients should note that if people make many small changes over a large area, the changeset's bounding box will be large and will match many area queries. In this case clients should examine the changeset directly to see if it truly overlaps.
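The buffered bounding box described above could be sketched roughly as follows. This is only an illustration: the BBox class, the extend function and the margin value of 0.01 degrees are assumptions made for the example, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def extend(bbox: Optional[BBox], lon: float, lat: float,
           margin: float = 0.01) -> Tuple[BBox, bool]:
    """Return (bbox, changed) after adding a point to a changeset.

    The box only grows when the point falls outside it, and then it
    grows by an extra margin (hypothetical value), so nearby edits
    do not force another update of the stored box.
    """
    if bbox is not None and bbox.contains(lon, lat):
        return bbox, False  # already covered, nothing to write
    padded = BBox(lon - margin, lat - margin, lon + margin, lat + margin)
    if bbox is None:
        return padded, True
    grown = BBox(min(bbox.min_lon, padded.min_lon),
                 min(bbox.min_lat, padded.min_lat),
                 max(bbox.max_lon, padded.max_lon),
                 max(bbox.max_lat, padded.max_lat))
    return grown, True
```

Because the stored box is deliberately larger than the edited objects, a match against it is only a hint; a client still has to inspect the changeset contents to confirm a real overlap.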
PUT /api/0.6/changeset/create
Payload:
<osm>
<changeset>
<tag k="created_by" v="JOSM 1.61"/>
<tag k="comment" v="Just adding some streetnames"/>
...
</changeset>
</osm>
+ start/end time, random "tag" elements, bound element?
Only the "comment" tag is required; everything else is optional. The server will fill in the user and start_timestamp.
Returns:
Changeset ID
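As a rough illustration, a client might build the creation payload like this. The function name build_changeset_payload is made up for the example; only the element and attribute names come from the payload shown above.

```python
import xml.etree.ElementTree as ET


def build_changeset_payload(tags):
    """Build the <osm><changeset> payload from a dict of tags.

    Hypothetical helper: enforces the rule that a "comment" tag
    must be present, everything else is passed through as-is.
    """
    if not tags.get("comment"):
        raise ValueError('the "comment" tag is required')
    osm = ET.Element("osm")
    changeset = ET.SubElement(osm, "changeset")
    for key, value in tags.items():
        ET.SubElement(changeset, "tag", k=key, v=value)
    return ET.tostring(osm, encoding="unicode")


payload = build_changeset_payload({
    "created_by": "JOSM 1.61",
    "comment": "Just adding some streetnames",
})
```

The resulting string would be sent as the body of the PUT request; the HTTP call itself is left out here since it depends on the client.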
PUT /api/0.6/changeset
For updating e.g. the commit comment.
To be used for updating the tags; the user and start_timestamp are immutable. Cannot be used after the changeset has been closed.
PUT /api/0.6/changeset/#id/close
Closing a changeset. As payload it contains the final settings.
GET /api/0.6/changeset/#id
This will return the changeset data. What has not been decided is:
- Is the list of objects returned immediately, or should that be a separate request?
- Should the contents of these objects be returned, or is this a separate request?
- Are the nodes that are part of a way part of the changeset?
- Should the server return both the old and the new version so clients can quickly decide what was actually changed? Perhaps separate prior/after calls?
Querying changesets
There will be an API for querying changes, the actual methods have not yet been decided. At the least it should support:
- Querying by bbox and a timestamp range
- Querying by user (anonymous users excepted)
The exact implementation of these is to be guided by the requirements of rollback and other uses we find for changesets.
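Since the query methods are undecided, the following is only a sketch of the filtering semantics listed above, run against an in-memory list. The Changeset class, its field names and the query function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # min_lon, min_lat, max_lon, max_lat


@dataclass
class Changeset:
    id: int
    uid: Optional[int]  # None stands for an anonymous user
    bbox: Box
    start: datetime
    end: datetime


def boxes_overlap(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(changesets: List[Changeset], bbox: Optional[Box] = None,
          time_range: Optional[Tuple[datetime, datetime]] = None,
          uid: Optional[int] = None) -> List[Changeset]:
    """Filter by bounding box, time range and user; all filters optional."""
    found = []
    for cs in changesets:
        if bbox is not None and not boxes_overlap(cs.bbox, bbox):
            continue
        if time_range is not None and not (
                cs.start <= time_range[1] and time_range[0] <= cs.end):
            continue
        # Anonymous changesets (uid None) never match a user query.
        if uid is not None and cs.uid != uid:
            continue
        found.append(cs)
    return found
```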
Version numbers
The server currently supports version numbers for most objects (not nodes; that is one of the planned DB changes). These version numbers are not exposed anywhere. It was decided that it would be useful if clients could see them directly to know unambiguously which is the correct version, instead of relying on timestamps.
At the same time we realised this could be used for optimistic locking, so this will be used also. So the changes are:
- In the planet dump and in the API the version number for all objects will be exposed
- To upload a new version of an object, the client will need to provide the version of the object it is modifying. If the version is not the most recent an error will be returned (which error code?)
- Clients will be able to ask for specific versions of any object
The change for clients is to remember the version number they are given and to feed it back on upload. Error handling is an optional extra.
If an editor allows the same object to be uploaded multiple times in an editing session then it will have to take note of how the version changes on each upload. In this situation a given changeset will increment the version number of an object more than once. The history will then have the same changeset ID recorded against multiple versions of the object.
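The optimistic locking check can be illustrated with a small in-memory stand-in for the server. The ObjectStore class and the VersionConflict exception are invented for this sketch; the real error code is still undecided, and the sketch assumes the version goes up by exactly one, which the proposal does not guarantee.

```python
class VersionConflict(Exception):
    """Raised when the client's version is not the current one."""


class ObjectStore:
    """Toy stand-in for the server side of optimistic locking."""

    def __init__(self):
        self._objects = {}  # object id -> (version, data)

    def create(self, obj_id, data):
        self._objects[obj_id] = (1, data)
        return 1

    def update(self, obj_id, client_version, data):
        current_version, _ = self._objects[obj_id]
        if client_version != current_version:
            # Someone else changed the object since the client fetched it.
            raise VersionConflict(
                f"object {obj_id}: client has version {client_version}, "
                f"server has {current_version}")
        new_version = current_version + 1  # assumption: increment by one
        self._objects[obj_id] = (new_version, data)
        return new_version


store = ObjectStore()
v1 = store.create(68, {"name": "Main Street"})
v2 = store.update(68, v1, {"name": "Main St"})
```

A second update that still passes v1 would now fail, which is how two editors working from the same download are kept from silently overwriting each other.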
Reliably identifying users
The previous (0.5) API returns the user display name. The user can update this at any time and there is no history stored for display name changes. This means there was no way to reliably identify which user made a specific change. The 0.6 API will include the numeric user ID of the account in addition to the display name, e.g.
<node id="68" ... user="fred" uid="123"/>
This still requires the user to have made his edits public. User id for anonymous users will not be visible.
Why specifying user and uid? Uid should be a ref to another object. For example:
<users> <user uid="123" user="fred" /> </users>
--Skinkie 17:41, 5 June 2008 (UTC)
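A client that wants to attribute edits reliably could key on uid rather than on the display name, roughly as in this sketch. The helper name author_key is made up, and the sample elements are invented data in the attribute format shown above.

```python
import xml.etree.ElementTree as ET


def author_key(element):
    """Return the numeric uid of an object, or None if it is absent.

    The uid is absent for anonymous users, whose id is not exposed.
    """
    uid = element.get("uid")
    return int(uid) if uid is not None else None


# Same account before and after a display name change, plus an anonymous edit.
before = ET.fromstring('<node id="68" user="fred" uid="123"/>')
after = ET.fromstring('<node id="69" user="frederick" uid="123"/>')
anonymous = ET.fromstring('<node id="70"/>')
```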
Modifications related to the above
GET /api/0.6/{node|way|relation}/#
Node/way/relation objects will return the current version id and user id (uid) in addition to everything else, e.g.
<way id="1234" user="fred" uid="123" timestamp="..." version="4"> ... </way>
PUT /api/0.6/{node|way|relation}/{id|create}
need to include attributes changeset="id" and (except create) version="id"
DELETE also needs to have the new attributes, and hence will require a payload similar to an update
Calls that modify existing objects will return the new version number as plain text (currently they return nothing). The client needs to store this as the new version and provide it if the user subsequently modifies the same object again.
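A sketch of how a client might build the node payload and read the plain-text reply follows. The function names are hypothetical, and the lat and lon attributes are assumed to carry over unchanged from the existing node format; only changeset and version are the new attributes described here.

```python
import xml.etree.ElementTree as ET


def build_node_payload(node_id, lat, lon, changeset_id, version=None):
    """Build the payload for a node PUT or DELETE.

    version is left out for "create" and required otherwise.
    """
    attrs = {
        "id": str(node_id),
        "lat": str(lat),
        "lon": str(lon),
        "changeset": str(changeset_id),
    }
    if version is not None:
        attrs["version"] = str(version)
    osm = ET.Element("osm")
    ET.SubElement(osm, "node", attrs)
    return ET.tostring(osm, encoding="unicode")


def parse_version_response(body):
    """Read the plain-text new version number from the response body."""
    return int(body.strip())


update_payload = build_node_payload(68, 51.5, -0.1, changeset_id=42, version=4)
create_payload = build_node_payload(-1, 51.5, -0.1, changeset_id=42)
```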
Diff uploads
Currently there are a number of problems relating to sending each change as a new request:
- It makes supporting HTTPS impossibly expensive
- Transactionality cannot be guaranteed, since a transaction would be open-ended
Hence it was decided to allow clients to upload chunks. Everything in a chunk can be applied in a single transaction, and fewer connections mean HTTPS becomes feasible. Of the different change file formats we considered, we decided on OsmChange since it seemed closest to what was required, in particular because of the need for placeholders that allow changes to refer to objects created in the same upload.
There will probably be a maximum size for diffs to prevent a single client holding the server too long. The limit will have to be determined after seeing practical usage. There is also the possibility of truly large changesets being posted for offline processing.
PUT /api/0.6/map
To upload OSC file as per OsmChange specification, with the following enhancements:
- add changeset_id attribute to <osm> tag
- add version attribute to each <node>, <way>, <relation> (except for "create")
Returns:
412 precondition failed + Error: header if the change could not be applied in full
Otherwise, 200 ok with the following payload:
<osm> <node|way|relation old_id="#" new_id="#" new_version="#" /> ... </osm>
with one element for every element in the upload.
- For "create" commands, the new_version will be 1, the old_id will be the negative id specified on upload, and the new_id will be the id assigned by the API.
- For "modify" commands, the new_version will be the new version number assigned by the API (always at least version+1), and old_id and new_id will be identical to the id specified on upload.
- For "delete" commands, the new_version and new_id fields are unset, and old_id is the id specified on upload.
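A client could turn this response into a lookup table and use it to replace its negative placeholder ids, roughly as below. The function names and the sample response are invented for illustration; the attribute names are the ones listed above.

```python
import xml.etree.ElementTree as ET


def parse_diff_result(xml_text):
    """Map (element type, old_id) to (new_id, new_version).

    For deletes both values are None, since the fields are unset.
    """
    result = {}
    for el in ET.fromstring(xml_text):
        new_id = el.get("new_id")
        new_version = el.get("new_version")
        result[(el.tag, int(el.get("old_id")))] = (
            int(new_id) if new_id is not None else None,
            int(new_version) if new_version is not None else None,
        )
    return result


def remap_node_refs(refs, id_map):
    """Replace placeholder (negative) node ids with server-assigned ids."""
    remapped = []
    for ref in refs:
        if ref < 0:
            new_id, _ = id_map[("node", ref)]
            remapped.append(new_id)
        else:
            remapped.append(ref)
    return remapped


sample = (
    '<osm>'
    '<node old_id="-1" new_id="501" new_version="1"/>'
    '<way old_id="77" new_id="77" new_version="5"/>'
    '<node old_id="88"/>'
    '</osm>'
)
id_map = parse_diff_result(sample)
```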
Changes in related software
Version information added to planet dump
The planet dump now includes a version="<number>" field for each object in the planet dump, e.g.
<node ... version="123" .../>
Related database improvements
I'm doing this from memory, a DB guru will have to correct the table/fieldnames - Kleptog
To support the above a number of database changes will be required. In particular:
- Adding changeset and changeset_tags tables
- The user_id field in the object tables will become the changeset_id field. Every existing user will get an initial changeset which has the same ID as their user_id and will contain all the objects they have at the moment. This makes upgrading easier.
- So we don't throw away existing created_by tags then? --Frederik Ramm 14:02, 5 May 2008 (UTC)
- Not in the current planning, no. While we decided it would be possible to retroactively create changesets for the users and move the tags, we decided it wasn't worth the effort. -- Kleptog
- The nodes table will get a version number and the tags will be split into a separate table
- The history will be thrown away, again. Because... someone please justify this action
- The basic issue is that with the node tags being split into a separate table, this table needs to be created. Also with the restriction of only one value per key a lot of the old data becomes invalid. This is the only reason as far as I know. I think that it's possible we could keep it, but I'm not the person writing the code... - Kleptog
- To put that into perspective, I have written the code to split the node tags into a separate table. This will not remove history. It's just multiple tags with the same key that might cause problems, but it should be relatively easy to modify the historical tags instead of throwing them away. As far as I remember, at the hackathon it was proposed to delete the history once again; but we had a general consensus to keep it (also to facilitate determining authors viz. copyright for an eventual switch to another license.) --GabrielEbner 13:30, 6 May 2008 (UTC)
- All the important tables will be moved to InnoDB to support transactionality. The fulltext indexes will be dropped to accommodate this.
- It is planned to create a unique index on the combination of object id and tag key, so that it will no longer be possible to have multiple identically named tags on the same object. This was possible until now but rarely used. The impact of this is being analysed.
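To get a feel for the impact of the planned unique index, one could scan existing tag rows for repeated keys on the same object. The sketch below works on plain tuples; the row layout and the function name are assumptions, not the real table schema.

```python
from collections import defaultdict


def find_duplicate_keys(rows):
    """Return {(object_id, key): [values...]} for keys used more than once.

    rows is an iterable of (object_id, key, value) tuples, a
    hypothetical flattening of a tags table.
    """
    values_by_key = defaultdict(list)
    for object_id, key, value in rows:
        values_by_key[(object_id, key)].append(value)
    return {k: v for k, v in values_by_key.items() if len(v) > 1}


rows = [
    (1, "name", "High Street"),
    (1, "name", "A123"),       # would violate the unique index
    (1, "highway", "primary"),
    (2, "name", "Mill Lane"),
]
duplicates = find_duplicate_keys(rows)
```

Every entry in the result is a place where the data would have to be merged or modified before the index could be created.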
Summary of changes required in clients
The changes are basically:
- Prior to starting an upload, ask the user for a comment message. Create a changeset, upload the changes providing your changeset ID and then close the changeset when done
- Changesets may contain arbitrary tags. It is strongly recommended to cease adding "created_by" or "converted_by" tags to individual objects; just put these in the changeset instead.
- When downloading data the client will receive a version number. This number must be stored and provided when updating (or deleting); if it does not match the server's current version number at the time the update/delete is made, the operation will fail.
- When uploading a change to an existing object, the editor will receive the new version number to be used for any subsequent update.
- Are there cases where new version number != old version number + 1? --Frederik Ramm 14:06, 5 May 2008 (UTC)
- Possibly no. But in the future with transactional updates we decided we couldn't necessarily rule out the possibility that numbers might be skipped, depending on where exactly the numbers were generated. So we decided to play it safe: less assumptions is good. -- Kleptog
- But it will always increase, won't it?--Bartv 11:47, 19 May 2008 (UTC)
Clients which only download data for further processing do not need to do anything, unless they wish to use the newly available information somehow.
For bonus points you can implement diff uploading for performance.
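The version bookkeeping a client needs can be quite small. The following sketch uses a hypothetical VersionCache class; it stores whatever number the server returns rather than computing old version + 1, since skipped numbers were not ruled out.

```python
class VersionCache:
    """Remembers the last known version of each downloaded object."""

    def __init__(self):
        self._versions = {}  # (kind, id) -> version

    def remember(self, kind, obj_id, version):
        """Record the version received on download."""
        self._versions[(kind, obj_id)] = version

    def version_for_upload(self, kind, obj_id):
        """Version to send with the next update or delete."""
        return self._versions[(kind, obj_id)]

    def record_upload_result(self, kind, obj_id, new_version):
        """Store the version returned by the server after an upload.

        Only checks that the number went up; it does not assume +1.
        """
        old = self._versions[(kind, obj_id)]
        if new_version <= old:
            raise ValueError("server version did not increase")
        self._versions[(kind, obj_id)] = new_version


cache = VersionCache()
cache.remember("way", 1234, 4)
cache.record_upload_result("way", 1234, 6)  # a skipped number is accepted
```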
Potlatch
Potlatch is a different beast to most users of the API. The above proposal was made in the presence of the main Potlatch developer, and Potlatch was on our minds during the whole discussion.
In particular, what Potlatch will do is open a changeset at the beginning of a session and every change will simply add to that changeset. How this will merge with the requirement for comments and when to switch to a new changeset are outside the scope of this document.
Also, the introduction of the diff upload is expected to reduce the differences between the main API and the API used by Potlatch. It is expected that Potlatch will also gain the advantages of transactionality in the process.
One thing not completely worked out is Potlatch's "delete way" message, which means "delete way and the following nodes unless they're required by another way". There is a conflict here if we want diff uploads to be atomic. The issue is that Potlatch does not know whether nodes outside the view area are used by anything else. (JOSM has this problem also, incidentally, but in JOSM you generally delete the way without deleting the nodes outside your view.)
Phased introduction
It was noted that this API update easily falls into two pieces:
- creating the concept of changesets and exposing version numbers
- querying changesets and diff uploads (+ doing reverts based on these)
The first part is what requires the incompatible API version change, database changes and required changes to clients. The second involves optional extra features but doesn't affect any clients unless they want to use it. Hence it was proposed to possibly roll out the first part in a single update with all the hassle it entails, and then once the system is working again, add the extra parts as they become available.
Todo List
- Discuss impact of proposed changes on dev mailing list
- Changeset and version number support in JOSM - done (Martijn)
- Diff upload in JOSM - Martijn
- Diff upload support in API - Gabriel - done (needs integration with the changeset code)
- Version numbers in API current tables - done
- Modify all OSM object controllers to check incoming version numbers and fail if != current - Fred
- Completed for update, not for delete -- User:crschmidt
- Modify all OSM object controllers to work with changesets - verify that changeset exists, and update bounding box of changeset if required - Fred
- Object access to support GET /api/0.6/{node|way|relation}/id/version
- Basic changeset support (create/update/close) in API - Fred - done
- This was not done: close did not exist in SVN until I checked it in a couple hours ago... -- User:crschmidt
- Changeset access in API (find changesets with certain properties/bbox; return full data for changeset)
- I've added a changeset/:id method + route: to_xml on the changeset model needs more work, and there is no query-changeset stuff yet. -- User:crschmidt
- Potlatch stuff - probably replace "delete way with all nodes" by a diff upload? unclear how Potlatch would then get to know why it failed
further, less important stuff
- Osmosis to keep version number info, use version numbers for conflict resolution
- Osmosis to emit and parse user ID information.
- Add uid into attributes returned by all API GET features
- Add support for version and uid into weekly planetdump code - done (Jon)
- Add support for version and uid into planetdiff code - done (Jon)
- Testing
- Create unit/functional/integration tests so that you can run rake test and know that the API hasn't been broken. Shaun has started this; all changes to the tests should now pass. Coverage and good test data are the next steps.
- Import a bunch of data into MySQL based on the 0.5 API, then run the migration scripts. TomH has just had problems with this. It will need to run on the full data set without issue.

