User:Zverik/API 0.7 Proposal
We should move in small steps. I propose postponing the "area" step to 0.8 and focusing instead on small improvements, mainly in the API and validation. Please discuss these suggestions on the talk page.
- 1 Changing API
- 1.1 HTTP result codes
- 1.2 JSON output
- 1.3 Remove AMF controller
- 1.4 A call for deleted objects in bbox
- 1.5 List of parent objects when they are absent from output
- 1.6 Ignore updates that do not change the object
- 1.7 If-modified-since
- 1.8 Unify capabilities data
- 1.9 Filter GPS traces by date
- 1.10 Private Messaging API
- 2 Changing data and API
- 2.1 Restrict charset for keys and roles
- 2.2 Trim and forbid empty values, forbid special chars
- 2.3 Forbid uploading empty relations
- 2.4 Ways validation
- 2.5 Last member modification date
Changing API
HTTP result codes
The codes currently in use, and the fact that we rely on them instead of producing readable error messages, are hacky and very hard to work with. Some of them are plainly wrong, that is, they do not correspond to the HTTP specification. I propose to drastically reduce the number of codes used, basically to 200, 404 and a few others, and to introduce a return format for errors, all of which would be produced under a single 4xx code. Alternatively, we could keep more of the codes but still return an XML document with a human-readable description and more information. For example, return the last version of a deleted object instead of a bare 410.
The list is subject to discussion. We can leave the codes as they are, but still provide proper XML or JSON content with error messages.
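To make the idea of a single error format more concrete, here is a minimal sketch of what the server could return under one 4xx code. The element names (error, message, detail) and the code string are my assumptions for illustration only; nothing like this exists in API 0.6.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_error_xml(code: str, message: str, details: Optional[dict] = None) -> str:
    """Build a hypothetical <error> document for a failed API call."""
    # "code" is a machine-readable identifier, "message" is for humans.
    root = ET.Element("error", {"code": code})
    ET.SubElement(root, "message").text = message
    # Extra key-value pairs, e.g. the last version of a deleted object.
    for key, value in (details or {}).items():
        ET.SubElement(root, "detail", {"k": key, "v": str(value)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_error_xml("object_deleted",
                          "Node 123 was deleted in version 4",
                          {"last_version": 3}))
```

A client would then check the HTTP status once and read everything else from the document body.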
JSON output
Make JSON a first-class output format (well, second-class: extracts will still be in XML only). All error messages should be available in JSON (readable from JavaScript), and so should data, history calls and so on.
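As an illustration of JSON output for data, the sketch below converts a single node from OSM XML to a JSON object. The field layout (type, id, version, lat, lon, tags) is an assumption chosen for readability, not an agreed format.

```python
import json
import xml.etree.ElementTree as ET


def node_xml_to_json(xml_text: str) -> str:
    """Convert one <node> element to a hypothetical JSON representation."""
    node = ET.fromstring(xml_text)
    obj = {
        "type": "node",
        "id": int(node.get("id")),
        "version": int(node.get("version")),
        "lat": float(node.get("lat")),
        "lon": float(node.get("lon")),
        # Tags become a plain object, which is convenient in JavaScript.
        "tags": {t.get("k"): t.get("v") for t in node.findall("tag")},
    }
    return json.dumps(obj, ensure_ascii=False)


if __name__ == "__main__":
    sample = '<node id="1" version="2" lat="59.9" lon="30.3"><tag k="name" v="Test"/></node>'
    print(node_xml_to_json(sample))
```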
Remove AMF controller
The AMF controller is used solely by Potlatch 1, which has been obsolete since October 2012. The only reason it is still in use is its undelete feature, which has no alternative, because only the AMF controller implements it. The drawback is obvious: it is still possible to ruin objects with Potlatch 1 unintentionally. Also, the data for undeletion hogs the current_nodes table, since deleted versions have to be stored there. Removing the AMF controller would clean up that table, allowing for better constraints, e.g. that ways contain only existing nodes.
As for undelete functionality, see #A call for deleted objects in bbox.
A call for deleted objects in bbox
This call would replace the AMF controller's deleted objects call. It may not be needed, given recent advancements in the Overpass API and OWL.
- #Remove AMF controller
- New OWL source code
- New OWL server (offline most of the time)
- Overpass API announcement with history-related functions
List of parent objects when they are absent from output
Most relation breakage comes from editors not having information on the containing relations for objects outside the "downloaded zone", e.g. objects downloaded individually or as members of a relation. This can be avoided if all parent objects (both ways and relations) not included in the output are mentioned in their member objects. I propose that such a tag (<parent type="..." ref="..." /> or <in ... />) be included in OSM XML output, only for absent objects (so it won't affect planet dumps).
This change would affect map calls (for nodes in ways outside the bbox), /full output and individual object requests.
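The selection of parents to mention can be sketched as follows. This is only an illustration: the function names and the in-memory membership map are hypothetical, and a real server would get this information from the member tables.

```python
def absent_parents(returned_ids, memberships):
    """Map each returned object to its parents that are missing from the output.

    returned_ids: set of (type, id) tuples present in the response.
    memberships:  dict mapping a child (type, id) to a list of parent (type, id).
    """
    result = {}
    for child in returned_ids:
        missing = [p for p in memberships.get(child, []) if p not in returned_ids]
        if missing:
            result[child] = sorted(missing)
    return result


def parent_tags(parents):
    """Render the proposed <parent> elements for a list of (type, id) tuples."""
    return ['<parent type="%s" ref="%d" />' % (ptype, ref) for ptype, ref in parents]


if __name__ == "__main__":
    returned = {("node", 1), ("node", 2), ("way", 10)}
    memberships = {
        ("node", 1): [("way", 10), ("way", 11)],  # way 11 is outside the bbox
        ("way", 10): [("relation", 5)],           # relation 5 was not requested
    }
    for child, parents in absent_parents(returned, memberships).items():
        print(child, parent_tags(parents))
```

Parents that are already in the output are skipped, so a full planet dump, which contains everything, would get no extra elements.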
Ignore updates that do not change the object
Some versions do not differ from the previous version at all. Uploads of such versions should be silently ignored. This would require more database requests at the upload stage.
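A minimal sketch of the check, assuming objects are represented as dictionaries and that only coordinates, tags, node lists and member lists count as content (version, changeset and timestamp are metadata and are deliberately not compared):

```python
# Fields that make up the content of a node, way or relation (an assumption).
CONTENT_FIELDS = ("lat", "lon", "tags", "nodes", "members")


def is_noop_update(current: dict, uploaded: dict) -> bool:
    """Return True if the uploaded object carries no change in content."""
    return all(current.get(f) == uploaded.get(f) for f in CONTENT_FIELDS)


if __name__ == "__main__":
    current = {"lat": 1.0, "lon": 2.0, "tags": {"name": "A"}, "version": 3}
    same = {"lat": 1.0, "lon": 2.0, "tags": {"name": "A"}, "version": 3}
    renamed = {"lat": 1.0, "lon": 2.0, "tags": {"name": "B"}, "version": 3}
    print(is_noop_update(current, same), is_noop_update(current, renamed))
```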
If-modified-since
Honor the "If-Modified-Since" HTTP request header, so that the objects returned are filtered by modification date. This won't reduce the number of database requests (since member objects can be modified without updating the parent object's version), but it will produce shorter output containing only modified objects. Basically, it would allow updating hundreds of thousands of objects, provided the server load for that is not too high.
If #Last member modification date is implemented, this request would be much easier to process.
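The filtering step itself is simple. Below is a sketch that parses the header with Python's standard email.utils.parsedate_to_datetime and keeps only newer objects. The object layout (a dict with a timezone-aware "timestamp") is an assumption for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def filter_modified_since(objects, header_value):
    """Keep objects modified after the date given in If-Modified-Since."""
    # HTTP dates such as "Fri, 01 Mar 2013 00:00:00 GMT" parse to aware datetimes.
    since = parsedate_to_datetime(header_value)
    return [o for o in objects if o["timestamp"] > since]


if __name__ == "__main__":
    objects = [
        {"id": 1, "timestamp": datetime(2013, 1, 1, tzinfo=timezone.utc)},
        {"id": 2, "timestamp": datetime(2013, 6, 1, tzinfo=timezone.utc)},
    ]
    print(filter_modified_since(objects, "Fri, 01 Mar 2013 00:00:00 GMT"))
```

For ways and relations, the timestamp compared here would have to take member modifications into account, which is exactly what a last member modification date would provide.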
Unify capabilities data
The current capabilities output is horrible: we add a tag for every new value, and when there is both a min and a max, the tags become puzzling, not to mention that every added capability effectively changes the schema while the API version does not increase. I propose (well, actually I am relaying Gubaer's proposal) to turn it into key-value pairs (capability can be replaced with tag):
<capabilities>
  <capability k="min_version" v="0.6"/>
  <capability k="max_version" v="0.6"/>
  <capability k="max_area" v="0.25"/>
  <capability k="max_tracepoints_per_page" v="5000"/>
  <capability k="max_nodes_per_way" v="2000"/>
  <capability k="max_changeset_size" v="50000"/>
  <capability k="timeout" v="300"/>
</capabilities>
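One benefit of the key-value form is that a client needs no schema knowledge to read it. A minimal sketch of client-side parsing, using the proposed document from above (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

CAPABILITIES_XML = """
<capabilities>
  <capability k="min_version" v="0.6"/>
  <capability k="max_version" v="0.6"/>
  <capability k="max_area" v="0.25"/>
  <capability k="max_tracepoints_per_page" v="5000"/>
  <capability k="max_nodes_per_way" v="2000"/>
  <capability k="max_changeset_size" v="50000"/>
  <capability k="timeout" v="300"/>
</capabilities>
"""


def parse_capabilities(xml_text: str) -> dict:
    """Read all capabilities into a dict; unknown keys are kept as they are."""
    root = ET.fromstring(xml_text)
    return {c.get("k"): c.get("v") for c in root.findall("capability")}


if __name__ == "__main__":
    caps = parse_capabilities(CAPABILITIES_XML)
    print(int(caps["max_nodes_per_way"]))
```

A capability added later shows up as one more key in the dict, and older clients simply ignore it.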
Filter GPS traces by date
Add an &after=YYYY-MM-DD parameter to the /trackpoints call to filter points by date. Obviously, it won't return traces that have no date.
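A sketch of the filter, assuming points are dictionaries with an optional "time" value and that the given date itself is included in the result (whether the boundary is inclusive is open to discussion):

```python
from datetime import datetime


def filter_trackpoints(points, after):
    """Keep points recorded on or after the given YYYY-MM-DD date."""
    threshold = datetime.strptime(after, "%Y-%m-%d")
    # Points without a timestamp cannot be compared and are dropped.
    return [p for p in points
            if p.get("time") is not None and p["time"] >= threshold]


if __name__ == "__main__":
    points = [
        {"lat": 1.0, "lon": 2.0, "time": datetime(2012, 5, 1, 12, 0)},
        {"lat": 1.1, "lon": 2.1, "time": datetime(2013, 5, 1, 12, 0)},
        {"lat": 1.2, "lon": 2.2, "time": None},
    ]
    print(filter_trackpoints(points, "2013-01-01"))
```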
Private Messaging API
Getting the number of unread messages is not enough. We need a full-featured messaging API that allows reading and sending private messages. This would require a separate OAuth permission, so that it doesn't surprise users.
Changing data and API
Restrict charset for keys and roles
By now, the tagging rules are mostly settled, and all processed keys fit in a very small subset of characters: basically, "[a-z][a-z0-9:_]*[a-z0-9]" (A and B in taginfo stats, see link below). I propose to restrict the characters allowed in keys to those, maybe with a few more characters added, and to fail uploads that have other characters in keys.
There are two ways to fix existing keys. It is agreed that keys not covered by the regexp above are "wrong", that is, not processed by any known map style, router or geocoder. Again, see the statistics for examples.
Uppercase ASCII characters would be converted to lower case.
Tags with non-ASCII characters in keys would be deleted: there is no simple way to convert them. We would provide a list of objects with such keys and offer a period to fix them.
Spaces and other characters would be converted to underscores ("_"). There are cases where a mapper should have used semicolons instead of spaces. Such cases should be caught and fixed manually by users, with the help of some statistics tool.
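The upload check and the conversion rules above can be sketched like this. The function names are hypothetical, and the regexp is taken as written, which means it requires at least two characters in a key.

```python
import re

# The character set proposed above, applied to the whole key.
KEY_RE = re.compile(r"[a-z][a-z0-9:_]*[a-z0-9]")


def is_valid_key(key: str) -> bool:
    """Upload-time check: the whole key must match the proposed regexp."""
    return KEY_RE.fullmatch(key) is not None


def migrate_key(key: str):
    """One-time conversion of an existing key.

    Returns None when the key contains non-ASCII characters, meaning the
    tag would be listed for manual fixing and eventually deleted.
    """
    if any(ord(ch) > 127 for ch in key):
        return None
    # Lowercase first, then turn spaces and other characters into underscores.
    return re.sub(r"[^a-z0-9:_]", "_", key.lower())


if __name__ == "__main__":
    for key in ("addr:housenumber", "Name", "opening hours", "название"):
        print(repr(key), is_valid_key(key), repr(migrate_key(key)))
```

A converted key can still fail the check (for example, one that ends with an underscore after conversion), so the result of migrate_key should be passed through is_valid_key and the leftovers fixed by hand.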
Trim and forbid empty values, forbid special chars
Values with spaces at start or end are erroneous: those spaces bear no information. Also, newlines, tabs and other special characters in values (\0!) puzzle editors and cannot be edited correctly. Values should be trimmed, and special characters should produce an error on upload.
All values should be trimmed of whitespace characters.
There should be a list of values with special characters in them (I'm sure it won't be long), so mappers can fix them manually. After the transition period, the remaining special characters should be replaced with spaces.
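A sketch of the upload-time rule for values. It assumes that "special characters" means Unicode control characters (category Cc, which covers newlines, tabs and \0); the exact set is open to discussion.

```python
import unicodedata


def clean_value(value: str) -> str:
    """Trim a tag value and reject empty values and control characters."""
    trimmed = value.strip()
    if not trimmed:
        raise ValueError("empty tag value")
    # Category "Cc" covers control characters such as \n, \t and \0.
    if any(unicodedata.category(ch) == "Cc" for ch in trimmed):
        raise ValueError("control character in tag value")
    return trimmed


if __name__ == "__main__":
    print(repr(clean_value("  residential ")))
    try:
        clean_value("line one\nline two")
    except ValueError as err:
        print("rejected:", err)
```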
Forbid uploading empty relations
Just like empty ways, empty relations (with or without tags) should be rejected on upload, because they are not placed on the map and cannot be retrieved later except by ID. We have a geospatial database, after all.
All currently existing empty relations should be extracted to a stand-alone XML file for history and removed from the database.
Ways validation
Check at upload time that ways have at least 2 nodes and contain no sequential duplicates (as in ABCCDEF).
Ways that have 0 or 1 nodes should be deleted from the database. Duplicates of sequential nodes in ways should be removed.
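Both the upload check and the database cleanup can be sketched with one helper. The function names are hypothetical, and node references are treated as a plain list.

```python
def dedupe_sequential(nodes):
    """Drop nodes that repeat the node directly before them (ABCCDEF -> ABCDEF)."""
    result = []
    for node in nodes:
        if not result or result[-1] != node:
            result.append(node)
    return result


def validate_way(nodes):
    """Upload-time check: raise ValueError for ways that would be rejected."""
    if len(nodes) < 2:
        raise ValueError("a way needs at least 2 nodes")
    if dedupe_sequential(nodes) != list(nodes):
        raise ValueError("way contains sequential duplicate nodes")


if __name__ == "__main__":
    print(dedupe_sequential(list("ABCCDEF")))
    validate_way([1, 2, 3, 1])  # a closed way is fine: the repeat is not sequential
```

Note that during the cleanup a way can drop below 2 nodes only after deduplication (for example, AA becomes A), so the node count should be checked after duplicates are removed.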
Last member modification date
Each way and relation should have a last member modification time field. It should be updated either on member update (so each object upload would invoke two DB queries for nodes, or one for ways and relations), or separately, for example once every ten minutes.
It should be exposed as an attribute of ways and relations, named "modified" or something similar.
A field should be added to all object tables, and a single query should be run to fill it from current data, even for deleted objects.