Overpass API/development

From OpenStreetMap Wiki

Long-term ideas

These long-term ideas give a direction for development. We can now assess whether potential features help with the long-term ideas or not, and prioritize the features that help most.

The order does not imply any priority.

Multiple sources

Support data from other sources in a similar manner to the main database, and in particular allow blending it in on the fly. I'll add multiple examples with a wide range of requirements.

The first application is imports: let us recall why they are controversial, besides license issues. On the one hand, they add data on a large scale and may be a crucial step towards data completeness. On the other hand, they may harm the community - the data looks complete, hence there is not much fame left to earn. In particular, all manual work looks tiny compared to the data volume of the import. Caring for existing data that conflicts with the import is a huge problem, and updating the imported data later on can be even worse.

Now assume that you keep the import in a separate database but that you can blend it on the fly with the main database by using Overpass API: you get the full advantage of more complete data, but no conflict or update problems - the separated data acts only as a fallback. The data can be updated in its own cycle. And no user with giant activity appears in the OpenStreetMap logs.

A second application is a planning process. Assume you are a group of citizens that would like to challenge or change a building project. Today, you would need to cut out a set of OSM data and could then only use tools that do not require a background database - or you would need to set up a complete database stack with The Rails Port and so on.

With the multiple sources approach, you could model only the planned change and blend it on the fly on top of the existing OSM data. This allows you, in particular with the rendering and routing tools, to set up a full-blown presentation of the building project's impact, or of the impact of potentially better alternatives.

A third application is isolating a changeset or a single change. If you could blend in the inverse change on the fly, you could see much more easily what the impact of that change was and properly attribute contributions or aspects of contributions.

Finally, the Thank-You-Plugin explained below could profit from this feature.

Steps to implement this feature will include:

  • A concept to handle conflicts: ids could be equal by accident
  • A syntax to select multiple sources, and probably to select from different sources within a single query
  • Rules and an engine to mix the data from multiple sources and resolve conflicts
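The fallback semantics and the conflict problem from the list above can be sketched in a few lines. This is a hypothetical illustration in Python, not the real engine (which would live in the API's C++ core); the `blend` function and the keying of elements by (type, id) pairs are assumptions for the sake of the example:

```python
def blend(main, secondary):
    """Blend a secondary source under the main database.

    Both inputs are dicts keyed by (element type, id). The secondary
    source acts only as a fallback: on an id conflict, the main-database
    element wins. The conflicting keys are reported so a blending rule
    engine could handle them differently (e.g. remap the ids).
    """
    conflicts = {key for key in secondary if key in main}
    merged = dict(secondary)   # start from the fallback layer
    merged.update(main)        # main-database elements override it
    return merged, conflicts

# Example: the import carries its own version of node 1.
main = {("node", 1): {"name": "A"}}
secondary = {("node", 1): {"name": "B"}, ("node", 2): {"name": "C"}}
merged, conflicts = blend(main, secondary)
```

Here the main database's node 1 survives and node 2 is filled in from the import; a real conflict-resolution rule set would decide per case whether to drop, remap, or merge the colliding element.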

Backing renderers

Add a toolset that makes it easy to render tiles on the fly.

Client-side rendering is still lagging behind in OpenStreetMap. But there are a couple of useful things you could do with client-side rendering. One immediate advantage is the possibility to add arbitrary zoom levels. Another is that you could easily alter your stylesheet. And a third is that you could compress the data far better than prerendered tiles.

And don't forget that this is an essential part of the planning tools suite mentioned above.

The third feature requires that the server already pre-filters data that is unnecessary for the client. Besides selecting classes of objects, this could also mean a couple of other things:

  • Coarse coordinates: Simplify coordinates on the fly, but with respect to the tagging of the objects.
  • A cache to offer data for low zoom levels.
  • Filter and rewrite tags: Tags that the renderer won't show should not be sent at all.
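To make the first bullet concrete, here is a minimal sketch of tag-aware coordinate coarsening: classic Ramer-Douglas-Peucker simplification, with a tolerance chosen from the object's tags. It is an illustration only - the tolerance values, the `tolerance_for` rules, and the planar distance approximation are all assumptions, not the API's actual behaviour:

```python
import math

def _perp_dist(p, a, b):
    """Perpendicular distance from point p to segment ab (planar approximation)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: drop points closer than tolerance to the baseline."""
    if len(points) < 3:
        return points
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[:idx + 1], tolerance)
    return left[:-1] + simplify(points[idx:], tolerance)

def tolerance_for(tags):
    """Hypothetical rule: keep boundaries and motorways more precise."""
    if "boundary" in tags or tags.get("highway") == "motorway":
        return 0.0001
    return 0.001
```

A nearly straight residential way would collapse to its endpoints, while the same geometry tagged as a boundary would keep more of its intermediate nodes.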

Select by Route

Make it possible to retrieve the shortest path between two locations.

We don't want to compete with the routing engines. But we want to be able to answer questions about what impact a change of the routing parameters has.

We want to answer questions like:

  • What supermarket (or other POI) requires the least extra time?
  • Can I get a touristically worthwhile route with little extra time?

We want to support the computation of reachability polygons, or draw them ourselves.

We want to answer: if I take this road instead of the proposed one, how much extra time does that add? This could help to try out a completely new type of routing: when you see traffic congestion but still have a road you can turn into, you can find out whether that turn is useful or not.

We could even experiment with probability-based routing: instead of asking for the theoretically shortest way - the way with the earliest expected arrival time - we ask for the way that offers the highest probability of still arriving on time.
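The probability-based idea can be made tangible with a toy model. Assuming (purely for illustration - this is a strong simplification, not a proposed algorithm) that segment travel times are independent and normally distributed, the on-time probability of a route is:

```python
import math

def on_time_probability(segment_means, segment_vars, deadline):
    """P(total travel time <= deadline) under the simplifying assumption
    of independent, normally distributed segment travel times."""
    mu = sum(segment_means)
    sigma = math.sqrt(sum(segment_vars))
    if sigma == 0.0:
        return 1.0 if deadline >= mu else 0.0
    z = (deadline - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A fast but unreliable route can lose against a slightly slower but
# very predictable one once we optimize for on-time probability:
risky = on_time_probability([10.0, 10.0], [25.0, 25.0], 25.0)
reliable = on_time_probability([11.0, 11.0], [1.0, 1.0], 25.0)
```

With a deadline of 25, the route with expected time 22 but tiny variance beats the route with expected time 20 but large variance - exactly the inversion that makes this routing mode interesting.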

This requires Overpass API to:

  • measure lengths
  • offer a basic toolkit to work with numerical values
  • assign tags with computed values to elements
  • snap points to nearby ways and create derived objects there
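The first two requirements - measuring lengths and a basic numerical toolkit for route search - can be sketched as follows. This is a hypothetical Python illustration of the primitives (haversine length plus Dijkstra's algorithm), not the API's implementation:

```python
import heapq
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * r * math.asin(math.sqrt(a))

def shortest_path(graph, start, goal):
    """Dijkstra over {node: [(neighbour, weight), ...]}.
    Returns (total weight, node list from start to goal)."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    done = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return dist[goal], path[::-1]
```

With edge weights derived from `haversine_m` (or from expected travel time), the same search kernel can answer the "what does this detour cost me" questions above by comparing the weight of a forced route against the unconstrained optimum.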


Thank-You-Plugin

An editor should show for a changeset its expected impact before or during upload.

This allows data users to thank mappers for their work with immediate and hence automatic feedback. It gets clearer with an example:

Assume somebody has

  • added a restaurant called "Maison du chef"
  • added its parking site as a public parking site
  • changed the road to "access=destination"

Then the plugin could tell the user:

  • The main map (link to the correct destination), the special restaurant map (link), and 42 other maps (see list) say thank you because they can now show an additional restaurant and an additional parking site.
  • Nominatim is grateful because it learned a 9th item of name "Maison du chef" which is a restaurant.
  • The parking map is grateful because it learned about a new public parking site without fees. (Is it public? It is so close to the restaurant you have also modified.)
  • The routing engine OSRM is grateful because it will no longer erroneously route people through a destination-only street.
  • The routing engine Graphhopper is grateful because it will no longer erroneously route people through a destination-only street.

For me as a mapper, this would be an intrinsic incentive: I get feedback on how valuable my contribution is. I can even get information about probable mistakes back in a polite way.

So why should I involve Overpass API in the whole story? A straightforward approach would be to ask all the map makers and tool writers to offer an API that evaluates any submitted changeset. However, most tool writers won't do that, or not in time, because they have other priorities - and for good reasons.

If we process the changeset at Overpass API and augment it with useful context information, then we can model a map or a web service with a stateless mock-up: this allows writing a response without touching the source code of the project being represented. A web map mock-up would check whether a change is caught by one of its rendering rules. The Nominatim mock-up would be fed all name-related changes. And a full-blown tool integration API can be invoked whenever it is more mature than its mock-up.
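A stateless mock-up of this kind can be tiny. The sketch below is entirely hypothetical - the rule table, the map names, and the changeset shape are invented for illustration, and a real mock-up would read the actual projects' stylesheets:

```python
# Hypothetical rule table: each map is represented by the tag pairs
# it renders. A real mock-up would derive these from the project's
# actual rendering rules.
RULES = {
    "restaurant-map": [("amenity", "restaurant"), ("amenity", "cafe")],
    "parking-map": [("amenity", "parking")],
}

def affected_maps(changeset, rules=RULES):
    """Return the maps whose rendering is touched by a changeset.

    `changeset` is a list of dicts, each carrying the element's new
    tags; the mock-up is stateless - it needs no database behind it.
    """
    hits = set()
    for element in changeset:
        tags = element.get("tags", {})
        for map_name, pairs in rules.items():
            if any(tags.get(k) == v for k, v in pairs):
                hits.add(map_name)
    return sorted(hits)
```

Feeding it the example changeset from above (a restaurant plus a parking site) would flag both mock maps, which is exactly the information the thank-you messages need.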

Thus we can have an operational prototype from the first day and gradually improve it, instead of requiring everybody to work on their projects mostly for our project's benefit.


Concrete features

The big ideas give rise to some features that could be implemented immediately. We split these up into multiple categories, depending on how much changes for the user.

The order does not imply any priority.

API changes

All changes that alter the API. I require backward compatibility, hence all ideas here are enhancements.

Furthermore, every feature here needs tests, preferably with at least path coverage.

  • Backing renderers
    * Coarse Coordinates
    * Filter, add, rewrite Tags
  • Fun with numbers
    * Length calculator
    * Sorting
    * Select by Route
  • Multiple sources
    * Engine to mix sources
    * Blending rules
    * Enhance Changesets


Administrative changes

All changes that are visible to a system administrator but not to an API user.

Furthermore, every feature here needs tests, preferably with at least path coverage.

  • Data compression: Use gzip on each individual block and change block management such that it can take advantage from the smaller file sizes.
  • Clean up number of executables: We have so many executables in the directories that server admins have got confused.
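The per-block compression idea can be illustrated in a few lines. This is a toy sketch in Python (the real block management is C++), and the block size is an assumed placeholder:

```python
import gzip

BLOCK_SIZE = 512 * 1024  # assumed uncompressed block size, for illustration

def compress_block(raw):
    """Compress one database block individually.

    Returning the compressed size alongside the payload matters because
    the block management must track per-block sizes to pack the now
    variable-length blocks densely on disk.
    """
    packed = gzip.compress(raw)
    return packed, len(packed)

def decompress_block(packed):
    """Inverse operation: restore the original block contents."""
    return gzip.decompress(packed)
```

OSM data is highly repetitive (tag keys, common values, clustered coordinates), so per-block gzip typically shrinks blocks substantially while still allowing each block to be read independently.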


Code changes

All changes that make no difference in the behaviour of the software.

In general, no tests are required. But one needs to do regression testing, i.e. check whether all existing tests still pass.

  • Proper interface for output: All output should be concentrated in one class per output format. While Print_Target goes in the right direction, there are still some outputs, e.g. the HTTP header, not managed by Print_Target.
  • Data micro operations: The statements should be refactored such that they rely on a small set of micro operations on the data sets. Examples of such micro operations are:
    * get all objects with a certain index
    * get the indices that are used in a set of objects
    * filter all objects in a set with a function object that is called once for each object
    Some work has already been done here, but I don't yet have an idea how the complete interface should look.
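To pin down what such an interface could offer, here is a toy model of the three micro operations. It is a Python sketch over a plain index-to-objects map (the real interface would be C++ templates over the database's block storage), and the class name is invented:

```python
class IndexedSet:
    """Toy model of the proposed micro operations over an
    index -> [objects] storage; indices stand in for the
    spatial buckets the database uses."""

    def __init__(self, data):
        self._data = data  # {index: [object, ...]}

    def objects_with_index(self, idx):
        # micro op 1: all objects stored under one index
        return list(self._data.get(idx, []))

    def used_indices(self, objects):
        # micro op 2: the indices used by a given set of objects
        wanted = set(objects)
        return sorted(idx for idx, objs in self._data.items()
                      if wanted & set(objs))

    def filter(self, predicate):
        # micro op 3: call the function object once per object,
        # keeping the objects for which it returns true
        return [o for objs in self._data.values() for o in objs
                if predicate(o)]
```

Most statements (bbox queries, tag filters, recursions) could then be expressed as compositions of these three primitives, which is what would make the refactoring pay off.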

Everything but software

Everything that doesn't change code. An example is better documentation. By design, these things don't require tests.

  • Proper documentation: Given the size of the software, this might become a complete book in the end
  • Rework testing environment: There are a lot of automated tests at the moment, and they do a good job of preventing regressions. But some tests fail on rounding errors. Some tests - those for the attic features - aren't integrated into the framework. And the file "generate_test_data" is quite a mess. Adding a continuous integration toolchain is valuable but won't solve any of the mentioned problems. I don't want to use a testing framework because this could cause versioning problems at any later time, with little chance to fix them then.