The Future of Areas/Super Areas

From OpenStreetMap Wiki

The important distinction between an area and a way is that for an area we are interested in the set of all points in the area's interior; with an area it's not enough to simply know where its boundary is. While this is a trivial issue for small areas that are entirely loaded into memory (like a city park), it becomes a problem when the area is huge, like the Atlantic Ocean.

The Problem

Currently, multipolygon relations are not used for oceans and continents because:

  • The resulting relation would be extremely fragile. A huge number of coastline ways make up the perimeter, and a missing shared node between any two adjacent ways, or the loss of any single way, would 'break' the coastline. (Instead, processing software implicitly rebuilds the perimeter rings using a left/right rule, and in practice there are gaps that have to be fudged.)
  • The resulting relation would be enormous, and a client would have to load a huge amount of data to process it.
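To illustrate the fragility, here is a minimal Python sketch of ring assembly. It is not the left/right rule used by real coastline processing; it only joins directed ways end to start, assuming each way is a list of node ids. The function name assemble_rings and the node ids are hypothetical.

```python
def assemble_rings(ways):
    """Join directed ways (lists of node ids) into chains.

    Returns (closed_rings, open_chains). A chain counts as closed
    when its first and last node ids are equal.
    """
    remaining = [list(w) for w in ways]
    closed, open_chains = [], []
    while remaining:
        chain = remaining.pop()
        extended = True
        while chain[0] != chain[-1] and extended:
            extended = False
            for i, way in enumerate(remaining):
                if way[0] == chain[-1]:
                    # Way continues the chain: append it without its first node.
                    chain = chain + way[1:]
                elif way[-1] == chain[0]:
                    # Way leads into the chain: prepend it without its last node.
                    chain = way[:-1] + chain
                else:
                    continue
                del remaining[i]
                extended = True
                break
        (closed if chain[0] == chain[-1] else open_chains).append(chain)
    return closed, open_chains


# Hypothetical coastline made of three ways sharing their end nodes.
intact = [[1, 2, 3], [3, 4, 5], [5, 6, 1]]
# Same coastline, but the second way no longer shares node 3 (it uses 30).
broken = [[1, 2, 3], [30, 4, 5], [5, 6, 1]]
```

With the intact input the three ways join into one closed ring. With the broken input no closed ring can be formed, and everything ends up in a single open chain, which is the kind of gap that has to be fudged.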

Super Areas

A super area is an area made of other areas. The region it covers is the union of its member areas. E.g.

<area id="1">
  <tag k="landuse" v="pond" />
  <outer>
    <nd ref="1" />
    <nd ref="2" />
    <nd ref="3" />
    <nd ref="1" />
  </outer>
</area>
<area id="2">
  <tag k="landuse" v="grass" />
  <outer>
    <nd ref="5" />
    <nd ref="6" />
    <nd ref="7" />
    <nd ref="5" />
  </outer>
</area>
<area id="3">
  <tag k="landuse" v="park" />
  <tag k="name" v="Grant Park" />
  <area ref="1" />
  <area ref="2" />
</area>

A few notes on this scheme:

  • Areas 1 and 2 are 'entities' in their own right - one defines a pond, and one defines a grassy field.
  • Area 3 is a park, and its area is now the union of the two contained areas (the pond and the field).
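The union rule can be sketched in Python as a point-membership test: a point is in a super area if it is in any member area. This is only an illustration. The dictionary layout, the function names and the coordinates are assumptions (the XML above gives node references, not positions), and the two member squares are made to overlap on purpose.

```python
def point_in_ring(point, ring):
    """Even-odd ray casting; ring is a list of (x, y) vertices."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains(area_id, point, areas):
    """True if the point lies in the area, or in any member of a super area."""
    area = areas[area_id]
    if "members" in area:
        return any(contains(m, point, areas) for m in area["members"])
    return point_in_ring(point, area["outer"])


# Hypothetical geometry for the three areas of the example.
AREAS = {
    1: {"tags": {"landuse": "pond"},
        "outer": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    2: {"tags": {"landuse": "grass"},
        "outer": [(3, 3), (8, 3), (8, 8), (3, 8)]},
    3: {"tags": {"landuse": "park", "name": "Grant Park"},
        "members": [1, 2]},
}

in_pond = contains(3, (1, 1), AREAS)          # inside area 1 only
in_overlap = contains(3, (3.5, 3.5), AREAS)   # inside both members
outside = contains(3, (6, 1), AREAS)          # inside neither member
```

Because membership is an "any" test, the point in the overlap is counted once, not twice. Anything that needs the actual outline or the surface of the super area would need a real polygon union instead.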

Advantages

Super Areas would be capable of representing a huge area via smaller sub-tiles. If we do a bounding-box query on part of a large super-area, we would get back:

  • The individual areas whose bounding boxes intersect the query bounding box.
  • The super areas that contain those areas.
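A rough Python sketch of such a query follows. It assumes the same hypothetical in-memory layout as a plain dictionary of areas, handles only one level of nesting (no super areas of super areas), and is not how any existing OSM API call works.

```python
def bbox_of(ring):
    """Bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(bbox, areas):
    """Return (ids of plain areas hit, ids of super areas referencing them)."""
    hits = {
        area_id for area_id, area in areas.items()
        if "outer" in area and bboxes_intersect(bbox_of(area["outer"]), bbox)
    }
    supers = {
        area_id for area_id, area in areas.items()
        if "members" in area and hits.intersection(area["members"])
    }
    return hits, supers


# Hypothetical data: two plain areas and one super area made of both.
AREAS = {
    1: {"outer": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    2: {"outer": [(3, 3), (8, 3), (8, 8), (3, 8)]},
    3: {"tags": {"name": "Grant Park"}, "members": [1, 2]},
}

hits, supers = query((5, 5, 6, 6), AREAS)
```

Here the query box touches only area 2, so the result is area 2 plus super area 3 with its tags. The perimeter of area 1 is never fetched.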

With this model, we can get an area 'tile' that tells us how to treat the points within our mapping box, together with the super-area carrying a single set of meta-data (no meta-data duplication), without having to fetch the entire large perimeter of the super-area.

Disadvantages

Since the sub-areas of a super-area could overlap, clients would have to be capable of handling area union operations to correctly interpret valid data. This might increase the complexity of clients doing relatively simple operations.

On the other hand, this can already happen today - two closed multipolygon relations that are meant to be disjoint might not actually be disjoint.

The tiled data might be hard to work with for editors. Tiling is really a performance optimization rather than map data.

This scheme adds significant complexity to area processing.

Variations/Alternatives

Support tiling implicitly via the editing API, e.g.:

  • When we query a bounding box, we get back only part of an area's XML, enough to work with it.
    • If the tile lies entirely in the area's interior, the area is returned (but with an empty perimeter).
    • If none of our tile is in the area, it is not returned.
  • Only the parts of the area's perimeter that cross through our bounding box are returned.
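The three cases above could be decided on the server roughly as in the following Python sketch. It assumes a single outer ring with no holes; the function names and the status strings "partial", "interior" and "outside" are invented for illustration, and the sketch says nothing about how edited subsections would be resubmitted.

```python
def segment_hits_box(p, q, box):
    """Liang-Barsky style test: does segment p-q touch the axis-aligned box?"""
    xmin, ymin, xmax, ymax = box
    t0, t1 = 0.0, 1.0
    for d, lo, hi, start in ((q[0] - p[0], xmin, xmax, p[0]),
                             (q[1] - p[1], ymin, ymax, p[1])):
        if d == 0:
            # Segment is parallel to this axis: it must already lie in range.
            if start < lo or start > hi:
                return False
        else:
            ta, tb = (lo - start) / d, (hi - start) / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True


def point_in_ring(point, ring):
    """Even-odd ray casting; ring is a list of (x, y) vertices."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def classify_tile(ring, box):
    """Return (status, perimeter edges to send) for one area and one tile."""
    edges = list(zip(ring, ring[1:] + ring[:1]))
    crossing = [(p, q) for p, q in edges if segment_hits_box(p, q, box)]
    if crossing:
        return "partial", crossing      # send only the crossing edges
    center = ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
    if point_in_ring(center, ring):
        return "interior", []           # area returned, but empty
    return "outside", None              # area not returned at all


# Hypothetical square area and three query tiles.
RING = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner = classify_tile(RING, (4, 4, 6, 6))
far = classify_tile(RING, (20, 20, 22, 22))
edge = classify_tile(RING, (9, 4, 11, 6))
```

When no perimeter edge touches the tile, the tile is either wholly inside or wholly outside the ring, so testing its center is enough to tell the two apart.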

(I admit, I have no idea how you would then edit and resubmit these 'subsections' and have them link up again.)