Area/The Future of Areas

From OpenStreetMap Wiki
Minimal approach, without mandating a change to the current OSM API and building on established principles. Note that this approach differs from super areas, but is not as fragile as huge multipolygons (MPs) are claimed to be.

OpenStreetMap does not have a native area (or polygon) data primitive; areas are currently modelled using ways, which are also used to describe polylines.

Not all closed polylines are areas; this depends on which other tags are present. A closed way tagged highway=pedestrian is considered to be a circular path unless it has an area=yes tag, which turns it into a pedestrianised area. A closed way tagged leisure=park, however, is assumed to be an area. Areas can also be described using multiple ways as members of a relation:multipolygon. The editors make it possible to create parks that are not closed, even though these won't normally render unless they are included in a multipolygon. Anyone wishing to tag or use areas therefore always needs to check the default interpretation of closed ways.

This set of wiki pages describes the problems and confusions created by the current model and suggests some alternative proposed approaches to model areas more clearly, flexibly and simply.

The current situation

There are currently two different ways of modelling areas/polygons in OSM: Way-based polygons and relation-based polygons.

Way-based Polygons

Polygons can be created from closed ways (i.e. ways where the first and last node are the same). This is most often used for small polygons such as parks, buildings or small lakes.

What turns a closed way into a polygon is the tags used. If a tag refers to something that is an area (such as landuse=* or building=*), the way marks an area. If it refers to something that is a line (such as highway=* or railway=*), the way is still a line. So a roundabout might be described by a closed way, but that does not make it an area. There is an extra tag (area=yes) to turn line features into area features, so a way with highway=pedestrian and area=yes describes a pedestrian area (such as a market place in a city).
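
The decision described above can be sketched in a few lines of Python. This is only an illustration: the tag list AREA_KEYS is a hypothetical, deliberately incomplete sample, and the function names are made up for this example. Real data consumers maintain much longer lists of tags and tag combinations.

```python
# Hypothetical sample of keys that imply an area; not an official list.
AREA_KEYS = {"landuse", "building", "leisure"}


def is_closed(node_ids):
    """A way is closed when its first and last node are the same."""
    return len(node_ids) >= 4 and node_ids[0] == node_ids[-1]


def is_area(node_ids, tags):
    """Guess whether a way describes an area, based on its tags."""
    if not is_closed(node_ids):
        return False
    if tags.get("area") == "yes":
        # area=yes turns a line feature into an area feature.
        return True
    # Keys such as highway or railway, and unknown keys, stay lines.
    return any(key in tags for key in AREA_KEYS)


closed_way = [1, 2, 3, 4, 1]
print(is_area(closed_way, {"highway": "pedestrian"}))
print(is_area(closed_way, {"highway": "pedestrian", "area": "yes"}))
print(is_area(closed_way, {"leisure": "park"}))
```

The same closed geometry is reported as a line in the first call and as an area in the other two, which is why generic software cannot decide from geometry alone.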

Polygons modelled as closed ways cannot have inner rings, i.e. they cannot have holes. Nor is it possible to define multipolygons (with several outer rings) this way.

Relation-based Polygons

See Relation:multipolygon.

Polygons modelled with relations can have inner rings, i.e. they can have holes. They can also have multiple outer rings so they are proper multipolygons.

Note that there is some confusion about where the tags for multipolygons go. (Tags for the area that the relation makes up should be on the relation, because tags on the individual ways describe the individual way.)

Data-wise, the current API version is the main culprit for invalid multipolygon relations, because it allows broken ones to be uploaded. The API should try to assemble them according to the specification and reject the changeset in question if this cannot be done. Mass edits and automated edits are discouraged anyway, so if these checks cascade, a changeset could be rejected early, after a timeout, without waiting for the checks to finish.
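
A minimal sketch of such an assembly check is shown below. It is not the API's actual behaviour and not the full multipolygon specification: the function name assemble_rings is hypothetical, ways are reduced to plain lists of node ids, and ambiguous cases (rings touching at a node, inner/outer roles) are ignored. It only joins member ways end to end and fails when a ring cannot be closed.

```python
def assemble_rings(ways):
    """Join ways (lists of node ids) end to end into closed rings.

    Raises ValueError if some ways cannot be closed into a ring.
    Greedy and simplified: it takes the first way that fits.
    """
    remaining = [list(way) for way in ways]
    rings = []
    while remaining:
        ring = remaining.pop()
        while ring[0] != ring[-1]:
            for i, way in enumerate(remaining):
                if way[0] == ring[-1]:
                    ring.extend(way[1:])
                elif way[-1] == ring[-1]:
                    # The way runs in the opposite direction; append it reversed.
                    ring.extend(reversed(way[:-1]))
                else:
                    continue
                del remaining[i]
                break
            else:
                raise ValueError("open ring ending at node %r" % ring[-1])
        if len(ring) < 4:
            raise ValueError("degenerate ring %r" % ring)
        rings.append(ring)
    return rings


print(assemble_rings([[1, 2, 3], [3, 4, 1]]))
```

A server-side check along these lines would reject an upload when ValueError is raised, instead of storing a relation that no consumer can assemble.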

Special Case: Coastlines

Coastlines can be seen as the boundary between land polygons and sea polygons. But in OSM, they are handled in a different way. See Coastline and natural=coastline for details.

Special Case: Wide rivers

What makes this case special is that homogeneous ground areas are split up at arbitrary boundaries to limit the extent of the data objects representing them. Sometimes a split is justified, e.g. at weirs. To get or compute the homogeneous surface as a whole, all the patches used to represent it in the database need to be stitched together (joined, aggregated) first.

Rivers that are too wide to be mapped as line features are mapped with waterway=riverbank, either using way-based or relation-based polygons as described above. The tag riverbank comes from the early times of OSM when only the riverbanks were mapped like this, without them being used to render the area (river surface) within.

Sometimes other large areas (such as forests) are arbitrarily broken up into smaller polygons, too.
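
One way to picture the stitching step is to drop every boundary segment that two neighbouring patches have in common. The sketch below assumes that adjacent patches share the same nodes along their common boundary and do not overlap; the function name outline_edges is hypothetical. It is an illustration of the idea, not a general polygon union.

```python
from collections import Counter


def outline_edges(patches):
    """Return the edges that belong to exactly one patch.

    Each patch is a closed ring of node ids (first == last). An edge shared
    by two neighbouring patches is an internal split line and is dropped;
    the remaining edges form the outline of the stitched surface.
    """
    counts = Counter()
    for ring in patches:
        for a, b in zip(ring, ring[1:]):
            counts[frozenset((a, b))] += 1
    return {edge for edge, n in counts.items() if n == 1}


# Two river patches that share the split line between nodes 2 and 3.
upstream = [1, 2, 3, 4, 1]
downstream = [2, 5, 6, 3, 2]
outline = outline_edges([upstream, downstream])
print(frozenset((2, 3)) in outline)
print(len(outline))
```

The remaining edges still have to be joined into rings before the surface can be rendered or labelled as one object.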

Problems with the current approach

There are several problems with the current situation:

  • The current situation with several different ways of creating areas/polygons is difficult for people to understand and to work with.
  • The different ways of solving what is essentially the same problem create problems for software development. More software and more complex software is needed. Software that works with one of the area types does not necessarily work with others.
  • Changes to multipolygon constructs are difficult and often damage the construct.
  • To create an actual multipolygon that can be rendered or otherwise worked with from relations, ways, and nodes is rather complex. Only a few people have actually implemented software that does this. There are numerous corner cases and possibilities to create invalid multipolygons. Assembling multipolygons from an OSM data file is difficult, because you either need to read relations first and then find the corresponding ways and then the nodes, or you need to keep nodes and ways in memory until you read the relation section. Even worse: Simple polygons (i.e. closed ways) can not be used without checking whether they are actually part of a multipolygon relation.
  • Even less software can assemble coastline data into something that is usable for the renderer.
  • There is no way to find out if a closed way is a linestring feature or a polygon feature without having a list of tags and tag combinations. It is difficult to write generic software that handles those cases differently.
  • Evaluation of multipolygons is ill-defined and difficult, producing different renderings and different problems in different maps instead of a similar picture. Most useful third-party software using OSM data can't evaluate multipolygons at all.
  • Way-based and relation-based polygons are incompatible. Describing the same real-life feature may require changing the type of polygon description used as more detail is added; this requires a complete rework and is prone to error.
  • The current situation where semantic information is needed to figure out whether a way represents an area or a line is awkward and sometimes even undecidable.
  • Because multipolygon relations are simply collections to the server/database, it is fairly easy to commit changes that break a multipolygon relation - in other words, because multipolygons are semantic, they break in ways that we do not see with way data (where the topological relationships are enforced in syntax).
  • Tagging multipolygons is not straightforward either. A simple polygon has only one closed way, which holds the tags. For relation-based polygons, the relation seems the more logical element to tag. However, this alone makes editing much more complicated: if a polygon starts out as a simple one (a building) but later evolves into a relation (a building with an inner court), the tags should be transferred. It is also possible that the inner rings are objects which need their own tags (islands in a lake), as are parts of multipolygons with many outer rings (islands which together make up an archipelago).
  • Because it is difficult to create large areas in OSM, often several smaller areas are created right next to each other. This often leads to bad rendering when the outline of the area ('casing') is drawn in a different color than the area itself. Lines in the outline color will cross the area. Also it makes labelling and other uses of the data more difficult when there is not one OSM object but several objects describing only one real-world object.
  • There is no support for setting defaults (e.g. "this area is almost completely forest" or "speed limits are in miles per hour in this country").
  • Feel free to add your own
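
The point about simple polygons needing a relation check can be illustrated as follows. The data layout (a dict of ways, relations as dicts with "tags" and "members") and the function name classify_closed_ways are assumptions made for this sketch, not the format of any particular library.

```python
def classify_closed_ways(ways, relations):
    """Split closed ways into standalone polygons and multipolygon members.

    ways: dict mapping way id -> list of node ids
    relations: list of dicts with "tags" and "members"; each member is a
               (type, ref, role) tuple (an assumed layout for this sketch).
    """
    member_way_ids = {
        ref
        for rel in relations
        if rel["tags"].get("type") == "multipolygon"
        for (member_type, ref, _role) in rel["members"]
        if member_type == "way"
    }
    standalone, needs_relation = [], []
    for way_id, nodes in ways.items():
        if len(nodes) >= 4 and nodes[0] == nodes[-1]:
            if way_id in member_way_ids:
                needs_relation.append(way_id)
            else:
                standalone.append(way_id)
    return standalone, needs_relation


ways = {10: [1, 2, 3, 1], 11: [4, 5, 6, 4], 12: [7, 8]}
relations = [{"tags": {"type": "multipolygon"},
              "members": [("way", 11, "inner")]}]
print(classify_closed_ways(ways, relations))
```

Note that all relations have to be known before a single closed way can be interpreted, although OSM data files list ways before relations. This is the ordering problem described in the list above.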

Proposals

Proposed solutions should ideally address all the problems and issues identified above. Partial solutions may however be useful as a basis.

Here are various proposals for altered, new or extended area types:

Proposal | Mandates API change | Comments

Simple Features | (yes)
  • abstract idea to select and deploy parts of this standard to solve some data type issues in OSM

Areas on Nodes | yes
  • based on one or more closed rings
  • each ring definition directly references three or more node ids

Areas on Ways | yes
  • based on one or more closed rings
  • each ring definition references one or more way ids
  • a complete ring is obtained by concatenating all of the ways it references
  • unlike for multipolygons, ways are fixed to a specific ring by the data structure

Areas on Nodes or Ways | yes
  • based on one or more closed rings
  • a ring definition is either like those in Areas on Nodes or like those in Areas on Ways above
  • mixing definitions is allowed, but any given ring uses either one or the other

User:Sanderd17/Areas | yes
  • similar to Areas on Ways and current multipolygons
  • unlike for multipolygons:
    • ways are added to a specific ring by the data structure
    • way direction is given/recorded by the data structure
    • the area is left of the way (by default), but the behaviour can be reversed by setting the "reversed" flag, which allows a way to be used for areas on both sides

User:Zverik/Areas | yes
  • mostly the same as Areas on Nodes; whether a closed ring is outer or inner is computed from node order (clockwise: inner ring, counter-clockwise: outer ring) and not recorded by the proposed data structure
  • alternatively proposes a concept similar to Tagging Outline Ways that
    • does not duplicate tag sets for areas on each way
    • instead records one or more arearefs per way side
    • referenced objects are tagged once, which is similar to relations, but the member list is stored at a different location:
      • it is given by ways/members pointing to the relation
      • not by relation entries pointing to ways/members

Tagging Outline Ways | yes
  • each way gets a maximum of three tag sets:
    (1) describing the way itself
    (2) describing the area to the left of the way
    (3) describing the area to the right of the way
  • the tag set describing an area is repeated for each outline way encasing that area
  • needs overlapping ways if two areas should overlap, since at most one area may be defined for each side of any way

Triangles (tessellation-based areas) | yes
  • based on enumerating all triangles within a polygonal area
  • each triple of nodes in the database forms a triangle; each triangle gets a unique id
  • an area references all the triangles lying within its outline
  • for each new node placed inside existing area objects (or exactly on the outline), the list of triangle references grows; when deleting, the list shrinks → checking and updating area objects in the database needs to be done for all changesets that add or delete nodes

Super Areas | (yes)
  • an area may be built from a collection of (a collection of..) other areas
  • depends on at least one basic area type (either existing or proposed above)
  • super areas reference two or more of (other super area, basic area); possibly mixing refs to super and basic types
  • rings need to be computed using children (of children..)

Super Multipolygons | no
(1) either by building an area from a collection of (a collection of..) other multipolygons
  • like Super Areas, but without mandating an API change; defined by using relations
  • references two or more of (other super MP, basic MP); possibly mixing refs to super and basic types
  • rings are computable by using only those ways (tree leaves) that hang exactly once in the tree; a way leaf appearing twice or more is a shared boundary between children areas
(2) or by building an area from a collection of (a collection of..) way-concatenating relations
  • way-concatenating relation: a relation that can be computationally reduced to a single way, closed or not, such that every member is used exactly once (this non-exclusively includes most of the current route, superroute and boundary relations)
  • a ring is given by one or more way-concatenating relations
  • this variant is portrayed in the picture at the beginning of this article
  • note that lower level of detail (lower LOD) versions may be generated, supposedly without having the full data tree
    • for instance, given sorted members of way-concatenating relations, using only the first and last members in turn may suffice to build and render a low-detail version of the rings
    • to some degree the usefulness of this aspect may depend on editors crafting the closed rings in such a manner that the route ends lie at points of great significance to a low-detail representation

Fixing Multipolygons | no
  • splits the basic multipolygon definition
  • each ring gets its own type=ring relation; the inside is always to the left of a counter-clockwise oriented ring
  • type=polyring relations contain only such rings as inner or outer (and additionally contain other polyrings, if recursion as in Super Areas above is allowed)
  • a ring relation may define an area by itself (to avoid creating polyrings with a single outer ring)

Relations/Proposed/Area | no
  • specific to highway handling; uses a relation for definition
  • a type of area that references two of its outline ways, unconnected to each other
  • missing parts of the outline ring need to be computed by clients
  • does not account for holes or disjoint ground areas, which is similar to closed ways or multipolygons with one outer ring
  • proposed in 2009 (seems to be obsoleted by Street area)
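
Several of the proposals above derive the role of a ring from its node order instead of storing it. A minimal sketch of that computation uses the shoelace formula on planar coordinates. It assumes that x/y (e.g. longitude/latitude) can be treated as a flat plane, which is only reasonable for small rings away from the antimeridian and the poles; the function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula; positive for counter-clockwise rings.

    ring: list of (x, y) pairs with the first point repeated at the end.
    """
    return sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(ring, ring[1:])) / 2.0


def ring_role(ring):
    """Counter-clockwise rings are outer, clockwise rings are inner."""
    area = signed_area(ring)
    if area == 0:
        raise ValueError("degenerate ring")
    return "outer" if area > 0 else "inner"


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(ring_role(square))
print(ring_role(square[::-1]))
```

With this convention, reversing the node order of a ring is enough to switch it between outer and inner, so editors would have to treat way direction as significant.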

Computing or deriving areas from other data

Having proposed super areas or super multipolygons above, keep in mind that Relations are not Categories. This means that either of these area-defining methods should be employed only when computing or deriving the area by a query is hard, computationally expensive or (at the moment) impossible.


E.g.: For the German primary road network, relations have been submitted to the database that collect primary roads based on their ref=* reference. Overpass API instances have shown that the same collection can be computed using simple database queries (e.g. Show B 2 *R on overpass.eu (overpass)).

If the ref=* tags are maintained properly on each way, there is no need to maintain these collecting relations. Unless they are used for e.g. validation checks, they are redundant. For validation checks to work, the mappers maintaining ref=* tags on ways need to be different from those updating the redundant relations. Anything else is an error-prone maintenance burden.
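
To illustrate why such a collecting relation is derivable, the sketch below groups ways by their ref=* value. The input layout (pairs of way id and tag dict) and the function name ways_by_ref are assumptions for this example; splitting on ";" follows the common OSM convention for multiple values in one tag.

```python
from collections import defaultdict


def ways_by_ref(ways):
    """Group way ids by the values of their ref tag.

    ways: iterable of (way_id, tags) pairs. A value such as "B 2;B 13"
    is split on ";" so the way is listed under every road it carries.
    """
    groups = defaultdict(list)
    for way_id, tags in ways:
        for ref in tags.get("ref", "").split(";"):
            ref = ref.strip()
            if ref:
                groups[ref].append(way_id)
    return dict(groups)


ways = [
    (1, {"highway": "primary", "ref": "B 2"}),
    (2, {"highway": "primary", "ref": "B 2;B 13"}),
    (3, {"highway": "residential"}),
]
print(ways_by_ref(ways))
```

The resulting member lists are always in sync with the tags, which a manually maintained relation cannot guarantee.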

To tag extra information about the complete road route, there are a bunch of possibilities:

  1. Duplicate all extra tags to the tag sets of its individual ways (which often is not desired).
  2. Manually maintain a relation and its way member list to store extra tags. This is done now, but duplicates efforts.
  3. Tag a computed query result. If the queries are part of the db, associated tags will be as well. This is comparable to a relation with a dynamically updated member list.
  4. Use external information sources such as wikidata or wikipedia. Note that this still needs to mix with one of the solutions above, since a wikidata object or wikipedia page on the complete road route can very well be a different one than that for an individual segment/way of that road route.

Currently Overpass returns only objects based on existing geometry in the db. All boundary ways and nodes of area objects in the result set have to exist in the database (or its history). To extend the usefulness of Overpass API, it could however return computed (derived) geometry when answering certain area queries. E.g. computing landuse=residential areas with a query based on contained elements might be feasible:

For this, all residential buildings within a bbox or administrative boundary (or ..) could be used to compute and return minimum enclosing rings (plus a variable buffer) such that no two buildings within a ring are further apart than a query-defined width. A possible algorithm might

  • retrieve all relevant buildings
  • (A) pick any one to include within a set
  • out of the remaining buildings, find those intersecting with the buffer around what's in the current set
  • repeat the last step until buildings cannot be added anymore
  • if relevant buildings are left, repeat from (A)
    Achievable with current implementation of Overpass API
  • compute minimum enclosing rings (non-convex ones might increase usefulness..) for each set built, optionally adding a buffer width
  • cache and return results
    Achievable using additional tools only, currently not part of Overpass API
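
The steps above can be sketched in plain Python under some simplifying assumptions: each building is reduced to a single point (e.g. its centroid) in projected planar coordinates, the enclosing ring is a convex hull (the non-convex variant and the extra buffer are left out), and the names cluster and convex_hull are hypothetical. This is not part of Overpass API.

```python
from math import dist


def cluster(points, width):
    """Group points so that each one is within `width` of another member."""
    remaining = list(points)
    clusters = []
    while remaining:
        current = [remaining.pop()]      # step (A): pick any one
        grew = True
        while grew:                      # repeat until nothing can be added
            near = [p for p in remaining
                    if any(dist(p, q) <= width for q in current)]
            grew = bool(near)
            current.extend(near)
            remaining = [p for p in remaining if p not in near]
        clusters.append(current)
    return clusters


def convex_hull(points):
    """Andrew's monotone chain; returns the hull counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


buildings = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
             (10, 10), (11, 10), (10, 11)]
for group in cluster(buildings, width=2):
    print(convex_hull(group))
```

Each printed hull is a candidate outline for one derived residential area. Its corner points are computed from building positions, so, as with the buffer, they need not correspond to nodes in the database.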

Note that such a query generates way and multipolygon objects that do not exist in OSM's database. Depending on the implementation (adding buffer width), the same applies to nodes returned by such queries.

The additional capabilities might serve quality assurance and also help to unclutter some of the objects within the db in the long run. They might also shift the focus from debates over mapping practice to querying practice and help settle some disputes on tag usage/interpretation.

Computing areas from other computed areas

This raises several issues. It means that computed areas may be part of the source data for other queries, or in short: queries processing the result set of other queries. Any implementation would need to employ some form of mechanism to avoid or resolve circular dependencies and a way to determine query execution order.
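
As a rough illustration of such a mechanism, the sketch below orders queries with a topological sort and reports circular dependencies. It uses Python's standard graphlib module (Python 3.9 or later); the query names and the function execution_order are hypothetical and not part of Overpass API.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on):
    """Return an order in which every query runs after the queries it reads.

    depends_on maps a query name to the set of queries whose results it uses.
    Raises ValueError when the dependencies are circular.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # The second element of args holds the detected cycle.
        raise ValueError("circular dependency: %s" % (err.args[1],)) from err


queries = {
    "is_in": {"residential_areas"},
    "residential_areas": {"buildings"},
    "buildings": set(),
}
print(execution_order(queries))
```

A query that reads its own result, directly or through other queries, would be rejected instead of being evaluated forever.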

E.g.: The is_in operator of the Overpass API query language returns all areas an object is contained in.

  • If the static map data of areas such as landuse=residential is replaced by dynamic results of a community-accepted query,
  • then an is_in query would need to consider this dynamic result in the source data it operates on.

The sticking point is replacing mapped static areas with possible query-based counterparts. Using computed areas simply as a quality assurance measure or as a service to data consumers does not imply such a replacement.

Applications of area types

Applications of area types (area usage):

See Also