Area/The Future of Areas
Not all closed polylines are areas; whether one is depends on which other tags are present. A closed way tagged highway=pedestrian is considered to be a circular path unless it has an area=yes tag, which turns it into a pedestrianised area. A closed way tagged leisure=park, however, is assumed to be an area. Areas can also be described using multiple ways as members of a relation:multipolygon. The editors make it possible to create parks that are not closed, even though these won't normally render unless they are included in a multipolygon. Anyone wishing to tag or use areas therefore always needs to check the default interpretation of closed ways.
This set of wiki pages describes the problems and confusions created by the current model and suggests some alternative proposed approaches to model areas more clearly, flexibly and simply.
The current situation
There are currently two different ways of modelling areas/polygons in OSM: Way-based polygons and relation-based polygons.
Polygons can be created from closed ways (i.e. ways where the first and last node are the same). This is most often used for small polygons such as parks, buildings or small lakes.
What makes a closed way into a polygon are the tags used. If the tag refers to something that is an area (such as landuse=* or building=*), the way marks an area. If it refers to something that is a line (such as highway=* or railway=*), the way still is a line. So a roundabout might be described by a closed way, but that does not make it into an area. There is an extra tag (area=yes) to turn line features into area features, so a way with highway=pedestrian and area=yes describes a pedestrian area (such as a market place in a city).
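The tag-dependent decision described above can be sketched as a small function. This is only an illustration: the key sets `AREA_KEYS` and `LINE_KEYS` are hypothetical and deliberately tiny (real data consumers maintain much longer lists), and checking line keys before area keys is an arbitrary choice made for this sketch.

```python
# Hypothetical, deliberately incomplete tag lists taken from the examples in the text.
AREA_KEYS = {"landuse", "building", "leisure"}
LINE_KEYS = {"highway", "railway"}


def is_closed(node_refs):
    """A way is closed when its first and last node references are identical."""
    return len(node_refs) >= 4 and node_refs[0] == node_refs[-1]


def is_area(node_refs, tags):
    """Guess whether a way should be treated as a polygon rather than a line."""
    if not is_closed(node_refs):
        return False
    if tags.get("area") == "yes":
        # area=yes turns a line feature such as highway=pedestrian into an area.
        return True
    if any(key in tags for key in LINE_KEYS):
        # Line features stay lines, e.g. a roundabout drawn as a closed way.
        return False
    return any(key in tags for key in AREA_KEYS)
```

The need for such lists is exactly the semantic dependency this page criticises: the geometry type cannot be read from the data structure alone.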
Polygons modelled as closed ways cannot have inner rings, i.e. they cannot have holes. Nor is it possible to define multipolygons (with several outer rings) this way.
Polygons modelled with relations can have inner rings, i.e. they can have holes. They can also have multiple outer rings so they are proper multipolygons.
Note that there is some confusion about where the tags for multipolygons go. (Tags for the area that the relation makes up should be on the relation, because tags on the individual ways describe the individual way.)
Data-wise, the current API version is the main culprit for invalid multipolygon relations, because it allows broken ones to be uploaded. The API should try to assemble them according to the specification and reject the changeset in question if this cannot be done. Mass edits and automated edits are discouraged anyway, so if these checks cascade, a changeset could be rejected early, after a timeout, without waiting for the checks to finish.
Special Case: Coastlines
Special Case: Wide rivers
What makes this case special is that homogeneous ground areas are split up at arbitrary boundaries to limit the extent of the data objects representing them. Sometimes a split is justified, e.g. at weirs. To get or compute the homogeneous surface as a whole, all the patches used to represent it in the database need to be stitched together (joined, aggregated) first.
Rivers that are too wide to be mapped as line features are mapped with waterway=riverbank, either using way-based or relation-based polygons as described above. The tag riverbank comes from the early times of OSM when only the riverbanks were mapped like this, without them being used to render the area (river surface) within.
Sometimes other large areas (such as forests) are arbitrarily broken up into smaller polygons, too.
Problems with the current approach
There are several problems with the current situation:
- The current situation with several different ways of creating areas/polygons is difficult for people to understand and to work with.
- The different ways of solving what is essentially the same problem create problems for software development. More software and more complex software is needed. Software that works with one of the area types does not necessarily work with others.
- Changes to multipolygon constructs are difficult and often damage the construct.
- To create an actual multipolygon that can be rendered or otherwise worked with from relations, ways, and nodes is rather complex. Only a few people have actually implemented software that does this. There are numerous corner cases and possibilities to create invalid multipolygons. Assembling multipolygons from an OSM data file is difficult, because you either need to read relations first and then find the corresponding ways and then the nodes, or you need to keep nodes and ways in memory until you read the relation section. Even worse: Simple polygons (i.e. closed ways) can not be used without checking whether they are actually part of a multipolygon relation.
- Even less software can assemble coastline data into something that is usable for the renderer.
- There is no way to find out if a closed way is a linestring feature or a polygon feature without having a list of tags and tag combinations. It is difficult to write generic software that handles those cases differently.
- Evaluation of multipolygons is ill-defined and difficult, producing different renderings and different problems in different maps instead of a similar picture. Most useful third-party software using OSM data can't evaluate multipolygons at all.
- Way-based and relation-based polygons are incompatible. Describing the same real-life feature may require changing the type of polygon description used as more detail is added; this requires a complete rework and is prone to error.
- The current situation where semantic information is needed to figure out whether a way represents an area or a line is awkward and sometimes even undecidable.
- Because multipolygon relations are simply collections to the server/database, it is fairly easy to commit changes that break a multipolygon relation - in other words, because multipolygons are semantic, they break in ways that we do not see with way data (where the topological relationships are enforced in syntax).
- Tagging multipolygons is not straightforward either. A simple polygon has only one closed way, which holds the tags. For relation-based polygons, the relation feels like the more logical element to tag. However, this alone makes editing much more complicated: if a polygon starts out as a simple one (a building) but later evolves into a relation (a building with an inner court), the tags should be transferred. It is also possible that the inner rings are objects that need their own tags (islands in a lake), as are parts of multipolygons with many outer rings (islands that together make up an archipelago).
- Because it is difficult to create large areas in OSM, often several smaller areas are created right next to each other. This often leads to bad rendering when the outline of the area ('casing') is drawn in a different color than the area itself. Lines in the outline color will cross the area. Also it makes labelling and other uses of the data more difficult when there is not one OSM object but several objects describing only one real-world object.
- There is no support for setting defaults (e.g. "this area is almost completely forest" or "speed limits are in miles per hour in this country").
- When clipping data (accurately) to a certain area, e.g. the border of a community or a country, problems arise, when ways are used for different features. Think of a roundabout with a meadow in the center. Both use the same closed way. Now this roundabout is to be clipped with a path going through the middle. The way needs to be cut for the roundabout and replaced by a smaller area for the meadow. You can't do both at the same time.
- For someone new to OSM the term "multipolygon" is confusing: it suggests an association with the OGC MultiPolygon, which – in OGC terminology – is a geometry (more precisely: a collection of geometries). The OSM multipolygon, however, is used to describe features (real-world objects). Compare this to other type-tag values, which most often have the character of feature classes and do not relate to geometries: OSM has a “node” but no “Point”, a “way” but no “LineString”, a “route” but no “MultiLineString”. The word also suggests that a multipolygon is a collection of objects of type polygon / Area. However, the wiki does not describe it as such. This might come from the fact that the concept of the Area#Simple area is restricted to areas bounded by closed ways and has not been extended to those bounded by closed routes.
- Feel free to add your own
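One of the assembly steps mentioned in the list above, joining the member ways of a multipolygon into closed rings, can be sketched as follows. This is a minimal greedy sketch, not a full multipolygon assembler: the function name `assemble_rings` is hypothetical, ways are assumed to be plain lists of node ids, and it ignores member roles, ring nesting, and branching (nodes shared by more than two ways), which are among the corner cases the list refers to.

```python
def assemble_rings(ways):
    """Join ways (lists of node ids) end to end into closed rings.

    Returns (rings, leftovers): closed node-id lists, and chains that
    could not be closed with the ways available.
    """
    remaining = [list(w) for w in ways if len(w) >= 2]
    rings, leftovers = [], []
    while remaining:
        chain = remaining.pop()
        extended = True
        # Keep appending ways at the tail until the chain closes or nothing fits.
        while chain[0] != chain[-1] and extended:
            extended = False
            for i, way in enumerate(remaining):
                if way[0] == chain[-1]:
                    chain.extend(way[1:])
                elif way[-1] == chain[-1]:
                    # The way runs in the opposite direction: append it reversed.
                    chain.extend(reversed(way[:-1]))
                else:
                    continue
                del remaining[i]
                extended = True
                break
        if chain[0] == chain[-1] and len(chain) >= 4:
            rings.append(chain)
        else:
            leftovers.append(chain)
    return rings, leftovers
```

Note that the function needs all member ways at hand before it can start, which mirrors the reading-order problem described above for OSM data files.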
Requirements are necessary to assess proposals. They can (and should) be classified – see .
|1.||The term „Area“ is used to describe a 2-dimensional planar feature.|
|2.||The geometry of an area should easily translate into valid OGC-geometries (i.e. polygon or multipolygon).
An area should easily translate into an OGC Simple Feature
|See OGC-standard, 4.19|
|3.||It is a well-defined object type in OSM, so that no semantics / heuristics must be applied to understand an OSM-object as area.||This is not meant to require a new element (besides the existing elements “tag”, “node”, “way” and “relation”); perhaps tagging a relation or an outer closed ring with type=“area” would do. See the ideas in The Future of Areas/Fixing Multipolygons.|
|4.||An area is defined by its boundaries.|
|5.||Boundaries are defined by ordered lists of nodes, ways, routes or coordinates as defined by OGC. Critical criteria are:|
|- Boundaries are closed rings|
|- Boundaries are simple, i.e. method IsSimple would return True|
|6.||The geometric properties of an area are those of an OGC-Polygon. Critical criteria are:|
|- an area is bounded by one outer boundary and zero or more inner boundaries|
|- the interior of an area is connected||Connectedness excludes the archipelago from being considered as one area. That means something like a “multi-area” is needed, a concept similar to a route being a “multi-way” (see The Future of Areas/Super Areas).
If – on the other hand – one skips this requirement, it is nevertheless necessary to consistently introduce the concept of a polygon, to be able to easily check the validity of each “patch” of the area (each island in the archipelago)
|8.||The validity of the boundaries can be easily checked by the software.||probably that requires the concept of a closed ring (closed way, closed route) to be implemented by software and visible to the mapper.|
|9.||The validity of the area made-up by the boundaries can be easily checked by the software|
|10.||The validity must not be violated inadvertently during editing when the editing box does not include the complete area.
In other words: valid local changes (within a bounding box) should not invalidate the area
Exception: It is acceptable that a violation of the requirement “connected” after an edit is only detected on the backend.
|The exception addresses a scenario where the bounding box does not include an existing touching point. In this case one could morph the boundaries so that a second touching point is created without noticing that now the area is no longer connected.|
|11.||It must be possible that a way (or route) is both part of a boundary and a feature of its own (e.g. the roundabout enclosing a meadow; the fence enclosing a meadow is a bad example, because the fence posts are exactly on the border only by coincidence)|
|12.||It must be possible to model large areas without the need to split them into patches.||If patches were allowed they would have to be modelled as OGC-polygons. Then the parts of the boundaries where two patches are stitched together are not part of the boundary of the area and have to be modelled accordingly.|
|13.||There must be a clear (both for software and for the mapper) distinction between a “filled hole in an area” and an “area as part of another area”||Examples:
|14.||The implementation must allow current multipolygons that are consistent with these requirements to be migrated into the new implementation||should be trivial for areas currently modelled as closed ways|
|15.||The migration strategy must allow for a coexistence on the current API and a new one.|
|16.||If current-style area objects and new-style area objects coexist for some time, the object must “know its state”.|
|17.||Any solution must be compatible with a data model change due to the proposal “Geometry for Ways”||See https://github.com/osmlab/osm-data-model/blob/master/geometry_for_ways/approaches.md|
|18.||Solution should allow for area-steps.||Relations/Proposed/Area#Area-steps, steps which are wide and/or irregular|
|Feel free to add your own|
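Requirements 5 and 8 in the table above ask that a boundary be a closed, simple ring and that software can easily check this. A minimal sketch of such a check is given below, assuming planar `(x, y)` coordinate tuples and a hypothetical function name `is_valid_ring`. It uses a brute-force pairwise segment test, so it is quadratic in the number of nodes, and it does not detect adjacent segments that fold back onto each other.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left, -1 right, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _in_box(p, q, r):
    """True if r lies inside the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def _segments_intersect(a, b, c, d):
    """True if segment ab and segment cd share at least one point."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other segment.
    return ((o1 == 0 and _in_box(a, b, c)) or (o2 == 0 and _in_box(a, b, d))
            or (o3 == 0 and _in_box(c, d, a)) or (o4 == 0 and _in_box(c, d, b)))


def is_valid_ring(coords):
    """Check that a coordinate list is closed and does not cross or touch itself."""
    if len(coords) < 4 or coords[0] != coords[-1]:
        return False
    segments = list(zip(coords, coords[1:]))
    n = len(segments)
    for i in range(n):
        for j in range(i + 1, n):
            # Neighbouring segments share an endpoint by construction; skip them.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_intersect(*segments[i], *segments[j]):
                return False
    return True
```

For example, `is_valid_ring([(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)])` is rejected because the first and third segments cross.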
Proposed solutions should ideally address all the problems and issues identified above. Partial solutions may however be useful as a basis.
Here are various proposals for altered, new or extended area types:
|Proposal||Mandates API change||Comments|
|Areas on Nodes||yes|
|Areas on Ways||yes|
|Areas on Nodes or Ways||yes||
|Tagging Outline Ways||yes||
|Triangles (tessellation based areas)||yes||
Computing or deriving areas from other data
Having proposed super areas or super multipolygons above, keep in mind that Relations are not Categories. This means that either of these area-defining methods should be employed only when computing or deriving the area by a query is hard, computationally expensive or (at the moment) impossible.
E.g.: For the German primary road network, relations have been submitted to the database that collect primary roads based on their ref=* reference. Overpass API instances have shown that the same collection might be computed using simple database queries (e.g. ).
If the ref=* tags are maintained properly on each way, there is no need to maintain these collecting relations. Unless they are used for e.g. validation checks, they are redundant. For validation checks to work, the mappers maintaining ref=* tags on the ways need to be different from those updating the redundant relations. Anything else is an error-prone maintenance burden.
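To illustrate why such collecting relations are redundant, the grouping can be derived from the way tags alone. The sketch below works on an in-memory dictionary rather than on Overpass API; the function name `group_ways_by_ref` and the sample data are hypothetical, and splitting multi-valued tags on semicolons is an assumption of this example.

```python
from collections import defaultdict


def group_ways_by_ref(ways, key="ref"):
    """Map each reference value to the sorted ids of the ways carrying it.

    `ways` maps a way id to its tag dictionary.
    """
    groups = defaultdict(list)
    for way_id, tags in ways.items():
        value = tags.get(key)
        if value is None:
            continue
        # Assumption: several references on one way are separated by semicolons.
        for ref in value.split(";"):
            ref = ref.strip()
            if ref:
                groups[ref].append(way_id)
    return {ref: sorted(ids) for ref, ids in groups.items()}


# Hypothetical sample data: way id -> tags
sample = {
    1: {"highway": "primary", "ref": "B 1"},
    2: {"highway": "primary", "ref": "B 1;B 5"},
    3: {"highway": "primary"},
}
routes = group_ways_by_ref(sample)
```

Here `routes` plays the role of the member lists that would otherwise have to be maintained by hand in relations.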
To tag extra information about the complete road route, there are a bunch of possibilities:
- Duplicate all extra tags to the tag sets of its individual ways (which often is not desired).
- Manually maintain a relation and its way member list to store extra tags. This is done now, but duplicates effort.
- Tag a computed query result. If the queries are part of the db, associated tags will be as well. This is comparable to a relation with a dynamically updated member list.
- Use external information sources such as wikidata or wikipedia. Note that this still needs to mix with one of the solutions above, since a wikidata object or wikipedia page on the complete road route can very well be a different one than that for an individual segment/way of that road route.
Currently Overpass returns only objects based on existing geometry in the db. All boundary ways and nodes of area objects in the result set have to exist in the database (or its history). To extend the usefulness of Overpass API, it could however return computed (derived) geometry when answering certain area queries. E.g. computing landuse=residential areas with a query based on contained elements might be feasible:
For this, all residential buildings within a bbox or administrative boundary (or ..) could be used to compute and return minimum enclosing rings (plus a variable buffer) such that no two buildings within a ring are further apart than a query-defined width. A possible algorithm might
- retrieve all relevant buildings
- (A) pick any one to include within a set
- out of the remaining buildings, find those intersecting with the buffer around what's in the current set
- repeat the last step until buildings cannot be added anymore
- if relevant buildings are left, repeat from (A)
(The steps above are achievable with the current implementation of Overpass API.)
- compute minimum enclosing rings (non-convex ones might increase usefulness..) for each set built, eventually adding a buffer width
- cache and return results
(These two steps are achievable only with additional tools that are currently not part of Overpass API.)
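The set-building part of the algorithm (everything before the enclosing rings are computed) can be sketched as below. This is a simplified illustration under loud assumptions: each building is reduced to a hypothetical centroid in planar coordinates, "intersecting with the buffer" is approximated by a centroid distance of at most `width`, and the function name `cluster_buildings` is made up for this example. Computing the enclosing rings themselves is not shown.

```python
from math import dist


def cluster_buildings(centroids, width):
    """Group buildings into sets by repeatedly growing a set from one seed.

    `centroids` maps a building id to an (x, y) tuple in planar units;
    a building joins a set when it is within `width` of any member.
    """
    remaining = dict(centroids)
    clusters = []
    while remaining:
        # (A) pick any one building to start a new set
        seed_id, seed_xy = remaining.popitem()
        cluster = {seed_id}
        frontier = [seed_xy]
        # Grow the set until no further building can be added.
        while frontier:
            current = frontier.pop()
            near = [bid for bid, xy in remaining.items()
                    if dist(current, xy) <= width]
            for bid in near:
                frontier.append(remaining.pop(bid))
                cluster.add(bid)
        clusters.append(cluster)
    return clusters
```

With real building footprints the distance test would have to be replaced by a proper buffer intersection, and a spatial index would be needed to avoid comparing every pair.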
Note that such a query generates way and multipolygon objects that do not exist in OSM's database. Depending on the implementation (adding buffer width), the same applies to nodes returned by such queries.
The additional capabilities might serve quality assurance and also help to unclutter some of the objects within the db in the long run. It might also initiate a focus shift from debates over mapping practices to querying practice and negotiate some disputes on tag usage/interpretation.
- Computing areas from other computed areas
This raises several issues. It means that computed areas may be part of the source data for other queries, or in short: queries processing the result set of other queries. Any implementation would need to employ some form of mechanism to avoid or resolve circular dependencies and a way to determine query execution order.
E.g.: The is_in operator of the Overpass API query language returns all areas an object is contained in.
- If the static map data of areas such as landuse=residential is replaced by dynamic results of a community-accepted query,
- then an is_in query would need to consider this dynamic result in the source data it operates on.
The stumbling block is replacing mapped static areas by their eventual query counterparts. Using computed areas simply as a quality assurance measure or as a service to data consumers does not imply such a replacement.
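The dependency problem raised in this section (queries that process the result sets of other queries) is commonly handled with a topological sort. The sketch below uses Python's standard `graphlib` module to derive an execution order and to report circular dependencies; the query names and the function `execution_order` are hypothetical and not part of Overpass API.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return query names ordered so that every query runs after its inputs.

    `dependencies` maps a query name to the set of query names whose
    results it reads. Raises ValueError on a circular dependency.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of err.args holds the detected cycle.
        raise ValueError(f"circular dependency between queries: {err.args[1]}") from err


# Hypothetical example: is_in reads computed residential areas,
# which in turn are computed from buildings.
order = execution_order({
    "is_in": {"residential_areas"},
    "residential_areas": {"buildings"},
    "buildings": set(),
})
```

Detecting the cycle only tells an implementation that something is wrong; how to resolve it remains an open design decision.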
Applications of area types
Applications of area types (area usage):
- Proposed features/Street area and similar
- Proposed features/area:highway (both in use)