Area/The Future of Areas
Not all closed polylines are areas; whether one is depends on which other tags are present. A closed way tagged highway=pedestrian is considered to be a circular path unless it has an area=yes tag, which turns it into a pedestrianised area, whereas a closed way tagged leisure=park is assumed to be an area. Areas can also be described using multiple ways as members of a relation:multipolygon. The editors allow parks to be created that are not closed, even though these won't normally render unless they are included in a multipolygon. Anyone wishing to tag or use areas therefore always needs to check the default interpretation of closed ways.
This set of wiki pages describes the problems and confusions created by the current model and suggests some alternative proposed approaches to model areas more clearly, flexibly and simply.
The current situation
There are currently two different ways of modelling areas/polygons in OSM: Way-based polygons and relation-based polygons.
Polygons can be created from closed ways (i.e. ways where the first and last node are the same). This is most often used for small polygons such as parks, buildings or small lakes.
What makes a closed way into a polygon are the tags used. If the tag refers to something that is an area (such as landuse=* or building=*), the way marks an area. If it refers to something that is a line (such as highway=* or railway=*), the way is still a line. So a roundabout might be described by a closed way, but that does not make it an area. There is an extra tag (area=yes) to turn line features into area features, so a way with highway=pedestrian and area=yes describes a pedestrian area (such as a market place in a city).
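The tag-based decision described above can be sketched as a small function. This is only an illustration: the names `AREA_KEYS`, `is_closed` and `is_area` are made up for this example, and the key list is a deliberately tiny assumption, not a complete or official list.

```python
# Illustrative assumption: a very small set of keys that imply an area.
# A real data consumer needs a much longer, carefully maintained list.
AREA_KEYS = {"landuse", "building", "leisure"}


def is_closed(node_refs):
    """A way is closed when its first and last node are the same.

    Three distinct nodes plus the repeated first node give at least 4 refs.
    """
    return len(node_refs) >= 4 and node_refs[0] == node_refs[-1]


def is_area(node_refs, tags):
    """Guess whether a way describes an area, based on its tags."""
    if not is_closed(node_refs):
        return False
    # area=yes turns a line feature (e.g. highway=pedestrian) into an area.
    if tags.get("area") == "yes":
        return True
    # Otherwise the way is an area only if some tag implies one.
    return any(key in AREA_KEYS for key in tags)


ring = [1, 2, 3, 1]
print(is_area(ring, {"leisure": "park"}))                        # True
print(is_area(ring, {"highway": "pedestrian"}))                  # False
print(is_area(ring, {"highway": "pedestrian", "area": "yes"}))   # True
```

Note that the answer depends entirely on the contents of the key list, which is exactly the dependency on semantic knowledge criticised later on this page.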
Polygons modelled as closed ways cannot have inner rings, i.e. they cannot have holes. It is also not possible to define multipolygons (with several outer rings) this way.
Polygons modelled with relations can have inner rings, i.e. they can have holes. They can also have multiple outer rings so they are proper multipolygons.
Note that there is some confusion where the tags for the multipolygons go. (Tags for the area that the relation makes up should be on the relation, because tags on the individual ways describe the individual way.)
As far as the data is concerned, the current API version is the main culprit for invalid multipolygon relations, because it allows broken ones to be uploaded. The API should try to assemble them according to the specification and reject the changeset in question if this cannot be done. Mass edits and automated edits are discouraged anyway, so if these checks cascade, a changeset could be rejected early, after a timeout, without waiting for the checks to finish.
Special Case: Coastlines
Special Case: Wide rivers
What makes this case special is that homogeneous ground areas are split up at arbitrary boundaries to limit the extent of the data objects representing them. Sometimes a split is justified, e.g. at weirs. To get or compute the homogeneous surface as a whole, all the patches used to represent it in the database need to be stitched together (joined, aggregated) first.
Rivers that are too wide to be mapped as line features are mapped with waterway=riverbank, either using way-based or relation-based polygons as described above. The tag riverbank comes from the early times of OSM when only the riverbanks were mapped like this, without them being used to render the area (river surface) within.
Sometimes other large areas (such as forests) are arbitrarily broken up into smaller polygons, too.
Problems with the current approach
There are several problems with the current situation:
- The current situation with several different ways of creating areas/polygons is difficult for people to understand and to work with.
- The different ways of solving what is essentially the same problem create problems for software development. More software and more complex software is needed. Software that works with one of the area types does not necessarily work with others.
- Changes to multipolygon constructs are difficult and often damage the construct.
- Creating an actual multipolygon that can be rendered or otherwise worked with from relations, ways, and nodes is rather complex. Only a few people have actually implemented software that does this. There are numerous corner cases and possibilities to create invalid multipolygons. Assembling multipolygons from an OSM data file is difficult, because you either need to read relations first and then find the corresponding ways and then the nodes, or you need to keep nodes and ways in memory until you read the relation section. Even worse: simple polygons (i.e. closed ways) cannot be used without checking whether they are actually part of a multipolygon relation.
- Even less software can assemble coastline data into something that is usable for the renderer.
- There is no way to find out if a closed way is a linestring feature or a polygon feature without having a list of tags and tag combinations. It is difficult to write generic software that handles those cases differently.
- Evaluation of multipolygons is ill-defined and difficult, so different maps produce different renderings and different problems instead of a similar picture. Most useful third-party software using OSM data can't evaluate multipolygons at all.
- Way-based and relation-based polygons are incompatible. Describing the same real-life feature may require changing the type of polygon description used as more detail is added; this requires a complete rework and is prone to error.
- The current situation where semantic information is needed to figure out whether a way represents an area or a line is awkward and sometimes even undecidable.
- Because multipolygon relations are simply collections to the server/database, it is fairly easy to commit changes that break a multipolygon relation - in other words, because multipolygons are semantic, they break in ways that we do not see with way data (where the topological relationships are enforced in syntax).
- Tagging multipolygons is not straightforward either. A simple polygon has only one closed way, which holds the tags. For relation-based polygons, the relation feels like the more logical element to tag. However, this alone makes editing much more complicated: if a polygon starts out as a simple one (a building) but later evolves into a relation (a building with an inner court), the tags should be transferred. The inner rings may also be objects that need their own tags (islands in a lake), as may parts of multipolygons with many outer rings (islands that together form an archipelago).
- Because it is difficult to create large areas in OSM, often several smaller areas are created right next to each other. This often leads to bad rendering when the outline of the area ('casing') is drawn in a different color than the area itself. Lines in the outline color will cross the area. Also it makes labelling and other uses of the data more difficult when there is not one OSM object but several objects describing only one real-world object.
- There is no support for setting defaults (e.g. "this area is almost completely forest" or "speed limits are in miles per hour in this country").
- When clipping data (accurately) to a certain area, e.g. the border of a community or a country, problems arise when one way is used for different features. Think of a roundabout with a meadow in the center. Both use the same closed way. Now this roundabout is to be clipped along a line going through the middle. The way needs to be cut for the roundabout and replaced by a smaller area for the meadow. You can't do both at the same time.
- Feel free to add your own
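To give an idea of why assembling multipolygons is considered complex, here is a minimal sketch of just one step: joining the member ways of a relation into closed rings. The function name `assemble_rings` and the representation of ways as plain lists of node ids are assumptions made for this example. The sketch ignores roles (inner/outer), self-intersections and rings that touch each other, which are among the corner cases mentioned above.

```python
def assemble_rings(ways):
    """Join way segments (lists of node ids) into closed rings.

    Greedy sketch: a ring is extended with any unused way that starts or
    ends at the ring's current end node, reversing the way if necessary.
    Raises ValueError if some segments cannot be closed into a ring.
    """
    remaining = [list(way) for way in ways if len(way) >= 2]
    rings = []
    while remaining:
        ring = remaining.pop()
        while ring[0] != ring[-1]:
            for i, way in enumerate(remaining):
                if way[0] == ring[-1]:
                    ring.extend(way[1:])
                elif way[-1] == ring[-1]:
                    ring.extend(reversed(way[:-1]))
                else:
                    continue
                del remaining[i]
                break
            else:
                # No unused way continues the ring: the multipolygon is broken.
                raise ValueError(f"open ring ending at node {ring[-1]}")
        rings.append(ring)
    return rings


# Three ways that together form one ring; the second one is stored "backwards".
print(assemble_rings([[1, 2, 3], [5, 4, 3], [5, 1]]))
```

A check of this kind is also what the server would need to run to reject broken multipolygons on upload.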
Proposed solutions should ideally address all the problems and issues identified above. Partial solutions may however be useful as a basis.
Here are various proposals for altered, new or extended area types:
|Proposal||Mandates API change||Comments|
|Areas on Nodes||yes|
|Areas on Ways||yes|
|Areas on Nodes or Ways||yes||
|Tagging Outline Ways||yes||
|Triangles (tessellation-based areas)||yes||
Computing or deriving areas from other data
Having proposed super areas or super multipolygons above, keep in mind that Relations are not Categories. This means that either of these area-defining methods should be employed only when computing or deriving the area with a query is hard, computationally expensive or (currently) impossible.
E.g.: For the German primary road network, relations have been submitted to the database that collect primary roads based on their ref=* reference. Overpass API instances have shown that the same collection can be computed using simple database queries (e.g. ).
If the ref=* tags are maintained properly on each way, there is no need to maintain these collecting relations. Unless they are used for e.g. validation checks, they are redundant. For validation checks to work, the mappers maintaining ref=* tags on ways need to be different from those updating the redundant relations. Anything else is an error-prone maintenance burden.
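The idea of deriving such a collection from the tags alone can be illustrated without any particular query language. The following sketch assumes the ways are already available as a mapping from way id to tag dictionary; the function name `collect_by_ref` and the sample data are hypothetical. It does not handle multi-valued ref tags.

```python
from collections import defaultdict


def collect_by_ref(ways):
    """Map each ref value to the sorted ids of the primary roads carrying it.

    `ways` maps a way id to its tag dictionary (assumed input format).
    """
    routes = defaultdict(list)
    for way_id, tags in ways.items():
        ref = tags.get("ref")
        if ref is not None and tags.get("highway") == "primary":
            routes[ref].append(way_id)
    return {ref: sorted(ids) for ref, ids in routes.items()}


# Hypothetical sample data, not taken from the OSM database.
sample = {
    10: {"highway": "primary", "ref": "B 1"},
    11: {"highway": "primary", "ref": "B 1"},
    12: {"highway": "primary", "ref": "B 2"},
    13: {"highway": "residential", "ref": "B 1"},
    14: {"highway": "primary"},
}
print(collect_by_ref(sample))
```

The result plays the role of the member lists of the collecting relations, but is recomputed from the way tags instead of being maintained by hand.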
To tag extra information about the complete road route, there are a bunch of possibilities:
- Duplicate all extra tags to the tag sets of its individual ways (which often is not desired).
- Manually maintain a relation and its way member list to store extra tags. This is done now, but duplicates efforts.
- Tag a computed query result. If the queries are part of the db, associated tags will be as well. This is comparable to a relation with a dynamically updated member list.
- Use external information sources such as wikidata or wikipedia. Note that this still needs to mix with one of the solutions above, since a wikidata object or wikipedia page on the complete road route can very well be a different one than that for an individual segment/way of that road route.
Currently Overpass returns only objects based on existing geometry in the db. All boundary ways and nodes of area objects in the result set have to exist in the database (or its history). To extend the usefulness of Overpass API, it could however return computed (derived) geometry when answering certain area queries. E.g. computing landuse=residential areas with a query based on contained elements might be feasible:
For this, all residential buildings within a bbox or administrative boundary (or ..) could be used to compute and return minimum enclosing rings (plus a variable buffer) such that no two buildings within a ring are further apart than a query-defined width. A possible algorithm might:
- retrieve all relevant buildings
- (A) pick any one to include within a set
- out of the remaining buildings, find those intersecting with the buffer around what's in the current set
- repeat the last step until buildings cannot be added anymore
- if relevant buildings are left, repeat from (A)
Achievable with current implementation of Overpass API
- compute minimum enclosing rings (non-convex ones might increase usefulness..) for each set built, optionally adding a buffer width
- cache and return results
Achievable using additional tools only, currently not part of Overpass API
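The steps above can be sketched as follows. This is a simplified illustration under several assumptions: each building is reduced to a single point (e.g. its centroid) in a projected coordinate system with metric units, "intersecting with the buffer" is approximated by a distance test, and a buffered bounding box stands in for the minimum enclosing ring. The names `cluster_buildings` and `buffered_bbox` are made up for this example.

```python
import math


def cluster_buildings(points, width):
    """Group points so that each one lies within `width` of another
    point of the same set (the set-growing steps of the algorithm)."""
    remaining = list(points)
    clusters = []
    while remaining:
        # Step (A): pick any one building to start a new set.
        cluster = [remaining.pop(0)]
        frontier = list(cluster)
        # Keep adding buildings near the current set until none are left.
        while frontier:
            cx, cy = frontier.pop()
            near = [p for p in remaining
                    if math.hypot(p[0] - cx, p[1] - cy) <= width]
            for p in near:
                remaining.remove(p)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def buffered_bbox(cluster, buffer):
    """Simplified stand-in for a minimum enclosing ring: the bounding box
    of the set, enlarged by `buffer`, as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in cluster]
    ys = [y for _, y in cluster]
    return (min(xs) - buffer, min(ys) - buffer,
            max(xs) + buffer, max(ys) + buffer)


points = [(0, 0), (5, 0), (10, 0), (100, 100), (104, 100)]
for cluster in cluster_buildings(points, width=6):
    print(cluster, buffered_bbox(cluster, buffer=2))
```

The corner coordinates produced by `buffered_bbox` are computed rather than mapped, so they do not correspond to any node in the database.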
Note that such a query generates way and multipolygon objects that do not exist in OSM's database. Depending on the implementation (adding buffer width), the same applies to nodes returned by such queries.
The additional capabilities might serve quality assurance and also help to unclutter some of the objects within the db in the long run. It might also initiate a focus shift from debates over mapping practices to querying practice and negotiate some disputes on tag usage/interpretation.
- Computing areas from other computed areas
This raises several issues. It means that computed areas may be part of the source data for other queries, or in short: queries processing the result set of other queries. Any implementation would need to employ some form of mechanism to avoid or resolve circular dependencies and a way to determine query execution order.
E.g.: The is_in operator of the Overpass API query language returns all areas an object is contained in.
- If the static map data of areas such as landuse=residential is replaced by dynamic results of a community-accepted query,
- then an is_in query would need to consider this dynamic result in the source data it operates on.
The sticking point is replacing mapped static areas with possible query counterparts. Using computed areas simply as a quality assurance measure or as a service to data consumers does not imply such a replacement.
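One conceivable mechanism for determining the execution order and detecting circular dependencies is a topological sort over the "reads the result of" relation between queries. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The function name `execution_order` and the query names are hypothetical, and nothing here describes how Overpass API works internally.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return query names in an order that runs dependencies first.

    `dependencies` maps a query name to the set of queries whose results
    it reads. Raises ValueError if the queries depend on each other
    in a circle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second argument of CycleError holds the detected cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


# Hypothetical queries: is_in reads computed residential areas,
# which in turn are derived from mapped buildings.
queries = {
    "is_in": {"residential_areas"},
    "residential_areas": {"buildings"},
    "buildings": set(),
}
print(execution_order(queries))
```

If two computed areas were defined in terms of each other, `execution_order` would raise an error instead of returning an order, so such a definition could be rejected before it is accepted as a replacement for mapped data.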
Applications of area types
Applications of area types (area usage):