Permanent ID

From OpenStreetMap Wiki

This is a work in progress, feel free to improve, correct, or suggest ideas on the discussion page

The concept of a "permanent ID" (or Persistent Place Identifier) has frequently been floated in discussions. This page attempts to document the expectations and requirements for such a feature.

A permanent ID is an opaque string whose value always represents the same OSM feature, such as a street, a building, a country outline, or a point of interest (POI). The ID must remain valid even if the country outline geometry is changed, a POI gets renamed, or the street gets split into multiple segments. In the last case the ID itself cannot group the segments, but the split should generate a suggestion to group them all in a relation, which then preserves the ID instead of the first segment.

The Permanent ID represents a "concept"
As editors improve data, any aspect of that data may change. A restaurant node POI may become a building or a relation, and its name may be corrected. A single way representing a street may be split into multiple ways due to a partial speed restriction, or may become two ways separated by a divider. The ID must still point to the original "concept". Consequently, not every OSM feature needs to have a Permanent ID.
This characteristic of the Permanent ID is closely related to the POI concept, the linked data concept, the "One feature, one OSM element" principle and the Wikidata tag attribution process.
Support ID resolution
Duplicates happen and other "official names" are welcome. As data gets corrected, multiple Permanent IDs may be discovered to represent the same concept, and should be merged. Yet, both IDs must remain, and point to the same feature.
Other popular identification authorities such as ISO, Geonames and Wikidata, or official country-specific authorities (like the IBGE ID), have their own IDs, which make good synonyms for Permanent IDs. All valid references (the "merged Permanent IDs" and the "external official IDs") are synonyms, so the Permanent ID needs to support reference ID resolution: converting any synonym into the canonical Permanent ID.
Historical view
It should be possible to view the change history for a given Permanent ID. That history is a subset of the changesets and can span several elements.
API support
OSM APIs should support basic ID manipulation, such as attaching a new ID to a feature, moving ID from one feature to another, merging duplicates, reverting bad merges, etc.
This can be accomplished like any other tag, but it would place restrictions on "free editing" and need extra infrastructure to validate changes.
Tool support
As editors modify existing objects, all editing tools need to help editors maintain Permanent IDs. For example, if an editor splits a road into multiple speed zones, the editing tool should create a relation with both segments and ensure the Permanent ID points to that relation.
Data dump support
Permanent IDs should become part of the regular data downloads.
Community education
Editors should always be aware of the Permanent IDs, and should try as much as possible to maintain ID continuity.
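
The "Support ID resolution" and "API support" requirements above can be illustrated with a small in-memory sketch. It is not an existing OSM API: the class PermanentIdRegistry, its method names and the sample IDs used with it are all hypothetical, chosen only to show how attaching, moving, merging and reverting could keep every ID resolvable.

```python
from typing import Dict, Optional, Tuple

Element = Tuple[str, int]  # (osm_type, osm_id), e.g. ("relation", 23092)


class PermanentIdRegistry:
    """Toy registry; every name here is hypothetical, not an OSM API."""

    def __init__(self) -> None:
        self._target: Dict[str, Element] = {}  # ID -> element it was attached to
        self._alias: Dict[str, str] = {}       # synonym or merged ID -> preferred ID

    def attach(self, perma_id: str, element: Element) -> None:
        """Attach a new ID to a feature, or move an existing ID to another element."""
        self._target[perma_id] = element

    def add_synonym(self, synonym: str, perma_id: str) -> None:
        """Register an external ID (Wikidata, ISO, ...) as a synonym."""
        self._alias[synonym] = perma_id

    def merge(self, duplicate: str, canonical: str) -> None:
        """Merge a duplicate; its old target is kept so the merge can be reverted."""
        self._alias[duplicate] = canonical

    def revert_merge(self, duplicate: str) -> None:
        """Undo a bad merge: the duplicate resolves to its own target again."""
        self._alias.pop(duplicate, None)

    def canonical(self, any_id: str) -> str:
        """Follow alias links until the canonical ID is reached."""
        seen = set()
        while any_id in self._alias:
            if any_id in seen:
                raise ValueError("alias cycle detected")
            seen.add(any_id)
            any_id = self._alias[any_id]
        return any_id

    def resolve(self, any_id: str) -> Optional[Element]:
        """Return the element for any valid reference, or None if unknown."""
        return self._target.get(self.canonical(any_id))
```

After a merge both IDs remain valid and point to the same feature, as the requirement asks; keeping the duplicate's original target is what makes "reverting bad merges" possible in this sketch.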

Implementation Discussion

Relations as Permanent IDs

A relation ID comes closest to satisfying the above requirements. Very often a relation represents a concept (e.g. a country or a road) and tends to change the least. The minimum viable product would be to introduce redirects (this could be done as a relation with just one relation member and a single tag redirect=true). Significant work would still be required for tool support and community education.

PROS: a natural reuse of existing relations (most of the OSM features with a Wikidata tag are relations), and a fit for "feature evolution", where important features are encapsulated in a relation.
CONS: a stable node or way would need a "fake relation" encapsulating it, creating redundancy in the OSM data model and overloading OSM infrastructure.
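
As a rough illustration of the redirect idea described in this section, the sketch below follows "redirect relations" (one relation member plus the tag redirect=true) until it reaches a real relation. The in-memory dictionary format, the function name and the hop limit are assumptions made for the example; no such convention exists in OSM today.

```python
from typing import Dict, List, Tuple

# relation_id -> (tags, members); members are (osm_type, osm_id) pairs.
Relation = Tuple[Dict[str, str], List[Tuple[str, int]]]


def follow_redirects(relation_id: int,
                     relations: Dict[int, Relation],
                     max_hops: int = 10) -> int:
    """Return the ID of the relation that a chain of redirects ends at."""
    current = relation_id
    for _ in range(max_hops):
        tags, members = relations[current]
        is_redirect = (
            tags.get("redirect") == "true"
            and len(members) == 1
            and members[0][0] == "relation"
        )
        if not is_redirect:
            return current          # a real relation: stop here
        current = members[0][1]     # hop to the single relation member
    raise ValueError("too many redirects (possible loop)")
```

The hop limit is a simple guard against redirect loops, which free editing could otherwise introduce.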

Towards a perma_id

The only way to solve the CONS of "relation ID as Permanent ID" above is to create a new ID: let's label it perma_id. The simplest way to ensure long-term persistence for perma_id is to use the persistence strategy of Persistent URLs: "redirecting" the assigned perma_id to the current element by managing a lookup table of <perma_id,osm_type,osm_id>.
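
A minimal sketch of such a lookup table follows, using an in-memory SQLite database as a stand-in for whatever store a real service would use. The table name perma_lookup, the function names and the use of an integer perma_id are assumptions for illustration only.

```python
import sqlite3
from typing import Optional


def make_lookup() -> sqlite3.Connection:
    """Create the <perma_id, osm_type, osm_id> table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE perma_lookup ("
        " perma_id INTEGER PRIMARY KEY,"
        " osm_type TEXT NOT NULL CHECK (osm_type IN ('node', 'way', 'relation')),"
        " osm_id INTEGER NOT NULL)"
    )
    return conn


def assign(conn: sqlite3.Connection, perma_id: int, osm_type: str, osm_id: int) -> None:
    """Insert a new perma_id, or repoint an existing one to another element."""
    conn.execute(
        "INSERT OR REPLACE INTO perma_lookup VALUES (?, ?, ?)",
        (perma_id, osm_type, osm_id),
    )


def redirect_url(conn: sqlite3.Connection, perma_id: int) -> Optional[str]:
    """Return the URL of the current element, or None for an unknown perma_id."""
    row = conn.execute(
        "SELECT osm_type, osm_id FROM perma_lookup WHERE perma_id = ?",
        (perma_id,),
    ).fetchone()
    if row is None:
        return None
    return f"https://www.openstreetmap.org/{row[0]}/{row[1]}"
```

When a node POI later becomes a relation, only the table row changes; the perma_id handed out to third parties stays the same.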

Resolution algorithm

The core of the Permanent ID, after the basic decisions of "what ID" and "how to preserve it", will be the resolution mechanism.
The algorithm must resolve duplicates and synonyms: see "Support ID resolution" in the requirements at the beginning of this page.

Here is a good starting point for a PostgreSQL implementation of a complete "ID resolver" (a better technical term is "URN resolver").

Towards a three-step project

It is possible to accomplish the whole project in a reliable manner with "implement, discuss, decide" steps, one for each main decision. As "permanent" is a long time (eternity!), each step needs to achieve quality and consensus before the next step begins. A suggestion, expressed using the endpoints of the microservice implementation as reference:

Step-1. Implement http://openstreetmap.org/wd/{wdId}, the Wikidata-OSM resolver: a simple redirection service based on a cache of the lookup table of OSM elements with reciprocal Wikidata tags.
It will replace the current "non-persistent, relation-restricted service" used in P402 (http://openstreetmap.org/relation/$1).
Step-2. Implement http://urn.openstreetmap.org/{permaId_value}, the perma_id itself, with two supporting tools:
A. a perma_id=permaId_value controlled tag (non-editable and assigned automatically as a serial value). There are two assignment mechanisms:
A1. assignment by an OSM user, who becomes the watcher of the feature.
A2. monitoring the stability of features that received a Wikidata tag, e.g. checking that the same element has survived more than 6 months with the same Wikidata tag.
B. the microservice behind the endpoint, a simple redirection service based on the lookup table of <permaId_value,osm_type,osm_id>.
Step-3. Implement the http://urn.openstreetmap.org/{namespace}:{value} redirector and
  the http://urn.openstreetmap.org/{namespace}:{value}/{method} "name or ID" resolver,
a service based on further lookups of official IDs and abbreviations (like ISO 3166-1 or ISO 3166-2:BR). The namespace parameter is like a URN scheme; the value can be an official name or a valid ID for that namespace. The method endpoint returns a JSON response and may be limited to basic methods that help canonicalize to a perma_id.
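
The endpoint shapes proposed for Step-2 and Step-3 can be told apart with a small path parser, sketched below. It assumes that a bare perma_id is a serial number made of digits, that the namespace ends at the first colon, and that values contain no slash; none of these details are fixed by the proposal, and the namespace label used in the usage note is made up.

```python
from typing import NamedTuple, Optional


class UrnRequest(NamedTuple):
    namespace: Optional[str]  # None for a bare perma_id
    value: str
    method: Optional[str]     # None for a plain redirect


def parse_urn_path(path: str) -> UrnRequest:
    """Parse '/{permaId}', '/{namespace}:{value}' or '/{namespace}:{value}/{method}'."""
    parts = path.strip("/").split("/")
    if not parts[0] or len(parts) > 2:
        raise ValueError(f"unsupported path: {path!r}")
    head = parts[0]
    method = parts[1] if len(parts) == 2 else None
    if ":" in head:
        # Assumption: the namespace ends at the first colon.
        namespace, value = head.split(":", 1)
        if not namespace or not value:
            raise ValueError(f"empty namespace or value: {path!r}")
        return UrnRequest(namespace.lower(), value, method)
    # Assumption: a bare perma_id is a serial number and takes no method.
    if method is not None or not head.isdigit():
        raise ValueError(f"not a perma_id: {path!r}")
    return UrnRequest(None, head, None)
```

For instance, parse_urn_path("/12345") yields a bare perma_id request, while parse_urn_path("/iso3166-1:BR/json") yields the namespace "iso3166-1", the value "BR" and the method "json".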

General Discussion

The OSM-feature life cycle

OSM features are dynamic and are subject to a maturity process:

  • OSM features are dynamic: even an "eternal" mountain on the map can be levelled to the ground by an earthquake next week.
  • Maturity: as in the pub scenario, there is a maturity process, and so a "life cycle". OSM features can be born from an informal edit and a simplified element representation, and can evolve into a "notable" OSM feature.
    The OSM community and other end users do not need a Permanent ID for an informal node, which is at risk of instability. But they do need the Permanent ID of mature features, which are more stable objects.
    We can imagine a life cycle from informal to formal, from node to relation, from a feature with no tags to a feature with many tags.

So, dynamics and maturity are problems conflicting with the aims of the Permanent ID.

Wikidata-IDs as first step for Permanent IDs

The use of Key:wikidata is, perhaps, the most stable and reliable way to obtain a Permanent ID for relevant spatial features. Example: the oldest German motorway, the A 555, is the entity Q17061 at Wikidata, so Q17061 is also a "Permanent ID", and through reciprocal use it points to the OSM relation 23092.

Reciprocal use with Wikidata solves the main problems discussed:

  1. it is valid for any element type (node/way/relation);
  2. it persists (it is "eternal", guaranteed by third parties and by the Semantic Web);
  3. it is a curated process: there is minimal control for "maturity", "relevance" or "notability".

The tag wikidata={wd_value} never changes, and will continue to point to the correct concept in old backups and for "dead features". The "curated process" is legitimate and minimal because Wikidata has an extremely broad notability requirement: only a "clearly identifiable conceptual or material entity" that "can be described using serious and publicly available references" is required. As Wikidata is a separate project with a solid community and closely related interests, it is a "natural backup" and a "natural complementary curation process".

As of June 2021 there were:

~2,141,937 OSM elements with a wikidata key
(many elements represent the same feature, e.g. a relation and a node representing the same building).
~163,273 OSM features (represented by relations) with Wikidata reciprocal use, out of a total of ~463,890 relations with a wikidata tag.
Each such feature has a permanent Wikidata ID and an OSM relation ID (P402) statement pointing to the OSM online map.
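
"Reciprocal use" as counted above can be expressed as a simple filter: a relation counts only if its wikidata tag names an entity whose P402 statement points back to that same relation. The sketch assumes both sides have already been extracted into plain Python structures; the function name and input shapes are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def reciprocal_relations(
    osm_relations: Iterable[Tuple[int, str]],  # (relation_id, value of its wikidata tag)
    p402: Dict[str, int],                      # Wikidata QID -> relation ID stated in P402
) -> List[Tuple[int, str]]:
    """Keep only the relations whose Wikidata entity points back to them."""
    return [
        (rel_id, qid)
        for rel_id, qid in osm_relations
        if p402.get(qid) == rel_id
    ]
```

With the A 555 example, the pair (23092, "Q17061") is kept when the P402 statement of Q17061 is 23092; a relation whose entity has no P402 statement, or one pointing elsewhere, is dropped.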

It is a good "first step" (!) that can mature into perma_id in the future. Right now it would be easy to create a simple web resolution microservice (example).

Tag attribution is the most complex way to implement a synonym for the current osm_id, but it has the least impact, so it is easy to put in place: the best "implement now!" initiative. It is only a first step towards the Permanent ID implementation. The second step, of course, is to implement the canonical ID, perma_id, as suggested above.

Implementation now

See Proposal here.

OSM Permanent ID with geo URI

Given the OSM id alone is not stable enough and represents many concepts, and given not all objects are eligible as Wikidata QID, these are the design guidelines of a new proposal called 'OSM Permanent ID':

  • The version no. of the OSM object is required in order to get a historic state.
  • Coordinates are serving as a good fallback. The international geo URI scheme is a perfect fit for this.

Thus, the 'OSM Permanent ID' becomes the following form, based on the geo URI scheme:

 'geo:'[-]<lat>','[-]<lon>'?q='(node|way|relation)'/'<osm_id>'#'<osm_version>

where:

  • coordinates are signed decimal numbers in lat,lon order, separated by a comma as in the geo URI scheme (the example below omits the sign for positive values), with no more than 6 digits if possible. For a line (linestring) or an area (multipolygon relation) geometry, the coordinates are either taken from any node of the geometry or calculated, e.g. as the center of the geometry.
  • osm_id is an unsigned big integer ("digits"). The representation 'relation/1169711' is well known (e.g. Overpass JSON output produces this as @id).
  • osm_version is an unsigned integer with a hash in front.

Example: "Schloss Kyburg" (Castle Kyburg), which is relation 1169711, version #6, at coordinates 47.45835, 8.74375, becomes

 geo:47.45835,8.74375?q=relation/1169711#6

Thus, the 'OSM Permanent ID' is mainly numeric plus some fixed tokens, and it has a variable length, typically around 42 characters (the example above has 41).
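
As a sketch of how such an identifier could be produced and read back, the code below follows the Schloss Kyburg example: latitude first, a comma, then longitude, with no sign on positive values. The class and function names are hypothetical, and rounding to 6 decimal places is one reading of the "6 digits" guideline.

```python
import re
from typing import NamedTuple


class OsmPermanentId(NamedTuple):
    lat: float
    lon: float
    osm_type: str  # "node", "way" or "relation"
    osm_id: int
    version: int


_PATTERN = re.compile(
    r"^geo:(?P<lat>[+-]?\d+(?:\.\d+)?),(?P<lon>[+-]?\d+(?:\.\d+)?)"
    r"\?q=(?P<type>node|way|relation)/(?P<id>\d+)#(?P<version>\d+)$"
)


def _coord(value: float) -> str:
    """Format with at most 6 decimals and no trailing zeros."""
    return f"{value:.6f}".rstrip("0").rstrip(".")


def format_permanent_id(pid: OsmPermanentId) -> str:
    """Build the geo URI form of the identifier."""
    return (f"geo:{_coord(pid.lat)},{_coord(pid.lon)}"
            f"?q={pid.osm_type}/{pid.osm_id}#{pid.version}")


def parse_permanent_id(text: str) -> OsmPermanentId:
    """Parse the geo URI form; an explicit sign on coordinates is accepted."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an OSM Permanent ID: {text!r}")
    return OsmPermanentId(
        float(match["lat"]), float(match["lon"]),
        match["type"], int(match["id"]), int(match["version"]),
    )
```

Formatting OsmPermanentId(47.45835, 8.74375, "relation", 1169711, 6) gives the string from the example, and parsing that string returns the same five fields.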

(NOTE: geo URIs look like web links (URL), but none of the 'well known' browsers supports this URI scheme yet 'out of the box'.)