Permanent ID

From OpenStreetMap Wiki
Revision as of 18:30, 27 August 2018 by Krauss (talk | contribs) (add implementation)

This is a work in progress, feel free to improve, correct, or suggest ideas on the discussion page

The concept of a "permanent ID" (or Persistent Place Identifier) has frequently been floated in discussions. This page attempts to document the expectations and requirements for such a feature.

A permanent ID is an opaque string whose value always represents the same OSM feature, such as a street, a building, a country outline, or a point of interest (POI). The ID must remain valid even if the geometry of a country outline changes, a POI is renamed, or a street is split into multiple segments. In the last case the ID itself cannot group the segments; instead, the split should generate a suggestion to group all segments in a relation, so that the ID is preserved on that relation rather than on the first segment.

The Permanent ID represents a "concept"
As editors improve data, any aspect of that data may change. A restaurant node POI may become a building or a relation, and its name may be corrected. A single way representing a street may be split into multiple ways due to a partial speed restriction, or may become two ways separated by a divider. The ID must still point to the original "concept". Consequently, not every OSM feature needs to have a Permanent ID.
This characteristic of the Permanent ID is closely related to the POI concept, the linked data concept, the "One feature, one OSM element" principle, and the Wikidata tag attribution process.
Support ID resolution
Duplicates happen, and other "official names" are welcome. As data gets corrected, multiple Permanent IDs may be discovered to represent the same concept, and should be merged. Even so, both IDs must remain valid and point to the same feature.
Other popular identification authorities such as ISO, Geonames and Wikidata, as well as official authorities (country-specific, like the IBGE ID), have their own IDs, which make good synonyms for Permanent IDs. All valid references (the "merged Permanent IDs" and the "external official IDs") are synonyms, so the Permanent ID needs to support reference ID resolution, converting synonyms into the canonical Permanent ID.
Historical view
It should be possible to view the change history for a given Permanent ID. That history is a subset of the changesets and can span several elements.
API support
OSM APIs should support basic ID manipulation, such as attaching a new ID to a feature, moving an ID from one feature to another, merging duplicates, reverting bad merges, etc.
This can be accomplished with an ordinary tag, but it would place restrictions on "free editing" and would need extra infrastructure to validate changes.
Tool support
As editors modify existing objects, all editing tools need to help editors maintain Permanent IDs. For example, if an editor splits a road into multiple speed zones, the editing tool should create a relation with both segments and ensure the Permanent ID points to that relation.
Data dump support
Permanent IDs should become part of the regular data downloads.
Community education
Editors should always be aware of the Permanent IDs, and should try as much as possible to maintain ID continuity.
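
The "Support ID resolution" requirement above can be sketched in a few lines. This is only an illustration, not a proposed implementation: the class name IdResolver, the "namespace:value" reference format and the sample IDs "P1" and "P2" are assumptions made for the example. It shows merged IDs staying valid while resolving to one canonical ID, and external IDs acting as synonyms.

```python
from typing import Dict


class IdResolver:
    """Hypothetical resolver: turns synonyms into a canonical Permanent ID."""

    def __init__(self) -> None:
        # Maps a merged (duplicate) Permanent ID to the ID it was merged into.
        self._merged_into: Dict[str, str] = {}
        # Maps an external reference "namespace:value" to a Permanent ID.
        self._external: Dict[str, str] = {}

    def canonical(self, perma_id: str) -> str:
        """Follow merge links until an ID that was never merged away."""
        while perma_id in self._merged_into:
            perma_id = self._merged_into[perma_id]
        return perma_id

    def merge(self, duplicate: str, survivor: str) -> None:
        """Record that two IDs denote the same concept; both stay resolvable."""
        root_duplicate = self.canonical(duplicate)
        root_survivor = self.canonical(survivor)
        if root_duplicate != root_survivor:  # guard against creating a cycle
            self._merged_into[root_duplicate] = root_survivor

    def add_external(self, namespace: str, value: str, perma_id: str) -> None:
        """Register an external official ID as a synonym of a Permanent ID."""
        self._external[f"{namespace}:{value}"] = perma_id

    def resolve(self, reference: str) -> str:
        """Resolve a Permanent ID or external reference to the canonical ID.

        Unknown references are returned unchanged in this sketch.
        """
        return self.canonical(self._external.get(reference, reference))


resolver = IdResolver()
resolver.merge("P2", "P1")                        # P2 found to duplicate P1
resolver.add_external("wikidata", "Q17061", "P2")
canonical_id = resolver.resolve("wikidata:Q17061")
```

Reverting a bad merge would amount to deleting one entry from the merge map, which is one reason to keep merges as explicit links rather than rewriting IDs in place.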

Implementation Discussion:

Relations as Permanent IDs

A relation ID comes closest to satisfying the above requirements. Very often a relation represents a concept (e.g. a country or a road) and tends to change the least. The minimum viable product would be to introduce redirects (this could be done as a relation with just one relation member and a single tag, redirect=true). Significant work would still be required for tool support and community education.

PROS: a natural reuse of existing relations (most of the OSM features with a Wikidata tag are relations) and of "feature evolution", encapsulating important features in a relation.
CONS: a stable node or way will need a "fake relation" encapsulating it, creating redundancy in the OSM data model and overloading the OSM infrastructure.
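
As a rough sketch of the redirect idea above, a consumer could follow redirect relations until it reaches a real one. The in-memory dictionary layout and the function name follow_redirects are assumptions for illustration only; the redirect=true tag and the single relation member come from the text above.

```python
from typing import Dict, List, Tuple

# Assumed in-memory shape of a relation:
# {"tags": {key: value}, "members": [(member_type, member_id), ...]}
Relation = Dict[str, object]


def follow_redirects(relations: Dict[int, Relation], relation_id: int,
                     max_hops: int = 10) -> int:
    """Return the ID of the first non-redirect relation reachable from relation_id.

    A redirect is a relation tagged redirect=true with exactly one member,
    which must itself be a relation. Raises ValueError on loops or long chains,
    and KeyError if a relation in the chain is missing.
    """
    seen = set()
    current = relation_id
    while True:
        if current in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at relation {current}")
        seen.add(current)
        relation = relations[current]
        tags: Dict[str, str] = relation["tags"]  # type: ignore[assignment]
        members: List[Tuple[str, int]] = relation["members"]  # type: ignore[assignment]
        is_redirect = (
            tags.get("redirect") == "true"
            and len(members) == 1
            and members[0][0] == "relation"
        )
        if not is_redirect:
            return current
        current = members[0][1]


sample = {
    1: {"tags": {"redirect": "true"}, "members": [("relation", 2)]},
    2: {"tags": {"redirect": "true"}, "members": [("relation", 3)]},
    3: {"tags": {"type": "route"}, "members": [("way", 10), ("way", 11)]},
}
target = follow_redirects(sample, 1)
```

The hop limit and the loop check are there because redirects are ordinary editable data, so a bad edit could otherwise send a resolver into an endless chain.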

Towards a perma_id

The only way to address the CONS above of "relation ID as Permanent ID" is to create a new ID: let's label it perma_id. The simplest way to ensure long-term persistence for perma_id is to use the persistence strategy of Persistent URLs: "redirecting" the assigned perma_id to the current element, that is, managing a lookup table of <perma_id,osm_type,osm_id>.
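
A minimal sketch of such a lookup table, using SQLite from Python only to keep the example self-contained. The table name perma_lookup and the helper functions are hypothetical, and serial numbering of perma_id is an assumption taken from the step list further down this page.

```python
import sqlite3
from typing import Optional


def create_lookup(conn: sqlite3.Connection) -> None:
    """Create the <perma_id, osm_type, osm_id> lookup table."""
    conn.execute(
        """
        CREATE TABLE perma_lookup (
            perma_id INTEGER PRIMARY KEY,  -- serial value assigned by the database
            osm_type TEXT NOT NULL CHECK (osm_type IN ('node', 'way', 'relation')),
            osm_id   INTEGER NOT NULL
        )
        """
    )


def assign(conn: sqlite3.Connection, osm_type: str, osm_id: int) -> int:
    """Assign a new perma_id to an element and return it."""
    cursor = conn.execute(
        "INSERT INTO perma_lookup (osm_type, osm_id) VALUES (?, ?)",
        (osm_type, osm_id),
    )
    return cursor.lastrowid


def repoint(conn: sqlite3.Connection, perma_id: int, osm_type: str, osm_id: int) -> None:
    """Redirect an existing perma_id to the element that now represents the concept."""
    conn.execute(
        "UPDATE perma_lookup SET osm_type = ?, osm_id = ? WHERE perma_id = ?",
        (osm_type, osm_id, perma_id),
    )


def resolve(conn: sqlite3.Connection, perma_id: int) -> Optional[str]:
    """Return the URL of the current element, or None for an unknown perma_id."""
    row = conn.execute(
        "SELECT osm_type, osm_id FROM perma_lookup WHERE perma_id = ?",
        (perma_id,),
    ).fetchone()
    if row is None:
        return None
    return f"https://www.openstreetmap.org/{row[0]}/{row[1]}"


conn = sqlite3.connect(":memory:")
create_lookup(conn)
restaurant = assign(conn, "node", 42)         # POI first mapped as a node
repoint(conn, restaurant, "way", 7)           # later redrawn as a building way
```

The perma_id never changes; only the row it points to is updated, which is the same idea as a Persistent URL redirect.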

Resolution algorithm

The core of the Permanent ID, after the basic decisions of "what ID" and "how to preserve it", will be the resolution mechanism.
The algorithm must resolve duplicates and synonyms: see "Support ID resolution" in the requirements at the beginning of this page.

Here is a good starting point for a PostgreSQL implementation of a complete "ID resolver" (a better technical term is "URN resolver").

Towards a three-step project

It is possible to accomplish the whole project in a reliable manner with "implement, discuss, decide" steps for each main decision. As "permanent" is a long time (eternity!), each step needs to achieve quality and consensus before the next step begins. The suggestion below uses the endpoints of the microservice implementation as reference:

Step-1. To implement http://openstreetmap.org/wd/{wdId}, the Wikidata-OSM resolver: a simple redirection service based on a cache of the lookup table of OSM elements with reciprocal Wikidata tags.
It will replace the current "non-persistent relation-restricted service" used in P402 (http://openstreetmap.org/relation/$1).
Step-2. To implement http://urn.openstreetmap.org/{permaId_value}, the perma_id itself, with two supporting tools:
A. A controlled tag perma_id=permaId_value (non-editable and automatically assigned a serial value). There are two assignment mechanisms:
A1. Assignment by an OSM user, who becomes the watcher of the feature.
A2. Monitoring the stability of features that received a Wikidata tag, e.g. checking that the same element has survived more than 6 months with the same Wikidata tag.
B. The microservice of the endpoint, a simple redirection service based on the lookup table of <permaId_value,osm_type,osm_id>.
Step-3. To implement the http://urn.openstreetmap.org/{namespace}:{value} redirector and
  the http://urn.openstreetmap.org/{namespace}:{value}/{method} "name or ID" resolver,
a service based on other lookups with official IDs and abbreviations (like ISO 3166-1 or ISO 3166-2:BR). The namespace parameter is like a URN scheme; the value can be an official name or a valid ID for the namespace. The method parameter and its endpoint return a JSON response and can be limited to the basic methods, to enhance canonicalization to the perma_id.
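
To make the three endpoint shapes above concrete, here is a small sketch of how a microservice might classify an incoming path before doing any lookup. Everything beyond the URL templates is an assumption: the Request type, the numeric form of perma_id, the restriction of namespace labels to lowercase letters, digits and hyphens (so a label containing a colon would need another spelling), and the sample method name "info".

```python
import re
from typing import NamedTuple, Optional


class Request(NamedTuple):
    kind: str                  # "wikidata", "perma_id" or "namespace"
    namespace: Optional[str]   # only set for kind == "namespace"
    value: str
    method: Optional[str]      # only set for /{namespace}:{value}/{method}


# Step-1: /wd/{wdId}, where wdId is a Wikidata item ID such as Q17061.
_WIKIDATA = re.compile(r"^/wd/(Q[1-9][0-9]*)$")
# Step-2: /{permaId_value}, assumed here to be a serial integer.
_PERMA_ID = re.compile(r"^/([0-9]+)$")
# Step-3: /{namespace}:{value} with an optional /{method} suffix.
_NAMESPACE = re.compile(r"^/([a-z0-9-]+):([^/]+)(?:/([a-z_]+))?$")


def parse_path(path: str) -> Optional[Request]:
    """Classify a request path; return None when no endpoint shape matches."""
    match = _WIKIDATA.match(path)
    if match:
        return Request("wikidata", None, match.group(1), None)
    match = _PERMA_ID.match(path)
    if match:
        return Request("perma_id", None, match.group(1), None)
    match = _NAMESPACE.match(path)
    if match:
        return Request("namespace", match.group(1), match.group(2), match.group(3))
    return None


step1 = parse_path("/wd/Q17061")
step2 = parse_path("/12345")
step3 = parse_path("/iso3166-1:BR/info")
```

In this sketch the Step-1 and Step-2 requests would end in a redirect to the current element, while a Step-3 request with a method would return JSON instead of redirecting.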

General Discussion:

The OSM-feature life cycle

OSM features are dynamic and are subject to a maturity process:

  • OSM features are dynamic: even an "eternal" mountain on the map can be wiped out by an earthquake next week.
  • Maturity: as in the pub scenario, there is a maturity process, and so a "life cycle". OSM features can be born from an informal edit with a simplified element representation, and can evolve into a "notable" OSM feature.
    The OSM community and other end users do not need a Permanent ID for an informal node, which is at risk of instability. But they do need the Permanent ID of mature features, which are more stable objects.
    We can imagine a life cycle from informal to formal, from node to relation, from a feature with no tags to a feature with many tags.

So, dynamics and maturity are problems that conflict with the aims of the Permanent ID.

Wikidata-IDs as first step for Permanent IDs

Using Key:wikidata is, perhaps, the most stable and reliable way to obtain a Permanent ID for relevant spatial features. Example: the oldest German motorway, the A 555, is the entity Q17061 at Wikidata, so Q17061 is also a "Permanent ID"; through reciprocal use it points to the OSM relation 23092.

Reciprocal use of Wikidata solves the main problems discussed:

  1. it is valid for any element type (node/way/relation);
  2. it persists (it is "eternal", guaranteed by third parties and by the Semantic Web);
  3. it is a curated process: there is minimal control of "maturity", "relevance" or "notability".

The tag wikidata={wd_value} never changes, and will continue to point to the correct concept in old backups and "dead features". The "curated process" is legitimate and minimal because Wikidata has an extremely broad notability requirement: only a "clearly identifiable conceptual or material entity" that "can be described using serious and publicly available references" is required. As Wikidata is a separate project with a solid community and closely related interests, it is a "natural backup" and a "natural complementary curation process".

As of July 2018 there were:

  • ~1,125,000 OSM elements with a wikidata key
    (many elements represent the same feature, e.g. a relation and a node representing the same building).
  • ~63,000 OSM features (represented by relations) with reciprocal Wikidata use,
    each feature having a permanent Wikidata ID and an OSM relation ID statement (P402) pointing to the OSM online map.
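
"Reciprocal use" in the figures above can be stated precisely with a short sketch: an OSM relation carries wikidata=Q..., and the Wikidata item carries a P402 statement naming that same relation. The function name and the two dictionary inputs are assumptions for illustration; real data would come from an OSM extract and a Wikidata query. The IDs other than Q17061 and relation 23092 are made up.

```python
from typing import Dict, Tuple


def reciprocal_pairs(
    osm_wikidata_tags: Dict[Tuple[str, int], str],
    wikidata_p402: Dict[str, int],
) -> Dict[str, int]:
    """Return {wikidata_id: relation_id} for links that point both ways.

    osm_wikidata_tags maps (osm_type, osm_id) to the value of its wikidata tag.
    wikidata_p402 maps a Wikidata item ID to the relation ID in its P402 statement.
    """
    pairs: Dict[str, int] = {}
    for (osm_type, osm_id), wikidata_id in osm_wikidata_tags.items():
        # P402 can only name relations, so nodes and ways never qualify.
        if osm_type == "relation" and wikidata_p402.get(wikidata_id) == osm_id:
            pairs[wikidata_id] = osm_id
    return pairs


osm_side = {
    ("relation", 23092): "Q17061",   # the A 555 motorway example from this page
    ("node", 501): "Q17061",         # a node tagged with the same item
    ("relation", 900): "Q999",       # tagged in OSM, but no matching P402
}
wikidata_side = {"Q17061": 23092, "Q999": 901}
reciprocal = reciprocal_pairs(osm_side, wikidata_side)
```

A table like this result is what the Step-1 resolver would cache: one row per Wikidata ID, checked from both sides.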

It is a good "first step" (!), which can mature into perma_id in the future. For now, it should be easy to create a simple web resolution microservice (example).

Tag attribution is the most complex way to implement a synonym for the current osm_id, but it has the least impact, so it is easy to adopt: the best "implement now!" initiative. It is only a first step towards the Permanent ID implementation. The second step, of course, is to implement the canonical ID, perma_id, as suggested above.

Implementing now

See Proposal here.