Persistent Place Identifier

From OpenStreetMap Wiki

This page is a systematic review of the following theme:
  a unique identifier that identifies an OSM feature and never changes (perma_id for short).

As a working definition, we can say that an OSM feature is "a kind of map feature, a stable thing in some (time-space) scale of reference"...

The theme has its own concepts and problems/solutions to be discussed, and this article is used to express and preserve reference models, consensus and working definitions. Some parts of this theme are under "diffuse discussion" with no clear consensus, so this article also reflects the diversity of opinions (expressed, when possible, from a neutral point of view) and the lack of some solid definitions.

There are also closed proposals, the Permanent ID and the stable.openstreetmap.org server, with clear objectives and less diffuse discussion.

 

Working definitions

As a working definition, an OSM feature is a "stable thing in some scale of reference". In detail:

  • is an OSM element: relation, way or node.
    The element is already the container of the core ID and would also be the container of the perma_id, but the two IDs differ in many characteristics:
    1- the datatype: the core ID is a serial integer, while the perma_id can be non-serial or even a hierarchical value, like an IP address;
    2- the obligation: the perma_id is not necessary in all elements;
    3- the backup/restore process: the core ID will be refreshed with a new value;
    4- the move of the perma_id from the original element to a new or evolved element, to keep its concept through better map edits or through the evolution of reality;
    5- the implementation: the perma_id can be implemented as a tag (or even as a lookup table) instead of a core attribute.
  • has public utility (a concept): as an OSM point-of-interest concept, it has some tags associated with it and can be characterized as a map feature.
    It is possible to check the "importance" (notability or utility-stability) of the feature through some objective criterion or, in the absence of criteria, through voting.
  • has a time-scale of reference, to say it "is stable over time" ("has not changed"). Time-scales for mountains are longer than those for museums, which are longer than those for restaurants or pubs.
    • has a time-class: a practical way to assign a time-scale to an object. The time-class can be inferred from the element's tags and metrics.
      PS: "geographical class" objects (rivers and mountains) and "administrative class" objects (countries and cities) have different global time-scales, and there are subclasses for smaller objects: a mountain range has a different time-scale than a small mountain, and a city has a different time-scale than a country.
    • has creation and extinction criteria, for attributes like "creation year" and "extinction year".
      PS: when a natural object like an island becomes extinct, its perma_id persists, and through the perma_id its geometry can be restored from some "official OSM backup".
  • has an error-position reference, to say it "is stable about position" ("has not changed its position"). Each kind of object has an admissible position error: 1 m, 5 m, 1 km, 10 km...
  • has an error-concept reference, to say it "is stable about concept" ("has not changed its public utility"). It is acceptable for a pub to change into a restaurant, but not into a hospital. It is acceptable for a city to change its name, but not to change from "official city" to "non-official" or to "official district of another city".

So, the uniqueness of the perma_id refers to this working definition: there is a unique OSM element with that identifier.
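The working definition can be pictured as a small data structure. The sketch below is only an illustration under assumptions: the class names, fields and the hierarchical-looking perma_id strings are hypothetical, not part of any OSM schema. It shows the two properties stressed above: a perma_id maps to exactly one element, and the element carrying it may change while the perma_id stays the same.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PermaIdRecord:
    """Hypothetical record: a perma_id plus properties from the working definition."""
    perma_id: str                  # may be non-serial or hierarchical (format is an assumption)
    osm_type: str                  # "relation", "way" or "node"
    osm_id: int                    # current core ID; may change, e.g. after a restore
    time_class: str                # e.g. "geographical" or "administrative" (assumed labels)
    max_position_error_m: float    # admissible position error, in metres
    creation_year: Optional[int] = None
    extinction_year: Optional[int] = None


class PermaIdRegistry:
    """Hypothetical lookup table keeping each perma_id bound to a single element."""

    def __init__(self) -> None:
        self._records: Dict[str, PermaIdRecord] = {}

    def assign(self, record: PermaIdRecord) -> None:
        # Uniqueness: a perma_id can be assigned only once.
        if record.perma_id in self._records:
            raise ValueError(f"perma_id {record.perma_id!r} is already assigned")
        self._records[record.perma_id] = record

    def move(self, perma_id: str, osm_type: str, osm_id: int) -> None:
        # The perma_id survives; only the element that carries it changes.
        record = self._records[perma_id]
        record.osm_type, record.osm_id = osm_type, osm_id

    def resolve(self, perma_id: str) -> PermaIdRecord:
        return self._records[perma_id]
```

For example, assigning a record for relation 62422 and later calling `move()` keeps the same perma_id while pointing it to a new element, which matches point 4 of the differences listed above.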

 


 

Non-persistent IDs

There are good candidates for a "persistent place identifier", but all of them fail in the main property, which is to ensure persistence. In this context of non-permanent IDs, the most important example is Nominatim's place_id, which is "independent of geometry".

Element's OSM_ID

Element IDs are the main references, used as the "official geometry ID":

  • Relation ID: the unique ID of an element of the kind "relation",
    as official URL openstreetmap.org/relation/$OSM_RID
    as original XML <relation id="$OSM_RID" changeset="$OSM_CHGID" ...>...</relation>
  • Way ID: the unique ID of an element of the kind "way",
    as official URL openstreetmap.org/way/$OSM_WID
    as original XML <way id="$OSM_WID" changeset="$OSM_CHGID" ...>...</way> (example).
  • Node ID: the unique ID of an element of the kind "node",
    as official URL openstreetmap.org/node/$OSM_NID
    as original XML <node id="$OSM_NID" changeset="$OSM_CHGID" .../> (example).
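A minimal sketch of how the two forms listed above relate: building the official browse URL from an element type and ID, and reading type, ID and changeset back from OSM XML. It uses only Python's standard `xml.etree.ElementTree`; the function names are hypothetical, and the sample XML in the usage note uses made-up IDs.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

OSM_TYPES = ("relation", "way", "node")


def official_url(osm_type: str, osm_id: int) -> str:
    """Build the openstreetmap.org browse URL for an element."""
    if osm_type not in OSM_TYPES:
        raise ValueError(f"unknown OSM element type: {osm_type}")
    return f"https://www.openstreetmap.org/{osm_type}/{osm_id}"


def element_ids(osm_xml: str) -> List[Tuple[str, int, int]]:
    """Return (type, id, changeset) for each top-level element of an OSM XML document."""
    root = ET.fromstring(osm_xml)
    found = []
    for elem in root:
        if elem.tag in OSM_TYPES:
            found.append((elem.tag, int(elem.get("id")), int(elem.get("changeset"))))
    return found
```

For instance, `official_url("relation", 62422)` gives `https://www.openstreetmap.org/relation/62422`. Note that both values are the core ID, so they inherit its lack of persistence.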

Nominatim's place_id (internal identifier)

Nominatim is a tool to search OSM data by name or address. Internally, it uses a lookup table
  <place_id,osm_type,osm_id>
to offer the place_id as an ID for "any OSM element".

The osm_type can be any of relation, way or node. Example: in July 2018, place_id=178741737 was a record with osm_type=relation and osm_id=62422, the Berlin (Q64) concept, pointing to the correct map https://nominatim.openstreetmap.org/details.php?place_id=178741737
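The lookup table can be sketched as a plain list of rows. This is an illustration only: the row type and function name are hypothetical and not Nominatim's actual schema or API. The single row is the July 2018 snapshot given above, and it is valid only for that Nominatim instance and import.

```python
from typing import List, NamedTuple, Optional, Tuple


class PlaceRow(NamedTuple):
    """Hypothetical row of the <place_id,osm_type,osm_id> lookup table."""
    place_id: int
    osm_type: str
    osm_id: int


# Snapshot from the example above (July 2018); other instances will differ.
ROWS = [PlaceRow(178741737, "relation", 62422)]


def lookup_element(place_id: int, rows: List[PlaceRow]) -> Optional[Tuple[str, int]]:
    """Translate an instance-specific place_id into (osm_type, osm_id), if known."""
    for row in rows:
        if row.place_id == place_id:
            return row.osm_type, row.osm_id
    return None
```

The lookup only works in one direction and only inside one database, which is why the place_id cannot serve as a perma_id.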

NOTICE: Nominatim's place_id is only an internal parameter of the engine. You cannot use the place_id for anything else: it is a technical database key and depends on a single Nominatim instance.

OSM external persistence implementations

Implementations that are "non-official", where the implemented perma_id is neither a tag nor an XML attribute of dumps or backups. In the case of an API (e.g. an ID resolver), it is "external" in the sense that the URL of its endpoint is not under the openstreetmap.org domain.

Query-to-map

See Query-to-map. It preserves the "permanent name" (name and type) of an OSM feature in the service tools.wmflabs.org/query2map. It uses the name as the main identifier, and the key (and types?) as a "namespace" for the name.

OSMLR

The GitHub project opentraffic/osmlr (see also its blog presentation) is a complex "backup and lookup" system that ensures persistence of the ID of "almost any stretch of roadways in OpenStreetMap".

It has good historical data, so we can use it to check our stability hypothesis.

Overpass API/Permanent ID

See Overpass API/Permanent ID. It needs a better explanation there; please help to enhance it.

Non-OSM reference-implementations

Other reference examples. The main one is Wikidata, which has some coupling with OSM.

As URL

See w:Persistent uniform resource locator (Persistent URL or PURL).

In Wikidata infrastructure

Authority name at Wikipedia                                 Wikidata city-key
w:ISO 3166-3                                                P773
w:Brazilian Institute of Geography and Statistics (IBGE)    P1585
w:Federal Statistical Office of Germany                     P439
w:Instituto Nacional de Estadística (Spain) (INE)           P772
OSM - OpenStreetMap as a perma_id authority                 P402
  (see the Permanent ID proposal)
w:Geonames                                                  P1566
...                                                         ...

Persistent (place) unique identifiers (perma_ids) assigned by "Place-ID authorities".
  (to see a list of all valid authorities, follow the link and run the query)

Each authority ID can be described as a URN schema (the authority's namespace) plus a valid URN in that schema (see Wikipedia's namespace and URN concepts). See a sample in the table above.
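A sketch of the "namespace + local ID" idea, using the general URN syntax of RFC 8141 in simplified form (no r-, q- or f-components). The namespace names used in the usage note, such as `osm-relation`, are hypothetical and not registered URN namespaces; the function name is also an assumption.

```python
import re

# RFC 8141 NID: 2..32 chars, letters/digits/hyphens, starting and ending alphanumeric.
NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]")
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# NSS = pchar *(pchar / "/")
NSS_RE = re.compile(rf"{PCHAR}(?:{PCHAR}|/)*")


def make_urn(namespace: str, local_id: str) -> str:
    """Combine an authority namespace and an authority-local ID into a URN string."""
    if not NID_RE.fullmatch(namespace):
        raise ValueError(f"invalid URN namespace identifier: {namespace!r}")
    if not NSS_RE.fullmatch(local_id):
        raise ValueError(f"invalid namespace-specific string: {local_id!r}")
    return f"urn:{namespace}:{local_id}"
```

For example, `make_urn("osm-relation", "62422")` returns `urn:osm-relation:62422`; each row of the table above could be mapped to such a namespace in the same way.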

See also w:Office for National Statistics list.


... (under construction)...

Problems and solutions

So far, within the working definitions given at the beginning of this article, no major problem that would impair the Persistent Place Identifier has been detected: for each reasonable problem there is a reasonable solution, to be detailed in a future implementation.

Defining classes of OSM features

To classify OSM features (those to be assigned a perma_id) into a coarser set of map features, according to the tags that describe the element, we can imagine some basic groups, labeled by an arbitrary group number:

0 - administrative map features: for cities, countries, districts, etc.
1 - "relief and hydrography" map features: for rivers, mountains, etc.
2 - transport map features: for all ways, cycleways, train lines, etc.
3 - other: not many others.

Each group has a different scale-correlation behaviour, so it is necessary to characterize the group before characterizing the time and spatial scales of the OSM feature.
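A minimal sketch of the group classification. The tag-to-group rules are assumptions chosen for illustration (the page does not define them), and they are checked in order, so a feature matching several rules gets the first group.

```python
from typing import Dict

ADMINISTRATIVE, RELIEF_HYDRO, TRANSPORT, OTHER = 0, 1, 2, 3


def feature_group(tags: Dict[str, str]) -> int:
    """Map an element's tags to one of the coarse groups 0..3 (assumed rules)."""
    if tags.get("boundary") == "administrative" or "place" in tags:
        return ADMINISTRATIVE          # cities, countries, districts...
    if "natural" in tags or "waterway" in tags:
        return RELIEF_HYDRO            # rivers, mountains...
    if "highway" in tags or "railway" in tags or tags.get("route") in {"train", "bicycle", "road"}:
        return TRANSPORT               # ways, cycleways, train lines...
    return OTHER
```

For example, `{"waterway": "river"}` falls into group 1 and `{"amenity": "pub"}` into group 3.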

Defining spatial scale of an OSM feature

Examples of scales in Geography
Scale                  Length                 Area
Local (micro)          1 m … 1 km             1 m² … 1 km²
Regional (meso)        1 km … 100 km          1 km² … 10,000 km²
Continental (macro)    100 km … 10,000 km     10,000 km² … 100,000,000 km²
Global (mega)          > 10,000 km            > 100,000,000 km²

There are some usual spatial-scale definitions in Geography, and simple database functions (see PostGIS) and approximations such as ST_Length(), ST_Area() or ST_Area(ST_Envelope()) can automatically classify the element (way or relation) that represents a tagged feature.
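A sketch of the automatic classification, using the thresholds of the geography table above. It assumes the inputs are in metres and square metres, as returned by PostGIS `ST_Length()` and `ST_Area()` on the geography type; treating each upper bound as inclusive is an arbitrary choice, since the table's ranges touch at their ends.

```python
from typing import Optional

# (scale name, max length in km, max area in km²), from the table above.
SCALES = [
    ("local", 1.0, 1.0),
    ("regional", 100.0, 10_000.0),
    ("continental", 10_000.0, 100_000_000.0),
]


def spatial_scale(length_m: Optional[float] = None, area_m2: Optional[float] = None) -> str:
    """Classify a feature by area (if given) or else by length."""
    if area_m2 is not None:
        value, column = area_m2 / 1e6, 2   # m² -> km²
    elif length_m is not None:
        value, column = length_m / 1e3, 1  # m -> km
    else:
        raise ValueError("give length_m or area_m2")
    for scale in SCALES:
        if value <= scale[column]:
            return scale[0]
    return "global"
```

For example, a 500 m way is "local", while a 5 km² area is "regional".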

The scale, on the other hand, is used to estimate the position error, and the "acceptable error" is a subjective criterion. For example, maps of some nations of Africa and South America can accept big changes, while maps of certain European nations may not.

Defining time scale of an OSM feature

Time scales here are not about the "Geologic time scale", nor about the usual orders of magnitude of time units. They are a reasonable choice of units for each general type ...
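A sketch of how a time-class could be inferred from tags, as the working definitions suggest. Only the ordering (mountains longer than museums, museums longer than restaurants or pubs) comes from this page; the numbers of years, the tag choices and the default value are placeholders, not agreed values.

```python
from typing import Dict, Tuple

# Placeholder time-scale units, in years; only their ordering follows this page.
TIME_SCALE_YEARS: Dict[Tuple[str, str], int] = {
    ("natural", "peak"): 1000,
    ("tourism", "museum"): 50,
    ("amenity", "restaurant"): 5,
    ("amenity", "pub"): 5,
}
DEFAULT_YEARS = 1  # hypothetical fallback for unknown kinds


def time_scale_years(tags: Dict[str, str]) -> int:
    """Return the time-scale unit of the first matching tag, or the default."""
    for key, value in tags.items():
        if (key, value) in TIME_SCALE_YEARS:
            return TIME_SCALE_YEARS[(key, value)]
    return DEFAULT_YEARS
```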

... (under construction)...

Assigning the perma_id

Rules are needed to say "OK, this OSM feature can be assigned a perma_id", because (presumably) we cannot assign a perma_id to every node of the OSM map: there is a "preservation cost", so exaggerations must be reduced or avoided.

Assuming that all candidate elements have passed a simple "stability check" and have potential watchers before assignment, there are two main ways to assign:

  • Automatic assignment: by the Wikidata tag and/or an "importance threshold" that cuts non-relevant map features.
  • Human decision: a voting poll in the local community of scale-related watchers (city or country).
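The two assignment routes can be sketched as a small decision function. Assumptions: the function name, the `importance` input (any score in a known range) and the threshold value are hypothetical, since the page defines no number; the stability check is taken as already computed.

```python
from typing import Dict, Optional

IMPORTANCE_THRESHOLD = 0.5  # hypothetical cut-off; no value is defined on this page


def assignment_route(tags: Dict[str, str], importance: Optional[float],
                     passed_stability_check: bool) -> str:
    """Return 'automatic', 'vote' or 'reject' for a candidate element."""
    if not passed_stability_check:
        return "reject"
    if "wikidata" in tags:
        return "automatic"            # automatic assignment by Wikidata tag
    if importance is not None and importance >= IMPORTANCE_THRESHOLD:
        return "automatic"            # automatic assignment by importance threshold
    return "vote"                     # left to the local community of watchers
```

For example, an element tagged `wikidata=Q64` that passed the stability check would be assigned automatically.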

Ideal and practical position-reference

The "has error-position reference" property (see the beginning of the page) is used to ensure that an OSM feature has not changed its position, whether through OSM users' edits of the map or through some natural evolution of reality.

The ideal is a transformation such as TopoJSON or ST_Simplify, but for a practical, low-cost implementation the only "last position in the map before changes" that we need to check is a representative point (e.g. PostGIS's ST_Centroid or ST_PointOnSurface) or the BBOX, validating changes against some error-position criterion (see "Defining spatial scale" above).
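A minimal sketch of the low-cost check: compare the representative point stored before the edit with the new one, and accept the change if it moved no more than the admissible error. It assumes points are (lat, lon) in degrees, for example taken from ST_PointOnSurface, and uses the haversine great-circle distance on a spherical Earth, which is an approximation; the function names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def position_change_ok(old_point: Tuple[float, float], new_point: Tuple[float, float],
                       max_error_m: float) -> bool:
    """Accept an edit if the representative point moved at most max_error_m."""
    return haversine_m(*old_point, *new_point) <= max_error_m
```

Moving a point 0.01° north is roughly 1.1 km, so it would fail a 1 km tolerance but pass a 10 km one; the tolerance itself would come from the feature's spatial scale.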

The "change validation" algorithm is not so simple, and it can be implemented at one or at several stages of the workflow:

  • In an OSM editor: ideal for pre-processing some basic validation and warning the user...
  • In the OSM editing API: the correct place to ensure continuous control and quality.
  • In a quality-control tool: a "long-period" checker (e.g. yearly) and review task. Low system impact and low software cost, but high human cost. Ideal for the first experiments with perma_id.

... (under construction)...


FAQ and perhaps false criticisms

Frequently asked questions and frequent criticisms that perhaps (given the facts explained on this page) rest on false premises.

... (under construction)...

...

... (under construction)...