Proposal:UUID

From OpenStreetMap Wiki
Revision as of 20:10, 19 September 2015 by Reneman (talk | contribs)
UUID
Proposal status: Abandoned (inactive)
Proposed by: Delta foxtrot2
Tagging: uuid=*
Applies to: node, area, relation
Definition: Universally Unique Identifier (UUID) for linking OSM objects with external databases
Statistics:

Draft started:
Proposed on: 2010-06-03

Description

Sometimes it is desirable to link to OSM objects from the outside. Sites like Flickr currently achieve this by using OSM table IDs, but these IDs are transitory and can easily change when a node is converted to an area or merged with other nodes, or when the OSM object is simply deleted.

There are several ideas for how this could be solved. One such idea is to assign each linkable object one (or more) random unique identifiers, which are hoped to remain constant. This method solves the problem of identifying objects, but not the problem of linking to them from the outside (additional services like XAPI are still required). It also doesn't protect objects against deletion. It is currently very rarely used.

This page does not represent a community consensus. It is a proposal only. UUID tagging is rarely used.

There are other ideas for solving this problem, among them the idea of fuzzy queries like those supported by Query-to-map.

About UUID

Universally Unique Identifiers (UUIDs) are standardised by RFC 4122. Version 4 UUIDs are pseudo-randomly generated and, unlike version 5 UUIDs, may be better at preventing or reducing collisions, since RFC 4122, and more specifically the libraries that implement it, already cover methods of pseudo-random generation.

Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx, where x is any hexadecimal digit and y is one of 8, 9, A, or B, for example f47ac10b-58cc-4372-a567-0e02b2c3d479. For validation purposes, the following regular expression can be used to check whether a UUID is a well-formed version 4 UUID:

  • ^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$
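As a minimal sketch of how the expression above could be applied, the following Python snippet wraps it in a helper. The function name is_valid_uuid4 is hypothetical and not part of the proposal; fullmatch is used so that a trailing newline is not silently accepted.

```python
import re
import uuid

# Same pattern as above, compiled once for reuse.
UUID_V4_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}"
    r"-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
)


def is_valid_uuid4(value: str) -> bool:
    """Return True if value is a well-formed version 4 UUID string."""
    # fullmatch requires the whole string to match the pattern.
    return UUID_V4_RE.fullmatch(value) is not None


# The example UUID from the text, a freshly generated one, and a
# string whose version digit is 1 instead of 4.
print(is_valid_uuid4("f47ac10b-58cc-4372-a567-0e02b2c3d479"))
print(is_valid_uuid4(str(uuid.uuid4())))
print(is_valid_uuid4("f47ac10b-58cc-1372-a567-0e02b2c3d479"))
```

The first two calls should be accepted and the third rejected, since its third group does not start with 4.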

With suitable entropy, a collision (two identical UUIDs being generated independently) only becomes likely if on the order of 1 billion UUIDs are produced every second for roughly 100 years.
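The collision estimate can be checked with the standard birthday approximation. The sketch below assumes that all 122 random bits of a version 4 UUID are uniformly distributed, which real generators only approximate; the function names are hypothetical.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID has 128 bits, 6 of which are fixed


def collision_probability(count: float, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one collision among count random values."""
    # Birthday approximation: p = 1 - exp(-n(n-1) / (2 * 2^bits))
    return -math.expm1(-count * (count - 1) / (2.0 * 2.0 ** bits))


def count_for_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate number of values needed to reach collision probability p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))


# One billion UUIDs per second, sustained for a year.
per_year = 1e9 * 60 * 60 * 24 * 365.25

print(collision_probability(per_year * 100))       # after 100 years
print(count_for_probability(0.5) / per_year)       # years until 50% chance
```

Under these assumptions the probability after a century at that rate comes out somewhat above one half, and the 50% mark is reached after a little under 90 years, which is in line with the figure quoted above.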

While there is a very small possibility of collisions occurring, the same code that checks for improperly duplicated UUIDs can be used to monitor for collisions. This information could then be displayed on a periodically generated page, similar to the node duplication page. There will also need to be checks for deleted or removed UUIDs, to make sure they are only removed under valid circumstances.
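A duplicate check of this kind could be sketched as follows. The input format, a list of (object reference, tags) pairs, is a hypothetical in-memory stand-in for real OSM data, and treating any key that is uuid or starts with uuid: as a UUID tag is an assumption.

```python
from collections import defaultdict


def find_duplicate_uuids(objects):
    """Map each UUID used by more than one object to the objects using it.

    objects is an iterable of (object_ref, tags) pairs; this shape is an
    assumption made for illustration, not an OSM API format.
    """
    seen = defaultdict(list)
    for ref, tags in objects:
        for key, value in tags.items():
            if key == "uuid" or key.startswith("uuid:"):
                # Compare case-insensitively, since hex digits may be
                # written in either case.
                seen[value.lower()].append(ref)
    return {u: refs for u, refs in seen.items() if len(refs) > 1}


sample = [
    ("node/1", {"uuid": "21d906f1-7a93-49f5-beee-7c126b840a85"}),
    ("way/2", {"uuid:building": "21D906F1-7A93-49F5-BEEE-7C126B840A85"}),
    ("node/3", {"uuid": "e4d7e9a4-14a0-4047-bdab-ba647c772511"}),
]
print(find_duplicate_uuids(sample))
```

The result only says that two objects share a UUID; whether that is a genuine collision or an accidental copy still has to be judged by a person.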

Mapping in decentralised conditions

Currently it's not possible to map in a disconnected fashion as part of a larger group effort. However, it might be possible using UUIDs. In emergency situations such as Haiti, there are potentially a lot of people on the ground, with overlap between groups.

Once the first person on the ground tags an object such as a building, they could print all the information as a QR Code, along with a UUID tag. Others would then be able to scan this information into their device and add any additional information needed.

Once an internet connection becomes available, software could search for the UUID of the building to see whether it has already been uploaded by someone else, and upload it if it is not found.
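The upload step described above might look roughly like this. The dictionary remote_by_uuid is a hypothetical stand-in for a server-side search by UUID, and the merge policy (only add tags the existing copy lacks, never overwrite) is an assumption, not something the proposal specifies.

```python
def sync_object(local_obj, remote_by_uuid):
    """Upload a locally surveyed object, or merge it into an existing one.

    local_obj is a dict with "uuid" and "tags" keys; remote_by_uuid maps
    UUID strings to already uploaded objects. Both shapes are assumptions
    made for this sketch.
    """
    key = local_obj["uuid"]
    remote = remote_by_uuid.get(key)
    if remote is None:
        # Nobody has uploaded this object yet, so upload a copy.
        remote_by_uuid[key] = {"uuid": key, "tags": dict(local_obj["tags"])}
        return "uploaded"
    # Someone else got there first: add only the tags they did not record.
    for tag, value in local_obj["tags"].items():
        remote["tags"].setdefault(tag, value)
    return "merged"


remote = {}
first = {"uuid": "21d906f1-7a93-49f5-beee-7c126b840a85",
         "tags": {"building": "yes"}}
second = {"uuid": "21d906f1-7a93-49f5-beee-7c126b840a85",
          "tags": {"building": "yes", "name": "Field clinic"}}
print(sync_object(first, remote))
print(sync_object(second, remote))
```

A real tool would also need to handle conflicting values for the same tag, which this sketch sidesteps by keeping whatever was uploaded first.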

Generating UUIDs against Objects

The most obvious solution would be to have an editor generate them, which could easily be done via a dialog similar to existing presets. This may not be the best solution, however, when they could simply be generated automatically on use. People requesting object identifiers could feed a script or a bot the node, way or relation ID, and that bot or script would either return the current applicable UUID, or generate a new UUID, return it, and update the OSM object with it.
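The core of such a script could be as small as the sketch below. It works on a plain dictionary of tags; fetching the object by its node, way or relation ID and uploading the change are left out, and the function name get_or_create_uuid is hypothetical.

```python
import uuid


def get_or_create_uuid(tags: dict) -> str:
    """Return the object's existing UUID, or generate and store a new one."""
    existing = tags.get("uuid")
    if existing:
        return existing
    # uuid.uuid4() produces a random version 4 UUID.
    new_uuid = str(uuid.uuid4())
    tags["uuid"] = new_uuid
    return new_uuid


tags = {"building": "warehouse"}
first = get_or_create_uuid(tags)   # generates and stores a UUID
second = get_or_create_uuid(tags)  # returns the stored UUID
print(first == second)
```

Because the second call finds the stored tag, repeated requests for the same object return the same identifier.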

OSM Object ID Rules

  • External sites and others using internal OSM DB IDs should migrate to using UUIDs. Internal references are transitory and their longevity cannot be assumed.
  • UUIDs should be unique to a single object in the database. If multiple objects need to share a UUID, a relation should be created and the UUID added to the relation instead.
  • UUIDs should be generated as randomly as possible to reduce the risk of collisions.
  • A wiki page, or something similar, should be generated using the UUID and duplicating the description from the database, so that these UUIDs can be directly linked to. External reference IDs of objects from imports should also be noted on this page.
  • UUIDs should never be deleted unless the thing the UUID refers to no longer exists.
  • UUIDs should never be repurposed.

Example

For example, the UUID for the Texas School Book Depository would move to a new location when the operator of the book depository moved to a new location. However, due to the historical significance of the building from which Oswald shot Kennedy, the building itself would have its own UUID, which wouldn't move when the book depository operator moved.

The resulting tags might look like this:

  • building=warehouse
  • name=The Texas School Book Depository
  • uuid:building=21d906f1-7a93-49f5-beee-7c126b840a85
  • uuid:operator=e4d7e9a4-14a0-4047-bdab-ba647c772511

While most buildings shouldn't need their own UUIDs, third-party sites such as Flickr might prefer the building UUID to that of the tenant for photos of the building. It may therefore be useful to define a list of UUID tag types and have them all generated at once.

UUID Tagging

The reason for uuid:<tag> is to prevent conflicts with any existing uuid=* tags. A similar approach was taken with HOT UUID information.
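A consumer of this scheme would need to pick the uuid:<tag> entries out of an object's tags. The following is a minimal sketch, using the Texas School Book Depository tags from the example above; the function name uuid_tags is hypothetical.

```python
def uuid_tags(tags: dict) -> dict:
    """Return {suffix: uuid} for every key of the form uuid:<suffix>."""
    prefix = "uuid:"
    return {
        key[len(prefix):]: value
        for key, value in tags.items()
        if key.startswith(prefix)
    }


depository = {
    "building": "warehouse",
    "name": "The Texas School Book Depository",
    "uuid:building": "21d906f1-7a93-49f5-beee-7c126b840a85",
    "uuid:operator": "e4d7e9a4-14a0-4047-bdab-ba647c772511",
}
print(uuid_tags(depository))
```

A plain uuid=* tag is deliberately ignored here, so the two namespaces stay separate.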

Third party UUID generators

It was suggested that we use Freebase GUIDs from Freebase.com. However, these aren't really suitable, as Freebase uses GUIDs in the same way that OSM uses node/way/relation IDs. Freebase uses MIDs to uniquely identify objects in a manner similar to what this proposal suggests for OSM objects, but MIDs have a relatively small bit size and would have a higher probability of collisions. Also, Freebase, like OSM, only wants to store objects of interest to its user base, so it doesn't really want objects that may be of interest to Wikipedia or Flickr dumped into its system, just as we don't want junk dumped into the OSM DB merely to generate object IDs.

SameAs.org is somewhat interesting in terms of interlinking disparate database IDs, but it doesn't use IDs specifically. Instead, it groups URLs from various sites together. If we end up building a UUID tracking database, we could make objects accessible by URL via the UUID and then submit these URLs to sameas.org.

At present there are no known third parties that generate UUIDs for generic objects. Even if there were, due diligence would be needed to verify the stability of the generator, as an unreliable one could be very problematic during times of crisis, such as the one that happened in Haiti earlier this year. At this stage it would be best to have apps generate their own UUIDs as needed rather than relying on third parties.

UUID Implementations

See Also