Open Data License/Metadata Layers - Guideline

From OpenStreetMap Wiki

These are community guidelines, so please put your comments on the discussion page or inline in this page.

Background: What's the problem?

Consider a site that collects restaurant reviews, which are subjective data outside the core ideas of OSM (see Verifiability). The site also hosts an OSM-derived database and links its reviews to OSM nodes via their IDs (primary keys). Would the resulting database be considered Derivative or Collective?

If the database is considered derivative, the reviews would have to be released. We assume that linked private data, such as user records, would not have to be released, because that data is not linked directly to ODbL data. Requiring its release would also be self-defeating, as no one would then be able to use OSM data with their own user base.

The Guideline

Proposed Metadata Guideline

Possible solutions

Fairhurst doctrine

The "Fairhurst Doctrine" (previously discussed on the mailing list) addresses this by saying that a mapping between machine-produced primary keys (e.g. OSM node IDs to review IDs) does not "represent ... significant investment", so it does not meet the definition of "Substantial" and therefore is not a Derivative Database.

However, there must clearly be limits to this. For example, the OSM data could be loaded and the schema extended with an "extra_node_tags" table, which is then used for any further data added to a node. This should be considered an addition to the database, despite the attempt to make it look like linking via primary keys.
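To make the distinction concrete, here is a minimal sketch of the two patterns side by side, using Python's built-in sqlite3 module. The schema is a simplified assumption for illustration only: the table names osm_nodes, reviews and node_review_link, the k/v column names, and the sample rows are all hypothetical; only "extra_node_tags" comes from the text above. The sketch shows the shape of the data and does not settle any licensing question.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database showing both linking patterns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- OSM-derived data (ODbL); simplified, hypothetical schema.
        CREATE TABLE osm_nodes (
            node_id INTEGER PRIMARY KEY,
            lat REAL NOT NULL,
            lon REAL NOT NULL
        );

        -- The site's own subjective data.
        CREATE TABLE reviews (
            review_id INTEGER PRIMARY KEY,
            rating INTEGER NOT NULL,
            body TEXT NOT NULL
        );

        -- Pattern A: a bare mapping between machine-produced primary keys.
        CREATE TABLE node_review_link (
            node_id INTEGER NOT NULL,
            review_id INTEGER NOT NULL,
            PRIMARY KEY (node_id, review_id)
        );

        -- Pattern B: a schema extension that stores further data on the node.
        CREATE TABLE extra_node_tags (
            node_id INTEGER NOT NULL,
            k TEXT NOT NULL,
            v TEXT NOT NULL,
            PRIMARY KEY (node_id, k)
        );
        """
    )
    # Hypothetical sample rows.
    conn.execute("INSERT INTO osm_nodes VALUES (1001, 51.5, -0.1)")
    conn.execute("INSERT INTO reviews VALUES (7, 4, 'Good coffee')")
    conn.execute("INSERT INTO node_review_link VALUES (1001, 7)")
    conn.execute("INSERT INTO extra_node_tags VALUES (1001, 'cuisine', 'italian')")
    return conn


def column_names(conn, table):
    """Return the column names of a table (the table name is trusted here)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def reviews_for_node(conn, node_id):
    """Fetch reviews for a node by following the key-to-key mapping."""
    rows = conn.execute(
        "SELECT r.rating, r.body FROM node_review_link AS l "
        "JOIN reviews AS r ON r.review_id = l.review_id "
        "WHERE l.node_id = ? ORDER BY r.review_id",
        (node_id,),
    )
    return rows.fetchall()
```

In this sketch, node_review_link holds nothing but two IDs, so all review content stays in the site's own table. extra_node_tags, by contrast, stores key/value data directly against the node, which is the kind of addition the paragraph above argues should not be treated as mere linking.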

[Remark Oliver: I really like this first concept. To limit the principle, I think it should exclude cases where "extra_node_tags are used solely for the purpose of bypassing the share-alike principle". There should be a nicer term for this in legal language. A similar aspect exists in contract law, where the "arm's length principle" is used to avoid tax payments or other consequences.]


Another solution could be to define derivativeness by whether the linked data (for example, key/value pairs) is something OSM is "interested" in collecting, or having contributed back. The result would, of course, be dynamic. It was suggested that a list be drawn up with things which are "always wanted in OSM", "never wanted in OSM" and a grey area for refinement in-between. It might also be possible to derive a definition from the Map features page, other documentation or statistics on existing data.

However, this would almost certainly present a moving target and could be indistinct. It's possible this would discourage people from using the data for fear that their linked data could become "interesting" at some future point, and have to be released.
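The list-based idea above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed list: the keys placed in each set are hypothetical placeholders, and the names Interest, ALWAYS_WANTED_KEYS, NEVER_WANTED_KEYS and classify_key are invented for this example.

```python
from enum import Enum


class Interest(Enum):
    ALWAYS_WANTED = "always wanted in OSM"
    NEVER_WANTED = "never wanted in OSM"
    GREY_AREA = "grey area"


# Hypothetical placeholder lists; the real lists would be community-maintained.
ALWAYS_WANTED_KEYS = {"name", "opening_hours"}
NEVER_WANTED_KEYS = {"review_text", "star_rating"}


def classify_key(key, always=ALWAYS_WANTED_KEYS, never=NEVER_WANTED_KEYS):
    """Classify a linked key by which list it appears on, if any."""
    if key in always:
        return Interest.ALWAYS_WANTED
    if key in never:
        return Interest.NEVER_WANTED
    return Interest.GREY_AREA  # on neither list: needs refinement
```

Because the lists are passed in as parameters, the same key can be classified differently once a list is revised, for example classify_key("wheelchair") compared with classify_key("wheelchair", always=ALWAYS_WANTED_KEYS | {"wheelchair"}). That shift is the moving-target concern described above.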