Open Data License/Trivial Transformations - Guideline

From OpenStreetMap Wiki

Community Guidelines - Trivial transformations

These are community guidelines, so please put your comments on the discussion page or inline in the Open issues, use cases and discussion section of this page.

Background: What's the problem?

The ODbL relies on language from the EU Database Directive, which isn't perfectly clear about when a database is derivative and when it isn't. This is because the laws are relatively new and untested in court. Therefore it is important that we can provide guidelines about what we consider to be derivative and what isn't.

See the section "What the ODbL says" towards the end of this page for the language currently in the ODbL and its possible consequences.

We therefore define the term "trivial transformation" (there's no reason it has to be called that), which covers alterations of OpenStreetMap data that are not considered interesting or useful enough to warrant the conditions of a derivative database. The word "trivial" should not be read as "technically trivial": a transformation may be technically complex and still be trivial in the sense that it makes no meaningful modification or addition to the data. Conversely, adding or correcting data would never be considered "trivial".

The guideline and examples

Status: Endorsed by the OSMF board 2014-06-06. Read the formal text.

Open issues, Use cases and discussion

Any text here is not part of the formal or proposed guideline!

The following questions have been raised about uses of OSM data; the implications of the proposed guidelines can be checked against them:

Rendering databases

Rendering databases, for example those produced by Osm2pgsql, are clearly databases and clearly derived from OSM data. Under the ODbL, is it necessary to release either the rendering database itself (which would probably be extremely large) or the algorithm (in machine-readable format?) to produce it? This is fine for Osm2pgsql databases, as the source is already open, but is it too onerous for proprietary rendering databases? If nothing has been added to the data, is it more useful to the project to have these extra users in our ecosystem?

Conclusion: A rendering database should be considered a trivial transformation provided that it is purely created from an algorithmic recasting of the original data with the intent to make rendering 100% OpenStreetMap-based maps or map layers easier and faster, since no information has been added. MikeCollinson (talk) 16:19, 11 March 2014 (UTC)

Routing databases

Routing databases are similar to rendering databases, in that they are an output of OSM data and would be databases by the definition in the ODbL and EU Database Directive. Routing databases are more interesting, however, as they operate on a much smaller subset of the data, do not necessarily transform the data and could be considered (or implemented) as indexes in addition to the original database.

Conclusion: A routing database may be considered a trivial transformation provided that it does not include or rely on any other external source of information to help provide a better route.
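As an illustration of a routing structure that relies on nothing but OSM data, the sketch below builds an adjacency index from the node lists of ways and finds a route over it. It is only a minimal sketch: the `ways` dictionary, the node ids and the function names are hypothetical, and a real router would also use tags, geometry and edge weights taken from OSM.

```python
from collections import defaultdict, deque

# Hypothetical input: way id -> ordered list of node ids, as found in OSM ways.
ways = {100: [1, 2, 3], 101: [3, 4], 102: [2, 4]}

def build_graph(ways):
    """Build an undirected adjacency index purely from way node lists."""
    graph = defaultdict(set)
    for nodes in ways.values():
        for a, b in zip(nodes, nodes[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search: fewest hops between two nodes, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

graph = build_graph(ways)
route = shortest_path(graph, 1, 4)
```

Everything in `graph` can be regenerated from the ways alone, which is the sense in which such a structure behaves like an index on top of the original database. Mixing in an external source (for example third-party traffic data) would be the kind of addition the conclusion above excludes.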

Geocoding databases

Geocoding databases are even more like indexes and, in the simplest case, could just be a modified schema with an index on the "name" tag.
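The "simplest case" described above might look like the following sketch, which uses SQLite from the Python standard library. The two-table schema, the sample rows and the `lookup` helper are assumptions made for illustration; they are not the schema of any actual OSM tool.

```python
import sqlite3

# Hypothetical, simplified schema: nodes plus one row per node tag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
conn.execute("CREATE TABLE node_tags (node_id INTEGER, k TEXT, v TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(1, 51.5, -0.12), (2, 48.85, 2.35)],
)
conn.executemany(
    "INSERT INTO node_tags VALUES (?, ?, ?)",
    [(1, "name", "London"), (1, "place", "city"), (2, "name", "Paris")],
)

# The only "geocoding" addition: an index that makes name lookups fast.
conn.execute("CREATE INDEX idx_tag_kv ON node_tags (k, v)")

def lookup(name):
    """Return (id, lat, lon) for every node whose name tag equals `name`."""
    return conn.execute(
        "SELECT n.id, n.lat, n.lon FROM node_tags t "
        "JOIN nodes n ON n.id = t.node_id "
        "WHERE t.k = 'name' AND t.v = ? ORDER BY n.id",
        (name,),
    ).fetchall()
```

Here the index adds no information that is not already in the tags; it only changes how quickly a name can be found.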

Conclusion: A "geocoding database" should NOT be considered a trivial transformation. It is not clear from the above text what such a database means in practice. Normally some other data is being geocoded, so it is not just an algorithmic transformation. Geocoding is being considered for a separate community guideline.

Coordinate transforms

The natural coordinate system for OSM is WGS84 lat/lon. If a database were produced in a different coordinate system, but was otherwise identical to the OSM database, would it be considered "trivial" in the sense that nothing useful or interesting had been done to it?

Conclusion: Yes, it is a trivial transformation. The change is purely algorithmic and, in this case, exactly duplicatable by publicly available algorithms and mathematical formulae.
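As a concrete example of such a publicly documented formula, the sketch below converts WGS84 lat/lon to spherical Web Mercator (the projection commonly used for web map tiles) and back. The choice of target projection and the function names are assumptions for illustration; any other published projection would make the same point.

```python
import math

# WGS84 semi-major axis in metres, used as the sphere radius by Web Mercator.
R = 6378137.0

def to_web_mercator(lat, lon):
    """WGS84 degrees -> spherical Web Mercator metres (x, y)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def from_web_mercator(x, y):
    """Inverse of to_web_mercator: metres -> WGS84 degrees (lat, lon)."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lat, lon

x, y = to_web_mercator(51.5074, -0.1278)
lat, lon = from_web_mercator(x, y)
```

The round trip recovers the original coordinates up to floating-point rounding, so anyone holding the transformed database can reproduce the original positions without being given anything beyond the name of the projection.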

Loading database dumps

Loading a database dump is extremely unlikely to produce exactly the same database (in terms of disk layout, index structures, etc.) as the original. But very little would be gained from considering this to be non-trivial, since any common-format dump (e.g. SQL or OSM-format XML) of the loaded database would likely be identical to the input. Is it worth considering changes in database implementation (although probably covered by the ODbL and EU Database Directive) to be non-trivial, given that this would oblige every user of OSM data to redistribute the database or release the machine-readable algorithm (i.e. code)?

Conclusion: This is a trivial transformation. The data format is being changed, but the information conveyed about the physical observations is not. Variations in such things as disk layout add nothing to the usefulness or extent of the data. A key test is: can the information be converted back into OpenStreetMap's common OSM-format XML (or whatever replaces it in the future) and still be directly comparable to an original?
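The key test can be sketched as a round trip: load OSM-format XML into a database, dump it back out, and compare the result with the original. The sketch below is an assumption-laden illustration that handles nodes and their tags only; the sample data, the two-table SQLite schema and the function names are hypothetical, and coordinates are stored as text so that no numeric reformatting occurs.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = """<osm version="0.6">
  <node id="1" lat="51.5" lon="-0.12"><tag k="name" v="London"/></node>
  <node id="2" lat="48.85" lon="2.35"/>
</osm>"""

def parse_nodes(xml_text):
    """Canonical, comparable form: a set of (id, lat, lon, sorted tags)."""
    out = set()
    for n in ET.fromstring(xml_text).iter("node"):
        tags = tuple(sorted((t.get("k"), t.get("v")) for t in n.iter("tag")))
        out.add((n.get("id"), n.get("lat"), n.get("lon"), tags))
    return out

def load(xml_text):
    """Load nodes and tags into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id TEXT, lat TEXT, lon TEXT)")
    conn.execute("CREATE TABLE tags (node_id TEXT, k TEXT, v TEXT)")
    for nid, lat, lon, tags in parse_nodes(xml_text):
        conn.execute("INSERT INTO nodes VALUES (?, ?, ?)", (nid, lat, lon))
        conn.executemany(
            "INSERT INTO tags VALUES (?, ?, ?)", [(nid, k, v) for k, v in tags]
        )
    return conn

def dump(conn):
    """Write the database contents back out as OSM-format XML."""
    root = ET.Element("osm", version="0.6")
    rows = conn.execute("SELECT id, lat, lon FROM nodes ORDER BY id").fetchall()
    for nid, lat, lon in rows:
        node = ET.SubElement(root, "node", id=nid, lat=lat, lon=lon)
        tag_rows = conn.execute(
            "SELECT k, v FROM tags WHERE node_id = ? ORDER BY k", (nid,)
        ).fetchall()
        for k, v in tag_rows:
            ET.SubElement(node, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")

round_trip_ok = parse_nodes(dump(load(SAMPLE))) == parse_nodes(SAMPLE)
```

The comparison is made on parsed content rather than on raw bytes, since whitespace and element order in the dump may legitimately differ from the input while conveying the same information.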

Specialised (e.g. mobile) format databases

Many devices have specialised requirements or restrictions that necessitate specialised formats. In converting OSM data to these specialised formats it is likely that a database is created (it's a "collection of material arranged in a systematic or methodical way and individually accessible by electronic or other means"). The ODbL already has "technical measures" clauses to ensure that the data isn't encrypted or otherwise restricted from re-use, so anyone wishing to keep their format proprietary can already do so as long as they also redistribute in an open format. Is it worth putting further restrictions on those open formats by requiring their producers to also release their code (presumably a format specification would be required in any case)?

[Remark Oliver: I think we should also add some negative examples for clarification, i.e. cases that are neither a Derived Work nor a "trivial transformation", e.g. putting a POI layer on top of a map, which results in a collective database.]

What the ODbL says

The current situation seems to be directly addressed by ODbL clause 4.6b, which states (in part):

If You Publicly Use a Derivative Database or a Produced Work from a Derivative Database, You must also offer to recipients of the Derivative Database or Produced Work a copy in a machine readable form of:

a. The entire Derivative Database; or
b. A file containing all of the alterations made to the Database or the method of making the alterations to the Database (such as an algorithm), including any additional Contents, that make up all the differences between the Database and the Derivative Database.

This would seem to mean that any database ("collection of material arranged in a systematic or methodical way and individually accessible by electronic or other means") would require the release of the entire database, alterations or algorithm. Given that any alteration of any sort falls under the definition of a Derivative Database ("a database based upon the Database, and includes any translation, adaptation, arrangement, modification, or any other alteration of the Database"), including changes to the schema or indexing ("modifying the Database as may be technically necessary to use it in a different mode or format" counts as "Use"), it would seem that any such database is a Derivative Database.

This raises some interesting questions about how far we would like the ODbL to extend. Clearly, it already extends far enough that, because loading an OSM dump into a database will almost certainly cause it to be formatted differently, any database could be a Derivative Database. However, this seems counter-intuitive and counter-productive. The intent of the "share-alike" part of our license is to ensure that any improvements or interesting modifications are shared, but not necessarily the trivial ones. Requiring those to be shared becomes onerous and begins to put people off using the data, and encouraging use was the point of releasing it under an open license in the first place.

[Remark Oliver: I think it would be helpful to clarify the statement "You must also offer to recipients of the Derivative Database or Produced Work a file containing all of the alterations". What does this offer need to look like? Is it an active offer or an offer on request? I often hear in discussions that one could "encode" attributes so that the value for the recipient is almost zero.]