Open Data License/Alteration Files - Guideline

From OpenStreetMap Wiki


Community guidelines - Alteration files

These are community guidelines, so please put your comments on the discussion page or inline in this page.

Background: What's the problem?

ODbL clause 4.6b states (in part):

If You Publicly Use a Derivative Database or a Produced Work from a Derivative Database, You must also offer to recipients of the Derivative Database or Produced Work a copy in a machine readable form of:

a. The entire Derivative Database; or
b. A file containing all of the alterations made to the Database or the method of making the alterations to the Database (such as an algorithm), including any additional Contents, that make up all the differences between the Database and the Derivative Database.
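Purely as an illustration of option b (and not as a legal interpretation of it), the idea of an alteration file can be sketched in Python. The sketch assumes a hypothetical, heavily simplified model in which each database is a dict mapping element ids to tag dicts. `compute_alterations` and `apply_alterations` are illustrative names, not part of any OSM tool or of the osmChange format. What matters is the round trip: applying the alterations to the original Database must reproduce the Derivative Database exactly, including any added contents.

```python
# Hypothetical illustration only: a "database" is modelled as a dict mapping
# an element id to a dict of tags. This is NOT the OSM data model or the
# osmChange format, just the simplest structure that shows the idea.
from typing import Dict, Optional

Tags = Dict[str, str]
Database = Dict[int, Tags]
# An alteration file maps element id -> new tags, or None for a deletion.
Alterations = Dict[int, Optional[Tags]]


def compute_alterations(original: Database, derived: Database) -> Alterations:
    """Record every difference between the original and the derived database."""
    changes: Alterations = {}
    for eid, tags in derived.items():
        if original.get(eid) != tags:  # element added or modified
            changes[eid] = dict(tags)
    for eid in original:
        if eid not in derived:  # element deleted
            changes[eid] = None
    return changes


def apply_alterations(original: Database, changes: Alterations) -> Database:
    """Rebuild the derived database from the original plus the alterations."""
    result = {eid: dict(tags) for eid, tags in original.items()}  # copy, don't mutate
    for eid, tags in changes.items():
        if tags is None:
            result.pop(eid, None)
        else:
            result[eid] = dict(tags)
    return result


if __name__ == "__main__":
    original = {1: {"highway": "residential"}, 2: {"amenity": "cafe"}}
    # Derived database: one element modified, one deleted, and one added
    # (for example, merged in from another data source).
    derived = {1: {"highway": "residential", "name": "Main Street"},
               3: {"amenity": "bench"}}
    changes = compute_alterations(original, derived)
    print(changes)
    assert apply_alterations(original, changes) == derived
```

Real OSM data also carries geometry, versions and relations, so a real alteration file would need a much richer format. The sketch only shows why such a file, if complete, is as useful to a recipient as the full Derivative Database.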

What does this offering need to look like? Is it an active offering or an offering on request? We have heard suggestions that one could "encode" attributes so that the data is of almost no value to the recipient.

The guideline

This is at the proposal stage in our process; it may change after discussion by the OpenStreetMap community.

Examples

Open issues, use cases and discussion

Any text here is not part of the formal or proposed guideline!

There are a number of things left open by the license text that would be good to clarify:

The “offer”

  • What form of “offer” is required?
  • The license specifically says in 4.6:

    The Derivative Database (under a.) or alteration file (under b.) must be available at no more than a reasonable production cost for physical distributions and free of charge if distributed over the internet.

    This means that all internet distribution would have to be free of charge, no matter what volume of data is involved. A likely side effect is that, for high-volume data sets, people will opt for physical distribution, since they can at least charge reasonable costs for it, even though internet distribution would in most cases be technically preferable for both sides.

Alterations

The encoding problem is already addressed to some extent by clause 4.7 of the license. Beyond that, it would be reasonable to:

  • have a term similar to the GPL's “preferred form of the work for making modifications”, requiring any data to be made available in that form;
  • require distribution in an “open” file format, i.e. in a format reusable with any software that complies with its technical specifications.
    This means that the technical specifications of the file format must themselves be published under a free/libre licence or in the public domain, without requiring any reverse engineering of the format with restricted or proprietary tools, software or techniques (possibly patented, such as some proprietary decryption algorithms, even if you provide the decryption keys). Obtaining these specifications legally should not require paying more than the actual cost of conveying them to the requester, as that would be an effective form of commercialisation, and the specifications must then be freely redistributable without prior authorization. These specifications can still be (and frequently will be) copyrighted, and any redistribution will require attribution of their author(s).
    Such specifications could be RFCs, "open" wikis (with open licences for their content), or common formats used on multiple platforms and supported by software from various people or companies (such as OpenDocument, HTML, or archives of multiple files compressed with algorithms like ZIP, deflate or gzip). Any format that can be processed by software available under the GPL, a BSD licence, or in the public domain would be suitable, as this shows that its specifications are sufficient to define the "open" file format they describe.
    This does not mean that the specifications need to be available in English: they may be written in any other non-proprietary human language, even in other scripts (such as Cyrillic or Arabic). Specifications do not need to be translated, but anyone may freely translate them provided the original credits are preserved.
    Some technical specifications are published and frequently used but are unfortunately not open, even though they are "standards". This is the case for many (most?) ISO standards, which are too often covered by patents and other exclusive rights, require an excessive payment for a licence to use them, cannot be freely republished, and cannot be translated into another language by anyone other than their licensee(s). The same applies to technical specifications that you cannot reproduce, that are restricted in time or place (for example, you may only consult them privately, for your own use), or that require signing a non-disclosure agreement, even if access to them at a dedicated location is free of charge.

Of course, both of these would apply equally to distribution of the full Derivative Database.

Algorithms

The option of making an algorithm available is the vaguest part of the license. In particular, the following points are unclear:

  • Would it be acceptable to provide the algorithm as a binary blob for a specific computer platform, designed solely to reproduce this particular derivative database from one particular version of the original database? Even if source code is required, a malicious data user might offer only an inefficient version of the code, making reproduction of the derivative database very costly for everyone else.
  • What kind of documentation and readiness for use is required? It would help to have a baseline for the level of support the recipient can expect and the provider must be prepared to give.
  • What license terms may be imposed on use of the algorithm? Theoretically you could provide an algorithm but forbid any use of it (or impose some arbitrary conditions).

Use case: OSM + PD data

One probably quite common use case is merging OSM data with public domain data from other sources. Even though the other data is freely available, share-alike applies if the combination of the two data sets is non-trivial. For high-volume data, it might be preferable, possibly for both sides, to make the algorithm available instead of distributing large data sets.

  • IMHO, "public domain" data (from other sources) do NOT impose any share-alike restriction; they do not even carry an attribution requirement (unless stated, or required by law in your jurisdiction). This means that such data can be freely republished under another licence, notably the ODbL, provided they really are in the "public domain".
It is still fair to give reasonable attribution in OSM by citing the source (for mass imports this is a requirement; such imports must be discussed with the OSM community and must also comply with our Contributor Terms).
Outside of imports into OSM, however, creating a derived product that combines OSM data with PD data does not require any authorization from the OSM community: anyone who complies with the ODbL terms can do so directly, without accepting the Contributor Terms and without citing the source of the PD data. It is simply fair (not required except in some jurisdictions, but common good practice) for the derived product to also name the public domain source, giving it reasonable credit, even though your application may have freely altered that data so that it no longer exactly matches the original version from the source.
Note that "public domain" does not protect very well against reappropriation (i.e. someone taking the work to claim exclusive copyrights or patents on it). Laws may consider such reappropriation legal provided some minimal "substantial" changes have been made, and "substantial" is defined much more weakly by law than in our OSM licence and guidelines. For this reason, it is better to protect your work from reappropriation by requiring attribution: this protects not only you but also the reusers of your derived products from later claims by someone appropriating your work. This is what the "CC-BY" license (or similar licenses such as the MIT or one-clause BSD licenses) provides and secures for the long term, by using an explicit "copyright" statement separate from the very liberal license terms. However, the sui generis database right has created another possibility of reappropriation not covered by copyright. For that, we prefer open data licenses (ODbL, LO/OL or similar state-sponsored open data licences compatible with the ODbL) that define what a "substantial change" is, so that the copyright cannot be voided by later claims from third parties making very minor changes (such as trivial or automated format transformations).
Verdy_p (talk) 16:28, 9 May 2017 (UTC)