Open Data License/Suggested Changes

From OpenStreetMap Wiki

This is an attempt at collecting suggestions for changes to the 1.0RC1 draft of the ODbL.

If you have major grievances with the license and would like to, say, replace the full text by the CC-BY-SA license text or a Public Domain dedication, then please refrain from putting these suggestions here and open another page (suggest Open Data License/Alternatives) for that. This page should be used by people who agree to the ODbL in principle but see some problems.

The "database dump" problem

Description of problem

Provided that the aforementioned "interim derivative database problem" is solved, the current draft requires that you make available any derived database on which you base a Produced Work. This is impractical or even impossible in scenarios where you, for example, apply minutely diffs to your PostGIS database, creating a new derived database every minute: you cannot possibly make full PostGIS dumps available every minute. Some scenarios might make it impossible to provide dumps at all, for example if you load your data into a proprietary data analysis program that does some magic indexing and internal transformations of the data, and you want to publish Produced Works based on the data (say the program outputs cool images), but you cannot provide the derived database that sits inside the program.

Possible solution #1: Grouping changes

The "minutely diffs" situation could be fixed by allowing batched changes, e.g.

  • Add sentence at the end of 4.6: "If practical considerations stand in the way of making available the full current Derivative Database or description of alterations, then making available a full Derivative Database or description of alterations at most one week old will meet the requirements of this section."

This sounds a bit silly in the context of 4.6 alone (because if you give someone a derived database then you should trivially be able to give them the matching alteration file and not one that is a week old!), but it makes sense when combined with the previous suggestion of adding 4.3b, which re-uses 4.6 as the requirements for making available interim derivative DBs.

This does not solve the basic problem of someone who e.g. uses Mapnik to serve tiles having to make available a PostGIS dump, but it at least removes the requirement to do so every minute.
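As a rough illustration of what the proposed sentence would mean for an operator, the check below tests whether the most recently published dump is still within the one-week window. This is only a sketch: the function name `dump_is_recent_enough` and the constant `MAX_DUMP_AGE` are made up for this example, and the one-week limit is taken from the proposed wording, not from the current draft.

```python
from datetime import datetime, timedelta, timezone

# Maximum age from the proposed wording ("at most one week old").
# Hypothetical constant, not part of the ODbL draft.
MAX_DUMP_AGE = timedelta(weeks=1)


def dump_is_recent_enough(published_at, now, max_age=MAX_DUMP_AGE):
    """Return True if the last published dump is no older than max_age."""
    return now - published_at <= max_age


now = datetime(2009, 3, 5, tzinfo=timezone.utc)
six_days_old = now - timedelta(days=6)
eight_days_old = now - timedelta(days=8)

print(dump_is_recent_enough(six_days_old, now))
print(dump_is_recent_enough(eight_days_old, now))
```

Under this reading, a server applying minutely diffs would only need to publish a fresh dump (or description of alterations) about once a week.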

Possible solution #2: Allowing software as alteration description

The problem could also be solved by allowing an algorithm to describe changes:

  • Insert 4.6c: "A computer program or algorithmic description of alteration rules enabling a knowledgeable recipient to re-create the Derivative Database from publicly available sources."

This would allow you to simply distribute the osm2pgsql source code if someone asks for the contents of your derived database. It would, however, still not help in situations where your derived database lives, for example, inside some piece of software that you cannot publish.

Possible solution #3: Mere transformation does not create derivative

  • Under 1.0, "Definition of Capitalised Words", change the definition of "Derivative Database" to read:
"Derivative Database" – Means a database based upon the Database, and includes any translation, adaptation, arrangement, modification, or any other alteration of the Database or of a Substantial part of the Data. This includes, but is not limited to, Extracting or Re-utilising the whole or a Substantial part of the Data in a new Database. Pure mechanical re-arranging of the Database, such as converting between media types or re-arranging the Database for optimised access, does not count as creating a derivative database for the scope of this license.

(Extremely ugly wording, please fix me; the idea is to convey that if you make any qualitative input to the database then it becomes derivative whereas if you only juggle around the bits you already have then it doesn't.)

This solution would probably suit OSM's needs best but potentially opens the door to lots of arguments about what is derivative and what not!

My attempt at a wording. I'm trying to convey the idea that alteration of the structure of the data, or the way the data is stored, isn't important. What seems to be central to the idea of Share-Alike is that the correction of errors, or the addition of "useful" data is shared. --Matt 00:16, 5 March 2009 (UTC)

"Derivative Database" – Means a database based upon the Database, and includes any modification or alteration of the Data or of a Substantial part of the Data or addition of a Substantial amount of new Data. Pure mechanical re-arranging of the Database, such as converting between media types or re-arranging the Database for optimised access, does not count as creating a derivative database for the scope of this license where such a process introduces no new Data.

Should any new definition also explicitly mention subsets of the data as not needing to be republished? By the definitions proposed above, strictly speaking, downloading only the data for a bbox or for a single country does more than just rearrange or re-encode the data (it effectively introduces new data, namely which portion was selected) and thus may require publication.

Unfortunately, a subset of a database can effectively add useful information: imagine a republished dataset that only includes the nodes with some independently-sourced property (e.g. all intersections with traffic lights, or all ways with bike lanes). Such a subset database would effectively combine OSM data (the intersection locations, or the names of highways with bike lanes) with other data (the information that the selected elements have some property). If all subsets are allowed, one could sell the combined database without share-alike. In this case, if the selected subset of nodes were expressed as a query, the query itself would contain the non-free data. In general, a subset database is no more useful than the original database plus the query that produced it, since anyone could re-run the query to get the same result (provided the query is deterministic and a free or widely available interpreter for the query language exists). This could argue in favor of a rule that says that subset databases need only publish the query that generated them, rather than the resulting database. Personally, I'm more in favor of allowing any subset to be generated without republication for the time being, and not worrying about such a low-bandwidth attack vector unless it becomes a problem. Such an activity would also contravene the clear intent of the license, and to the extent that judges are humans rather than computers, it may not need to be addressed explicitly. --Speight 08:51, 7 March 2009 (UTC)
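The "database plus query" argument can be sketched with two toy queries. Everything here is illustrative: the node records, the function names `select_bbox` and `select_by_ids`, and the ID list are invented for this example and do not correspond to any real OSM tool or data. The first query is a plain bounding-box cut that anyone could re-run; the second takes a list of IDs, and that list is exactly where independently-sourced information would be hidden.

```python
def select_bbox(nodes, min_lon, min_lat, max_lon, max_lat):
    """Deterministic query: keep only nodes inside the bounding box."""
    return [
        n for n in nodes
        if min_lon <= n["lon"] <= max_lon and min_lat <= n["lat"] <= max_lat
    ]


def select_by_ids(nodes, wanted_ids):
    """Query whose parameter (the ID list) is itself outside data."""
    return [n for n in nodes if n["id"] in wanted_ids]


# Made-up sample data standing in for the original database.
nodes = [
    {"id": 1, "lon": 0.10, "lat": 51.50},
    {"id": 2, "lon": 2.35, "lat": 48.85},
    {"id": 3, "lon": 0.12, "lat": 51.52},
]

# Re-running the same bbox query on the same data yields the same subset,
# so publishing the query is as good as publishing the subset.
bbox_subset = select_bbox(nodes, 0.0, 51.0, 1.0, 52.0)

# Here the ID set (hypothetically, "intersections with traffic lights")
# carries the added information; the query text contains the non-free data.
traffic_light_ids = {2, 3}
light_subset = select_by_ids(nodes, traffic_light_ids)
```

In the first case the query adds nothing beyond a choice of region; in the second, publishing the query would also publish the independently-sourced list, which is the case the comment above is concerned with.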

Possible solution #4: No share-alike for interim derivatives

This whole problem goes away if we remove the demand mentioned in the first section (make interim derivative database share-alike), but this is unlikely to be palatable to OSM.

"Licensor" problem

Description of problem

When someone creates a derived work, they apparently become the licensor of the new database. The ODbL even contains the wording "If you license the Derivative Database", which implies that you are the licensor. The licensor can authorize a proxy to declare other licenses compatible. This would allow them to declare a BSD-style license compatible and so get around the Share-Alike provisions.

Possible solution #1

Remove clause 4.4 d, or 4.4 a iii and 4.4 d.

Possible solution #2

Make sure the original licensor remains the licensor of derived databases.

"Governing Law" problem

Description of problem

Unlike most contracts, the ODbL does not contain a Choice of Law clause determining which laws govern it. According to the lawyers, a choice of law provides greater predictability.

Possible solution #1

Choose UK or US law.

Possible solution #2

Don't change anything. The choice of law only affects contract law, not intellectual property law. Not fixing a particular jurisdiction is the same approach taken in the Creative Commons "unported" licenses.