Forking

This page discusses the pros and cons of creating coexisting forks of the OSM database under alternative licenses. The goal of this page is to provide the strategic working group with clear options for possible action. Discussion of the contents of this page should be conducted on the strategic mailing list or the talk page. Comments placed in the wrong section may be moved to the talk page to keep the page coherent.

A fork is a separate GIS dataset with a separate API. It may start as a complete copy of the main CC-BY-SA database, as an extract limited to data from contributors who have granted their permission, or from scratch. The original database may be referred to as the "trunk" and the forked database as the "branch". In the context of OSM, the "trunk" may be considered the default dataset for new users.

Background

Note: This section should focus on factual information that is not disputed. Please move all disputed claims to the section on arguments.

Currently the OSM database is licensed under CC-BY-SA. The license determines how the database may and may not be used. There are many alternative licenses, each with a different set of restrictions. Broadly speaking, CC-BY-SA includes a general disclaimer, a requirement for attribution and a requirement to share data back with the community. Some licenses include a non-commercial clause. The project was founded to provide "open" data, but which license best suits the project goals is controversial.

Possible licenses include:

OpenStreetMap creates mapping data for the benefit of map users. As it says on the front page of the wiki: "OpenStreetMap creates and provides free geographic data such as street maps to anyone who wants them." Although the existing license suits many users, not all users want to, or are able to, abide by the attribution and share-alike terms. Their difficulty with any particular license might be practical, legal or political in nature. Also, different users have different requirements for the type and properties of GIS data (coverage, feature richness, accuracy, etc). If map users are faced with a choice of databases, they can generally select only the single database that suits them best. The amount of usable data is therefore the amount in the data set a user selects, rather than the total of all data held by OSM under the individual licenses.

Forking involves creating a copy or extract of the main database and then maintaining and editing the two branches independently. As edits accumulate, the contents of the databases tend to diverge. Although they will often contain the same highways and POIs, their object IDs will be different (unless UUIDs are used). Data in one database does not correspond with data in the other database, or corresponds only weakly, even though both might represent the same real-world objects. In some cases data can be transferred from one database to another, but legal restrictions can make data sharing one-way only or impossible. Share-alike licenses tend not to be compatible with other share-alike licenses. The divergence of the data also makes transferring data into an already mapped area laborious and tedious. However, this has been attempted in several instances, with varying degrees of success, during the import of external databases, and it causes localised convergence of the databases. Total divergence of the data is unlikely, as well-mapped data that was in the original parent database changes only slowly. Given that the two databases are independent and migrating data between them might be impractical or impossible, the only way to update both databases may be to survey an area multiple times.
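The weak correspondence between diverged databases can be illustrated with a minimal sketch. It assumes two small, hypothetical node lists (the IDs, coordinates and tags are invented for illustration) and pairs up nodes that lie within a few metres of each other and carry identical tags. The `match_nodes` helper and its 10 metre threshold are assumptions and not part of any OSM tool. Real conflation would also have to handle ways, relations and partially edited tags.

```python
import math

# Hypothetical node records: (object id, latitude, longitude, tags).
# The same pub exists in both databases, but under different IDs.
trunk = [
    (101, 51.5007, -0.1246, {"amenity": "pub", "name": "Red Lion"}),
    (102, 51.5010, -0.1250, {"amenity": "cafe", "name": "Corner Cafe"}),
]
branch = [
    (9001, 51.50071, -0.12461, {"amenity": "pub", "name": "Red Lion"}),
    (9002, 51.5030, -0.1300, {"amenity": "bank", "name": "Town Bank"}),
]


def distance_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in metres (equirectangular approximation)."""
    earth_radius_m = 6371000.0
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return earth_radius_m * math.hypot(x, y)


def match_nodes(a, b, max_m=10.0):
    """Pair nodes from a and b that are within max_m metres and share identical tags."""
    pairs = []
    for id_a, lat_a, lon_a, tags_a in a:
        for id_b, lat_b, lon_b, tags_b in b:
            close = distance_m(lat_a, lon_a, lat_b, lon_b) <= max_m
            if close and tags_a == tags_b:
                pairs.append((id_a, id_b))
    return pairs


matches = match_nodes(trunk, branch)
print(matches)
```

Matching on position and tags rather than IDs is only a heuristic: once either copy of an object is edited, an exact tag comparison like this one fails, which is one reason transferring data between diverged databases is laborious.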

OSM and OSMF have limited resources. Databases and websites need money, hosting, hardware, administration and developers to function correctly. In particular, the sysadmins are burdened with an already complicated system. Assuming that the OSM tools are to be used in common with forks, the fork databases would need to be maintained to keep in step with API changes, which involves additional sysadmin effort. Adding a fork within OSM may require additional software development if user names are to be kept in common or other changes are needed to suit the fork's specific requirements. This particularly affects the central server software. Software development consumes developer resources.

Whether forks would use the same mapping conventions, including map feature tags, is an open question. Changing the mapping conventions would make migrating data more difficult. However, specialised tagging systems may address perceived weaknesses in the current tagging system.

This page primarily discusses forks under the OSM community's umbrella. However, several people have voiced the desire to start independent forks. If forks were conducted within OSM, decisions affecting the forks could be taken within OSM. Common user names shared between fork databases might reduce confusion. Externally hosted forks would have more independence.

The OSM community's view on the best license is split, although there seems to be a majority that will accept relicensing to CT/ODbL. The issue of forking has been raised on the mailing lists, with calls for and against forks. There does not appear to be consensus at the time of writing.

Given this overview of the situation, the best course for future action is disputed. The fundamental question: do the disadvantages of forking outweigh the advantages, or vice versa?

Possible Scenarios

There are an unlimited number of license possibilities, but some commonly suggested options include:

There is also a wide range of possible approaches to implementing a fork. The approach used should be determined by the level of interest and available resources, and likely falls between these two extremes:

Host a fork on a server that is also used for other OSM activities. This can cater only for a small number of users, and the data has no map renderer.

Duplicate all OSM web, tile and API servers to cater for a large number of users.

Arguments for and against

TODO: link each to the talk page for rebuttals

Multiple Databases Fulfil Different Mapping End Users' Needs

As stated on the wiki front page: "OpenStreetMap creates and provides free geographic data such as street maps to anyone who wants them." (emphasis mine) Users come in a wide range of circumstances, including individual users, commercial users, NGOs, charities, researchers, etc. They use maps in a diverse range of activities. Their requirements and priorities for data, and for the license under which it is provided, therefore differ. Data licenses come in various forms, each of which determines how data may or may not be used. OSM aims to provide data to anyone who wants it, not just those that can abide by a single licensing regime.

Some institutions can only publish PD data and therefore can only use PD as a data source (e.g. USGS). Although their data has been imported to OSM, share alike licenses are not suitable for their use.

Many governments and institutions have released data that may be incompatible with whatever license OSM is currently using. Some of this data is useful for end users. To avoid excluding such data, multiple licenses are needed so that the data can be imported, the community can maintain it and end users can use the result. If OSM shifts to ODbL/CTs, this might apply to CC-BY and CC-BY-SA data sources, which are the most popular licenses for GIS data sharing (but this is to be confirmed).

Companies or academics may want to merge proprietary data with OSM and distribute the result. The proprietary data cannot be released either because it is commercially valuable, or the data has been provided with license restrictions. Both CC-BY-SA and ODbL/CTs forbid this, because of their share-alike terms. However, a non-share-alike approach enables greater user flexibility, and is somewhat analogous to the popular BSD software license.

The ODbL is legally complex. A user with usual requirements and no legal resources may be unable to determine the legal implications of a complex license such as the ODbL (see CC's comments). An alternative fork may offer both the data a user requires and legal simplicity. This encourages use of OSM data.

Some users may be unable or unwilling to attribute OSM. This may be for aesthetic reasons, GUI design simplicity or some other motivation that we can't guess at. Having a fork with a very permissive license would enable use of OSM data.

OSM Resources Would be Better Used Elsewhere

[1] [2]

The single existing database that OSM has created is continually increasing in size and complexity. The number of mapping editors and users is rising and will put additional load on the servers. Maintaining the database demands additional hardware, hosting, development, system administration and other resources. Many of these resources are already stretched, particularly the sysadmins.

Given that forking requires additional hardware, sysadmins, etc, it would divert scarce resources from the main database. Forking would not attract significant additional resources to maintain it. The needs of the main database should take precedence over a fork.

TODO: Link to mapping resources, lower accuracy issue.

TODO: provide more concrete examples

Some mapping contributors prefer different licenses

Mapping contributors are involved with OSM because they agree with its general aims. OSM aims to provide "open" mapping data. There are various schools of thought on what openness actually means and how it might be achieved, and there is some disagreement among OSM contributors. Some are ardent share-alike supporters, some are ardent public domain supporters and some have no strong view. There are also divisions as to which license best achieves share-alike and PD. If a single approach is adopted, a section of potential mappers is alienated. Also, mappers who have not yet discovered the project are excluded if the license doesn't suit them. To keep most mappers happy within the project, and to attract the most new mapping contributors, forking is better than any single-license approach.

Including the issue that not forking effectively coerces people who want a different license.

A Single Database Provides Focus

A goal of OSM is to create the most complete map of the world. A single database is the best way to achieve this, as resources and mapping can be more focused on providing a single, integrated database.

[3]

Forking Would Fracture the Community

[4]

Forking Might Prevent a Split in the Community

Diversity is good, analogy with OSM tools

OSM has multiple editors, renderers, loggers, navigation apps, routing applications. This has only illustrated how diversity is a good thing. By analogy, different licenses might improve the project. This diversity is an indication that the project is healthy and is an end in itself. If this were not the case, hadn't we better remove the alternative OSM tools?

Forks Will Confuse Mapping Contributors

[5]

Forks Will Confuse Map Data End Users

[6]

Confusion Surrounding Forks Can Be Mitigated

Different Databases Makes the Project Appear Divided

[7] [8]

OSM Has, or Can Obtain, Sufficient Resources

Including the point that forking may attract additional resources and interest.

OSMF Opposition Would Prevent a Fork

[9]

Forking Will Increase Quantity of Abandoned Data

[10]

Re-licensing Would Create Abandoned Data, Which Can Only Be Maintained by a Fork

[11]

Forking will Compromise Completeness and Accuracy

[12]

Don't Put All Your Eggs in One Basket

We don't know what the future holds in terms of legislation or the future of GIS. To maximise robustness to unpredicted events and requirements, diversifying our database licenses would enable the most flexibility.

Independent Fork Rather Than an OSM Hosted Fork

OSM Hosted Fork Rather Than an Independent Fork

License Proliferation is Bad

Other Resources

OSM Fork discussion group, Strategic working group mailing list

CC-BY-SA fork on this wiki

PD Fork mailing list

List of forks in planning or production
