Forking

From OpenStreetMap Wiki

Forking is a well-known concept in software development (wikipedia:Fork (software development)), but the same idea can apply to open data projects, including OpenStreetMap. The "right to fork" is part of what makes an open license a powerful guarantee of long-lasting freedom for the data you are contributing.

You can create a copy of the whole database (as permitted by the open license) and start up a separate project in which this new copy is maintained and grown in a new direction. Maintenance and growth of OpenStreetMap will continue in parallel.

This might involve "forking the community", meaning that a breakaway faction of the community moves to the new project and carries out this maintenance and growth as a similar open internet collaboration. Because that reduces the size of our community and creates ongoing "competition" for contributors and users, it might naturally be interpreted as a hostile act against our community. On the other hand, if there is a group of dissatisfied people within our community, cutting them loose to find their own path may be quite healthy. Some people have advocated a very forking-friendly approach, in which OSMF resources are shared across multiple forks.

This page used to be part of an extensive discussion of such pros and cons, which took place during the ODbL license change process of 2012. At that time, discussion on the strategic mailing list and earlier versions of this page were intended to provide the strategic working group with analysis and options for possible action, if indeed any action by our organisation would have been appropriate. (This was primarily a discussion about supporting forks within the OSM community's umbrella.) As it turned out, a breakaway faction did eventually get sufficiently organised to create an independent fork on their own initiative: fosm.org. This remains the one and only example of a full OpenStreetMap fork.

Some technical details of forking

A fork involves a separate GIS dataset with a separate API. It may start as a complete copy of the main OpenStreetMap database, as an extract limited to data from contributors who have granted their permission, or from scratch. The original database may be referred to as the "trunk" and the forked database as the "branch". In the context of OSM, the "trunk" may be considered the default dataset for new users.
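The permission-based extract mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OsmObject` class, its `editors` field and the `extract_for_fork` function are hypothetical names, and the rule that every editor of an object must have consented is an assumption, not a description of how any real fork selected its data.

```python
from dataclasses import dataclass, field


@dataclass
class OsmObject:
    """Simplified stand-in for a map object (hypothetical, not the OSM data model)."""
    object_id: int
    editors: set  # every user who edited any version of the object (assumed known)
    tags: dict = field(default_factory=dict)


def extract_for_fork(objects, consenting_users):
    """Keep only objects whose editors have all granted permission (assumed rule)."""
    consenting = set(consenting_users)
    return [obj for obj in objects if obj.editors <= consenting]


# Example "trunk" data with made-up users and objects.
trunk = [
    OsmObject(1, {"alice"}, {"highway": "residential"}),
    OsmObject(2, {"alice", "bob"}, {"amenity": "cafe"}),
    OsmObject(3, {"carol"}, {"highway": "footway"}),
]

branch = extract_for_fork(trunk, ["alice", "carol"])
print([obj.object_id for obj in branch])
```

Object 2 is dropped here because one of its editors has not consented, which shows why a permission-based extract can be noticeably smaller than a complete copy.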

Beyond the basic underlying database and API, duplicating all OSM web, tile and API servers, and scaling them to cater for the desired number of users, is a massive technical undertaking.

Forking involves creating a copy or extract of the main database and then maintaining and editing the two branches independently. As edits accumulate, the contents of the databases tend to diverge. Although they will often contain the same highways and POIs, objects created after the fork will have different IDs (unless UUIDs are used). Data in one database therefore does not correspond, or corresponds only weakly, with data in the other, even though both may represent the same real-world objects. In some cases data can be transferred from one database to another, but if different licenses are involved then legal restrictions may make data sharing one-way only, or impossible. In any case, the divergence of the data makes transferring data into an already mapped area a laborious and tedious task. Such transfers have been attempted in several instances, with varying degrees of success, during imports of external databases; where they succeed, they cause localised convergence of the databases.
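To illustrate why diverged IDs make transfers laborious, here is a minimal Python sketch that pairs nodes from two databases by name and proximity instead of by ID. The `Node` class, the `match_nodes` function, the 20 metre threshold and the sample data are all assumptions made for this example; real conflation has to cope with ways, relations, renamed or moved objects and ambiguous matches.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified point object; IDs are local to each database."""
    node_id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


def distance_m(a, b):
    """Approximate distance in metres (equirectangular approximation)."""
    mean_lat = math.radians((a.lat + b.lat) / 2)
    dx = math.radians(b.lon - a.lon) * math.cos(mean_lat)
    dy = math.radians(b.lat - a.lat)
    return 6371000 * math.hypot(dx, dy)


def match_nodes(trunk_nodes, branch_nodes, max_distance_m=20.0):
    """Pair each named trunk node with the nearest same-named branch node."""
    matches = []
    for t in trunk_nodes:
        name = t.tags.get("name")
        if name is None:
            continue  # nothing to compare on; needs manual review
        candidates = [
            b for b in branch_nodes
            if b.tags.get("name") == name and distance_m(t, b) <= max_distance_m
        ]
        if candidates:
            best = min(candidates, key=lambda b: distance_m(t, b))
            matches.append((t.node_id, best.node_id))
    return matches


# Made-up sample data: the same cafe exists in both databases under different IDs.
trunk_nodes = [
    Node(101, 51.50000, -0.10000, {"name": "Corner Cafe"}),
    Node(102, 51.51000, -0.11000, {"name": "Post Office"}),
]
branch_nodes = [
    Node(9001, 51.50005, -0.10003, {"name": "Corner Cafe"}),
    Node(9002, 51.52000, -0.11000, {"name": "Post Office"}),
]

print(match_nodes(trunk_nodes, branch_nodes))
```

Only the cafe is paired, because the two "Post Office" nodes lie more than a kilometre apart. Everything left unmatched would need manual review, which is where most of the effort goes.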

A fork doesn't necessarily need to use the same mapping conventions (indeed, wanting different conventions may be the reason for the fork). Changing the mapping conventions would, however, make migrating data between the databases more difficult.