Open Data License/Geocoding - Guideline

From OpenStreetMap Wiki

Proposed guidelines

Background: What is the problem?

When a third party database is geocoded with OpenStreetMap data, what are the share alike implications for the third party database?

Definition: Geocoding, Geocoding Results, and Geocoder

Geocoding, as it pertains to this guideline, is a process by which external data is used to construct a query with which an OpenStreetMap database is searched. The resulting response is a Geocoding Result. Geocoding Results are then stored either permanently or temporarily together with the external data used for querying. Geocoding Results can be latitude/longitude pairs, full or partial addresses, and/or point of interest names. A Geocoder is the program offering Geocoding functionality.
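The terms defined above can be pictured with a minimal sketch. All names here (`GeocodingResult`, `ExternalRecord`, `geocode_record`, `fake_geocoder`) are hypothetical, and the small lookup table only stands in for a Geocoder backed by an OpenStreetMap database.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GeocodingResult:
    """Response of a Geocoder: coordinates and/or an address or POI name."""
    lat: Optional[float] = None
    lon: Optional[float] = None
    address: Optional[str] = None
    poi_name: Optional[str] = None


@dataclass
class ExternalRecord:
    """A row of the third-party database, stored together with its result."""
    query_text: str                            # external data used to build the query
    result: Optional[GeocodingResult] = None   # filled in by geocoding


def geocode_record(
    record: ExternalRecord,
    geocoder: Callable[[str], Optional[GeocodingResult]],
) -> ExternalRecord:
    """Run the query and store the Geocoding Result next to the external data."""
    record.result = geocoder(record.query_text)
    return record


# Stand-in for an OpenStreetMap-backed Geocoder (made-up data, assumption).
_FAKE_OSM = {"1 Example Street, Exampletown": GeocodingResult(lat=51.5, lon=-0.1)}


def fake_geocoder(query: str) -> Optional[GeocodingResult]:
    return _FAKE_OSM.get(query)
```

The point of the sketch is only that the Geocoding Result ends up stored alongside the data that was used to ask for it, which is the situation the two alternatives below interpret differently.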

"Geocoding yields produced work" alternative

Geocoding Results are a Produced Work by the definition of the ODbL (section 1.):

“Produced Work” – a work (such as an image, audiovisual material, text, or sounds) resulting from using the whole or a Substantial part of the Contents (via a search or other query) from this Database, a Derivative Database, or this Database as part of a Collective Database.

This is further reiterated in section 4.5 b.:

Using this Database, a Derivative Database, or this Database as part of a Collective Database to create a Produced Work does not create a Derivative Database for purposes of Section 4.4;

As Geocoding Results are a Produced Work, they do not trigger the share-alike clauses of the ODbL. A database of Geocoding Results is a database of Produced Works and as such does not trigger the share-alike clauses of the ODbL either.

However, you must attribute OpenStreetMap properly as described in section 4.3:

Notice for using output (Contents). Creating and Using a Produced Work does not require the notice in Section 4.2. However, if you Publicly Use a Produced Work, You must include a notice associated with the Produced Work reasonably calculated to make any Person that uses, views, accesses, interacts with, or is otherwise exposed to the Produced Work aware that Content was obtained from the Database, Derivative Database, or the Database as part of a Collective Database, and that it is available under this License.

"Collective Database" alternative

A database of Geocoding Results is a derivative database by the definition of the ODbL (section 1.):

“Derivative Database” – Means a database based upon the Database, and includes any translation, adaptation, arrangement, modification, or any other alteration of the Database or of a Substantial part of the Contents. This includes, but is not limited to, Extracting or Re-utilising the whole or a Substantial part of the Contents in a new Database.

A database of Geocoding Results must therefore, where it is publicly used, be shared under the ODbL, and attribution must be given.

The derivative database consists of the data that has been used as the input data for the geocoding process, as well as the data that has been gained from OpenStreetMap in the process. Any additional data that may be linked to this data, even sitting in the same logical database table, is however not considered to be part of the derivative database (instead it forms a collective database together with the derivative database) and therefore, does not have to be shared under the ODbL.
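The split described in this alternative can be illustrated with a short sketch. The function name `split_record` and the field names in the usage example are hypothetical; the code only shows how a row could be partitioned into the geocoding input/output fields and everything else, and does not decide any legal question.

```python
def split_record(record, geocoding_input_fields, geocoding_output_fields):
    """Partition one row into (derivative part, remaining collective part).

    Under the "Collective Database" reading, the fields used as geocoding
    input and the fields obtained from OpenStreetMap form the derivative
    database; all other fields of the same row stay outside it.
    """
    shared_keys = set(geocoding_input_fields) | set(geocoding_output_fields)
    derivative = {k: v for k, v in record.items() if k in shared_keys}
    other = {k: v for k, v in record.items() if k not in shared_keys}
    return derivative, other
```

For a row such as `{"address": ..., "lat": ..., "lon": ..., "opening_times": ...}` with `address` as input and `lat`/`lon` as output, only the first three fields land in the derivative part.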

Examples

(1) Adding coordinates to store locations

Consider a chain retailer's database of store locations with store names and addresses (street, house number, ZIP, state/province, country). The addresses are used to search for corresponding latitude/longitude coordinates in OpenStreetMap. The coordinates are stored next to the store locations in the store database (forward Geocoding). OpenStreetMap.org's Nominatim-based Geocoder is used. The store locations are exposed to the public on a store locator map using Bing maps.
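A minimal sketch of the forward Geocoding step in this example follows. It uses Nominatim's structured search parameters (`street`, `postalcode`, `state`, `country`) and the `jsonv2` output format; the store field names and the helper names `build_search_url` and `first_coordinates` are assumptions made for illustration. The HTTP request itself is left out so that the sketch stays self-contained.

```python
import json
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(store):
    """Build a structured Nominatim search URL from one store row."""
    params = {
        # Nominatim expects the house number and street name in one field.
        "street": f"{store['house_number']} {store['street']}",
        "postalcode": store["zip"],
        "state": store["state"],
        "country": store["country"],
        "format": "jsonv2",
        "limit": 1,
    }
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def first_coordinates(response_body):
    """Return (lat, lon) of the first hit, or None if nothing was found."""
    results = json.loads(response_body)
    if not results:
        return None
    # Nominatim returns coordinates as strings.
    return float(results[0]["lat"]), float(results[0]["lon"])
```

The coordinates returned by `first_coordinates` are what would be stored next to the store row, and they are the Geocoding Result whose status the two alternatives below discuss.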

"Geocoding yields produced work" alternative

The geocoded store locations database remains fully proprietary to the chain retailer. The map carries a notice "(c) OpenStreetMap contributors" linking to http://www.openstreetmap.org/copyright.

This use case complies with the OpenStreetMap license. It would not if it did not credit OpenStreetMap based on section 4.3 of the ODbL.


"Collective Database" alternative

The addresses used as input to Nominatim, as well as the geocoded store locations so computed, together form a derivative database and must be shared under ODbL on request. Further information about the stores that has not been retrieved from OSM, for example opening times or user comments, does not have to be shared, as it forms a collective database with the ODbL-licensed addresses.

Would the display map be the only use case? Is this even relevant for this resulting database being a Derived Database and the Share-Alike implications? See the paragraph on the distinction between a geocoding result and a collection, below. Martijn van Exel (talk) 14:22, 15 July 2014 (UTC)

(2) Adding location names to photos

A mobile photo application uses the current device location (a latitude/longitude coordinate) to query OpenStreetMap for a corresponding place name (Reverse geocoding). The resulting place name (city, neighborhood, street name or POI name) is embedded into the photo image file (e.g. in the form of JPEG headers). OpenStreetMap's Nominatim is used. The mobile photo application credits OpenStreetMap with "Location information: (c) OpenStreetMap contributors".
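The reverse Geocoding step in this example might look roughly like the following sketch. It relies on Nominatim's `reverse` endpoint with `lat`, `lon` and `format=jsonv2`, whose response carries an `address` object; the helper names and the order in which address keys are tried are assumptions. Writing the name into the JPEG headers is not shown.

```python
import json
from urllib.parse import urlencode

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def build_reverse_url(lat, lon):
    """Build a Nominatim reverse-geocoding URL for a device location."""
    return f"{NOMINATIM_REVERSE}?{urlencode({'lat': lat, 'lon': lon, 'format': 'jsonv2'})}"


def place_name(response_body):
    """Pick a place name from a reverse-geocoding response, or None."""
    address = json.loads(response_body).get("address", {})
    # Try coarse names first, then fall back to finer ones (assumed order).
    for key in ("city", "town", "village", "suburb", "road"):
        if key in address:
            return address[key]
    return None
```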

"Geocoding yields produced work" alternative

The photo database including the Geocoding Results retrieved through Nominatim is proprietary to the application maker.

This use case complies with the OpenStreetMap license. It would not if it did not credit OpenStreetMap based on section 4.3 of the ODbL.

"Collective Database" alternative

Individual photos and their headers are produced works; however, if a database is created that contains all the photo locations and Geocoding Results, and that database is publicly used, then those parts of the database that were used as input to, or retrieved as output from, the geocoding process form a derivative database that must be shared under ODbL on request.

If the database is not publicly used, but instead just used by the photographer to organise their pictures, then share-alike does not apply. This holds true even if many photographers all use the services of one application provider who stores the photos on behalf of the photographers and who might or might not keep every photographer's data in the same database. For the purpose of the ODbL there is one derivative database for each client and it is not publicly used.

(3) Searching on a map

A map-based navigation application offers a map search. Users enter addresses and point of interest names in a search box which are in turn used to search OpenStreetMap for corresponding coordinates. Search query results are cached server side and on-device for performance reasons. The navigation application credits OpenStreetMap with "(c) OpenStreetMap contributors".

"Geocoding yields produced work" alternative

The caches used on servers and end user devices containing OpenStreetMap Geocoding Results are proprietary to the application maker.

This use case complies with the OpenStreetMap license. It would not if it did not credit OpenStreetMap based on section 4.3 of the ODbL.

"Collective Database" alternative

The cache of Geocoding Results forms a derivative database that must be shared under ODbL on request when it is publicly used.

A cache on the end user's device is not publicly used and therefore does not have to be shared. A cache on a server that is shared between all users is publicly used, and must be shared under ODbL on request. The volatile nature of a cache makes this sharing requirement a very theoretical one; see the ODbL issue page for context.
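To make the "volatile nature" of such a cache concrete, here is a minimal sketch of a time-limited cache of Geocoding Results. The class `GeocodingCache`, its `snapshot` method and the idea of answering a sharing request with the current snapshot are assumptions for illustration, not something the guideline prescribes.

```python
import time


class GeocodingCache:
    """Query -> Geocoding Result cache whose entries expire after a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query -> (expiry time, result)

    def put(self, query, result):
        self._entries[query] = (self._clock() + self._ttl, result)

    def get(self, query):
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller geocodes again.
            del self._entries[query]
            return None
        return result

    def snapshot(self):
        """Entries that are still valid right now."""
        now = self._clock()
        return {q: r for q, (exp, r) in self._entries.items() if now < exp}
```

Because `snapshot` returns a different set of entries from one moment to the next, whatever would be handed over on request is only ever a short-lived slice of past queries.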

(4) Verifying and cleaning addresses

(This use case does not technically constitute geocoding and therefore might be out of scope for a geocoding guideline.)

Consider a public database of election donors with names and addresses (street, house number, ZIP, state/province, country). A program performs a search query on OpenStreetMap to verify that the given addresses exist (address verification) and, where similar ones exist, updates the corresponding record in the election database in place (address cleaning). The election database is then shared publicly with a notice "Location information (c) OpenStreetMap contributors".

"Geocoding yields produced work" alternative

The database is provided free for download under an "attribution-only" license.

This use case complies with the OpenStreetMap license. It would not if it did not credit OpenStreetMap based on section 4.3 of the ODbL. In this example, the geocoded election database is provided under an open license but it could also be kept proprietary and it would still comply with the ODbL.

It would not be possible to distribute the database as a "public domain" or CC0 database, because this would violate section 4.3 of the ODbL, which requires that "You must include a notice ... reasonably calculated to make any Person that uses ... the Produced Work aware that Content was obtained from the Database". A PD distribution, even with the required attribution, would make it trivial for downstream users to strip the attribution and redistribute, and would therefore not be "reasonably calculated" in the spirit of that clause.

"Collective Database" alternative

The database is provided free for download under the ODbL because it has been derived from OpenStreetMap.

A distribution under an attribution-only license or as "public domain" is not possible.


(5) Enriching a geocoding database

A geo services vendor provides a Geocoder based on an improved OpenStreetMap database through a Geocoding web API. Geocoding Results returned by the web API contain the notice "(c) OpenStreetMap contributors". The improved OpenStreetMap database is a Derivative Database, so, complying with section 4.2 of the ODbL, the vendor makes it available for download on their web site under the ODbL.

This use case complies with the OpenStreetMap license. It would not if it did not credit OpenStreetMap or if the improved OpenStreetMap database was not made available under the ODbL.

(6) Address Search on a Connected Navigation Device

Users of the navigation application send an address search query to a cloud/server based proprietary Geocoder. The Geocoder has access to separate and isolated map databases, one of which is solely OSM data. The other database is governed by a non-open license and its contents could not be shared back. If the address is accurately found in the OSM database the location is sent back to the navigation application. If the address is not found in the OSM database then another database is searched, and that result returned. No comparison is done between the OSM and other databases. A user may choose to save search results, which may include results from the geocoding operation, in their navigation device as a personal favorite place. The search results are not saved in any other way. The navigation application credits OpenStreetMap with “©OpenStreetMap contributors”.

This use case complies with the OpenStreetMap license.

(7) Address Search for Points of Interest Databases

A database operated on a server by a vendor of navigation devices contains names, addresses and other information about points of interest (POIs) that are entirely from non-OSM sources. The POI list is geocoded by a proprietary Geocoder. The Geocoder has access to separate and isolated map databases, one of which is solely OSM data.

If the address is accurately found in the OSM database the location is linked to the POI and these POIs and locations are stored in a separate database "A". If the address is not found in the OSM database then the other map databases are searched, the resulting location is linked to the POI and these POIs and locations are stored in another separate database "B". No comparison is done between the OSM and other databases in the geocoding process. The POI databases "A" and "B" are accessed by a connected navigation device.
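The failover flow described above can be sketched as follows. The function `geocode_pois` and the representation of each Geocoder as a callable returning a location or `None` are assumptions; the sketch only mirrors the rule that the OSM database is tried first, the other databases are tried only on failure, and no results are compared.

```python
def geocode_pois(pois, osm_geocoder, other_geocoders):
    """Route geocoded POIs into database "A" (OSM) or "B" (other sources)."""
    database_a, database_b = [], []
    for poi in pois:
        location = osm_geocoder(poi["address"])
        if location is not None:
            # Found in OSM: stored in "A"; the other databases are never asked.
            database_a.append({**poi, "location": location})
            continue
        for geocoder in other_geocoders:
            location = geocoder(poi["address"])
            if location is not None:
                database_b.append({**poi, "location": location})
                break
        # POIs not found anywhere are stored in neither database.
    return database_a, database_b
```

Note that whether a POI lands in "B" depends on the OSM lookup having failed first, which is the point picked up under "The Failover Issue" below.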

A user may also choose to save POIs in their navigation device as a personal favorite place.

"Geocoding yields produced work" alternative

The geocoded POIs in database "A" are considered Produced Works. The navigation application credits OpenStreetMap with “©OpenStreetMap contributors”.

This use case complies with the OpenStreetMap license.

"Collective Database" alternative

Those parts of the OSM-derived POI database "A" that were used as input to, or retrieved as output from, the geocoding process form a derivative database that must be shared under ODbL on request.

The database "B" derived entirely from other sources is not affected.

The navigation application credits OpenStreetMap with “©OpenStreetMap contributors”.

The database "C" that may come into existence on the user's device when they save POIs augmented by addresses derived from OSM is also a derivative database but because it is not publicly used, it does not have to be shared.

This use case complies with the OpenStreetMap license.

Open Issues and Use Cases

Mailing list

This proposal has been discussed on the "legal-talk" mailing list. A record is available online.

Distinction between "a Geocoding Result" and "a collection of Geocoding Results" / No reverse engineering

Geocoding a single address may yield a produced work, but if you do this many times and collect the results in a database (for example, adding coordinates to your customer database) you are clearly producing a derived database.

Otherwise I could take a list of all addresses in a given city, run it by the Geocoder, and I'd essentially have a full copy of OpenStreetMap's database of address points that is no longer under ODbL.

This effect is even more dangerous if the definition of Geocoding Result is chosen to mean "latitude/longitude pairs, full or partial addresses and or point of interest names", because then it would even be possible to "discover" POIs in OpenStreetMap by simply geocoding every possible address, and generate a POI extract that is no longer under ODbL. If the guideline is intended to cover reverse Geocoding as well, then essentially a sweep scan of all possible coordinates could re-build the complete OSM database.

A geocoding guideline should make it clear to what extent you can re-combine these produced works in a database without triggering share-alike of the database; for example by putting a size limit on the number of Geocoding Results that may be so combined, for example: "If repeated geocoding requests are made against an ODbL database, and if the results of these requests are not just used transiently but stored in a database, then while the individual geocoding results are not under ODbL, the resulting database falls under ODbL as soon as it contains a substantial extract of the original database (see "Substantial" guideline)."

I believe that we had this reverse-engineer discussion during the license change and that the majority opinion was that the above is self-evident; using produced works to re-combine them into a database will automatically lead to ODbL, no matter if we explicitly say it or not. (Examples were discussed where you'd use a black-and-white bitmap tile - produced work - to trace roads off of.) However I believe that for the sake of clarity we should explicitly state this, lest people think they can essentially build an address database out of OSM Geocoding Results and keep it proprietary.

The Failover Issue and Publishing Derived Datasets

Example 7 glosses over a point that has been raised in the past, for example by Steve Coast: are failed geocoding results really free of OSM intellectual property? For clarity: we are not discussing on-the-fly geocoding, as no database is created and there is nothing to share.

I don't believe there is a clear and conclusive answer to the above, and there is a certain danger of getting into "how many angels can dance on the head of a pin" type discussions, so I believe that it boils down to: what is the OSM community happy with? Naturally, with the backdrop of the ODbL in mind.

I suggest something very simple: that the set of failed addresses (or, more generally, input data) should be shared with the OSM community.

Now you might ask why we would be interested in failed addresses. On the one hand, these can be mined, just like the successfully geocoded ones, for additional information, for example relationships between house numbers and post codes. On the other hand, the list of failed addresses is obviously helpful for quality assurance.

And I believe that this, particularly the latter point, creates a win-win situation for the organisation doing the geocoding and for OSM. The win for the geocoding organisation is that more of its addresses will be found in OSM and the reliance on third party datasets will be reduced.

Now assuming that a consensus forms around the above, there is still a slightly touchy issue in that companies may not want to be identified as the source of specific addresses. To resolve this I propose providing a facility by which such input datasets can be provided to the community and published anonymously (there is at least one system in existence that could simply be cloned to provide this facility). Note: all of the above only applies to datasets that are being publicly used so there can't be an expectation of a high level of data privacy to start with.
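As a rough sketch of what preparing such an anonymous list of failed inputs could involve, the following keeps only the address fields of failed lookups, removes duplicates and sorts the result so that the original record order is lost. The function name, the shape of the `attempts` records and the list of address fields are all hypothetical, and the facility mentioned above may work quite differently.

```python
def failed_inputs_for_publication(
    attempts,
    address_fields=("street", "house_number", "postcode", "city", "country"),
):
    """Return de-duplicated, sorted address inputs whose geocoding failed.

    Each attempt is assumed to look like {"input": {...}, "result": ...},
    where a result of None means the address was not found in OSM.
    Fields outside address_fields (e.g. customer identifiers) are dropped.
    """
    failed = {
        tuple(attempt["input"].get(field, "") for field in address_fields)
        for attempt in attempts
        if attempt["result"] is None
    }
    return [dict(zip(address_fields, row)) for row in sorted(failed)]
```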