Import/Guidelines

From OpenStreetMap Wiki

As always in this project, there are few hard-and-fast rules, but there are some guidelines. All of this is, of course, open to discussion; please use the mailing lists and the discussion page.

You can also look at the catalogue of imports to see how other people have done it.

Create a community

OpenStreetMap is all about building a great map by attracting a large community of mappers. While data imports can help with improving coverage rapidly, recent simulations suggest that imported data can cause problems with the growth of a community. It's actually far more important to go out and run lots of mapping parties, get lots of publicity out there, and get local people on the ground.

Make sure data license is OK

We are only interested in 'free' data. We must be able to release the data under our OpenStreetMap License. Obviously we are allowed to use public domain data sources, of which there are quite a few, but beyond that, it gets more complicated.

Even though our license (at the current time) is 'Creative Commons Attribution-ShareAlike 2.0', this does NOT necessarily allow us to import other CC-BY-SA 2.0 data. There is a potential problem with the attribution requirement. We can offer some attribution: we can credit the data providers on our website (not on the homepage, but on the Contributors page here on the wiki). We can link information about them to the user account performing the import edits, so that the editing history lets people trace the source of the data donation. We can also put their name in the 'source' tags of our underlying data; this is perhaps more prominent, but those tags may be removed by editors doing further mapping work. The credit to the "author" stops there. What we certainly cannot do is require end users of our data and renderings to give credit to the particular data donor. With this in mind, our attribution may not be legally sufficient and might be considered unsatisfactory by the original "authors" of the data.

The same problem applies to any data which is described as "public domain apart from we require credit" (we had this doubt about LINZ data and Canadian government data, for example).

Licensing issues can be hideously complicated. It is important to realise that many aspects of complying with this license are open to interpretation. The current license doesn't apply very clearly to geodata, or indeed to wiki-style collaboration.

If you can obtain full approval from the creators of the data for it to be imported into OSM, having explained what this means in terms of modification, redistribution, and getting or not getting credit, then this is regarded by many as sufficient (this is, for example, what we have for AND data).

Additionally, we often find that data that purports to be available under a compatible license has been ultimately derived from sources that we consider to be non-free. For example, although some geodata is available from Wikipedia under a Creative Commons license it is a widely held belief in OSM that some of the data is simply derived from Google Maps, and therefore not actually available under that stated license. In such cases it is an established community norm to not import data whose provenance is uncertain, regardless of the stated license. Better safe than sorry.

Another important consideration is the proposed re-licensing to the Open Database License, which is now under serious discussion. Consider how your import may be affected, and if you are approaching organisations to ask for permission, make them fully aware of the situation. Although tricky to explain, it's easier than having to re-approach these people later.

You should not claim an additional copyright for yourself as the importer. For example, if you import public domain data, you should not seek to restrict the use of your imported data. Your import account must not refuse any permissions that were given by the original creators of the data you're importing.

Discuss import with community

It is important to discuss your proposed import with the community at every step. First of all add an entry to Potential Datasources. Here you can briefly describe what you have found out about the licensing of the data, and the data's accuracy with respect to data we already have. If you need more space, link through to a new wiki page about the data source.

Discuss your import on the imports@openstreetmap.org mailing list and with the appropriate local communities. Many local communities have their own wiki pages and/or mailing lists. Coordinate with other people who have similar plans.

Even if the same or a similar import has been discussed before, you should still discuss it with the local community. This makes them aware of your plans and lets them raise any issues or clashes before any damage occurs. This is especially true if the data has been available for a long time and has not yet been imported: that does not mean it is acceptable to proceed without discussing it with the local community.

Always start by discussing the investigation you have done into licensing and accuracy. If the consensus is that the data doesn't meet our criteria, don't be disappointed. Label it as rejected on the Potential Datasources page, and give the reasons. Documenting such decisions is a helpful contribution in itself. If people are happy with it, move on to discussing implementation of import scripts etc.

Document your import on the wiki

If you are going ahead with your import, please create a page about it on the wiki, with all the details. Create an entry on the Import/Catalogue page and link from there to your page. Also link to your page from local Mapping Projects pages. The page should have the following details:

And as the import progresses

Use a dedicated user account

Create a new user for the import. Do not use your standard OSM user account. This can be very useful, as the user page can be used to collate data relating to the source and contact details for the import. Furthermore, it means that attribution can often be carried in the account's display name, or in the account's user page, which is better than putting it as a tag, as the user's editing history is a permanent record of the source and doesn't interfere with tags or increase the size of the database as much. For these reasons, creating a dedicated user account is preferable to using a source=* tag.

Define your own tag prefix

You probably have some metadata, such as the IDs used in your original data. Define your own prefix and use it on all the tags for this data. The TIGER import, for instance, uses the "tiger:" prefix; the original ID of a TIGER object is tagged as "tiger:tlid".
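As a minimal sketch of the prefix idea, the Python below turns a source record's own metadata (such as its native ID) into prefixed tags. The prefix `mysource:` and the field names are hypothetical placeholders; pick (and discuss) a prefix for your own dataset, as TIGER did with `tiger:`.

```python
from typing import Dict

# Hypothetical prefix for an example dataset; choose and discuss your own
# (the TIGER import uses "tiger:", e.g. "tiger:tlid").
SOURCE_PREFIX = "mysource:"


def prefix_source_tags(metadata: Dict[str, str], prefix: str = SOURCE_PREFIX) -> Dict[str, str]:
    """Turn source-specific metadata fields into prefixed OSM tags."""
    # Every key gets the prefix so it cannot clash with standard OSM keys.
    return {f"{prefix}{key}": str(value) for key, value in metadata.items()}


# Example: the original dataset's ID and class code for one object.
tags = prefix_source_tags({"id": "12345", "class": "A41"})
print(tags)  # {'mysource:id': '12345', 'mysource:class': 'A41'}
```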

Don't go overboard with metadata. Your data source may have many fields, but OSM data elements with many tags can be difficult to work with. Strike a balance. Figure out (discuss!) which fields the OSM community is interested in.
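One way to strike that balance is to agree on an explicit list of fields to keep and drop everything else. The sketch below assumes a hypothetical mapping from source field names (`NAME`, `ROAD_CLASS`, `OBJECT_ID`) to OSM keys; the real fields and keys should come out of the community discussion, and real data would usually also need its values translated, not just its keys renamed.

```python
from typing import Dict

# Hypothetical field list agreed with the community: source field -> OSM key.
# Any field not listed here is dropped instead of being imported.
FIELD_TO_TAG = {
    "NAME": "name",
    "ROAD_CLASS": "highway",
    "OBJECT_ID": "mysource:id",  # original ID kept under your own prefix
}


def select_tags(record: Dict[str, str], mapping: Dict[str, str] = FIELD_TO_TAG) -> Dict[str, str]:
    """Keep only the agreed fields, renamed to OSM keys; skip empty values."""
    tags = {}
    for field, osm_key in mapping.items():
        value = record.get(field)
        if value:  # missing or empty fields produce no tag at all
            tags[osm_key] = value
    return tags


record = {"NAME": "Main Street", "ROAD_CLASS": "residential", "OBJECT_ID": "987",
          "SURVEY_DATE": "1998-04-01", "CLERK_INITIALS": "JB", "NOTES": ""}
print(select_tags(record))
# {'name': 'Main Street', 'highway': 'residential', 'mysource:id': '987'}
```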

Keep server resources in mind

Make sure you don't overload the server when importing large amounts of data. The TIGER import had to be spread out over several months to not kill the central server! Import the data in small installments or otherwise slow down your import scripts. If in doubt, talk to the System Administrators.
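To illustrate importing in small installments, here is a sketch of a throttled upload loop. `upload` is a hypothetical stand-in for whatever code sends one batch (for example, one changeset) to the server, and the batch size and pause are placeholder numbers, not agreed limits; ask the System Administrators what is appropriate.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Split items into consecutive lists of at most `size` elements."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def throttled_upload(items: Iterable[T], upload: Callable[[List[T]], None],
                     batch_size: int = 500, pause_seconds: float = 60.0) -> int:
    """Send items in small batches, sleeping between them; return batches sent."""
    sent = 0
    for batch in batches(items, batch_size):
        if sent:  # pause before every batch except the first
            time.sleep(pause_seconds)
        upload(batch)  # hypothetical: one batch per upload/changeset
        sent += 1
    return sent


# Usage with a dummy uploader that just records what it receives.
uploaded: List[List[int]] = []
n = throttled_upload(range(1200), uploaded.append, batch_size=500, pause_seconds=0)
print(n, [len(b) for b in uploaded])  # 3 [500, 500, 200]
```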

Don't screw up the data!

This should really go without saying, but don't screw up the OpenStreetMap data! Always think about it from the point of view of ordinary OpenStreetMap contributors working in Potlatch and never assume that those people will happily clear up your mess. If you don't have experience of working in Potlatch yourself, then you shouldn't be performing imports. JOSM tends to be slightly better for untangling messy data, but it's still fiddly. In any case most users (particularly new users) are using Potlatch. Will your data spoil their experience of OpenStreetMap editing? If so, we don't want it.

Do not ignore existing data and import new data over the top. In general it is a bad idea to put data on top of data (see data notes below), but also you must always remember that existing data may be data that a real user cares about and is maintaining. You might try to determine this with automated/semi-automated methods, and treat the existing data accordingly. For example in areas where real users are working, you might decide to leave the existing data alone.
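A crude automated check along these lines is to hold back any imported feature that lands close to something already in OSM and route it to manual review. The sketch below assumes both datasets are reduced to (lat, lon) points and uses a hypothetical 50 m threshold. It compares every pair, which is fine for a demonstration but would need a spatial index for real volumes, and it cannot tell whether the nearby data is actively maintained.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_candidates(candidates, existing, threshold_m=50.0):
    """Return (safe_to_add, needs_review) lists of (lat, lon) points.

    A candidate needs manual review if any existing OSM point lies within
    threshold_m metres of it (hypothetical threshold; tune and discuss it).
    """
    safe, review = [], []
    for lat, lon in candidates:
        near = any(haversine_m(lat, lon, ex_lat, ex_lon) <= threshold_m
                   for ex_lat, ex_lon in existing)
        (review if near else safe).append((lat, lon))
    return safe, review


existing = [(51.5007, -0.1246)]                         # already mapped
candidates = [(51.5008, -0.1246), (51.5100, -0.1000)]  # from the import
safe, review = split_candidates(candidates, existing)
print(safe)    # [(51.51, -0.1)]
print(review)  # [(51.5008, -0.1246)]
```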

If an import goes wrong, or you need to interrupt an upload halfway through, the partial import should be cleaned up (reverted) immediately. If help is needed, contact Imports and/or Talk. But the import won't go wrong, because you tested it carefully on the test database, right?

If you don't know how to revert an import, don't do the import in the first place.
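Knowing how to revert starts with knowing exactly what the import created. As a sketch, assuming you keep a log of every element your import account created, a revert has to delete relations before ways and ways before nodes, because a node still used by a way (or a way still in a relation) cannot be deleted. `ImportLog` is a hypothetical helper; it does not cover modifications to existing elements (those need their previous versions restored) or relations nested inside other relations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Deletion order for a revert: parents before the children they reference.
DELETE_ORDER = {"relation": 0, "way": 1, "node": 2}


@dataclass
class ImportLog:
    """Hypothetical record of every element the import account created."""
    created: List[Tuple[str, int]] = field(default_factory=list)

    def record(self, element_type: str, element_id: int) -> None:
        if element_type not in DELETE_ORDER:
            raise ValueError(f"unknown element type: {element_type}")
        self.created.append((element_type, element_id))

    def revert_plan(self) -> List[Tuple[str, int]]:
        """Elements to delete: relations first, then ways, then nodes."""
        # sorted() is stable, so creation order is kept within each type.
        return sorted(self.created, key=lambda e: DELETE_ORDER[e[0]])


log = ImportLog()
log.record("node", 1)
log.record("node", 2)
log.record("way", 10)
log.record("relation", 100)
print(log.revert_plan())
# [('relation', 100), ('way', 10), ('node', 1), ('node', 2)]
```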

Specific data guidelines

Don't put data on top of data

Unlike traditional GIS systems, OpenStreetMap has no concept of layers. Data on top of data is just a mess, and the kind of mess that makes it very difficult for real users to work in the normal OpenStreetMap editors. The Duplicate nodes map reveals where imports have not followed this guideline (rogue TIGER importers caused a lot of this, but sadly there are many more recent messed-up imports).
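The idea behind a duplicate-nodes check can be sketched in a few lines: group nodes by position and report any position that holds more than one node. This assumes nodes are available as (id, lat, lon) tuples; rounding to 7 decimal places mirrors the precision OSM stores, and exact coincidence is only the simplest case (near-duplicates a few centimetres apart would need a distance tolerance).

```python
from collections import defaultdict


def find_duplicate_nodes(nodes, precision=7):
    """Group node IDs that share the same coordinates.

    nodes: iterable of (node_id, lat, lon). Coordinates are rounded to
    `precision` decimal places before comparison.
    Returns {(lat, lon): [ids...]} for positions holding more than one node.
    """
    by_position = defaultdict(list)
    for node_id, lat, lon in nodes:
        by_position[(round(lat, precision), round(lon, precision))].append(node_id)
    return {pos: ids for pos, ids in by_position.items() if len(ids) > 1}


nodes = [(1, 52.5, 13.4), (2, 52.5, 13.4), (3, 52.6, 13.4)]
print(find_duplicate_nodes(nodes))  # {(52.5, 13.4): [1, 2]}
```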

If your data is in a layered traditional GIS format, you'll need to take a different approach. You could merge the layers and calculate the best aggregate tags, but you can always avoid importing the data directly and instead set up a source from which users can manually and selectively import, or a WMS to trace over (like Natural Resources Canada's Toporama).
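If you do merge layers, one conservative way to compute aggregate tags is to keep any key on which all layers agree and flag the keys where they disagree for a human decision. This sketch assumes the layers have already been matched to the same real-world feature (the hard part, not shown) and that each layer's attributes are already OSM-style tag dictionaries; it is one possible policy, not a prescribed method.

```python
def merge_layer_tags(layers):
    """Merge tag dicts describing one feature, taken from several GIS layers.

    Returns (merged_tags, conflicts). A key whose value differs between
    layers is left out of merged_tags and reported in conflicts instead.
    """
    values = {}
    for tags in layers:
        for key, value in tags.items():
            values.setdefault(key, set()).add(value)
    merged = {k: next(iter(v)) for k, v in values.items() if len(v) == 1}
    conflicts = {k: sorted(v) for k, v in values.items() if len(v) > 1}
    return merged, conflicts


layers = [
    {"highway": "residential", "name": "Mill Lane"},  # roads layer
    {"name": "Mill Lane", "surface": "asphalt"},      # surface layer
    {"highway": "unclassified"},                      # classification layer
]
merged, conflicts = merge_layer_tags(layers)
print(merged)     # {'name': 'Mill Lane', 'surface': 'asphalt'}
print(conflicts)  # {'highway': ['residential', 'unclassified']}
```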

Consider simplifying

Shapefiles often include too much detail, e.g. more nodes than necessary to represent curves, or more than two nodes representing a perfectly straight line. You'll see this particularly with large landuse areas that have nodes a few meters apart, or that appear jagged because the resolution isn't fine enough or is too fine. Tools such as Map Shaper can be used to simplify shapefiles that have too much detail. Remember to think about how the data looks and can be worked with in Potlatch.
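To show what simplification does, here is the Ramer-Douglas-Peucker algorithm, one widely used method (not necessarily the one Map Shaper applies by default). It assumes planar coordinates, so longitude/latitude should be projected to metres first and `tolerance` given in the same units; points closer than `tolerance` to the line between the kept neighbours are dropped.

```python
import math


def _perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b (planar)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate segment: plain point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: keep only points that deviate more than tolerance."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord joining the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep that point and simplify each half recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the shared middle point once


print(simplify([(0, 0), (1, 0), (2, 0), (3, 0)], 0.1))            # [(0, 0), (3, 0)]
print(simplify([(0, 0), (1, 0.01), (2, 0), (2, 1), (2, 2)], 0.1))  # [(0, 0), (2, 0), (2, 2)]
```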
