Import/Guidelines

From OpenStreetMap Wiki

Please keep in mind that the current OSM database represents an extremely large amount of work by the volunteers of the OSM community. Because imports usually involve the mass input of large amounts of data, whether through an automated process run by one person or a carefully curated process run by a team of people, there is an increased risk of larger-scale damage to the database. Hence, it is critical that all imports are approached with caution and the proper amount of planning.

These guidelines are intended to help people who are interested in importing data into the OpenStreetMap database while at the same time protecting the data already contained in it. Many OSM activities allow a wide degree of latitude in end-user discretion, but imports are more sensitive and require more careful planning.

Imports have the potential to introduce significant problems into the OSM database and should be considered thoroughly. Also, attracting a large community of mappers is considered by many as a core necessity for building a great map. While data imports can help with improving coverage rapidly, simulations suggest that imported data can cause problems with the growth of a community. Going out and running lots of mapping parties, getting lots of publicity out there, and getting local people on the ground might sometimes be a better long-term strategy than importing data.

Following these guidelines is no guarantee that an import will be acceptable, and not following them does not mean there will be problems with an import. However, these guidelines embody many lessons learned throughout the history of OpenStreetMap and should be reviewed by anyone interested in importing data while protecting the existing OSM database.

The Data Working Group (DWG) is tasked by the OSMF to detect and stop imports that do not comply with these guidelines. Not following them may therefore put your account at risk of being blocked (http://www.openstreetmap.org/user_blocks).

Of course, all of this is open to discussion, such as on the imports@ mailing list and this discussion page.

Process

If you think your city/county/state/country government, a non-profit, or some other organization or person has great data that could be used to improve the quality of OpenStreetMap, here's what you need to know. We'll start with a quick overview of the process to get you started on the right path. Most of these areas are expanded in further sections of this page and on related pages.

Step 1 - Prerequisites

  1. Gain familiarity with the basics of OpenStreetMap, including editing, such as adding details of your neighbourhood. Consider following the beginners' guide.
  2. Learn about the history of imports and the concerns surrounding them. See: Import/Past Problems
  3. Identify data you'd like to import. This might be street centerlines, building outlines, waterways, addresses, etc.

Step 2 - Community Buy-in

  1. Before any actual work is performed on the import, it is recommended that you contact the community to see if there is interest in importing the data. Different geographic areas in OSM have different acceptance levels for imports. The exact same kind of data set might be welcomed in one area and be rejected in another.
  2. Discuss your plan. Email the OSM community to notify them of your plans, including a link to your wiki page. You can do this with an email to (at a minimum) imports@openstreetmap.org, talk-(your country)@openstreetmap.org, and the OSM group specific to the area directly impacted by the import. This will help gain the benefit of past experiences, which may include having already reviewed the data you're considering for import. Check for local user groups, local chapters, and country-specific mailing lists.
  3. Be prepared to answer questions from the community. Discuss with the community the suitability of each layer for importing. Some data can be readily imported without much difficulty, while others are far more difficult (e.g. street centerlines). Also some are broadly accepted for import, while others haven't had much consensus (e.g. parcel boundaries).
  4. More complex and large-scale imports should be reviewed with the assistance of more technically-oriented and experienced OSM volunteers.
  5. It is required that the local community accept the import. Without local buy-in, the import can't proceed.

Step 3 - Documentation

  1. It is required to obtain proper permissions and licenses to use the data in OSM from the data owner. If the license of the data is not compatible with the OSM Open Database License, you cannot use the data. Many localities already have progressive open data policies. Others have data policies that are almost open, but conflict with OSM's needs through prohibitions on commercial use or requirements for attribution. Sometimes, getting permission to use data, even if the existing license might seem prohibitive, is as simple as asking the appropriate authority if they are willing to comply with the terms of the OSM Open Database License. See Import/GettingPermission for example emails that touch on important issues.
  2. It is required to register your permissions and project by adding a line to the table at Import/Catalogue.
  3. If the data owners need to be cited, credit their contributions at: Contributors.
  4. It is required to write a plan for your import in the OSM wiki. Create a wiki page outlining the details of your plan. This plan must include information such as how you will convert the data to OSM XML, how you will divide up the work, how you will handle conflation, how you will map GIS attributes to OSM tags, how you might simplify the data, your revert plans, your changeset size policies, and your plans for quality assurance. An example can be found at Import/Plan Outline.

Step 4 - Import Review

  1. It is required to post your import for review on the imports@openstreetmap.org mailing list. Don't upload any data until the project has been reviewed first.
  2. If possible, prepare the data and make it available for review.

Note that posting to the imports@ mailing list is closed to non-subscribers. If discussion there is truly wanted, the list should be open.

Step 5 - Uploading

  1. Follow your plan.
  2. Track your progress.
  3. Provide updates to the community on your efforts.
  4. Let everyone know when you're done.
  5. It is required to use a dedicated user account.

Make sure data license is OK

We are only interested in 'free' data. We must be able to release the data with our OpenStreetMap License. Obviously we are allowed to use public domain data sources, of which there are quite a few, but beyond that, it gets more complicated.

OpenStreetMap moved to the Open Database License in September 2012. Your data must therefore be compatible with that. In addition, you must be able to agree to the Contributor Terms for your import account, which includes provisions to relicense under another free and open license if the community wishes it.

You must not claim an additional copyright for yourself as the importer. For example, if you import public domain data, you must not seek to restrict the use of your imported data. Your import account must not refuse any permissions that were given by the original creators of the data you're importing.

Please also note the details of any attribution requirements. We can offer some attribution: we can credit the data owners on our website (not on the homepage, but on the Contributors page here on the wiki, and on www.openstreetmap.org/copyright for very large-scale contributions). We can link information about them to the user account performing the import edits, meaning the editing history will allow people to trace the source of the data donation. We can also set their name in the 'source' tags of our underlying data. This is perhaps more prominent, but may be removed by editors doing further mapping work. The credit to the "author" stops there. What we certainly cannot do is require end-users of our data/renderings to give credit to the particular data donor. With this in mind, our attribution may not be legally sufficient and might be considered unsatisfactory by the original "authors" of the data.

We often find that data that purports to be available under a compatible license has ultimately been derived from sources that we consider to be non-free. For example, although some geodata is available from Wikipedia under a Creative Commons license, it is a widely held belief in OSM that some of that data is simply derived from Google Maps, and is therefore not actually available under the stated license. In such cases it is an established community norm not to import data whose provenance is uncertain, regardless of the stated license. Better safe than sorry.

Discuss import with community

It is important to discuss your proposed import with the community at every step. First of all add an entry to Potential Datasources. Here you can briefly describe what you have found out about the licensing of the data, and the data's accuracy with respect to data we already have. If you need more space, link through to a new wiki page about the data source.

Discuss your import on the imports@openstreetmap.org mailing list and with appropriate local communities. Many local communities have their own wiki pages and/or mailing lists. Coordinate with other people with similar plans.

Even if the same or similar import has been discussed before, you should still discuss it with the local community. This means that they are aware of your plans and can raise any issues or clashes before any damage occurs. This is especially true if the data has been available for a long time and has not yet been imported - this does not mean it is acceptable to proceed without discussion with the local community.

Always start by discussing the investigation you have done into licensing and accuracy. If the consensus is that the data doesn't meet our criteria, don't be disappointed. Label it as rejected on the Potential Datasources page, and give the reasons. Documenting such decisions is a helpful contribution in itself. If people are happy with it, move on to discussing implementation of import scripts etc.

Imports related to humanitarian issues, disaster response, or development should consult the HOT (ideally on the HOT Mailing list).

Document your import on the wiki

If you are going ahead with your import, please create a page about it on the wiki, with all the details. Create an entry on the Import/Catalogue page and link from there to your page. Also link to your page from local Mapping Projects pages. The page should have the following details:

  • Datasource accuracy and licensing (also summarised on Potential Datasources)
  • Import/Software you plan to use. Share the source code you are using.
  • Exactly how data will be translated from another format into OSM format
  • How the resulting data will look. Exact tags being used.
  • Link to sample data imported on the test database.
  • User name of the account performing the import, and other details of how the changesets will be tagged

And as the import progresses

  • Link to example data imported on the live database.
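
When documenting exactly how data will be translated into OSM format, a short script on the wiki page can make the conversion concrete. The following is only a minimal sketch, assuming point features that have already been read into Python dictionaries. The function name `features_to_osm_xml`, the generator string, and the input layout are illustrative assumptions, not part of any official tool. New objects are given negative placeholder ids, the convention used in JOSM-style .osm files for objects that have not been uploaded yet.

```python
import xml.etree.ElementTree as ET


def features_to_osm_xml(features):
    """Build an OSM XML string from simple point features.

    Each feature is a dict with 'lat', 'lon' and a 'tags' dict
    (a hypothetical input layout chosen for this sketch).
    """
    root = ET.Element("osm", version="0.6", generator="import-sketch")
    for index, feature in enumerate(features, start=1):
        node = ET.SubElement(
            root,
            "node",
            id=str(-index),  # negative id: object does not exist on the server yet
            lat=f"{feature['lat']:.7f}",
            lon=f"{feature['lon']:.7f}",
        )
        # Sort tags so the output is stable and easy to review in a diff.
        for key, value in sorted(feature["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```

A file produced this way can be opened in an editor and reviewed before anything is uploaded, and it can be linked from the wiki page as sample data.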

Use a dedicated user account

Create a new user for the import. You must not use your standard OSM user account. The user page for the account should be used to collate data relating to the source and contact details for the import. Furthermore, it means that attribution can often be carried in the account's display name, or in the account's user page, which is better than putting it as a tag, as the user's editing history is a permanent record of the source and doesn't interfere with tags or increase the size of the database as much. For these reasons, creating a dedicated user account is preferable to using a source=* tag. For distributed/community imports, have each person make their own import account, for example "your osm user name"_import. It is not required that each import be done under the same user account.

Not complying with this rule is one of the reasons that could lead to your account being temporarily blocked by the DWG.

Consider your tags

Your import should use tags which are familiar to the OSM community, rather than inventing its own set of tags.

You may have some metadata like the IDs used for your original data. If this metadata will be useful to OSM, then define your own prefix and use that on those metadata tags. The TIGER import for instance uses the "tiger:" prefix. The original ID of a TIGER object is tagged as "tiger:tlid".

However, don't go overboard with metadata. OSM is only interested in what is verifiable. This doesn't include (for example) foreign keys from another database, unless those are absolutely necessary for maintaining the data in future. Your data source may have many many fields, but OSM data elements with many many tags can be difficult to work with. Strike a balance. Figure out (discuss!) what fields the OSM community are interested in.
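
The balance described above can be sketched as an explicit mapping table: a few source fields become established OSM tags, one identifier is kept under a prefix, and everything else is dropped. This is a hedged illustration only. The field names, the `example:` prefix, and the function `map_attributes` are hypothetical and should be replaced by whatever the community agrees on.

```python
# Hypothetical mapping from source field names to established OSM keys.
FIELD_TO_OSM_KEY = {
    "STREET_NAME": "name",
    "SURFACE": "surface",
}

# Source metadata worth keeping, stored under a made-up "example:" prefix.
METADATA_FIELDS = {
    "OBJECT_ID": "example:id",
}


def map_attributes(record):
    """Translate one source record (a dict) into a dict of OSM tags."""
    tags = {}
    for field, value in record.items():
        if value is None or str(value).strip() == "":
            continue  # never create empty tags
        text = str(value).strip()
        if field in FIELD_TO_OSM_KEY:
            tags[FIELD_TO_OSM_KEY[field]] = text
        elif field in METADATA_FIELDS:
            tags[METADATA_FIELDS[field]] = text
        # Any other field is dropped on purpose.
    return tags
```

Keeping the mapping in one small table makes it easy to paste into the import's wiki page and to discuss on the mailing list field by field.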

Keep server resources in mind

Make sure you don't overload the server when importing large amounts of data. The TIGER import had to be spread out over several months to not kill the central server! Import the data in small installments or otherwise slow down your import scripts. If in doubt, talk to the System Administrators.
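
Importing in small installments can be sketched as a loop that uploads one batch, then pauses before the next. This is a minimal sketch under stated assumptions: `upload_batch` is a hypothetical callable standing in for whatever upload tool you actually use, and the default batch size and pause are arbitrary placeholders, not values recommended by the System Administrators.

```python
import time


def upload_in_batches(elements, upload_batch, batch_size=500,
                      pause_seconds=30.0, sleep=time.sleep):
    """Upload elements in small batches, pausing between batches.

    upload_batch is a hypothetical callable that sends one list of
    elements; sleep can be swapped out in tests.
    Returns the number of elements handed to upload_batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    uploaded = 0
    for start in range(0, len(elements), batch_size):
        batch = elements[start:start + batch_size]
        upload_batch(batch)
        uploaded += len(batch)
        # Pause only if another batch follows.
        if start + batch_size < len(elements):
            sleep(pause_seconds)
    return uploaded
```

Returning the running count also gives you a simple number to report when you track progress and update the community.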

Don't screw up the data!

This should really go without saying, but don't screw up the OpenStreetMap data! Always think about it from the point of view of ordinary OpenStreetMap contributors working in iD and Potlatch and never assume that those people will happily clear up your mess. If you don't have experience of working in iD and Potlatch yourself, then you shouldn't be performing imports. JOSM tends to be slightly better for untangling messy data, but it's still fiddly. In any case most users (particularly new users) are using iD and Potlatch. Will your data spoil their experience of OpenStreetMap editing? If so, we don't want it.

Do not ignore existing data and import new data over the top. In general it is a bad idea to put data on top of data (see data notes below), but also you must always remember that existing data may be data that a real user cares about and is maintaining. You might try to determine this with automated/semi-automated methods, and treat the existing data accordingly. For example in areas where real users are working, you might decide to leave the existing data alone.

If an import goes wrong, or you needed to interrupt an upload half way through, this should be cleaned up (reverted) immediately. If help is needed, contact Imports and/or Talk. But the import won't go wrong because you tested it carefully on the test database – right?

If you don't know how to revert an import, don't do the import in the first place.

Specific data guidelines

Don't put data on top of data

Unlike traditional GIS systems, OpenStreetMap has no concept of layers. Data on top of data is just a mess. It's a kind of mess which makes it very difficult for real users to work in the normal OpenStreetMap editors. The Duplicate nodes map reveals that some imports have not followed this guideline. (Rogue TIGER importers caused a lot of this, but sadly there are many more recent messed-up imports.)

If your data is in a layered traditional GIS format, you'll need to take a different approach. You could merge the layers and calculate the best aggregate tags. Alternatively, you can always avoid directly importing data, and instead set up a source for users to manually and selectively import from, or a WMS to trace over (like Natural Resources Canada - Toporama).
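
One simple pre-upload check for data on top of data is to look for incoming nodes that land on the same spot as nodes already in OSM. The sketch below is an illustration under assumptions: both inputs are plain lists of (lat, lon) pairs that you have extracted yourself, and the function name `find_overlapping_nodes` is hypothetical. It compares coordinates after rounding, so two nodes that are very close but fall on opposite sides of a rounding boundary will be missed. It does not replace a visual review in an editor.

```python
def find_overlapping_nodes(existing, incoming, precision=6):
    """Return incoming (lat, lon) pairs that coincide with existing nodes.

    Coordinates are compared after rounding to `precision` decimal
    places (6 places is roughly 0.1 m of latitude).
    """
    occupied = {
        (round(lat, precision), round(lon, precision))
        for lat, lon in existing
    }
    return [
        (lat, lon)
        for lat, lon in incoming
        if (round(lat, precision), round(lon, precision)) in occupied
    ]
```

Any pair this returns is a candidate for merging with, or yielding to, the existing object rather than being uploaded as a new one.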

Consider simplifying

Shapefiles often include too many details, i.e. more nodes than necessary to represent curves, or more than two nodes representing a perfectly straight line. You'll see this particularly with large landuse areas that have nodes a few meters apart or appear jagged because the resolution isn't fine enough or is too fine. Tools such as Map Shaper can be used to simplify shapefiles that have too many details. Remember to think about how the data looks and can be worked with in Potlatch.
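
To illustrate what simplification does, here is a minimal sketch of the Douglas-Peucker algorithm, one well-known line simplification method. It is shown only as an example and is not a description of how Map Shaper or any other tool works internally. The sketch treats coordinates as planar x/y values, so the tolerance is in the same units as the input, and the function names are illustrative.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment, e.g. a closed way where a == b.
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop points that deviate from the line by no more than tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, max_index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = _point_segment_distance(points[i], first, last)
        if dist > max_dist:
            max_dist, max_index = dist, i
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(points[:max_index + 1], tolerance)
    right = simplify(points[max_index:], tolerance)
    return left[:-1] + right
```

With this sketch, a perfectly straight run of nodes collapses to its two end points, while a corner that deviates by more than the tolerance is kept.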


See also