Import/Guidelines

From OpenStreetMap Wiki

The import guidelines, along with the Automated Edits code of conduct, should be followed when importing data into the OpenStreetMap database, as they embody many lessons learned throughout the history of OpenStreetMap. Imports should be planned and executed with more care and sensitivity than other edits, because poor imports can have significant impacts on both existing data and the local mapping community. The Data Working Group is tasked by the OSMF to detect and stop imports that do not comply with these guidelines, so not following them may put your account at risk of being blocked.

Imports should not be seen as an alternative to building the mapping community, running mapping parties and generating publicity to engage with more contributors. Of course, all of this is open to discussion, such as on the imports mailing list and this discussion page.

Process

If you think your city/county/state/country government, a non-profit, or some other organization or person has great data that could be used to improve the quality of OpenStreetMap, here's what you need to know. We'll start with a quick overview of the process to get you started on the right path. Most of these areas are expanded in further sections of this page and on related pages.

Step 1 - Prerequisites

  1. Gain familiarity with the basics of OpenStreetMap, including editing, such as adding details of your neighbourhood.
  2. Review what can go wrong with imports.
  3. Identify the data you'd like to import (for example street centerlines, building outlines, waterways, or addresses) and its license requirements.

Step 2 - Community Buy-in

  1. Before any actual work is performed on the import, contact the community to see if there is interest in importing the data. Different geographic areas in OSM have different acceptance levels for imports. The exact same kind of data set might be welcomed in one area and rejected in another.
  2. Discuss your plan. Email the OSM community to notify them of your plans, including a link to your wiki page. You can do this with an email to (at a minimum) imports@openstreetmap.org, talk-(your country)@openstreetmap.org, and the OSM group specific to the area directly impacted by the import. This will help you gain the benefit of past experiences, which may include having already reviewed the data you're considering for import. Check for local user groups, local chapters, and country-specific mailing lists.
  3. Be prepared to answer questions from the community. Discuss with the community the suitability of each layer for importing. Some data can be readily imported without much difficulty, while others are far more difficult (e.g. street centerlines). Also some are broadly accepted for import, while others haven't had much consensus (e.g. parcel boundaries).
  4. More complex and large-scale imports should be reviewed with the assistance of more technically-oriented and experienced OSM volunteers.
  5. You must not import the data without local buy-in.

Step 3 - License approval

  1. You must obtain proper permissions and licenses to use the data in OSM from the data owner. If the license of the data is not compatible with the OSM Open Database License, you cannot use the data. Many localities already have progressive open data policies. Others have data policies that are almost open, but conflict on points such as prohibitions on commercial use or requirements for attribution. Sometimes, getting permission to use data, even if the existing license might seem prohibitive, is as simple as asking the appropriate authority if they are willing to comply with the terms of the OSM Open Database License. See Import/GettingPermission for example emails that touch on important issues.

Step 4 - Documentation

  1. You must register your permissions and project by adding a line to the table at Import/Catalogue.
  2. You must write a plan for your import in the OSM wiki. Create a wiki page outlining the details of your plan. This plan must include information such as how to convert the data to OSM XML, how to divide up the work, how to handle conflation, how to map GIS attributes to OSM tags, how to potentially simplify any data, revert plans, changeset size policies, and plans for quality assurance. An example can be found at Import/Plan Outline.
  3. If required by the data owners, you must add an acknowledgement to the list at Contributors.
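As an illustration of the "convert the data to OSM XML" part of such a plan, here is a minimal sketch using only the Python standard library. The function name, the `generator` string and the input layout are hypothetical. The negative placeholder ids follow the convention, as used in JOSM's .osm files, of marking objects that do not exist on the server yet; treat that as an assumption and check it against the upload tool you actually use.

```python
import xml.etree.ElementTree as ET


def nodes_to_osm_xml(nodes):
    """Build an OSM XML string from a list of (lat, lon, tags) tuples.

    Each node gets a negative placeholder id (-1, -2, ...), an assumption
    meaning "not yet uploaded"; confirm it with your upload tool.
    """
    root = ET.Element("osm", version="0.6", generator="example-import-script")
    for offset, (lat, lon, tags) in enumerate(nodes, start=1):
        node = ET.SubElement(
            root, "node", id=str(-offset), lat=f"{lat:.7f}", lon=f"{lon:.7f}"
        )
        # Sort the tags so that repeated runs produce identical files.
        for key, value in sorted(tags.items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(nodes_to_osm_xml([(51.5, -0.1, {"amenity": "bench"})]))
```

A file produced this way is something to put up for review and to load into the test database, not something to upload directly.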

Step 5 - Import Review

  1. You must post your import for review on the imports@openstreetmap.org mailing list. Don't upload any data until the project has been reviewed.
  2. If possible, prepare the data and make it available for review.

Note that posting to the imports@ mailing list is closed to non-subscribers. If discussion there is genuinely wanted, it should be an open list.

Step 6 - Uploading

  1. Follow your plan.
  2. Track your progress.
  3. Provide updates to the community on your efforts.
  4. Let everyone know when you're done.
  5. You must use a dedicated user account.

Key considerations

Discuss your proposed import

It is important to discuss your proposed import with the community at every step. First of all add an entry to Potential Datasources. Here you can briefly describe what you have found out about the licensing of the data, and the data's accuracy with respect to data we already have. If you need more space, link through to a new wiki page about the data source.

Discuss your import on the imports@openstreetmap.org mailing list and with appropriate local communities. Many local communities have their own wiki pages and/or mailing lists. Coordinate with other people with similar plans.

Even if the same or similar import has been discussed before, you should still discuss it with the local community. This means that they are aware of your plans and can raise any issues or clashes before any damage occurs. This is especially true if the data has been available for a long time and has not yet been imported - this does not mean it is acceptable to proceed without discussion with the local community.

Always start by discussing the investigation you have done into licensing and accuracy. If the consensus is that the data doesn't meet our criteria, don't be disappointed. Label it as rejected on the Potential Datasources page, and give the reasons. Documenting such decisions is a helpful contribution in itself. If people are happy with it, move on to discussing implementation of import scripts etc.

Imports related to humanitarian issues, disaster response, or development should also be discussed with HOT (ideally on the HOT mailing list).

Document your import

If you are going ahead with your import, please create a page about it on the wiki, with all the details. Create an entry on the Import/Catalogue page and link from there to your page. Also link to your page from local Mapping Projects pages. The page should have the following details:

  • Datasource accuracy and licensing (also summarised on Potential Datasources)
  • Import/Software you plan to use. Share the source code you are using.
  • Exactly how data will be translated from another format into OSM format
  • How the resulting data will look. Exact tags being used.
  • Link to sample data imported on the test database.
  • User name of the account performing the import, and other details of how the changesets will be tagged

And as the import progresses

  • Link to example data imported on the live database.

Ensure that the data license is OK

We are only interested in 'free' data. We must be able to release the data with our OpenStreetMap License. Obviously we are allowed to use public domain data sources, of which there are quite a few, but beyond that, it gets more complicated.

OpenStreetMap moved to the Open Database License in September 2012. Your data must therefore be compatible with that. In addition, you must be able to agree to the Contributor Terms for your import account, which includes provisions to relicense under another free and open license if the community wishes it.

You must not claim an additional copyright for yourself as the importer. For example, if you import public domain data, you must not seek to restrict the use of your imported data. Your import account must not refuse any permissions that were given by the original creators of the data you're importing.

Please also note the details of attribution requirement. We can offer some attribution: we can credit them on our website (not on the homepage, but in the Contributors page here on the wiki, and on www.openstreetmap.org/copyright for very large-scale contributions). We can link information about them in relation to the user account performing import edits, meaning the editing history will allow people to trace the source of the data donation. We can also set their name in the 'source' tags of our underlying data. This is perhaps more prominent, but may be removed by editors doing further mapping work. The credit to the "author" stops there. What we certainly cannot do is require end-users of our data/renderings to give credit to the particular data donor. With this in mind, our attribution may not be sufficient legally speaking and might actually be considered unsatisfactory by the original "authors" of the data.

We often find that data that purports to be available under a compatible license has been ultimately derived from sources that we consider to be non-free. For example, although some geodata is available from Wikipedia under a Creative Commons license it is a widely held belief in OSM that some of the data is simply derived from Google Maps, and therefore not actually available under that stated license. In such cases it is an established community norm to not import data whose provenance is uncertain, regardless of the stated license. Better safe than sorry.

Use a dedicated user account

Create a new user for the import. You must not use your standard OSM user account. The user page for the account should be used to collate data relating to the source and contact details for the import. Furthermore, it means that attribution can often be carried in the account's display name, or in the account's user page, which is better than putting it as a tag, as the user's editing history is a permanent record of the source and doesn't interfere with tags or increase the size of the database as much. For these reasons, creating a dedicated user account is preferable to using a source=* tag. For distributed/community imports, have each person make their own import account, for example "your osm user name"_import. It is not required that each import be done under the same user account.

Not complying with this rule is one of the reasons that could lead to your account being temporarily blocked by the DWG.

Use the right tags

Your import should use tags which are familiar to the OSM community, rather than inventing its own set of tags.

You may have some metadata like the IDs used for your original data. If this metadata will be useful to OSM, then define your own prefix and use that on those metadata tags. The TIGER import for instance uses the "tiger:" prefix. The original ID of a TIGER object is tagged as "tiger:tlid".

However, don't go overboard with metadata. OSM is only interested in what is verifiable. This doesn't include (for example) foreign keys from another database, unless those are absolutely necessary for maintaining the data in future. Your data source may have many many fields, but OSM data elements with many many tags can be difficult to work with. Strike a balance. Figure out (discuss!) what fields the OSM community are interested in.
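A minimal sketch of such a field-to-tag translation follows. Every field name, the tag mapping and the `exampleimport` prefix are hypothetical; the real mapping is something to agree on with the community. The point is that only a short, explicit list of fields becomes tags, one piece of metadata gets a prefix in the style of "tiger:", and everything else is left behind.

```python
# Hypothetical mapping from source fields to established OSM keys.
FIELD_TO_TAG = {
    "STREET_NAME": "name",
    "SURFACE": "surface",
}

# Hypothetical prefix and metadata fields worth keeping (compare "tiger:tlid").
METADATA_PREFIX = "exampleimport"
METADATA_FIELDS = {"OBJECT_ID": "id"}


def _clean(value):
    """Return the value as stripped text, or '' when it is missing."""
    return "" if value is None else str(value).strip()


def record_to_tags(record):
    """Translate one source record (a dict) into a small dict of OSM tags."""
    tags = {}
    for field, key in FIELD_TO_TAG.items():
        text = _clean(record.get(field))
        if text:
            tags[key] = text
    for field, suffix in METADATA_FIELDS.items():
        text = _clean(record.get(field))
        if text:
            tags[f"{METADATA_PREFIX}:{suffix}"] = text
    # All other source fields (foreign keys and so on) are dropped on purpose.
    return tags


if __name__ == "__main__":
    print(record_to_tags({"STREET_NAME": " Main Street ", "OBJECT_ID": 42, "FK_PARCEL": 9}))
```

Keeping the mapping in one small table also makes it easy to paste into the import's wiki page for discussion.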

Don't put data on top of data

Unlike traditional GIS systems, OpenStreetMap has no concept of layers. Data on top of data is just a mess, and it is the kind of mess that makes it very difficult for real users to work in the normal OpenStreetMap editors. The Duplicate nodes map reveals imports that have not followed this guideline. (Rogue TIGER importers caused a lot of this, but sadly there are many more recent messed-up imports.)

If your data is in a layered traditional GIS format, you'll need to take a different approach. One option is to merge the layers and calculate the best aggregate tags. You can also avoid directly importing the data at all, and instead set up a source for users to manually and selectively import from, or a WMS to trace over (like Natural Resources Canada - Toporama).
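One simple pre-upload check for this problem is to look for incoming nodes that would land exactly on top of nodes already in the database. The sketch below is only an illustration: the function name and input layout are hypothetical, and comparing coordinates rounded to 7 decimal places is an assumption about the precision worth checking. It catches exact stacking only; nodes that are merely very close need a proper spatial comparison.

```python
from collections import defaultdict


def find_stacked_nodes(existing, incoming, precision=7):
    """Map each incoming node id to the existing node ids at the same position.

    existing, incoming: dicts of node id -> (lat, lon).
    Positions are compared after rounding to `precision` decimal places.
    """
    occupied = defaultdict(list)
    for node_id, (lat, lon) in existing.items():
        occupied[(round(lat, precision), round(lon, precision))].append(node_id)

    clashes = {}
    for node_id, (lat, lon) in incoming.items():
        key = (round(lat, precision), round(lon, precision))
        if key in occupied:  # membership test does not create new entries
            clashes[node_id] = occupied[key]
    return clashes


if __name__ == "__main__":
    existing = {1: (51.5, -0.1), 2: (51.6, -0.2)}
    incoming = {-1: (51.5, -0.1), -2: (51.7, -0.3)}
    print(find_stacked_nodes(existing, incoming))
```

Any clash reported this way is a prompt to merge with or skip the existing object, not to upload a second copy.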

Consider simplifying

Shapefiles often include too many details, i.e. more nodes than necessary to represent curves, or more than two nodes representing a perfectly straight line. You'll see this particularly with large landuse areas that have nodes a few meters apart or appear jagged because the resolution isn't fine enough or is too fine. Tools such as Map Shaper can be used to simplify shapefiles that have too many details. Remember to think about how the data looks and can be worked with in Potlatch.
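To make the idea concrete, here is a minimal sketch of the Douglas-Peucker algorithm, one common line simplification technique (chosen here for illustration, not necessarily what any particular tool uses). It assumes the points are already in a projected coordinate system so that the tolerance is in metres; applying it to raw latitude/longitude degrees would distort distances. The function names are hypothetical.

```python
import math


def _distance_to_segment(p, a, b):
    """Distance from point p to the segment a-b; all are (x, y) tuples."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop nodes that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    # Keep the farthest node and simplify both halves around it.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


if __name__ == "__main__":
    print(simplify([(0, 0), (1, 0), (2, 0), (3, 0), (3, 5)], 0.1))
```

In the usage line, the two intermediate nodes on the perfectly straight stretch are removed while the corner is kept. The right tolerance depends on the data and is worth discussing as part of the import plan.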

Keep server resources in mind

Make sure you don't overload the server when importing large amounts of data. The TIGER import had to be spread out over several months to not kill the central server! Import the data in small installments or otherwise slow down your import scripts. If in doubt, talk to the System Administrators.
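A sketch of what "small installments" can look like in an import script follows. The `upload` callable is a hypothetical stand-in for whatever actually sends one batch to the server, and the batch size and pause are arbitrary placeholder values, not recommended limits; agree on real numbers in your plan or with the System Administrators.

```python
import time


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_slowly(items, upload, size=500, pause_seconds=30.0, sleep=time.sleep):
    """Call the hypothetical `upload(batch)` once per batch, pausing in between.

    `size` and `pause_seconds` are placeholder values.
    Returns the number of items handed to `upload`.
    """
    done = 0
    for batch in batches(items, size):
        upload(batch)
        done += len(batch)
        if done < len(items):
            sleep(pause_seconds)  # no pause needed after the final batch
    return done


if __name__ == "__main__":
    sent = []
    upload_slowly(list(range(7)), sent.append, size=3, pause_seconds=0.1)
    print(sent)
```

Passing `sleep` as a parameter is only a convenience so the pacing logic can be tested without actually waiting.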

Don't screw up the data!

This should really go without saying, but don't screw up the OpenStreetMap data! Always think about it from the point of view of ordinary OpenStreetMap contributors working in iD and Potlatch and never assume that those people will happily clear up your mess. If you don't have experience of working in iD and Potlatch yourself, then you shouldn't be performing imports. JOSM tends to be slightly better for untangling messy data, but it's still fiddly. In any case most users (particularly new users) are using iD and Potlatch. Will your data spoil their experience of OpenStreetMap editing? If so, we don't want it.

Do not ignore existing data and import new data over the top. In general it is a bad idea to put data on top of data (see "Don't put data on top of data" above), and you must always remember that existing data may be data that a real user cares about and is maintaining. You might try to determine this with automated or semi-automated methods, and treat the existing data accordingly. For example, in areas where real users are working, you might decide to leave the existing data alone.

If an import goes wrong, or you need to interrupt an upload halfway through, it should be cleaned up (reverted) immediately. If help is needed, contact Imports and/or Talk. But the import won't go wrong because you tested it carefully on the test database, right?

If you don't know how to revert an import, don't do the import in the first place.

See also