Imports have the potential to introduce significant problems into the OSM database and should be considered thoroughly.
These guidelines are intended to help people that are interested in importing data into the OpenStreetMap database while at the same time protecting the data contained in the OSM database. Many OSM activities support a wide degree of latitude in end-user discretion, but imports are a bit more sensitive and require more careful planning.
These guidelines are not definitive; they are not laws. As such, following them is no guarantee that an import will be acceptable, and not following them does not mean there will be problems with an import. There are no hard and fast rules: there is no rule that says "if you do this, then you will definitely get blocked", and there is no rule that says "if you do this, then you will definitely not get blocked". However, these guidelines embody many lessons learned throughout the history of OpenStreetMap and should be reviewed by anyone interested in importing data while protecting the existing OSM database.
The Data Working Group (DWG) is tasked by the OSMF to detect and stop imports that do not comply with guidelines. So, not following these guidelines may put your account at risk of being blocked (http://www.openstreetmap.org/user_blocks).
If you think your city/county/state/country government, a non-profit, or some other organization or person has great data that could be used to improve the quality of OpenStreetMap, you've come to the right place!
Please keep in mind that the current OSM database represents an extremely large amount of work by the volunteers of the OSM community. Because imports usually involve the mass input of large amounts of data, whether through an automated process by one person or a carefully curated process by a team of people, there's an increased risk of larger-scale damage to the database. Hence, it is critical that all imports are approached with caution and the proper amount of planning.
Here's a quick checklist to get you started on the right path. Most of these areas are expanded in further sections of this page and on related pages. The first few steps aren't very technical, so any motivated person can begin this process, and this wiki should provide guidance on the more technical areas of less-sophisticated imports. More complex and large-scale imports should be reviewed with the assistance of more technically-oriented and experienced OSM volunteers.
- Gain familiarity with the basics of OpenStreetMap, including editing, such as adding details of your neighbourhood. Consider following the beginners' guide.
- Learn about the history of imports and the concerns surrounding them. See: Import/Past Problems
- Identify data you'd like to import. This might be street centerlines, building outlines, waterways, addresses, etc.
- Obtain proper permissions and licenses to use the data in OSM from the data owner. If the license of the data is not compatible with the OSM Open Database License, you cannot use the data. Many localities already have progressive open data policies. Others have data policies that are almost open, but conflict with OSM on issues like prohibitions on commercial use or requirements for attribution. Sometimes, getting permission to use data, even if the existing license might seem prohibitive, is as simple as asking the appropriate authority if they are willing to comply with the terms of the OSM Open Database License. See Import/GettingPermission for example emails that touch on important issues.
- Register your permissions and project by adding a line to the table at Import/Catalogue.
- Cite contributions by the data owners, if necessary to cite them, at: Contributors.
- Write a plan for your import. Create a wiki page outlining the details of your plan. The plan should cover how to convert the data to OSM XML, how to divide up the work, how to handle conflation, how to map GIS attributes to OSM tags, how to simplify the data if needed, how to revert, changeset size policies, and quality assurance. An example can be found at Import/Plan Outline.
- Discuss your plan. Email the OSM community to notify them of your plans, including a link to your wiki page. You can do this with an email to (at a minimum) firstname.lastname@example.org, talk-(your country)@openstreetmap.org, and the OSM group specific to the area directly impacted by the import. This will help you gain the benefit of past experiences, which may include having already reviewed the data you're considering for import. Check for local user groups, local chapters, and country-specific mailing lists.
- Be prepared to answer questions from the community. Discuss with the community the suitability of each layer for importing. Some data can be readily imported without much difficulty, while others are far more difficult (e.g. street centerlines). Also some are broadly accepted for import, while others haven't had much consensus (e.g. parcel boundaries).
- Follow your plan.
- Track your progress.
- Provide updates to the community on your efforts.
- Let everyone know when you're done.
Create a community
OpenStreetMap is all about building a great map by attracting a large community of mappers. While data imports can help with improving coverage rapidly, simulations suggest that imported data can cause problems with the growth of a community. It's actually far more important to go out and run lots of mapping parties, get lots of publicity out there, and get local people on the ground.
Make sure data license is OK
We are only interested in 'free' data. We must be able to release the data with our OpenStreetMap License. Obviously we are allowed to use public domain data sources, of which there are quite a few, but beyond that, it gets more complicated.
OpenStreetMap moved to the Open Database License in September 2012. Your data must therefore be compatible with that. In addition, you must be able to agree to the Contributor Terms for your import account, which includes provisions to relicense under another free and open license if the community wishes it.
You must not claim an additional copyright for yourself as the importer. For example, if you import public domain data, you must not seek to restrict the use of your imported data. Your import account must not refuse any permissions that were given by the original creators of the data you're importing.
Please also note the details of the attribution requirements. We can offer some attribution: we can credit the data owners on our website (not on the homepage, but in the Contributors page here on the wiki, and on www.openstreetmap.org/copyright for very large-scale contributions). We can link information about them to the user account performing import edits, meaning the editing history will allow people to trace the source of the data donation. We can also set their name in the 'source' tags of our underlying data. This is perhaps more prominent, but may be removed by editors doing further mapping work. The credit to the "author" stops there. What we certainly cannot do is require end-users of our data/renderings to give credit to the particular data donor. With this in mind, our attribution may not be sufficient legally speaking and might actually be considered unsatisfactory by the original "authors" of the data.
We often find that data that purports to be available under a compatible license has been ultimately derived from sources that we consider to be non-free. For example, although some geodata is available from Wikipedia under a Creative Commons license, it is a widely held belief in OSM that some of the data is simply derived from Google Maps, and therefore not actually available under that stated license. In such cases it is an established community norm to not import data whose provenance is uncertain, regardless of the stated license. Better safe than sorry.
Discuss import with community
It is important to discuss your proposed import with the community at every step. First of all add an entry to Potential Datasources. Here you can briefly describe what you have found out about the licensing of the data, and the data's accuracy with respect to data we already have. If you need more space, link through to a new wiki page about the data source.
Discuss your import on the email@example.com mailing list and with appropriate local communities. Many local communities have their own wiki pages and/or a mailing list. Coordinate with other people with similar plans.
Even if the same or similar import has been discussed before, you should still discuss it with the local community. This means that they are aware of your plans and can raise any issues or clashes before any damage occurs. This is especially true if the data has been available for a long time and has not yet been imported - this does not mean it is acceptable to proceed without discussion with the local community.
Always start by discussing the investigation you have done into licensing and accuracy. If the consensus is that the data doesn't meet our criteria, don't be disappointed. Label it as rejected on the Potential Datasources page, and give the reasons. Documenting such decisions is a helpful contribution in itself. If people are happy with it, move on to discussing implementation of import scripts etc.
Document your import on the wiki
If you are going ahead with your import, please create a page about it on the wiki, with all the details. Create an entry on the Import/Catalogue page and link from there to your page. Also link to your page from local Mapping Projects pages. The page should have the following details:
- Datasource accuracy and licensing (also summarised on Potential Datasources)
- Import/Software you plan to use. Share the source code you are using.
- Exactly how data will be translated from another format into OSM format
- How the resulting data will look. Exact tags being used.
- Link to sample data imported on the test database.
- User name of the account performing the import, and other details of how the changesets will be tagged
And as the import progresses
- Link to example data imported on the live database.
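The translation step that the wiki page should document can be sketched as a small function. This is only an illustration: the source field names (`ROAD_CLASS`, `STREET_NAME`) and the class codes are hypothetical, and the right mapping for a real dataset is something to agree with the community first.

```python
# Hypothetical mapping from a source dataset's road class codes to OSM
# highway values. Real datasets will need their own, discussed, mapping.
ROAD_CLASS_TO_HIGHWAY = {
    "A": "primary",
    "B": "secondary",
    "L": "residential",
}


def translate(attrs):
    """Translate one source record (a dict of GIS attributes) into OSM tags.

    Returns None when the record cannot be mapped, so that it can be
    reviewed by hand instead of being imported with guessed tags.
    """
    highway = ROAD_CLASS_TO_HIGHWAY.get(attrs.get("ROAD_CLASS"))
    if highway is None:
        return None
    tags = {"highway": highway}
    name = (attrs.get("STREET_NAME") or "").strip()
    if name:
        tags["name"] = name
    return tags
```

Publishing a table like `ROAD_CLASS_TO_HIGHWAY` on the wiki page, next to the code, makes it easy for others to check the exact tags being used.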
Use a dedicated user account
Create a new user for the import. You must not use your standard OSM user account. The user page for the account should be used to collate data relating to the source and contact details for the import. Furthermore, it means that attribution can often be carried in the account's display name, or in the account's user page, which is better than putting it as a tag, as the user's editing history is a permanent record of the source and doesn't interfere with tags or increase the size of the database as much. For these reasons, creating a dedicated user account is preferable to using a source=* tag. For distributed/community imports, have each person make their own import account, for example "your osm user name"_import. It is not required that each import be done under the same user account.
Not complying with this rule is one of the reasons that could lead to your account being temporarily blocked by the Data Working Group (DWG).
Define your own tag prefix
You probably have some meta data like the IDs used for your original data. Define your own prefix and use that on all the tags for this data. The TIGER import for instance uses the "tiger:" prefix. The original ID of a TIGER object is tagged as "tiger:tlid".
Don't go overboard with meta-data. Your data source may have many, many fields, but OSM data elements with many, many tags can be difficult to work with. Strike a balance. Figure out (discuss!) which fields the OSM community is interested in.
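As a rough sketch of the two points above (a common prefix, and only a selected subset of fields), something like the following could be used. The function name, the sample record, and the choice of fields to keep are assumptions for illustration; only the "tiger:tlid" tag comes from the TIGER example.

```python
def prefixed_tags(record, prefix, keep):
    """Return source metadata as OSM tags under a common prefix.

    Only the fields listed in `keep` are carried over, and empty values
    are dropped, so elements do not end up with dozens of unused tags.
    """
    tags = {}
    for field in keep:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            continue
        tags[f"{prefix}:{field.lower()}"] = str(value).strip()
    return tags


# Hypothetical source record: only TLID is kept, the rest is left out.
record = {"TLID": 12345, "INTERNAL_FLAG": "x", "NOTES": ""}
print(prefixed_tags(record, "tiger", keep=["TLID", "NOTES"]))
```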
Keep server resources in mind
Make sure you don't overload the server when importing large amounts of data. The TIGER import had to be spread out over several months to not kill the central server! Import the data in small installments or otherwise slow down your import scripts. If in doubt, talk to the System Administrators.
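One way to import in small installments is to split the work into batches and pause between them. The sketch below assumes a hypothetical `upload` callable that sends one batch; it does not talk to the OSM API itself, and the batch size and pause are placeholder values to be agreed with the community or the System Administrators.

```python
import time


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_installments(items, upload, batch_size=500,
                           pause_seconds=30.0, sleep=time.sleep):
    """Send `items` in small batches, pausing between batches.

    `upload` is a hypothetical function that uploads one batch.
    `sleep` can be replaced in tests to avoid real waiting.
    Returns the number of items handed to `upload`.
    """
    uploaded = 0
    for index, batch in enumerate(batches(items, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # give the server room between batches
        upload(batch)
        uploaded += len(batch)
    return uploaded
```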
Don't screw up the data!
This should really go without saying, but don't screw up the OpenStreetMap data! Always think about it from the point of view of ordinary OpenStreetMap contributors working in Potlatch and never assume that those people will happily clear up your mess. If you don't have experience of working in Potlatch yourself, then you shouldn't be performing imports. JOSM tends to be slightly better for untangling messy data, but it's still fiddly. In any case most users (particularly new users) are using Potlatch. Will your data spoil their experience of OpenStreetMap editing? If so, we don't want it.
Do not ignore existing data and import new data over the top. In general it is a bad idea to put data on top of data (see data notes below), but also you must always remember that existing data may be data that a real user cares about and is maintaining. You might try to determine this with automated/semi-automated methods, and treat the existing data accordingly. For example in areas where real users are working, you might decide to leave the existing data alone.
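A very coarse, semi-automated check along these lines is to divide the area into grid cells and set aside any import feature that falls in a cell where data already exists. This is only a heuristic sketch, not real conflation: the cell size, the function names, and the representation of features as lists of (lat, lon) pairs are all assumptions.

```python
import math


def cell(lat, lon, size_deg=0.01):
    """Return the grid cell (row, column) that contains a coordinate."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def split_by_existing(candidates, existing_points, size_deg=0.01):
    """Separate import candidates from those that need manual review.

    `candidates` is a list of features, each a list of (lat, lon) pairs.
    `existing_points` is a list of (lat, lon) pairs already in OSM.
    A candidate goes to review if it shares a grid cell with existing data.
    """
    occupied = {cell(lat, lon, size_deg) for lat, lon in existing_points}
    to_import, to_review = [], []
    for feature in candidates:
        cells = {cell(lat, lon, size_deg) for lat, lon in feature}
        if cells & occupied:
            to_review.append(feature)
        else:
            to_import.append(feature)
    return to_import, to_review
```

Features in the review list would then be looked at by a person, who can decide whether to merge them with the existing data or leave that data alone.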
If an import goes wrong, or you needed to interrupt an upload half way through, this should be cleaned up (reverted) immediately. If help is needed, contact Imports and/or Talk. But the import won't go wrong because you tested it carefully on the test database – right?
If you don't know how to revert an import, don't do the import in the first place.
Specific data guidelines
Don't put data on top of data
Unlike traditional GIS systems, OpenStreetMap has no concept of layers. Data on top of data is just a mess. It's a kind of mess which makes it very difficult for real users to work in the normal OpenStreetMap editors. The Duplicate nodes map reveals that imports have not always followed this guideline. (Rogue TIGER importers caused a lot of this, but sadly there are many more recent messed-up imports.)
If your data is in a layered traditional GIS format, you'll need to take a different approach. Perhaps merge the layers and calculate the best aggregate tags. Alternatively, you can avoid directly importing data, and instead set up a source for users to manually and selectively import from, or a WMS to trace over (like Natural Resources Canada - Toporama).
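Before uploading, it is possible to check a prepared dataset for nodes that sit exactly on top of each other. The sketch below assumes nodes are given as (id, lat, lon) tuples and compares positions rounded to 7 decimal places; the function name and input format are illustrative only.

```python
from collections import defaultdict


def find_duplicate_nodes(nodes, precision=7):
    """Group node ids that share the same position.

    `nodes` is an iterable of (node_id, lat, lon) tuples. Positions are
    compared after rounding to `precision` decimal places.
    Returns a list of id lists, one per position used by several nodes.
    """
    by_position = defaultdict(list)
    for node_id, lat, lon in nodes:
        key = (round(lat, precision), round(lon, precision))
        by_position[key].append(node_id)
    return [ids for ids in by_position.values() if len(ids) > 1]
```

Some duplicates are legitimate candidates for merging (for example where two ways should share a node), so the result is a list to review, not a list to delete.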
Shapefiles often include too many details, i.e. more nodes than necessary to represent curves, or more than two nodes representing a perfectly straight line. You'll see this particularly with large landuse areas that have nodes a few meters apart or appear jagged because the resolution isn't fine enough or is too fine. Tools such as Map Shaper can be used to simplify shapefiles that have too many details. Remember to think about how the data looks and can be worked with in Potlatch.
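To illustrate what simplification does, here is a minimal sketch of the Ramer-Douglas-Peucker algorithm, one common line simplification technique. It is not necessarily what any particular tool uses, and it treats coordinates as planar x/y values, which is an assumption that only holds for projected data or small areas; the tolerance is in the same units as the coordinates.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves around it.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

With this sketch, the extra nodes on a perfectly straight line are removed, while a node that marks a real corner is kept as long as it lies farther from the line than the tolerance.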