Additional Import Guidelines and Tips for New Members

Tips for New Importers

This is general guidance on what needs to be done to import data into OpenStreetMap. Every import is different, and this does not attempt to explain all circumstances.

Where to get help

Your first ports of call for getting help with this process are:

• The Import Guidance on the OpenStreetMap wiki: http://wiki.openstreetmap.org/wiki/Import

• The OSM Foundation Import Support WG mailing list: http://wiki.openstreetmap.org/wiki/Foundation/Import_Support_Working_Group

Before undertaking an import, there are a number of considerations to take into account.

Licensing

The OpenStreetMap project operates under strict licensing rules. At the time of writing all data must be compatible with the Creative Commons CC-BY-SA 2.0 licence, although the OSM Foundation is considering a possible re-licensing under a different licence. The data that you wish to import must be compatible with the OpenStreetMap licence.

Some examples of compatible licenses are:

• You own the copyright. It is data that you have collected yourself, or you are acting on behalf of a company that owns all relevant rights (copyright, database rights etc.).

• Public Domain. The data is not under any form of copyright or similar. Products of the US Federal Government are usually PD. Note that "in the public domain" is used synonymously with "publicly available" in the UK, which doesn't mean the same thing.

• Compatibly Licensed. Either CC-BY-SA or a similar licence that is completely compatible. Note that strict requirements for attribution are rarely possible in a global project. In any case, any uncertainties in licensing can be clarified on the OSM Legal-Talk mailing list. See http://wiki.openstreetmap.org/wiki/Mailing_Lists

Appropriateness

OSM is a project to map the world, and so the type of data you are importing is important. OSM has guidelines such as the "on the ground rule", "verifiability" and so on to guide volunteers. It is suggested here that data which cannot be verified (or independently collected) by a sufficiently expert volunteer is not appropriate for the project. Examples include non-physical or hidden features such as buried pipelines and air corridors. Typical GIS features such as roads, schools etc. are of course appropriate.

Ownership

OSM is not a geodata mirror that collects third-party data sources for easy access. When something is imported, the project takes ownership of the data. If whoever created the data set you are about to import will not continue to maintain it, or if there is reason to believe that the OSM community will do a much better job at maintaining it than the original producer, then OSM can take ownership of the data and incorporate it. If, on the other hand, the data set is actively maintained at the source (e.g. a government data set where a new edition is published every year), then it is likely that OSM will only ever be a second-rate mirror of that data set - don't import such data.

Quality

OpenStreetMap has a high accuracy requirement of around 3-5m (approx. 1:6,000 map scale or better). It also has a high recency requirement, so data should be accurate as of the current date, and attributes should be correct to a high standard. Although the volunteers in the project will maintain imported data, it is not considered best practice to import data that is lower in resolution, timeliness or attribute quality than would otherwise be collected by a skilled volunteer.

Duplication

Imported data must not duplicate data already in OpenStreetMap. OSM has a strict "one feature per real world object" rule, so it is inappropriate to import e.g. a second copy of the road network. At the same time it is worth bearing in mind that the project may convert data between representations to improve the quality. For example, many schools are initially mapped as point features but later converted to areas, at which stage the point feature is removed.

Data Format

OpenStreetMap has its own XML data format (.osm, currently version 0.6) which is required for importing data into the system. The data primitives are known as nodes, ways and relations. Simple areas are stored as closed-loop ways. Areas-with-holes ("Multipolygons" in OSM lingo) are represented as ways and relations. Details can be found at http://wiki.openstreetmap.org/wiki/Data_Primitives
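As a rough illustration of the primitives described above, the following Python sketch builds a minimal .osm document containing one simple area stored as a closed-loop way. The function name, the sample coordinates, the `generator` value and the use of negative placeholder ids for objects that have not been uploaded yet are all assumptions made for this example; check real output against the Data Primitives page and an editor such as JOSM.

```python
import xml.etree.ElementTree as ET


def build_osm_area(ring, tags):
    """Return an <osm> element holding one closed way for a simple area.

    ring: list of (lat, lon) tuples, without repeating the first point.
    tags: dict of key-value pairs to place on the way.
    """
    root = ET.Element("osm", version="0.6", generator="import-sketch")
    node_ids = []
    # Assumption: not-yet-uploaded objects carry negative placeholder ids.
    for i, (lat, lon) in enumerate(ring, start=1):
        node_id = -i
        ET.SubElement(root, "node", id=str(node_id),
                      lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        node_ids.append(node_id)
    way = ET.SubElement(root, "way", id=str(-(len(ring) + 1)))
    # Close the loop: the final <nd> references the first node again.
    for node_id in node_ids + node_ids[:1]:
        ET.SubElement(way, "nd", ref=str(node_id))
    for key, value in tags.items():
        ET.SubElement(way, "tag", k=key, v=value)
    return root


# Hypothetical school outline, used only to exercise the function.
school = build_osm_area(
    [(51.5000, -0.1000), (51.5000, -0.0990),
     (51.5010, -0.0990), (51.5010, -0.1000)],
    {"amenity": "school", "name": "Example School"},
)
print(ET.tostring(school, encoding="unicode"))
```

An area with holes would additionally need a relation tying outer and inner ways together, which this sketch does not attempt.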

Shapefiles cannot be imported directly; they must first be converted to .osm format. There are many utilities for converting geodata to .osm format, see http://wiki.openstreetmap.org/wiki/Converting_map_data_between_formats

At the time of writing, there are multiple programs for converting shapefiles to .osm format for imports. See http://wiki.openstreetmap.org/wiki/Import/Shapefile

Attributes

Attributes in OSM are stored in the form of key-value pairs ("tags"). There is no fixed ontology, so people unfamiliar with OSM tagging should seek advice from experienced members of the community.

Your source data attributes will need converting into appropriate tags. Meta-data (such as survey dates) are often inappropriate. You should consider the objects you are importing as assistance to the community, and they may be edited many times, split, merged and otherwise rearranged. Meta-data such as attribution can be stored, if necessary, on the changeset information (the OSM meta-data for the initial import).
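The conversion of source attributes into tags might look something like the sketch below. The source field names (`ROAD_CLASS`, `FULL_NAME`, `SURVEY_DATE`), the class-to-`highway` mapping and the changeset tag values are hypothetical and chosen only for illustration; a real mapping should be agreed with experienced mappers.

```python
# Hypothetical mapping from a source road classification to highway=* values.
ROAD_CLASS_TO_HIGHWAY = {
    "LOCAL": "residential",
    "COLLECTOR": "tertiary",
    "ARTERIAL": "secondary",
}

# Attribution kept once on the changeset rather than on every object.
# The values are placeholders, not a real data source.
CHANGESET_TAGS = {
    "comment": "Example county road import",
    "source": "Example County GIS",
}


def to_tags(record):
    """Convert one source record (a dict) into OSM tags for the object."""
    tags = {}
    highway = ROAD_CLASS_TO_HIGHWAY.get(record.get("ROAD_CLASS"))
    if highway:
        tags["highway"] = highway
    name = (record.get("FULL_NAME") or "").strip()
    if name:
        tags["name"] = name
    # Meta-data such as SURVEY_DATE is deliberately not copied to the object.
    return tags


example = {"ROAD_CLASS": "LOCAL", "FULL_NAME": " Main Street ",
           "SURVEY_DATE": "2009-05-01"}
print(to_tags(example))
```

Records whose classification is not in the mapping come back without a `highway` tag, so they can be reviewed by hand instead of being guessed at.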

Connectivity

OpenStreetMap is a topological database suitable for routing. Much GIS data is not, and is stored as individual linestrings. It is critically important that the imported data is properly connected, i.e. nodes are reused at junctions between every way meeting at that junction. This is usually taken care of by the conversion program, but it is important to check.
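One simple way to check or reproduce this node reuse is to key nodes by their coordinates, so that linestrings sharing a vertex also share a node. The sketch below assumes junctions appear as identical vertices in the source data; the function name and the rounding precision are choices made for this example, not part of any particular conversion program.

```python
def linestrings_to_ways(linestrings, precision=7):
    """Convert linestrings into (nodes, ways) with nodes shared by coordinate.

    linestrings: list of lists of (lat, lon) tuples.
    Returns a dict of node id -> (lat, lon) and a list of node-id lists.
    """
    node_id_by_coord = {}
    ways = []
    for line in linestrings:
        refs = []
        for lat, lon in line:
            key = (round(lat, precision), round(lon, precision))
            if key not in node_id_by_coord:
                # Negative placeholder ids for nodes not yet uploaded.
                node_id_by_coord[key] = -(len(node_id_by_coord) + 1)
            refs.append(node_id_by_coord[key])
        ways.append(refs)
    nodes = {node_id: coord for coord, node_id in node_id_by_coord.items()}
    return nodes, ways


# Two roads meeting at (0.0, 1.0) end up sharing a single node.
nodes, ways = linestrings_to_ways([
    [(0.0, 0.0), (0.0, 1.0)],
    [(0.0, 1.0), (1.0, 1.0)],
])
print(len(nodes), ways)
```

This only joins ways at coincident vertices. Roads that cross without a shared vertex stay unconnected, and features such as bridges may share a coordinate without forming a real junction, so the result still needs a manual check.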

Import

When the data is in the OSM format, there are a number of stages of review to complete before upload. If you are uncertain of anything, then seek advice from the OSMF Import Support WG.

Add to the import catalogue

In order to keep track of information regarding imports, OSM keeps a record of imports on the wiki. You should add details regarding your import, including contact details etc. at http://wiki.openstreetmap.org/wiki/Import/Catalogue

Check the output format

You should make available both the conversion scripts and the example output files (i.e. your converted files in .osm format). These can then be scrutinised by members of the community. When they are available for download, document them on the wiki and contact the Import Support WG and ask for a member of the community to check that the conversion process has happened properly. You can also examine the output yourself using an OSM editor that can load standalone .osm files (e.g. JOSM).

Confirm with local mappers

Using the relevant regional OSM mailing list or by contacting individual mappers through the OSM messaging system, you should confirm with existing mappers in the area of your import that they are happy to see it occur. To get a list of mappers for an area, you may find the OSM Mapper product from ITOWorld useful. See http://wiki.openstreetmap.org/wiki/OSM_Mapper for details.

Break data into manageable chunks

The data should be broken into manageable chunks, usually by geographic area. For example, US counties are an appropriate size. This allows test areas to be imported with a known geographic extent. Smaller imports (e.g. university campuses) can be kept together as one import.
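Where the source data carries no convenient administrative attribute such as a county, one possible fallback is to chunk point features by a fixed latitude/longitude grid. The sketch below is only an illustration: the `lat`/`lon` field names and the 0.25 degree cell size are assumptions, and a real import would pick a chunk size that suits the data density.

```python
import math
from collections import defaultdict


def chunk_by_grid(features, cell_deg=0.25):
    """Group features into grid cells of cell_deg degrees on each side.

    features: iterable of dicts with 'lat' and 'lon' keys (assumed layout).
    Returns a dict mapping (row, column) cell indices to lists of features.
    """
    chunks = defaultdict(list)
    for feature in features:
        cell = (math.floor(feature["lat"] / cell_deg),
                math.floor(feature["lon"] / cell_deg))
        chunks[cell].append(feature)
    return dict(chunks)


# Hypothetical points: the first two fall in the same cell.
sample = [
    {"lat": 40.1, "lon": -75.1},
    {"lat": 40.2, "lon": -75.2},
    {"lat": 41.0, "lon": -75.1},
]
chunks = chunk_by_grid(sample)
for cell, members in sorted(chunks.items()):
    print(cell, len(members))
```

Each cell has a known geographic extent, so a single cell can serve as a test area before the rest is uploaded.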

Consider the existing data

It is likely that there will be existing data in OSM in the area of interest. In these cases there may be overlap between the data you wish to import and the existing data. Importing data without regard for existing data is known as a "blind import" and is heavily discouraged.

The recommended course of action is to only import data that does not have a corresponding feature already in OSM. This will necessitate removing features from the import before it is uploaded.
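For point features, removing candidates that already have a counterpart in OSM could be approximated as below. This is a minimal sketch under stated assumptions: features are dicts with hypothetical `kind` and `pos` keys, a match means the same kind within a chosen distance, and the 50 m threshold is arbitrary. Matched candidates are returned separately so that they can be reviewed by hand rather than silently discarded.

```python
import math

EARTH_RADIUS_M = 6371000.0


def distance_m(a, b):
    """Great-circle (haversine) distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def split_new_and_matched(candidates, existing, max_distance_m=50.0):
    """Separate candidates with no nearby OSM counterpart from those with one."""
    new, matched = [], []
    for cand in candidates:
        has_match = any(
            cand["kind"] == feat["kind"]
            and distance_m(cand["pos"], feat["pos"]) <= max_distance_m
            for feat in existing
        )
        (matched if has_match else new).append(cand)
    return new, matched


# Hypothetical data: one school is already mapped in OSM.
existing = [{"kind": "school", "pos": (51.5000, -0.1000)}]
candidates = [
    {"kind": "school", "pos": (51.5001, -0.1000)},  # close to the mapped school
    {"kind": "school", "pos": (51.6000, -0.1000)},  # far away
    {"kind": "pub", "pos": (51.5000, -0.1000)},     # same place, different kind
]
new, matched = split_new_and_matched(candidates, existing)
print(len(new), len(matched))
```

The comparison is a simple pairwise loop, which is adequate for one chunk at a time; linear features and areas need a more careful geometric comparison than this.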

In rare circumstances volunteer data may be deleted and replaced with imports. This must only be done with the agreement of the volunteers concerned.

Be prepared to back-out ("revert") incorrect imports

It is recommended that you are aware of the procedures to revert data if there is a problem with the import. Working with experienced members of the community is recommended, since reverting is, as of the time of writing, not the easiest thing to achieve in OSM.

Be prepared to fix-up errors and improve the data

There is an expectation in OSM that imported data is meant to be enhanced and improved after initial import. It is recommended here that you should lead this effort, perhaps by adding additional data or improving geometries using the aerial imagery or resurveying data. In any case, you should remain subscribed to the mailing lists in order to answer subsequent questions.

Use a dedicated account for imports

It is recommended to use a separate user account while uploading the data. This allows it to be more easily identified. You will need to register with the OSM website again.

Run scripts or use editing software to import the data

When all the preparations have been completed, it's time to upload the .osm file(s) to the server. If the data covers a small area, it is best to use an editor (e.g. JOSM) to download the existing data and merge in the import. This can then be saved to the main server. Larger imports will need scripts to handle the size of the data. For details on the most commonly used script see http://wiki.openstreetmap.org/wiki/Bulk_upload.py

Guidelines for Data Import Sources

This page gives guidelines for what types of data sources are useful to OpenStreetMap. These guidelines are initially targeted at the USA (contiguous states).

Road Data - routing

One of the key components that is missing from OSM is high-quality routing graph information such as addressing, turn restrictions, speed limits, and so on. This is often not collected by GIS departments since it does not affect spatial matters such as landuse, maps etc. So far there do not appear to be any public domain data sources with this information. Any dataset with data relating to vehicle routing will be useful.

- Data in any format will be useful

- Data should be less than 15 years old

Road Data - centerlines

Road data for the entire USA has been imported already, sourced from TIGER data. However, the geometries of this data are poor, and it is often out of date. The connectivity and coverage are good. It is quite common for GIS departments to collect this data. Road centre-line data is what we have at the moment, where each road is represented by a line.

- Data should be accurate to within the last 3 years

- Data should be gathered at a positional accuracy of 1:1,000 (~1m) or better to improve what we have already in OSM

- Road names should be present, and in full rather than abbreviated

- Attributes not found on TIGER data will be useful

- Data will most likely be in shapefiles, although other formats may also be used

Road Data - polygons

Not many datasets have roads represented as areas (i.e. enough detail to see how wide the road is, what shape the junctions are etc).

- Data should be gathered at a positional accuracy of 1:1,000 (~1m) or better

- Data will most likely be in shapefiles, although other formats may also be used

- Raster data may be useful too

Rural Data

There will often be a distinction between urban and rural data. In general, we can make use of older and lower resolution data in rural areas. The main exception to this is tourist or sports areas such as State/National Parks, ski resorts and so on.

- Data that is hard to gather such as streams may be useful, but national datasets already exist at usable resolution.

Building Information

Building information is commonly available, and makes the map look nice if it is of sufficient detail. If it is not, it looks bad, is laborious to correct, and is no easier to work with than tracing from imagery.

- Data should be gathered at a positional accuracy of 1:1,000 (~1m) or better

Point of Interest Information

Many points of interest can be easily collected by OSM contributors, so any dataset needs to be extensive and comprehensive to be worth importing.

- Much data from GNIS has already been imported, but it is both old and inaccurate

- Data should be gathered at a positional accuracy of 1:12,000 (~10m) or better

- Roadside features (such as bus stops, letter boxes etc) should be at a higher resolution (circa 1:6000, ~5m) and larger features (e.g. schools, placenames) may be useful at lower resolution (circa 1:24,000, ~20m)

- Extended attributes (e.g. opening hours) are very useful
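The scale and distance pairs quoted in these guidelines (1:1,000 with ~1m, 1:6,000 with ~5m, 1:12,000 with ~10m, 1:24,000 with ~20m) are all consistent with dividing the scale denominator by roughly 1,200. That divisor is inferred here from the figures themselves and is not stated by the source, so the sketch below should be read as a rule of thumb for comparing datasets, not as an accuracy standard.

```python
def approx_ground_accuracy_m(scale_denominator):
    """Approximate positional accuracy in metres for a 1:N map scale.

    Assumption: accuracy is about N / 1200 metres, a divisor inferred
    from the scale/distance pairs quoted in these guidelines.
    """
    return scale_denominator / 1200.0


# Print the approximation for the scales mentioned in the guidelines.
for denominator in (1000, 6000, 12000, 24000):
    accuracy = approx_ground_accuracy_m(denominator)
    print(f"1:{denominator:,} is roughly {accuracy:.1f} m on the ground")
```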

Commonly found, not-so-useful data

The following source data is of little use to us, although it is commonly of use to GIS professionals.

- Digital Elevation Models (DEMs) or Digital Terrain Models (DTMs)

- Things derived from DEMs such as contours, hillshading

- Extracts of national datasets e.g. state extracts of GNIS, TIGER, NED, USGS sets

- Agricultural data such as soil sampling, vegetation