Import/Software

From OpenStreetMap Wiki

Depending on the format of your source data, you'll need some software to convert it into the OSM format.


Process

This is one example of how the importing process can be done. Individual imports might differ.

  1. Look at the source data and determine a mapping from the attributes used in the source data to the tags used in OSM.
  2. Write new software, or tailor existing software, to map the source data to one or more OSM XML files. Use unique negative IDs for nodes, ways and relations.
  3. Load the OSM XML file into JOSM and check the data. Repeat steps 1 to 3 until everything looks right.
  4. Upload the data using JOSM or a bulk uploader.
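
Step 2 can be sketched as follows. This is only a minimal illustration in Python, not the method of any particular import tool: the function name `build_osm_xml`, the `generator` string and the sample coordinates are assumptions made for this example, and a real converter will likely need more attributes and tags than shown here.

```python
import xml.etree.ElementTree as ET
from itertools import count


def build_osm_xml(points, way_tags):
    """Return an OSM XML string with one way running through `points`.

    `points` is a list of (lat, lon) tuples and `way_tags` a dict of tags.
    All new objects receive unique negative IDs (hypothetical helper).
    """
    next_id = count(-1, -1)  # yields -1, -2, -3, ...
    root = ET.Element("osm", version="0.6", generator="example-import-script")

    node_ids = []
    for lat, lon in points:
        node_id = next(next_id)
        ET.SubElement(root, "node", id=str(node_id),
                      lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        node_ids.append(node_id)

    # The way refers to its nodes through their (negative) placeholder IDs.
    way = ET.SubElement(root, "way", id=str(next(next_id)))
    for node_id in node_ids:
        ET.SubElement(way, "nd", ref=str(node_id))
    for key, value in sorted(way_tags.items()):
        ET.SubElement(way, "tag", k=key, v=value)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_osm_xml([(51.5, -0.1), (51.6, -0.2)],
                        {"highway": "residential"}))
```

The resulting file can then be opened in JOSM for the check in step 3.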


Negative IDs

Every object in the OSM database has a unique ID. It is assigned by the OSM database when the object is created. Object IDs are, for example, used to relate ways to the nodes they contain.

When creating your own OSM XML file with data that should be uploaded, you cannot use normal IDs, because you don't have them. After all, the central database doesn't know about the objects yet. But you still have to give each object an ID, otherwise you can't create the proper relationships between nodes, ways and relations.

The common trick to solve this is to use negative IDs in your local files. When uploading, the objects are created one by one in the database. For each object you get the proper ID back from the database and use that one instead of your own negative ID from then on.

JOSM, for instance, works this way.
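
The replacement of negative IDs during upload can be sketched like this. The sketch assumes a hypothetical `create_object(kind, obj)` callback that creates one object and returns the ID assigned by the database; `FakeServer` merely stands in for the real API so that the example can run offline, and none of these names come from an existing tool.

```python
class FakeServer:
    """Stand-in for the database: hands out increasing positive IDs."""

    def __init__(self, first_id=1000):
        self.next_id = first_id
        self.created = []

    def create(self, kind, obj):
        new_id = self.next_id
        self.next_id += 1
        self.created.append((kind, new_id, obj))
        return new_id


def upload_with_id_remap(nodes, ways, create_object):
    """Create nodes first, then ways, replacing negative placeholder IDs.

    `nodes` is a list of dicts with an "id" key; `ways` is a list of dicts
    with "id" and "nodes" (a list of node IDs). Returns two dicts mapping
    the local IDs to the IDs returned by `create_object`.
    """
    node_id_map = {}
    for node in nodes:
        node_id_map[node["id"]] = create_object("node", node)

    way_id_map = {}
    for way in ways:
        # References to existing (positive) node IDs are left untouched.
        refs = [node_id_map.get(ref, ref) for ref in way["nodes"]]
        way_id_map[way["id"]] = create_object("way", dict(way, nodes=refs))

    return node_id_map, way_id_map


if __name__ == "__main__":
    server = FakeServer()
    print(upload_with_id_remap(
        [{"id": -1}, {"id": -2}],
        [{"id": -3, "nodes": [-1, -2]}],
        server.create,
    ))
```

Nodes are created before ways so that every way can already refer to the final node IDs; relations would follow the same pattern after the ways.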

Importing Data

There are several ways to add data in bulk to OSM. Each one has its strengths and weaknesses. One thing is crucially important: do not test against the live API. There are development instances of the OpenStreetMap server available where you can first check that your process, method or script works.

Here are various methods of doing a bulk import.

via JOSM

One of the earliest and most common forms of bulk import is via JOSM. A conversion program takes the import data and converts it to the JOSM file format. A user can then open JOSM, load the file and hit the upload button.

This method is fairly reliable and lets the user view and manipulate the data prior to upload. The disadvantage is that JOSM has to be able to load the data, which becomes impractical once the data reaches several megabytes. There have also been reports that, when uploading lots of data, the process usually gets stuck after a few thousand objects, which makes it of limited use for large-scale uploads.

This cannot be run on the server because JOSM is an interactive program.

via JOSM supported by osmsync

osmsync is a library for comparing external data to the data already in OSM and then producing a JOSM file for human review. The utility works best with specific datasets designed for long-term maintenance via osmsync.

via direct API manipulation

This method is rather uncommon in general, but it is used by Almien coastlines. There, the shapefile is processed record by record and the objects are created immediately. The main downside of this method is that generally no record is kept of what has been uploaded and created, and since everything is done on the fly, there is generally no way to tell how far the process got when it is interrupted. The actual changeset is ambiguous, since it is also generated on the fly.

In general this method should be discouraged.

via Osmosis

Osmosis is a program for general manipulation of OSM data. Amongst its many other features, it allows the application of a "change file" to a database. It understands the osmChange file format.

However, it can only apply changes directly to a database, not to the API, and it currently lacks features like placeholders during the creation of objects (negative IDs) and referential integrity of the database objects. For applying changes to a local database where the IDs of objects are known beforehand, though, it is unbeatable.
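
As an illustration of a change file for objects whose IDs are already known, the sketch below builds a small osmChange document with a `modify` block. The function name `build_osm_change`, the `generator` string and the sample node are assumptions made for this example; it shows only the general shape of the format and has not been checked against any specific Osmosis version.

```python
import xml.etree.ElementTree as ET


def build_osm_change(modified_nodes):
    """Return an osmChange XML string that modifies existing nodes.

    Each entry in `modified_nodes` is a dict with "id", "version",
    "lat", "lon" and "tags" (hypothetical input structure).
    """
    root = ET.Element("osmChange", version="0.6",
                      generator="example-import-script")
    modify = ET.SubElement(root, "modify")
    for node in modified_nodes:
        elem = ET.SubElement(
            modify, "node",
            id=str(node["id"]),            # known, positive database ID
            version=str(node["version"]),  # version the edit is based on
            lat=f"{node['lat']:.7f}",
            lon=f"{node['lon']:.7f}",
        )
        for key, value in sorted(node["tags"].items()):
            ET.SubElement(elem, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_osm_change([
        {"id": 123, "version": 4, "lat": 51.5, "lon": -0.1,
         "tags": {"amenity": "cafe"}},
    ]))
```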

These tools show a lot of long-term promise.

(FIXME: Explain why this is a promising way to do large uploads, seeing as it cannot do uploads at all.)

via bulk_upload.py

bulk_upload.py is a tool written in Python which is intended to replace bulk_upload.pl and support the API v0.6 (which bulk_upload.pl does not). Like bulk_upload.pl, it intelligently stores progress. It also divides changesets into parts so that the upload tasks are smaller. However, it does not appear to support the osmChange format and therefore cannot be used for edits.
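
The general idea of splitting an upload into smaller parts and storing progress can be sketched as follows. This is not the actual code of bulk_upload.py: the helpers `chunked` and `upload_in_chunks`, the JSON progress file and the `upload_chunk` callback are all assumptions made for illustration.

```python
import json
from pathlib import Path


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_chunks(items, size, upload_chunk, progress_path):
    """Upload `items` chunk by chunk, skipping chunks already recorded.

    `upload_chunk` is a caller-supplied function that uploads one chunk.
    The number of completed chunks is stored in `progress_path` as JSON,
    so a later run with the same `items` and `size` resumes after them.
    """
    path = Path(progress_path)
    done = json.loads(path.read_text())["chunks_done"] if path.exists() else 0
    for index, chunk in enumerate(chunked(items, size)):
        if index < done:
            continue  # uploaded in an earlier run
        upload_chunk(chunk)
        path.write_text(json.dumps({"chunks_done": index + 1}))


if __name__ == "__main__":
    upload_in_chunks(list(range(10)), 4, print, "progress.json")
```

Note that this sketch records progress only after a chunk has been uploaded, so a crash between the upload and the write of the progress file would cause that chunk to be sent again on the next run.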

via upload.py

upload.py is yet another set of Python scripts for uploading changes to OpenStreetMap, currently using API v0.6. The focus here is on recoverability from network errors and other problem situations, so that it is almost always possible to resume a failed import without creating duplicates or other artifacts. Instead of doing the entire upload with one command, these scripts are building blocks that work at a lower level.

via bulk_upload.pl (outdated)

The Perl script bulk_upload.pl was used for both the AND and TIGER imports and for uploading some coastlines. This method basically follows the concept of taking a change file and applying it to the OSM database. bulk_upload.pl no longer works with the current version of the API (unless somebody rewrites it to create changesets).

Within the OSM Server Network

Large data sources (like TIGER) can sometimes be uploaded from clients with very low latency and very high bandwidth to the main OSM servers.

(FIXME: Are there any such clients actually available? Or is this just a pipedream? Perhaps one could use the dev server?)

See also