Using planet.osm

From OpenStreetMap Wiki

Using planet.osm downloads involves manipulating large files of compressed XML (Planet.osm). This page details use of replication diffs to keep a planet file up-to-date.

Note that the .pbf format (or perhaps the .o5m format) is recommended over the .osm XML format for downloading and processing OSM data. This page describes how to use the classic all-purpose tool Osmosis to keep a planet file up to date. Other programs can do the same job, and some of them are much faster and easier to handle than Osmosis, but they do not offer the same wide range of functions. The following sections show how each step is done with Osmosis and how it can be done with osm-c-tools (a set of three tools: osmconvert, osmfilter and osmupdate).


Things to do once

There are some things you only need to do once, or very seldom.

Download planet.osm

The first thing you need is a full planet.osm file. These can be downloaded from several places. See the Planet.osm#Downloading page.

Using osm-c-tools:

Try to get a .pbf formatted file. It is smaller than an .osm XML file and can be processed faster.
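Since a planet download is large, a truncated or corrupted transfer is worth ruling out before spending hours processing it. The following is a minimal sketch, assuming your mirror publishes an md5sum-style checksum file next to the planet file (check your mirror; this is an assumption) and that md5sum is installed. The function name verify_download is made up for this example.

```shell
# verify_download FILE MD5FILE
# Succeeds only if the MD5 digest of FILE equals the first digest in MD5FILE.
# Assumes MD5FILE uses the md5sum layout: "<hex digest>  <file name>".
verify_download() (
    expected=$(awk 'NR == 1 { print $1 }' "$2") || exit 1
    actual=$(md5sum < "$1" | awk '{ print $1 }') || exit 1
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
)

# Usage (file names are examples only):
#   verify_download planet-latest.osm.pbf planet-latest.osm.pbf.md5 && echo "download OK"
```

Comparing only the digest, rather than running md5sum -c, means the check still works if you saved the download under a different name.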

Change compression

The planet.osm file is compressed using bzip2, so the download will probably be named planet-latest.osm.bz2. Bzip2 is good for saving disk space, but decompressing it takes more CPU time. I've found that I'm willing to give up some disk space to get the commands to run faster: the files are so large that most commands take quite a while. The following command converts planet-latest.osm.bz2 to planet.osm.gz:

nice bzcat planet-latest.osm.bz2 | nice gzip -9 -c > planet.osm.gz

If you prefer speed over saving disk space, you can even run gzip with option -1 instead of -9. Once this finishes, feel free to delete planet-latest.osm.bz2; we'll use the .gz one from now on.

Using osm-c-tools:

As with Osmosis, it is recommended to reformat the downloaded file. For faster processing, we take advantage of the .o5m data format, so convert the file first:

osmconvert planet-latest.osm.pbf -o=planet.o5m

Rename to avoid name collisions

The first thing I do is rename planet.osm.gz to planet.old.osm.gz. I'll be generating planet.osm.gz later, and I don't want to clobber the existing one since it takes a really long time to download the whole thing.

mv planet.osm.gz planet.old.osm.gz

Using osm-c-tools:

Renaming the file to avoid overwriting it is a good idea for the osm-c-tools toolchain as well.

mv planet.o5m planet_old.o5m

Initialize working directory

The first step is to initialize the working directory. This should be the directory where you saved the planet file.

osmosis --rrii workingDirectory=.

Using osm-c-tools:

This step is not necessary.

Edit the osmosis configuration

You'll see a configuration.txt file in the working directory after it's initialized. This file defines where you get updates from, and how much to download at once. I prefer to use the hourly replication data since I only run updates periodically. I also limit osmosis to grabbing 4 days of changes at a time (345600 seconds).

# The URL of the directory containing change files.
#baseUrl=http://planet.openstreetmap.org/hour-replicate
baseUrl=http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/hour-replicate
 
# Defines the maximum time interval in seconds to download in a single invocation.
# Setting to 0 disables this feature.
maxInterval = 345600
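Because maxInterval is given in seconds, it is easy to be off by a day when choosing a value. A small sketch for doing the conversion (days_to_seconds is a hypothetical helper, not part of Osmosis):

```shell
# days_to_seconds DAYS
# Prints the number of seconds in DAYS whole days (86400 seconds per day).
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

days_to_seconds 4    # prints 345600, the maxInterval value shown above
```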

Using osm-c-tools:

This step is not necessary.

Find the correct state.txt file

The state.txt file records which point in time your copy of planet.osm corresponds to. You need the state.txt file that matches the planet-latest.osm.bz2 file you downloaded. There are state files under the hour-replicate/000/* directories of the planet server. As long as you pick one whose timestamp is earlier than the time your planet file was generated (not merely earlier than the time you downloaded it), you're OK. For example:

wget http://planet.openstreetmap.org/replication/hour/000/003/000.state.txt -O state.txt
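The path in the example above appears to follow from the replication sequence number: the number is zero-padded to nine digits and split into three groups of three, so sequence 3000 becomes 000/003/000. The sketch below builds such a URL; state_url is a hypothetical helper, and the layout is an assumption you should check against the directory listing of your replication server.

```shell
# state_url BASE_URL SEQUENCE
# Prints the URL of the state file for replication sequence number SEQUENCE.
# SEQUENCE must be a plain decimal number without leading zeros.
state_url() {
    printf '%s/%03d/%03d/%03d.state.txt\n' "$1" \
        $(( $2 / 1000000 )) $(( $2 / 1000 % 1000 )) $(( $2 % 1000 ))
}

# Usage:
#   wget "$(state_url http://planet.openstreetmap.org/replication/hour 3000)" -O state.txt
```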

Using osm-c-tools:

This step is not necessary.

Things you can loop on indefinitely

The following steps can be repeated as often as you like. You could start from the very beginning each time, but it generally takes a long time to pull down the entire planet.osm file. The following steps are a process where you can update planet.osm with just the diffs necessary to bring it up to date. This should take much less time than starting over.

Note on /tmp space

When files are being manipulated, osmosis stores temporary data under the /tmp directory. I happened to have a small /tmp volume, and there were problems with the space filling up. To specify a different directory for osmosis to use, set the following environment variable:

export JAVACMD_OPTIONS="-Djava.io.tmpdir=/dir/to/osm/tmp"
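The directory named in that variable has to exist before osmosis starts. A minimal sketch that creates it and sets the variable in one step (use_osmosis_tmpdir is a hypothetical helper; it assumes the path contains no spaces, since the value is passed on as a JVM option):

```shell
# use_osmosis_tmpdir DIR
# Creates DIR if needed and points the JVM temp directory at it
# for every osmosis command run later from the same shell.
use_osmosis_tmpdir() {
    mkdir -p "$1" || return 1
    JAVACMD_OPTIONS="-Djava.io.tmpdir=$1"
    export JAVACMD_OPTIONS
}

# Usage:
#   use_osmosis_tmpdir /dir/to/osm/tmp
```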

Using osm-c-tools:

There is really not much space needed for temporary data. However, if you like, you can change the tempfiles' location: there is a --tempfiles= option for osmupdate and a -t= option for osmconvert.

Download updates

Once you have the configuration.txt and state.txt files ready, run this command to download the necessary updates. It may take an hour or more to download all of the necessary files:

osmosis --rri workingDirectory=. --wxc update.osm.gz

After this command completes, you'll notice that state.txt has been updated to the most recent state. Osmosis uses it to remember where you left off: if you run the above command again, it grabs the next batch of updates from that point. If you want to start over, just re-download the original state.txt that goes with your planet.osm file.
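To see how far a run got, you can read the values straight out of state.txt. This sketch assumes the usual layout of that file: Java-properties style lines such as sequenceNumber=... and timestamp=..., with colons in the timestamp escaped as "\:". The helper name state_value is made up for this example.

```shell
# state_value KEY [FILE]
# Prints the value of KEY from a state file (default: ./state.txt).
# Backslashes are removed because the file escapes colons as "\:".
state_value() {
    sed -n "s/^$1=//p" "${2:-state.txt}" | tr -d '\\'
}

# Usage:
#   state_value sequenceNumber
#   state_value timestamp
```

Keeping a copy of state.txt before each run (for example cp state.txt state.txt.bak) also gives you a cheaper way to step back one batch than re-downloading the original.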

Using osm-c-tools:

This step is not necessary because the downloading is done together with the merging in one run (see below).

Prepare data

The update file may contain more than one change for the same entity, and a change file like that cannot be merged directly. To fix this, run a simplify-change osmosis step, like this:

nice gzip -d -c update.osm.gz | nice osmosis --read-xml-change file=/dev/stdin --simplify-change --write-xml-change file=- | nice gzip -9 -c > update.unique.osm.gz

Using osm-c-tools:

This step is not necessary because the "simplifying" is done together with the merging in one run (see below).

Merge changes

At this point, you should have planet.old.osm.gz and update.unique.osm.gz. It's time to merge the updates into the planet file. This command does the merge and writes the result to planet.osm.gz:

nice gzip -d -c planet.old.osm.gz | nice osmosis --read-xml-change file="update.unique.osm.gz" --read-xml enableDateParsing=no file=/dev/stdin --apply-change --write-xml file=- | nice gzip -9 -c > planet.osm.gz
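Because the merge reads the "old" file and writes planet.osm.gz, the next pass of the loop needs the fresh file moved back to the old name. A cautious sketch that only does so when the new file is a complete gzip stream (rotate_planet is a hypothetical helper; the default file names follow the rename step earlier on this page):

```shell
# rotate_planet [NEW] [OLD]
# If NEW passes gzip's integrity test, move it over OLD.
# Otherwise leave both files untouched and return failure.
rotate_planet() (
    new=${1:-planet.osm.gz}
    old=${2:-planet.old.osm.gz}
    gzip -t "$new" || exit 1
    mv "$new" "$old"
)

# Usage, after extracting whatever regions you need from planet.osm.gz:
#   rotate_planet && echo "ready for the next update run"
```

Note that gzip -t only shows that the compressed file is intact; it does not prove that osmosis finished without errors, so check its output as well.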

Using osm-c-tools:

The following command determines the age of the old planet data, downloads the appropriate changefiles, simplifies them and merges them with the old planet file to generate a new planet file.

osmupdate planet_old.o5m planet.o5m

To exclude minutely changefiles and use only daily and hourly ones, apply these options: --day --hour

If you would like detailed information about the processing, use this option: -v

Extract regions

Now that the planet.osm file is up to date, it's time to (re)extract the regions you want. You can use either a bounding box or a polygon to define your region. For the sake of simplicity, I'll show how to use the box. The two points you need are the top-left and bottom-right corners of the box. Coordinates can be found by looking at the OSM map itself and using the permalink link, or by any other method you prefer, such as the information freeway. Once you have the coordinates, the following command will extract that region into a separate file. This example is for a box around the United States:

nice gzip -d -c planet.osm.gz | nice osmosis --read-xml enableDateParsing=no file=/dev/stdin --bounding-box top=50.2475 left=-125.0234 bottom=24.1872 right=-66.5762 --write-xml file=- | nice gzip -9 -c > united_states.osm.gz

Using osm-c-tools:

osmconvert planet.o5m -b=-125.0234,24.1872,-66.5762,50.2475 -o=united_states.o5m
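The two tools write the same box differently: Osmosis names each edge, while the -b= option of osmconvert takes the values in the order left,bottom,right,top (minimum longitude, minimum latitude, maximum longitude, maximum latitude), as the two commands above show. The sketch below prints both forms from one set of numbers so they cannot drift apart; bbox_args is a hypothetical helper.

```shell
# bbox_args TOP LEFT BOTTOM RIGHT
# Prints the same box twice: first as Osmosis arguments,
# then as the osmconvert -b= option.
bbox_args() {
    printf '%s\n' "--bounding-box top=$1 left=$2 bottom=$3 right=$4"
    printf '%s\n' "-b=$2,$3,$4,$1"
}

# Usage:
#   bbox_args 50.2475 -125.0234 24.1872 -66.5762
```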

Alternatively, when using a bounding polygon:

osmconvert planet.o5m -B=united_states.poly -o=united_states.o5m
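A .poly file is plain text in the Osmosis polygon filter format: a name line, a section line, one "longitude latitude" pair per line, then END to close the section and END to close the file. As an illustration of the format only, this sketch writes a rectangular polygon from the box used above; a real united_states.poly would trace the border much more closely. The helper name bbox_to_poly is made up for this example.

```shell
# bbox_to_poly NAME TOP LEFT BOTTOM RIGHT
# Prints a rectangular polygon in polygon filter file format.
# Each coordinate line is "longitude latitude"; the ring is closed by
# repeating the first corner.
bbox_to_poly() {
    printf '%s\n1\n' "$1"
    printf '   %s   %s\n' "$3" "$2"  "$5" "$2"  "$5" "$4"  "$3" "$4"  "$3" "$2"
    printf 'END\nEND\n'
}

# Usage:
#   bbox_to_poly united_states 50.2475 -125.0234 24.1872 -66.5762 > united_states.poly
```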

Do it all at once

You can save time by running all these steps in one go:

osmosis --rri --simc --rx my-old-file.osm --ac --bb top=50.2475 left=-125.0234 bottom=24.1872 right=-66.5762 --wx my-new-file.osm

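Using osm-c-tools:

osmupdate takes the old file and the new file as two separate arguments, so rename the existing extract first (here to united_states_old.o5m):

osmupdate united_states_old.o5m united_states.o5m -B=united_states.poly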

Converting resulting files to .osm XML format

This format conversion is not necessary if you have used the conventional way with Osmosis, because you will already have (huge) OSM data files in .osm XML format. However, if you want to save disk space or speed up subsequent processing, you might want to convert the data to .pbf format. This can of course also be done with Osmosis.

Using osm-c-tools:

osmconvert united_states.o5m -o=united_states.osm

Or if you want your data to have .pbf format:

osmconvert united_states.o5m -o=united_states.pbf
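If you convert several extracts, deriving the output name from the input name avoids typing each name twice. A small sketch (converted_name is a hypothetical helper; the commented usage assumes osmconvert is installed):

```shell
# converted_name FILE.o5m EXT
# Prints FILE with its .o5m suffix replaced by .EXT.
converted_name() {
    printf '%s.%s\n' "${1%.o5m}" "$2"
}

# Usage:
#   for f in *.o5m; do
#       osmconvert "$f" -o="$(converted_name "$f" pbf)"
#   done
```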