Using planet.osm

From OpenStreetMap Wiki

Using planet.osm downloads involves manipulating large files of compressed XML (Planet.osm).

This page details use of replication diffs to keep a planet file up-to-date. It should perhaps be merged with Planet.osm/diffs#Using the replication diffs

Things to do once

There are some things you only need to do once, or very seldom.

Download planet.osm

The first thing you need is a full planet.osm file. These can be downloaded from several places. See the Planet.osm#Downloading page.

Change compression

The planet.osm file is compressed using bzip2, so the download will probably be named planet-latest.osm.bz2. Bzip2 is good for saving disk space, but it costs more CPU time than gzip. Because these files are so large, most of the commands take quite a while to run, so I'm willing to give up some disk space to make them run faster. The following command converts planet-latest.osm.bz2 to planet.osm.gz:

nice bzcat planet-latest.osm.bz2 | nice gzip -9 -c > planet.osm.gz

If you prefer speed over saving disk space, you can even run gzip with option -1 instead of -9. Once this finishes, feel free to delete planet-latest.osm.bz2 - we'll use the .gz one from now on.
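Before deleting the original download, it may be worth confirming that the recompressed file is intact. The following is a minimal sketch, assuming gzip is on the path; the helper name gz_is_intact is hypothetical and not part of any OSM tool.

```shell
# Hypothetical helper: succeed only if the .gz file exists, is non-empty,
# and passes gzip's built-in integrity test.
gz_is_intact() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Remove the bzip2 original only once the gzip copy checks out.
# (Testing a full planet file reads the whole archive, so it takes a while.)
if gz_is_intact planet.osm.gz; then
    rm -f planet-latest.osm.bz2
else
    echo "planet.osm.gz is missing or corrupt; keeping planet-latest.osm.bz2" >&2
fi
```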

Rename to avoid name collisions

The first thing I do is rename planet.osm.gz to planet.old.osm.gz. I'll be generating planet.osm.gz later, and I don't want to clobber the existing one since it takes a really long time to download the whole thing.

Initialize working directory

First up is to initialize the working directory. This should be the same directory you've saved the planet.osm.gz file to.

osmosis --rrii workingDirectory=.

Edit the osmosis configuration

You'll see a configuration.txt file in the working directory after it's initialized. This file defines where you get updates from, and how much to download at once. I prefer to use the hour replicate data since I only run updates periodically. I also limit osmosis to grab 4 days (345600 seconds) at a time.

# The URL of the directory containing change files.
#baseUrl=http://planet.openstreetmap.org/hour-replicate
baseUrl=http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/hour-replicate

# Defines the maximum time interval in seconds to download in a single invocation.
# Setting to 0 disables this feature.
maxInterval = 345600
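Since maxInterval is given in seconds, it can help to derive it from a number of days instead of typing the figure by hand. A small sketch; the variable names days and max_interval are only illustrative:

```shell
# Convert a number of days into the seconds value that maxInterval expects.
days=4
max_interval=$((days * 24 * 60 * 60))

# Print the line in the form used by configuration.txt.
echo "maxInterval = $max_interval"   # prints: maxInterval = 345600
```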

Find the correct state.txt file

The state.txt file records which point in the replication stream your planet.osm corresponds to. You need to get a state.txt file that matches the planet-latest.osm.bz2 file you downloaded. There are state files under the hour-replicate/000/* directories of the planet server. Pick one whose timestamp is earlier than the point at which your planet file was generated (not merely earlier than when you downloaded it); as long as it is from before that point, you're OK. For example:

wget http://planet.openstreetmap.org/hour-replicate/000/003/000.state.txt -O state.txt
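To check which point in time a downloaded state file refers to, you can read its timestamp field. This sketch assumes the state file uses the usual Java properties layout with a line such as timestamp=2011-01-01T00\:00\:00Z; the function name state_timestamp is hypothetical.

```shell
# Print the timestamp recorded in a state file. The file uses Java properties
# syntax, so colons are escaped as "\:"; strip the backslashes for readability.
state_timestamp() {
    sed -n 's/^timestamp=//p' "$1" | tr -d '\\'
}

# Show the timestamp of the state.txt in the current directory, if there is one.
if [ -f state.txt ]; then
    state_timestamp state.txt
fi
```

Compare the printed time with the date of your planet file before running any updates.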

Things you can loop on indefinitely

The following steps can be repeated as often as you like. You could start from the very beginning each time, but it generally takes a long time to pull down the entire planet.osm file. The steps below instead update planet.osm with just the diffs necessary to bring it up to date, which should take much less time than starting over.

Note on /tmp space

When files are being manipulated, osmosis stores temporary data under the /tmp directory. I happened to have a small /tmp volume, and there were problems with the space filling up. To specify a different directory for osmosis to use, set the following environment variable:

export JAVACMD_OPTIONS="-Djava.io.tmpdir=/dir/to/osm/tmp"
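A slightly fuller sketch of the same idea: create the directory first and check how much room it has. The OSM_TMP variable and the osm-tmp directory name are assumptions chosen for illustration.

```shell
# Assumed location for osmosis scratch files; point this at a volume with plenty of space.
OSM_TMP="${OSM_TMP:-$PWD/osm-tmp}"
mkdir -p "$OSM_TMP"

# Tell the JVM that osmosis launches to use that directory for temporary files.
export JAVACMD_OPTIONS="-Djava.io.tmpdir=$OSM_TMP"

# Show how much space is available there before starting a long run.
df -h "$OSM_TMP"
```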

Download updates

Once you have the configuration.txt and state.txt files ready, run this command to download the necessary updates. This command may take an hour or more to download all of the necessary files.

osmosis --rri workingDirectory=. --wxc update.osm.gz

After this command completes, you'll notice that state.txt has been updated to match the most recent change file downloaded. This is how osmosis knows where you left off: if you run the above command again, it'll grab the next batch of updates, starting where the last run ended. However, if you want to start over, just re-download the original state.txt that goes with your planet.osm file.
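Because osmosis overwrites state.txt on each run, keeping a copy beforehand lets you roll back a single batch without fetching the original state file again. This is only a sketch; backup_state, restore_state and the state.txt.bak name are hypothetical conventions, not osmosis features.

```shell
# Save the current state.txt so that one batch can be rolled back later.
backup_state() {
    [ -f state.txt ] && cp -p state.txt state.txt.bak
}

# Put the saved copy back, discarding whatever the last download recorded.
restore_state() {
    [ -f state.txt.bak ] && cp -p state.txt.bak state.txt
}

# Typical use (commented out because it needs osmosis and network access):
# backup_state
# osmosis --rri workingDirectory=. --wxc update.osm.gz || restore_state
```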

Prepare data

The update file may contain more than one change for the same entity. You can't merge changes in with a file like that. To fix this, run a simplify-change osmosis step, like this:

nice gzip -d -c update.osm.gz | nice osmosis --read-xml-change file=/dev/stdin --simplify-change --write-xml-change file=- | nice gzip -9 -c > update.unique.osm.gz

Merge changes

At this point, you should have planet.old.osm.gz (the file you renamed earlier) and update.unique.osm.gz. Time to merge the updates into the planet file. This command reads planet.old.osm.gz, applies the changes, and writes the result to a new planet.osm.gz:

nice gzip -d -c planet.old.osm.gz | nice osmosis --read-xml-change file="update.unique.osm.gz" --read-xml enableDateParsing=no file=/dev/stdin --apply-change --write-xml file=- | nice gzip -9 -c > planet.osm.gz
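When you repeat these steps, the freshly merged planet.osm.gz has to become the input of the next cycle. One way to do that, sketched here with a hypothetical rotate_planet helper, is to rename it over the old file only after it passes gzip's integrity test. Run it after you have finished extracting regions, since it moves planet.osm.gz away.

```shell
# Promote a freshly merged planet file to be the input of the next update cycle.
# Refuses to overwrite the old file unless the new one is a valid gzip archive.
rotate_planet() {
    new="${1:-planet.osm.gz}"
    old="${2:-planet.old.osm.gz}"
    if [ -s "$new" ] && gzip -t "$new" 2>/dev/null; then
        mv -f "$new" "$old"
    else
        echo "refusing to rotate: $new is missing or corrupt" >&2
        return 1
    fi
}

# Typical use once the merge and any extracts have finished:
# rotate_planet
```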

Extract regions

Now that the planet.osm file is up to date, it's time to (re)extract the regions you want. You can use either a bounding box or a polygon to define your region. For the sake of simplicity, I'll show how to use the box. The two points you need are the top-left and bottom-right corners of the box. Coordinates can be found by looking at the OSM map itself and using the permalink, or by any other method you prefer, such as the information freeway. Once you have the coordinates, the following command will extract that region into a separate file. This example is for a box around the United States:

nice gzip -d -c planet.osm.gz | nice osmosis --read-xml enableDateParsing=no file=/dev/stdin --bounding-box top=50.2475 left=-125.0234 bottom=24.1872 right=-66.5762 --write-xml file=- | nice gzip -9 -c > united_states.osm.gz
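An extract over the whole planet takes a long time, so it may be worth sanity-checking the box before starting. The sketch below uses a hypothetical valid_bbox helper that takes the values in the same order as the command above (top, left, bottom, right). It does not handle boxes that cross the 180° meridian.

```shell
# Succeed only if the box is well formed: top above bottom, left west of right,
# latitudes within [-90, 90] and longitudes within [-180, 180].
# awk does the floating-point comparisons that plain sh arithmetic cannot.
valid_bbox() {
    awk -v t="$1" -v l="$2" -v b="$3" -v r="$4" 'BEGIN {
        ok = (t > b) && (l < r) && (t <= 90) && (b >= -90) && (l >= -180) && (r <= 180)
        exit !ok
    }'
}

# Check the United States box used in the example.
if valid_bbox 50.2475 -125.0234 24.1872 -66.5762; then
    echo "bounding box looks sane"
fi
```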

Do it all at once

You can save time by running all these steps in one go:

osmosis --rx my-old-file.osm --rri --simc --ac --bb top=50.2475 left=-125.0234 bottom=24.1872 right=-66.5762 --wx my-new-file.osm