Using planet.osm
Using planet.osm downloads involves manipulating large files of compressed XML (Planet.osm).
This page details the use of replication diffs to keep a planet file up to date. It should perhaps be merged with Planet.osm/diffs#Using the replication diffs.
Things to do once
There are some things you only need to do once, or very seldom.
Download planet.osm
The first thing you need is a full planet.osm file. These can be downloaded from several places. See the Planet.osm#Downloading page.
Change compression
The planet.osm file is compressed using bzip2, so the download will probably be named planet-latest.osm.bz2. Bzip2 saves disk space, but it costs more CPU time to compress and decompress. Since these files are so large, most of the commands take quite a while to run, and I'm willing to give up some disk space to make them run faster. The following command converts planet-latest.osm.bz2 to planet.osm.gz:
nice bzcat planet-latest.osm.bz2 | nice gzip -9 -c > planet.osm.gz
If you prefer speed over saving disk space, you can even run gzip with option -1 instead of -9. Once this finishes, feel free to delete planet-latest.osm.bz2 - we'll use the .gz one from now on.
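The recompression step can be wrapped so that the original is only deleted once both archives pass an integrity check. This is a minimal sketch: the function name recompress and its optional level argument are illustrative and not part of any OSM tool.

```shell
# Hypothetical helper: convert a .bz2 file to .gz and delete the source only
# when the source and the new archive both pass an integrity test.
recompress() {
    src=$1
    dst=$2
    level=${3:-9}   # gzip level: 1 is fastest, 9 is smallest

    bzip2 -t "$src" || return 1                           # source must be intact
    nice bzcat "$src" | nice gzip "-$level" -c > "$dst" || return 1
    gzip -t "$dst" || return 1                            # verify the new archive
    rm -- "$src"
}

# Usage: recompress planet-latest.osm.bz2 planet.osm.gz 1
```

Testing the source first matters because a plain pipeline only reports the exit status of its last command, so a truncated download could otherwise go unnoticed.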
Rename to avoid name collisions
The first thing I do is rename planet.osm.gz to planet.old.osm.gz. I'll be generating planet.osm.gz later, and I don't want to clobber the existing one since it takes a really long time to download the whole thing.
Initialize working directory
First up is to initialize the working directory. This should be the same directory you've saved the planet.osm.gz file to.
osmosis --rrii workingDirectory=.
Edit the osmosis configuration
You'll see a configuration.txt file in the working directory after it's initialized. This file defines where you get updates from, and how much to download at once. I prefer to use the hour-replicate data since I only run updates periodically. I also limit osmosis to 4 days of changes per run (maxInterval = 345600 seconds).
# The URL of the directory containing change files.
#baseUrl=http://planet.openstreetmap.org/hour-replicate
baseUrl=http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/hour-replicate

# Defines the maximum time interval in seconds to download in a single invocation.
# Setting to 0 disables this feature.
maxInterval = 345600
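Since maxInterval is given in seconds, it can help to derive the value from a number of days rather than typing it by hand. A small sketch using shell arithmetic; the variable names are illustrative.

```shell
# maxInterval is expressed in seconds; derive it from a number of days.
days=4
max_interval=$((days * 24 * 60 * 60))
echo "maxInterval = $max_interval"   # prints: maxInterval = 345600
```

The value 345600 used in the configuration above therefore corresponds to 4 days.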
Find the correct state.txt file
The state.txt file contains information about what version of planet.osm you have. You need to get the state.txt file that corresponds to the planet-latest.osm.bz2 file you downloaded. There are state files under the hour-replicate/000/* directories of the planet server. As long as you pick one from before you downloaded the planet.osm file, you're OK. For example:
wget http://planet.openstreetmap.org/hour-replicate/000/003/000.state.txt -O state.txt
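The path in the example URL suggests that each state file is addressed by a nine-digit, zero-padded sequence number split into three groups of three digits. Assuming that layout holds, a hypothetical helper (state_path is not an osmosis command) can turn a sequence number into the relative path to fetch:

```shell
# Hypothetical helper: map a replication sequence number to its state file
# path, assuming a nine-digit, zero-padded number split into groups of three.
state_path() {
    padded=$(printf '%09d' "$1")
    first=${padded%??????}      # digits 1-3
    rest=${padded#???}          # digits 4-9
    middle=${rest%???}          # digits 4-6
    last=${padded#??????}       # digits 7-9
    echo "$first/$middle/$last.state.txt"
}

state_path 3000   # prints 000/003/000.state.txt
```

The result can be appended to the hour-replicate base URL when calling wget.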
Things you can loop on indefinitely
The following steps can be repeated as often as you like. You could start from the very beginning each time, but it generally takes a long time to pull down the entire planet.osm file. Instead, these steps update planet.osm with just the diffs needed to bring it up to date, which should take much less time than starting over.
Note on /tmp space
When files are being manipulated, osmosis stores temporary data under the /tmp directory. I happened to have a small /tmp volume, and there were problems with the space filling up. To specify a different directory for osmosis to use, set the following environment variable:
export JAVACMD_OPTIONS="-Djava.io.tmpdir=/dir/to/osm/tmp"
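The directory has to exist and have enough free space before osmosis runs. A minimal sketch, assuming a directory named osm-tmp under the current directory; OSM_TMP is a hypothetical variable introduced here for illustration only.

```shell
# Create a roomier temp directory and point osmosis (via the JVM) at it.
# OSM_TMP is a hypothetical variable; its default location is only an example.
OSM_TMP=${OSM_TMP:-$PWD/osm-tmp}
mkdir -p "$OSM_TMP"
export JAVACMD_OPTIONS="-Djava.io.tmpdir=$OSM_TMP"

# Show free space (in kilobytes) on the volume that holds it.
df -k "$OSM_TMP"
```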
Download updates
Once you have the configuration.txt and state.txt files ready, run this command to download the necessary updates. It may take an hour or more to download all of the necessary files.
osmosis --rri workingDirectory=. --wxc update.osm.gz
After this command completes, you'll notice that the state.txt file gets updated with the most recent one. osmosis still knows where you left off and if you run the above command again, it'll grab the next batch of updates where it left off from last time. However, if you want to start over, just re-download the original state.txt that goes with your planet.osm file.
Prepare data
The update file may contain more than one change for the same entity. You can't merge changes in with a file like that. To fix this, run a simplify-change osmosis step, like this:
nice gzip -d -c update.osm.gz | nice osmosis --read-xml-change file=/dev/stdin --simplify-change --write-xml-change file=- | nice gzip -9 -c > update.unique.osm.gz
Merge changes
At this point, you should have planet.old.osm.gz (the file renamed earlier) and update.unique.osm.gz. Time to merge the updates into the planet file. This command will do the merge and write the result to planet.osm.gz:
nice gzip -d -c planet.old.osm.gz | nice osmosis --read-xml-change file="update.unique.osm.gz" --read-xml enableDateParsing=no file=/dev/stdin --apply-change --write-xml file=- | nice gzip -9 -c > planet.osm.gz
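To loop on these steps, the freshly merged file has to become the input of the next merge. The page does not spell this out, so the following is only a sketch: rotate_planet is a hypothetical helper, and its default file names assume the renaming scheme chosen earlier on this page.

```shell
# Hypothetical helper: after a merge, verify the new planet file and make it
# the input for the next update cycle.
rotate_planet() {
    new=${1:-planet.osm.gz}
    old=${2:-planet.old.osm.gz}
    gzip -t "$new" || return 1     # refuse to rotate a truncated or corrupt file
    mv -f -- "$new" "$old"
}

# Usage, once any region extracts have been made: rotate_planet
```

Checking the archive first means a merge that died halfway cannot overwrite the last good planet file.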
Extract regions
Now that the planet.osm file is up to date, it's time to (re)extract the regions you want. You can use either a bounding box or a polygon to define your region. For the sake of simplicity, I'll show how to use the box. The two points you need are the top-left and bottom-right corners of the box. You can find the coordinates by looking at the OSM map itself and using the permalink, or by any other method you prefer, such as the information freeway. Once you have the coordinates, the following command will extract that region into a separate file. This example is for a box around the United States:
nice gzip -d -c planet.osm.gz | nice osmosis --read-xml enableDateParsing=no file=/dev/stdin --bounding-box top=50.2475 left=-125.0234 bottom=24.1872 right=-66.5762 --write-xml file=- | nice gzip -9 -c > united_states.osm.gz
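Because an extract over the whole planet takes a long time, it can be worth checking the four coordinates before starting. A minimal sketch using awk for the floating-point comparisons; valid_bbox is a hypothetical helper, and it assumes the box does not cross the 180° meridian.

```shell
# Hypothetical helper: succeed only if the four values form a plausible box
# (latitudes within [-90, 90], longitudes within [-180, 180], top above
# bottom, left west of right).
valid_bbox() {
    awk -v t="$1" -v l="$2" -v b="$3" -v r="$4" 'BEGIN {
        ok = (t <= 90 && b >= -90 && t > b && l >= -180 && r <= 180 && l < r)
        exit(ok ? 0 : 1)
    }'
}

# Arguments are in the same order as in the osmosis command: top left bottom right.
valid_bbox 50.2475 -125.0234 24.1872 -66.5762 && echo "box looks sane"
```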
Do it all at once
You can save time by running all these steps in one go:
osmosis --rx my-old-file.osm --rri --simc --ac --bb top=50.2475 left=-125.0234 bottom=24.1872 right=-66.5762 --wx my-new-file.osm