Diffs are now ODbL licensed and available at a more permanent location, now that the redaction period has been completed.
This page is for information about diffs of the OpenStreetMap data.
The diffs provided at http://planet.osm.org are small compressed XML files in OsmChange format that contain the changes in the OpenStreetMap data over some period of time. They are stored by their granularity as follows:
|Minute (replication/minute)||Launched every minute.||Full History||Transaction Id|
|Hour (replication/hour)||Launched every hour at 2 minutes past the hour.||Full History||Transaction Id (Aggregation of Minute)|
|Day (replication/day)||Launched at 00:05 UTC every day.||Full History||Transaction Id (Aggregation of Hour)|
|Daily (not updated during redaction period). Note: This extract misses data from long-running transactions and is being phased out in favour of the Day Transaction Id extract above.||Launched at 00:35 UTC. Completes at approximately 01:00 UTC.||Delta||Date Aligned|
|Historical Daily. Note: These are the only extracts that allow you to retrieve the complete OpenStreetMap history.||Launched at 01:00 UTC. Job is configured with 24 hour extract delay leading to a total of 25 hours to minimise chance of missed data.||Full History||Date Aligned|
Full History extracts contain all changes to entities for the extract period. They may contain multiple versions of some entities if those entities were modified multiple times within that extract period. For example, a given node may be modified twice in a minute, leading to two versions of the node being included in the minute extract.
Delta extracts only contain the changes necessary to patch a dataset to the current version. They only include the latest version of an entity for a given extract period. These extracts are being phased out in favour of full history extracts because delta data can be derived from full history data if required.
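As a rough illustration of how delta data could be derived from full history data, the sketch below keeps only the highest version of each entity in an extract. The `EntityVersion` type and `to_delta` function are hypothetical names invented for this example; real OsmChange processing also has to carry the create/modify/delete action and the entity contents.

```python
from typing import Iterable, NamedTuple


class EntityVersion(NamedTuple):
    """Hypothetical minimal record for one version of one OSM entity."""
    kind: str      # "node", "way" or "relation"
    osm_id: int
    version: int


def to_delta(changes: Iterable[EntityVersion]) -> list[EntityVersion]:
    """Keep only the highest version of each entity seen in the extract."""
    latest: dict[tuple[str, int], EntityVersion] = {}
    for change in changes:
        key = (change.kind, change.osm_id)
        if key not in latest or change.version > latest[key].version:
            latest[key] = change
    return list(latest.values())


# A node modified twice within the extract period, plus one new way.
full_history = [
    EntityVersion("node", 42, 3),
    EntityVersion("node", 42, 4),
    EntityVersion("way", 7, 1),
]
delta = to_delta(full_history)
```

Here `delta` holds one record per entity, which matches the description of a delta extract given above.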
Date Aligned extracts use the timestamp fields of database objects to determine which records to include in the extract. This has the advantage of producing extracts whose time period is easily identified. The major disadvantage is that they may miss data, because long-running transactions can commit data with timestamps lying within time periods that have already been extracted. To minimise the chance of this occurring, date aligned extracts are run with a time delay. Unless a very long delay is used, some data will be missed.
Transaction Id extracts use internal PostgreSQL database transaction identifiers to determine which records to extract. These identifiers allow all changed records to be extracted with zero artificial time delays. The downside to these extracts is that they are not exactly date aligned. The timestamps follow these rules:
- The timestamp field specified in a replication state file is guaranteed to be greater than or equal to the maximum timestamp contained in the data file.
- A data file may contain data with timestamps that are equal to or earlier than the timestamp of the previous state file.
These timestamp rules mean that the timestamp specified in a state file can be reliably used to identify the starting point for patching a dataset. The patching tool must cope with receiving duplicate data that already exists in the dataset.
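The two consequences above can be sketched as follows: choose a starting state from the state file timestamps, then apply changes in a way that tolerates duplicates. This is only an illustration under assumptions, not how Osmosis or any other tool necessarily works: `pick_start` and `apply_change` are hypothetical helpers, the dataset is reduced to a mapping from entity to version number, and the sequence numbers and timestamps are made up.

```python
from datetime import datetime, timezone
from typing import Optional


def pick_start(states: list[tuple[int, datetime]], dataset_time: datetime) -> Optional[int]:
    """Return the newest sequence number whose state timestamp is not after dataset_time."""
    eligible = [seq for seq, stamp in states if stamp <= dataset_time]
    return max(eligible) if eligible else None


def apply_change(dataset: dict, kind: str, osm_id: int, version: int) -> bool:
    """Record a change unless the dataset already holds this version or a newer one."""
    key = (kind, osm_id)
    if dataset.get(key, 0) >= version:
        return False  # duplicate or stale record, safe to skip
    dataset[key] = version
    return True


def utc(hour: int, minute: int = 0) -> datetime:
    """Build a UTC timestamp on an arbitrary example day."""
    return datetime(2010, 3, 27, hour, minute, tzinfo=timezone.utc)


# Hypothetical (sequence number, state timestamp) pairs.
states = [(3065, utc(15)), (3066, utc(16)), (3067, utc(17))]
start = pick_start(states, utc(16, 30))  # dataset is current as of 16:30 UTC

dataset = {("node", 42): 4}
applied = [
    apply_change(dataset, "node", 42, 4),  # already present, skipped
    apply_change(dataset, "node", 42, 5),  # newer version, applied
    apply_change(dataset, "way", 7, 1),    # new entity, applied
]
```

Diffs with sequence numbers greater than `start` would then be applied in order. Any records they repeat are skipped by the version check.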
Daily files organisation
The daily files are named by the time period they cover, with the format:
The data in the file is the change in data between midnight on the first and second days, as identified by the timestamps on the current data. Because of the delay in creating the file it is very unlikely, but possible, that some data may be missing.
Minute, Hour, and Day Files Organisation
Each file with hourly or minutely granularity is identified by a nine-digit sequence number. The sequence number is split into groups of three digits and can be found in the following location:
Where the sequence number N = AAA*1000000 + BBB*1000 + CCC. For example, the most recent hourly diff at the time of writing has a sequence number of 3,067 and its location is 000/003/067. Each OsmChange file is accompanied by a state.txt file which contains the following information:
|sequenceNumber||3067||The sequence number of the change/state file.|
|txnMaxQueried||0||The maximum transaction ID which is included in the diff. (NOTE: Doesn't seem to be used for the hourly diffs)|
|timestamp||2010-03-27T17\:00\:00Z||The timestamp when the diff was generated.|
|txnReadyList||916201159,916203039||Unknown - seems to be unused. The previously active transaction ids that can now be queried?|
|txnMax||916203060||The maximum transaction ID at the time the diff was generated, usually the same as txnMaxQueried.|
|txnActiveList||916201159,916203039||The list of transaction IDs between this state and the previous state which have not been committed yet. (NOTE: Doesn't seem to be used for the hourly diffs).|
The numbers are sequential, but are not necessarily aligned with any clock time. To find out the time associated with a particular diff it is necessary to read the timestamp from the associated state file.
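A minimal sketch of both steps follows: turning a sequence number into its AAA/BBB/CCC location, and reading the timestamp out of a state file. It assumes state.txt uses Java-properties-style `key=value` lines with colons escaped as `\:` (as the timestamp value in the table above suggests); the function names are invented for this example.

```python
from datetime import datetime, timezone


def sequence_path(n: int) -> str:
    """Split a sequence number into the AAA/BBB/CCC directory form."""
    if not 0 <= n <= 999_999_999:
        raise ValueError("sequence number must fit in nine digits")
    digits = f"{n:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def parse_state(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blanks and '#' comments (assumed format)."""
    state: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Undo the backslash escaping of colons seen in the timestamp value.
        state[key.strip()] = value.strip().replace("\\:", ":")
    return state


# Values taken from the example table above.
sample = "\n".join([
    "sequenceNumber=3067",
    r"timestamp=2010-03-27T17\:00\:00Z",
    "txnMax=916203060",
])
state = parse_state(sample)
path = sequence_path(int(state["sequenceNumber"]))
generated_at = datetime.strptime(
    state["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
).replace(tzinfo=timezone.utc)
```

With the values from the table, `path` is `000/003/067`, matching the worked example in the text.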
For a period since April 1st we have been in a redaction period. The replication diffs are available at a different location: http://planet.openstreetmap.org/redaction-period/ and files at the normal location are no longer updating.
During this period a team have been working on rebuilding the database to ready it for the license changeover to ODbL. In fact, redactions didn't take place until July 2012. Redaction means removing all the data which cannot be re-licensed under ODbL. The volume of data changes during this period was very high, and consisted largely of deletions. Depending on your use case, you may well choose not to consume the redaction period diffs, and instead wait until some future date when you are happy with the progress of remapping. At that point you can take a full planet download (which is licensed under ODbL) and re-initialise your database with it, before continuing to sync using diffs. Even if you are consuming redaction period diffs, you should re-initialise in this way when the license changes over, for legal reasons.
Using the replication diffs
The most common way to use the replication diffs is via Osmosis, which will automatically download the relevant diffs and combine them to provide all the changes since it was last run. The diffs can be consumed directly, but this can introduce unnecessary complexity and is not recommended.
Alternatively, osmupdate can be used to create cumulated diff files (.osc, .osc.gz, .o5c, .o5c.gz). The program will download all necessary diffs between a given timestamp and now. Depending on the length of this period, minutely, hourly and daily diffs will be downloaded and processed. It is faster than Osmosis and a bit easier to handle, but it lacks a lot of the functionality Osmosis provides. For example, osmupdate cannot update databases or write full history diffs (there will be only the newest version of each object in the output file).
Instructions and example scripts to operate minutely-updated Mapnik can be found on the Minutely Mapnik page.
More details can be found in the read change interval section of the Osmosis documentation. Briefly, this is the way it works:
osmosis --rrii workingDirectory=.
This will initialise the current directory as a replication workspace, creating a configuration.txt file. By default, this is initialised to minutely diffs, so if you want hourly you should edit the file so that it references the hourly replication diffs URL. For simplicity the rest of this will assume you want minutely diffs, but simply replace minute with hour to use the hourly diffs.
Download a state file:
This will be the most recent state file, which may not be suitable for your needs. To reset it to an earlier state all you need to change is the sequenceNumber entry. To find the appropriate sequence number by timestamp you can either look through the diff files (the file timestamp is almost always the same as the timestamp in the file), or use Peter Körner's tool.
Now that Osmosis is set up, whenever you need diffs you can run:
osmosis --rri workingDirectory=. --wxc foo.osc.gz
This will put all the changes between the previous sequence number and the most up-to-date one into the file foo.osc.gz. A parameter in configuration.txt controls the maximum time range of diffs to download and combine at once (defaults to 1 hour).
There is a detailed description on the osmupdate wiki page. You can also consult the help page of the program (option --help). Here is a short example of how to create a planet change file for the time range between November 1, 2011, 21:59 (UTC) and today:
./osmupdate 2011-11-01T21:59:00Z cumulated_changefile.osc.gz
To update an OSM data file you can use this command:
./osmupdate old_file.pbf new_file.pbf
The program will first determine the age of the old file. This is usually done by reading the file's header. If the header does not contain a file timestamp, the whole file will be scanned to get the latest object timestamp. The new file will be created with a file timestamp so that the (automatic) scanning may be needed only for the first time you update this file. Because of this you do not need to download or edit any state files manually.
Retrieving a File's Timestamp
Sooner or later you will come across an OSM file on your local disk drive and want to find out the date of the OSM data it contains. If you are lucky, the file name will contain the date and time. If not, you may want to read the file's timestamp or analyze the file's contents to find the latest recorded object timestamp. Apart from inspecting an XML file header manually, the program osmconvert can help you get the required information. For example:
./osmconvert file_with_timestamp.osm.pbf --out-timestamp
2011-08-01T23:50:00Z
./osmconvert file_without_timestamp.o5m --out-timestamp
(invalid timestamp)
./osmconvert germany.osm.pbf --out-statistics
timestamp min: 2005-07-05T02:14:17Z
timestamp max: 2011-07-31T19:59:46Z
lon min: -20.0712330
lon max: 21.1441799
lat min: 47.0830289
lat max: 59.9982830
nodes: 78138447
ways: 11342322
relations: 176024
node id min: 1
node id max: 1380816490
way id min: 92
way id max: 123952798
relation id min: 159
relation id max: 1693098
Osmosis can probably perform these tasks as well, but the steps have not yet been documented here.
Regionally limited diffs
Processing planet diffs often consumes a lot of server resources, and not everyone needs worldwide coverage.
- http://download.openstreetmap.fr/replication/ provides diffs (.osc.gz) restricted to Europe, France, and French overseas regions and territories (you can contact User:Sletuffe or User:Jocelyn for more information)