This page contains a collection of information relating to technical aspects of Osmosis replication.
Full History Changes versus Delta Changes
Historically, Osmosis replication provided a way to update a set of data from one point in time to another. In other words, it aimed to make it efficient to keep a local copy of the planet up to date. As a result of this, the changesets (often referred to as "diff" files) only contained the differences between two points in time.
As the idea of replication matured and it became a more integral part of the OSM landscape, it became desirable to not only describe the differences in data between two points in time, but to include all data that changed between two points in time. This may be best explained with an example.
Let's say that a node with id 100 was created on Monday. This action created version 1 of the node. If a planet file were produced at the end of Monday, it would include version 1 of node 100. Let's say that on Tuesday the node was modified twice, creating version 2 and version 3 of the node. If a new planet were created at the end of Tuesday, it would include version 3 of node 100. If an old-style delta changeset were produced for the Tuesday period, it would contain only version 3 of node 100, allowing a planet produced at the end of Monday to be patched efficiently to become an end of Tuesday planet. On the other hand, if a full history changeset were produced for the Tuesday period, it would contain both version 2 and version 3 of node 100, allowing a consumer to determine all changes that had occurred on Tuesday.
Original changesets were always delta style changesets. Over time, all replication jobs are moving to full history format changesets due to the additional flexibility they provide. The Osmosis --simplify-change task allows full history changesets to be collapsed into delta style changesets for those tasks that require delta style data.
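The relationship between the two styles can be sketched in a few lines of Python. This is only an illustration of the idea behind collapsing a full history changeset into a delta changeset, not the Osmosis implementation; the Change type and the simplify_changes function are hypothetical names introduced for this example.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Change(NamedTuple):
    """One entry of a change file (hypothetical, simplified model)."""
    action: str       # "create", "modify" or "delete"
    entity_type: str  # "node", "way" or "relation"
    entity_id: int
    version: int


def simplify_changes(changes: Iterable[Change]) -> List[Change]:
    """Keep only the highest version of each entity."""
    latest: Dict[Tuple[str, int], Change] = {}
    for change in changes:
        key = (change.entity_type, change.entity_id)
        current = latest.get(key)
        if current is None or change.version > current.version:
            latest[key] = change
    return list(latest.values())


# Tuesday's full history changeset from the example above.
full_history = [
    Change("modify", "node", 100, 2),
    Change("modify", "node", 100, 3),
]
# The delta style equivalent contains only version 3 of node 100.
delta = simplify_changes(full_history)
```

A consumer that only needs to patch a Monday planet into a Tuesday planet can work from delta, while a consumer interested in every edit needs full_history.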
Replication and Changesets
The term "changeset" can have several meanings in OpenStreetMap (OSM). The term was originally used by Osmosis to refer to a set of changes contained in a change file, and Osmosis still uses the term in this way. An Osmosis changeset is often referred to as a "diff" and contains a complete set of changes occurring between two points in time. The OSM API has since introduced its own concept of a changeset, which refers to a set of changes performed by a single user against the API. An OSM API changeset is not transactional and may be used to represent a set of actions performed over a period of time. Changes made in one OSM API changeset may overlap with other changesets being performed at the same time. The two kinds of changeset both represent changes to OSM data, and can both be stored using the OSC (OpenStreetMap Change) change file format, but they are largely incompatible concepts and should not be confused. An Osmosis changeset may contain data from many different API changesets. Similarly, an API changeset may result in changes being captured in multiple Osmosis changesets.
Time-aligned versus Transaction-aligned
Osmosis uses two very different techniques to define changeset boundaries. It may use time boundaries capturing all changes occurring between two points in time, or it may use transaction boundaries representing all changes between two transaction commit points in the database.
Time-aligned replication boundaries are the original mechanism and the simplest to both implement and understand. Identifying changeset data requires a simple query of historical tables based on date ranges. The downside of this method is that each time interval must be queried a long time after the fact to avoid missing data. This is because a database record may be created with a timestamp matching the current time, but the database transaction may not be committed until several minutes or even hours later. Delayed commits mean that a replication job reading data based on timestamps can only query time ranges that are many hours in the past. In practice this typically requires a time lag of over 24 hours, and even a 24 hour delay provides no guarantees. The advantage of time-aligned replication is that it contains well-defined data that makes it easy to identify and consume the correct changesets. Each file can be given a filename based on the time period it represents.
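The effect of delayed commits can be illustrated with a small simulation. This is a toy model rather than anything Osmosis does: the Record type, the time_aligned_query function and the minute-based clock are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    name: str
    timestamp: int     # minute at which the row was stamped
    committed_at: int  # minute at which its transaction committed


def time_aligned_query(records: List[Record], start: int, end: int, run_at: int) -> List[str]:
    """Return names of records stamped in [start, end) that are visible at run_at."""
    return [
        r.name
        for r in records
        if start <= r.timestamp < end and r.committed_at <= run_at
    ]


records = [
    Record("fast", timestamp=5, committed_at=5),
    Record("slow", timestamp=6, committed_at=200),  # long-running transaction
]

# Querying minutes 0-10 shortly after the interval closes misses "slow" ...
early = time_aligned_query(records, 0, 10, run_at=11)
# ... while querying the same interval well after the fact sees both records.
late = time_aligned_query(records, 0, 10, run_at=1500)
```

Once the early query has produced its changeset, nothing prompts the job to revisit that interval, so the "slow" record would never be replicated.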
Transaction-aligned replication boundaries are a newer technique that is more complicated, but more powerful. Changeset data is identified by querying for data based on which transactions created it. Only transactions that are known to have committed are queried, and those still in-flight are picked up in subsequent changesets. This results in changesets that contain data with timestamps that are non-deterministic. Each changeset will contain data with timestamps anywhere from the present to many hours in the past depending on how long transactions were in-flight. Transaction-aligned changesets cannot be given nice date-based filenames, and instead are given names based on monotonically increasing numbers. Each changeset file is given a number that is one greater than the previous changeset. Each changeset is accompanied by a state file that includes information such as the timestamp of the newest record in the changeset to make it easier to identify which changeset should be used. The downside of transactional changesets is that fine-grained changesets containing an approximate time period cannot be generated after the fact. In other words, minute changesets can only be produced with reasonable accuracy if they are created every minute as transactions are committed.
The majority of replication jobs should now use transaction-aligned changeset boundaries. The exceptions are historical changesets produced well after the fact, where long-running transactions can be ignored.
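As a rough illustration of how a consumer might read a state file, the sketch below parses one into a dictionary and extracts its timestamp. The layout is an assumption rather than something defined on this page: it assumes key=value lines, keys named sequenceNumber and timestamp, and colons escaped with a backslash. The sample contents are made up for the example.

```python
from datetime import datetime, timezone
from typing import Dict


def parse_state(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping comment lines and undoing escaped colons."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        state[key.strip()] = value.strip().replace("\\:", ":")
    return state


def state_timestamp(state: Dict[str, str]) -> datetime:
    """Return the state's timestamp as a timezone-aware UTC datetime."""
    parsed = datetime.strptime(state["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Made-up sample contents in the assumed layout.
sample = (
    "#Wed Sep 12 08:16:02 UTC 2012\n"
    "sequenceNumber=42\n"
    "timestamp=2012-09-12T08\\:15\\:45Z\n"
)
state = parse_state(sample)
newest_record_time = state_timestamp(state)
```

Comparing newest_record_time across consecutive state files is one way to find the sequence number closest to a time of interest.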
This section provides some details on how replication jobs are configured on the planet.openstreetmap.org server.
All replication jobs run under the bretth user on the planet.openstreetmap.org server. The following directory structure is used in the bretth home directory:
- bin - This directory is placed on the user's PATH, and contains symlinks to applications installed in the user's home directory.
- planet - A symlink to the planet file storage directory. All replication jobs storing data under the planet folder access it via this symlink to simplify moving the storage location in the future.
- app - All user-installed applications are stored under this directory.
- app/osmosis - This is a checkout of the Osmosis source tree from git. An osmosis symlink is created in the user's bin directory pointing to the package/bin/osmosis script in this directory. All replication jobs use this Osmosis installation.
- app/osmosis2 - The same as the osmosis directory above, but used for testing Osmosis upgrades. An osmosis-2 symlink is created in the user's bin directory.
- app/replicate-xxxx - Each replication job has a directory used to maintain its configuration and relevant state.
- app/replicate-xxxx/data - Each replication job has a data sub-directory. This is typically a symlink into the relevant planet storage directory (eg. ~/planet/redaction-period/minute-replicate).
All replication jobs are launched via cron. This is the minute replication cron entry:
* * * * * cd ~/app/replicate;~/bin/osmosis -q --replicate-apidb authFile=dbAuth.txt allowIncorrectSchemaVersion=true --write-replication workingDirectory=data
Only the minute replication job accesses the database directly. The hourly job consumes minute replication data, and the daily job consumes hourly replication data. The hourly and daily cron entries are listed below:
2 * * * * cd ~/app/replicate-hour;~/bin/osmosis -q --mrf workingDirectory=.
5 * * * * cd ~/app/replicate-day;~/bin/osmosis -q --mrf workingDirectory=.
Consumers of replication data typically use the --read-replication-interval task in Osmosis. More details are available on the Planet.osm/diffs#Using_the_replication_diffs page.
Streaming replication allows data to be "pushed" from the server to clients soon after it becomes available. This is achieved through an HTTP connection that is initiated by the client, but held open and used by the server to push data to the client as soon as it becomes available.
There are two server processes running:
- Database reader - This process acts in a similar way to minute replication, but instead of being launched via cron the process remains running and polls the database at much shorter intervals (typically somewhere between 1 and 10 seconds). It runs an HTTP server used by the second process to allow notifications when a new replication sequence becomes available.
- Streaming server - This process runs an HTTP server used by consumers to access replication data. This process connects to the database reader task over HTTP so that it knows when new data is available to serve.
The command lines for the database reader and streaming server processes are shown below:
# Database reader
osmosis -q --replicate-apidb iterations=0 minInterval=10000 maxInterval=60000 authFile=dbAuth.txt --send-replication-sequence port=8081 --write-replication workingDirectory=data
# Streaming server
osmosis -q --send-replication-data dataDirectory=data port=8080 notificationPort=8081
The above command lines cause the streaming server process to listen on port 8080 where it is proxied to the Internet via Apache at URL . The database reader task listens on port 8081 and is only accessible internally to the server with a firewall preventing external access.
This section provides some details on how data from the replication streaming server can be consumed by a replication client.
A basic command line for dumping all changes to the console is shown below. Note that this will track progress via a state.txt file written to the current directory. Running this command line again will cause all data produced since the last invocation to be downloaded.
osmosis --receive-replication host=planet.openstreetmap.org port=80 pathPrefix=replication/streaming --replication-to-change --write-xml-change -
A more useful command line for writing to a pgsnapshot database would look like this:
osmosis --receive-replication host=planet.openstreetmap.org port=80 pathPrefix=replication/streaming --replication-to-change --write-pgsql-change authFile=dbAuth.txt
To start replication from a specific time, substitute the desired start time into the following command to initialise a local state file.
curl http://planet.openstreetmap.org/replication/streaming/replicationState/<yyyy-MM-dd-HH-mm-ss> > state.txt
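For scripted use, the URL in the command above can be built from a datetime value. The following is a small sketch: the replication_state_url function is a hypothetical helper, and treating the start time as UTC is an assumption that this page does not confirm.

```python
from datetime import datetime, timezone

# Base URL taken from the curl command above.
BASE_URL = "http://planet.openstreetmap.org/replication/streaming/replicationState/"


def replication_state_url(start: datetime) -> str:
    """Append the start time in yyyy-MM-dd-HH-mm-ss form to the base URL."""
    return BASE_URL + start.strftime("%Y-%m-%d-%H-%M-%S")


# Example start time, assumed here to be expressed in UTC.
url = replication_state_url(datetime(2012, 9, 12, 8, 15, 45, tzinfo=timezone.utc))
```

The resulting URL can be passed to curl, with the response saved as state.txt as shown above.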
Replication servers can be set up in a caching hierarchy allowing bandwidth to be distributed.
Each caching server acts as a client to an upstream replication server, and appears as a normal replication server to its clients. In a similar fashion to the main replication server, it requires two separate processes with one downloading from the upstream server, and the other serving data to clients. The command lines are listed below:
# Upstream client
osmosis --receive-replication host=planet.openstreetmap.org port=80 pathPrefix=replication --send-replication-sequence port=8081 --write-replication workingDirectory=data
# Streaming server
osmosis -q --send-replication-data dataDirectory=data port=8080 notificationPort=8081
In the above example, the upstream client is notifying the streaming server of new data using port 8081. The streaming server is sending data to clients using port 8080. It is preferable to use Apache as a proxy server in front of the streaming server due to the minimal nature of the built-in HTTP server. The above example will store all data in a data sub-directory, and will store a state.txt file in the current directory to track what data has been downloaded from upstream.
Streaming Replication Wire Protocol
This section provides some details on the wire format used by streaming replication.
The streaming server supports two types of stream: state-only streaming, and full state and data streaming. Both types use the same wire protocol, but in the case of state-only streaming the replication data associated with each replication state is omitted. State-only streaming is primarily useful for debugging or monitoring a running server. This section only describes the full state and data streaming protocol, as state-only streaming is a subset of it.
All data is sent using HTTP chunked encoding. This allows each replication sequence to be sent in isolation, and allows requests to be proxied via mainstream HTTP servers such as Apache. The format of chunked encoding is not described here as it is already well defined elsewhere.
On top of the underlying chunked encoding, each replication sequence is sent using the following format. Note that each entry shown in the following bullet points is typically sent using its own HTTP chunk, however intermediate proxy servers may choose to re-encode the data so this alignment cannot be relied upon.
- A number is sent in ASCII format followed by a carriage return/line feed pair. This number defines the size of the following state data.
- The state data for the replication sequence is sent. This state data is byte for byte identical to a normal replication state.txt file.
- A second number is sent in ASCII format followed by a carriage return/line feed pair. This number defines the size of the following replication data.
- The replication data for the replication sequence is sent. This replication data is byte for byte identical to a normal replication *.osc.gz file.
The above protocol provides a minimal method of sending normal state.txt and *.osc.gz replication files to a client, but many files can be pushed to the client over a single HTTP connection instead of being individually polled and downloaded by the client.
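The framing described above can be sketched as a small Python reader. This is an illustration under stated assumptions, not the Osmosis client: it assumes the HTTP chunked encoding has already been removed by the HTTP library, that the stream is a buffered binary file-like object, and that no extra separator follows the state or replication data. The names read_length, read_exact, read_sequences and encode_sequence are hypothetical.

```python
import gzip
import io
from typing import BinaryIO, Iterator, Optional, Tuple


def read_length(stream: BinaryIO) -> Optional[int]:
    """Read an ASCII number terminated by CRLF; return None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    return int(line.strip().decode("ascii"))


def read_exact(stream: BinaryIO, size: int) -> bytes:
    """Read exactly size bytes, failing if the stream ends early."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended in the middle of a payload")
    return data


def read_sequences(stream: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (state, replication data) pairs until the stream ends."""
    while True:
        state_size = read_length(stream)
        if state_size is None:
            return
        state = read_exact(stream, state_size)
        data_size = read_length(stream)
        if data_size is None:
            raise EOFError("stream ended between state and data")
        yield state, read_exact(stream, data_size)


def encode_sequence(state: bytes, data: bytes) -> bytes:
    """Build one sequence in the same framing, used here only for the demo."""
    return b"%d\r\n%s%d\r\n%s" % (len(state), state, len(data), data)


# Demo with two made-up sequences sent over an in-memory stream.
demo_state = b"sequenceNumber=1\n"
demo_data = gzip.compress(b"<osmChange/>")
demo_stream = io.BytesIO(encode_sequence(demo_state, demo_data) * 2)
received = list(read_sequences(demo_stream))
```

Because each payload is read by its declared size rather than by chunk boundaries, the reader is unaffected if an intermediate proxy re-encodes the chunks.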