User:ZeLonewolf/Overpass Installation Guide/v1.0

The existing guides for configuring an overpass server are somewhat incomplete. I composed this guide starting from them and added what I felt was missing.

Below are the steps that I followed for configuring an overpass server on Ubuntu 18.04 LTS. This guide will walk through the process of configuring a "complete" overpass server, which includes:

  • OSM Base: "basic" queries; all functionality not included in the next two bullets.
  • Areas: searches within a particular area. If you want to do searches within a way or relation, you must have areas enabled.
  • Attic Data: in this context, "attic" refers to prior versions of data in the map. If you want to do "diff" queries which return the changes in an object over time, you must have this.
  • Map data updates at 1-minute intervals.
  • Automatic startup and cleanup of unneeded files.

This guide assumes a basic familiarity with Linux.

Server Prerequisites

An overpass server can take some pretty serious horsepower. In addition to serving queries, your overpass server is also constantly updating its database with new map data. At a minimum, the server must be powerful enough to process updates faster than real-time (if it takes 65 seconds to apply a 1-minute diff, your server will not be able to remain current). You should not use burstable cloud servers, such as AWS "t2" or "t3" instances -- in my experience they will not work. The bandwidth and disk space requirements are considerable. If you have access to compute infrastructure, consider testing out your configuration locally to work out the kinks before deploying to a cloud server.

At a minimum, you must have:

  • 400 GB of disk space (as of June 2020)
  • SSD storage for the database if you intend to deploy minutely updates. In testing, magnetic hard drives frequently lagged to roughly 30-minute refresh intervals.
  • Linux operating system installed. You can use any distro, but for this guide I used Ubuntu 18.04 LTS.
  • SSH access to a user account with sudo privileges
  • Inbound firewall access to port 80 (for serving queries)
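
A quick way to sanity-check the hardware before you begin (a minimal sketch; adjust /opt to wherever the database will live):

df -h /opt              # free disk space where the database will live
nproc                   # number of CPU cores
free -h                 # available memory
lsblk -d -o NAME,ROTA   # ROTA=0 indicates an SSD, 1 a magnetic disk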

Configure Overpass User & Required Dependencies

Your overpass server will run as a dedicated user. I named my overpass user op. Your normal user account (with sudo access) is listed as user in the snippet below. Replace that with your actual username.

sudo su

mkdir -p /opt/op
groupadd op
usermod -a -G op user
useradd -d /opt/op -g op -G sudo -m -s /bin/bash op
chown -R op:op /opt/op
apt-get update
apt-get install g++ make expat libexpat1-dev zlib1g-dev apache2
a2enmod cgid
a2enmod ext_filter

exit

Web Server Configuration

The Apache web server provides the http-side interface to overpass. Modify the default configuration as shown below.

sudo nano -w /etc/apache2/sites-available/000-default.conf

Note the timeout below is set to 300 seconds (5 minutes). This can be increased if you want to allow long-running queries. Configuration file:

<VirtualHost *:80>
	ServerAdmin webmaster@localhost
	ExtFilterDefine gzip mode=output cmd=/bin/gzip
	DocumentRoot /var/www/html
	ScriptAlias /api/ /opt/op/cgi-bin/
	<Directory "/opt/op/cgi-bin/">
		AllowOverride None
		Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
		Require all granted
	</Directory>
	ErrorLog /var/log/apache2/error.log
	LogLevel warn
	CustomLog /var/log/apache2/access.log combined
	Timeout 300
</VirtualHost>
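
Before restarting Apache, you can check the modified configuration for syntax errors:

sudo apache2ctl configtest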

Next, start the web server and configure it to auto-start on boot:

sudo systemctl restart apache2
sudo systemctl enable apache2

Compile and Install Overpass

Next, we log in as the overpass user to compile and install overpass in /opt/op. We'll also create some required subdirectories.

sudo su op

cd
wget http://dev.overpass-api.de/releases/osm-3s_v0.7.55.tar.gz
tar xvzf osm-3s_v0.7.55.tar.gz
cd osm-3s_v0.7.55
./configure CXXFLAGS="-O2" --prefix=/opt/op
make install
cd
mkdir db
mkdir diff
mkdir log
cp -pr osm-3s_v0.7.55/rules db
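
To confirm that the build landed where expected, list the installed binaries; you should see the tools used later in this guide:

ls bin
# Expect to see, among others: dispatcher, osm3s_query, fetch_osc.sh,
# apply_osc_to_db.sh, download_clone.sh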

Download the Planet

Next (still logged in as the overpass user), we download the entire planet data. It is possible to start from a downloaded planet file, particularly if you don't need older attic data. I will describe the clone option here, as it seems to work the most consistently. If you need attic data going back to the beginning (2012), you must use the clone option, because the planet file does not include any attic data. If you only need attic data from now going forward, it's possible to start from a planet file and then add attic data to it.

Clone the planet data

This takes *a long time*. You will likely need to run this overnight. When I ran this, it took 7 hours. This runs interactively. You may wish to use the screen command to kick this off and return to it later.

bin/download_clone.sh --db-dir=db --source=http://dev.overpass-api.de/api_drolbr/ --meta=attic
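
For example, to run the clone inside a detachable screen session (install screen via apt-get if it is not already present):

screen -S overpass-clone bin/download_clone.sh --db-dir=db \
    --source=http://dev.overpass-api.de/api_drolbr/ --meta=attic
# Detach with Ctrl-A then D; reattach later with:
screen -r overpass-clone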

Backup

Now would be an excellent time to backup your downloaded database folder. If something goes wrong in the next steps, you'll be able to quickly recover and not have to re-download the clone. Be aware of how much disk space you have available for the backup.
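
Before copying, compare the size of the database against the remaining free space:

du -sh db    # size of the downloaded database
df -h .      # free space left on this filesystem

If there is room, make the copy: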

cp -pr db db-backup

Configure Launch Scripts

I found that the launch scripts that come with the tarball are less than ideal. The scripts below are replacements. The script defaults to a rate limit of 2 simultaneous queries; if you need a higher rate limit (for example, you're standing up a dedicated server and would like to intentionally saturate it with queries), you can increase the value in the script.

Create a bin/launch.sh shell script to read as follows:

#!/usr/bin/env bash

EXEC_DIR="/opt/op/bin"
DB_DIR="/opt/op/db"
DIFF_DIR="/opt/op/diff"
LOG_DIR="/opt/op/log"

rm -f "$DB_DIR/osm3s_v0.7.55_osm_base"
rm -f "$DB_DIR/osm3s_v0.7.55_areas"
rm -rf "$DB_DIR/*.shadow"
rm -rf "/dev/shm/osm3s*"

ionice -c 2 -n 7 nice -n 17 nohup \
    "$EXEC_DIR/dispatcher" --osm-base --attic --rate-limit=2 \
    --space=10737418240 "--db-dir=$DB_DIR" >>"$LOG_DIR/osm_base.out" &
ionice -c 2 -n 7 nice -n 18 nohup \
    "$EXEC_DIR/dispatcher" --areas "--db-dir=$DB_DIR" >>"$LOG_DIR/areas.out" &
ionice -c 3 nice -n 19 nohup \
    "$EXEC_DIR/fetch_osc.sh" "$(cat "$DB_DIR/replicate_id")" \
    "https://planet.openstreetmap.org/replication/minute/" "$DIFF_DIR" \
    >>"$LOG_DIR/fetch_osc.out" &
ionice -c 3 nice -n 17 nohup \
    "$EXEC_DIR/apply_osc_to_db.sh" "$DIFF_DIR" auto --meta=attic \
    >>"$LOG_DIR/apply_osc_to_db.out" &

"$EXEC_DIR/area_updater.sh" &

Create a script called area_updater.sh in the bin folder as follows. This is a replacement for the rules_loop.sh file.

#!/usr/bin/env bash

DB_DIR="/opt/op/db"
EXEC_DIR="/opt/op/bin"
LOG_DIR="/opt/op/log"

pushd "$EXEC_DIR"

while [[ true ]]; do
{
  echo "`date '+%F %T'`: update started" >>$LOG_DIR/area_update.log
  ionice -c 2 -n 7 nice -n 19 ./osm3s_query --progress \
    --rules <$DB_DIR/rules/areas.osm3s >>$LOG_DIR/area_update.log
  echo "`date '+%F %T'`: update finished" >>$LOG_DIR/area_update.log
  sleep 3
}; done
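
If you created these two scripts with a text editor, remember to make them executable; otherwise neither the shell nor cron will run them:

chmod +x /opt/op/bin/launch.sh /opt/op/bin/area_updater.sh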

Edit the db/rules/areas.osm3s script. At the top of the file there is a "timeout" value that defaults to 24 hours:

<?xml version="1.0" encoding="UTF-8"?>
<osm-script timeout="86400" element-limit="4294967296">

Importing areas can easily take more than 24 hours to complete. Increase this value to something large.
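
For example, to allow up to one week (604800 seconds) per pass:

<osm-script timeout="604800" element-limit="4294967296">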

Log File Management

Disk space = money. Therefore, you should configure your server so that unneeded files do not grow too large.

Centralize Log Files

By default, the scripts out of the box put log files in various locations. Create the following symbolic links in order to centralize log file writing to one location (the overpass log directory):

ln -sf /opt/op/diff/fetch_osc.log /opt/op/log/fetch_osc.out
ln -sf /opt/op/db/apply_osc_to_db.log /opt/op/log/apply_osc_to_db.out
ln -sf /opt/op/db/transactions.log /opt/op/log/transactions.log

Automatic Log Rotation

The logrotate utility comes installed by default with Ubuntu 18.04 and allows you to automatically roll over and delete old log files so they don't fill up your disk.

As the regular user, configure logrotate for overpass:

sudo nano -w /etc/logrotate.d/overpass

The configuration file below will rotate and compress log files daily and delete log files after three days:

/opt/op/log/*.log /opt/op/log/*.out {
        daily
        missingok
        rotate 3
        compress
        delaycompress
        notifempty
        create 640 op op
}

Test that you've configured logrotate correctly with the following command:

logrotate -d /etc/logrotate.d/overpass
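
If the dry run looks correct, you can also force an immediate rotation to verify the configuration end to end:

sudo logrotate -f /etc/logrotate.d/overpass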

Server Automation

Ideally, we would like to be able to "set and forget" our server, and not have to manually log in each time it is rebooted. The following jobs should be automated:

  • Automatic startup. Start the overpass server on system boot
  • Remove old diff files. The scripts out of the box download and apply diffs, but never delete them. In order to preserve disk space, you should automatically clear old diff files or else they will endlessly accumulate.

As the op user, edit the crontab in order to define automated tasks:

crontab -e

Next, configure cron with automation tasks. The entries below run the launch script on boot, and delete diff files more than 2 days old every day at 1:00 AM server time.

@reboot /opt/op/bin/launch.sh
0 1 * * * find /opt/op/diff -mtime +2 -type f -delete
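
Confirm that the entries were saved:

crontab -l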

Start Overpass

The overpass configuration consists of back-end processing and a web server. The scripts above will start the back-end piece only. First, we will launch the overpass back-end:

bin/launch.sh

Next, start the HTTP server:

sudo systemctl restart apache2

The log directory will start populating with log file output. Now would be an excellent time to grab a sandwich and come back later: the server is busy populating areas and catching up the database, which will take a number of hours.
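
You can confirm that the back-end came up by listing the op user's processes; expect two dispatcher instances (one for the base database, one for areas) plus fetch_osc.sh, apply_osc_to_db.sh, and area_updater.sh:

ps -fu op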

Performance Verification

Next, we must check that our server is working, and that it has enough horsepower to actually keep up with its updates.

Fetching updates

The fetch_osc.sh script will continuously download new 1-minute diff updates. The scripts above download these diffs to the diff directory. Run the command below to find the timestamp of the most recently downloaded diff file:

find diff -type f -printf '%T@ %TY-%Tm-%Td %TT %p\n' | sort -n | tail -1 | cut -f2- -d" "

After a minute or two, re-run this command. You should see the timestamp update and the filename numbering sequence increment. In addition, there is a log file you can inspect to see the diff downloading activity:

tail -f log/fetch_osc.out

Applying Updates

Next, we need to confirm that updates are being applied. Since the database download took several hours, your database needs to "catch up" to real time. Updates are logged to the db/apply_osc_to_db.log file. Inspect this file:

tail db/apply_osc_to_db.log

The output will have a series of lines that look something like the text below.

2020-06-28 23:29:08: updating from 4084526
2020-06-28 23:29:08: updating to 4084529
2020-06-28 23:29:41: update complete 4084529

Subtract the two timestamps to get the elapsed time for the update. In this case, the update took 33 seconds. Next, subtract the two update numbers. In this case, 3 minutes' worth of updates were applied. This means it took 33 seconds to apply 3 minutes of updates. As long as the elapsed time is less than the number of minutes applied, your server will (eventually) catch up to real time. The server is completely caught up once the update numbers are 1 apart.
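
As a quick cross-check of how far behind you are, compare the local replication ID against the latest upstream sequence number (this assumes the current upstream state.txt layout, which includes a sequenceNumber line); the difference is roughly the lag in minutes:

echo "local:    $(cat db/replicate_id)"
echo "upstream: $(wget -qO- https://planet.openstreetmap.org/replication/minute/state.txt | grep sequenceNumber | cut -d= -f2)"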

HTTP Interface

Next, test to see if the web server is responding to queries:

wget --output-document=test.xml "http://localhost/api/interpreter?data=%3Cprint%20mode=%22body%22/%3E"

Examine the test.xml file. It should look something like:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.55.9 ab41fea6">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2020-06-30T03:19:02Z"/>
</osm>

Next, test the external web interface. In a web browser, enter: http://[SERVER ADDRESS]/api/status. You should see output that looks like the following:

Connected as: 3232236024
Current time: 2020-06-30T03:18:29Z
Rate limit: 2
2 slots available now.
Currently running queries (pid, space limit, time limit, start time):

Area Database Initial Load

At this point, your server should be functional for basic queries. It can take 24 hours or more for area queries to start working; on a magnetic hard drive, expect this to take 2-3 days. The area_updater.sh script repeatedly runs ./osm3s_query --progress --rules, which populates a set of files starting with "area" in the db directory. You can view the progress of populating the area database with:

tail -f log/area_update.log

and

watch ls -l db/area*

You can verify that updating areas is complete by checking the log file for a line that looks like:

2020-07-03 03:03:29: update finished

Query Tests

To test the server, use Overpass Turbo. Note that this site sends queries from your browser (client side), so it will work even with servers that are reachable only from your web browser's network. Go to settings and enter the server as follows:

http://{SERVER_ADDRESS}/api/

The queries below will test various categories of functionality in your overpass server.

Basic Queries

The following example will query for a single way (Downing Street, London, UK). This query should work out of the box.

[timeout:180][out:json];
way(4244999);
(._;>;);
out body;

Area Queries

This example will load all roads in Boston. This query will only return results once the areas database is fully populated.

[timeout:180][out:json];
area(3602315704);
(
way(area)
["name"]
["highway"];
); 
(._;>;);
out body;

Attic Area Queries

This example will load all roads in Boston that have changed since June 12, 2020. This query will load only the changes.

[timeout:180][out:xml][diff:"2020-06-12T00:00:00Z"];
area(3602315704);
(
way(area)
["name"]
["highway"];
);
(._;>;);
out body;

Firewall Configuration

If you are building a public overpass server (one that anyone can access), you can safely skip this section. Otherwise, most likely you want to restrict access to one or more IP addresses. The default firewall on Ubuntu 18.04 is ufw or "uncomplicated firewall".

The following commands will configure your firewall for SSH and HTTP access from specified addresses.

sudo ufw default deny incoming
sudo ufw default allow outgoing
# Change below to IP address or range that should be allowed SSH login access
sudo ufw allow from 192.168.0.0/16 to any port ssh
# Change below to IP address or range that should be allowed Overpass query access
sudo ufw allow from 123.123.123.123 to any port 80

Finally, enable your firewall:

sudo ufw enable

Verify that it's configured properly as follows:

sudo ufw status verbose

Recovering a Corrupted Database

If your database gets corrupted for some reason, it may be faster to re-download a fresh database rather than fix the issue, especially if the database is more than a day out of date. The basic steps are as follows:

  1. Take the apache server offline and kill all overpass processes
  2. Remove the existing db and diff directories
  3. Re-copy the rules folder into db
  4. Re-sync the planet data
  5. Re-run the launch scripts and wait for areas generation to be completed
  6. Re-create the log file symbolic links in the log directory
  7. Start the apache server
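
As a sketch, the recovery sequence maps onto commands used earlier in this guide (it assumes that nothing other than overpass runs under the op account):

sudo systemctl stop apache2        # 1. take the web server offline
sudo pkill -u op                   #    kill all overpass processes

sudo su op
cd
rm -rf db diff && mkdir db diff    # 2. remove the old database and diffs
cp -pr osm-3s_v0.7.55/rules db     # 3. re-copy the rules folder
bin/download_clone.sh --db-dir=db \
    --source=http://dev.overpass-api.de/api_drolbr/ --meta=attic   # 4. re-sync
bin/launch.sh                      # 5. relaunch; wait for area generation
ln -sf /opt/op/diff/fetch_osc.log /opt/op/log/fetch_osc.out        # 6. re-create log links
ln -sf /opt/op/db/apply_osc_to_db.log /opt/op/log/apply_osc_to_db.out
ln -sf /opt/op/db/transactions.log /opt/op/log/transactions.log
exit

sudo systemctl start apache2       # 7. bring the web server back online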