User:ZeLonewolf/Overpass Installation Guide

From OpenStreetMap Wiki

This is my guide for configuring a complete overpass server from beginning to end. I found the existing guides to be lacking. This is a combination of various other guides, the official docs, Kai Johnson's diary entry, and suggestions from @mmd on Slack. This guide is intended to demonstrate how to configure a server dedicated to only running overpass.

Below are the steps that I followed for configuring an overpass server on Ubuntu 18.04 LTS. This guide will walk through the process of configuring a "complete" overpass server, which includes:

  • OSM Base: "basic" queries; all functionality not included in the next two bullets.
  • Areas: searches within a particular area. If you want to do searches within a way or relation, you must have areas enabled.
  • Attic Data: in this context, "attic" refers to prior versions of data in the map. If you want to do "diff" queries which return the changes in an object over time, you must have this.
  • Map data updates on 1-minute intervals.
  • Automatic startup and cleanup of unneeded files.

This guide assumes a basic familiarity with Linux.

Server Prerequisites

An overpass server can take some pretty serious horsepower. In addition to serving queries, your overpass server is also constantly updating its database with new map data. At a minimum, the server must be powerful enough to process updates faster than real-time (if it takes 65 seconds to apply a 1-minute diff, your server will not be able to remain current). You should not use burstable cloud servers, such as AWS "t2" or "t3" instances -- in my experience they will not work. The bandwidth and disk space requirements are considerable. If you have access to compute infrastructure, consider testing out your configuration locally to work out the kinks before deploying to a cloud server. I created this guide using an Intel NUC server.
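To put the "faster than real-time" requirement in numbers, here is a small arithmetic sketch. The function name is made up for illustration; the only inputs are the 60-second diff interval and the 1440 minutes in a day.

```shell
#!/usr/bin/env bash
# Minutes of lag gained per day when each 1-minute diff takes $1 seconds to apply.
# A negative result means the server is catching up by that many minutes per day.
daily_lag_minutes() {
    local seconds_per_diff="$1"
    # 1440 one-minute diffs are published per day.
    echo $(( (seconds_per_diff - 60) * 1440 / 60 ))
}

daily_lag_minutes 65   # prints 120
daily_lag_minutes 20   # prints -960
```

In other words, a server that needs 65 seconds per diff falls two hours further behind every day, while one that needs 20 seconds can work off a 16-hour backlog in a day.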

At a minimum, you must have:

  • 1TB of disk space
  • An SSD for the database if you intend to deploy minutely updates. In testing, magnetic hard drives would frequently lag to refresh intervals of around 30 minutes.
  • Linux operating system installed. You can use any distro, but for this guide I used Ubuntu 18.04 LTS
  • SSH access to a user account with sudo privileges
  • Inbound firewall access to port 80 (for serving queries)
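The disk space requirement can be checked before you start. The helpers below are a minimal sketch and not part of overpass; the function names are hypothetical, and the 930 GiB threshold is simply 1 TB expressed in GiB (rounded down).

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; they are not part of Overpass.

# Print the free space, in whole GiB, on the filesystem that holds path $1.
avail_gib() {
    # df -P prints one header line, then one line per filesystem;
    # with -k, column 4 is the available space in KiB.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed if the filesystem that holds path $1 has at least $2 GiB free.
has_min_space() {
    [ "$(avail_gib "$1")" -ge "$2" ]
}

# 1 TB is about 931 GiB.
if has_min_space / 930; then
    echo "disk space looks sufficient"
else
    echo "only $(avail_gib /) GiB free; this guide calls for about 1 TB"
fi
```

Replace / with the mount point that will hold /opt/op if the database lives on a separate disk.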

Configure Overpass User & Required Dependencies

Your overpass server will run as a dedicated user. I named my overpass user op. Your normal user account (with sudo access) is listed as user in the snippet below. Replace that with your actual username.

sudo su

mkdir -p /opt/op
groupadd op
usermod -a -G op user
useradd -d /opt/op -g op -G sudo -m -s /bin/bash op
chown -R op:op /opt/op
apt-get update
apt-get install g++ make expat libexpat1-dev zlib1g-dev apache2 liblz4-dev
a2enmod cgid
a2enmod ext_filter
a2enmod headers

exit

Web Server Configuration

The Apache web server provides the http-side interface to overpass. Modify the default configuration as shown below.

sudo nano -w /etc/apache2/sites-available/000-default.conf

Note the timeout below is set to 300 (5 minutes). This can be increased if you want to run long-running queries. Configuration file:

<VirtualHost *:80>
	ServerAdmin webmaster@localhost
	ExtFilterDefine gzip mode=output cmd=/bin/gzip
	DocumentRoot /var/www/html
	ScriptAlias /api/ /opt/op/cgi-bin/
	<Directory "/opt/op/cgi-bin/">
                AllowOverride None
                Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
                Require all granted
                Header add Access-Control-Allow-Origin "*"
	</Directory>
    Header add Access-Control-Allow-Origin "overpass-turbo.eu"
	ErrorLog /var/log/apache2/error.log
	LogLevel warn
	CustomLog /var/log/apache2/access.log combined
    TimeOut 300
</VirtualHost>

The "Access-Control-Allow-Origin" line allows your overpass server to be invoked from a separate domain. You'll need this if you want to use your server with the Overpass Turbo web interface. Be sure to use the "http" version of Overpass Turbo unless you are installing an SSL certificate in front of your overpass server.

Next, start the web server and configure it to auto-start on boot:

sudo systemctl restart apache2
sudo systemctl enable apache2
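Once Apache is running, one way to confirm that the Access-Control-Allow-Origin header is actually being sent is to look at the response headers. The filter below is a sketch with a made-up function name; it only parses header text, so it is demonstrated here on a canned response rather than a live server.

```shell
#!/usr/bin/env bash
# Print the value of every Access-Control-Allow-Origin header read from stdin.
# Header names are case-insensitive, and HTTP header lines end in CRLF.
cors_origin() {
    tr -d '\r' |
        awk 'tolower($0) ~ /^access-control-allow-origin:/ {
                 sub(/^[^:]*:[ \t]*/, "")   # drop the header name and first colon
                 print
             }'
}

# Example with a canned response. Against a live server, pipe in the
# output of:  curl -sI http://localhost/
printf 'HTTP/1.1 200 OK\r\nAccess-Control-Allow-Origin: *\r\n\r\n' | cors_origin   # prints *
```

If Apache refuses to start after editing the file, sudo apachectl configtest reports the line with the syntax error.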

Compile and Install Overpass

Next, we log in as the overpass user to compile and install overpass in /opt/op. We'll also create some required subdirectories.

sudo su op

cd
wget http://dev.overpass-api.de/releases/osm-3s_v0.7.61.1.tar.gz
tar xvzf osm-3s_v0.7.61.1.tar.gz
cd osm-3s_v0.7.61.1
./configure CXXFLAGS="-O2" --prefix=/opt/op --enable-lz4
make install
cp -pr cgi-bin ..
cd
chmod -R 755 cgi-bin
mkdir db
mkdir diff
mkdir log
cp -pr osm-3s_v0.7.61.1/rules db

Edit the db/rules/areas.osm3s script. At the top of the file there is a "timeout" value that defaults to 24 hours:

<?xml version="1.0" encoding="UTF-8"?>
<osm-script timeout="86400" element-limit="4294967296">

If importing areas takes more than 24 hours to complete, this value needs to be increased. However, I have found that in recent versions of overpass, this runs in about 2-3 hours, so this is unnecessary.
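If you do need a longer timeout, the value can be changed in an editor or with a one-line substitution. The helper below is a sketch (the function name is hypothetical) and assumes the first line of the script looks like the one shown above.

```shell
#!/usr/bin/env bash
# Rewrite the timeout attribute of the <osm-script> element in file $1 to $2 seconds.
# Writes to a temporary file and renames it, so it works with any POSIX sed.
set_rules_timeout() {
    local file="$1" seconds="$2"
    sed -E "s/(<osm-script timeout=\")[0-9]+\"/\1${seconds}\"/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example: raise the limit from 24 hours to 48 hours (172800 seconds).
# set_rules_timeout db/rules/areas.osm3s $((48 * 3600))
```

Only the timeout attribute is touched; element-limit and the rest of the file are left as they are.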

Additionally, this file defines which OSM ways and relations should be queryable as an area. You can customize this to suit your use case. For example, I don't need to query by postal code, so I removed that block in order to save resources.

Download the Planet

Next (still logged in as the overpass user), we download the entire planet data. It is possible to start from a downloaded planet file, particularly if you don't need older attic data. I will describe the clone option here as it seems to work the most consistently. If you need attic data going back to the beginning (2012), you must use the clone option, as the planet file does not include any attic data. If you only need attic data from now going forward, it's possible to start from a planet file and then add attic data to it.

Clone the planet data

This takes *a long time*. You will likely need to run this overnight. When I ran this, it took 7 hours. This runs interactively. You may wish to use the screen command to kick this off and return to it later.

bin/download_clone.sh --db-dir=db --source=https://dev.overpass-api.de/api_drolbr/ --meta=attic

Backup

Now would be an excellent time to back up your downloaded database folder. If something goes wrong in the next steps, you'll be able to quickly recover and not have to re-download the clone. Be aware of how much disk space you have available for the backup. I installed a second (low-cost) magnetic hard drive on my server specifically for this backup.

cp -pr db <db backup location>
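A backup is only useful if the copy is complete. As a quick sanity check, the sketch below compares file names and sizes between two directories; it does not compare contents, which would take far longer on a database this size. The function names are hypothetical, and it relies on GNU find (standard on Ubuntu).

```shell
#!/usr/bin/env bash
# List every regular file under directory $1 as "relative/path size", sorted.
# Uses GNU find's -printf.
list_sizes() {
    ( cd "$1" && find . -type f -printf '%P %s\n' | sort )
}

# Succeed if directories $1 and $2 both exist and hold the same files
# with the same sizes.
same_files() {
    [ -d "$1" ] && [ -d "$2" ] &&
        [ "$(list_sizes "$1")" = "$(list_sizes "$2")" ]
}

# Example (the backup path is a placeholder for your own location):
# same_files db /mnt/backup/db && echo "backup matches"
```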

Configure Launch Scripts

I found that the launch scripts that come with the tarball are less than ideal. The scripts below are replacements, with credit to Kai Johnson for improving upon my initial examples. The rate limit is set by the --rate-limit value passed to the dispatcher in the script; if you need a different rate limit (for example, you're standing up a dedicated server and would like to intentionally saturate it with queries), you can change the value in the script.

Create a bin/launch.sh shell script to read as follows:

#!/usr/bin/env bash

EXEC_DIR="/opt/op/bin"
DB_DIR="/opt/op/db"
DIFF_DIR="/opt/op/diff"
LOG_DIR="/opt/op/log"

rm -fv $DB_DIR/osm3s_v0.7*
rm -fv $DB_DIR/*.shadow
rm -fv $DB_DIR/*.lock
rm -fv /dev/shm/osm3s*

ionice -c 2 -n 7 nice -n 17 nohup \
    "$EXEC_DIR/dispatcher" --osm-base --attic --rate-limit=0 --allow-duplicate-queries=yes \
    --space=10737418240 "--db-dir=$DB_DIR" >>"$LOG_DIR/osm_base.out" &
ionice -c 3 nice -n 19 nohup \
    "$EXEC_DIR/dispatcher" --areas --space=10700000000 --rate-limit=0 --allow-duplicate-queries=yes \
    "--db-dir=$DB_DIR" >>"$LOG_DIR/areas.out" &
ionice -c 3 nice -n 19 nohup \
    "$EXEC_DIR/fetch_osc.sh" `cat "$DB_DIR/replicate_id"` \
    "https://planet.openstreetmap.org/replication/minute/" "$DIFF_DIR" \
    >>"$LOG_DIR/fetch_osc.out" &
ionice -c 2 -n 7 nice -n 17 nohup \
    "$EXEC_DIR/apply_osc_to_db.sh" "$DIFF_DIR" `cat "$DB_DIR/replicate_id"` \
    --meta=yes >> "$LOG_DIR/apply_osc_to_db.out" &

Create a script called area_updater.sh in the bin folder as follows. This is a replacement for the rules_loop.sh file.

#!/usr/bin/env bash

DB_DIR="/opt/op/db"
EXEC_DIR="/opt/op/bin"
LOG_DIR="/opt/op/log"

pushd "$EXEC_DIR"

echo "`date '+%F %T'`: update started" >>$LOG_DIR/area_update.log
ionice -c 3 nice -n 19 "$EXEC_DIR/osm3s_query" --progress --rules < $DB_DIR/rules/areas.osm3s >>$LOG_DIR/area_update.log
echo "`date '+%F %T'`: update finished" >>$LOG_DIR/area_update.log
sleep 3

popd

Log File Management

Disk space = money. Therefore, you should configure your server so that unneeded files do not grow too large.

Centralize Log Files

By default, the scripts out of the box put log files in various locations. Create the following symbolic links in order to centralize log file writing to one location (the overpass log directory):

ln -sf /opt/op/diff/fetch_osc.log /opt/op/log/fetch_osc.out 
ln -sf /opt/op/db/apply_osc_to_db.log /opt/op/log/apply_osc_to_db.out 
ln -sf /opt/op/db/transactions.log /opt/op/log/transactions.log

Automatic Log Rotation

The logrotate utility comes installed by default with Ubuntu 18.04 and allows you to automatically roll over and delete old log files so they don't fill up your disk.

As the regular user, configure logrotate for overpass:

sudo nano -w /etc/logrotate.d/overpass

The configuration file below will rotate and compress log files daily and delete log files after three days:

/opt/op/diff/*.log /opt/op/state/*.log /opt/op/db/*.log /opt/op/log/*.out {
        daily
        missingok
        copytruncate
        rotate 3
        compress
        delaycompress
        notifempty
        create 644 op op
}

Test that you've configured logrotate correctly with the following command:

logrotate -d /etc/logrotate.d/overpass

Server Automation

Ideally, we would like to be able to "set and forget" our server, and not have to manually log in each time it is rebooted. The following jobs can be automated:

  • Automatic startup. Start the overpass server on system boot.
  • Remove old diff files. The scripts out of the box download and apply diffs, but never delete them. In order to preserve disk space, you should automatically clear old diff files or else they will endlessly accumulate.
  • Update areas. Updates the areas index, which allows you to run area queries. New areas added to the database can't be queried until this is updated.

As the op user, edit the crontab in order to define automated tasks:

crontab -e

Next, configure cron with automation tasks. The scripts below run the launch script on boot, and delete diff files more than 2 days old every day at 1:00 AM server time. Additionally, it runs the area update script three times daily.

@reboot /opt/op/bin/launch.sh
0 1 * * * find /opt/op/diff -mtime +2 -type f -delete
0 */8 * * * /opt/op/bin/area_updater.sh
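Because the area update is scheduled every 8 hours but can run much longer than that (particularly the initial load), two runs may overlap. Whether overlapping runs are harmful is not covered here; the sketch below simply shows one way to skip a run while the previous one is still going. The function name and lock path are hypothetical.

```shell
#!/usr/bin/env bash
# Run a command only if nobody else holds the lock directory $1.
# mkdir is atomic, so only one caller can create the directory.
# Returns 75 without running anything if the lock is already held.
run_exclusive() {
    local lock_dir="$1" status
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "lock $lock_dir is held, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Example:
# run_exclusive /tmp/area_updater.lock /opt/op/bin/area_updater.sh
```

If the machine crashes mid-run, the lock directory is left behind and has to be removed by hand. The flock utility from util-linux (flock -n LOCKFILE COMMAND) avoids that problem and can be used directly in the crontab line instead.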

Start Overpass

The overpass configuration consists of back-end processing and a web server. The scripts above will start the back-end piece only. First, we will launch the overpass back-end:

bin/launch.sh

Next, start the HTTP server:

sudo systemctl restart apache2

The log directory will start populating with log file output. Now would be an excellent time to grab a sandwich and come back later. The server is busy populating areas and catching up the database. It will take a number of hours for the server to catch up.

Performance Verification

Next, we must check to make sure that our server is working, and that it has enough horsepower to actually keep up on its updates.

Fetching updates

The fetch_osc.sh script will continuously download new 1-minute diff updates. The scripts above download these diffs to the diff directory. Run the command below to find the timestamp of the most recently downloaded diff file:

find diff -type f -printf '%T@ %Tb %Td %TY %TT %p\n' | sort -n | tail -1 | cut -f2- -d" "

After a minute or two, re-run this command. You should see the timestamp update and the filename numbering sequence increment. In addition, there is a log file you can inspect to see the diff downloading activity:

tail -f log/fetch_osc.out

Applying Updates

Next, we need to confirm that updates are being applied. Since the database download took several hours, your database needs to "catch up" to real time. Updates are logged to the db/apply_osc_to_db.log file. Inspect this file:

tail db/apply_osc_to_db.log

The output will have a series of lines that look something like the text below.

2020-06-28 23:29:08: updating from 4084526
2020-06-28 23:29:08: updating to 4084529
2020-06-28 23:29:41: update complete 4084529

Subtract the first timestamp from the last to get the time the update took. In this case, the update took 33 seconds. Next, subtract the two update numbers. In this case, 3 updates were applied. This means it took 33 seconds to apply 3 minutes of updates. As long as the elapsed time is less than the span of time covered by the updates applied (one minute per update), your server will (eventually) catch up to real-time. The server is completely caught up once the update numbers are 1 apart.
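The subtraction above can be scripted. The function below is a rough sketch, not part of overpass: it assumes the log format shown above and GNU date for parsing timestamps, and the function name is made up.

```shell
#!/usr/bin/env bash
# Read apply_osc_to_db.log lines on stdin and, for each completed update,
# print how many minutes of diffs were applied and how long it took.
update_rate() {
    local day time verb word id start_s="" start_id="" end_s elapsed minutes
    while read -r day time verb word id; do
        case "$verb $word" in
            "updating from")
                # ${time%:} strips the colon that follows the timestamp.
                start_s=$(date -u -d "$day ${time%:}" +%s)
                start_id=$id
                ;;
            "update complete")
                [ -n "$start_s" ] || continue
                end_s=$(date -u -d "$day ${time%:}" +%s)
                elapsed=$((end_s - start_s))
                minutes=$((id - start_id))
                echo "applied $minutes minutes of diffs in $elapsed seconds (budget: $((minutes * 60)) seconds)"
                start_s=""
                ;;
        esac
    done
}

# Example:
# tail -n 30 db/apply_osc_to_db.log | update_rate
```

For the three sample lines above, this prints "applied 3 minutes of diffs in 33 seconds (budget: 180 seconds)". The server is keeping up as long as the elapsed time stays below the budget.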

HTTP Interface

Next, test to see if the web server is responding to queries:

wget --output-document=test.xml http://localhost/api/interpreter?data=%3Cprint%20mode=%22body%22/%3E
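The data parameter in the command above is a URL-encoded overpass query. To see what is actually being sent, it can be decoded with a few lines of bash; this is an illustrative helper, not something overpass provides.

```shell
#!/usr/bin/env bash
# Decode a URL-encoded string: "+" becomes a space and each %HH becomes
# the byte with hexadecimal value HH (via bash printf's \xHH escape).
urldecode() {
    local s="${1//+/ }"
    printf '%b\n' "${s//%/\\x}"
}

urldecode '%3Cprint%20mode=%22body%22/%3E'   # prints <print mode="body"/>
```

So the test request is simply an empty print statement in overpass XML syntax, which is why the expected response contains metadata but no map elements.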

Examine the test.xml file. It should look something like:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.56.9 ab41fea6">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2020-06-30T03:19:02Z"/>
</osm>

Next, test the external web interface. In a web browser, enter: http://[SERVER ADDRESS]/api/status. You should see output that looks like the following:

Connected as: 3232236024
Current time: 2020-06-30T03:18:29Z
Rate limit: 2
2 slots available now.
Currently running queries (pid, space limit, time limit, start time):
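The number after "Connected as" looks opaque. As far as I can tell (treat this as an assumption rather than documented behavior), for IPv4 clients it is the client address written as a single 32-bit integer. Under that assumption it can be converted back to dotted notation:

```shell
#!/usr/bin/env bash
# Convert a 32-bit integer to dotted-quad IPv4 notation.
int_to_ipv4() {
    local n="$1"
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

int_to_ipv4 3232236024   # prints 192.168.1.248
```

For the sample output above, that gives an address on the local 192.168.x.x network, which is what you would expect when testing from a machine on the same LAN.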

Area Database Initial Load

At this point, your server should be functional for basic queries. It can take 24 hours or more for area queries to start working. On a magnetic hard drive, expect this to take 2-3 days. The area_updater.sh script is executing the command ./osm3s_query --progress --rules, which populates a set of files starting with "area" in the db directory. You can view the progress of populating the area database with:

tail -f log/area_update.log

and

watch ls -l db/area*

You can verify that updating areas is complete by checking the log file for a line that looks like:

2020-07-03 03:03:29: update finished

Query Tests

To test the server, use Overpass Turbo. Note that this site sends its queries from your browser (client-side), so it will work with servers that are only reachable from the same network as your web browser. Go to settings and enter the server as follows:

http://{SERVER_ADDRESS}/api/

The queries below will test various categories of functionality in your overpass server.

Basic Queries

The following example will query for a single way (Downing Street, London, UK). This query should work out of the box.

[timeout:180][out:json];
way(4244999);
(._;>;);
out body;

Area Queries

This example will load all roads in Boston. This query will only return results once the areas database is fully populated.

[timeout:180][out:json];
area(3602315704);
(
way(area)
["name"]
["highway"];
); 
(._;>;);
out body;

Attic Area Queries

This example will load all roads in Boston that have changed since June 12, 2020. This query will load only the changes.

[timeout:180][out:xml][diff:"2020-06-12T00:00:00Z"];
area(3602315704);
(
way(area)
["name"]
["highway"];
);
(._;>;);
out body;

Firewall Configuration

If you are building a public overpass server (one that anyone can access), you can safely skip this section. Otherwise, most likely you want to restrict access to one or more IP addresses. The default firewall on Ubuntu 18.04 is ufw or "uncomplicated firewall".

The following commands will configure your firewall for SSH and HTTP access from specified addresses.

sudo ufw default deny incoming
sudo ufw default allow outgoing
# Change below to IP address or range that should be allowed SSH login access
sudo ufw allow from 192.168.0.0/16 to any port ssh
# Change below to IP address or range that should be allowed Overpass query access
sudo ufw allow from 123.123.123.123 to any port 80

Finally, enable your firewall:

sudo ufw enable

Verify that it's configured properly as follows:

sudo ufw status verbose

Recovering a Corrupted Database

THIS SECTION IS UNDER CONSTRUCTION

If your database gets corrupted for some reason, it may be faster to re-download a fresh database rather than fix the issue, especially if the database is more than a day out of date. The basic steps are as follows:

  1. Disable the area update cron job
  2. Take the apache server offline and run the shutdown script:

/opt/op/bin/shutdown.sh
apachectl stop

  3. Mount the external disk (needed if local SSD space is limited) and clear the old database copy on it:

mount /mnt/sda
rm -rf /mnt/sda/op/db/*

  4. Remove the existing db and diff directories
  5. Re-copy the rules folder into db
  6. Re-sync the planet data:

/opt/op/bin/download_clone.sh --db-dir=db --source=https://dev.overpass-api.de/api_drolbr/ --meta=attic

  7. Convert to lz4 (be careful with disk space)
  8. Re-run the launch scripts and wait for areas generation to be completed
  9. Re-create log file links from db/
  10. Start the apache server