User:ZeLonewolf/Overpass Installation Guide
This guide is under construction to update for v0.7.61.1. See User:ZeLonewolf/Overpass_Installation_Guide/v1.0 for the v0.55 version.
This is my guide for configuring a complete overpass server from beginning to end. I found the existing guides to be lacking. This is a combination of various other guides, the official docs, Kai Johnson's diary entry, and suggestions from @mmd on Slack. This guide is intended to demonstrate how to configure a server dedicated to only running overpass.
Below are the steps that I followed for configuring an overpass server on Ubuntu 18.04 LTS. This guide will walk through the process of configuring a "complete" overpass server, which includes:
- OSM Base: "basic" queries; all functionality not included in the next two bullets.
- Areas: searches within a particular area. If you want to do searches within a way or relation, you must have areas enabled.
- Attic Data: in this context, "attic" refers to prior versions of data in the map. If you want to do "diff" queries which return the changes in an object over time, you must have this.
- Map data updates on 1-minute intervals.
- Automatic startup and cleanup of unneeded files.
This guide assumes a basic familiarity with Linux.
Server Prerequisites
An overpass server can take some pretty serious horsepower. In addition to serving queries, your overpass server is also constantly updating its database with new map data. At a minimum, the server must be powerful enough to process updates faster than real-time (if it takes 65 seconds to apply a 1-minute diff, your server will not be able to remain current). You should not use burstable cloud servers, such as AWS "t2" or "t3" instances -- in my experience they will not work. The bandwidth and disk space requirements are considerable. If you have access to compute infrastructure, consider testing out your configuration locally to work out the kinks before deploying to a cloud server. I created this guide using an Intel NUC server.
At a minimum, you must have:
- 1TB of disk space
- SSD hard disk for the database if you intend to deploy minutely updates. In testing, magnetic hard drives frequently lagged to roughly 30-minute refresh intervals.
- Linux operating system installed. You can use any distro, but for this guide I used Ubuntu 18.04 LTS
- SSH access to a user account with sudo privileges
- Inbound firewall access to port 80 (for serving queries)
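Before proceeding, it's worth confirming that your target disk is actually an SSD and has enough free space. For example (output varies by system):
df -h /opt
# ROTA=0 indicates an SSD; ROTA=1 indicates a rotational (magnetic) disk
lsblk -d -o NAME,ROTA,SIZE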
Configure Overpass User & Required Dependencies
Your overpass server will run as a dedicated user. I named my overpass user op. Your normal user account (with sudo access) is listed as user in the snippet below. Replace that with your actual username.
sudo su
mkdir -p /opt/op
groupadd op
usermod -a -G op user
useradd -d /opt/op -g op -G sudo -m -s /bin/bash op
chown -R op:op /opt/op
apt-get update
apt-get install g++ make expat libexpat1-dev zlib1g-dev apache2 liblz4-dev
a2enmod cgid
a2enmod ext_filter
a2enmod headers
exit
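You can double-check that the op account and group were created correctly by inspecting the account (exact IDs will vary):
id op
# expected output similar to:
# uid=1001(op) gid=1001(op) groups=1001(op),27(sudo)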
Web Server Configuration
The Apache web server provides the http-side interface to overpass. Modify the default configuration as shown below.
sudo nano -w /etc/apache2/sites-available/000-default.conf
Note the timeout below is set to 300 (5 minutes). This can be increased if you want to run long-running queries. Configuration file:
<VirtualHost *:80>
ServerAdmin webmaster@localhost
ExtFilterDefine gzip mode=output cmd=/bin/gzip
DocumentRoot /var/www/html
ScriptAlias /api/ /opt/op/cgi-bin/
<Directory "/opt/op/cgi-bin/">
AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
Require all granted
Header add Access-Control-Allow-Origin "*"
</Directory>
Header add Access-Control-Allow-Origin "overpass-turbo.eu";
ErrorLog /var/log/apache2/error.log
LogLevel warn
CustomLog /var/log/apache2/access.log combined
TimeOut 300
</VirtualHost>
The "Access-Control-Allow-Origin" line allows your overpass server to be invoked from a separate domain. You'll need this if you want to use your server with the Overpass Turbo web interface. Be sure to use the "http" version of Overpass Turbo unless you are installing an SSL certificate in front of your overpass server.
Next, start the web server and configure it to auto-start on boot:
sudo systemctl restart apache2
sudo systemctl enable apache2
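You can confirm that Apache is running and enabled for boot:
systemctl is-active apache2
systemctl is-enabled apache2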
Compile and Install Overpass
Next, we log in as the overpass user to compile and install overpass in /opt/op. We'll also create some required subdirectories.
sudo su op
cd
wget http://dev.overpass-api.de/releases/osm-3s_v0.7.61.1.tar.gz
tar xvzf osm-3s_v0.7.61.1.tar.gz
cd osm-3s_v0.7.61.1
./configure CXXFLAGS="-O2" --prefix=/opt/op --enable-lz4
make install
cp -pr cgi-bin ..
cd
chmod -R 755 cgi-bin
mkdir db
mkdir diff
mkdir log
cp -pr osm-3s_v0.7.61.1/rules db
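If the build succeeded, the overpass binaries and helper scripts should now be present in /opt/op/bin. A quick sanity check:
ls bin
# should include, among others: dispatcher, osm3s_query, fetch_osc.sh,
# apply_osc_to_db.sh, and download_clone.sh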
Edit the db/rules/areas.osm3s script. At the top of the file there is a "timeout" value that defaults to 24 hours:
<?xml version="1.0" encoding="UTF-8"?>
<osm-script timeout="86400" element-limit="4294967296">
If importing areas takes more than 24 hours to complete, this value needs to be increased. However, I have found that in recent versions of overpass, this runs in about 2-3 hours, so this is unnecessary.
Additionally, this file defines which OSM ways and relations should be queryable as an area. You can customize this to suit your use case. For example, I don't need to query by postal code, so I removed that block in order to save resources.
Download the Planet
Next (still logged in as the overpass user), we download the entire planet data. It is possible to start from a downloaded planet file, particularly if you don't need older attic data. I will describe the clone option here as it seems to work the most consistently. If you need attic data going back to the beginning (2012), you must use the clone option, as the planet file does not include any attic data. If you only need attic data from now going forward, it's possible to start from a planet file and then add attic data to it.
Clone the planet data
This takes *a long time*. You will likely need to run this overnight. When I ran this, it took 7 hours. The download runs interactively, so you may wish to use the screen command to kick it off and return to it later (see the example after the command below).
bin/download_clone.sh --db-dir=db --source=https://dev.overpass-api.de/api_drolbr/ --meta=attic
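For example, to run the download inside a detachable screen session (install screen with sudo apt-get install screen if needed):
screen -S clone
bin/download_clone.sh --db-dir=db --source=https://dev.overpass-api.de/api_drolbr/ --meta=attic
# detach with Ctrl-A then D; reattach later with:
screen -r clone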
Backup
Now would be an excellent time to backup your downloaded database folder. If something goes wrong in the next steps, you'll be able to quickly recover and not have to re-download the clone. Be aware of how much disk space you have available for the backup. I installed a second (low-cost) magnetic hard drive on my server specifically for this backup.
cp -pr db <db backup location>
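If your backup disk is mounted at, say, /mnt/backup (substitute your actual mount point), rsync is a reasonable alternative to cp, since subsequent backups only copy changed files:
rsync -a --delete db/ /mnt/backup/db/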
Configure Launch Scripts
I found that the launch scripts that come with the tarball are less than ideal. The scripts below are replacements, with credit to Kai Johnson for improving upon my initial examples. Note that the dispatchers are started with --rate-limit=0, which disables per-client rate limiting; this suits a dedicated server that you intend to intentionally saturate with queries. If your server is shared, set --rate-limit to a small value (such as 2) in the script.
Create a bin/launch.sh shell script that reads as follows:
#!/usr/bin/env bash
EXEC_DIR="/opt/op/bin"
DB_DIR="/opt/op/db"
DIFF_DIR="/opt/op/diff"
LOG_DIR="/opt/op/log"

# Clean up sockets, shadow files, and locks left behind by an unclean shutdown
rm -fv $DB_DIR/osm3s_v0.7*
rm -fv $DB_DIR/*.shadow
rm -fv $DB_DIR/*.lock
rm -fv /dev/shm/osm3s*

# Base dispatcher, with attic (historical) data enabled
ionice -c 2 -n 7 nice -n 17 nohup \
  "$EXEC_DIR/dispatcher" --osm-base --attic --rate-limit=0 --allow-duplicate-queries=yes \
  --space=10737418240 "--db-dir=$DB_DIR" >>"$LOG_DIR/osm_base.out" &

# Area dispatcher, at lower I/O and CPU priority
ionice -c 3 nice -n 19 nohup \
  "$EXEC_DIR/dispatcher" --areas --space=10700000000 --rate-limit=0 --allow-duplicate-queries=yes \
  "--db-dir=$DB_DIR" >>"$LOG_DIR/areas.out" &

# Continuously fetch minutely diffs from the replication server
ionice -c 3 nice -n 19 nohup \
  "$EXEC_DIR/fetch_osc.sh" `cat "$DB_DIR/replicate_id"` \
  "https://planet.openstreetmap.org/replication/minute/" "$DIFF_DIR" \
  >>"$LOG_DIR/fetch_osc.out" &

# Continuously apply downloaded diffs to the database
ionice -c 2 -n 7 nice -n 17 nohup \
  "$EXEC_DIR/apply_osc_to_db.sh" "$DIFF_DIR" `cat "$DB_DIR/replicate_id"` \
  --meta=yes >> "$LOG_DIR/apply_osc_to_db.out" &
Create a script called area_updater.sh in the bin folder as follows. This is a replacement for the rules_loop.sh file.
#!/usr/bin/env bash
DB_DIR="/opt/op/db"
EXEC_DIR="/opt/op/bin"
LOG_DIR="/opt/op/log"
pushd "$EXEC_DIR"#!/usr/bin/env bash
DB_DIR="/opt/op/db"
LOG_DIR="/opt/op/log"
EXEC_DIR="/opt/op/bin"
pushd "$EXEC_DIR"
echo "`date '+%F %T'`: update started" >>$LOG_DIR/area_update.log
ionice -c 3 nice -n 19 "$EXEC_DIR/osm3s_query" --progress --rules < $DB_DIR/rules/areas.osm3s >>$LOG_DIR/area_update.log
echo "`date '+%F %T'`: update finished" >>$LOG_DIR/area_update.log
sleep 3
popd
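Both scripts must be executable for cron (configured below) to run them:
chmod +x bin/launch.sh bin/area_updater.sh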
Log File Management
Disk space = money. Therefore, you should configure your server so that unneeded files do not grow too large.
Centralize Log Files
By default, the scripts out of the box put log files in various locations. Create the following symbolic links in order to centralize log file writing to one location (the overpass log
directory):
ln -sf /opt/op/diff/fetch_osc.log /opt/op/log/fetch_osc.out
ln -sf /opt/op/db/apply_osc_to_db.log /opt/op/log/apply_osc_to_db.out
ln -sf /opt/op/db/transactions.log /opt/op/log/transactions.log
Automatic Log Rotation
The logrotate utility comes installed by default with Ubuntu 18.04 and allows you to automatically roll over and delete old log files so they don't fill up your disk.
As the regular user, configure logrotate for overpass:
sudo nano -w /etc/logrotate.d/overpass
The configuration file below will rotate and compress log files daily and delete log files after three days:
/opt/op/diff/*.log /opt/op/state/*.log /opt/op/db/*.log /opt/op/log/*.out {
daily
missingok
copytruncate
rotate 3
compress
delaycompress
notifempty
create 644 op op
}
Test that you've configured logrotate correctly with the following command:
logrotate -d /etc/logrotate.d/overpass
Server Automation
Ideally, we would like to be able to "set and forget" our server, and not have to manually log in each time it is rebooted. The following jobs can be automated:
- Automatic startup. Start the overpass server on system boot.
- Remove old diff files. The scripts out of the box download and apply diffs, but never delete them. In order to preserve disk space, you should automatically clear old diff files or else they will endlessly accumulate.
- Update areas. Updates the areas index, which allows you to run area queries. New areas added to the database can't be queried until this is updated.
As the op user, edit the crontab in order to define automated tasks:
crontab -e
Next, configure cron with automation tasks. The entries below run the launch script on boot, delete diff files more than 2 days old every day at 1:00 AM server time, and run the area update script every 8 hours (three times daily).
@reboot /opt/op/bin/launch.sh
0 1 * * * find /opt/op/diff -mtime +2 -type f -delete
0 */8 * * * /opt/op/bin/area_updater.sh
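You can confirm that the entries were saved with:
crontab -l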
Start Overpass
The overpass configuration consists of back-end processing and a web server. The scripts above will start the back-end piece only. First, we will launch the overpass back-end:
bin/launch.sh
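Before moving on, you can confirm that the dispatcher processes came up (the dispatcher binary also accepts a --status flag that should print the running dispatcher's state):
ps -ef | grep [d]ispatcher
bin/dispatcher --status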
Next, start the HTTP server:
sudo systemctl restart apache2
The log directory will start populating with log file output. Now would be an excellent time to grab a sandwich and come back later. The server is busy populating areas and catching up the database. This will take a number of hours for the server to catch up.
Performance Verification
Next, we must check to make sure that our server is working, and that it has enough horsepower to actually keep up on its updates.
Fetching updates
The fetch_osc.sh script will continuously download new 1-minute diff updates. The scripts above download these diffs to the diff directory. Run the command below to find the timestamp of the most recently downloaded diff file:
find diff -type f -printf '%Tb %Td %TY %TT %p\n' | sort -n | tail -1 | cut -f2- -d" "
After a minute or two, re-run this command. You should see the timestamp update and the filename numbering sequence increment. In addition, there is a log file you can inspect to see the diff downloading activity:
tail -f log/fetch_osc.out
Applying Updates
Next, we need to confirm that updates are being applied. Since the database download took several hours, your database needs to "catch up" to real time. Updates are logged to the db/apply_osc_to_db.log file. Inspect this file:
tail db/apply_osc_to_db.log
The output will have a series of lines that look something like the text below.
2020-06-28 23:29:08: updating from 4084526
2020-06-28 23:29:08: updating to 4084529
2020-06-28 23:29:41: update complete 4084529
Subtract the two timestamps to get the time the update took; in this case, 33 seconds. Next, subtract the two update numbers; in this case, 3 minutes of updates (4084529 - 4084526) were applied. So it took 33 seconds to apply 3 minutes of updates. As long as the elapsed time is less than the span of updates applied, your server will (eventually) catch up to real time. The server is completely caught up once the update numbers are 1 apart.
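To gauge how far behind your database is, compare the diff sequence number your server has applied (stored in db/replicate_id) against the latest minutely sequence published upstream:
# latest minutely diff available upstream
curl -s https://planet.openstreetmap.org/replication/minute/state.txt | grep sequenceNumber
# diff sequence your database has applied
cat db/replicate_id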
HTTP Interface
Next, test to see if the web server is responding to queries:
wget --output-document=test.xml http://localhost/api/interpreter?data=%3Cprint%20mode=%22body%22/%3E
Examine the test.xml file. It should look something like:
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.56.9 ab41fea6">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2020-06-30T03:19:02Z"/>
</osm>
Next, test the external web interface. In a web browser, enter: http://[SERVER ADDRESS]/api/status. You should see output that looks like the following:
Connected as: 3232236024
Current time: 2020-06-30T03:18:29Z
Rate limit: 2
2 slots available now.
Currently running queries (pid, space limit, time limit, start time):
Area Database Initial Load
At this point, your server should be functional for basic queries. It can take up to 24 hours or more for area queries to start working; on a magnetic hard drive, expect this to take 2-3 days. The area_updater.sh script executes the command ./osm3s_query --progress --rules, which populates a set of files starting with "area" in the db directory. You can view the progress of populating the area database with:
tail -f log/area_update.log
and
watch ls -l db/area*
You can verify that updating areas is complete by checking the log file for a line that looks like:
2020-07-03 03:03:29: update finished
Query Tests
To test the server, use Overpass Turbo. Note that Overpass Turbo sends queries directly from your browser (client-side), so it will work even with servers that are only reachable from your web browser's network. Go to settings and enter the server as follows:
http://{SERVER_ADDRESS}/api/
The queries below will test various categories of functionality in your overpass server.
Basic Queries
The following example will query for a single way (Downing Street, London, UK). This query should work out of the box.
[timeout:180][out:json];
way(4244999);
(._;>;);
out body;
Area Queries
This example will load all roads in Boston. This query will only return results once the areas database is fully populated.
[timeout:180][out:json];
area(3602315704);
(
way(area)
["name"]
["highway"];
);
(._;>;);
out body;
Attic Area Queries
This example will load all roads in Boston that have changed since June 12, 2020. This query will load only the changes.
[timeout:180][out:xml][diff:"2020-06-12T00:00:00Z"];
area(3602315704);
(
way(area)
["name"]
["highway"];
);
(._;>;);
out body;
Firewall Configuration
If you are building a public overpass server (one that anyone can access), you can safely skip this section. Otherwise, most likely you want to restrict access to one or more IP addresses. The default firewall on Ubuntu 18.04 is ufw or "uncomplicated firewall".
The following commands will configure your firewall for SSH and HTTP access from specified addresses.
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Change below to IP address or range that should be allowed SSH login access
sudo ufw allow from 192.168.0.0/16 to any port ssh
# Change below to IP address or range that should be allowed Overpass query access
sudo ufw allow from 123.123.123.123 to any port 80
Finally, enable your firewall:
sudo ufw enable
Verify that it's configured properly as follows:
sudo ufw status verbose
Recovering a Corrupted Database
THIS SECTION IS UNDER CONSTRUCTION
If your database gets corrupted for some reason, it may be faster to re-download a fresh database rather than fix the issue, especially if the database is more than a day out of date. The basic steps are as follows:
- Disable the area update cron job
- Take the apache server offline and run the shutdown script:
/opt/op/bin/shutdown.sh
apachectl stop
- Mount an external disk (needed if local SSD space is limited)
mount /mnt/sda
rm -rf /mnt/sda/op/db/*
- Remove the existing db and diff directories
- Re-copy the rules folder into db
- Re-sync the planet data
/opt/op/bin/download_clone.sh --db-dir=db --source=https://dev.overpass-api.de/api_drolbr/ --meta=attic
- Convert to lz4 (be careful with disk space)
- Re-run the launch scripts and wait for areas generation to be completed
- Re-create log file links from db/
- Start the apache server
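As a rough sketch, the cleanup and re-copy steps mirror the initial setup (run as the op user from /opt/op; double-check paths before running rm -rf):
cd /opt/op
rm -rf db diff
mkdir db diff
cp -pr osm-3s_v0.7.61.1/rules db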