PostgreSQL

From OpenStreetMap Wiki
Jump to: navigation, search

PostgreSQL (postgresql.org) is an Open Source relational database. Since API v0.6 it is used for the main central OSM database running on Servers/smaug. For technical details of how this can be set up to work with the OSM web app, see The Rails Port and Database schema.

Geospatial capabilities

PostgreSQL has geometry types. For our core OSM database we do not use these. We have our own representation of OpenStreetMap Data Primitives.

The PostGIS extension for PostgreSQL is often used for geographic data. PostGIS adds geospatial functions and two metadata tables. Again we do not use this for our core database, however we do use all of these things on the tile server database as required by the Mapnik rendering engine.

  • Osmosis can be used to populate a more general PostgreSQL/PostGIS database from a Planet.osm OSM data dump file.
  • osm2postgresql is powered by Osmosis and does much more (but works only on GNU/Linux)
  • osm2pgsql handles transformation of the planet data into the expected format.
  • OSM2PostGIS pretty new
  • OSM in a box has a own converter
  • Imposm PostGIS importer that supports custom DB schemas. Can import planet files with low memory.
  • osm2pgrouting imports to PostgreSQL/PostGIS database for routing purpose.

Tune the database

Since we are predominantly doing bulk loads and handling lots of data, it is advisable to tune Postgres a bit for this load; the default settings are generally fairly conservative. The following is just a brief overview of the settings you need to think about, read the documentation for more info. Remember at all times that the rest of your system (including a webserver for example) also needs memory to run, so don't starve them out just for PostgreSQL. You need to edit the file postgresql.conf which may be in /etc/postgresql/8.3/main or /var/lib/pgsql/data or similar depending on your distribution. The server needs to be restarted after the changes :

sudo /etc/init.d/postgresql reload

On openSUSE, the configuration files will not exist until you have started the server at least once:

sudo /etc/init.d/postgresql start

After that, it is located in /var/lib/pgsql/data/postgresql.conf

If you are using a version older than 8.2 you will need to use the numbers rather than the more descriptive units.

shared_buffers = 128MB    # 16384 for 8.1 and earlier

This controls the amount of memory used for stored things like index pages, data pages directly into local cache. More is not necessarily better. Remember that osm2pgsql needs lots of memory too and this memory is permanently reserved for PostgreSQL.

If the server won't start after making this change, check for errors appearing in pgstartup.log about exceeding the kernel SHMMAX variable. If this occurs, increase the kernel.shmmax variable using the sysctl command:

# increase max shared memory to 256 megabytes
sysctl -w kernel.shmmax=268435456
 
# for OS X Leopard or newer use the line below and add kern.sysv.shmmax=268435456 to /etc/sysctl.conf
sudo sysctl -w kern.sysv.shmmax=268435456


Edit /etc/sysctl.conf to make it permanent. If you are using linux-vserver technology, please mention you have to set shmmax separately for each guest. If you want to tune your kernel, do not forget to read Managing Kernel Resources part of PostgreSQL documentation.


checkpoint_segments = 20

The default is far too low for bulk inserts. It costs more diskspace this way but the load will be smoother; it depends a bit on your disk subsystem. To determine the optimal setting you need to play a bit. Check your postgres logs for the following:

LOG:  checkpoints are occurring too frequently (28 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".

For advanced users: you can also tune the bgwriter to smooth performance.

maintenance_work_mem = 256MB    # 256000 for 8.1 and earlier

This setting is the amount of memory used while building indexes. The default is far too small for the size of tables we're dealing with. This memory is released after completion so you don't need to be as stingy as above. Note that osm2pgsql is still large at this point so you can't overdo it.

work_mem = 256MB # 256000 for 8.1 and earlier

This setting is the amount of memory used when performing an ORDER BY for which no index can be used. osm2pgsql pre-orders some tables before creating indexes on them, and for this the default setting once again is far too small. All notes on maintenance_work_mem above apply to work_mem, too.

autovacuum = off

Auto-vacuuming means PostgreSQL starts cleaning up your data. This might slow down your data inserting and reading considerably. If you do not use the database for anything else, you can definitely turn autovacuuming off.


See also