User:JamesBadger/Osm2pgsql/benchmarks

From OpenStreetMap Wiki
Jump to navigation Jump to search

Importing the OSM data set using osm2pgsql. See Osm2pgsql/benchmarks.

Cloud Instance Import

Configuration

OpenStack IceHouse instance.

General settings

Setting Value
Created Aug 27 11:27 MDT
Flavor m1.xxlarge
RAM 32GB
VCPUs 8
Disk 20GB
OS Ubuntu Server 14.04.1 LTS 64-bit
Postgres Version 9.3.5
osm2pgsql version 901a996 (May 28 2014)
Volumes 32 GB at /planet, 200 GB at /work
Planetfile PBF - 2014/08/20

OpenStack Glance volumes are mounted with ZFS on Linux. Default ZFS compression is enabled for /work. ZFS recordsize=8k and primarycache=metadata for the work/postgresql filesystem. PostgreSQL has been configured to use that as its data directory.

ZFS Pools and Filesystems

32 GB pool 'planet'

Name Used by Dataset Available Compression Ratio Mountpoint
planet 25.5G 5.73G 1.00x /planet

200 GB pool 'work'

Name Used by Dataset Available Compression Ratio Mountpoint Properties
work 32K 196G 1.97x /work compression=lz4, atime=off
work/log 31K 196G 1.00x /work/log inherit
work/log/postgresql 30K 196G 1.00x /work/log/postgresql inherit
work/postgresql 41.8M 196G 2.76x /work/postgresql recordsize=8k, primarycache=metadata

PostgreSQL configuration

Setting Value
shared_buffers 8MB
temp_buffers 64MB
work_mem 1MB
maintenance_work_mem 4096MB
effective_io_concurrency 1
fsync off
synchronous_commit off
full_page_writes off
checkpoint_segments 20
checkpoint_completion_target 0.9
random_page_cost 3.0
effective_cache_size 18GB
log_destination csvlog
logging_collector on
log_directory '/work/log/postgresql'
log_filename 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age 1d
log_rotation_size 1GB
log_min_duration_statement 250ms
log_checkpoints on
log_connections on
log_disconnections on
log_duration on
log_line_prefix '%t [%p]:[%l] user=%u, db=%d '
log_lock_waits on
log_temp_files 0
autovacuum off
autovacuum_vacuum_scale_factor 0.04
autovacuum_analyze_scale_factor 0.02

Run

$ sudo -u postgres -i
postgres$ time osm2pgsql --slim -d gis -C 20000 --flat-nodes /work/nodes.cache --number-processes 6 /planet/planet-140827.osm.pbf
osm2pgsql SVN version 0.85.0 (64bit id space)

Using projection SRS 900913 (Spherical Mercator)
Setting up table: planet_osm_point
NOTICE:  table "planet_osm_point" does not exist, skipping
NOTICE:  table "planet_osm_point_tmp" does not exist, skipping
Setting up table: planet_osm_line
NOTICE:  table "planet_osm_line" does not exist, skipping
NOTICE:  table "planet_osm_line_tmp" does not exist, skipping
Setting up table: planet_osm_polygon
NOTICE:  table "planet_osm_polygon" does not exist, skipping
NOTICE:  table "planet_osm_polygon_tmp" does not exist, skipping
Setting up table: planet_osm_roads
NOTICE:  table "planet_osm_roads" does not exist, skipping
NOTICE:  table "planet_osm_roads_tmp" does not exist, skipping
Using built-in tag processing pipeline
Allocating memory for dense node cache
Allocating dense node cache in one big chunk
Allocating memory for sparse node cache
Sharing dense sparse
Node-cache: cache=20000MB, maxblocks=2560000*8192, allocation method=11
Mid: loading persistent node cache from /work/nodes.cache
Allocated space for persistent node cache file
Maximum node in persistent node cache: 0
Mid: pgsql, scale=100 cache=20000
Setting up table: planet_osm_nodes
NOTICE:  table "planet_osm_nodes" does not exist, skipping
Setting up table: planet_osm_ways
NOTICE:  table "planet_osm_ways" does not exist, skipping
Setting up table: planet_osm_rels
NOTICE:  table "planet_osm_rels" does not exist, skipping

Reading in file: /planet/planet-140827.osm.pbf
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(708550 31.34/s)
Standard exception processing way_id 1382119: TopologyException: side location conflict at 1189651.77 8876718.8800000008
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(2442080 33.79/s)
Standard exception processing way_id 3631463: TopologyException: side location conflict at 1906862.01 6654069.9800000004
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(2552370 34.24/s)
Standard exception processing way_id 3757788: TopologyException: side location conflict at 1322274.0800000001 5383303.8399999999
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(2672440 34.37/s)
Standard exception processing way_id 3890452: TopologyException: side location conflict at 1038877.48 5779728.6200000001
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(2765210 34.59/s)  parse time: 109760s

Node stats: total(2496510292), max(3044821862) in 5011s
Way stats: total(249511922), max(300402050) in 24800s
Relation stats: total(2765219), max(3994576) in 79949s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads

Going over pending ways...
Maximum node in persistent node cache: 3045064703
        160218658 ways are pending

Using 6 helper-processes
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Helper process 4 out of 6 initialised          
Helper process 5 out of 6 initialised          
Helper process 1 out of 6 initialised          
Helper process 2 out of 6 initialised          
Helper process 3 out of 6 initialised          
Helper process 0 out of 6 initialised          
Process 5 finished processing 26703109 ways in 76486 sec
Process 0 finished processing 26703110 ways in 76486 sec
Process 4 finished processing 26703109 ways in 76486 sec
Process 1 finished processing 26703110 ways in 76486 sec
Process 2 finished processing 26703110 ways in 76486 sec
Process 3 finished processing 26703110 ways in 76486 sec
Maximum node in persistent node cache: 3045064703
Maximum node in persistent node cache: 3045064703
Maximum node in persistent node cache: 3045064703
Maximum node in persistent node cache: 3045064703
Maximum node in persistent node cache: 3045064703

All child processes exited

160218658 Pending ways took 76491s at a rate of 2094.61/s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads

Going over pending relations...
Maximum node in persistent node cache: 3045064703
        0 relations are pending

Using 6 helper-processes
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Process 4 finished processing 0 relations in 13 sec
Maximum node in persistent node cache: 3045064703
Process 5 finished processing 0 relations in 14 sec
Maximum node in persistent node cache: 3045064703
Process 3 finished processing 0 relations in 14 sec
Maximum node in persistent node cache: 3045064703
Process 0 finished processing 0 relations in 14 sec
Process 1 finished processing 0 relations in 14 sec
Maximum node in persistent node cache: 3045064703
Process 2 finished processing 0 relations in 14 sec
Maximum node in persistent node cache: 3045064703

All child processes exited
0 Pending relations took 15s at a rate of 0.00/s

node cache: stored: 2343195456(93.86%), storage efficiency: 89.39% (dense blocks: 2387255, sparse nodes: 88445625), hit rate: 92.85%
Sorting data and creating indexes for planet_osm_line
Sorting data and creating indexes for planet_osm_point
Sorting data and creating indexes for planet_osm_polygon
Sorting data and creating indexes for planet_osm_roads
Maximum node in persistent node cache: 3045064703
Stopping table: planet_osm_nodes
Stopped table: planet_osm_nodes in 0s
Stopping table: planet_osm_rels
Stopping table: planet_osm_ways
Building index on table: planet_osm_rels (fastupdate=off)
Building index on table: planet_osm_ways (fastupdate=off)
Analyzing planet_osm_point finished
Analyzing planet_osm_roads finished
Analyzing planet_osm_polygon finished
Analyzing planet_osm_line finished
Stopped table: planet_osm_rels in 2256s
Copying planet_osm_point to cluster by geometry finished
Creating geometry index on  planet_osm_point
Copying planet_osm_roads to cluster by geometry finished
Creating geometry index on  planet_osm_roads
Creating osm_id index on  planet_osm_roads
Creating osm_id index on  planet_osm_point
Creating indexes on  planet_osm_roads finished
All indexes on  planet_osm_roads created  in 31410s
Completed planet_osm_roads
Creating indexes on  planet_osm_point finished
All indexes on  planet_osm_point created  in 35766s
Completed planet_osm_point
Copying planet_osm_line to cluster by geometry finished
Creating geometry index on  planet_osm_line
Creating osm_id index on  planet_osm_line
Creating indexes on  planet_osm_line finished
All indexes on  planet_osm_line created  in 157911s
Completed planet_osm_line
Copying planet_osm_polygon to cluster by geometry finished
Creating geometry index on  planet_osm_polygon
Creating osm_id index on  planet_osm_polygon
Creating indexes on  planet_osm_polygon finished
All indexes on  planet_osm_polygon created  in 232064s
Completed planet_osm_polygon
Stopped table: planet_osm_ways in 265753s

Osm2pgsql took 452331s overall

real    7538m51.173s
user    2426m23.276s
sys     262m43.906s

7538 minutes is 5 days, 5 hours, and 38 minutes.

I estimate the bottleneck was disk I/O, which is network based on this OpenStack cloud.

Post-Run Data Usage

Osm2pgsql-import-charts.png

These charts were made using PNP4Nagios and NCSA scripts running on the host to push data to a remote Nagios instance. NCSA scripts were run every 10-15 minutes. Note that the "Free Space" charts are inverted and actually display space usage on the filesystem.

Post-Run Info

This server was terminated after being unable to properly vacuum the database. A "VACUUM ANALYZE VERBOSE" command would not finish, even after taking a day to run. This is likely due to the extremely limited disk IO.

Intel Server Import

Configuration

Circa-2010 Intel Server.

General settings

Setting Value
RAM 24 GB DDR3 SDRAM
Motherboard Intel S5520SCR
Chassis Intel SC5600BRP
CPUs 1 x Xeon E5530 2.4GHz
Cores 8
RAID Card Intel SRCSASBB8I
Disk 2x300 GB 15000 RPM Disks in Hardware RAID 1 (SAS)

4x2000 GB 5400-7200 RPM WD RED Disk as individual RAID 0 drives (SATA), then used to build a RAID 10 ZFS Pool

OS Ubuntu Server 14.04.1 LTS 64-bit
Postgres Version 9.3.5
osm2pgsql version 901a996 (May 28 2014)
Planetfile PBF - 2014/09/03

PostgreSQL data directory at /var/lib/postgresql on RAID 1. PostgreSQL logs at /work/log/postgresql on ZFS Pool. Node cache for osm2pgsql at /opt/osm/nodes.cache on RAID 1. Planet file at /work/planet/planet-140903.osm.pbf on ZFS Pool. Compression enabled on ZFS Pool.

PostgreSQL configuration

Setting Value
shared_buffers 8MB
temp_buffers 64MB
work_mem 48MB
maintenance_work_mem 4096MB
effective_io_concurrency 2
fsync off
synchronous_commit off
full_page_writes off
checkpoint_segments 20
checkpoint_completion_target 0.9
effective_cache_size 16GB
autovacuum on
autovacuum_vacuum_scale_factor 0.04
autovacuum_analyze_scale_factor 0.02

Autovacuum is enabled because I do not want to have to run a manual vacuum afterwards that will take who know how long.

Run

Note that only data above 45˚ N is imported on this run; that is the area of interest for our implementation. Hstore is also enabled.

$ sudo -u postgres -i
$ service postgresql start
$ createuser osm
$ createdb -E UTF8 -O osm gis
$ psql -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql -d gis
$ psql -d gis -c "ALTER TABLE geometry_columns OWNER TO osm; ALTER TABLE spatial_ref_sys OWNER TO osm;"
$ psql -d gis -c "CREATE EXTENSION hstore;"
$ psql -f /usr/share/osm2pgsql/osm2pgsql/900913.sql -d gis
$ time osm2pgsql --slim -d gis -C 16000 --number-processes 8 --bbox -180,45,180,90 --hstore --flat-nodes /opt/osm/nodes.cache /work/planet/planet-140903.osm.pbf

Post-Run Data Usage

TDB

Data Update Statistics

Info on updating the DB to newer versions using changesets. TBD.