User:Species/PostGIS Tuning

From OpenStreetMap Wiki
Jump to navigation Jump to search

PostGIS Rendering Speed Improvements

I'm working as a freelancer for opendi AG (Munich), where I did set up a OSM Rendering-Toolchain with Mapnik/PostGIS after the Howto Minutely_Mapnik.

DB is updated once a week and primarily used for generating custom sized images via generate_image.py. A Apache/mod_tile with Tirex is running on another machine, but not in production yet.

Performance after initial setup was way too slow for the machine (8 Cores, 4×256GB SSD raid0, 48G RAM).

So i've done some work for rendering speed tuning.

Postgresql settings (e.g. from Frederik's Slides):

  • synchronous_commit off
  • fsync off
  • random_page_cost 1.1 (only for SSD users)
  • wal_buffers 16MB
  • effective_cache_size .75xRAM
  • maintenance_work_mem 4096

note: deactivate sync commits, fsync only if you do not care about your data (in my opinion, i don't - it's only a "copy" of the main DB, if a SSD crashes i have other problems - and i can reimport at any time)

Things i found out myself:

  • shared_buffers default → 12G (dont forget to increase kernel.shmmax) -5% render-time
look with e.g. top how much of reserved share memory postgres uses - this is a good hint how much is needed.
  • work_mem = 1MB → 1500MB -10% render-time
this is mainly needed for big sort operations - landuses on z8 can go easily over 1G (get via explain analyze)
  • autovacuum = on . If your forget to reenable after import: +20% render-time.
  • checkpoint_segments 60 → 1600
needed through import, indexing and vacuum
  • effective_io_concurrency = 30 (for 4 SSDs in HW-RAID0)
no performance-increase at values>30
  • max_connections = 300
default 100 was too low for 8 rendering-procs

Linux:

  • kernel cpufreq-governor from “ondemand” to “performance”: -50% render-time
ondemand didn't raise cpu-frequency fast enough - on my CPU 1.6→3.07 GHz, now running constantly on 3.07 GHz.

Basics on Optimizing

Optimize those queries which use the most render time *absolutely*! There maybe a query which runs 150ms, but is only executed one time when rendering a tile. There maybe an other which runs 20ms, but is executed 60 times - guess which is more important to optimize ☺. You can use my script to print out a list of queries using the most time absolutely.

I've written a shellscript which analyzes the logged queries in the postgresql logfile, you can use it to print out the slowest queries: https://github.com/species/mapnik-query-analyzer

Vincent_De_Phily suggested trying pgfouine or pgbadger also for postgresql log analyzing.


If you found a query, check why it runs so slow:

For all statements written as wiki-"code", you have to connect to your local PostgreSQL/PostGIS db:

 # sudo -u postgres psql -d gis
 gis=# 


Testing improvements after change:

  1. clear cache (sync;echo 3 > /proc/sys/vm/drop_caches)
  2. restart DB
  3. render small amount of tiles to reread indices and shapefiles
  4. clear postgresql log
  5. Render a sufficient amount of random tiles (100-1000)
  6. analyze logfiles

Print Query Plans

at first, print the query plan and show execution order

gis=# explain analyze $STATEMENT ;

for $STATEMENT, copy the code of the query found in your postgresql log file (which has coordinates in it in contrast to the mapnik stylesheet).

example:

gis=# explain analyze SELECT ST_AsBinary("way") AS geom,"name","way_area" FROM (select way,way_area,name
               from planet_osm_polygon
               where name is not null
                 and (waterway is null or waterway != 'riverbank')
                 and place is null
               order by way_area desc
              ) as text WHERE "way" && ST_SetSRID('BOX3D(1171604.841520746 6470654.265434774,1172064.955953034 6470860.847424781)'::box3d, 900913);

it produces the following output:

QUERY PLAN
----------
Subquery Scan on text  (cost=2.65..2.66 rows=1 width=275) (actual time=43.231..90.545 rows=15 loops=1)
  ->  Sort  (cost=2.65..2.65 rows=1 width=275) (actual time=30.587..30.592 rows=15 loops=1)
        Sort Key: planet_osm_polygon.way_area
        Sort Method: quicksort  Memory: 27kB
        ->  Index Scan using idx_poly_name_noplace on planet_osm_polygon  (cost=0.41..2.64 rows=1 width=275) (actual time=26.222..30.529 rows=15 loops=1)
              Index Cond: (way && '010300002031BF0D00010000000500000053E76DD794E0314122E2FC90FFAE584153E76DD794E0314126353C3633AF58418A56B9F460E2314126353C3633AF58418A56B9F460E2314122E2FC90FFAE584153E76DD794E0314122E2FC90FFAE5841'::geometry)
              Filter: ((waterway IS NULL) OR (waterway <> 'riverbank'::text))
Total runtime: 90.742 ms
(8 rows)

this is how the DB will solve your query. You can see which indices it uses, and how much time this needs. Use this information to invent a better index or to rewrite your queries! The output of the query will be discarded.

Another approach: write out the output of a query to a csv:

View results of queries

It can be useful to view the results of a query directly:

Copy (SELECT ST_AsBinary("way") AS geom,"name","way_area" FROM (select way,way_area,name from planet_osm_polygon
              where name is not null
                and (waterway is null or waterway != 'riverbank')
                and place is null
              order by way_area desc
             ) as text WHERE "way" && ST_SetSRID('BOX3D(841288.7876000311 6472714.638467353,841748.367313656 6472920.980379593)'::box3d, 900913)
)To '/tmp/gisq-text.csv' With CSV;

It creates a csv in /tmp/, look at it with your favourite editor. In this special case, this analysis was the key for an improvement, because the content of the file was really huge (several MB), despite being only 9 rows:

(first row discarded, was base64 encoded binary→the ways!)

France métropolitaine,1.28334e+12
Deutschland,9.76873e+11
Rheinland-Pfalz,4.7835e+10
Rhein-Hunsrück-Kreis,2.33281e+09
UNESCO Weltnaturerbe Oberes Mittelrheintal,1.85669e+09
Emmelshausen,3.27037e+08
Emmelshausen,1.97694e+07
Leiningen (Hunsrück),1.39276e+07
Schwall (Rhein-Hunsrück),3.94131e+06

what has the silly stylesheet done? It tried to render text labels on areas - that's okay, but we do NOT need all text labels for whole countries like France, Germany etc on a z18 Tile! The labels will be off the rendered tile in 99.99% anyway ;-); The reasonable solution would be to alter the query to only render labels on areas with a maximum size a little bigger than the tile size itself and add "and way_area <= 320000" somewhere in the query. For fast access an index is needed, example see in the next section.

Indices

Indizes are the key for a fast serving DB!

Downside: A lot of indices slow down updating the DB and increase DB size.

In our special spatial case, every index MUST be on "USING gist (way)", only then it will be effective because it can be searched by coordinates!

How to check if an index is used:

gis=# SELECT relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_all_indexes WHERE
   schemaname = 'public';
     relname       |            indexrelname             | idx_scan | idx_tup_read | idx_tup_fetch 
-------------------+-------------------------------------+----------+--------------+---------------
planet_osm_polygon | water_areas_idx                     |   372544 |    121639676 |      10505232
planet_osm_polygon | planet_osm_polygon_nobuilding_index |   124851 |      2942357 |       2942357
planet_osm_polygon | planet_osm_polygon_pkey             |        0 |            0 |             0
planet_osm_polygon | planet_osm_polygon_index            |  2198219 |    232523423 |     232523423

If any of the values goes up after a query, the index is being used.

the last two columns are interesting : idx_tup_read and idx_tup_fetch. Especially idx_tup_fetch counts how many rows are returned directly from the index without scanning the table itselt → try to get up this value!

you can reset the stat counter with the following command:

gis=# SELECT pg_stat_reset();

the usefulness of each index depends on your stylesheet!

The following ones are developed for a only slightly modified osm-carte style.

index on id:landuse query

index on the big OR'ed WHERE clause: from 73.722ms down to 3.6648ms ☺ (average)

CREATE INDEX idx_poly_idlanduse ON  planet_osm_polygon USING gist (way)
       WHERE((landuse IS NOT NULL) 
               OR (leisure IS NOT NULL) 
               OR (aeroway = ANY ('{apron,aerodrome}'::text[])) 
               OR (amenity = ANY ('{parking,university,college,school,hospital,kindergarten,grave_yard}'::text[])) 
               OR (military = ANY ('{barracks,danger_area}'::text[])) 
               OR ("natural" = ANY ('{field,beach,desert,heath,mud,grassland,wood,sand,scrub}'::text[])) 
               OR (power = ANY ('{station,sub_station,generator}'::text[])) 
               OR (tourism = ANY ('{attraction,camp_site,caravan_site,picnic_site,zoo}'::text[])) 
               OR (highway = ANY ('{services,rest_area}'::text[])))
       ;

building=no

this one can be used on many queries: -5 % render-time

    CREATE INDEX "planet_osm_polygon_nobuilding_index" 
       ON "planet_osm_polygon"
         USING gist ("way")
           WHERE "building" IS NULL; 

source: https://github.com/gravitystorm/openstreetmap-carto/issues/207

ferry lines

decreases render-time by 2%

CREATE INDEX ferry_idx ON planet_osm_line USING gist (way)
       WHERE (route = 'ferry'::text);

https://github.com/gravitystorm/openstreetmap-carto/issues/207

single indizes

The "sledgehammer approach": Index everything simple:

for all columns and table do (i've used a shellscript to generate all the index statements):

CREATE INDEX "idx_poly_aeroway" on planet_osm_polygon  USING gist (way) WHERE "aeroway" IS NOT NULL ;

After having added ~50 simple indices on planet_osm_line,planet_osm_polygon,planet_osm_point, render some stuff, check the used ones and remove the others.: -8% render-time

The following ones remained:

CREATE INDEX "idx_poly_aeroway" on planet_osm_polygon  USING gist (way) WHERE "aeroway" IS NOT NULL ;
CREATE INDEX "idx_poly_historic" on planet_osm_polygon  USING gist (way) WHERE "historic" IS NOT NULL ; 
CREATE INDEX "idx_poly_leisure" on planet_osm_polygon  USING gist (way) WHERE "leisure" IS NOT NULL ; 
CREATE INDEX "idx_poly_man_made" on planet_osm_polygon  USING gist (way) WHERE "man_made" IS NOT NULL ; 
CREATE INDEX "idx_poly_military" on planet_osm_polygon  USING gist (way) WHERE "military" IS NOT NULL ; 
CREATE INDEX "idx_poly_power" on planet_osm_polygon  USING gist (way) WHERE "power" IS NOT NULL ; 
CREATE INDEX "idx_poly_landuse" on planet_osm_polygon  USING gist (way) WHERE "landuse" IS NOT NULL ; 
CREATE INDEX "idx_poly_amenity" on planet_osm_polygon  USING gist (way) WHERE "amenity" IS NOT NULL ; 
CREATE INDEX "idx_poly_natural" on planet_osm_polygon  USING gist (way) WHERE "natural" IS NOT NULL ; 
CREATE INDEX "idx_poly_highway" on planet_osm_polygon  USING gist (way) WHERE "highway" IS NOT NULL ; 
CREATE INDEX "idx_poly_tourism" on planet_osm_polygon  USING gist (way) WHERE "tourism" IS NOT NULL ; 
CREATE INDEX "idx_poly_building" on planet_osm_polygon  USING gist (way) WHERE "building" IS NOT NULL ; 
CREATE INDEX "idx_poly_barrier" on planet_osm_polygon  USING gist (way) WHERE "barrier" IS NOT NULL ; 
CREATE INDEX "idx_poly_railway" on planet_osm_polygon  USING gist (way) WHERE "railway" IS NOT NULL ; 
CREATE INDEX "idx_poly_aerialway" on planet_osm_polygon  USING gist (way) WHERE "aerialway" IS NOT NULL ; 
CREATE INDEX "idx_poly_power_source" on planet_osm_polygon  USING gist (way) WHERE "power_source" IS NOT NULL ; 
CREATE INDEX "idx_poly_generator:source" on planet_osm_polygon  USING gist (way) WHERE "generator:source" IS NOT NULL ;
CREATE INDEX "idx_line_aerialway" on planet_osm_line  USING gist (way) WHERE "aerialway" IS NOT NULL ; 
CREATE INDEX "idx_line_waterway" on planet_osm_line  USING gist (way) WHERE "waterway" IS NOT NULL ; 
CREATE INDEX "idx_line_bridge" on planet_osm_line  USING gist (way) WHERE "bridge" IS NOT NULL ; 
CREATE INDEX "idx_line_tunnel" on planet_osm_line  USING gist (way) WHERE "tunnel" IS NOT NULL ; 
CREATE INDEX "idx_line_access" on planet_osm_line  USING gist (way) WHERE "access" IS NOT NULL ; 
CREATE INDEX "idx_line_railway" on planet_osm_line  USING gist (way) WHERE "railway" IS NOT NULL ; 
CREATE INDEX "idx_line_power" on planet_osm_line  USING gist (way) WHERE "power" IS NOT NULL ; 
CREATE INDEX "idx_line_name" on planet_osm_line  USING gist (way) WHERE "name" IS NOT NULL ; 
CREATE INDEX "idx_line_ref" on planet_osm_line  USING gist (way) WHERE "ref" IS NOT NULL ;
CREATE INDEX "idx_point_aerialway" on planet_osm_point  USING gist (way) WHERE "aerialway" IS NOT NULL ;  
CREATE INDEX "idx_point_power_source" on planet_osm_point  USING gist (way) WHERE "power_source" IS NOT NULL ; 
CREATE INDEX "idx_point_shop" on planet_osm_point  USING gist (way) WHERE "shop" IS NOT NULL ;  
CREATE INDEX "idx_point_place" on planet_osm_point  USING gist (way) WHERE "place" IS NOT NULL ;  
CREATE INDEX "idx_point_barrier" on planet_osm_point  USING gist (way) WHERE "barrier" IS NOT NULL ; 
CREATE INDEX "idx_point_railway" on planet_osm_point  USING gist (way) WHERE "railway" IS NOT NULL ; 
CREATE INDEX "idx_point_amenity" on planet_osm_point  USING gist (way) WHERE "amenity" IS NOT NULL ; 
CREATE INDEX "idx_point_natural" on planet_osm_point  USING gist (way) WHERE "natural" IS NOT NULL ; 
CREATE INDEX "idx_point_highway" on planet_osm_point  USING gist (way) WHERE "highway" IS NOT NULL ; 
CREATE INDEX "idx_point_tourism" on planet_osm_point  USING gist (way) WHERE "tourism" IS NOT NULL ; 
CREATE INDEX "idx_point_power" on planet_osm_point  USING gist (way) WHERE "power" IS NOT NULL ;  
CREATE INDEX "idx_point_aeroway" on planet_osm_point  USING gist (way) WHERE "aeroway" IS NOT NULL ; 
CREATE INDEX "idx_point_historic" on planet_osm_point  USING gist (way) WHERE "historic" IS NOT NULL ; 
CREATE INDEX "idx_point_leisure" on planet_osm_point  USING gist (way) WHERE "leisure" IS NOT NULL ; 
CREATE INDEX "idx_point_man_made" on planet_osm_point  USING gist (way) WHERE "man_made" IS NOT NULL ;  
CREATE INDEX "idx_point_waterway" on planet_osm_point  USING gist (way) WHERE "waterway" IS NOT NULL ; 
CREATE INDEX "idx_point_generator:source" on planet_osm_point  USING gist (way) WHERE "generator:source" IS NOT NULL ;
CREATE INDEX "idx_point_capital" on planet_osm_point  USING gist (way) WHERE "capital" IS NOT NULL ;  
CREATE INDEX "idx_point_lock" on planet_osm_point  USING gist (way) WHERE "lock" IS NOT NULL ;  
CREATE INDEX "idx_point_landuse" on planet_osm_point  USING gist (way) WHERE "landuse" IS NOT NULL ; 
CREATE INDEX "idx_point_military" on planet_osm_point  USING gist (way) WHERE "military" IS NOT NULL ;

area-text

The famous example from above where optimizing really excels: down from ~497ms to 0.68 ms ☺

The query has been altered to include way_area <= 320000 (i've done it globally since i only need zoomlevels > 15 atm)

the following index must be used for this to work:

CREATE INDEX idx_poly_wayarea_text ON  planet_osm_polygon USING gist (way)
     WHERE name IS NOT NULL
     AND place IS NULL
     AND way_area <= 320000;

text-poly

Adding the following index for id: text-poly brought it down from 61.211ms down to 16.704ms per render-request.

This is an example where reducing a query for 5ms actually improves performance noticeably, because it runs ten times while rendering a single request.

CREATE INDEX idx_poly_text_poly ON planet_osm_polygon USING gist (way)
       where amenity is not null
                 or shop in ('supermarket','bakery','clothes','fashion','convenience','doityourself','hairdresser','department_store', 'butcher','car','car_repair','bicycle')
                 or leisure is not null
                 or landuse is not null
                 or tourism is not null
                 or "natural" is not null
                 or man_made in ('lighthouse','windmill')
                 or place='island'
                 or military='danger_area'
                 or historic in ('memorial','archaeological_site')
                 or highway='bus_stop';

cutline

there are very few tag:man_made=cutline, so an index can improve results dramatically (for a small test from 37.601ms to 0.12007ms)

CREATE INDEX "idx_line_cutline" on planet_osm_line  USING gist (way)
       where man_made='cutline';


buildings-lz

reduced average query speed for one tile from 47.240ms down to 0.49313ms with this index:

CREATE INDEX idx_poly_buildings_lz ON planet_osm_polygon USING gist (way)
       where railway='station'
         or building in ('station','supermarket')
         or amenity='place_of_worship';

buildings

this index takes longer to build, but improves query time from 18.532ms to 1.0745ms

CREATE INDEX idx_poly_buildings ON planet_osm_polygon USING gist (way)
       where (building is not null
                and building not in ('no','station','supermarket','planned')
                and (railway is null or railway != 'station')
                and (amenity is null or amenity != 'place_of_worship'))
                 or aeroway = 'terminal';

other

some other examples i found on the Internet (used, but not speed-tested yet):

CREATE INDEX water_areas_idx ON planet_osm_polygon USING gist (way)
       WHERE (((waterway IS NOT NULL)
       OR (landuse = ANY (ARRAY['reservoir'::text, 'water'::text, 'basin'::text])))
       OR ("natural" IS NOT NULL));

Appendix

If you find this page useful, please link it on appropriate pages in this Wiki or elsewhere.

If you have other hints for speed-improving, expand this article it or provide links here:

Links: