User:Davetoo/GIS/postGIS/performance

From OpenStreetMap Wiki
Jump to: navigation, search

blah

From planet-100707.osm, we created ca-100707.osm by clipping with california.poly from cloudmade; we then performed an osmosis unfiltered --wpd. Osmosis was NOT used to create way bbox or linestring geometries; file sizes as shown:

   5.6G Jul  9 00:32 ca-100707.osm
     
   282M Jul 10 19:57 node_tags.txt
   2.4G Jul 10 19:57 nodes.txt
   1.6M Jul 10 19:57 relation_members.txt
   1.1M Jul 10 19:57 relation_tags.txt
   291K Jul 10 19:57 relations.txt
    33K Jul 10 19:57 users.txt
   576M Jul 10 19:57 way_nodes.txt
   501M Jul 10 19:57 way_tags.txt
    76M Jul 10 19:57 ways.txt

BEFORE ALL STEPS, the postgreSQL cache was cleared by restarting.


Effect of geometry constraints on large node file copy-in

First, we performed the nodes copy-in with all constraints and indexes dropped:

   CREATE TABLE public.nodes
   (
     node_id bigint NOT NULL,
     "version" integer NOT NULL,
     user_id integer NOT NULL,
     tstamp timestamp without time zone NOT NULL,
     changeset_id bigint NOT NULL,
     geom geometry
   )
   WITH (
     OIDS=TRUE
   );
   ALTER TABLE public.nodes OWNER TO pgdlc;
   time psql -U pgdlc -c "\copy nodes from 'nodes.txt'" osm_trails_ca
   real    2m54.743s
   user    0m6.065s
   sys     0m14.031s

Next, we truncate the table and add the geometry constraints by dropping and re-adding the geometry column (and clear the cache by re-starting pgsql):

   -- DROP TABLE nodes;
   CREATE TABLE nodes
   (
     node_id bigint NOT NULL,
     "version" integer NOT NULL,
     user_id integer NOT NULL,
     tstamp timestamp without time zone NOT NULL,
     changeset_id bigint NOT NULL,
     geom geometry,
     CONSTRAINT enforce_dims_geom CHECK (st_ndims(geom) = 2),
     CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) = 'POINT'::text OR geom IS NULL),
     CONSTRAINT enforce_srid_geom CHECK (st_srid(geom) = 4326)
   )
   WITH (
     OIDS=TRUE
   );
   ALTER TABLE nodes OWNER TO pgdlc;    

..and re-time the copy-in:

   time psql -U pgdlc -c "\copy nodes from 'nodes.txt'" osm_trails_ca
   real    3m11.535s
   user    0m6.397s
   sys     0m14.001s

CONCLUSION: 12 seconds is insignificant compared to the confidence gained in the data by leaving the constraints intact.

Other timings

   time psql -U pgdlc -c "ALTER TABLE nodes ADD CONSTRAINT pk_nodes PRIMARY KEY(id);" osm_trails_ca
   NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pk_nodes" for table "nodes"
   ALTER TABLE
   real    0m42.791s
   user    0m0.002s
   sys     0m0.008s


Node Tags

Headline text