OpenStreetBrowser/Database Structure

From OpenStreetMap Wiki
Jump to navigation Jump to search

The Database Structure of OpenStreetBrowser is based on a pgsql data import with Osmosis. Therefore you get the following tables:

name columns (type) Content
nodes id (bigint), version (int), user_id (int), tstamp (timestamp), changeset_id (bigint), geom (geometry, srid 4326) Holds newest version of all nodes with their geometry in Lat/Long
node_tags node_id (bigint), k (text), v (text) Holds all tags of all nodes

This structure is not very efficient, as it would be better to have a "flat" database structure, where you have a table with one line containing all information of one object (say node, line, polygon ...). That's why osm2pgsql imports a database structure like this:

name columns (type) Content
planet_osm_point osm_id (bigint), amenity (text), highway (text), cycleway (text), ...., way (geometry, srid 900913) A table having one column for each tag which was specified before where at least one tag is not null

The problem with this database layout is, that you have to know which tag-keys you will need before importing the data. Also there's a lot of information missing like the user_id and changeset_id. Good thing: One line per each object.

So the preprocessing of OpenStreetBrowser generates tables with the following layout:

name columns (type) Content
osm_point osm_id (text), osm_tags (hstore), osm_way (geom, srid 900913) Holds every node, which has at least one tag<super>1</super>
osm_line osm_id (text), osm_tags (hstore), osm_way (geom, srid 900913) Holds every way in the database
osm_polygon osm_id (text), osm_tags (hstore), osm_way (geom, srid 900913) Holds every closed way (start node equals end node) but is not member of a multipolygon relation as a simple polygon and an id similar to "way_1234" AND every multipolygon relation with an id similar to "rel_4321".
osm_rel osm_id (text), osm_tags (hstore), osm_way (geom, srid 900913) Holds every relation with the geometry of all relation members (nodes and ways, excluding relations) merged together.

There might be additional columns, e.g. to speed up queries, but these are the "main" columns. Here's some information about the columns:

name Content
osm_id An ID based on the original ID, e.g. "node_1234", "way_1234" or "rel_1234". If several objects get merged together by functions they will be collected, seperated by semi colon, e.g.: "node_1234;node_2345;way_123".
osm_tags A hstore object containing all tags.
osm_way A geometry in 900913 SRID.

As index I mainly use a mulitcolumn (gist) index, of osm_way and osm_tags. Accessing large areas (like all place=city in an area the size of Europe) takes quite a long time, it's better to use only the index on osm_tags and a sequential scan on the location. Accessing an area which was not accessed some time before can take a minute, I guess it has to load the index from disk.