List of Database Schemas
This page collects information on the various database schemas used in and around OSM. For a general overview of different data sources see Databases and data access APIs. Different schemas can be created from OSM data by various bits of software. Since each schema (and associated software) is optimised for a different set of circumstances, the choice of database schema can significantly impact the capability and efficiency of any software built on top of it. Each entry below attempts to summarise the effects of these differences and give example use-cases.
Definitions
- Updatable
- Whether the schema supports updating with OsmChange format "diffs".
- This can be extremely important for world-wide databases, as it allows the database to be kept up-to-date without a complete (and space- and time-consuming) worldwide re-import. However, if you only need a small extract, then re-importing that extract may be quicker and easier than applying the OsmChange diffs.
- Geometries
- Whether the schema has pre-built geometries.
- Some database schemas provide native (e.g. PostGIS) geometries, which allows their use in other pieces of software that can read those geometry formats. Other database schemas may provide enough data to produce the geometries (e.g. nodes, ways, relations and their linkage) but not in a native format. Some can provide both. If you want to use the database with other bits of software such as a GIS editor, then you probably want a schema with these geometries pre-built. However, if you are doing your own analysis, or are using software written to work with OSM nodes, ways and relations, then you may not need the geometries.
- Lossless
- Whether the full set of OSM data is kept.
- Some schemas will retain the full set of OSM data, including versioning, user IDs, changeset information and all tags. This information is important for editors, and may be of importance to someone doing analysis. However, if it is not important then it may be better to choose a "lossy" schema, as it is likely to take up less disk space and may be quicker to import.
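To make the "Updatable" definition more concrete, here is a minimal Python sketch of what applying an OsmChange diff involves. It assumes a toy in-memory store (a dict of node id to version and coordinates) rather than any real schema, handles nodes only, and the sample diff is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up OsmChange document with one block per action.
OSC = """<osmChange version="0.6">
  <create>
    <node id="3" version="1" lat="51.5" lon="-0.1"/>
  </create>
  <modify>
    <node id="1" version="2" lat="52.0" lon="0.0"/>
  </modify>
  <delete>
    <node id="2" version="2" lat="0" lon="0"/>
  </delete>
</osmChange>"""


def apply_osmchange(nodes, osc_xml):
    """Apply create/modify/delete blocks to a dict of id -> (version, lat, lon)."""
    root = ET.fromstring(osc_xml)
    for action in root:  # each child is a <create>, <modify> or <delete> block
        for el in action.findall("node"):
            node_id = int(el.get("id"))
            if action.tag == "delete":
                nodes.pop(node_id, None)
            elif action.tag in ("create", "modify"):
                nodes[node_id] = (
                    int(el.get("version")),
                    float(el.get("lat")),
                    float(el.get("lon")),
                )
    return nodes


# Toy "database" holding two nodes before the diff is applied.
nodes = {1: (1, 51.9, 0.0), 2: (1, 10.0, 10.0)}
apply_osmchange(nodes, OSC)
```

An updatable schema does the equivalent of this against its own tables, which is why only the changed elements need to be touched instead of the whole planet.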
List
TODO: add Used by column like in Databases#Database schemas
| Schema name | Created with | Primary use case | Updatable? | Geometries? | Lossless? | Database |
|---|---|---|---|---|---|---|
| osm2pgsql | osm2pgsql | Rendering | Yes | Yes | No | PostgreSQL |
| apidb | osmosis | Mirroring | Yes | No | Yes | PostgreSQL, MySQL |
| pgsnapshot | osmosis | Analysis | Yes | Yes | No | PostgreSQL |
| imposm | Imposm | Rendering | No | Yes | No | PostgreSQL |
| nominatim | osm2pgsql | Geocoding | Yes | Yes | Yes | PostgreSQL |
| osmsharp | OsmSharp | Routing | Yes | ? | ? | Oracle |
| overpass | Overpass API | Analysis | Yes | ? | Yes | custom |
| mongosm | MongOSM | Analysis | maybe | ? | ? | MongoDB |
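As a rough aid for reading the table, the following Python sketch encodes its rows and filters them by the three properties defined above. The `Schema` class and `candidates` function are hypothetical names introduced here; "?" and "maybe" entries are stored as `None` and never match a stated requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Schema:
    name: str
    use_case: str
    updatable: Optional[bool]   # None means "?" or "maybe" in the table
    geometries: Optional[bool]
    lossless: Optional[bool]


# Values copied from the table above.
SCHEMAS = [
    Schema("osm2pgsql", "Rendering", True, True, False),
    Schema("apidb", "Mirroring", True, False, True),
    Schema("pgsnapshot", "Analysis", True, True, False),
    Schema("imposm", "Rendering", False, True, False),
    Schema("nominatim", "Geocoding", True, True, True),
    Schema("osmsharp", "Routing", True, None, None),
    Schema("overpass", "Analysis", True, None, True),
    Schema("mongosm", "Analysis", None, None, None),
]


def candidates(updatable=None, geometries=None, lossless=None):
    """Return schema names matching every requirement that is not None."""
    wanted = {"updatable": updatable, "geometries": geometries, "lossless": lossless}
    return [
        s.name
        for s in SCHEMAS
        if all(v is None or getattr(s, k) is v for k, v in wanted.items())
    ]


print(candidates(updatable=True, geometries=True))
```

The filter only narrows the list; the primary use case and the notes under Details still matter when making the final choice.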
Details
osm2pgsql
The osm2pgsql schema has historically been the standard way to import OSM data for use in rendering software such as Mapnik. It also has uses in analysis, although the schema does not support versioning or history directly. The import is handled by the Osm2pgsql software, which has two modes of operation, slim and non-slim; these control how much memory the software uses during import and whether the database can be updated afterwards. Slim mode supports updates, but import time depends heavily on disk speed and may be several days for the full planet, even on a fast machine. Non-slim mode is faster, but does not support updates and requires a vast amount of memory.
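The relationship between the two modes and updatability can be sketched as a small Python helper that builds an osm2pgsql command line. The helper name, database name and file names are hypothetical, and the flags (`--create`, `--append`, `--slim`, `--database`) are given as understood from the classic osm2pgsql command line; check `osm2pgsql --help` for your version before relying on them.

```python
def osm2pgsql_args(input_file, database, slim, update=False):
    """Build an argument list for an initial import or a later update."""
    if update and not slim:
        # Non-slim imports do not keep the data needed to apply diffs.
        raise ValueError("updates need a database that was imported in slim mode")
    args = ["osm2pgsql", "--append" if update else "--create"]
    if slim:
        args.append("--slim")
    args += ["--database", database, input_file]
    return args


# Initial slim import, then applying a diff to the same database.
print(osm2pgsql_args("planet.osm.pbf", "gis", slim=True))
print(osm2pgsql_args("changes.osc.gz", "gis", slim=True, update=True))
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting issues with file names.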
The import process is lossy, and is controlled by a configuration file that lists the keys of the elements of interest. The values of these "interesting" keys are imported as columns in the points, lines and polygons tables. These tables can be very large, and care must be taken to get good indexed performance. If the set of "interesting" keys changes after the import, then the import must be re-run.
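Why the import is lossy can be shown with a toy model in Python. This is not osm2pgsql's actual code: `INTERESTING_KEYS` and `to_row` are hypothetical, and the sketch only illustrates how a fixed key list turns free-form tags into fixed columns while dropping everything else.

```python
# Hypothetical list of "interesting" keys, standing in for the configuration file.
INTERESTING_KEYS = ["highway", "name", "amenity"]


def to_row(osm_id, tags, keys=INTERESTING_KEYS):
    """Project an element's tags onto one column per interesting key."""
    row = {"osm_id": osm_id}
    for key in keys:
        row[key] = tags.get(key)  # missing keys become NULL-like None
    return row


row = to_row(
    42,
    {"amenity": "pub", "name": "The Crown", "opening_hours": "Mo-Su 11:00-23:00"},
)
print(row)
```

The `opening_hours` tag is absent from the resulting row, so adding it to the key list later cannot recover it without importing again.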
For more information, please see the Osm2pgsql page.
apidb
ApiDB is a schema designed to replicate the storage of OSM data in the same manner as the main API schema and can be produced using the Osmosis commands for writing ApiDBs or updating ApiDBs with changes. This schema does not have any native geometry, although in the nodes, ways and relations tables there is enough data to reconstruct the geometries.
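Reconstructing a geometry from such tables amounts to looking up each node a way refers to, in order. The Python sketch below assumes two simplified in-memory stand-ins for the node and way-node tables; the real ApiDB tables differ in naming and detail, and the coordinates are made up.

```python
# Simplified stand-ins: node id -> (lat, lon), and way id -> ordered node ids.
nodes = {1: (51.50, -0.10), 2: (51.51, -0.09), 3: (51.52, -0.08)}
way_nodes = {100: [1, 2, 3]}


def way_to_wkt(way_id):
    """Build a WKT LINESTRING for a way by resolving its node references."""
    coords = [nodes[n] for n in way_nodes[way_id]]
    # WKT orders coordinates as x y, which is lon lat.
    return "LINESTRING(" + ", ".join(f"{lon} {lat}" for lat, lon in coords) + ")"


print(way_to_wkt(100))
```

Schemas with pre-built geometries have effectively done this step at import time and stored the result in a native geometry column.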
This schema supports history, although the import process may not (?), so it can be used for mirroring the main OSM DB.
The import process, even on good hardware, can take several days for the full planet.
For more information, please see the detailed usage page for Osmosis.
pgsnapshot
The pgsnapshot schema is a modified and simplified version of the main OSM DB schema which provides a number of useful features, including generating geometries and storing tags in a single hstore column for easier use and indexing. JXAPI's schema is built on pgsnapshot.
Although the pgsnapshot data is technically lossy, only metadata is lost; the full element data (including all tags) is imported.
imposm
Imposm is an import tool, and is able to generate schemas using a mapping which is fully configurable (there is also a good default for most use-cases). As such it really shouldn't count as its own schema, but it needed fitting in somehow. The ability to break data out thematically into different tables greatly simplifies the problem of indexing performance, and may result in smaller table and index sizes on-disk.
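The idea of breaking data out thematically can be sketched in Python as follows. This is a toy model, not Imposm's mapping format: the `MAPPING` dict, its table names and the `route` function are hypothetical.

```python
# Hypothetical mapping: table name -> tag keys that send an element to it.
MAPPING = {
    "roads": {"highway"},
    "waterways": {"waterway"},
    "amenities": {"amenity"},
}


def route(tags, mapping=MAPPING):
    """Return the sorted names of the thematic tables an element belongs in."""
    return sorted(table for table, keys in mapping.items() if keys & set(tags))


print(route({"highway": "primary", "name": "A1"}))
print(route({"amenity": "pub", "highway": "service"}))
print(route({"building": "yes"}))
```

Because each table holds only one theme, its indexes cover far fewer rows than a single table containing every line or polygon would.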
Imposm is faster to import than Osm2pgsql in slim mode, but does not provide updatability.
nominatim
Nominatim is a geocoder where the database is produced by a special back-end of Osm2pgsql. It is a special-purpose database, and may not be suitable for other problem domains such as rendering or routing. The development overview gives information on some of the innards.
Nominatim's database is notoriously hard to set up, so you may want to try one of the pre-indexed data releases first.
overpass
The Overpass API is a query language built on top of a custom back-end database with software called OSM3S (see OSM3S/install for install and setup instructions). Because the database is custom, it is hard to compare with the other database schemas. The complete planet file could be recreated from the database. It is geared towards good performance on locally concentrated datasets.
osmsharp
OsmSharp is a toolbox of OSM-related routines, including some to import OSM data into Oracle databases.
mongosm
MongOSM is a set of Python scripts for importing, querying and (maybe) keeping up-to-date OSM data in a MongoDB database.