Databases and data access APIs

This page provides an overview of the databases that could be used to store and manipulate OSM data, how to obtain data to populate the databases, and how to query them to find something useful.

It is intended as an overview for new developers who wish to write software to use OSM data, and not for end users of the information.

Sources of OSM Data

See also Downloading data for a rundown of the basic options.

The various sources of OSM data (either the whole world, or a small part of it) are identified below with links to other Wiki pages which provide more detail.

Most of the following methods return data in the OSM XML format, which other tools can use to populate a database. The format of the data is described in Data Primitives.
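To give a feel for what the OSM XML format looks like to a consumer, here is a minimal Python sketch that parses a small hand-written document with the standard library's xml.etree.ElementTree and counts its nodes, ways and relations. The sample data and the function name summarize_osm_xml are illustrative only. For planet-sized files you would want a streaming parser (for example ElementTree.iterparse) or a PBF-capable library instead of loading everything into memory.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written OSM XML document (illustrative data, not a real extract).
SAMPLE_OSM = """<?xml version="1.0"?>
<osm version="0.6" generator="example">
  <node id="1" lat="51.5000" lon="-0.1200" version="1"/>
  <node id="2" lat="51.5010" lon="-0.1210" version="1">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="10" version="1">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="footway"/>
  </way>
</osm>"""


def summarize_osm_xml(xml_text):
    """Count the data primitives in an OSM XML string and collect their tags."""
    root = ET.fromstring(xml_text)
    counts = {"node": 0, "way": 0, "relation": 0}
    tags = {}
    for elem in root:
        if elem.tag in counts:
            counts[elem.tag] += 1
            # Each <tag k="..." v="..."/> child is one key/value pair.
            tags[(elem.tag, int(elem.get("id")))] = {
                t.get("k"): t.get("v") for t in elem.findall("tag")
            }
    return counts, tags


counts, tags = summarize_osm_xml(SAMPLE_OSM)
print(counts)             # {'node': 2, 'way': 1, 'relation': 0}
print(tags[("way", 10)])  # {'highway': 'footway'}
```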

Planet.osm/.pbf

Every week, a dump of the entire current OSM dataset is saved in different formats and made available as Planet.osm. Quite a few people break this file down into smaller files for different regions and make these extracts available separately on mirror servers. Various tools can cut the Planet file into smaller areas if required, but pre-cut extracts are also available, e.g. from Geofabrik (pre-selected regions, such as individual states) or slice.openstreetmap.us (PBF data filtered by a bounding polygon, with a time-limited link to re-download the same data). Some sources omit metadata from tagless nodes to save space.

Differences between the live OSM data and the planet dump are also published every minute (as well as hourly and daily) as replication diffs in OsmChange format, so it is possible to maintain an up-to-date copy of the OSM dataset.
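Replication diffs are identified by a sequence number. Assuming the usual layout of the public minutely feed (the sequence number zero-padded to nine digits and split into three directory levels, with a state.txt file next to each .osc.gz file), a consumer can work out where the next diff lives as in the Python sketch below. The base URL and the function names diff_urls and parse_state are assumptions for illustration; check them against the replication documentation before relying on them.

```python
# Assumed layout of the public minutely replication feed (check before use).
BASE_URL = "https://planet.openstreetmap.org/replication/minute/"


def diff_urls(sequence_number, base_url=BASE_URL):
    """Return the (.osc.gz, .state.txt) URLs for one replication sequence number."""
    padded = f"{sequence_number:09d}"  # e.g. 4123456 -> "004123456"
    path = f"{padded[0:3]}/{padded[3:6]}/{padded[6:9]}"
    return base_url + path + ".osc.gz", base_url + path + ".state.txt"


def parse_state(text):
    """Parse a state.txt file made of Java-properties-style key=value lines."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, _, value = line.partition("=")
        # Colons in timestamps are escaped, e.g. 2024-01-01T00\:00\:00Z.
        state[key] = value.replace("\\:", ":")
    return state


osc_url, state_url = diff_urls(4123456)
print(osc_url)
# https://planet.openstreetmap.org/replication/minute/004/123/456.osc.gz
print(parse_state("sequenceNumber=4123456\ntimestamp=2024-01-01T00\\:00\\:00Z"))
# {'sequenceNumber': '4123456', 'timestamp': '2024-01-01T00:00:00Z'}
```

An updater would read the remote state.txt, compare its sequenceNumber with the last one applied locally, and fetch each missing diff in order.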

API

The main API is the method editors use to obtain OSM data, because it is the only way to change the OSM data in the live database. The API page links to the specification of the protocol used to obtain data.

Its main limitation is that it only returns very small areas (bounding boxes of less than 0.25 square degrees).

This method of obtaining data should therefore be reserved for editing applications. Use other methods for rendering, routing or other purposes.
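To make the size limit concrete, here is a minimal Python sketch that builds a URL for the API's map call (GET /api/0.6/map?bbox=left,bottom,right,top) and rejects bounding boxes larger than 0.25 square degrees before anything is sent. It only constructs the URL; the function name map_request_url is hypothetical, and the API enforces further limits (such as a maximum number of nodes) that this check does not cover.

```python
API_MAP_URL = "https://api.openstreetmap.org/api/0.6/map"
MAX_AREA_SQ_DEG = 0.25  # area limit for the editing API, in square degrees


def map_request_url(min_lon, min_lat, max_lon, max_lat):
    """Build a map-call URL, rejecting boxes the API would refuse as too large."""
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("expected min_lon < max_lon and min_lat < max_lat")
    area = (max_lon - min_lon) * (max_lat - min_lat)
    if area > MAX_AREA_SQ_DEG:
        raise ValueError(f"bbox covers {area:.2f} square degrees; use another data source")
    # The API expects bbox=left,bottom,right,top (longitudes and latitudes in degrees).
    return f"{API_MAP_URL}?bbox={min_lon},{min_lat},{max_lon},{max_lat}"


print(map_request_url(-0.13, 51.50, -0.12, 51.51))
# https://api.openstreetmap.org/api/0.6/map?bbox=-0.13,51.5,-0.12,51.51
```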

Overpass API

The Overpass API is a read-only API that serves up custom-selected parts of the OSM map data. In contrast to the editing API described in the previous section, the Overpass API is optimized for data consumers that need a few elements almost instantly or up to roughly 10 million elements within minutes, selected by search criteria such as location, type of object, tag properties, proximity, or combinations of these. It acts as a database backend for various services. Its query language is documented at Overpass QL guide/language reference. Getting familiar with its features via overpass turbo, an interactive web-based frontend, is highly recommended.
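As a small illustration of querying Overpass, the Python sketch below builds an Overpass QL query for nodes with a given tag inside a bounding box and form-encodes it as the data parameter of a POST request. Note that Overpass QL orders bounding boxes as (south, west, north, east), unlike the editing API. The endpoint shown is one public instance, and the helper names are hypothetical; the sketch stops short of sending the request, and public instances have usage limits.

```python
from urllib.parse import urlencode

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # one public instance


def overpass_query(key, value, south, west, north, east, timeout=25):
    """Return an Overpass QL query for nodes tagged key=value inside a bbox."""
    # Overpass QL bounding boxes are ordered (south, west, north, east).
    return (
        f"[out:json][timeout:{timeout}];"
        f'node["{key}"="{value}"]({south},{west},{north},{east});'
        "out body;"
    )


def request_body(query):
    """Form-encode the query as the 'data' parameter for a POST request."""
    return urlencode({"data": query})


q = overpass_query("amenity", "cafe", 51.50, -0.13, 51.51, -0.12)
print(q)
# [out:json][timeout:25];node["amenity"="cafe"](51.5,-0.13,51.51,-0.12);out body;
```

The resulting body could be POSTed to OVERPASS_URL (for example with urllib.request), and the JSON response contains the matching elements in an "elements" array.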

XAPI

The XAPI service allowed OSM data to be downloaded in OSM XML format for a given region of the globe, filtered by tag. The service has been replaced by Overpass; legacy XAPI applications can use the XAPI Compatibility Layer.

Open Planet Data

Open Planet Data hosts worldwide OSM datasets in PBF, GeoDesk GOL and GeoParquet formats, updated daily. Arbitrary regions can be downloaded into a Geo-Object Library using the gol load command.

Database Schemas

The database schema for the main API database (openstreetmap.org) can be found here: Rails port/Database schema.

Different applications use different database schemas. They are compared below using the following criteria:

Updatable: whether the schema supports updating with OsmChange format "diffs". This can be extremely important for worldwide databases, as it allows the database to be kept up to date without a complete (and space- and time-consuming) worldwide re-import. However, if you only need a small extract, re-importing that extract may be a quicker and easier way to stay up to date than applying OsmChange diffs.

Geometries: whether the schema has pre-built geometries. Some database schemas provide native (e.g. PostGIS) geometries, which can be used directly by other software that reads those geometry formats. Other schemas provide enough data to produce the geometries (e.g. nodes, ways, relations and their linkage) but not in a native format. Some provide both. If you want to use the database with other software such as a GIS editor, you probably want a schema with pre-built geometries. However, if you are doing your own analysis, or are using software written to work with OSM nodes, ways and relations, you may not need them.

Lossless: whether the full set of OSM data is kept. Some schemas retain the full set of OSM data, including versioning, user IDs, changeset information and all tags. This information is important for editors, and may be important to someone doing analysis. If it is not important to you, a "lossy" schema may be the better choice, as it is likely to take up less disk space and may be quicker to import.

hstore columns: whether the schema uses a key-value pair datatype for tags (called hstore in PostgreSQL). hstore is perhaps the most straightforward way to represent OSM's free-form tagging in PostgreSQL. However, not all tools use it, and other databases might not have (or need) an equivalent.
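To illustrate what "updatable" means in practice, here is a minimal Python sketch that applies the create, modify and delete blocks of an OsmChange document to an in-memory dictionary keyed by element type and ID. The store layout and the function name apply_osmchange are hypothetical, and the sketch is deliberately lossy: it keeps only versions and tags, whereas a real schema would also have to update way node lists, relation members and any pre-built geometries.

```python
import xml.etree.ElementTree as ET

SAMPLE_OSC = """<osmChange version="0.6">
  <modify>
    <node id="1" version="2" lat="51.5" lon="-0.12">
      <tag k="name" v="Renamed"/>
    </node>
  </modify>
  <create>
    <node id="3" version="1" lat="51.6" lon="-0.11"/>
  </create>
  <delete>
    <node id="2" version="2" lat="51.5" lon="-0.1"/>
  </delete>
</osmChange>"""


def apply_osmchange(store, osc_text):
    """Apply an OsmChange document to store, a dict mapping
    (element_type, id) -> {"version": int, "tags": dict}."""
    root = ET.fromstring(osc_text)
    for action in root:  # <create>, <modify> or <delete>, in document order
        for elem in action:
            key = (elem.tag, int(elem.get("id")))
            if action.tag == "delete":
                store.pop(key, None)
            else:  # create and modify both replace the stored element
                store[key] = {
                    "version": int(elem.get("version")),
                    "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
                }
    return store


db = {("node", 1): {"version": 1, "tags": {}}, ("node", 2): {"version": 1, "tags": {}}}
apply_osmchange(db, SAMPLE_OSC)
print(sorted(db))  # [('node', 1), ('node', 3)]
```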

Schema name	Created with	Used by	Primary use case	Updatable	Geometries (PostGIS)	Lossless	hstore columns	Database
osm2pgsql	osm2pgsql	Mapnik, Kothic JS	Rendering	yes	yes	no	optional	PostgreSQL
apidb	osmosis	API	Mirroring	yes	no	yes	no	PostgreSQL, MySQL
pgsnapshot	osmosis	jXAPI	Analysis	yes	optional	yes	yes	PostgreSQL
imposm	Imposm		Rendering	no	yes	no	Imposm2: no, Imposm3: yes	PostgreSQL
nominatim	osm2pgsql	Nominatim	Search, Geocoding	yes	yes	yes	?	PostgreSQL
ogr2ogr	ogr2ogr		Analysis	no	yes	no	optional	various
osmsharp	OsmSharp		Routing	yes	no	?	?	Oracle
overpass	Overpass API		Analysis	yes	yes (but not pre-built)	no (see below)	yes	custom
osmium	Osmium		Analysis	no	yes	no	yes	PostgreSQL
pgsnapshot	Openstreetmap h3		Analysis	no	yes	yes	yes	PostgreSQL, Spark

osm2pgsql

The osm2pgsql schema has historically been the standard way to import OSM data for use in rendering software such as Mapnik. It is also used for analysis, although the schema does not support versioning or history directly. The import is handled by the osm2pgsql software, which has two modes of operation, slim and non-slim; these control how much memory the software uses during import and whether the database can be updated. Slim mode supports updates, but import time is highly dependent on disk speed and may reach several days for the full planet, even on a fast machine. Non-slim mode is faster, but does not support updates and requires a vast amount of memory.

The import process is lossy and is controlled by a configuration file that lists the keys of interest. The values of these "interesting" keys are imported as columns in the points, lines and polygons tables. (Alternatively, the values of all tags can be imported into an hstore column.) These tables can be very large, and care must be taken to get good index performance. If the set of "interesting" keys changes after the import and no hstore column was used, the import must be re-run.
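The column-versus-hstore split described above can be sketched in a few lines of Python. This is not osm2pgsql code: COLUMN_KEYS stands in for the configuration file's list of interesting keys, and to_row is a hypothetical helper. Here only tags without their own column go into the catch-all dictionary; osm2pgsql can also be configured to store all tags in the hstore column.

```python
# Hypothetical list of "interesting" keys, standing in for the configuration file.
COLUMN_KEYS = ["name", "highway", "building", "amenity"]


def to_row(tags, column_keys=COLUMN_KEYS, use_hstore=True):
    """Split an object's tags into fixed columns plus an optional hstore-like dict."""
    row = {key: tags.get(key) for key in column_keys}  # missing keys become NULL (None)
    if use_hstore:
        # Tags without their own column go into the catch-all "tags" column.
        row["tags"] = {k: v for k, v in tags.items() if k not in column_keys}
    return row


print(to_row({"highway": "residential", "name": "High Street", "surface": "asphalt"}))
# {'name': 'High Street', 'highway': 'residential', 'building': None, 'amenity': None, 'tags': {'surface': 'asphalt'}}
```

Without the hstore dictionary, the surface tag in this example would simply be lost, which is why changing the key list later forces a re-import.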

Starting with version 1.3.0, configuration became more flexible: a Lua script now describes the names, fields and types of the database tables. For each processed OSM object, a Lua callback is called in which you decide which tables the object should be written to.

Osm2pgsql is used by Nominatim, too.

For more information, please see the Osm2pgsql website.

apidb

ApiDB is a schema designed to replicate the storage of OSM data in the same manner as the main API schema and can be produced using the Osmosis commands for writing ApiDBs or updating ApiDBs with changes. This schema does not have any native geometry, although in the nodes, ways and relations tables there is enough data to reconstruct the geometries. This schema is not recommended for users who need geometries.
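As an illustration of rebuilding geometry from such tables, the Python sketch below turns the ordered node list of one way into a WKT LINESTRING. It assumes rows shaped like (node_id, sequence_id) for the way's nodes and node coordinates stored as integers in units of 10^-7 degrees, which is how the main API schema stores them to the best of our knowledge; check Rails port/Database schema for the actual table and column definitions. The function name way_to_wkt is hypothetical.

```python
SCALE = 10_000_000  # assumption: coordinates stored as integers in 1e-7 degrees


def way_to_wkt(way_nodes, nodes):
    """Rebuild a WKT LINESTRING for one way.

    way_nodes: list of (node_id, sequence_id) rows for the way.
    nodes: dict mapping node_id -> (latitude_int, longitude_int).
    """
    ordered = sorted(way_nodes, key=lambda row: row[1])  # follow sequence_id order
    coords = []
    for node_id, _seq in ordered:
        lat_i, lon_i = nodes[node_id]
        # WKT uses "x y", i.e. longitude first.
        coords.append(f"{lon_i / SCALE} {lat_i / SCALE}")
    return "LINESTRING(" + ", ".join(coords) + ")"


nodes = {1: (515000000, -1200000), 2: (515010000, -1210000)}
print(way_to_wkt([(2, 1), (1, 0)], nodes))
# LINESTRING(-0.12 51.5, -0.121 51.501)
```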

The schema supports history, although the import process does not import it. It can therefore be used for mirroring the main OSM DB, with the history being built up as replication diffs are applied.

The import process, even on good hardware, can take several weeks for the full planet. As of April 2012, the database takes approximately 1 TB.

For more information, please see the detailed usage page for Osmosis.

pgsnapshot

The pgsnapshot schema is a modified and simplified version of the main OSM DB schema which provides a number of useful features, including generating geometries and storing tags in a single hstore column for easier use and indexing. JXAPI's schema is built on pgsnapshot.
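Storing tags in a single hstore column means they must be written in hstore's text format, in which each pair looks like "key"=>"value" and backslashes and double quotes inside the quoted strings are escaped with a backslash. The Python sketch below (to_hstore_literal is a hypothetical helper) shows that serialization. In practice a database driver would normally adapt a Python dict to hstore for you, and if you did embed such a literal in SQL you would still need ordinary SQL quoting on top.

```python
def to_hstore_literal(tags):
    """Serialize a tag dict into PostgreSQL hstore text input format."""
    def quote(s):
        # Inside double-quoted hstore keys/values, escape backslash and double quote.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return ", ".join(
        f"{quote(k)}=>{'NULL' if v is None else quote(v)}" for k, v in tags.items()
    )


print(to_hstore_literal({"name": 'The "Old" Inn', "amenity": "pub"}))
# "name"=>"The \"Old\" Inn", "amenity"=>"pub"
```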

imposm

Imposm is an import tool, and is able to generate schemas using a mapping which is fully configurable. As such it really shouldn't count as its own schema, but it needed fitting in somehow. The ability to break data out thematically into different tables greatly simplifies the problem of indexing performance, and may result in smaller table and index sizes on-disk.
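The idea of breaking data out thematically can be sketched in Python as below. This is not Imposm's mapping format (Imposm uses its own configuration files); MAPPING and route are hypothetical names that only illustrate how each object ends up in the small, theme-specific tables whose indexes stay compact.

```python
# Hypothetical thematic mapping: table name -> (key, accepted values or None for any).
MAPPING = {
    "roads": ("highway", None),
    "buildings": ("building", None),
    "pois": ("amenity", {"cafe", "pub", "restaurant"}),
}


def route(tags, mapping=MAPPING):
    """Return the names of the thematic tables an object would be written to."""
    tables = []
    for table, (key, values) in mapping.items():
        value = tags.get(key)
        if value is not None and (values is None or value in values):
            tables.append(table)
    return tables


print(route({"building": "yes", "amenity": "cafe"}))  # ['buildings', 'pois']
print(route({"natural": "tree"}))                      # []
```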

nominatim

Nominatim is a forward and reverse geocoder. The database is produced by a special back-end of Osm2pgsql. It is a special-purpose database, and may not be suitable for other problem domains such as rendering. The Nominatim homepage provides links to the detailed technical documentation, change logs, etc.

ogr2ogr

See also: OGR#ogr2ogr_schema

The OGR library can read OSM data (XML and PBF) and can write into various other formats, including PostgreSQL/PostGIS, SQLite/Spatialite, and MS SQL databases (though I've tried only PostGIS). The ogr2ogr utility can do the conversion without any programming necessary with a schema configuration that's reminiscent of osm2pgsql. One interesting feature is that it resolves relations into geometries: OSM multipolygons and boundaries become OGC MultiPolygon, OSM multilinestrings and routes become OGC MultiLineString, and other OSM relations become OGC GeometryCollection.
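The relation handling described above amounts to a simple lookup on the relation's type tag. The Python sketch below merely restates those rules; it is not OGR's implementation, which also has to assemble and validate the member geometries, and the function name is hypothetical.

```python
def ogc_type_for_relation(tags):
    """Return the OGC geometry type an OSM relation is resolved into."""
    rel_type = tags.get("type")
    if rel_type in ("multipolygon", "boundary"):
        return "MultiPolygon"
    if rel_type in ("multilinestring", "route"):
        return "MultiLineString"
    return "GeometryCollection"  # any other relation type


print(ogc_type_for_relation({"type": "route", "route": "bus"}))  # MultiLineString
```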

It is listed as lossy because membership info, such as nodes in ways and relation members, is not preserved. Metadata is optional. Untagged/unused nodes and ways are optional.

overpass

The Overpass API is a query language built on top of a custom back-end database with software called OSM3S (see OSM3S/install for install and setup instructions). This is a custom database engine, so it is hard to compare it with other database schemas. You could recreate the complete planet file from the database. It is geared towards good performance on locally concentrated datasets.

Overpass does not store pre-built geometries for OSM elements but can dynamically construct geometries from its database.

Although Overpass appears lossless (and was previously listed here as lossless) Overpass does not store the complete version history of OSM elements. During database updates, Overpass drops element versions whose timestamps are inconsistent with their version numbers. OSM elements have a monotonically increasing version number and a timestamp for the version, and normally each version's timestamp is later than the previous one. Occasionally this ordering is violated. For example, version 3 can have a timestamp that is equal to or greater than version 4's timestamp.

When Overpass encounters such a pair while building its historical ("attic") store, it discards the lower-numbered version with the anomalous timestamp. The dropped version cannot be retrieved by an attic query at any timestamp. And historical data in Overpass cannot be referenced by version number. This means Overpass attic output can disagree with other OSM data views — the main OSM API, full-history planet dumps, and tools such as OSMcha — all of which treat version numbers as authoritative and present every version regardless of timestamp ordering.

This dropping behavior applies only within a single database import or update batch. If inconsistent versions arrive through separate replication steps, both versions are stored, and historical query results for that element and time range are not guaranteed to be correct.
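To check a full-history extract for elements affected by this, the rule described above can be sketched in Python: sort an element's versions by version number and flag every version whose timestamp is equal to or later than the next version's timestamp. This is a simplified reimplementation for consecutive version pairs within one batch, not Overpass's own code, and the function name anomalous_versions is hypothetical.

```python
from datetime import datetime


def _parse(ts):
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def anomalous_versions(history):
    """Return versions whose timestamp is not earlier than the next version's.

    history: iterable of (version, ISO 8601 timestamp) pairs for one element.
    """
    ordered = sorted(history)  # sort by version number
    flagged = []
    for (version, ts), (_, next_ts) in zip(ordered, ordered[1:]):
        if _parse(ts) >= _parse(next_ts):
            flagged.append(version)
    return flagged


history = [
    (1, "2020-01-01T10:00:00Z"),
    (2, "2020-01-02T10:00:00Z"),
    (3, "2020-01-03T12:00:00Z"),  # later than version 4: anomalous
    (4, "2020-01-03T11:00:00Z"),
]
print(anomalous_versions(history))  # [3]
```

Versions flagged this way are the ones an attic query would not return, so results for them should be taken from the main API or a full-history planet dump instead.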

osmsharp

OsmSharp is a toolbox of OSM-related routines, including some to import OSM data into Oracle databases.

osmium

The Osmium toolset can read OSM data, and its osmium export command can write it out in a form that can be loaded into PostgreSQL/PostGIS.

Objects are loaded into a single osmdata table with geom and tags columns.

OSHDB

The OSHDB is a high-performance data analysis framework for analysing OSM's full-history data. Data can be stored in a relational (JDBC) or distributed database (Apache Ignite).

openstreetmap_h3

openstreetmap_h3 is a high-performance tool for importing OSM PBF files into PostGIS databases or into the Big Data ecosystem via the Apache Arrow data format. The project splits planet dump geodata by H3 index into many partitions to simplify worldwide geo-analysis, aggregation and routing tasks.

Choice of DBMS

While OSM.org mainly uses PostgreSQL, OSM users work with several different database systems:

Database	Benefits	Drawbacks	Used By
PostgreSQL	Can handle large datasets. The PostGIS extension adds geographic types and functions.	Requires a database server to be installed, with the associated administrative overhead.	Main OSM API, Osm2pgsql (loading data and differential updates), Mapnik renderer, Nominatim geocoder, Postpass query engine
MySQL	Can handle large datasets.	There is no program to import OSM data.	The main API database used MySQL until API version 0.6, when it was changed to PostgreSQL.
GeoDesk	Compact storage format (<100 GB for entire OSM dataset), fast queries and imports.	Does not store metadata. Current version does not support updates.	GeoDesk OSM Toolkits, GOL Tool
