Databases and data access APIs
This page provides an overview of the databases that could be used to store and manipulate OSM data, how to obtain data to populate the databases, and how to query them to find something useful.
It is intended as an overview for new developers who wish to write software to use OSM data, and not for end users of the information.
Sources of OSM Data
See also Downloading data for a run down of the basic options
The various sources of OSM data (either the whole world, or a small part of it) are identified below with links to other Wiki pages which provide more detail.
The most of the following methods of obtaining data return the data in the OSM XML format that can be used by other tools to populate the database. The format of the data is described in Data Primitives.
Every week a dump of the entire current OSM dataset is saved in different formats and made available as Planet.osm. Quite a few people break this file down into smaller files for different regions and make extracts available separately on mirror servers. Various tools are available to cut the Planet file up into smaller areas if required.
Differences between the live OSM data and the planet dump are also published each minute as changeset, so it is possible to maintain an up-to date copy of the OSM dataset.
Due to the massive growth of client applications the following APIs can be unavailable.|
Please check Platform Status.
The Xapi servers allow OSM data to be downloaded in XML format for a given region of the globe, filtered by tag. Xapi will return quite larger areas (city level) of the globe if requested, which makes it different to the standard OSM API described below.
The main API is the method of obtaining OSM data used by editors, as this is the only method of changing the OSM data in the live database. The API page provides a link to the specification of the protocol to be used to obtain data.
Its limitations are:
- it will only return very small areas < 0.25deg square.
- This method of obtaining data should therefore be reserved for editing applications. Use other methods for rendering, routing or other purposes.
Allows quite complex queries on larger areas.
Choice of DBMS
There are several different databases systems used by OSM users:
|PostgreSQL||Can handle large datasets. The PostGIS extension allows the use geographic extensions||Requires database server to be installed, with associated administrative overhead||Main OSM API, Mapnik renderer|
|MySQL||Can handle large datasets||Does not have geographic extensions. Requires database server to be installed, with associated administrative overhead||The main database API used MySQL until version 0.6, when it was changed to Postgresql|
|SQLite||Small, does not require a database server||May struggle with large datasets - See Mail Archive (from 2008, may not be current)||Microcosm|
|MongoDB||Native Geospatial Indexes and Queries||MongOSM, Node-Mongosm|
|Hadoop / Hive||Can handle very large datasets (known as big data). Extensions available for geospatial queries (for example ESRI GIS for Hadoop)||Requires Hadoop cluster to be installed, with associated administrative overhead||OSM2Hive|
The database schema for the main database (openstreetmap.org) can be found here: Rails port/Database schema.
OSM uses different database schemas for different applications.
- Whether the schema supports updating with OsmChange format "diffs".
- This can be extremely important for keeping world-wide databases up-to-date, as it allows the database to be kept up-to-date without requiring a complete (and space- and time-consuming) full, worldwide re-import. However, if you only need a small extract, then re-importing that extract may be a quicker and easier method to keep up-to-date than using the OsmChange diffs.
- Whether the schema has pre-built geometries.
- Some database schemas provide native (e.g: PostGIS) geometries, which allows their use in other pieces of software which can read those geometry formats. Other database schemas may provide enough data to produce the geometries (e.g: nodes, ways, relations and their linkage) but not in a native format. Some can provide both. If you want to use the database with other bits of software such as a GIS editor then you probably want a schema with these geometries pre-built. However, if you are doing your own analysis, or are using software which is written to use OSM node/way/relations then you may not need the geometries.
- Whether the full set of OSM data is kept.
- Some schemas will retain the full set of OSM data, including versioning, user IDs, changeset information and all tags. This information is important for editors, and may be of importance to someone doing analysis. However, if it is not important then it may be better to choose a "lossy" schema, as it is likely to take up less disk space and may be quicker to import.
- hstore columns
- Whether the schema uses a key-value pair datatype for tags. (This datatype is called hstore in PostgreSQL.)
- hstore is perhaps the most straightforward approach to represent OSM's freeform tagging in PostgreSQL. However, not all tools use it and other databases might not have (or need) an equivalent.
|Schema name||Created with||Used by||Primary use case||Updatable||Geometries (PostGIS)||Lossless||hstore columns||Database|
|osm2pgsql||osm2pgsql||Mapnik, Kothic JS||Rendering||yes||yes||no||optional||PostgreSQL|
|imposm||Imposm||Rendering||no||yes||no||Imposm2: no, Imposm3: yes||PostgreSQL|
|waychange||SQL||streetkeysmv||Data cache and Analysis||only a schema||optional||?||no||PostgreSQL|
Osm2pgsql schema has historically been the standard way to import OSM data for use in rendering software such as Mapnik. It also has uses in analysis, although the schema does not support versioning or history directly. The import is handled by the Osm2pgsql software, which has two modes of operation, slim and non-slim, which control the amount of memory used by the software during import and whether it can be updated. Slim mode supports updates, but time taken to import is highly dependent on disk speed and may take several days for the full planet, even on a fast machine. Non-slim mode is faster, but does not support updates and requires a vast amount of memory.
The import process is lossy, and controlled by a configuration file in which the keys of elements of interest are listed. The values of these "interesting" elements are imported as columns in the points, lines and polygons tables. (Alternatively, values of all tags can be imported into a "hstore" type column.) These tables can be very large, and care must be paid to get good indexed performance. If the set of "interesting" keys changes after the import and no hstore column has been used, then the import must be re-run.
For more information, please see the Osm2pgsql page.
ApiDB is a schema designed to replicate the storage of OSM data in the same manner as the main API schema and can be produced using the Osmosis commands for writing ApiDBs or updating ApiDBs with changes. This schema does not have any native geometry, although in the nodes, ways and relations tables there is enough data to reconstruct the geometries.
This schema does support history, although the import process does not, so it can be used for mirroring of the main OSM DB. A history will be generated as replication diffs are applied.
The import process, even on good hardware, can take several weeks for the full planet. The database will take approximately 1 TB as of April 2012.
The pgsnapshot schema is a modified and simplified version of the main OSM DB schema which provides a number of useful features, including generating geometries and storing tags in a single hstore column for easier use and indexing. JXAPI's schema is built on pgsnapshot.
Imposm is an import tool, and is able to generate schemas using a mapping which is fully configurable (there is also a good default for most use-cases). As such it really shouldn't count as its own schema, but it needed fitting in somehow. The ability to break data out thematically into different tables greatly simplifies the problem of indexing performance, and may result in smaller table and index sizes on-disk.
Nominatim is a geocoder where the database is produced by a special back-end of Osm2pgsql. It is a special-purpose database, and may not be suitable for other problem domains such as rendering or routing. The development overview gives information on some of the innards.
Nominatim's database is notoriously hard to set up, so you may want to try one of the pre-indexed data releases first. Files no longer available.
The OGR library can read OSM data (XML and PBF) and can write into various other formats, including PostgreSQL/PostGIS, SQLite/Spatialite, and MS SQL databases (though I've tried only PostGIS). The ogr2ogr utility can do the conversion without any programming necessary with a schema configuration that's reminiscent of osm2pgsql. One interesting feature is that it resolves relations into geometries: OSM multipolygons and boundaries become OGC MultiPolygon, OSM multilinestrings and routes become OGC MultiLineString, and other OSM relations become OGC GeometryCollection.
It is listed as lossy because membership info, such as nodes in ways and relation members, is not preserved. Metadata is optional. Untagged/unused nodes and ways are optional.
The Overpass_API is a query language built on top of a custom back-end database with software called OSM3S (see OSM3S/install for install and setup instructions). This is a custom database and it is therefore hard to compare it with other database schemas. You could recreate the complete planet file from the database. It is geared to have good performance on locally concentrated datasets.
OsmSharp is a toolbox of OSM-related routines, including some to import OSM data into Oracle databases.
MongOSM is a set of Python scripts for importing, querying and (maybe) keeping up-to-date OSM data in a MongoDB database.
Inspired by mongOSM, Node-MongOSM uses Mongoose to provide schemas and insert vs upsert options via a command line interface.
Goosm is an application written in go for importing OSM XML files into MongoDB. It imports in two passes, first for the nodes, and second the ways. Nodes are imported as Geo-JSON points, and ways are imported as Geo-JSON lines. Both of the collections created are indexed with geo-spatial indicies enabling fast searching.
This is an application based on goosm written in go for importing OSM PBF files into MongoDB. It is under development and to handle with care but worth a try.
This simple database schema has been created to store temporaly nodes, ways, tags, relations and the relations between nodes and ways as well as between relations and its members in a PostgreSQL database. It will be used with a PHP script to split and update ways with the official street keys in the north eastern part of Germany. The script load osm data via API into the database and create changeset files from that data together with manually created nodes that splitting of ways of into parts. Those parts matches with the official streets and its keys, categories and nummbers. The ways have no changes in geographic position, therefore no PostGIS columns are necessary. The update add tags to ways and if ways has been splitted off it change the node lists, add new nodes and ways and changes the list of members of relations. This database schema stores only the parts that we need to create the changeset. The spatial relationship between official streets and the osm ways has been calculated in a separate custum database schema with osm ways and its geometry in PostGIS columns. However its also possible to add geometry colums to this waychange schema. Feel free to uses this schema and to contact the OSM Community in Mecklenburg-Western Pomerania and streetkeysmv user
- Example of schema for processing OSM data
CREATE SCHEMA osm; -- metadata (optional) CREATE TABLE osm.users ( id integer NOT NULL, -- 32-bit minimum recommanded if you need this field name character varying NOT NULL, -- user names shold be short, may be empty for users with deleted accounts -- other public or private user information may be added here CONSTRAINT users_pkey PRIMARY KEY (id) ); CREATE TABLE osm.changesets ( id integer NOT NULL, -- 32-bit minimum but 64-bit strongly recommanded visible boolean NOT NULL, -- only needed if we can request redacted objects closed boolean NOT NULL, uid integer NOT NULL, -- 32-bit minimum recommanded if you need this field last_change timestamp NOT NULL, edit_comment character varying, -- may also be stored as a changeset's tag CONSTRAINT changesets_pkey PRIMARY KEY (id) ); -- core elements CREATE TABLE osm.nodes ( id integer NOT NULL, -- 64-bit now required for nodes version integer NOT NULL, -- 16-bit minimum recommanded if you need this field visible boolean NOT NULL, -- only needed if we can request redacted objects changeset_id integer, -- NOT NULL if id>0 longitude number NOT NULL, -- only needed if you want to compute geometries latitude number NOT NULL, -- only needed if you want to compute geometries CONSTRAINT nodes_pkey PRIMARY KEY (id, version) ); CREATE TABLE osm.ways ( id integer NOT NULL, -- 32-bit minimum but 64-bit strongly recommanded version integer NOT NULL, -- 16-bit minimum recommanded if you need this field visible boolean NOT NULL, -- only needed if we can request redacted objects changeset_id integer, -- informative metadata, NOT NULL if id>0 CONSTRAINT ways_pkey PRIMARY KEY (id, version) ); CREATE TABLE osm.relations ( id integer NOT NULL, -- 32-bit minimum but 64-bit strongly recommanded version integer NOT NULL, -- 16-bit minimum recommanded if you need this field visible boolean NOT NULL, -- only needed if we can request redacted objects changeset_id integer, -- informative metadata, NOT NULL if id>0 CONSTRAINT relations_pkey PRIMARY KEY (id, version) ); CREATE TYPE osm.entry_type AS ENUM ('node', 'way', 'relation', 'changeset'); -- children elements CREATE TABLE osm.way_nodes ( way_id integer NOT NULL, -- 32-bit minimum but 64-bit strongly recommanded way_version integer NOT NULL, -- 16-bit minimum recommanded if you need this field nodes_order integer NOT NULL, -- 16-bit at least (for up to ~2000 nodes per way) node_id integer NOT NULL, -- 64-bit now required for nodes node_version integer NOT NULL, -- 16-bit minimum recommanded if you need this field CONSTRAINT way_nodes_pkey PRIMARY KEY (way_id, way_version, nodes_order) ); CREATE TABLE osm.relation_members ( relation_id integer NOT NULL, -- 32-bit minimum but 64-bit strongly recommanded relation_version integer NOT NULL, -- 16-bit minimum recommanded if you need this field members_order integer NOT NULL, -- 16-bit minimum, but optional (order is not significant in most relations) member_type osm.entry_type NOT NULL, member_id integer NOT NULL, -- 64-bit now required for node members member_version integer NOT NULL, -- 16-bit minimum recommanded if you need this field role character varying NOT NULL, -- may be an empty string CONSTRAINT relation_members_pkey PRIMARY KEY (relation_id, relation_version, members_order) ); -- for all core elements and changesets CREATE TABLE osm.tags ( entry_type osm.entry_type NOT NULL, entry_id integer NOT NULL, -- 64-bit now required for tagging nodes entry_version integer NOT NULL, -- 16-bit minimum recommanded if you need this field k character varying NOT NULL, -- maximum key length should be lower than 256 characters v character varying NOT NULL, -- maximum value length should be lower than 256 characters CONSTRAINT tags_pkey PRIMARY KEY (entry_type, entry_id, entry_version, k) );
- this schema does not describes any foreign key contraints between tables, only the primary keys with unique constraints. As well it does not contain useful indexes you may need for performance in your derived database.
- the entry_type field in the last osm.tags table is not required if this table is split in separate tables for node_tags, way_tags, relation_tags (and changeset_tags added in OSM API v0.6).
- the date of last change and changeset fields the changesets and users tables are informational only, you don't need them to process data and eventually submit changes to the OSM database.
- the visible fields are informative, most users can't access it or will just detect objects that *are* visible. For normal processing or even for updating objects, you don't need these fields that will be implicitly true.
- the version fields are not needed for read-only accesses, they are useful only to detect and process changes, but they are required for submitting your own changes to the OSM database.
- See also why 64-bit identifiers are now required for nodes in the OSM database.
- All text fields should be UTF-8 encoded in the OSM database. Please don't submit any other encoding (some legacy data still remain that use invalid encodings, most often in comment, note, source or fixme tags created with old editors). Don't store them using named or decimal character entities (HTML, XML and JSON will automatically and transparently use escaping sequences, only where needed by their syntax). Please don't use any ASCII controls, private-use Unicode characters, or unassigned Unicode characters. Please trim leading and trailing whitespaces. Standard tag keys should also avoid using any whitespace, symbol or punctuation (except the underscore and colon), in order to make them compatible with HTML, XML, CSS, or DOM identifiers.