OSMbin (file format)

From OpenStreetMap Wiki
This page describes a historic artifact in the history of OpenStreetMap. It does not reflect the current situation, but instead documents the historical concepts, issues, or ideas.
About
OSMbin is a binary file format for OSM data that offers a spatial index for fast random access. It was intended for map rendering and routing applications (not for raw OSM data or editing).
Reason for being historic
Apart from Traveling Salesman (development ceased in 2011) and Osmosis (read and write support via a plugin, unmaintained since 2018), no other software is known to implement it.
Captured time
2010


Draft for an OSM binary format.

About

Features offered

  • fast, indexed access via object ID or geographic location without loading or decompressing more than the requested object
  • fast, incremental updates that touch only the updated objects (e.g. applying hourly diffs to a binary planet file)
  • fast, indexed access to "ways of a node", "relations of a way" and "relations of a node"
  • can store all information the OSM XML format can, except username and user ID (these are usually not required for anything)
  • can be used as a native format for:
    • navigation software
    • routing software
    • moving vector-maps
    • editors (not recommended)

Usage and intended use

The OSMbin file-format is intended for the following types of clients:

  • navigators/routers
  • address finders
  • real-time rendering of graphical maps

It is not intended for:

  • devices with very limited storage capacity

It is optimized for:

  • fast, indexed data-access
  • incremental updates
  • general usage

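This format is supported by the following clients:

  • Traveling Salesman (reference implementation in libOsm)
  • Osmosis (read and write support via a plugin)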

Status

  • DONE: The on-disk format of version 1.0 is completely specified. It is simple enough to be understood by developers without a geodata background.
  • DONE: A reference implementation of version 1.0 is provided in libOsm (part of Traveling Salesman).
  • DONE: find the optimal number of tag and wayRef/nodeRef slots per record, using a spreadsheet of statistics for Hamburg.
  • DONE: Osmosis tasks for reading, writing and reindexing OSMbin v1.0
  • DONE: implement an fsck program that scans and repairs broken files/indexes.
  • DONE: add version information to nodes, ways and relations
  • DONE: back-references between nodes, ways, relations and the relations that reference them
  • DONE: optimized storage of long attribute values
  • DONE: shorter storage of the element types of relations

Status: OSMbin Version 1.0 is fully specified and a reference-implementation is fully working.

Requirements

OSMbin is an on-disk-format that supports:

  • getWaybyID(), getNodebyID(), getRelationbyID()
  • getWaysForNodeID()
  • getAttributeofNodeID(AttribName)
  • getAttributeofWayID(AttribName)
  • getRelationsofWayID()
  • getRelationsofNodeID(), and most importantly:
  • getNodesbyBoundingBox(north,south,east,west)
  • It is uncompressed, so it can be memory-mapped (mmap()).
  • It is a mutable format that supports updating parts of the map without having to regenerate the complete map file.
  • We keep way IDs and node IDs, as well as all nodes that originally belonged to a way in OSM, so OSM XML diff files can be applied to update the map.
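The page does not say how getNodesbyBoundingBox() is indexed on disk. Purely as an illustration, the sketch below swaps in a simple uniform grid (a technique chosen here, not taken from the OSMbin specification): node IDs are bucketed by grid cell, and a query visits only the cells overlapping the box before filtering exact coordinates. The class `GridNodeIndex`, the cell size and the in-memory maps are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory grid index illustrating getNodesbyBoundingBox(). */
final class GridNodeIndex {
    private final double cellDeg;                                   // cell edge in degrees (assumption)
    private final Map<Long, List<Integer>> cells = new HashMap<>(); // cell key -> node IDs
    private final Map<Integer, double[]> coords = new HashMap<>();  // node ID -> {lat, lon}

    GridNodeIndex(double cellDeg) { this.cellDeg = cellDeg; }

    private int row(double lat) { return (int) Math.floor(lat / cellDeg); }
    private int col(double lon) { return (int) Math.floor(lon / cellDeg); }

    /** Packs row into the high 32 bits and column into the low 32 bits. */
    private static long cellKey(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    void addNode(int id, double lat, double lon) {
        coords.put(id, new double[] {lat, lon});
        cells.computeIfAbsent(cellKey(row(lat), col(lon)), k -> new ArrayList<>()).add(id);
    }

    /** Returns IDs of nodes inside the box; parameter order follows the page. */
    List<Integer> getNodesbyBoundingBox(double north, double south, double east, double west) {
        List<Integer> result = new ArrayList<>();
        // Visit only cells overlapping the box (no antimeridian handling in this sketch).
        for (int r = row(south); r <= row(north); r++) {
            for (int c = col(west); c <= col(east); c++) {
                List<Integer> ids = cells.get(cellKey(r, c));
                if (ids == null) continue;
                for (int id : ids) {
                    double[] p = coords.get(id);
                    // Exact filter: a cell can extend beyond the box edges.
                    if (p[0] >= south && p[0] <= north && p[1] >= west && p[1] <= east) {
                        result.add(id);
                    }
                }
            }
        }
        return result;
    }
}
```

The cell size trades index size against the number of candidates the exact filter has to reject; an on-disk variant would store each cell's ID list in a file instead of a `HashMap`.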

Version 1.0

Version 1.0 requires API v0.6.

It is the default file format of Traveling Salesman Release 1.0.

Version 0.9

Version 0.9 of this format is the default format of Traveling Salesman Release 0.8.

Notes

The format need not consist of a single file. For example, indexes can be kept in separate files, and ways, nodes, relations and attributes can each have their own file. This makes it easier to grow an index and lets the way, node and relation files contain only records of a fixed size. You may also separate the (possibly normalized) data required for routing from the larger data set required for real-time map rendering, with or without duplicating information between the two.
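With fixed-size records and dense IDs, finding a record is a multiplication: record n starts at byte offset n × RECORD_SIZE, so one record can be read or overwritten in place without touching its neighbours. The sketch below assumes a hypothetical 12-byte node layout (ID, latitude and longitude as 32-bit integers in units of 10^-7 degrees); it is not the actual OSMbin 1.0 record layout, which also carries tags, version and back-references.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-size node records: record id lives at offset id * RECORD_SIZE. */
final class FixedNodeRecords {
    static final int RECORD_SIZE = 12; // int id + int lat + int lon (assumed layout)
    static final double SCALE = 1e7;   // fixed point: 1e-7 degree per unit (assumption)
    private final ByteBuffer buf;      // in-memory demo; planet-sized files need long offsets

    FixedNodeRecords(int maxId) {
        buf = ByteBuffer.allocate((maxId + 1) * RECORD_SIZE);
    }

    /** Writes, or overwrites in place, the record of node id (IDs start at 1). */
    void put(int id, double lat, double lon) {
        int off = id * RECORD_SIZE;
        buf.putInt(off, id);
        buf.putInt(off + 4, (int) Math.round(lat * SCALE));
        buf.putInt(off + 8, (int) Math.round(lon * SCALE));
    }

    /** Returns {lat, lon}, or null if the slot is out of range or was never written. */
    double[] get(int id) {
        int off = id * RECORD_SIZE;
        if (id <= 0 || off + RECORD_SIZE > buf.capacity() || buf.getInt(off) != id) {
            return null;
        }
        return new double[] {buf.getInt(off + 4) / SCALE, buf.getInt(off + 8) / SCALE};
    }
}
```

An incremental update (e.g. a moved node from an hourly diff) is then a single `put` at the same offset.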

  • The file format contains redundant information, but also the rules required to repair a broken file in a defined manner.
  • IDs are stored as 32-bit integers and are assumed to be dense in the planet file. The distribution at the time of writing was as follows:
    • Nodes: number of used IDs = 278150661, max(ID) = 311426557, i.e. 89% of the IDs between 0 and 311426557 are in use by objects that have not been deleted
    • Ways: number of used IDs = 22702734, max(ID) = 28356734
    • Relations: number of used IDs = 41545, max(ID) = 50910
  • Whitespace at the end or start of tag-values may be lost.
  • The empty key and the empty value MAY be supported.
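The density assumption behind 32-bit, directly indexed IDs can be checked from the figures listed in these notes: density is the number of used IDs divided by the largest ID, and every largest ID must still fit in a signed 32-bit `int`. The class name `IdDensity` is illustrative only.

```java
/** Checks the ID-density figures quoted in the notes (data as listed on this page). */
final class IdDensity {
    /** Fraction of IDs up to maxId that are in use. */
    static double density(long usedIds, long maxId) {
        return (double) usedIds / maxId;
    }

    /** True if the largest ID fits in a signed 32-bit integer. */
    static boolean fitsInt32(long maxId) {
        return maxId <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        String[] names = {"nodes", "ways", "relations"};
        long[][] figures = {
            {278_150_661L, 311_426_557L},
            {22_702_734L, 28_356_734L},
            {41_545L, 50_910L},
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-9s density %.0f%%, fits int32: %b%n", names[i],
                100 * density(figures[i][0], figures[i][1]), fitsInt32(figures[i][1]));
        }
    }
}
```

Running `main` reports densities of about 89% for nodes, 80% for ways and 82% for relations, all within the 32-bit range, which is why a directly indexed record file wastes only a modest fraction of its slots.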

File sizes (Hamburg):

  • hamburg.osm.bz2 = 4MB
  • hamburg.osm = 42MB
  • indexed street names in HSQLDB = 21KB
  • nodex.idx = 160MB (tree of order 8 with no balancing and a fixed, implicit depth of 16+1; each level encodes the next 4 bits of the ID)
  • nodex.obm = 63.5MB (32 chars/tag value, 4 attributes/record, 4 wayRefs/record)
  • ways.idx = 33.7MB
  • ways.obm = 26.6MB (32 chars/tag value, 6 attributes/record, 8 wayRefs/record)
  • attrnames.txt = 3KB (253 tag-names, longest name has 42 characters)
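The nodex.idx entry describes an unbalanced tree of fixed, implicit depth in which each level consumes the next 4 bits of the ID, so the path to an ID is computed from its bits rather than found by comparisons. A minimal sketch, assuming (our reading, not stated on the page) 16 levels of 4 bits covering a 64-bit key, most significant nibble first, with 16 child slots per node and the record offset stored in the last level's slot; `NibbleTriePath` and its in-memory `Node` are hypothetical stand-ins for the on-disk structure.

```java
/** Hypothetical fixed-depth ID trie: each level is indexed by the next 4 bits of the ID. */
final class NibbleTriePath {
    static final int BITS_PER_LEVEL = 4; // "each level encodes the next 4 bits of the ID"
    static final int LEVELS = 16;        // assumed: 16 x 4 bits = 64-bit key

    /** In-memory stand-in for one index node with 16 child slots. */
    static final class Node {
        final Object[] child = new Object[1 << BITS_PER_LEVEL];
    }

    /** Child slot (0..15) to follow at each level, root first. */
    static int[] path(long id) {
        int[] slots = new int[LEVELS];
        for (int level = 0; level < LEVELS; level++) {
            int shift = (LEVELS - 1 - level) * BITS_PER_LEVEL;
            slots[level] = (int) ((id >>> shift) & 0xF);
        }
        return slots;
    }

    /** Stores a record offset for id, creating inner nodes on demand (no balancing). */
    static void put(Node root, long id, long recordOffset) {
        int[] slots = path(id);
        Node n = root;
        for (int level = 0; level < LEVELS - 1; level++) {
            if (n.child[slots[level]] == null) n.child[slots[level]] = new Node();
            n = (Node) n.child[slots[level]];
        }
        n.child[slots[LEVELS - 1]] = recordOffset; // last level holds the record pointer
    }

    /** Returns the stored record offset, or null if id is not indexed. */
    static Long get(Node root, long id) {
        int[] slots = path(id);
        Node n = root;
        for (int level = 0; level < LEVELS - 1; level++) {
            Object next = n.child[slots[level]];
            if (next == null) return null;
            n = (Node) next;
        }
        return (Long) n.child[slots[LEVELS - 1]];
    }
}
```

Because the depth is fixed, every lookup touches exactly one node per level, which keeps access time independent of how the IDs are distributed.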

File sizes (Baden-Württemberg):

  • baden-wuerttemberg.osm.bz2 = 44MB
  • baden-wuerttemberg.osm = 1.6GB
  • nodex.idx = 2.5GB
  • nodex.obm = 533MB
  • ways.idx = 391MB
  • ways.obm = 451MB
  • attrnames.txt = 22KB

Reference implementation:

  • Java limits the size of memory-mapped files to 64MB by default. Increase it with the JVM parameter "-XX:MaxDirectMemorySize=256M".
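A minimal sketch of memory-mapping a data file with the standard java.nio API (not necessarily how libOsm does it). A single `FileChannel.map` call cannot map more than `Integer.MAX_VALUE` bytes, so a file like the 2.5 GB nodex.idx listed above would have to be mapped in several windows. The class `MappedReader` and the file contents in the usage are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Maps a read-only window of a file; the file layout is hypothetical. */
final class MappedReader {
    /** Maps [offset, offset + length); length must not exceed Integer.MAX_VALUE. */
    static MappedByteBuffer mapWindow(Path file, long offset, long length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
    }
}
```

Reads then use absolute accessors such as `getInt(index)` on the returned buffer, which are big-endian unless the byte order is changed with `order(...)`.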