OSMbin (file format)
Draft for an osm-binary-format
- OsmBin is an on-disk data-format for applications working with OpenStreetMap-data
- It is primarily used by Traveling Salesman
- OsmBin has no relation to the OSM Mobile Binary Format or OSM Binary Format.
- fast, indexed access via object-id or geographic location without loading or uncompressing more than the object to be loaded
- fast, incremental updates without affecting more than the updated objects (e.g. apply hourly diffs to a binary planet-file)
- fast, indexed access of "ways of a node", "relations of a way" and "relations of a node"
- can store all information the OSM-xml-format can except username and userid (these are usually not required for anything).
- can be used as a native format for:
- navigation software
- routing software
- moving vector-maps
- editors (not recommended)
usage and intended use
The OSMbin file-format is intended for the following types of clients:
- realtime-rendering of graphical maps
it is not intended for
- devices with very limited storage-capacity
it is optimized for:
- fast, indexed data-access
- incremental updates
- general usage
This protocol is supported by the following clients:
- DONE: The on-disk-format of version 1.0 is completely specified. It is simple enough to be understood by developers without a geodata-background.
- DONE: A reference-implementation of version 1.0 is provided in libOsm (part of Traveling Salesman).
- DONE: finding the optimal number of tag and wayRef/nodeRef slots per record via a spreadsheet containing statistics of Hamburg.
- DONE: osmosis-tasks for reading, writing and reindexing osmbin-v1.0
- DONE: implement an fsck-program that scans and repairs broken files/indexes.
- DONE: add version-information to nodes, ways and relations
- DONE: back-references between nodes, ways, relations and the relations that reference them
- DONE: optimized storage of long attribute-values
- DONE: shorter storage of the element-types of relations
Status: OSMbin Version 1.0 is fully specified and a reference-implementation is fully working.
OSMbin is an on-disk-format that supports:
- getWaybyID(), getNodebyID(), getRelationbyID()
- getRelationsofNodeID() and most important:
- It is uncompressed, so it can be memory-mapped via mmap()
- It is a mutable format to support updating parts of the map without having to re-generate the complete map-file
- We keep wayIDs and nodeIDs, as well as all nodes that originally belonged to a way in OSM, so osm-xml-diff files can be applied to update the map.
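The two properties above (uncompressed and mutable) can be illustrated with a memory-mapped, in-place update. This is a minimal sketch and not the libOsm implementation: the class name, the 16-byte record size and the field offsets are assumptions made up for the example, and only standard `java.nio` calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapSketch {
    // Hypothetical record size, chosen for this illustration only.
    static final int RECORD_SIZE = 16;

    /** Maps the file read/write and overwrites one 4-byte field of one record in place. */
    static void updateField(Path file, int recordIndex, int fieldOffset, int value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            map.putInt(recordIndex * RECORD_SIZE + fieldOffset, value);
            map.force(); // ask the OS to write the changed page back to disk
        }
    }

    /** Maps the file read-only and reads one 4-byte field of one record. */
    static int readField(Path file, int recordIndex, int fieldOffset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return map.getInt(recordIndex * RECORD_SIZE + fieldOffset);
        }
    }

    /** Creates a scratch file of four empty records, updates one field and reads it back. */
    static int roundTrip(int recordIndex, int fieldOffset, int value) {
        try {
            Path file = Files.createTempFile("osmbin-sketch", ".obm");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[4 * RECORD_SIZE]);
            updateField(file, recordIndex, fieldOffset, value);
            return readField(file, recordIndex, fieldOffset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(2, 4, 12345)); // prints 12345
    }
}
```

Only the bytes of the touched record change; the rest of the file is neither read nor rewritten, which is the behaviour the list above asks for.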
Version 1.0 requires API v0.6.
It is the default file-format of Traveling Salesman Release 1.0.
Version 0.9 of this format is the default-format of Traveling Salesman Release 0.8.
- specification of version 0.9
- status: done
The format need not consist of a single file. For example, indexes can live in separate files, and ways, nodes, relations and attributes can each have their own file. This makes it easier to grow an index and lets the files for ways, nodes and relations contain only records of a fixed size. You may also separate the (possibly normalized) data required for routing from the larger data-set required for real-time map-rendering, with or without duplicating information between the two.
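The benefit of fixed-size records is that the position of a record follows directly from its record number, so no scanning is needed. The sketch below illustrates that idea on an in-memory buffer. The 12-byte layout (ID, latitude, longitude as 32-bit integers) and all names are hypothetical and do not reproduce the actual OSMbin record layout.

```java
import java.nio.ByteBuffer;

final class FixedRecordSketch {
    // Hypothetical layout: 4-byte ID, 4-byte latitude, 4-byte longitude (fixed-point integers).
    static final int RECORD_SIZE = 12;

    /** Byte offset of a record: a plain multiplication, because every record has the same size. */
    static long offsetOf(long recordNumber) {
        return recordNumber * RECORD_SIZE;
    }

    /** Writes one record at its computed position without touching any other record. */
    static void write(ByteBuffer data, int recordNumber, int id, int lat, int lon) {
        int pos = (int) offsetOf(recordNumber);
        data.putInt(pos, id).putInt(pos + 4, lat).putInt(pos + 8, lon);
    }

    /** Reads one record back as {id, lat, lon}. */
    static int[] read(ByteBuffer data, int recordNumber) {
        int pos = (int) offsetOf(recordNumber);
        return new int[] { data.getInt(pos), data.getInt(pos + 4), data.getInt(pos + 8) };
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(10 * RECORD_SIZE); // room for ten records
        write(data, 7, 42, 535500000, 99900000);
        int[] record = read(data, 7);
        System.out.println(offsetOf(7) + " " + record[0] + " " + record[1] + " " + record[2]);
    }
}
```

The same arithmetic works on a memory-mapped file, since a mapped buffer exposes the same absolute `getInt`/`putInt` calls.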
- the file-format contains redundant information but also the rules required to repair a broken file in a defined manner.
- IDs are stored as 32bit-integer and are assumed to be dense in the planet-file. The current distribution is as follows:
- Nodes: Number of used IDs=278150661, max(ID)=311426557, i.e. 89% of the IDs between 0 and 311426557 are in use by objects that have not been deleted
- Ways: Number of used IDs=22702734, max(ID)=28356734
- Relations: Number of used IDs=41545, max(ID)=50910
- Whitespace at the end or start of tag-values may be lost.
- The empty key and the empty value MAY be supported.
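The density figures quoted for the ID distribution are simply the number of used IDs divided by max(ID). The sketch below reproduces that arithmetic for the three object types using the numbers given above; the class and method names are made up for the example.

```java
import java.util.Locale;

final class IdDensitySketch {
    /** Fraction of the ID range 0..maxId that is occupied by existing objects. */
    static double density(long usedIds, long maxId) {
        return (double) usedIds / (double) maxId;
    }

    /** Number of IDs in the range that belong to no existing object. */
    static long unusedIds(long usedIds, long maxId) {
        return maxId - usedIds;
    }

    static String report(String type, long usedIds, long maxId) {
        return String.format(Locale.ROOT, "%s: %.1f%% used, %d unused",
                type, 100.0 * density(usedIds, maxId), unusedIds(usedIds, maxId));
    }

    public static void main(String[] args) {
        // Figures taken from the ID distribution listed above.
        System.out.println(report("Nodes", 278150661L, 311426557L));
        System.out.println(report("Ways", 22702734L, 28356734L));
        System.out.println(report("Relations", 41545L, 50910L));
    }
}
```

A high density is what makes it reasonable to address records by ID, because only a small share of the addressable slots would stay empty.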
- hamburg.osm.bz2 = 4MB
- hamburg.osm = 42MB
- indexed street-names in HSQLDB=21KB
- nodex.idx = 160MB (Tree of order 8 with no balancing and fixed, implicit depth of 16+1. Each level encodes the next 4 bits of the ID)
- nodex.obm = 63.5MB (32 chars/Tag-Value, 4 attributes/record, 4 wayRefs/record)
- ways.idx = 33.7MB
- ways.obm = 26.6MB (32 chars/Tag-Value, 6 attributes/record, 8 wayRefs/record)
- attrnames.txt = 3KB (253 tag-names, longest name has 42 characters)
- baden-wurttemberg.osm.bz2 = 44MB
- baden-wuerttemberg.osm = 1.6GB
- nodex.idx = 2.5GB
- nodex.obm = 533MB
- ways.idx = 391MB
- ways.obm = 451MB
- attrnames.txt = 22KB
- By default, Java limits the size of memory-mapped files to 64MB. Raise the limit by passing the "-XX:MaxDirectMemorySize=256M" parameter to the JVM.
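The node index in the size listing is described as an unbalanced tree in which each level encodes the next 4 bits of the ID. The sketch below shows only that slicing step: how a 32-bit ID can be cut into 4-bit child-slot numbers, one per level. It assumes the most significant bits come first and that a 32-bit ID therefore yields 8 slices. Both points are assumptions of this example; the on-disk node layout and the depth stated in the listing are not modelled.

```java
import java.util.Arrays;

final class NibbleIndexSketch {
    static final int BITS_PER_LEVEL = 4;
    // Assumption: a 32-bit ID is consumed completely, 4 bits at a time.
    static final int LEVELS = Integer.SIZE / BITS_PER_LEVEL;

    /** Child slot (0..15) to follow at the given level, taking the most significant bits first. */
    static int slotAt(int id, int level) {
        int shift = (LEVELS - 1 - level) * BITS_PER_LEVEL;
        return (id >>> shift) & 0xF;
    }

    /** Full path from the root to the leaf for one ID. */
    static int[] path(int id) {
        int[] slots = new int[LEVELS];
        for (int level = 0; level < LEVELS; level++) {
            slots[level] = slotAt(id, level);
        }
        return slots;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 4, 10, 11, 12, 13]
        System.out.println(Arrays.toString(path(0x1234ABCD)));
    }
}
```

Because the path depends only on the bits of the ID, no key comparisons and no rebalancing are needed, at the cost of index space for sparsely used ID ranges.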