OSMbin (file format)/version 1.0

From OpenStreetMap Wiki

This is version 1.0 of the OSMbin file format.

status

This specification is complete!

This file-format requires features introduced with API v0.6!

changes to version 0.9

The following has changed since version 0.9 of this file format:

  • introduction of a properties file containing the version of the format
  • introduction of version-numbers
  • storing relation-ids in nodes and ways
  • storing long attribute values inside the .obm files

The format consists of multiple files contained in a single directory:

osmbin.properties

This file contains CRLF-delimited name=value pairs.

Currently only the following names and values are defined:

  • "osmbin.version=v1.0\n" (or v0.9 for older versions)
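A minimal sketch of a version check, written in Java like the reference implementation. Because this section mentions both CRLF delimiters and a "\n" line ending, the sketch uses java.util.Properties, which accepts "\n", "\r" and "\r\n". The class and method names (OsmbinVersion, readVersion, isV10) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class OsmbinVersion {
    static final String VERSION_KEY = "osmbin.version";

    // Returns the value of osmbin.version (e.g. "v1.0" or "v0.9"), or null if absent.
    static String readVersion(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load accepts "\n", "\r" and "\r\n" as line terminators.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(VERSION_KEY);
    }

    // True if the directory declares the v1.0 layout.
    static boolean isV10(String propertiesText) {
        return "v1.0".equals(readVersion(propertiesText));
    }

    public static void main(String[] args) {
        System.out.println(readVersion("osmbin.version=v1.0\r\n")); // prints v1.0
    }
}
```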

attrnames.txt

This file consists of all attribute keys in UTF-8 encoding, delimited by "\n". The first key gets the ID Short.MIN_VALUE + 2 = -32766.

The offset of 2 is required because in the *.obm files the value

  • Short.MIN_VALUE denotes an unused attribute slot
  • Short.MIN_VALUE + 1 denotes the continuation of an attribute entry spanning multiple attribute slots
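The ID arithmetic can be sketched as follows. The class AttrNames and its methods are hypothetical, and the sketch assumes that the IDs continue in file order after the first key (the second key gets -32765, and so on), which this section implies but does not state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AttrNames {
    // Reserved attrID values in the *.obm files.
    static final short UNUSED_SLOT = Short.MIN_VALUE;                // -32768
    static final short CONTINUATION = (short) (Short.MIN_VALUE + 1); // -32767
    // ID of the first key in attrnames.txt.
    static final short FIRST_ID = (short) (Short.MIN_VALUE + 2);     // -32766

    private final List<String> keys = new ArrayList<>();
    private final Map<String, Short> ids = new HashMap<>();

    // Builds the table from the "\n"-delimited contents of attrnames.txt.
    // Assumption: the i-th key (0-based) gets FIRST_ID + i.
    AttrNames(String fileContents) {
        if (fileContents.isEmpty()) return;
        // split() drops trailing empty strings, so a final "\n" is harmless
        for (String key : fileContents.split("\n")) {
            ids.put(key, (short) (FIRST_ID + keys.size()));
            keys.add(key);
        }
    }

    short idOf(String key) {
        Short id = ids.get(key);
        if (id == null) throw new IllegalArgumentException("unknown key: " + key);
        return id;
    }

    String keyOf(short id) {
        if (id < FIRST_ID) throw new IllegalArgumentException("reserved ID: " + id);
        return keys.get(id - FIRST_ID);
    }
}
```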

Java-code of a reference-implementation

nodes.obm

This file stores fixed-size records of nodes.

Java-code of a reference-implementation

Layout of a record:

  • nodeID [4 byte signed integer]
  • nodeVersion [4 byte signed integer]
  • latitude [4 byte signed integer as used in OSM]
  • longitude [4 byte signed integer as used in OSM]
  • attrID1 [2 byte short integer]
  • attrValue1 [32 character string in 16-bit Unicode big-endian (64 bytes), padded to 32 characters with appended spaces ' ']
  • wayID1 [4 byte signed integer]
  • wayID2 [4 byte signed integer]
  • wayID3 [4 byte signed integer]
  • relationID1 [4 byte signed integer]
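A sketch of decoding one node record with java.nio.ByteBuffer. Summing the field sizes gives 4+4+4+4+2+64+3*4+4 = 98 bytes per record. The layout only states big-endian order for the string field; this sketch assumes the integers are big-endian too, and that coordinates use the usual OSM fixed-point scale of degrees * 10^7. NodeRecord and its members are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class NodeRecord {
    // 4 (nodeID) + 4 (version) + 4 (lat) + 4 (lon) + 2 (attrID1) + 64 (attrValue1)
    // + 3 * 4 (wayID1..3) + 4 (relationID1)
    static final int SIZE = 98;
    static final int UNUSED = Integer.MIN_VALUE;

    int nodeId, version, lat, lon;
    short attrId;
    String rawAttrValue;             // all 32 UTF-16 chars, padding still included
    final int[] wayIds = new int[3];
    int relationId;

    // Decodes record number recordNumber from the bytes of nodes.obm.
    static NodeRecord read(ByteBuffer file, int recordNumber) {
        // Work on a duplicate so the caller's position is untouched; big-endian is an assumption.
        ByteBuffer b = file.duplicate().order(ByteOrder.BIG_ENDIAN);
        b.position(recordNumber * SIZE);
        NodeRecord r = new NodeRecord();
        r.nodeId = b.getInt();
        r.version = b.getInt();
        r.lat = b.getInt();
        r.lon = b.getInt();
        r.attrId = b.getShort();
        byte[] chars = new byte[64];
        b.get(chars);
        r.rawAttrValue = new String(chars, StandardCharsets.UTF_16BE);
        for (int i = 0; i < r.wayIds.length; i++) r.wayIds[i] = b.getInt();
        r.relationId = b.getInt();
        return r;
    }

    boolean isUnused() { return nodeId == UNUSED; }

    // Assumption: coordinates are degrees * 10^7.
    double latDegrees() { return lat / 1e7; }
    double lonDegrees() { return lon / 1e7; }
}
```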

Semantics of nodeID:

  • the nodeID Integer.MIN_VALUE (0x80000000) is used to denote an unused record
  • this nodeID is authoritative. If it disagrees with the .idx or .id2 files, then the .idx or .id2 files are incorrect and need to be regenerated.

Semantics of attrID1:

  • if attrID1 is Short.MIN_VALUE then the record stores no attribute.
  • if attrID1 is Short.MIN_VALUE + 1, then the attribute value is to be appended to the value of the previous attribute
  • attrValue1: all characters with the value 0 (note that each char occupies 2 bytes here, since the encoding is UTF-16) are to be removed. They pad the value to the length of the attrValue1 field.
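The attribute rules can be sketched as a merge over attribute slots in storage order, using Short.MIN_VALUE for an empty slot and Short.MIN_VALUE + 1 for a continuation, as defined for attrnames.txt. The layout list describes attrValue1 as padded with spaces, while the semantics here describe padding with 0 characters; this sketch follows the latter and strips only U+0000. The class name AttrSlots is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AttrSlots {
    static final short UNUSED_SLOT = Short.MIN_VALUE;
    static final short CONTINUATION = (short) (Short.MIN_VALUE + 1);

    // Removes the U+0000 characters that pad a 32-char attrValue field.
    static String unpad(String raw) {
        return raw.replace("\0", "");
    }

    // Merges attribute slots, given in storage order, into attrID -> complete value.
    static Map<Short, String> merge(short[] attrIds, String[] rawValues) {
        Map<Short, String> result = new LinkedHashMap<>();
        Short previous = null; // attrID that a continuation slot extends
        for (int i = 0; i < attrIds.length; i++) {
            short id = attrIds[i];
            if (id == UNUSED_SLOT) continue; // slot holds no attribute
            String value = unpad(rawValues[i]);
            if (id == CONTINUATION) {
                if (previous == null) throw new IllegalStateException("continuation without an attribute");
                result.put(previous, result.get(previous) + value);
            } else {
                result.put(id, value);
                previous = id;
            }
        }
        return result;
    }
}
```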

Semantics of wayID:

  • the wayID Integer.MIN_VALUE marks an unused entry

Overall semantics:

  • the sort order of attributes and ways is application-dependent; no assumptions may be made about it. However, it is advisable to store important attributes such as 'highway' first.
  • if a record is not large enough for all attributes or ways, the next record is used as well (this is marked by the next record having the same nodeID).
  • if this situation arises while updating the node and the next record is not free, the node is moved to a new location in this file.
  • if the wayIDs here disagree with the nodeIDs in the ways file, the ways file is authoritative and this entry must be corrected (rule for repairing broken files).
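A sketch of the overflow rule: starting from a node's first record (for example one found through nodes.idx), keep reading while the nodeID stays the same and collect every used wayID slot. The Rec class is a hypothetical, already-decoded view of a nodes.obm record.

```java
import java.util.ArrayList;
import java.util.List;

class NodeOverflow {
    static final int UNUSED = Integer.MIN_VALUE;

    // Hypothetical decoded view of one nodes.obm record (only the fields used here).
    static final class Rec {
        final int nodeId;
        final int[] wayIds;
        Rec(int nodeId, int... wayIds) { this.nodeId = nodeId; this.wayIds = wayIds; }
    }

    // Collects the used wayID slots of the node whose first record is firstRecord,
    // following consecutive records that carry the same nodeID.
    static List<Integer> wayIdsOf(List<Rec> records, int firstRecord) {
        int nodeId = records.get(firstRecord).nodeId;
        if (nodeId == UNUSED) throw new IllegalArgumentException("record is unused");
        List<Integer> ways = new ArrayList<>();
        for (int i = firstRecord; i < records.size() && records.get(i).nodeId == nodeId; i++) {
            for (int wayId : records.get(i).wayIds) {
                if (wayId != UNUSED) ways.add(wayId);
            }
        }
        return ways;
    }
}
```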

nodes.idx

reference implementation

This file stores an index of node-ID to record-number in nodes.obm.

We use an unbalanced tree of order 16. The record format is as follows:

middle-node:

  • recordNumber[32bit] of (current value>>4)
  • recordNumber[32bit] of (current value>>4+1)

...

  • recordNumber[32bit] of (current value>>4+15)

leaf:

  • record-number of indexed OSM-Object 1 [32bit]
  • record-number of indexed OSM-Object 2 [32bit]

...

  • record-number of indexed OSM-Object 16 [32bit]

Notes:

  • for an empty recordNumber the value Integer.MIN_VALUE is used.
  • a record consisting only of Integer.MIN_VALUE entries marks an empty record.
  • A record is a leaf if and only if it is at depth 32/4+1.
  • The root-node has the recordNumber of 0 and is thus stored at the beginning of the file.
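A sketch of a lookup in nodes.idx, treating each record as 16 big-endian 32-bit entries (64 bytes). It assumes the node ID is consumed 4 bits at a time, starting with the most significant nibble, so a 32-bit ID passes through seven middle nodes and one leaf; how that lines up with the "32/4+1" depth rule depends on how the reference implementation counts depth, which is not spelled out here. NodeIndex and lookup are hypothetical names.

```java
import java.nio.ByteBuffer;

class NodeIndex {
    static final int EMPTY = Integer.MIN_VALUE;
    static final int RECORD_BYTES = 16 * 4; // 16 entries of 32 bits

    // Returns the nodes.obm record number for nodeId, or EMPTY if it is not indexed.
    // Assumption: most significant nibble first, seven middle nodes, then one leaf.
    static int lookup(ByteBuffer idx, int nodeId) {
        int record = 0; // the root is record 0
        for (int shift = 28; shift >= 0; shift -= 4) {
            if (record < 0 || (long) (record + 1) * RECORD_BYTES > idx.limit()) return EMPTY;
            int slot = (nodeId >>> shift) & 0xF; // next 4 bits of the ID
            // Absolute read in the buffer's byte order (big-endian unless changed).
            int entry = idx.getInt(record * RECORD_BYTES + slot * 4);
            if (entry == EMPTY) return EMPTY;
            if (shift == 0) return entry; // leaf: record number in nodes.obm
            record = entry;               // middle node: record number in nodes.idx
        }
        return EMPTY; // unreachable: the loop returns at shift == 0
    }
}
```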

nodes.id2

reference implementation

(uses less storage space than draft 1)

  • We use a KD-Tree as an AB-Tree here.
  • Each tree node stores one OSM node, as in a KD-tree.
  • Each tree node at an even depth (root = depth 0 = even) stores children with the same or lower latitude in its left subtree and children with a larger latitude in its right subtree.
  • Each tree node at an odd depth stores children with the same or lower longitude in its left subtree and children with a larger longitude in its right subtree.
  • If the tree is empty, it does not even have a root node.
  • There is no separation of inner nodes vs. leaf nodes.
  • Records have a fixed size.
  • Latitude and longitude of Integer.MIN_VALUE denote an empty record.
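The splitting rule can be illustrated with an in-memory sketch: insertion alternates between comparing latitude (even depth) and longitude (odd depth), sending equal values left, and a bounding-box query skips a subtree whenever the box lies entirely on the other side of the splitting value. The class and method names are hypothetical, and real nodes.id2 records would hold record numbers instead of object references.

```java
import java.util.ArrayList;
import java.util.List;

class LatLonKdTree {
    // Hypothetical in-memory tree node; nodes.id2 stores record numbers instead of references.
    static final class TreeNode {
        final int lat, lon;  // coordinates as stored in nodes.obm
        final int obmRecord; // record number of the node in nodes.obm
        TreeNode left, right;
        TreeNode(int lat, int lon, int obmRecord) {
            this.lat = lat; this.lon = lon; this.obmRecord = obmRecord;
        }
    }

    private TreeNode root; // null while the tree is empty (no root node at all)

    // Splitting key: latitude on even depths, longitude on odd depths.
    private static int key(TreeNode n, int depth) { return depth % 2 == 0 ? n.lat : n.lon; }

    void insert(int lat, int lon, int obmRecord) {
        TreeNode fresh = new TreeNode(lat, lon, obmRecord);
        if (root == null) { root = fresh; return; }
        TreeNode cur = root;
        for (int depth = 0; ; depth++) {
            // Same or lower value goes left, larger value goes right.
            if (key(fresh, depth) <= key(cur, depth)) {
                if (cur.left == null) { cur.left = fresh; return; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = fresh; return; }
                cur = cur.right;
            }
        }
    }

    // Returns the nodes.obm record numbers of all nodes inside the box (bounds inclusive).
    List<Integer> query(int minLat, int minLon, int maxLat, int maxLon) {
        List<Integer> out = new ArrayList<>();
        collect(root, 0, minLat, minLon, maxLat, maxLon, out);
        return out;
    }

    private static void collect(TreeNode n, int depth, int minLat, int minLon,
                                int maxLat, int maxLon, List<Integer> out) {
        if (n == null) return;
        if (n.lat >= minLat && n.lat <= maxLat && n.lon >= minLon && n.lon <= maxLon) {
            out.add(n.obmRecord);
        }
        int k = key(n, depth);
        int lo = depth % 2 == 0 ? minLat : minLon;
        int hi = depth % 2 == 0 ? maxLat : maxLon;
        // The left subtree only holds keys <= k, the right subtree only keys > k.
        if (lo <= k) collect(n.left, depth + 1, minLat, minLon, maxLat, maxLon, out);
        if (hi > k) collect(n.right, depth + 1, minLat, minLon, maxLat, maxLon, out);
    }
}
```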


Record-format:

  • 4 byte latitude of the center
  • 4 byte longitude of the center
  • 2 byte - recordNumber in nodes.obm stored in this tree node
  • 2 byte - recordNumber in nodes.id2 of the left child or Integer.MIN_VALUE
  • 2 byte - recordNumber in nodes.id2 of the right child or Integer.MIN_VALUE

ways.obm

This file stores fixed-size records of ways.

Layout of a record:

  • wayID [4 byte signed integer]
  • wayVersion [4 byte signed integer]
  • minLatitude [4 byte signed integer as used in OSM]
  • minLongitude [4 byte signed integer as used in OSM]
  • maxLatitude [4 byte signed integer as used in OSM]
  • maxLongitude [4 byte signed integer as used in OSM]
  • attrID1 [2 byte short integer]
  • attrValue1 [32 character string in 16-bit Unicode big-endian (64 bytes)]
  • attrID2 [2 byte short integer]
  • attrValue2 [32 character string in 16-bit Unicode big-endian (64 bytes)]
  • attrID3 [2 byte short integer]
  • attrValue3 [32 character string in 16-bit Unicode big-endian (64 bytes)]
  • attrID4 [2 byte short integer]
  • attrValue4 [32 character string in 16-bit Unicode big-endian (64 bytes)]
  • attrID5 [2 byte short integer]
  • attrValue5 [32 character string in 16-bit Unicode big-endian (64 bytes)]
  • attrID6 [2 byte short integer]
  • attrValue6 [32 character string in 16-bit Unicode big-endian (64 bytes)]
  • nodeID1 [4 byte signed integer]
  • nodeID2 [4 byte signed integer]
  • nodeID3 [4 byte signed integer]
  • nodeID4 [4 byte signed integer]
  • nodeID5 [4 byte signed integer]
  • nodeID6 [4 byte signed integer]
  • nodeID7 [4 byte signed integer]
  • nodeID8 [4 byte signed integer]
  • relationID1 [4 byte signed integer]

Semantics of wayID:

  • the wayID Integer.MIN_VALUE is used to denote an unused record
  • this wayID is authoritative. If it disagrees with the .idx or .id2 files, then the .idx or .id2 files are incorrect and need to be regenerated.

Semantics of attrID:

  • attrID and attrValue have the same semantics as for nodes.obm

Overall semantics:

  • the nodeID Integer.MIN_VALUE marks an unused entry
  • the sort order of attributes and ways is application-dependent; no assumptions may be made about it. However, it is advisable to store important attributes such as 'highway' first.
  • if a record is not large enough for all attributes or nodes, the next record is used as well (this is marked by the next record having the same wayID).
  • if this situation arises while updating the way and the next record is not free, the way is moved to a new location in this file.
  • if the nodeIDs here disagree with the wayIDs in the nodes file, this file is authoritative and the node's entry must be corrected.
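Applied to a whole file, the repair rule amounts to rebuilding, from the authoritative ways data, the list of wayIDs each node should carry. A minimal sketch, assuming a hypothetical already-decoded map in which all records of a way (including overflow records) have been combined. A closed way lists its first node again at the end, so a way is added to a node only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WayNodeRepair {
    static final int UNUSED = Integer.MIN_VALUE;

    // wayNodes: wayID -> node IDs of that way, with all of its ways.obm records combined.
    // Returns nodeID -> the wayIDs that the node's nodes.obm entry should list.
    static Map<Integer, List<Integer>> expectedWayIds(Map<Integer, int[]> wayNodes) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, int[]> way : wayNodes.entrySet()) {
            for (int nodeId : way.getValue()) {
                if (nodeId == UNUSED) continue; // empty nodeID slot
                List<Integer> ways = result.computeIfAbsent(nodeId, id -> new ArrayList<>());
                // A closed way repeats its first node, so avoid listing the way twice.
                if (!ways.contains(way.getKey())) ways.add(way.getKey());
            }
        }
        return result;
    }
}
```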

ways.idx

This file stores an index of way-ID to record-number in ways.obm.

The structure is analogous to nodes.idx.


relations.obm

This file stores fixed-size records of relations.

Layout of a record:

  • relationID [4 byte signed integer]
  • relationVersion [4 byte signed integer]
  • minLatitude [4 byte signed integer as used in OSM]
  • minLongitude [4 byte signed integer as used in OSM]
  • maxLatitude [4 byte signed integer as used in OSM]
  • maxLongitude [4 byte signed integer as used in OSM]
  • attrID1 [2 byte short integer]
  • attrValue1 [32 character string in 16-bit Unicode big-endian (64 bytes)]
  • elementID1 [4 byte signed integer]
  • elementType1 [4 byte signed integer] (ordinal of the v0.5 EntityType enum)
  • roleID1 [4 byte signed integer] (stored like an attribute name in attrnames.txt)
  • elementID2 [4 byte signed integer]
  • elementType2 [4 byte signed integer]
  • roleID2 [4 byte signed integer] (stored like an attribute name in attrnames.txt)
  • elementID3 [4 byte signed integer]
  • elementType3 [4 byte signed integer]
  • roleID3 [4 byte signed integer] (stored like an attribute name in attrnames.txt)
  • elementID4 [4 byte signed integer]
  • elementType4 [4 byte signed integer]
  • roleID4 [4 byte signed integer] (stored like an attribute name in attrnames.txt)
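A sketch of reading the member slots of a relation record. Summing the field sizes gives 6*4 + 2 + 64 = 90 bytes before the first member and 4 * 12 = 48 bytes of member slots, 138 bytes in total. Because the elementID field is only 4 bytes wide, the sketch treats Integer.MIN_VALUE as the empty-slot marker. The EntityType constant order is a hypothetical stand-in for the v0.5 enum, big-endian integers are assumed, and RelationMembers is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class RelationMembers {
    // Hypothetical constant order; the actual v0.5 EntityType enum may be ordered differently.
    enum EntityType { NODE, WAY, RELATION }

    static final int UNUSED = Integer.MIN_VALUE;
    // relationID, relationVersion and the four bbox values (6 * 4), attrID1 (2), attrValue1 (64)
    static final int MEMBERS_OFFSET = 6 * 4 + 2 + 64;                     // 90
    static final int MEMBER_SLOTS = 4;
    static final int MEMBER_BYTES = 3 * 4;                                // elementID, elementType, roleID
    static final int SIZE = MEMBERS_OFFSET + MEMBER_SLOTS * MEMBER_BYTES; // 138

    static final class Member {
        final int elementId;
        final EntityType type;
        final int roleId; // ID of the role string in attrnames.txt
        Member(int elementId, EntityType type, int roleId) {
            this.elementId = elementId; this.type = type; this.roleId = roleId;
        }
    }

    // Reads the used member slots of one relations.obm record.
    static List<Member> read(ByteBuffer file, int recordNumber) {
        int base = recordNumber * SIZE;
        List<Member> members = new ArrayList<>();
        for (int i = 0; i < MEMBER_SLOTS; i++) {
            int off = base + MEMBERS_OFFSET + i * MEMBER_BYTES;
            int elementId = file.getInt(off);
            if (elementId == UNUSED) continue; // empty member slot
            EntityType type = EntityType.values()[file.getInt(off + 4)]; // stored as the enum ordinal
            members.add(new Member(elementId, type, file.getInt(off + 8)));
        }
        return members;
    }
}
```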

notes:

  • the relationID Integer.MIN_VALUE is used to denote an unused record
  • the elementID Integer.MIN_VALUE marks an unused entry
  • this relationID is authoritative. If there is disagreement with the index files, then the .idx file is incorrect and needs to be regenerated.
  • attrID and attrValue have the same semantics as for nodes.obm
  • the sort order of attributes and ways is application-dependent; no assumptions may be made about it. However, it is advisable to store important attributes such as 'highway' first.
  • if a record is not large enough for all attributes or members, the next record is used as well (this is marked by the next record having the same relationID).
  • if this situation arises while updating the relation and the next record is not free, the relation is moved to a new location in this file.

relations.idx

This file stores an index of relation ID to record number in relations.obm.

The structure is analogous to nodes.idx.