Osmium/Storing Node Positions


To assemble linestring and polygon geometries from ways (and from multipolygon relations), the positions of all nodes in those ways are needed. This can be done by storing all node positions as the nodes are read from the input file and then using the stored positions a little later, when the ways are read, to assemble the way geometries. (Ways are typically stored after nodes in OSM files.) Osmium supports several ways of storing the node positions, each with its own advantages and disadvantages.

In every case, node positions are stored as two 32-bit integers. The longitude and latitude are multiplied by 10,000,000 to get these integers. That's the same way the coordinates are stored in the central OSM database and by some other programs. This gives a precision of about 1 cm or better, which is good enough for OSM data. Storing the coordinates as doubles would need twice the amount of storage (16 bytes per position instead of 8 bytes).
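The encoding described above can be sketched as follows. This is a minimal illustration, not Osmium's own code: the type FixedPosition and the helper functions are hypothetical names chosen for this example. One integer step is 0.0000001 degrees, which is roughly 1.1 cm along a meridian.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical type for illustration; not the actual Osmium::OSM::Position class.
struct FixedPosition {
    int32_t x;  // longitude multiplied by 10,000,000
    int32_t y;  // latitude multiplied by 10,000,000
};

// Two 32-bit integers take 8 bytes; two doubles would typically take 16.
static_assert(sizeof(FixedPosition) == 8, "position should be 8 bytes");

constexpr double kCoordinateScale = 10000000.0;

// Convert degrees to the fixed-point integer, rounding to the nearest step.
inline int32_t to_fixed(double degrees) {
    return static_cast<int32_t>(std::llround(degrees * kCoordinateScale));
}

// Convert the fixed-point integer back to degrees.
inline double to_degrees(int32_t fixed) {
    return fixed / kCoordinateScale;
}

inline FixedPosition make_position(double lon, double lat) {
    // Valid coordinates always fit: 180 * 10,000,000 is below 2^31.
    assert(lon >= -180.0 && lon <= 180.0);
    assert(lat >= -90.0 && lat <= 90.0);
    return FixedPosition{to_fixed(lon), to_fixed(lat)};
}
```

Calling make_position(13.3777, 52.5163) yields the integer pair (133777000, 525163000); converting back with to_degrees recovers the input to within one step.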

For Users

Users will probably encounter the question of which storage method to use when they run the osmjs command. If you want to build way or area geometries, you have to use the -l, --location-store option.

You might have to try different settings. Which one is best in your case depends on how much main memory you have, how fast your disks are, and what part of the OSM data you work on. If you use the debug option (-d), osmjs will tell you how much memory it used for storing the node positions.

Note that deleted nodes still take up space. That can't be avoided, and it is not much compared to the total number of nodes, but it means you have to take into account the largest node ID, not just the number of nodes.


osmjs-Option: -l array

Positions are stored in memory in a huge array indexed by the node ID. It is very time and space efficient if you are working on the whole planet file or large portions of it, but it uses a lot of memory. Currently, with over 1.5 billion nodes, you need about 12 GBytes of memory (8 bytes per node position).

Use this if you have enough RAM and you are working on the whole planet or large parts of it (such as a whole continent).


osmjs-Option: -l disk

This uses the same data layout as "Array", but the data is stored on disk. Access will be much slower than with "Array"; how much slower depends on how much memory the operating system can use as a cache.

Use this if you don't have enough RAM but still want to work with the whole planet.


osmjs-Option: -l sparsetable

This stores the node positions in a special in-memory "sparse array" that only needs 1 bit for empty entries.

Use this for country-sized extracts or smaller.

For C++ Developers

The different strategies for storing node positions are implemented in the children of the abstract Osmium::Storage::ById class. The storage class is given as template parameters to the handler Osmium::Handler::CoordinatesForWays. You can choose at compile time which strategy you want or use the parent class and decide at runtime.

The storage classes all have a template parameter stating the type of data stored. Normally this is Osmium::OSM::Position, which needs 8 bytes. For some use cases (for instance when approximate coordinates are good enough) it might make sense to store something else.
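The design described above (an abstract parent, concrete children, and a handler parameterised on the storage type) can be sketched roughly as follows. All class and function names in this sketch are hypothetical and only mirror the structure; they are not the Osmium API, and the real classes have different interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct Pos { int32_t x; int32_t y; };  // hypothetical 8-byte stored value

// Abstract parent (plays the role of a "by ID" storage interface).
template <typename TValue>
class ByIdSketch {
public:
    virtual ~ByIdSketch() = default;
    virtual void set(uint64_t id, TValue value) = 0;
    virtual TValue get(uint64_t id) const = 0;
};

// Dense child: a growable array indexed by ID.
template <typename TValue>
class VectorStoreSketch final : public ByIdSketch<TValue> {
public:
    void set(uint64_t id, TValue value) override {
        if (id >= items_.size()) items_.resize(id + 1);
        items_[id] = value;
    }
    TValue get(uint64_t id) const override {
        return id < items_.size() ? items_[id] : TValue();
    }
private:
    std::vector<TValue> items_;
};

// Sparse child: an ordered map, suitable for small extracts.
template <typename TValue>
class MapStoreSketch final : public ByIdSketch<TValue> {
public:
    void set(uint64_t id, TValue value) override { items_[id] = value; }
    TValue get(uint64_t id) const override {
        auto it = items_.find(id);
        return it == items_.end() ? TValue() : it->second;
    }
private:
    std::map<uint64_t, TValue> items_;
};

// Handler parameterised on the storage type. Passing a concrete child fixes
// the strategy at compile time; passing the abstract parent defers it to runtime.
template <typename TStorage, typename TValue>
class WayHandlerSketch {
public:
    explicit WayHandlerSketch(TStorage& storage) : storage_(storage) {}
    void node(uint64_t id, TValue value) { storage_.set(id, value); }
    std::vector<TValue> way(const std::vector<uint64_t>& node_refs) const {
        std::vector<TValue> out;
        out.reserve(node_refs.size());
        for (uint64_t ref : node_refs) out.push_back(storage_.get(ref));
        return out;
    }
private:
    TStorage& storage_;
};

// Runtime choice: returns a pointer to the parent class.
template <typename TValue>
std::unique_ptr<ByIdSketch<TValue>> make_storage(bool sparse) {
    if (sparse) return std::unique_ptr<ByIdSketch<TValue>>(new MapStoreSketch<TValue>());
    return std::unique_ptr<ByIdSketch<TValue>>(new VectorStoreSketch<TValue>());
}
```

With WayHandlerSketch<VectorStoreSketch<Pos>, Pos> the compiler knows the concrete type and can avoid virtual dispatch; with WayHandlerSketch<ByIdSketch<Pos>, Pos> the strategy can be picked from a command-line option at the cost of a virtual call per lookup. Swapping Pos for a smaller type is how one would store approximate coordinates.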


Implemented in Osmium::Storage::FixedArray

Uses a fixed size array. Only use this if you can't use the Mmap store for some reason and you are sure you know what the largest node ID is.
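A fixed-size, ID-indexed array might look like the sketch below. It is not the Osmium::Storage::FixedArray implementation; DenseStore, Pos and dense_array_bytes are hypothetical names for this example. The helper shows why the largest node ID, rather than the node count, determines the memory needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Pos { int32_t x; int32_t y; };  // hypothetical 8-byte position

// Marker for "no position stored"; no valid coordinate scales to this value.
constexpr int32_t kEmptyCoord = std::numeric_limits<int32_t>::min();

// Bytes needed for a dense array covering IDs 0..max_id.
constexpr uint64_t dense_array_bytes(uint64_t max_id, uint64_t bytes_per_entry = 8) {
    return (max_id + 1) * bytes_per_entry;
}

class DenseStore {
public:
    // The size is fixed up front, so max_id must be known in advance.
    explicit DenseStore(uint64_t max_id)
        : items_(static_cast<std::size_t>(max_id + 1), Pos{kEmptyCoord, kEmptyCoord}) {}

    void set(uint64_t id, Pos p) {
        assert(id < items_.size());
        items_[id] = p;
    }

    Pos get(uint64_t id) const {
        assert(id < items_.size());
        return items_[id];
    }

    bool has(uint64_t id) const {
        return id < items_.size() && items_[id].x != kEmptyCoord;
    }

private:
    std::vector<Pos> items_;
};
```

For a largest node ID of 1,499,999,999, dense_array_bytes returns 12,000,000,000, which matches the figure of about 12 GBytes for 1.5 billion nodes.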


Implemented in Osmium::Storage::MmapAnon

This memory is allocated using mmap and resized using mremap if needed. It uses the main memory to store the nodes. This is fast and flexible, but you need enough main memory for this to work.

This class is not available on Mac OS X, because its kernel doesn't support mremap.
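The mmap/mremap technique can be sketched as below. This is a Linux-only illustration with hypothetical names (AnonMmapStore, Pos); it is not the Osmium::Storage::MmapAnon source, and the doubling growth policy is an assumption made for this example.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for mremap and MREMAP_MAYMOVE on Linux
#endif
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <sys/mman.h>

struct Pos { int32_t x; int32_t y; };  // hypothetical 8-byte position

class AnonMmapStore {
public:
    explicit AnonMmapStore(std::size_t capacity = 1024) : capacity_(capacity) {
        assert(capacity > 0);
        // Anonymous mapping: backed by main memory (and swap), not by a file.
        void* p = mmap(nullptr, capacity_ * sizeof(Pos), PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        items_ = static_cast<Pos*>(p);
    }

    ~AnonMmapStore() { munmap(items_, capacity_ * sizeof(Pos)); }

    AnonMmapStore(const AnonMmapStore&) = delete;
    AnonMmapStore& operator=(const AnonMmapStore&) = delete;

    void set(uint64_t id, Pos p) {
        if (id >= capacity_) grow(static_cast<std::size_t>(id) + 1);
        items_[id] = p;
    }

    // Entries never written read as zero, because new anonymous pages are zero-filled.
    Pos get(uint64_t id) const {
        return id < capacity_ ? items_[id] : Pos{0, 0};
    }

    std::size_t capacity() const { return capacity_; }

private:
    void grow(std::size_t min_capacity) {
        std::size_t new_capacity = capacity_;
        while (new_capacity < min_capacity) new_capacity *= 2;
        // MREMAP_MAYMOVE lets the kernel relocate the mapping if it cannot grow in place.
        void* p = mremap(items_, capacity_ * sizeof(Pos),
                         new_capacity * sizeof(Pos), MREMAP_MAYMOVE);
        if (p == MAP_FAILED) throw std::runtime_error("mremap failed");
        items_ = static_cast<Pos*>(p);
        capacity_ = new_capacity;
    }

    std::size_t capacity_;
    Pos* items_;
};
```

Because mremap can move the mapping by remapping pages instead of copying them, growing a multi-gigabyte store stays cheap, which a plain realloc-style copy would not guarantee.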


Implemented in Osmium::Storage::MmapFile

This memory is allocated using mmap with a temporary file as backing store. Normally the file is unlinked immediately after opening it, so you will not see it on disk (it still needs disk space though). The disk space will be automatically released once the application closes the file. If you want to keep the file around, set remove=false in the constructor.
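The unlink-after-open pattern can be sketched as follows for a POSIX system. The class FileMmapStore, its remove_file flag (modelled on the remove option mentioned above), the fixed capacity and the /tmp path template are all assumptions made for this example; this is not the Osmium::Storage::MmapFile source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

struct Pos { int32_t x; int32_t y; };  // hypothetical 8-byte position

class FileMmapStore {
public:
    explicit FileMmapStore(std::size_t capacity, bool remove_file = true)
        : capacity_(capacity) {
        assert(capacity > 0);
        char path[] = "/tmp/node_positions_XXXXXX";  // assumed location
        fd_ = mkstemp(path);
        if (fd_ < 0) throw std::runtime_error("mkstemp failed");
        path_ = path;
        // Unlinking removes the name only; the data lives until the file is closed.
        if (remove_file) unlink(path);
        void* p = MAP_FAILED;
        // Extend the file to the full size; unwritten parts read as zero.
        if (ftruncate(fd_, static_cast<off_t>(bytes())) == 0) {
            p = mmap(nullptr, bytes(), PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
        }
        if (p == MAP_FAILED) {
            close(fd_);
            throw std::runtime_error("ftruncate or mmap failed");
        }
        items_ = static_cast<Pos*>(p);
    }

    ~FileMmapStore() {
        munmap(items_, bytes());
        close(fd_);  // for an unlinked file, disk space is released here
    }

    FileMmapStore(const FileMmapStore&) = delete;
    FileMmapStore& operator=(const FileMmapStore&) = delete;

    void set(uint64_t id, Pos p) { assert(id < capacity_); items_[id] = p; }
    Pos get(uint64_t id) const { assert(id < capacity_); return items_[id]; }
    const std::string& path() const { return path_; }

private:
    std::size_t bytes() const { return capacity_ * sizeof(Pos); }

    std::size_t capacity_;
    int fd_;
    Pos* items_;
    std::string path_;
};
```

With remove_file left at its default, the path reported by path() no longer exists on disk even while the store is in use, which is why the file is invisible but still consumes disk space.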


Implemented in Osmium::Storage::SparseTable

Uses Google Sparsetable.
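The idea behind a sparse table (a bitmap marks which IDs are present, and only present values are stored) can be sketched as below. This is not Google Sparsetable's code or API; SparseSketch and the group size of 64 are assumptions for this example. Note that the std::vector header in each group adds overhead, so this sketch is less compact than the real library.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pos { int32_t x; int32_t y; };  // hypothetical 8-byte position

constexpr uint64_t kGroupSize = 64;  // IDs per group; one bitmap bit per ID

class SparseSketch {
public:
    explicit SparseSketch(uint64_t max_id)
        : groups_(static_cast<std::size_t>(max_id / kGroupSize + 1)) {}

    void set(uint64_t id, Pos p) {
        assert(id / kGroupSize < groups_.size());
        Group& g = groups_[id / kGroupSize];
        const uint64_t mask = uint64_t(1) << (id % kGroupSize);
        const std::size_t slot = rank(g.occupied, mask);
        if (g.occupied & mask) {
            g.items[slot] = p;  // overwrite existing entry
        } else {
            g.items.insert(g.items.begin() + slot, p);  // keep items ordered by ID
            g.occupied |= mask;
            ++stored_;
        }
    }

    // Returns false if nothing is stored for this ID.
    bool get(uint64_t id, Pos& out) const {
        if (id / kGroupSize >= groups_.size()) return false;
        const Group& g = groups_[id / kGroupSize];
        const uint64_t mask = uint64_t(1) << (id % kGroupSize);
        if (!(g.occupied & mask)) return false;
        out = g.items[rank(g.occupied, mask)];
        return true;
    }

    std::size_t stored() const { return stored_; }

private:
    struct Group {
        uint64_t occupied = 0;   // bit i set means ID (group * 64 + i) is present
        std::vector<Pos> items;  // only the present entries, in ID order
    };

    // Number of occupied bits below the given bit: the index into items.
    static std::size_t rank(uint64_t occupied, uint64_t mask) {
        return std::bitset<64>(occupied & (mask - 1)).count();
    }

    std::vector<Group> groups_;
    std::size_t stored_ = 0;
};
```

An empty ID costs only its bitmap bit plus a share of the per-group overhead, which is why this layout suits extracts where most IDs in the global range are absent.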