GeoDesk

From OpenStreetMap Wiki
Author: GeoDesk Team
License: Apache 2.0 / AGPL
Platforms: Windows, macOS, and Linux
Version: 0.1.8
Website: http://www.geodesk.com

Fast and storage-efficient spatial database engine for OSM data

Overview

GeoDesk is a spatial database engine specifically designed for OpenStreetMap data. Each database is a single self-contained file -- a Geographic Object Library (GOL) -- which is only 10% to 50% larger than its equivalent source data in OSM-PBF format. Internally, features are organized into tiles, which can be exported in compressed form. Compressed tile sets are similar in size to OSM-PBF (and often smaller). Users can download tiles for only the regions they need, and the software will automatically reassemble them into a GOL. This makes GeoDesk tiles a convenient format for distributing OSM data.

Features can be queried by type, tags, bounding box or spatial relationships (such as crosses, connectsTo, or maxMetersFrom). The query language is similar to MapCSS.

GOLs replicate the OSM data model, including full support for relations. Users can treat geographic objects as OSM elements or simple features (points, linestrings, polygons, collections). However, GOLs do not include historical elements or OSM metadata (object version, timestamp and user of last edit).

The GeoDesk toolkit consists of two parts: a Java library (licensed under Apache 2.0), which allows developers to incorporate the database engine into their own applications, and a command-line utility for creating, querying and managing GOLs.

System Requirements

A 64-bit system running Windows, macOS or Linux, with a Java JDK (version 16 or later) installed. Creating GOLs from OSM-PBF data requires at least 8 GB of RAM (24 GB is recommended if you routinely process planet-size datasets); an SSD is highly recommended.

Installation

Stable releases of the GeoDesk Java library are available on Maven Central. Add the following dependency to your project's POM:

<dependency>
    <groupId>com.geodesk</groupId>
    <artifactId>geodesk</artifactId>
    <version>0.1.3</version>
</dependency>

Alternatively, build the latest snapshot from source:

git clone https://github.com/clarisma/geodesk.git
cd geodesk
mvn install

To install the latest version of the GeoDesk command-line tool, follow these instructions. Please note that you will need to install a Java runtime (Version 16 or above); the toolkit does not ship with its own JRE.

To build the latest version from source, see the instructions at the GitHub repository.

Usage

A complete user's guide can be found at docs.geodesk.com.

Creating a GOL from an OSM-PBF file

To create a GOL from an OSM data file (e.g. the country extract for Germany), use this command:

gol build germany germany-latest.osm.pbf

This creates germany.gol, which takes about 20 minutes on a dual-core notebook (or just a few minutes on a multi-core workstation). Make sure you have sufficient free disk space, or the build process may fail. As a rule of thumb, the temporary files need space equal to at least 3 times the size of the OSM-PBF for a full planet file, or up to 10 times for country-level extracts. Therefore, if germany-latest.osm.pbf is 3.5 GB, plan on having an additional 35 GB available (the resulting GOL file will only be about 5 GB).
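The rule of thumb above can be turned into a quick pre-flight calculation. The following is a minimal sketch, not part of GeoDesk: the class and method names are illustrative, and the multipliers are simply the 3x (planet) and 10x (country extract) figures quoted above.

```java
// Rule-of-thumb estimate of the free disk space to plan for before
// running "gol build". Names are hypothetical; the factors are taken
// from the text (3x for a planet file, up to 10x for country extracts).
final class BuildSpaceEstimate
{
    static final double PLANET_FACTOR = 3.0;
    static final double EXTRACT_FACTOR = 10.0;

    /** Returns the free space (in GB) to plan for, given the OSM-PBF size in GB. */
    static double requiredFreeSpaceGb(double pbfSizeGb, boolean isPlanetFile)
    {
        if (pbfSizeGb < 0)
        {
            throw new IllegalArgumentException("size must not be negative");
        }
        return pbfSizeGb * (isPlanetFile ? PLANET_FACTOR : EXTRACT_FACTOR);
    }

    public static void main(String[] args)
    {
        // The Germany example from the text: a 3.5 GB country extract
        System.out.println(requiredFreeSpaceGb(3.5, false) + " GB");
    }
}
```

Treat the result as an upper bound for planning purposes; actual temporary-file usage depends on the dataset.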

Exporting a tile set

To export a tile set from a GOL, use the gol save command:

gol save germany germany-tiles

This will save a compressed tile set to the germany-tiles folder. This will create about 1,000 individual tile files, averaging 3 MB each. Depending on compression method and level, a tile set will be about 30% to 50% smaller than the original GOL.
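As a sanity check, the figures above are consistent with each other: roughly 1,000 tiles at 3 MB each come to about 3 GB, which lies inside the range obtained by shrinking a GOL of about 5 GB by 30% to 50%. The sketch below reproduces that arithmetic. The class and method names are illustrative, and it assumes decimal units (1 GB = 1,000 MB).

```java
// Back-of-the-envelope check of the tile set sizes quoted in the text.
// Names are hypothetical; sizes are in MB, assuming 1 GB = 1,000 MB.
final class TileSetSizeEstimate
{
    /** Size of a tile set that is reductionPercent smaller than the GOL. */
    static long reducedSizeMb(long golSizeMb, int reductionPercent)
    {
        if (reductionPercent < 0 || reductionPercent > 100)
        {
            throw new IllegalArgumentException("percentage must be 0..100");
        }
        return golSizeMb * (100 - reductionPercent) / 100;
    }

    /** Total size of a tile set from tile count and average tile size. */
    static long totalSizeMb(long tileCount, long averageTileMb)
    {
        return tileCount * averageTileMb;
    }

    public static void main(String[] args)
    {
        long golMb = 5000;  // "about 5 GB" GOL for Germany
        System.out.println("30% smaller: " + reducedSizeMb(golMb, 30) + " MB");
        System.out.println("50% smaller: " + reducedSizeMb(golMb, 50) + " MB");
        System.out.println("1,000 tiles x 3 MB: " + totalSizeMb(1000, 3) + " MB");
    }
}
```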

You can publish the tile set folder on a web server, and other users can then automatically download tiles into a local GOL, based on the regions required by their queries.

Downloading tiles

To download tiles from a tile set, use the gol load command. For the Switzerland example set, use the following:

gol load -n swiss https://data.geodesk.com/switzerland

This loads all tiles into swiss.gol (the option -n creates the file if it doesn't already exist).

You can choose to download a subset of tiles for a specific area (supplied in the form of a polygon file) via -a=myarea.poly.

Alternatively, use option -u=<URL> with any command (such as query), and any required tiles that aren't already present locally will automatically be downloaded.

Running queries

You can perform basic queries (type, tags and/or bounding box or area) directly from the command line, and obtain the results in various formats (GeoJSON, CSV, WKT, etc.).

gol query germany na[man_made=lighthouse] -f=geojsonl

retrieves all lighthouses (represented as node or area) and outputs line-separated GeoJSON.

gol query germany w[highway=residential][oneway] -b=13.29,52.46,13.47,52.56 -f=count

counts all one-way residential streets in central Berlin.
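When such queries are launched from a program rather than typed into a shell, it can help to assemble the argument list in code. The following is a minimal sketch under stated assumptions: GolQueryCommand and its build method are hypothetical helpers (not part of GeoDesk), the only options used are -b and -f as shown above, and the bounding-box order (west, south, east, north) is inferred from the Berlin example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the arguments of a "gol query"
// call. It only builds the list; it does not run the tool.
final class GolQueryCommand
{
    /**
     * @param gol    name of the GOL, e.g. "germany"
     * @param query  query string, e.g. "w[highway=residential][oneway]"
     * @param bbox   west, south, east, north in degrees (assumed order), or null
     * @param format output format for -f, e.g. "count" or "geojsonl"
     */
    static List<String> build(String gol, String query, double[] bbox, String format)
    {
        List<String> cmd = new ArrayList<>();
        cmd.add("gol");
        cmd.add("query");
        cmd.add(gol);
        cmd.add(query);
        if (bbox != null)
        {
            if (bbox.length != 4)
            {
                throw new IllegalArgumentException("bbox needs 4 values");
            }
            cmd.add("-b=" + bbox[0] + "," + bbox[1] + "," + bbox[2] + "," + bbox[3]);
        }
        cmd.add("-f=" + format);
        return cmd;
    }

    public static void main(String[] args)
    {
        List<String> cmd = build("germany", "w[highway=residential][oneway]",
            new double[] { 13.29, 52.46, 13.47, 52.56 }, "count");
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder, which passes each argument to the tool unchanged. This sidesteps shell quoting, which may otherwise be needed because some shells treat square brackets as glob patterns.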

The command-line utility only supports a subset of GeoDesk's query capabilities. For spatial joins, you will have to submit two commands. For example, to extract all hotels and guest houses in Bavaria, first retrieve the state's boundary:

gol query germany a[boundary=administrative][admin_level=4][name:en=Bavaria] -f=poly > bavaria.poly

This creates a polygon file, which you can then use to restrict the second query (output as CSV with name and phone number as columns):

gol query germany na[tourism=hotel,guest_house] -a=bavaria.poly -f=csv -t=name,phone > hotels.csv

To visualize query results on a Leaflet-powered slippy map, use the map formatting option and open the resulting file in your browser:

gol query germany na[amenity=pub] -a=bavaria.poly -f=map -t=lon,lat,name > pubs.html

For specifics on the various options, see the documentation.

Developing applications using GeoDesk

Developers can incorporate the GeoDesk database engine into their own geospatial applications (written in Java or another JVM language). There's no need for a server process; the GeoDesk library interacts directly with each database file.

Example code

This mini-application lists all the pubs in the central area of Zurich:

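import com.geodesk.feature.*;
import com.geodesk.geom.*;

public class PubsExample
{
    public static void main(String[] args)
    {
        // Open the GOL file directly; no server process is involved
        FeatureLibrary switzerland = new FeatureLibrary("switzerland.gol");

        // Select nodes and areas tagged amenity=pub within the bounding box
        // (west, south, east, north in degrees) covering central Zurich
        for (Feature pub : switzerland
            .select("na[amenity=pub]")
            .in(Box.ofWSEN(8.53, 47.36, 8.55, 47.38)))
        {
            // Print the value of the pub's name tag
            System.out.println(pub.stringValue("name"));
        }

        // Close the library once the query results have been processed
        switzerland.close();
    }
}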

More examples can be found on GitHub. There's also a tutorial.

Performance

Converting an OSM-PBF file into a GOL is significantly faster than importing it into a traditional SQL-based database, even on low-end hardware. A Haswell-era workstation (10 x 2.3 GHz Xeon, 32 GB RAM, NVMe SSD) converts the planet file in 30 minutes. A dual-core laptop with 8 GB RAM and a SATA SSD manages this task in about 4 hours. Regional extracts take proportionally less time.

GOLs contain indexes to speed up the most common queries (indexing is done automatically at creation time and can be fine-tuned by advanced users).

The following benchmarks were run on a dual-core notebook (8 GB RAM, SATA SSD). Bounding boxes were distributed randomly across the area of Germany, with a minimum node density to avoid unpopulated areas. Each batch of queries was run 10 times; the reported figure is the median total time for the batch. Queries were executed in parallel.
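The "median of repeated batch totals" measurement described above can be sketched as follows. This is an illustration of the general technique only, not the harness used for these benchmarks: the class and method names are hypothetical, the workload is a placeholder, and the parallel execution of queries is not modeled.

```java
import java.util.Arrays;

// Illustrative timing harness: run a batch several times and report
// the median total time. Not the harness used by the GeoDesk authors.
final class MedianBenchmark
{
    /** Median of the samples; for an even count, the midpoint rounded down. */
    static long median(long[] samples)
    {
        if (samples.length == 0)
        {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) return sorted[mid];
        return sorted[mid - 1] + (sorted[mid] - sorted[mid - 1]) / 2;
    }

    /** Runs the batch the given number of times; returns the median total in nanoseconds. */
    static long medianBatchNanos(Runnable batch, int runs)
    {
        long[] totals = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            long start = System.nanoTime();
            batch.run();
            totals[i] = System.nanoTime() - start;
        }
        return median(totals);
    }

    public static void main(String[] args)
    {
        // Placeholder workload standing in for a batch of queries
        Runnable batch = () ->
        {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) System.out.println(sum);  // keeps the loop from being optimized away
        };
        System.out.println("median: " + medianBatchNanos(batch, 10) + " ns");
    }
}
```

Reporting the median rather than the mean keeps a single slow run (for example, one affected by a cold file cache) from skewing the result.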

TODO: add benchmarks here