OSMCouch

This page describes a historic artifact in the history of OpenStreetMap. It does not reflect the current situation, but instead documents the historical concepts, issues, or ideas.
About
OSMCouch was a tool to import OpenStreetMap data into a CouchDB database.
Impact on OpenStreetMap
Using OSM data in a CouchDB database was a niche use case and did not attract much attention.
Reason for being historic
OSMCouch has not been developed further since 2013. Consider using PostgreSQL with the PostGIS extension to store OSM data in a database, and Osm2pgsql or Imposm to load the data. As of 2020, these two tools have been among the most performant options for several years.
Captured time
February 2020


OSMCouch logo

OSMCouch stores Planet.osm data in CouchDB with the GeoCouch extension. GeoJSON data can be retrieved for any use case, or viewed with OpenLayers extended with the POI tools software.

This documentation is outdated (but still works). The latest OSMCouch version depends on Imposm diff support, which is not implemented yet. Documentation will be updated after that is done.


Features

  • RESTful GeoJSON API
  • filtering by tag and/or geometry
  • OpenLayers visualisation
  • osmChange support

Next priorities:

  • Replication, filtered by geometry. (Example: replicate the data of your city to your own database.)
  • KML output
  • Compatibility with the Kothic JS tiles format

Querying GeoCouch

GeoCouch has two different kinds of indices: B-Tree _view for key-value queries and R-Tree _spatial for spatial queries. You need to decide which filter is faster for your queries: if there are only a few features in the whole world, for example "unesco_world_heritage=yes", you would prefer a _view query. If there are many features, for example "building=yes" or "amenity=*", you should use a _spatial query. Formatting is performed with _list; spatial query results are formatted with _spatial/_list.
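
For example, a key-value lookup against a _view index might look like this (a minimal sketch: the design document "osm" and the view "by_tag" are assumptions, not documented OSMCouch names):

curl "http://127.0.0.1:5984/$DB/_design/osm/_view/by_tag?key=%22unesco_world_heritage%3Dyes%22"
# %22...%22 is the URL-encoded JSON string key "unesco_world_heritage=yes"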

Polygon and radius queries

GeoCouch currently supports bbox queries. Radius queries will be supported in a future version. Polygon queries are supported by a development version of GeoCouch.
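
A bbox query against a _spatial index takes the bounding box as west,south,east,north (a minimal sketch: the spatial index name "points" is an assumption; the bbox roughly covers Berlin):

curl "http://127.0.0.1:5984/$DB/_design/osm/_spatial/points?bbox=13.08,52.33,13.76,52.68"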

Importing data

There is no need to import data yourself. You can replicate data from the main server (and update by replicating changes).

In the ideal case there is only one Planet server that initially imported the data and is updated with minutely changes. All other servers obtain their data via replication, either from this first Planet server or from servers that replicated from it. If the same data and changes are shared via replication like this, smaller extracts, full replicas, and minutely changes can be obtained from any of those servers.
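
Replication itself is a single call to the _replicate API (a minimal sketch: the source URL is a placeholder, and the local target database "osm" must exist before replicating):

curl -X POST http://127.0.0.1:5984/_replicate \
 -H "Content-Type: application/json" \
 -d '{"source": "http://planet.example.org:5984/osm", "target": "osm", "continuous": true}'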

If multiple servers perform imports, revision ids might differ for the same underlying data, leading to conflicts when replicating from multiple import sources. Please keep that in mind if you import data instead of replicating. Replication is easier, too.
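
Whether a replicated document has picked up conflicting revisions can be checked with the conflicts query parameter (the document id is a placeholder):

curl "http://127.0.0.1:5984/$DB/some_doc_id?conflicts=true"
# conflicted documents carry a "_conflicts" member listing the losing revisions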

Filtered replication

Filtered replication is easy for the first replication to an empty target database. Updating your filtered database is a bit more complicated, for the following reasons:

  • Deleted documents do not contain tag or geometry information. For filtered replication to include deletions, all deleted documents are passed through without filtering. The target database should take care not to store deleted objects it did not store before deletion.
  • Documents that were included in previous filtered replications might have changed so that they no longer pass the filter. The target database then still contains the old revision. You need to find out that the document changed (more on that below) and either keep the stale revision or remove it from the target database. Deleting it is not a good idea, as deletion creates a new revision, which might conflict with future revisions included in later replications. You can _purge the document, but that involves the risk of rebuilding views, which might take a long time.
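
A filtered replication references a filter function stored in a design document on the source database (a minimal sketch: the design document "osm", the filter "tagged", the amenity tag check, and the server URL are placeholders; note that the filter passes deletions through unfiltered, as described above):

# filter function stored on the source in _design/osm under "filters":
#   "tagged": "function(doc, req) {
#       if (doc._deleted) return true;           // pass deletions through
#       return !!(doc.tags && doc.tags.amenity); // keep only matching docs
#   }"
curl -X POST http://127.0.0.1:5984/_replicate \
 -H "Content-Type: application/json" \
 -d '{"source": "http://planet.example.org:5984/osm", "target": "osm_extract", "filter": "osm/tagged"}'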

Updating a filtered target database

A possible solution to the problems listed above is to update the target database by replicating unfiltered changes and filtering in validate_doc_update(): if there is a previous revision, just write the new revision; if there is no previous revision, apply your filter and throw a forbidden error for documents that do not pass, so they are not replicated.
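
Such a validate_doc_update() could be uploaded as a design document like this (a minimal sketch: the design document name "filter" and the amenity tag check are assumptions, and the function must be serialised as a single JSON string):

curl -X PUT http://127.0.0.1:5984/$DB/_design/filter \
 -H "Content-Type: application/json" \
 -d '{"validate_doc_update": "function(newDoc, oldDoc, userCtx) { if (oldDoc) { return; } if (newDoc._deleted || !(newDoc.tags && newDoc.tags.amenity)) { throw({forbidden: \"does not pass filter\"}); } }"}'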

You still might want to get rid of documents that passed the filter in previous revisions but would not pass it in their current revision. Purging them could work like this (untested): create a view collecting those documents, get the ids from this view, and purge the documents one by one. After each purge, you might want to query all views that should not be rebuilt, because purging more than one document between view updates can force a rebuild. On the other hand, you could just keep the unwanted documents; they will be updated if replication is only filtered for documents without a previous revision.
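
Purging a single revision works via the _purge API (a minimal sketch: document id and revision are placeholders):

curl -X POST http://127.0.0.1:5984/$DB/_purge \
 -H "Content-Type: application/json" \
 -d '{"some_doc_id": ["3-917fa2381192822767f010b95b45325b"]}'
# query the views you want to keep up to date before purging the next document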

Data import without replication

As mentioned above, this should only be necessary on a single server. Other servers can replicate the data.

OSM data is converted to JSON using the Osmium framework and bulk-uploaded afterwards.

osmjs -2 -l array -i area.js -j osm2json.js planet.osm.pbf

Please consult the Osmium documentation for options and requirements.

# do not try this with large datasets
curl -X POST -H "Content-Type: application/json" -d @bulk.json http://127.0.0.1:5984/$DB/_bulk_docs

This is not a good idea for large numbers of documents, as the whole bulk is kept in memory during transmission. You might want to split the bulk into smaller chunks (10,000 documents, for example). There is a mass upload script that will perform this task for you. The command looks like this:

./chunkybulks.py bulk.json http://127.0.0.1:5984/$DB

Changes

If you are using an extract replicated from a Planet server, replicate changes from the same server.

This is what the source server does: diff updates are parsed with imposm.parser, and geometries are constructed using imposm/shapely. The update script is available in the OSMCouch imposm branch.

Installation

CouchDB and GeoCouch

The simplest way used to be to download and install Couchbase Single Server Community Edition. This is no longer true, because Couchbase Single Server is no longer based on CouchDB.

Otherwise you need to install from source, because GeoCouch compilation requires the CouchDB sources. GeoCouch packages have not made it into distributions yet.

CouchDB trunk

CouchDB trunk supports compression, which might be necessary given the amount of OSM data.
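
Compression is enabled in the server configuration (a sketch, assuming the file_compression option from CouchDB trunk; snappy is fast, the deflate levels compress better):

; /usr/local/etc/couchdb/local.ini
[couchdb]
file_compression = snappy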

This is how to install CouchDB trunk and GeoCouch in Debian Squeeze:

# WARNING: This might be incomplete. Read couchdb/INSTALL.Unix and geocouch/README.md

# install dependencies
sudo apt-get install build-essential erlang libicu-dev libmozjs-dev libcurl4-openssl-dev \
 libtool automake checkinstall


# build and install couchdb trunk
git clone git://git.apache.org/couchdb.git
cd couchdb/
./bootstrap && ./configure
sudo checkinstall
cd ..


# create couchdb user and directories
sudo adduser --system --home /usr/local/var/lib/couchdb --no-create-home --shell /bin/bash --group --gecos "CouchDB Administrator" couchdb

sudo mkdir -p /usr/local/var/lib/couchdb
sudo mkdir -p /usr/local/var/log/couchdb
sudo mkdir -p /usr/local/var/run/couchdb

sudo chown -R couchdb:couchdb /usr/local/etc/couchdb
sudo chown -R couchdb:couchdb /usr/local/var/lib/couchdb
sudo chown -R couchdb:couchdb /usr/local/var/log/couchdb
sudo chown -R couchdb:couchdb /usr/local/var/run/couchdb

sudo chmod 0770 /usr/local/etc/couchdb
sudo chmod 0770 /usr/local/var/lib/couchdb
sudo chmod 0770 /usr/local/var/log/couchdb
sudo chmod 0770 /usr/local/var/run/couchdb


# build geocouch
git clone git://github.com/vmx/geocouch.git
cd geocouch/
git checkout 

export COUCH_SRC=$HOME/couchdb/src/couchdb
make


# enable GeoCouch
sudo mkdir -p /usr/local/etc/couchdb/local.d
sudo cp $HOME/geocouch/etc/couchdb/local.d/geocouch.ini /usr/local/etc/couchdb/local.d/
sudo chown -R couchdb:couchdb /usr/local/etc/couchdb/local.d


# ready to run
export ERL_FLAGS="-pa $HOME/geocouch/build"
sudo /usr/local/etc/init.d/couchdb start

Development

This project is about writing CouchDB design documents and views as well as documentation for easy usage.

Ideas

  • nginx proxy to handle high load (needs thought on configuration)
  • pubsub change notification via _changes
  • clustering/P2P-API: which of the clustering CouchDB implementations supports automated distribution weighted by available hardware resources?
