POI tools/Importer

From OpenStreetMap Wiki
Jump to navigation Jump to search

There is a shared parser for reading .osm and .osc, writing .csv data and the SQL commands to import it into MonetDB. The parser and the SQL commands are optimized for performance.

Special cases

  • split k=v1;v2;v3 in k=v1, k=v2, k=v3
  • drop some tags: created_by, source

Import

Example import for Germany:

#!/bin/bash

wget -c ftp://ftp5.gwdg.de/pub/misc/openstreetmap/download.geofabrik.de/germany.osm.bz2 &&
bzcat germany.osm.bz2 | osmparser > import.sql &&

# Remove duplicates. Note that this should not be neccessary for Planet.osm, but might be neccessary for Geofabrik or Cloudmade downloads
for i in *.csv; do
    sort -u $i > $i-sorted
    mv $i-sorted $i
done &&

mclient -lsql -d germany < import.sql

Update

Summary

Just run update.sh to update the database. You need an existing state.txt (like Osmosis) before the first run.

The updater will download all changes since the last update, and load them into temporary tables. Only the latest version of an object is kept, constraints are checked to avoid any violations created by an update.

Update speed is almost independent from the update interval, which means you can catch up very fast after skipping some updates.

Details

This is a short description of the steps happening: First you need to download all changes since the last insertion. Usually this is done with Osmosis, but there is a shell script for updating that does everything for you without using Osmosis.

The configuration file is the same as for Osmosis, called state.txt. Before the first update you need to download a state.txt describing the state of your imported data. You should choose a few hours earlier than the data you imported was created, to make sure nothing is missing. There exists a web tool that returns the right state.txt for your time parameters:

# use the date a bit older than the first import
wget "https://replicate-sequences.osm.mazdermind.de/?Y=2019&m=03&d=25&H=25&i=00&s=00&stream=minute" -O state.txt

The script will download and concatenate all changes that occured since your import. They are then parsed by the same parser you used for importing, and inserted into the database (modify the shell script to match your database configuration).

# this is done in the update shell script
osmparser < update.osc > update.sql

# insert updates, if you run the server in read-only mode, restart it write-enabled with default_pipe before executing this
mclient < update.sql

SQL

This section explains what is going on in SQL.

The changes are parsed and written to .csv files: delete_{node,way,relation}s and insert_{node,way,relation}s.

Delete objects can be deleted immediately:

CREATE TEMPORARY TABLE delete_relations (id integer);
CREATE TEMPORARY TABLE delete_nodes (id integer);
CREATE TEMPORARY TABLE delete_ways (id integer);

COPY INTO delete_nodes FROM '/path/to/delete_nodes.csv' USING DELIMITERS ',', '\\n', '''';
COPY INTO delete_ways FROM '/path/to/delete_ways.csv' USING DELIMITERS ',', '\\n', '''';
COPY INTO delete_relations FROM '/path/to/delete_relations.csv' USING DELIMITERS ',', '\\n', '''';

DELETE FROM relation_members_relation WHERE to_relation IN (SELECT id FROM delete_relations);
DELETE FROM relation_members_way WHERE to_way IN (SELECT id FROM delete_ways);
DELETE FROM relation_members_node WHERE to_node IN (SELECT id FROM delete_nodes);
DELETE FROM way_nds WHERE to_node IN (SELECT id FROM delete_nodes);

DELETE FROM relation_members_relation WHERE relation IN (SELECT id FROM delete_relations);
DELETE FROM relation_members_way WHERE relation IN (SELECT id FROM delete_relations);
DELETE FROM relation_members_node WHERE relation IN (SELECT id FROM delete_relations);
DELETE FROM relation_tags WHERE relation IN (SELECT id FROM delete_relations);
DELETE FROM relations WHERE id IN (SELECT id FROM delete_relations);
DROP TABLE delete_relations;

DELETE FROM way_nds WHERE way IN (SELECT id FROM delete_ways);
DELETE FROM way_tags WHERE way IN (SELECT id FROM delete_ways);
DELETE FROM ways WHERE id IN (SELECT id FROM delete_ways);
DROP TABLE delete_ways;

DELETE FROM node_tags WHERE node IN (SELECT id FROM delete_nodes);
DELETE FROM nodes WHERE id IN (SELECT id FROM delete_nodes);
DROP TABLE delete_nodes;


Insert contains new ("create") and modified ("modify") objects. It will be mass-loaded to a temporary table (COPY INTO).

CREATE TEMPORARY TABLE insert_nodes (id integer, long numeric(10,7), lat numeric(10,7), uid integer, version integer, timestamp timestamptz);
CREATE TEMPORARY TABLE insert_node_tags (node integer, version integer, k varchar(255), v varchar(1024));
CREATE TEMPORARY TABLE insert_ways (id integer,uid integer, version integer, timestamp timestamptz);
CREATE TEMPORARY TABLE insert_way_tags (way integer, version integer, k varchar(255), v varchar(1024));
CREATE TEMPORARY TABLE insert_way_nds (way integer, version integer, idx integer, to_node integer);
CREATE TEMPORARY TABLE insert_relations(id integer, uid integer, version integer, timestamp timestamptz);
CREATE TEMPORARY TABLE insert_relation_members_node (relation integer, version integer, idx integer, to_node integer, role varchar(255));
CREATE TEMPORARY TABLE insert_relation_members_relation (relation integer, version integer, idx integer, to_relation integer, role varchar(255));
CREATE TEMPORARY TABLE insert_relation_members_way (relation integer, version integer, idx integer, to_way integer, role varchar(255));
CREATE TEMPORARY TABLE insert_relation_tags (relation integer, version integer, k varchar(255), v varchar(1024));
COPY INTO insert_nodes from '/path/to/nodes.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_node_tags from '/path/to/node_tags.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_ways from '/path/to/ways.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_way_tags from '/path/to/way_tags.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_way_nds from '/path/to/way_nds.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_relations from '/path/to/relations.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_relation_tags from '/path/to/relation_tags.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_relation_members_node from '/path/to/relation_member_node.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_relation_members_relation from '/path/to/relation_member_relation.csv' USING DELIMITERS ',', '\n', '''';
COPY INTO insert_relation_members_way from '/path/to/relation_member_way.csv' USING DELIMITERS ',', '\n', '''';


Before inserting data into the real table, all versions except the newest ones must be deleted from the temporary tables. If the newest version is the same as the existing version, it must be deleted from the "insert"-table or there must be an exception (EXCEPT) when inserting into the existing tables.

CREATE TEMPORARY TABLE highest_nodes (id integer, long numeric(10,7), lat numeric(10,7), uid integer, version integer, timestamp timestamptz);
INSERT INTO highest_nodes SELECT DISTINCT s.id,s.long,s.lat,s.uid,s.version,s.timestamp FROM insert_nodes AS s WHERE s.version = (SELECT MAX(highest.version) FROM insert_nodes AS highest WHERE s.id = highest.id);
DROP TABLE insert_nodes;
DELETE FROM highest_nodes WHERE highest_nodes.version <= (SELECT nodes.version FROM nodes WHERE highest_nodes.id = nodes.id);


To avoid foreign key violations while inserting the current version, references must be removed temporarily when deleting old versions before inserting new versions. The database will grow with each deleting and reinserting. Existing versions can also just be updated:

UPDATE nodes SET
 long = (SELECT long FROM highest_nodes WHERE id = nodes.id),
 lat = (SELECT lat FROM highest_nodes WHERE id = nodes.id),
 uid = (SELECT uid FROM highest_nodes WHERE id = nodes.id),
 version = (SELECT version FROM highest_nodes WHERE id = nodes.id),
 timestamp = (SELECT timestamp FROM highest_nodes WHERE id = nodes.id)
 WHERE id IN (SELECT id FROM highest_nodes);

INSERT INTO nodes SELECT * FROM highest_nodes WHERE id NOT IN (SELECT id FROM nodes WHERE id IN (SELECT id FROM highest_nodes));


Tags must be updated:

DELETE FROM node_tags WHERE node IN (SELECT id FROM highest_nodes);
INSERT INTO node_tags SELECT DISTINCT a.node, a.k, a.v FROM insert_node_tags AS a, highest_nodes AS b WHERE a.node = b.id AND a.version = b.version;


To avoid foreign key violations, all references to non-existing foreign keys must be deleted. Another solution would be to download missing references from API v0.6.

The commands below are similar to the ones for nodes, note the DELETE FROM insert_way_nds and DELETE FROM insert_relation_members_*.

CREATE TEMPORARY TABLE highest_ways (id integer, uid integer, version integer, timestamp timestamptz);
INSERT INTO highest_ways SELECT DISTINCT s.id,s.uid,s.version,s.timestamp FROM insert_ways AS s WHERE s.version = (SELECT MAX(highest.version) FROM insert_ways AS highest WHERE s.id = highest.id);
DROP TABLE insert_ways;
DELETE FROM highest_ways WHERE highest_ways.version <= (SELECT ways.version FROM ways WHERE highest_ways.id = ways.id);

UPDATE ways SET 
 uid = (SELECT uid FROM highest_ways WHERE id = ways.id),
 version = (SELECT version FROM highest_ways WHERE id = ways.id),
 timestamp = (SELECT timestamp FROM highest_ways WHERE id = ways.id)
 WHERE id IN (SELECT id FROM highest_ways);
 
INSERT INTO ways SELECT * FROM highest_ways WHERE id NOT IN (SELECT id FROM ways WHERE id IN (SELECT id FROM highest_ways));

DELETE FROM way_tags WHERE way IN (SELECT id FROM highest_ways);
INSERT INTO way_tags SELECT DISTINCT a.way, a.k, a.v FROM insert_way_tags AS a, highest_ways AS b WHERE a.way = b.id AND a.version = b.version;

DELETE FROM way_nds WHERE way IN (SELECT id FROM highest_ways);
DELETE FROM insert_way_nds WHERE to_node NOT IN (SELECT id FROM nodes WHERE id IN (SELECT DISTINCT to_node FROM insert_way_nds));
INSERT INTO way_nds SELECT a.way,a.idx,a.to_node FROM insert_way_nds AS a, highest_ways AS b WHERE a.way = b.id AND a.version = b.version;


CREATE TEMPORARY TABLE highest_relations (id integer, uid integer, version integer, timestamp timestamptz);
INSERT INTO highest_relations SELECT DISTINCT s.id,s.uid,s.version,s.timestamp FROM insert_relations AS s WHERE s.version = (SELECT MAX(highest.version) FROM insert_relations AS highest WHERE s.id = highest.id);
DROP TABLE insert_relations;
DELETE FROM highest_relations WHERE highest_relations.version <= (SELECT relations.version FROM relations WHERE highest_relations.id = relations.id);

UPDATE relations SET
 uid = (SELECT uid FROM highest_relations WHERE id = relations.id),
 version = (SELECT version FROM highest_relations WHERE id = relations.id),
 timestamp = (SELECT timestamp FROM highest_relations WHERE id = relations.id)
 WHERE id IN (SELECT id FROM highest_relations);

INSERT INTO relations SELECT * FROM highest_relations WHERE id NOT IN (SELECT id FROM relations WHERE id IN (SELECT id FROM highest_relations));

DELETE FROM relation_tags WHERE relation IN (SELECT id FROM highest_relations);
INSERT INTO relation_tags SELECT DISTINCT a.relation, a.k, a.v FROM insert_relation_tags AS a, highest_relations AS b WHERE a.relation = b.id AND a.version = b.version;

DELETE FROM relation_members_node WHERE relation IN (SELECT id FROM highest_relations);
DELETE FROM relation_members_way WHERE relation IN (SELECT id FROM highest_relations);
DELETE FROM relation_members_relation WHERE relation IN (SELECT id FROM highest_relations);
DELETE FROM insert_relation_members_node WHERE to_node NOT IN (SELECT id FROM nodes WHERE id IN (SELECT DISTINCT to_node FROM insert_relation_members_node));
DELETE FROM insert_relation_members_way WHERE to_way NOT IN (SELECT id FROM ways WHERE id IN (SELECT DISTINCT to_way FROM insert_relation_members_way));
DELETE FROM insert_relation_members_relation WHERE to_relation NOT IN (SELECT id FROM relations WHERE id IN (SELECT DISTINCT to_relation FROM insert_relation_members_relation));
INSERT INTO relation_members_node SELECT a.relation,a.idx,a.to_node,a.role FROM insert_relation_members_node AS a, highest_relations AS b WHERE a.relation = b.id AND a.version = b.version;
INSERT INTO relation_members_way SELECT a.relation,a.idx,a.to_way,a.role FROM insert_relation_members_way AS a, highest_relations AS b WHERE a.relation = b.id AND a.version = b.version;
INSERT INTO relation_members_relation SELECT a.relation,a.idx,a.to_relation,a.role FROM insert_relation_members_relation AS a, highest_relations AS b WHERE a.relation = b.id AND a.version = b.version;