Script for merging duplicate nodes in a targeted way

From OpenStreetMap Wiki

Targeted dedupe

[ Abandoned: Careful use of Xapi and the JOSM Validator plugin can do this already. See LINZ#Fixing_duplicate_nodes_between_two_layers ]

[ Comment on abandonment: The JOSM Validator fix cited above as making this proposal obsolete works by deleting one node in each group of duplicates. For many purposes that require a mass merge of nodes, the Validator-based solution is therefore not appropriate. ]

Purpose

The aim of this script is to merge nodes shared between two OSM layers or .osm files. It is not intended to be an autonomous bot; the results should still be reviewed and uploaded by hand using an editor such as JOSM or Merkaartor.

For example, it could merge nodes between a golf course area boundary layer and a fence line layer, or between roads and gates.

The first generation of the script assumes that the coordinate data comes from the same source, so the lat/lon strings match exactly and there are no floating point precision issues. (This is the case for the LINZ bulk data import, which this script is initially being written for.)
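To illustrate the exact-match assumption, here is a minimal Python sketch (standard library only) that indexes nodes by their raw lat/lon attribute strings. The function name and the sample data are hypothetical. Because nothing is converted to floating point, no precision is lost, but a match is only as reliable as the formatting: "-43.950" and "-43.95" end up as different keys.

```python
import xml.etree.ElementTree as ET


def index_nodes_by_coords(osm_xml):
    """Map each exact (lat, lon) string pair to the ids of nodes at that spot."""
    root = ET.fromstring(osm_xml)
    index = {}
    for node in root.iter("node"):
        # Use the raw attribute strings: no float conversion, so no rounding.
        key = (node.get("lat"), node.get("lon"))
        index.setdefault(key, []).append(node.get("id"))
    return index


sample = """<osm version="0.6">
  <node id="-1" lat="-43.95" lon="-176.56"/>
  <node id="-2" lat="-43.95" lon="-176.56"/>
  <node id="-3" lat="-43.950" lon="-176.56"/>
</osm>"""

print(index_nodes_by_coords(sample))
```

Nodes -1 and -2 share a key and are detected as duplicates; node -3 lands under its own key even though it is at the same position, which is why this approach only suits data exported from a single source.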

In future, a threshold option might be added so that nearly co-located nodes also match (e.g. if(fabs(x0-x1) < epsilon*10), applied to both coordinates).
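A minimal Python sketch of such a threshold test, assuming the coordinates have already been parsed to floats. The function name and the 1e-6 degree default are hypothetical choices; the original only suggests a multiple of some epsilon. Note that with a tolerance, a plain dictionary lookup on exact strings no longer finds every match, so a real implementation would also need something like a coarse grid to avoid comparing every pair of nodes.

```python
# Hypothetical default: 1e-6 degrees, roughly 0.1 m of latitude.
DEFAULT_TOLERANCE = 1e-6


def nodes_match(lat0, lon0, lat1, lon1, tolerance=DEFAULT_TOLERANCE):
    """Treat two nodes as duplicates if both coordinates differ by < tolerance.

    Longitudes are compared naively, so points on either side of the
    180th meridian are never matched.
    """
    return abs(lat0 - lat1) < tolerance and abs(lon0 - lon1) < tolerance


print(nodes_match(-43.95, -176.56, -43.9500001, -176.56))  # True
print(nodes_match(-43.95, -176.56, -43.951, -176.56))      # False
```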

Design

Presumably it will be written in Perl or Python, or as a plugin for JOSM (Java) or Merkaartor (C++).

If the script reads two input .osm files and writes one output .osm file, it is probably best to work on .osm files freshly exported from the source PostGIS database. If it runs as an editor plugin instead, it will have to create its own live changeset.

  • Perl might have an advantage due to the existing OSM XML read/write code at svn.osm.org, for example: http://trac.openstreetmap.org/browser/applications/utils/sanitize/sanitize.pl
  • Python might have an advantage due to its readability and its number of helper libraries
  • Java might have an advantage due to JOSM's mature plugin infrastructure
  • C++ might have an advantage due to Merkaartor's easy-to-use GUI


If one or both layers have already been uploaded, they should first be downloaded into individual .osm files using Xapi. For example:

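# Bounding box as west,south,east,north (Chatham Islands)
BBOX="-177,-44.5,-175.5,-43.5"
BASE_URL="http://osmxapi.hypercube.telascience.org/api/0.6"

# Fence lines from LINZ source version V16 inside the bounding box
wget -O chatham_fences.osm \
 "$BASE_URL/*[LINZ:source_version=V16][man_made=fence][bbox=$BBOX]"

# Golf course areas from the same source version and bounding box
wget -O chatham_golf_course.osm \
 "$BASE_URL/*[LINZ:source_version=V16][leisure=golf_course][bbox=$BBOX]"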
(For more on LINZ+Xapi see LINZ#Fixing_bulk_tagging_mistakes_after_upload)

In future, downloading the two layers could happen automatically, with the user only having to supply the key=value pairs. For now, this is done manually.
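A first step towards that automation could be building the Xapi query from the user's key=value pairs. The Python sketch below reproduces the URL shape of the wget examples; the function name and argument layout are hypothetical, and fetching the result (e.g. with wget or urllib.request) is left out.

```python
BASE_URL = "http://osmxapi.hypercube.telascience.org/api/0.6"


def build_xapi_url(tags, bbox, base_url=BASE_URL):
    """Build an Xapi "*" (any element type) query URL.

    tags: list of (key, value) pairs, e.g. [("man_made", "fence")]
    bbox: (west, south, east, north) in degrees
    """
    predicates = "".join(f"[{key}={value}]" for key, value in tags)
    bbox_str = ",".join(str(coord) for coord in bbox)
    return f"{base_url}/*{predicates}[bbox={bbox_str}]"


print(build_xapi_url(
    [("LINZ:source_version", "V16"), ("man_made", "fence")],
    (-177, -44.5, -175.5, -43.5),
))
```

The brackets are left unescaped, matching the quoted URLs passed to wget.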

Before you start, decide which of the two layers will be the gobbler and which will be the gobbled. Typically the smaller or points-only map is consumed by the bigger or polyline/area map, and in turn lines might be consumed by areas. It is usually not appropriate to merge two points-only layers.
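The rule of thumb above can be sketched in Python as a small helper that picks the roles from two .osm documents. It is a simplification resting on assumptions: a layer counts as points-only when it has no <way> elements, size is measured by node count, and the finer "lines are consumed by areas" distinction is not attempted. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize(osm_xml):
    """Return (node_count, has_ways) for one .osm document."""
    root = ET.fromstring(osm_xml)
    return len(root.findall("node")), root.find("way") is not None


def choose_roles(xml_a, xml_b):
    """Return ("a", "b") or ("b", "a") meaning (gobbled, gobbler)."""
    nodes_a, ways_a = summarize(xml_a)
    nodes_b, ways_b = summarize(xml_b)
    if not ways_a and not ways_b:
        raise ValueError("both layers are points-only; merging is usually not appropriate")
    if ways_a != ways_b:
        # A points-only layer is gobbled by the layer that has ways.
        return ("b", "a") if ways_a else ("a", "b")
    # Both layers have ways: the smaller one (by node count) is gobbled.
    return ("a", "b") if nodes_a <= nodes_b else ("b", "a")
```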

Pseudo Code

map1 = "smaller_map.osm"   # the gobbled layer, e.g. gates
map2 = "bigger_map.osm"    # the gobbler layer, e.g. roads

open map1, map2

# i and j are nodes

for each node i in map1 do:
   tags_a = load_tags(map1, i)
   # keep going even if no tags found

   for each node j in map2 do:
      if( i.lat == j.lat && i.lon == j.lon ):
         print "Duplicate nodes found: " i.node_id "," j.node_id
         tags_b = load_tags(map2, j)
         for each tag in tags_a do:
            if( ! exist(tag.key, tags_b) ):
               add_tag(tag, tags_b)
            else if( tags_b[tag.key] != tag.value ):
               # conflict: same key, two different values; report for review
               print "Tag conflict on key " tag.key
         save_tags(map2, j, tags_b)
         # update map1's ways to use map2's equivalent node instead of its own
         # one; this must happen before node i is dropped
         if( map1.type != "point" ):
            update_ways(map1, i.node_id, j.node_id)   # from, to
         drop_node(map1, i)
         break   # i has been merged; move on to the next node in map1

close map1, map2
print "Done."
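The pseudocode above can be turned into a runnable Python sketch using only the standard library's xml.etree.ElementTree. Like the first-generation design, it assumes duplicate nodes have identical lat/lon strings, and it swaps the nested loop for a dictionary lookup so the run time grows roughly with the total number of nodes rather than with their product. Assumptions to note: relation members are not updated; uploaded (positive-id) objects are flagged with JOSM's action="modify"/action="delete" attributes rather than removed; and the script name and "_merged" output naming are hypothetical. Treat it as a starting point whose output is reviewed in an editor, not a finished tool.

```python
import os
import sys
import xml.etree.ElementTree as ET


def is_new(elem):
    """Negative ids mark objects that have never been uploaded."""
    return elem.get("id", "").startswith("-")


def mark_modified(elem):
    # JOSM's .osm extension: action="modify" flags a changed uploaded object.
    if not is_new(elem):
        elem.set("action", "modify")


def merge_duplicates(small_root, big_root):
    """Merge nodes of small_root into exactly co-located nodes of big_root.

    Both arguments are parsed <osm> root elements and are modified in place.
    Returns a dict mapping each gobbled node id to its gobbler node id.
    """
    # Index the gobbler's nodes by their exact lat/lon strings (no float maths).
    big_index = {}
    for node in big_root.findall("node"):
        big_index.setdefault((node.get("lat"), node.get("lon")), node)

    replaced = {}
    for node in small_root.findall("node"):  # a list, so removal is safe
        match = big_index.get((node.get("lat"), node.get("lon")))
        if match is None:
            continue
        print(f"Duplicate nodes found: {node.get('id')},{match.get('id')}")
        existing = {t.get("k"): t.get("v") for t in match.findall("tag")}
        for tag in node.findall("tag"):
            key, value = tag.get("k"), tag.get("v")
            if key not in existing:
                ET.SubElement(match, "tag", k=key, v=value)
                existing[key] = value
                mark_modified(match)
            elif existing[key] != value:
                # Same key, two different values: keep the gobbler's, report it.
                print(f"  tag conflict on {key}: {existing[key]} vs {value}")
        replaced[node.get("id")] = match.get("id")
        if is_new(node):
            small_root.remove(node)       # never uploaded: just drop it
        else:
            node.set("action", "delete")  # uploaded: ask the editor to delete it

    # Point the small layer's ways at the gobbler's equivalent nodes.
    for way in small_root.findall("way"):
        for nd in way.findall("nd"):
            if nd.get("ref") in replaced:
                nd.set("ref", replaced[nd.get("ref")])
                mark_modified(way)
    return replaced


def merged_name(path):
    base, ext = os.path.splitext(path)
    return f"{base}_merged{ext}"


def main(small_path, big_path):
    small_tree, big_tree = ET.parse(small_path), ET.parse(big_path)
    merge_duplicates(small_tree.getroot(), big_tree.getroot())
    # Write new files instead of overwriting the inputs.
    for path, tree in ((small_path, small_tree), (big_path, big_tree)):
        tree.write(merged_name(path), encoding="utf-8", xml_declaration=True)
    print("Done.")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} GOBBLED.osm GOBBLER.osm")
```

For the Chatham example, where fence lines are consumed by golf course areas, a run might look like `python3 merge_nodes.py chatham_fences.osm chatham_golf_course.osm`, producing chatham_fences_merged.osm and chatham_golf_course_merged.osm to open and review in JOSM.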