Cropping OSM with awk

Note: Software described on this page or in this section is unlikely to be compatible with API version 0.5, deployed on 8 October 2007. If you have fixed the software, or concluded that this notice does not apply, remove this notice.

Objective

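This script extracts a region of interest from planet.osm while keeping the OSM format. Usage is described in the header comment of the script. The bounding box is hard-coded in the condition below the comment "DEFINE BOUNDING BOX HERE"; edit the latitude and longitude limits there.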

Tools

You need gawk to run this script, because it uses the asort() function, which is a gawk extension.
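
If gawk is not available, the sorting step could probably be avoided, because the buffer indices are consecutive integers. The following is a minimal sketch under that assumption and is not part of the original script; the function name print_in_order and the variable start are made up for illustration. It collects lines the way the script does and prints them back with a counting loop, which any POSIX awk supports.

```shell
# Hypothetical helper: echo stdin back in insertion order without asort().
print_in_order() {
  awk '
    BEGIN { start = 10000; b = start }   # same starting index as the script
    { buff[b++] = $0 }                   # collect each line in the buffer
    END {
      # indices run from start to b-1, so counting upwards restores the order
      for (i = start; i < b; i++) print buff[i]
    }'
}

printf 'first\nsecond\nthird\n' | print_in_order
```

The design choice here is to rely on the known index range instead of iterating with "for (i in buff)", whose order is unspecified.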

The script

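# crop_osm.awk
# crop OpenStreetMap data to a bounding box
# usage (needs gawk; the input must be the file planet.osm in the current
# directory, because BEGIN and END read its first and last lines by name):
# cat planet.osm | gawk -f crop_osm.awk > crop.osm
# Authors: Alexander Dusleag, Florian Kindl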

BEGIN{
  FS="\"";
  # <?xml ...>, <osm ...>
  system("head -n 2 planet.osm");
  mode="undef";
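  # next buffer index; starting at 10000 keeps all indices five digits wide
  # (up to 99999), so asort() orders them the same as strings or as numbers
  b=10000;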
}

# parts of one node, segment or way are collected in array buff
# flushBuff() prints whole buff and resets the array
function resetBuff()
{
  delete buff;
  b=10000;
  mode="undef";
}
function flushBuff()
{
  # sort buffer
  j = 1
  for (i in buff)
  {
    ind[j] = i    # index value becomes element value
    j++
  }
  n = asort(ind)    # index values are now sorted
  for (i = 1; i <= n; i++)
  {
    print buff[ind[i]]
  }
  delete ind;
  resetBuff();
}

{
  # if mode is undef, check whether a node, segment or way starts on this line
  if (mode == "undef")
  {
    # keep nodes within bbox and remember their id
    # DEFINE BOUNDING BOX HERE
    if (($1 ==  "  <node id=") && ($4 >= 42) && ($4 <= 50) && ($6 >= 4) && ($6 <= 18))
    {
      mode="node";
      nodes[$2]=1;
      buff[b++]=$0;
      if ($9 == "/>")
      {
        flushBuff();
      }
    }
    # keep segments completely within bbox and remember their id
    else if (($1 == "  <segment id=") && ($4 in nodes) && ($6 in nodes))
    {
      mode="seg";
      segs[$2]=1;
      buff[b++]=$0;
      if ($9 == "/>")
      {
        flushBuff();
      }
    }
    # keep ways if 1 or more segments within bbox
    else if ($1 == "  <way id=")
    {
      mode="way";
      waysegs=0;
      buff[b++]=$0;
    }
    else
    {
       #print;
    }
  }

  # if mode is not undef continue here
  # node, seg: write lines to buff and 
  #            print it after closing tag
  else if (mode == "node")
  {
    buff[b++]=$0;
    if ($1 == "  </node>")
    {
      flushBuff();
    }
  }
  else if (mode == "seg")
  {
    buff[b++]=$0;
    if ($1 == "  </segment>")
    {
      flushBuff();
    }
  }
  # way: count segments within bbox and
  #      print only if 1 or more found
  else if (mode == "way")
  {
    if ($1 == "    <seg id=")
    {
      if ($2 in segs)
      {
        buff[b++]=$0;
        waysegs++;
      }
    }
    else if ($1 == "  </way>")
    {
      if (waysegs > 0)
      {
        buff[b++]=$0;
        flushBuff();
      }
      else
      {
        resetBuff();
      }
    }
    else
    {
      buff[b++]=$0;
    }
  }
  else
  {
    # unreachable: mode is always undef, node, seg or way
    print "unexpected mode: " mode > "/dev/stderr"
  }
  # tell where we are in planet.osm
  if ( NR%10000 == 0) {
    printf "\r%s", NR > "/dev/stderr";
  }
}
END{
  # </osm>
  system("tail -n 1 planet.osm")
}
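
The bounding box is hard-coded in the script. As a hedged sketch of one alternative, the limits could be passed in with awk's -v option. The function name filter_nodes and the variable names minlat, maxlat, minlon and maxlon are assumptions for illustration, and the sample data is invented. The sketch only lists the ids of matching nodes, and it assumes the same node line layout that the script relies on.

```shell
# Hypothetical helper: print the ids of nodes inside a bounding box.
# Arguments: minlat maxlat minlon maxlon
filter_nodes() {
  awk -F'"' -v minlat="$1" -v maxlat="$2" -v minlon="$3" -v maxlon="$4" '
    # with the double quote as separator: $2 = id, $4 = lat, $6 = lon
    $1 ~ /<node id=$/ &&
    $4 + 0 >= minlat + 0 && $4 + 0 <= maxlat + 0 &&
    $6 + 0 >= minlon + 0 && $6 + 0 <= maxlon + 0 { print $2 }
  '
}

# made-up sample input; only node 1 lies inside 42..50 / 4..18
printf '%s\n' \
  '  <node id="1" lat="47.2" lon="11.4" timestamp="2007-01-01T00:00:00+00:00"/>' \
  '  <node id="2" lat="52.5" lon="13.4" timestamp="2007-01-01T00:00:00+00:00"/>' |
  filter_nodes 42 50 4 18
```

Adding 0 to each value forces a numeric comparison, so a latitude such as 9.5 is not compared as text against 42.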

Authors

Alexander Dusleag and Florian Kindl, Tirol Atlas (University of Innsbruck)

For any questions, write to:

  • a . dusleag (a) uibk . ac . at
  • florian . kindl (a) uibk . ac . at