This Perl script converts a KSJ2 (National-Land Numerical Information) lake and pond data XML file into an OSM XML file.



  • Install JOSM (a Java program; it requires the J2SE SDK or JRE).
  • Install Perl. ActivePerl can be downloaded free of charge from the ActiveState site.
  • Save the code below to a file with a suitable name (e.g. ksj2osm-lake.pl).

How to convert the data


  • Download the KSJ2 lake and pond data and decompress the zip file to obtain the XML file.
  • Put the XML file in the same directory as the script.
  • Open the Perl script in an editor, edit three initial values, and then save the script: $target_lpn specifies the lake/pond name in UTF-8 characters (you can copy and paste it from the list), $lake_type (0, 1 or 2) specifies the lake/pond type, and $water_type (0 or 1) specifies the water type.
    • For "$lake_type=2" (landuse=reservoir), the list of artificial lakes in Japan on Japanese Wikipedia (ja:wikipedia) is a useful reference.
  • Open a command prompt window and run the script in the directory that holds the script and the XML file. The output files (ksj2osm-lake.osm, ksj2osm-lake.log) are created in the same directory.
  • Open ksj2osm-lake.osm in JOSM, check and edit the data (see the notes below), and then upload it to the server.
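
As a quick reference for the two numeric settings, the sketch below maps $lake_type and $water_type to the tags that the script writes on outer ways, following the comments in the script source. The helper name settings_to_tags is hypothetical and is not part of the script; it only illustrates the mapping, including the fallback the script applies to unexpected values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ksj2osm-lake.pl): returns the two tags
# written on an outer way for a given pair of settings.
sub settings_to_tags {
    my ($lake_type, $water_type) = @_;
    my %lake  = (0 => "natural=water",
                 1 => "natural=marsh",
                 2 => "landuse=reservoir");
    my %water = (0 => "water_type=fresh",
                 1 => "water_type=salt");
    # Any other value falls back to natural=water / water_type=fresh,
    # as in the script's own else branches.
    my $l = exists $lake{$lake_type}   ? $lake{$lake_type}   : "natural=water";
    my $w = exists $water{$water_type} ? $water{$water_type} : "water_type=fresh";
    return ($l, $w);
}

my ($l, $w) = settings_to_tags(2, 0);
print "$l, $w\n";   # prints "landuse=reservoir, water_type=fresh"
```

With the default values in the script ($lake_type = 2, $water_type = 0) an outer way is therefore tagged as a fresh-water reservoir.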

Notes

  • The KSJ2 lake and pond data lacks the information below, so you need to enter it manually on the outer closed ways.
    • Lake/pond name in English: the "name" tag (inside the round brackets) and the "name:en" tag.
    • Lake/pond name in romanized Japanese: the "name:ja_rm" tag.
  • Direction of ways
    • For the outer ring of an area (area of water), the closed way should be clockwise.
    • For an inner ring of an area (area of land), the closed way should be anti-clockwise.
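
One way to check the direction of a closed way is the signed-area (shoelace) test. The sketch below is only an illustration and is not part of the conversion script; the function name is_clockwise and the [lat, lon] pair layout are assumptions made for this example. It treats longitude as x and latitude as y, so a negative signed area means the ring runs clockwise when viewed on a north-up map.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of [lat, lon] pairs describing a ring
# and returns true when the ring runs clockwise (north up, east right).
sub is_clockwise {
    my @pts = @_;
    my $sum = 0;
    for my $i (0 .. $#pts) {
        my ($lat1, $lon1) = @{ $pts[$i] };
        my ($lat2, $lon2) = @{ $pts[($i + 1) % @pts] };   # wrap to the first point
        $sum += $lon1 * $lat2 - $lon2 * $lat1;            # shoelace term x1*y2 - x2*y1
    }
    return $sum < 0;   # twice the signed area is negative for clockwise rings
}

# A small square visited north, then east, then south: clockwise.
my @ring = ([0, 0], [1, 0], [1, 1], [0, 1]);
print is_clockwise(@ring) ? "clockwise\n" : "anti-clockwise\n";   # prints "clockwise"
```

A ring whose first point is repeated at the end gives the same result, because the duplicated point contributes a zero term.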


As of 2008-04-12.
Bugfix 2010-04-08 by Tom.


use strict;
use warnings;
# use encoding "utf8"; 
use encoding "utf8", STDOUT => "shiftjis", STDERR => "shiftjis"; # for Windows
use Encode;
use open IO => "utf8";
use XML::Parser;
use XML::Simple;

# KSJ2 Lake and Pond Data
# National-Land Numerical Information (Lake and Pond) 2005, MLIT Japan
# 国土数値情報(湖沼データ)平成17年 国土交通省
# Files
#   Input
#     XML file : W09-05.xml
#   Output
#     Osm file : ksj2osm-lake.osm
#     Log file : ksj2osm-lake.log

our $file_in = "W09-05";
our $file_name = "ksj2osm-lake";

our $target_lpn = "亀山湖"; # lake/pond name (UTF-8), e.g. "亀山湖"; "*" or "" converts every lake/pond

our $lake_type = 2; # 0 : natural lake (approximate depth 5 meters or more), tagged with "natural=water".
                    # 1 : natural pond/marsh (approximate depth under 5 meters), tagged with "natural=marsh".
                    # 2 : artificial lake/pond/marsh, tagged with "landuse=reservoir".

our $water_type = 0; # 0 : fresh water, tagged with "water_type=fresh".
                     # 1 : brackish water/brine/salt water/seawater, tagged with "water_type=salt".

our $aac_code;
our %curveHash;
our %indirectHash;
our %lakeHash;
our $negative_id = 0;
our $node_ref;
our %nodes = ();
our $num_lands = 0;
our $num_nodes = 0;
our $num_waters = 0;
our $num_ways = 0;
our $pref_code;
our %surfaceHash;
our @workArray;
our %workHash;
our $workString;
our $xml_aac;
our $xml_pref;

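sub main() {

  my $parser = XML::Parser->new(ErrorContext => 3,
                                Handlers => {Init => \&handle_init,
                                             Start => \&handle_start,
                                             Char => \&handle_char,
                                             End => \&handle_end,
                                             Final => \&handle_final});

  # The call sequence is inferred from the subs defined below.
  open_log();                          # the parser handlers write to LOG
  get_codelist();                      # code lists are needed while parsing ksj:AAC
  $parser->parsefile("$file_in.xml");  # fills %curveHash, %surfaceHash and %lakeHash
  create_osm();                        # writes nodes and ways to $file_name.osm
  close_log();
}
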
sub handle_init() {

}

sub handle_start {

  my ($expat, $element, %hash) = @_;
  if ($element eq "jps:GM_Point"
    || $element eq "jps:GM_Curve"
    || $element eq "GM_PointRef.point"
    || $element eq "jps:GM_Surface"
    || $element eq "jps:GM_CompositeCurve.generator"
    || $element eq "jps:GM_OrientableCurve"
    || $element eq "jps:GM_OrientablePrimitive.primitive"
    || $element eq "ksj:GC01"
    || $element eq "ksj:ARE") {get_id(@_);}
  $workString = "";                              # add 2009-05-26
}

sub handle_char {

  my ($expat, $string) = @_;
#  $workString = $string;
#  $workString .= "";                             # mod 2009-05-26
  $workString = $string;                          # mod 2010-04-08 by Tom
}

sub handle_end {

  my ($expat, $element) = @_;
  if ($element eq "DirectPosition.coordinate"
    || $element eq "ksj:LPN"
    || $element eq "ksj:AAC"
    || $element eq "ksj:LDM"
    || $element eq "ksj:HOW") {get_element(@_);}
  elsif ($element eq "jps:GM_Point") {add_indirect();}
  elsif ($element eq "jps:GM_Curve") {add_curve();}
  elsif ($element eq "jps:GM_SurfaceBoundary.exterior") {add_exterior();}
  elsif ($element eq "jps:GM_SurfaceBoundary.interior") {add_interior();}
  elsif ($element eq "jps:GM_Surface") {add_surface();}
  elsif ($element eq "ksj:GC01") {add_lake();}
}

sub handle_final() {

  my $time = localtime(time);
  print LOG "***** End of processing $file_in.xml : $time\n";
  print "***** End of processing $file_in.xml : $time\n";

  # Dump every selected lake with its surface and curve data to the log.
  print LOG "*****\n";
  foreach my $item(keys %lakeHash) {
    print LOG "* 1st key: $item\n";
    foreach my $item2(keys %{$lakeHash{$item}}) {
      print LOG "** 2nd key: $item2, value: $lakeHash{$item}{$item2}\n";
      if ($item2 eq "ARE") {
        foreach my $item3(keys %{$surfaceHash{$lakeHash{$item}{$item2}}}) {
          print LOG "** 3rd key: $item3, value: $surfaceHash{$lakeHash{$item}{$item2}}{$item3}\n";
          if ($item3 eq "exterior" || $item3 eq "interior") {
            @workArray = split(/,/, $surfaceHash{$lakeHash{$item}{$item2}}{$item3});
            foreach my $item4(@workArray) {
              print LOG "** 4th key: $item4, value: $curveHash{$item4}\n";
            }
          }
        }
      }
    }
    print LOG "*****\n";
  }
}



sub open_log() {

  my $time = localtime(time);
  open(LOG, ">$file_name.log") or die "Cannot open $file_name.log: $!";
  print LOG "***** KSJ2 Lake and Pond Data 2005 : Start $time\n";
  print "***** KSJ2 Lake and Pond Data 2005 : Start $time\n";
  print LOG "***** Target : $target_lpn, lake type = $lake_type, water type = $water_type\n";
  print "***** Target : $target_lpn, lake type = $lake_type, water type = $water_type\n";
  print LOG "***** Extracting from $file_in.xml\n";
  print "***** Extracting from $file_in.xml\n";
}

sub close_log() {

  my $time = localtime(time);
  print LOG "***** Done!: End $time\n";
  close LOG;
  print "***** Done!: End $time\n";
}

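sub get_codelist() {

  # AdminAreaCd.xml: administrative area codes
  # (file name taken from this comment; assumed to be in the current directory)
  $xml_aac = "AdminAreaCd.xml" unless defined $xml_aac;
  $aac_code = XMLin($xml_aac, keyattr => ["code"]);

  # PrefCode.xml: prefecture codes (same assumption as above)
  $xml_pref = "PrefCode.xml" unless defined $xml_pref;
  $pref_code = XMLin($xml_pref, keyattr => ["code"]);
}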

sub get_id {

  my ($expat, $element, %hash) = @_;
  if ($element eq "GM_PointRef.point") {
    if (exists($hash{"idref"})) {
      if (exists($workHash{"points"})) {
        $workHash{"points"} .= ",";
      }
      $workHash{"points"} .= $indirectHash{$hash{"idref"}};
    }
    else {
      print LOG "* idref not found in element GM_PointRef.point.\n";
      print "* idref not found in element GM_PointRef.point.\n";
      while (my ($key, $value) = each(%hash)) {
        print LOG "key: $key , value: $value \n";
        print "key: $key , value: $value \n";
      }
      die "idref not found in element GM_PointRef.point.";
    }
  }
  elsif ($element eq "jps:GM_CompositeCurve.generator" ) {
    if (exists($hash{"idref"})) {
      $workHash{"CompositeCurve"} = $hash{"idref"};
    }
    else {
      print LOG "* idref not found in element jps:GM_CompositeCurve.generator.\n";
      print "* idref not found in element jps:GM_CompositeCurve.generator.\n";
      while (my ($key, $value) = each(%hash)) {
        print LOG "key: $key , value: $value \n";
        print "key: $key , value: $value \n";
      }
      die "idref not found in element jps:GM_CompositeCurve.generator.";
    }
  }
  elsif ($element eq "jps:GM_OrientableCurve" ) {
    if (exists($hash{"id"})) {
      $workHash{"id"} = $hash{"id"};
      $workHash{"element"} = $hash{"jps:GM_OrientableCurve"};
    }
    else {
      print LOG "* id not found in element jps:GM_OrientableCurve.\n";
      print "* id not found in element jps:GM_OrientableCurve.\n";
      while (my ($key, $value) = each(%hash)) {
        print LOG "key: $key , value: $value \n";
        print "key: $key , value: $value \n";
      }
      die "id not found in element jps:GM_OrientableCurve.";
    }
  }
  elsif ($element eq "jps:GM_OrientablePrimitive.primitive" ) {
    if (exists($workHash{"element"})) {
      if (exists($hash{"idref"})) {
        # Replace the orientable-curve id with the id of the curve it refers to.
        foreach my $item(keys %surfaceHash) {
          foreach my $item2(keys %{$surfaceHash{$item}}) {
            $surfaceHash{$item}{$item2} =~ s/\Q$workHash{"id"}\E/$hash{"idref"}/g;
          }
        }
        %workHash = ();
      }
      else {
        print LOG "* idref not found in element jps:GM_OrientablePrimitive.primitive.\n";
        print "* idref not found in element jps:GM_OrientablePrimitive.primitive.\n";
        while (my ($key, $value) = each(%hash)) {
          print LOG "key: $key , value: $value \n";
          print "key: $key , value: $value \n";
        }
        die "idref not found in element jps:GM_OrientablePrimitive.primitive.";
      }
    }
  }
  elsif ($element eq "ksj:ARE" ) {
    if (exists($hash{"idref"})) {
      $workHash{"ARE"} = $hash{"idref"};
    }
    else {
      print LOG "* idref not found in element ksj:ARE.\n";
      print "* idref not found in element ksj:ARE.\n";
      while (my ($key, $value) = each(%hash)) {
        print LOG "key: $key , value: $value \n";
        print "key: $key , value: $value \n";
      }
      die "idref not found in element ksj:ARE.";
    }
  }
  else {
    if (exists($hash{"id"})) {
      $workHash{"id"} = $hash{"id"};
    }
    else {
      print LOG "* id not found in element $element.\n";
      print "* id not found in element $element.\n";
      while (my ($key, $value) = each(%hash)) {
        print LOG "key: $key , value: $value \n";
        print "key: $key , value: $value \n";
      }
      die "id not found in element $element";
    }
  }
}

sub get_element {

  my ($expat, $element) = @_;
  if ($element eq "DirectPosition.coordinate" ) {
    if (exists($workHash{"points"})) {
      $workHash{"points"} .= ",";
    }
    $workHash{"points"} .= $workString;
  }
  elsif ($element eq "ksj:LPN" ) {$workHash{"LPN"} = $workString;}
  elsif ($element eq "ksj:AAC" ) {
    if ($workString =~ /[^0-9]/) {
      print LOG "* AAC is not numeric : $workString\n";
      print "* AAC is not numeric : $workString\n";
      while (my ($key, $value) = each(%workHash)) {
        print LOG "key: $key , value: $value \n";
        print "key: $key , value: $value \n";
      }
    }
    else {
      $workString = sprintf("%05d", $workString);
      if (exists($workHash{"AAC"})) {
        $workHash{"AAC"} .= ",";
      }
      $workHash{"AAC"} .= $workString;
      my $str = $aac_code->{'ksjc:C002'}->{'codelabel'}->{$workString}->{'label'};
      unless ($str) {
        # Unknown area code: fall back to the prefecture label and mark it with "*".
        $str = substr($workString, 0, 2);
        $str = $pref_code->{'ksjc:C001'}->{'codelabel'}->{$str}->{'label'};
        $str .= "*";
      }
      if (exists($workHash{"AAC_label"})) {
        $workHash{"AAC_label"} .= ",";
      }
      $workHash{"AAC_label"} .= $str;
    }
  }
  elsif ($element eq "ksj:LDM" ) {$workHash{"LDM"} = $workString;}
  elsif ($element eq "ksj:HOW" ) {$workHash{"HOW"} = $workString;}

  $workString = "";                              # add 2009-05-26
}

sub add_indirect() {
  $indirectHash{$workHash{"id"}} = $workHash{"points"};
  %workHash = ();
}

sub add_curve() {
  $curveHash{$workHash{"id"}} = $workHash{"points"};
  %workHash = ();
}

sub add_exterior() {
  if (exists($workHash{"exterior"})) {
    $workHash{"exterior"} .= ",";
  }
  $workHash{"exterior"} .= $workHash{"CompositeCurve"};
  delete $workHash{"CompositeCurve"};
}

sub add_interior() {
  if (exists($workHash{"interior"})) {
    $workHash{"interior"} .= ",";
  }
  $workHash{"interior"} .= $workHash{"CompositeCurve"};
  delete $workHash{"CompositeCurve"};
}

sub add_surface() {

  while (my ($key, $value) = each(%workHash)) {
    $surfaceHash{$workHash{"id"}}{$key} = $value;
  }
  %workHash = ();
}

sub add_lake() {

  # Keep the lake only if it matches the target name ("*" or "" keeps all).
  if ($target_lpn eq $workHash{"LPN"} || $target_lpn eq "*" || $target_lpn eq "") {

    $lakeHash{$workHash{"id"}}{"id"} = $workHash{"id"};
    $lakeHash{$workHash{"id"}}{"ARE"} = $workHash{"ARE"};
    $lakeHash{$workHash{"id"}}{"LPN"} = $workHash{"LPN"};
    $lakeHash{$workHash{"id"}}{"AAC"} = $workHash{"AAC"};
    $lakeHash{$workHash{"id"}}{"AAC_label"} = $workHash{"AAC_label"};
    if (exists($workHash{"LDM"})) {
      $lakeHash{$workHash{"id"}}{"LDM"} = $workHash{"LDM"};
    }
    if (exists($workHash{"HOW"})) {
      $lakeHash{$workHash{"id"}}{"HOW"} = $workHash{"HOW"};
    }
  }
  %workHash = ();
}

sub create_osm(){

  my $i = 0;
  my @array;

  open(OSM, ">$file_name.osm") or die "Cannot open $file_name.osm: $!";
  print OSM "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
  print OSM "<osm version=\"0.5\" generator=\"KSJ2OSM\">\n";

  # One way per curve of every exterior/interior boundary of each selected lake.
  foreach my $item(keys %lakeHash) {
    foreach my $item2(keys %{$surfaceHash{$lakeHash{$item}{"ARE"}}}) {
      if ($item2 eq "exterior" || $item2 eq "interior") {
        @array = split(/,/, $surfaceHash{$lakeHash{$item}{"ARE"}}{$item2});
        foreach my $item3(@array) {
          @workArray = split(/,/, $curveHash{$item3});
          foreach my $coordinate(@workArray) {
            $i = write_node($i, $item, $item2, $item3, $coordinate);
          }
          write_way($i, $item, $item2, $item3);
          $i = 0;
        }
      }
    }
  }

  print OSM "</osm>";
  close OSM;
  printf LOG 
    "***** Lake and Pond : %d nodes on %d ways\n"
    , $num_nodes, $num_ways;
  printf LOG 
    "***** Lake and Pond : water = %d ways, island = %d ways\n"
    , $num_waters, $num_lands;
  printf
    "***** Lake and Pond : %d nodes on %d ways\n"
    , $num_nodes, $num_ways;
  printf
    "***** Lake and Pond : water = %d ways, island = %d ways\n"
    , $num_waters, $num_lands;
}

sub write_node {

  my ($i, $item, $item2, $item3, $coordinate) = @_;
  my $node_id = 0;
  my ($lat, $long) = split(/\s/, $coordinate);

  unless (defined($lat) && defined($long)) {
    print LOG "Skip error data : id = $item/$item2/$item3, coordinate = $coordinate\n";
    print "Skip error data : id = $item/$item2/$item3, coordinate = $coordinate\n";
    return $i;
  }
  if (($lat eq "") || ($long eq "")) {
    print LOG "Skip error data : id = $item/$item2/$item3, coordinate = $coordinate, lat = $lat, long = $long\n";
    print "Skip error data : id = $item/$item2/$item3, coordinate = $coordinate, lat = $lat, long = $long\n";
    return $i;
  }
  if (($lat > 90) || ($lat < -90) || ($lat == 0) 
    || ($long > 180) || ($long < -180) || ($long == 0)) {
    print LOG "Skip error data : id = $item/$item2/$item3, coordinate = $coordinate, lat = $lat, long = $long\n";
    print "Skip error data : id = $item/$item2/$item3, coordinate = $coordinate, lat = $lat, long = $long\n";
    return $i;
  }

  my $tags = "<tag k=\"created_by\" v=\"National-Land-Numerical-Information_MLIT_Japan\"/>";
  $tags .= "<tag k=\"source\" v=\"KSJ2\"/>";
  $tags .= "<tag k=\"source_ref\" v=\"\"/>";
  $tags .= "<tag k=\"note\" v=\"National-Land Numerical Information (Lake and Pond) 2005, MLIT Japan\"/>";
  $tags .= "<tag k=\"note:ja\" v=\"国土数値情報(湖沼データ)平成17年 国土交通省\"/>";
  $tags .= "<tag k=\"KSJ2:lake_id\" v=\"$item\"/>";
  $tags .= "<tag k=\"KSJ2:ARE\" v=\"$lakeHash{$item}{'ARE'}\"/>";
  $tags .= "<tag k=\"KSJ2:curve_type\" v=\"$item2\"/>";
  $tags .= "<tag k=\"KSJ2:curve_id\" v=\"$item3\"/>";
  $tags .= "<tag k=\"KSJ2:coordinate\" v=\"$coordinate\"/>";
  $tags .= "<tag k=\"KSJ2:lat\" v=\"$lat\"/>";
  $tags .= "<tag k=\"KSJ2:long\" v=\"$long\"/>";
  $tags .= "<tag k=\"KSJ2:LPN\" v=\"$lakeHash{$item}{'LPN'}\"/>";

  if (exists($nodes{"$lat $long"})) {
    # Reuse the node already written for this coordinate.
    $node_id = $nodes{"$lat $long"};
  }
  else {
    $negative_id--;                      # new objects get unique negative ids (assumed to start at -1)
    $node_id = $negative_id;
    $nodes{"$lat $long"} = $node_id;
    my $node 
      = sprintf("<node id=\"%d\" visible=\"true\" lat=\"%s\" lon=\"%s\">$tags</node>"
      , $node_id, $lat, $long);
    print OSM "$node\n";
    $num_nodes++;                        # nodes written, for the log summary
  }

  $node_ref .= sprintf("<nd ref=\"%d\" />", $node_id);
  $i++;                                  # node references collected for the current way
  printf LOG "Node %d: %s, %s\n", $node_id, $lat, $long;
  return $i;
}

sub write_way {

  my ($i, $item, $item2, $item3) = @_;
  my $tmp_tags;

  my $tags = "<tag k=\"created_by\" v=\"National-Land-Numerical-Information_MLIT_Japan\"/>";
  $tags .= "<tag k=\"source\" v=\"KSJ2\"/>";
  $tags .= "<tag k=\"source_ref\" v=\"\"/>";
  $tags .= "<tag k=\"note\" v=\"National-Land Numerical Information (Lake and Pond) 2005, MLIT Japan\"/>";
  $tags .= "<tag k=\"note:ja\" v=\"国土数値情報(湖沼データ)平成17年 国土交通省\"/>";
  $tags .= "<tag k=\"KSJ2:lake_id\" v=\"$item\"/>";
  $tags .= "<tag k=\"KSJ2:ARE\" v=\"$lakeHash{$item}{'ARE'}\"/>";
  $tags .= "<tag k=\"KSJ2:curve_type\" v=\"$item2\"/>";
  $tags .= "<tag k=\"KSJ2:curve_id\" v=\"$item3\"/>";
  $tags .= "<tag k=\"KSJ2:AAC\" v=\"$lakeHash{$item}{'AAC'}\"/>";
  $tags .= "<tag k=\"KSJ2:AAC_label\" v=\"$lakeHash{$item}{'AAC_label'}\"/>";
  $tags .= "<tag k=\"KSJ2:LPN\" v=\"$lakeHash{$item}{'LPN'}\"/>";
  if (exists($lakeHash{$item}{"LDM"})) {
    $tags .= "<tag k=\"KSJ2:LDM\" v=\"$lakeHash{$item}{'LDM'}\"/>";
  }
  if (exists($lakeHash{$item}{"HOW"})) {
    $tags .= "<tag k=\"KSJ2:HOW\" v=\"$lakeHash{$item}{'HOW'}\"/>";
  }
#  $tags .= "<tag k=\"KSJ2:points\" v=\"$curveHash{$item3}\"/>"; # comment out to reduce data size.

  if ($item2 eq "exterior") {
    if    ($lake_type == 0) {$tmp_tags = "<tag k=\"natural\" v=\"water\"/>";}
    elsif ($lake_type == 1) {$tmp_tags = "<tag k=\"natural\" v=\"marsh\"/>";}
    elsif ($lake_type == 2) {$tmp_tags = "<tag k=\"landuse\" v=\"reservoir\"/>";}
    else                    {$tmp_tags = "<tag k=\"natural\" v=\"water\"/>";}

    if    ($water_type == 0) {$tmp_tags .= "<tag k=\"water_type\" v=\"fresh\"/>";}
    elsif ($water_type == 1) {$tmp_tags .= "<tag k=\"water_type\" v=\"salt\"/>";}
    else                     {$tmp_tags .= "<tag k=\"water_type\" v=\"fresh\"/>";}

    $tmp_tags .= "<tag k=\"layer\" v=\"-2\"/>";
    $tmp_tags .= "<tag k=\"name\" v=\"$lakeHash{$item}{'LPN'} ()\"/>";
    $tmp_tags .= "<tag k=\"name:en\" v=\"\"/>";
    $tmp_tags .= "<tag k=\"name:ja\" v=\"$lakeHash{$item}{'LPN'}\"/>";
    $tmp_tags .= "<tag k=\"name:ja_rm\" v=\"\"/>";
    $num_waters++;                       # outer (water) ways, for the log summary
  }
  else {
    $tmp_tags = "<tag k=\"natural\" v=\"land\"/><tag k=\"layer\" v=\"-1\"/>";
    $num_lands++;                        # inner (island) ways, for the log summary
  }

  $negative_id--;                        # ways share the negative id counter with nodes (assumed)
  $num_ways++;
  my $way 
    = sprintf("<way id=\"%d\" action=\"modify\" visible=\"true\">$tmp_tags$node_ref$tags</way>"
    , $negative_id);
  print OSM "$way\n";
  printf LOG "Way %d: %d nodes id=%s/%s/%s LPN=%s\n"
    , $negative_id, $i, $item, $item2, $item3, $lakeHash{$item}{"LPN"};
  $node_ref = "";
}

# run this script.

main();

# end of script