Colombia/Project-2010 floods/ImportMuni/Perl for OSMfile revisions

Note: do not confuse this script with the tool mentioned at http://lists.openstreetmap.org/pipermail/talk-co/2010-December/001619.html


On the left, before processing; on the right, after processing.

After manually revising several .osm files according to the rules at WikiProject Colombia/OCHA Boundary Import, I wrote a script to speed up the process. The code below works for the files I have encountered so far on the ImportMuni page.

The image at right shows a before and after view of way data, which is the primary content altered by this script.
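
As a rough, text-only illustration of that before/after change, the sketch below applies the WAY-context rules from the script to a small sample. The helper name `clean_way_lines` and every value in the sample (ids, names, tag contents) are invented for illustration; they are not taken from the real import files.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): applies the
# WAY-context rules to a list of lines and returns the lines that survive.
sub clean_way_lines {
    my @in = @_;    # work on a copy so the caller's data is untouched
    my @out;
    for my $line (@in) {
        # Rule 1: drop any line that mentions one of the excluded keys.
        next if $line =~ /clcfile|name|codane|area_ofici|entidad_te|objectid|is_in:state|DANE:departamento/i;
        # Rule 2: a line carrying both admin_level and is_in:country is
        # rewritten so that only admin_level and boundary remain.
        if ($line =~ /admin_level/i && $line =~ /is_in:country/i) {
            $line = "\t" . '<tag k="admin_level" v="6"/><tag k="boundary" v="administrative"/>';
        }
        push @out, $line;
    }
    return @out;
}

# Invented sample data, for illustration only.
my @sample = (
    '<way id="-1">',
    "\t" . '<nd ref="-10"/>',
    "\t" . '<tag k="NAME" v="Example"/>',
    "\t" . '<tag k="OBJECTID" v="1"/>',
    "\t" . '<tag k="admin_level" v="6"/><tag k="boundary" v="administrative"/><tag k="is_in:country" v="Colombia"/>',
    '</way>',
);

print "$_\n" for clean_way_lines(@sample);
```

In this sample the NAME and OBJECTID lines are dropped, the admin_level line loses its is_in:country tag, and the way and node lines pass through unchanged.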

Prepared by — Ceyockey

use strict;
use warnings;
# composed by ceyockey 25 December 2010
# ceyockey is an OSM Wiki username

print "\nThis script processes .osm files used for uploading municipality borders for the country of Colombia according to revision guidelines found in the same wiki.  The scope of use is very limited - municipality .osm file revisions in relation to the 2010 November Flooding of Colombia.\n";

print "\nREQUIREMENTS\n\t* The input files need to have the extension .osm.\n\t* The output files will have the extension .osm and can be used in JOSM.\n\t* This was written with ActivePerl in mind and a particular file location relationship between executable and script.  This can be modified by revising the _scriptoffset_ and _baseinputfilename_ parameter values set during variable initialization.\n";

print "\nHOW IT WORKS\nThe script parses through an input file line by line.  It sets a variable to indicate whether the parser is presently in a WAY section of the XML or a RELATION section of the XML.  Output is initially to an array, which is printed at the end to an outfile.\n\t* If a line is to be removed from the file, this line is not included in the output array.\n\t* If a line is to be modified, the line is modified and pushed onto the output array.\n";
print "\nRULES USED\n\t* Lines in a WAY context which contain any of the following words are excluded from output:  clcfile, name, codane, area_ofici, entidad_te, objectid, is_in:state, DANE:departamento.\n\t* It is assumed that is_in:country appears on the same line as admin_level in the WAY context.  The script removes the is_in:country key from this line.\n\t* Lines in a RELATION context which contain area_ofici as a key have the key name changed to DANE:area.\n\t* Lines in a RELATION context which contain codane as a key have the key name changed to divipola.\n\t* In a RELATION context, if either of the key names departamen or codanedep is detected, an exception with line number is printed to the screen (standard output), but the line is allowed to pass into output.\n";

my $infile='';
my $outfile = '';
my $inlinecount='';
my $curline = '';
my @inlines=();
my @outlines=();
my $scriptoffset = "../scripts/";
my $baseinputfilename = "ADM_Municipios_ways_";
my $inway = '';
my $inrel = '';
my $waynumber = '0';
my $relnumber = '0';
my $w = '';

print "\nI assume the file name starts with \"ADM_Municipios_ways_\" with a number following.  What is that number?  ";
chomp($infile = <STDIN>);
$outfile = $scriptoffset.$baseinputfilename.$infile."cleaned.osm";
$infile = $scriptoffset.$baseinputfilename.$infile.".osm";
print "\n\nBeginning to process $infile\n";

open (OSMIN, "<", $infile) || die "Can't open $infile: $!";
@inlines = <OSMIN>;
close (OSMIN);
open (OSMNEW, ">", $outfile) || die "Can't create $outfile: $!";

chomp @inlines;

$inlinecount = $#inlines + 1;

print "\n\tThere are $inlinecount lines in the origin file.\n";

for my $i (0 .. $#inlines) {
#	unless ($inlines[$i]) {next}
	$curline = $inlines[$i];
	if ($curline =~ /way id/i) {
		$waynumber = $waynumber +1;
		$inway = 'yes';
		$inrel = 'no';
		push @outlines, $curline;
		next;
	}
	if ($curline =~ /relation id/i) {
		$relnumber = $relnumber +1;
		$inway = 'no';
		$inrel = 'yes';
		push @outlines, $curline;
		next;
	}
	if ($inway eq 'yes') {
		if ($curline =~ /clcfile|name|codane|area_ofici|entidad_te|objectid|is_in:state|DANE:departamento/i) {next}
		if ($curline =~ /admin_level/i) {
			if ($curline =~ /is_in:country/i) {
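				# "\t" must be double-quoted to produce a real tab; in single quotes it is a literal backslash and t.
				$curline = "\t" . '<tag k="admin_level" v="6"/><tag k="boundary" v="administrative"/>';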
				push @outlines, $curline;
				next;
			}
		}
	}
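	if ($inrel eq 'yes') {
		# \b stops the match at the end of the key, so DANE:departamento is not reported;
		# the earlier trailing $ only matched when the line itself ended in "departamen".
		if ($curline =~ /departamen\b/i) {
			$w = $i + 1;
			print "\n\tEXCEPTION: key \"DEPARTAMEN\" found but not expected; see line number $w \n";
			# Report only: the line still passes into output, as stated under RULES USED.
			push @outlines, $curline;
			next;
		}
		if ($curline =~ /codanedep/i) {
			$w = $i + 1;
			print "\n\tEXCEPTION: key \"CODANEDEP\" found but not expected; see line number $w \n";
			# Report only: the line still passes into output, as stated under RULES USED.
			push @outlines, $curline;
			next;
		}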
		if ($curline =~ /codane/i) {
			$curline =~ s/codane/divipola/i;
			push @outlines, $curline;
			next;
		}
		if ($curline =~ /area_ofici/i) {
			$curline =~ s/area_ofici/dane:area/i;
			$curline =~ s/dane/DANE/;
			push @outlines, $curline;
			next;
		}
	}
	push @outlines, $curline;
}

print "\n\tDone processing ... going to print out data.\n";

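foreach (@outlines) {
	print OSMNEW "$_\n";
}
# Closing explicitly flushes the output and surfaces any write error.
close (OSMNEW) || die "Can't close $outfile: $!";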

print "\nSummary: $waynumber ways and $relnumber relations.\n";
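
The RELATION-context key renames can also be tried in isolation. The following is a minimal sketch, not the script itself: the helper name `rename_relation_keys` and the sample tag values are invented for illustration, and the unexpected keys are simply passed through unchanged, following the rule text printed by the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): applies the
# RELATION-context key renames to a single line and returns the result.
sub rename_relation_keys {
    my ($line) = @_;
    # Unexpected keys are left untouched here; the full script reports them.
    # This check comes first so that codanedep is not rewritten as divipoladep.
    return $line if $line =~ /departamen\b|codanedep/i;
    $line =~ s/codane/divipola/i;         # codane     -> divipola
    $line =~ s/area_ofici/DANE:area/i;    # area_ofici -> DANE:area
    return $line;
}

# Invented sample lines, for illustration only.
print rename_relation_keys('<tag k="CODANE" v="00000"/>'), "\n";
print rename_relation_keys('<tag k="AREA_OFICI" v="0.0"/>'), "\n";
print rename_relation_keys('<tag k="CODANEDEP" v="00"/>'), "\n";
```

Testing codanedep before codane matters because the shorter pattern also matches the longer key.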