Talk:Colombia/Project-2010 floods/ImportDept

Perl script used to process OSM files for use in uploading

I prepared the following Perl script and ran it for every Department boundary that I uploaded. The resulting .osm file was opened in JOSM as its own layer and served as a reference and data source (the "reference layer"). Typically, a new layer was then created and the existing OSM data was downloaded into it (the "working layer"). Nodes and ways not already present in the "working layer" were selected in the "reference layer" and uploaded (using 'File'-'Upload Selection'); the uploaded material was then downloaded into the "working layer".

Getting an empty relation from the "reference layer" into the "working layer"

One manual step was required: uploading the empty relation shell for the Department being uploaded. Originally this was done with a separate Perl script, but a simple manual process was used later.

  1. In the "reference layer", select all ways which are part of the relation and remove them from the relation, leaving an empty relation (0 members).
  2. Select this alone in the "working layer" and 'Upload Selection'.
  3. Go to the OpenStreetMap website and find the changeset that you just created, or added to, with the relation shell upload.
  4. In the changeset, you will see the newly uploaded relation shell; click through to it, then copy the URL of the "XML Download" link.
  5. Go to the "working layer" and use the 'File'-'Open Location' function, pasting the copied URL into the dialog and unselecting the 'separate layer' checkbox.
  6. This will result in an empty relation being downloaded into the "working layer".
  7. Fill this relation with ways as you would any other relation.
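
If the relation id is already known (for example, from the changeset page), the URL for steps 4 and 5 can be assembled by hand instead of copied. The following is a minimal sketch, assuming the "XML Download" link resolves to the standard API 0.6 relation endpoint; the helper name relation_xml_url and the id 123456 are hypothetical and do not come from the original workflow.

```perl
use strict;
use warnings;

# Hypothetical helper: build the API 0.6 URL that returns the XML of one relation.
# Assumption: this matches the target of the "XML Download" link on the relation page.
sub relation_xml_url {
    my ($relation_id) = @_;
    # Uploaded relations always have a positive integer id.
    die "relation id must be a positive integer\n"
        unless defined $relation_id && $relation_id =~ /^[1-9]\d*$/;
    return "https://www.openstreetmap.org/api/0.6/relation/$relation_id";
}

# 123456 is a placeholder id, not a real Department relation.
print relation_xml_url(123456), "\n";
```

The printed URL is what would be pasted into the 'Open Location' dialog in step 5.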

The Perl code

use strict;
use warnings;
# composed by ceyockey 25 December 2010
# ceyockey is an OSM Wiki username

print "\nThis script processes .osm files used for uploading department borders for the country of Colombia according to revision guidelines found in the same wiki.  The scope of use is very limited - department .osm file revisions in relation to the 2010 November Flooding of Colombia.\n";

print "\nREQUIREMENTS\n\t* The input files need to have the extension .osm.\n\t* The output files will have the extension .osm and can be used in JOSM.\n\t* This was written with ActivePerl in mind and a particular file location relationship between executable and script.  This can be modified by revising the _scriptoffset_ and _baseinputfilename_ parameter values set during variable initialization.\n";

print "\nHOW IT WORKS\nThe script parses through an input file line by line.  It sets a variable to indicate whether the parser is presently in a WAY section of the XML or a RELATION section of the XML.  Output is initially to an array, which is printed at the end to an outfile.\n\t* If a line is to be removed from the file, this line is not included in the output array.\n\t* If a line is to be modified, the line is modified and pushed onto the output array.\n";
print "\nRULES USED\n\t* Lines in a WAY context which contain any of the following words are excluded from output:  clcfile, name, codane, area_ofici, entidad_te, objectid, is_in:state, DANE:departamento.\n\t* It is assumed that is_in:country appears on the same line as admin_level in the WAY context.  The script removes the is_in:country key from this line.\n\t* Lines in a RELATION context which contain codane as a key have the key name changed to divipola.\n\t* Lines in a RELATION context which contain area_ofici as a key have the key name changed to DANE:area.\n\t* In a RELATION context, if either of the key names departamen or codanedep is detected, an exception with line number is printed to screen (standard output) and the line is excluded from output.\n";

my $infile='';
my $outfile = '';
my $inlinecount='';
my $curline = '';
my @inlines=();
my @outlines=();
my $scriptoffset = "../scripts/";
my $baseinputfilename = "ADM_Departamentos_ways_";
my $inway = '';
my $inrel = '';
my $waynumber = '0';
my $relnumber = '0';
my $w = '';

print "\nI assume the file name starts with \"ADM_Departamentos_ways_\" with a number following.  What is that number?  ";
chomp($infile = <STDIN>);
$outfile = $scriptoffset.$baseinputfilename.$infile."cleaned.osm";
$infile = $scriptoffset.$baseinputfilename.$infile.".osm";
print "\n\nBeginning to process $infile\n";

open (OSMIN, "<", $infile) || die "Can't open $infile: $!";
@inlines = <OSMIN>;
open (OSMNEW, ">", $outfile) || die "Can't create $outfile: $!";

chomp @inlines;

$inlinecount = $#inlines + 1;

print "\n\tThere are $inlinecount lines in the origin file.\n";

for my $i (0 .. $#inlines) {
#	unless ($inlines[$i]) {next}
	$curline = $inlines[$i];
	if ($curline =~ /way id/i) {
		$waynumber = $waynumber +1;
		$inway = 'yes';
		$inrel = 'no';
		push @outlines, $curline;
		next;
	}
	if ($curline =~ /relation id/i) {
		$relnumber = $relnumber +1;
		$inway = 'no';
		$inrel = 'yes';
		push @outlines, $curline;
		next;
	}
	if ($inway eq 'yes') {
		if ($curline =~ /clcfile|name|codane|area_ofici|entidad_te|objectid|is_in:state|DANE:departamento/i) {next}
		if ($curline =~ /admin_level/i) {
			if ($curline =~ /is_in:country/i) {
				$curline = '<tag k="admin_level" v="4"/><tag k="boundary" v="administrative"/>';
				push @outlines, $curline;
				next;
			}
		}
	}
	if ($inrel eq 'yes') {
		if ($curline =~ /departamen$/i) {
			$w = $i + 1;
			print "\n\tEXCEPTION: key \"DEPARTAMEN\" found but not expected; see line number $w \n";
			next;
		}
		if ($curline =~ /codanedep/i) {
			$w = $i + 1;
			print "\n\tEXCEPTION: key \"CODANEDEP\" found but not expected; see line number $w \n";
			next;
		}
		if ($curline =~ /codane/i) {
			$curline =~ s/codane/divipola/i;
			push @outlines, $curline;
			next;
		}
		if ($curline =~ /area_ofici/i) {
			$curline =~ s/area_ofici/dane:area/i;
			$curline =~ s/dane/DANE/;
			push @outlines, $curline;
			next;
		}
	}
	push @outlines, $curline;
}

print "\n\tDone processing ... going to print out data.\n";

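foreach (@outlines) {
	print OSMNEW "$_\n";
}

# Close both handles so the output is flushed to disk before the script exits.
close(OSMIN);
close(OSMNEW) || die "Can't close $outfile: $!";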

print "\nSummary: $waynumber ways and $relnumber relations.\n";
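
To see the rules in action without touching real files, the filtering can be tried on a few lines held in memory. The following is a simplified sketch, not the script as used: the function name clean_lines, the sample ids and the tag values are made up for illustration, and the DEPARTAMEN check and the on-screen exception messages are left out.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the WAY / RELATION rules of the script above.
sub clean_lines {
    my @in = @_;
    my @out;
    my $context = '';    # 'way', 'relation', or '' before either has been seen
    for my $line (@in) {
        if ($line =~ /<way id/i)      { $context = 'way';      push @out, $line; next }
        if ($line =~ /<relation id/i) { $context = 'relation'; push @out, $line; next }
        if ($context eq 'way') {
            # Drop source-specific tags from ways.
            next if $line =~ /clcfile|name|codane|area_ofici|entidad_te|objectid|is_in:state|DANE:departamento/i;
            # Replace the combined admin_level / is_in:country line with fixed boundary tags.
            if ($line =~ /admin_level/i && $line =~ /is_in:country/i) {
                push @out, '<tag k="admin_level" v="4"/><tag k="boundary" v="administrative"/>';
                next;
            }
        }
        if ($context eq 'relation') {
            next if $line =~ /codanedep/i;    # unexpected key: dropped (message omitted here)
            if    ($line =~ /codane/i)     { $line =~ s/codane/divipola/i }
            elsif ($line =~ /area_ofici/i) { $line =~ s/area_ofici/DANE:area/i }
        }
        push @out, $line;
    }
    return @out;
}

# Made-up sample input: one way followed by one relation.
my @sample = (
    '<way id="-1">',
    '<nd ref="-10"/>',
    '<tag k="OBJECTID" v="7"/>',
    '<tag k="admin_level" v="4"/><tag k="is_in:country" v="Colombia"/>',
    '</way>',
    '<relation id="-2">',
    '<tag k="CODANE" v="05"/>',
    '<tag k="AREA_OFICI" v="63612"/>',
    '</relation>',
);
my @cleaned = clean_lines(@sample);
print "$_\n" for @cleaned;
```

In this sample the OBJECTID line is dropped from the way, the is_in:country line is replaced by the admin_level and boundary tags, and the two relation keys are renamed to divipola and DANE:area.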