Key:boundary/statistics

From OpenStreetMap Wiki

Following the debate on trac ticket 1332, I wanted to create statistics on the size of the objects (closed ways, and relations whose member ways can be arranged to form one large closed way) for the different zoom levels. Having done so, I want to share the results with others.

Getting the data

To get the necessary data I queried Xapi for all objects tagged with admin_level values from 1 to 10, using this little shell script:

for i in $(seq 1 10); do
   wget -O admin_level_$i.osm "http://osmxapi.hypercube.telascience.org/api/0.5/*%5Badmin_level=$i%5D";
done

Determining way area

To determine the area I wrote a script which parses the .osm file and handles two cases.

Closed ways

For closed ways I calculate the area using the formula for the area of a polygon from Wikipedia.
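The polygon area formula can be sketched as follows. This is only an illustration, not the code from areastats.pl: the function name `polygon_area` and the `[lon, lat]` node layout are assumptions made for the example. It treats longitude and latitude as plain planar coordinates, which is consistent with the results being reported in arc degrees squared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shoelace formula: A = 1/2 * | sum_i (x_i * y_{i+1} - x_{i+1} * y_i) |
# $nodes is a reference to a list of [lon, lat] pairs (hypothetical layout).
# The ring may or may not repeat its first node at the end; both work,
# because the index wraps around via the modulo below.
sub polygon_area {
    my ($nodes) = @_;
    my $n = scalar @$nodes;
    return 0 if $n < 3;                      # fewer than 3 nodes enclose no area
    my $sum = 0;
    for my $i (0 .. $n - 1) {
        my ($x1, $y1) = @{ $nodes->[$i] };
        my ($x2, $y2) = @{ $nodes->[ ($i + 1) % $n ] };
        $sum += $x1 * $y2 - $x2 * $y1;
    }
    return abs($sum) / 2;                    # abs() makes the node order irrelevant
}

# A 2 x 1 degree rectangle written as a closed way (first node repeated).
my @rect = ([0, 0], [2, 0], [2, 1], [0, 1], [0, 0]);
printf "%.4f\n", polygon_area(\@rect);
```

Because the absolute value is taken at the end, clockwise and counter-clockwise ways give the same area.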

"Closed" Relations

For a relation I try to order the ways such that way i+1 has its first node where way i has its last, and the last way ends with the same node the first way started with. If this is possible I call the relation "closed". For such a relation I create a temporary way by concatenating the nodes of the single ways and calculate the area using the same formula as above.
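The ordering step described above might look roughly like this. It is a minimal sketch, not the code from areastats.pl: the function name `close_relation` and the representation of a way as a list of node ids are assumptions. It also assumes that ways are never reversed and picks the first matching way greedily, so a relation whose ways point in mixed directions, or which consists of several separate rings, is reported as not closed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to chain ways end-to-start into one closed ring.
# $ways: reference to a list of ways, each a reference to a list of node ids.
# Returns a reference to the concatenated node id list, or undef if the
# ways cannot be ordered into a single closed ring.
sub close_relation {
    my ($ways) = @_;
    my @todo = grep { @$_ >= 2 } @$ways;     # ignore ways with fewer than 2 nodes
    return undef unless @todo;
    my @ring = @{ shift @todo };             # start the chain with the first way
    while (@todo) {
        my $found;
        for my $i (0 .. $#todo) {
            # way i+1 must start where way i ended
            if ($todo[$i][0] eq $ring[-1]) { $found = $i; last; }
        }
        return undef unless defined $found;  # chain cannot be continued
        my ($next) = splice(@todo, $found, 1);
        push @ring, @{$next}[1 .. $#$next];  # skip the shared first node
    }
    # closed only if the last way ends where the first way started
    return $ring[0] eq $ring[-1] ? \@ring : undef;
}

my $ring = close_relation([[1, 2, 3], [5, 6, 1], [3, 4, 5]]);
print(($ring ? join(",", @$ring) : "not closed"), "\n");
```

The returned node list can then be mapped to coordinates and passed to the same area formula that is used for closed ways.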

The Code

The code itself is partially copied from osmarender/perl and is available from http://www.petschge.de/osm/stats/areastats.pl. The raw output (before averaging) is available at http://www.petschge.de/osm/stats/admin_level_results.tar.bz2.

Averaging

The code discussed above prints one line for every closed way and relation. To make the data more useful I calculated the average and standard deviation of the area for each admin_level using the following short Perl script:

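#!/usr/bin/perl
use strict;
use warnings;

# Reads one area per line from STDIN and prints average and standard deviation.
my $areasum = 0;
my $areasquaresum = 0;
my $linecount = 0;
while (my $line = <STDIN>) {
        chomp $line;
        next if $line =~ /^\s*$/;       # skip blank lines
        $areasum += $line;
        $areasquaresum += $line * $line;
        $linecount++;
}
die "no input lines\n" if $linecount == 0;      # avoid division by zero
my $average = $areasum / $linecount;
# Population variance E[x^2] - E[x]^2; clamp tiny negative rounding errors to 0.
my $variance = $areasquaresum / $linecount - $average * $average;
$variance = 0 if $variance < 0;
my $s = sqrt($variance);
print "average is $average, stddev is $s\n";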

The results

admin_level  # of closed ways  # of closed relations  average area  standard deviation of area
1            3                 0                      0.0069        0.0098
2            124               62                     0.1           1.3
3            2                 0                      0.0079        0.0079
4            95                16                     0.17          0.93
5            1                 3                      0.215         0.039
6            696               114                    0.004         0.016
7            7                 2                      0.030         0.078
8            26206             1581                   0.0004        0.0085
9            12                6                      0.00038       0.00064
10           36                26                     0.0008        0.0027

All areas are given in units of arc degrees squared.

Discussion

A couple of points are apparent from the data:

Few closed objects

Especially at low zoom levels, few objects are closed. This might be related to the unresolved situation of maritime borders. One relation for each object with a low admin_level, collecting all the ways forming the border, would really help.

Why not help resolve Maritime borders so we can get more closed objects? --Skippern 16:02, 23 February 2009 (UTC)
See boundary=maritime for more about maritime borders, including territorial border. --Skippern 02:13, 16 August 2009 (UTC)

Even / odd difference

Even admin_levels are far more common than odd ones: in the table above, levels 2, 4, 6, 8 and 10 account for nearly all closed ways and relations.

High standard deviation

The standard deviation is quite high: for most admin_levels it is as large as or larger than the average. This is possibly due to the low number of closed objects or to some bug in my code.
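One way to rule out a numerical problem in the averaging step is to cross-check it against a different one-pass method. The sketch below uses Welford's algorithm, which avoids subtracting two large sums; it is a swapped-in technique for comparison, not the method used for the table, and the function name `mean_and_stddev` is an assumption. Like the averaging script, it divides by the number of values (population standard deviation). A standard deviation larger than the average is not by itself a sign of a bug: it also occurs when a few objects are much larger than the rest.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Welford's one-pass algorithm for mean and population standard deviation.
# Takes a list of numbers; returns (mean, stddev), or (undef, undef) if empty.
sub mean_and_stddev {
    my @values = @_;
    my ($n, $mean, $m2) = (0, 0, 0);
    for my $x (@values) {
        $n++;
        my $delta = $x - $mean;
        $mean += $delta / $n;                # running mean
        $m2   += $delta * ($x - $mean);      # running sum of squared deviations
    }
    return (undef, undef) if $n == 0;
    return ($mean, sqrt($m2 / $n));
}

my ($mean, $sd) = mean_and_stddev(2, 4, 4, 4, 5, 5, 7, 9);
printf "average is %g, stddev is %g\n", $mean, $sd;
```

If both methods agree on the raw output for an admin_level, the spread most likely comes from the data rather than from the averaging.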