Bash/Script for cleaning up the descriptive text LINZ layer

The descriptive text LINZ layer has an assortment of unique place names and generic names like "School" or "Hospital". We can convert the generic names to OSM tags before uploading. These are all individual nodes.

See also LINZ attribute matching and LINZ geo_name matching.


  • Note: the new version of the web app exports .osc files, so we'll try to do this on the tag matching site instead of with a shell script. That is somewhat academic though, as this layer was uploaded in its entirety in 2010 and has since been slowly and manually merged into the nearby unlabeled ground features.

First download the descrip_text.osm.gz export from the LINZ-2-OSM web app. (I renamed it chat_descrip_text.osm.gz locally to show that it's the Chatham Islands data; the commands below use the original file name.)

Decompress it:

gzip -d descrip_text.osm.gz

Create a sorted list of unique names:

grep 'k="name"' descrip_text.osm  | sort | uniq -c | \
   sort -nr | cut -f1,4 -d'"' | sed -e 's/<[^"]*"//'
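To see what each stage of the pipeline above contributes, here is a minimal sketch that runs it against a tiny stand-in file. The sample nodes, their ids and coordinates, and the `sample` and `counts` variable names are made up for illustration; real exports from the web app may be laid out differently.

```shell
# Build a tiny stand-in for descrip_text.osm (hypothetical sample data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<osm version="0.6">
  <node id="-1" lat="-43.9" lon="-176.5">
    <tag k="name" v="Airstrip" />
  </node>
  <node id="-2" lat="-43.8" lon="-176.4">
    <tag k="name" v="Sch" />
  </node>
  <node id="-3" lat="-43.7" lon="-176.3">
    <tag k="name" v="Airstrip" />
  </node>
</osm>
EOF

# grep     keeps only the name tags
# sort     groups identical lines so that uniq -c can count them
# sort -nr puts the most frequent names first
# cut      keeps the count (field 1) and the tag value (field 4)
# sed      strips the leftover '<tag k="' fragment
counts=$(grep 'k="name"' "$sample" | sort | uniq -c | \
   sort -nr | cut -f1,4 -d'"' | sed -e 's/<[^"]*"//')

printf '%s\n' "$counts"
rm -f "$sample"
```

The result is one line per distinct name, with the count in front, in the same shape as the mainland listing further down.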


Finally, search and replace some common generic values (the -i option and the \n in the replacement text assume GNU sed):

sed -i \
    -e 's/k="name" v="Aerodrome"/k="aeroway" v="aerodrome"/' \
    -e 's+k="name" v="Airstrip"+k="aeroway"  v="aerodrome" />\n      <tag k="type" v="airstrip"+' \
    -e 's/k="name" v="Camp"/k="tourism" v="camp_site"/' \
    -e 's+k="name" v="Fire lookout"+k="man_made"  v="tower" />\n      <tag k="tower:type" v="observation"+' \
    -e 's/k="name" v="Fire station"/k="amenity" v="fire_station"/' \
    -e 's/k="name" v="Grave"/k="historic" v="grave"/' \
    -e 's/k="name" v="Hall"/k="amenity" v="public_hall"/' \
    -e 's/k="name" v="Hospital"/k="amenity" v="hospital"/' \
    -e 's/k="name" v="Hotel"/k="tourism" v="hotel"/' \
    -e 's/k="name" v="Hut"/k="tourism" v="alpine_hut"/' \
    -e 's/k="name" v="Landfill"/k="landuse" v="landfill"/' \
    -e 's/k="name" v="Power generation"/k="power" v="generator"/' \
    -e 's/k="name" v="Quarry[ ]*"/k="landuse" v="quarry"/' \
    -e 's/k="name" v="Racecourse"/k="highway" v="raceway"/' \
    -e 's/k="name" v="Racetrack"/k="leisure" v="track"/' \
    -e 's/k="name" v="Reservoir"/k="landuse" v="reservoir"/' \
    -e 's/k="name" v="School"/k="amenity" v="school"/' \
    -e 's/k="name" v="Sch"/k="amenity" v="school"/' \
    -e 's/k="name" v="Silo"/k="man_made" v="silo"/' \
    -e 's/k="name" v="Substation"/k="power" v="sub_station"/' \
    -e 's/k="name" v="Substn"/k="power" v="sub_station"/' \
    -e 's/k="name" v="University"/k="amenity" v="university"/' \
    -e 's/k="name" v="Weir"/k="waterway" v="weir"/' \
    -e 's/k="name" v="Well"/k="man_made" v="well"/' \
    -e 's/k="name" v="(disused)"/k="disused" v="yes"/' \
   descrip_text.osm
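Because `sed -i` rewrites the file in place, it may be worth trying the rules on a small sample first. The sketch below applies two of the rules from the command above without `-i`, so the result goes to a variable and nothing is modified. The sample nodes, the place name "Example Bay", and the variable names are hypothetical, and GNU sed is assumed for the `\n` in the replacement.

```shell
# Hypothetical sample: one generic name and one real place name.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  <node id="-1" lat="-43.9" lon="-176.5">
    <tag k="name" v="Airstrip" />
  </node>
  <node id="-2" lat="-43.8" lon="-176.4">
    <tag k="name" v="Example Bay" />
  </node>
EOF

# Without -i, sed leaves the input file alone and writes to stdout.
# The Airstrip rule uses '+' as the delimiter because its replacement
# contains '/'; its \n splits one tag line into two (GNU sed).
converted=$(sed \
    -e 's+k="name" v="Airstrip"+k="aeroway"  v="aerodrome" />\n      <tag k="type" v="airstrip"+' \
    -e 's/k="name" v="Sch"/k="amenity" v="school"/' \
    "$sample")

printf '%s\n' "$converted"

# Name tags that survive are the ones no rule matched.
remaining=$(printf '%s\n' "$converted" | grep -c 'k="name"')
echo "name tags left: $remaining"
rm -f "$sample"
```

Only the generic "Airstrip" node changes; the unique place name keeps its name tag. When running against the real file, `sed -i.bak` keeps a backup copy alongside the edited one.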


Top 100 repeated names from the mainland:

   3524       Airstrip
   2185       Sch
   1034       Quarry
    798       Hall
    488       Hut
    329       Marae
    320       Gravel pit
    256       Camp
    229       Substation
    204       Landfill
    196       Reservoir
    191       Rapids
    140       Silo
    128       Mill
    110       Hospital
    109       Cableway
    100       Substn
     83       Oxidation ponds
     73       Power generation
     71       Racecourse
     69       Silos
     64       Gas valve
     51       Walkwire
     50       Oxidation pond
     48       Disused mine
     45       Huts
     45       Gun club
     43       Weir
     40       Rifle range
     38       Well
     38       Quarries
     38       Old dam
     38       Derelict
     36       Abattoir
     36       (disused)
     33       Pipeline
     31       Water treatment plant
     27       Siphon
     27       Shelter
     26       Aerodrome
     25       Rock bivouac
     25       Old gold workings
     24       Surf club
     24       Factory
     23       Derelict hut
     21       Marina
     19       Gravel pits
     19       Fire lookout
     18       Limeworks
     18       Forest headquarters
     17       Showgrounds
     16       Reservoirs
     16       Gas compound
     16       (historic)
     16       (derelict)
     15       Intake
     15       Disused gold workings
     15       Aerial hazard
     14       Old well
     13       Spillway
     13       Numerous disused gold workings
     13       Grave
     12       Thermal area
     12       Racetrack
     12       Disused
     11       Prison
     11       Camping ground
     11       Airstrips
     10       Motor camp
      9       Wildlife refuge
      9       Visitor centre
      9       Lodge
      8       Vehicle access along beach at low tide
      8       University
      8       Surge chamber
      8       Flume
      8       Fertilizer works
      8       Disused railway
      8       Derelict buildings
      8       Airport
      7       Gun emplacements
      6       Water intake
      6       Suspension bridge
      6       Speedway
      6       Settling pond
      6       Quicksand
      6       Pumice pit
      6       Old tunnel
      6       Meteorological station
      6       Gold workings
      6       Bivouac
      5       Shingle works
      5       Sale yards
      5       Riverbed subject to rapid flooding
      5       Old dams
      5       Old battery
      5       Numerous sinkholes
      5       Numerous rock outcrops
      5       Gas well
      5       Fuel tanks
     ...

and 421 more names with 5 or fewer occurrences, some* more important than others.

[*] e.g. "INTERMITTENT LIVE FIRING"


Placement

On the NZOGPS mailing list, Peter S wrote:

> The point coords describe the left, vertical lower case center location of
> the label as it  was applied to the 260 series maps, and usually to be found
> in the most blank spot on the map near the proper location, the offset can
> be literally kilometers away.