Script for cleaning up the descriptive text LINZ layer
The descriptive text LINZ layer has an assortment of unique place names and generic names like "School" or "Hospital". We can convert the generic names to OSM tags before uploading. All features in this layer are individual nodes.
See also LINZ attribute matching and LINZ geo_name matching.
First download the descrip_text.osm.gz export from Rob's LINZ-2-OSM web app. (I renamed it chat_descrip_text.osm.gz locally to show that it's the Chatham Islands data; the commands below use the original file name.)
Decompress it:
gzip -d descrip_text.osm.gz
Create a sorted list of unique names:
grep 'k="name"' descrip_text.osm | sort | uniq -c | \
  sort -nr | cut -f1,4 -d'"' | sed -e 's/<[^"]*"//'
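To see what the counting pipeline produces, here is a small sketch run against a made-up three-node file. The node IDs, coordinates and temporary file are invented for illustration; a real export will carry more attributes per node.

```shell
# Hypothetical three-node sample, written to a temporary file
sample=$(mktemp)
cat > "$sample" <<'EOF'
<osm version="0.6">
  <node id="-1" lat="-44.0" lon="-176.5">
    <tag k="name" v="Airstrip" />
  </node>
  <node id="-2" lat="-44.1" lon="-176.4">
    <tag k="name" v="Airstrip" />
  </node>
  <node id="-3" lat="-44.2" lon="-176.3">
    <tag k="name" v="Hut" />
  </node>
</osm>
EOF

# grep keeps the name tags, sort + uniq -c tallies identical lines,
# sort -nr puts the most frequent first, and cut + sed strip the XML
# so that only the count and the name value remain.
counts=$(grep 'k="name"' "$sample" | sort | uniq -c | sort -nr |
  cut -f1,4 -d'"' | sed -e 's/<[^"]*"//')

printf '%s\n' "$counts"
rm -f "$sample"
```

On this sample the result has two lines, Airstrip with a count of 2 followed by Hut with a count of 1. The amount of padding around the count depends on your uniq implementation and on the indentation in the file.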
Finally, search and replace some common generic values:
sed -i \
  -e 's/k="name" v="Aerodrome"/k="aeroway" v="aerodrome"/' \
  -e 's+k="name" v="Airstrip"+k="aeroway" v="aerodrome" />\n <tag k="type" v="airstrip"+' \
  -e 's/k="name" v="Camp"/k="tourism" v="camp_site"/' \
  -e 's+k="name" v="Fire lookout"+k="man_made" v="tower" />\n <tag k="tower:type" v="observation"+' \
  -e 's/k="name" v="Fire station"/k="amenity" v="fire_station"/' \
  -e 's/k="name" v="Grave"/k="historic" v="grave"/' \
  -e 's/k="name" v="Hall"/k="amenity" v="public_hall"/' \
  -e 's/k="name" v="Hospital"/k="amenity" v="hospital"/' \
  -e 's/k="name" v="Hotel"/k="tourism" v="hotel"/' \
  -e 's/k="name" v="Hut"/k="tourism" v="alpine_hut"/' \
  -e 's/k="name" v="Landfill"/k="landuse" v="landfill"/' \
  -e 's/k="name" v="Power generation"/k="power" v="generator"/' \
  -e 's/k="name" v="Quarry[ ]*"/k="landuse" v="quarry"/' \
  -e 's/k="name" v="Racecourse"/k="highway" v="raceway"/' \
  -e 's/k="name" v="Racetrack"/k="leisure" v="track"/' \
  -e 's/k="name" v="Reservoir"/k="landuse" v="reservoir"/' \
  -e 's/k="name" v="School"/k="amenity" v="school"/' \
  -e 's/k="name" v="Sch"/k="amenity" v="school"/' \
  -e 's/k="name" v="Silo"/k="man_made" v="silo"/' \
  -e 's/k="name" v="Substation"/k="power" v="sub_station"/' \
  -e 's/k="name" v="Substn"/k="power" v="sub_station"/' \
  -e 's/k="name" v="University"/k="amenity" v="university"/' \
  -e 's/k="name" v="Weir"/k="waterway" v="weir"/' \
  -e 's/k="name" v="Well"/k="man_made" v="well"/' \
  -e 's/k="name" v="(disused)"/k="disused" v="yes"/' \
  descrip_text.osm
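Because sed -i overwrites the file in place, it can be worth previewing the changes first. The sketch below is one possible way to do that and is not part of the original workflow. It applies two of the substitutions to a made-up sample (the node IDs, coordinates and file names are invented), writes the result to a separate file instead of using -i, and lets diff show which lines were rewritten.

```shell
# Hypothetical sample standing in for descrip_text.osm
work=$(mktemp -d)
cat > "$work/sample.osm" <<'EOF'
<osm version="0.6">
  <node id="-1" lat="-44.0" lon="-176.5">
    <tag k="name" v="Hut" />
  </node>
  <node id="-2" lat="-44.1" lon="-176.4">
    <tag k="name" v="Hospital" />
  </node>
  <node id="-3" lat="-44.2" lon="-176.3">
    <tag k="name" v="Waitangi" />
  </node>
</osm>
EOF

# Two of the rules from the full command, run without -i so the
# input file is left untouched and the result goes to preview.osm
sed \
  -e 's/k="name" v="Hospital"/k="amenity" v="hospital"/' \
  -e 's/k="name" v="Hut"/k="tourism" v="alpine_hut"/' \
  "$work/sample.osm" > "$work/preview.osm"

# diff exits non-zero when the files differ, which is expected here
changes=$(diff "$work/sample.osm" "$work/preview.osm" || true)
printf '%s\n' "$changes"

# Count rewritten lines (diff marks lines from the new file with ">")
changed=$(printf '%s\n' "$changes" | grep -c '^>')
echo "lines changed: $changed"
rm -rf "$work"
```

Only the two generic names are rewritten; the node with a real place name is left alone. Once the preview looks right, the same -e expressions can be run with -i against the real file.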
Top 100 repeated names from the mainland:
3524 Airstrip
2185 Sch
1034 Quarry
798 Hall
488 Hut
329 Marae
320 Gravel pit
256 Camp
229 Substation
204 Landfill
196 Reservoir
191 Rapids
140 Silo
128 Mill
110 Hospital
109 Cableway
100 Substn
83 Oxidation ponds
73 Power generation
71 Racecourse
69 Silos
64 Gas valve
51 Walkwire
50 Oxidation pond
48 Disused mine
45 Huts
45 Gun club
43 Weir
40 Rifle range
38 Well
38 Quarries
38 Old dam
38 Derelict
36 Abattoir
36 (disused)
33 Pipeline
31 Water treatment plant
27 Siphon
27 Shelter
26 Aerodrome
25 Rock bivouac
25 Old gold workings
24 Surf club
24 Factory
23 Derelict hut
21 Marina
19 Gravel pits
19 Fire lookout
18 Limeworks
18 Forest headquarters
17 Showgrounds
16 Reservoirs
16 Gas compound
16 (historic)
16 (derelict)
15 Intake
15 Disused gold workings
15 Aerial hazard
14 Old well
13 Spillway
13 Numerous disused gold workings
13 Grave
12 Thermal area
12 Racetrack
12 Disused
11 Prison
11 Camping ground
11 Airstrips
10 Motor camp
9 Wildlife refuge
9 Visitor centre
9 Lodge
8 Vehicle access along beach at low tide
8 University
8 Surge chamber
8 Flume
8 Fertilizer works
8 Disused railway
8 Derelict buildings
8 Airport
7 Gun emplacements
6 Water intake
6 Suspension bridge
6 Speedway
6 Settling pond
6 Quicksand
6 Pumice pit
6 Old tunnel
6 Meteorological station
6 Gold workings
6 Bivouac
5 Shingle works
5 Sale yards
5 Riverbed subject to rapid flooding
5 Old dams
5 Old battery
5 Numerous sinkholes
5 Numerous rock outcrops
5 Gas well
5 Fuel tanks
...
and 421 more names with 5 or fewer occurrences, some[*] more important than others.
[*] e.g. "INTERMITTENT LIVE FIRING"
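The size of that long tail can be checked by filtering on the count column. As a rough sketch, with invented sample counts and an illustrative variable name (threshold), awk can keep only the rows at or below a given number of occurrences:

```shell
# Hypothetical "uniq -c"-style input: count, then name
tally='     12 Racetrack
      6 Bivouac
      5 Gas well
      3 Sheep dip
      1 INTERMITTENT LIVE FIRING'

threshold=5

# Keep only names seen $threshold times or fewer, then count them
rare=$(printf '%s\n' "$tally" | awk -v max="$threshold" '$1 <= max')
rare_count=$(printf '%s\n' "$rare" | wc -l | tr -d ' ')

printf '%s\n' "$rare"
echo "$rare_count names at $threshold or fewer occurrences"
```

With real data, the tally would come from the grep, sort and uniq -c stages of the counting command rather than from a literal string. Rare names are worth skimming by eye, since warnings such as the one above should not be uploaded as ordinary name tags without thought.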
Placement
On the NZOGPS mailing list, Peter S wrote:
> The point coords describe the left, vertical lower case center location of
> the label as it was applied to the 260 series maps, and usually to be found
> in the most blank spot on the map near the proper location, the offset can
> be literally kilometers away.