LINZ/Address Import

From OpenStreetMap Wiki

Source data

Data is sourced from the LINZ simple street address layer, NZ Street Address.

Structure of data is documented in the Data dictionary. Details are shown in the #Notes section at the bottom of this page.

Conversion and Tagging

The following subsets of addresses will be processed differently:

address_type: Water | Road (water addresses will be ignored initially)

town_city + suburb_locality: Suburb+City | Locality | Town

We won't use the _ascii fields.


Node keys

Keys on each node in the generated osmChange file:

LINZ | OSM | Comment
address_id | LINZ:address_id or ref:linz:address_id ref=* | Explicit connection to source data, to be used for maintenance
full_road_name | addr:street=* | E.g. "Open Map Street"
address_number + address_number_suffix | addr:housenumber=* | E.g. "2" or "3A"
unit_value | addr:unit=* | E.g. "A" or "1" or "1-198"; only if present in source data (about 150k addresses)
<coordinates> | <location> | Datum converted from NZ (NZGD2000) to OSM (WGS84)
suburb_locality | addr:suburb=* or addr:hamlet=* | addr:suburb when town_city is also present, otherwise addr:hamlet
town_city | addr:city=* | When town_city is present in the source, suburb_locality is always present too

NOTE1: the inclusion of addr:city, addr:hamlet, addr:suburb keys is mainly for the mapper to verify against the underlying map.

When this information is already present in e.g. place=* on an area (very likely in urban areas), the redundant information will be removed by the mapper doing the upload. The mapper may also choose to create or adjust a suburb or place boundary or POI manually.

NOTE2: About 5k addresses are "ranged", i.e. they have an "address_number_high" in the source database. A few of these also have unit numbers, so the full address in the source data looks like "4B/22-26 High Street". This import proposal uses only "address_number" and ignores "address_number_high" as redundant.
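The Node keys mapping above can be sketched in Python roughly as follows. This is only an illustration, not the import's actual code: the function name `linz_row_to_tags`, the dict-based row, and the choice of `ref:linz:address_id` out of the two candidate keys are assumptions made for the example.

```python
def linz_row_to_tags(row):
    """Map one LINZ address row (dict keyed by column name) to OSM tags.

    Hypothetical helper; the key used for the LINZ id is an assumption.
    """
    number = str(row["address_number"])
    suffix = row.get("address_number_suffix") or ""
    tags = {
        "ref:linz:address_id": str(row["address_id"]),
        "addr:street": row["full_road_name"],
        # address_number_high is deliberately ignored (see NOTE2).
        "addr:housenumber": number + suffix,
    }
    # addr:unit only when the source row carries a unit value.
    if row.get("unit_value"):
        tags["addr:unit"] = row["unit_value"]

    suburb = row.get("suburb_locality")
    city = row.get("town_city")
    if city:
        # Suburb + city case: suburb_locality is a suburb of town_city.
        tags["addr:city"] = city
        if suburb:
            tags["addr:suburb"] = suburb
    elif suburb:
        # Locality case: no town_city, so suburb_locality becomes a hamlet.
        tags["addr:hamlet"] = suburb
    return tags
```

For the example row from the data dictionary (1A Joe Bloggs Road, Dannemora, Auckland) this yields addr:housenumber "1A", addr:suburb "Dannemora" and addr:city "Auckland".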

Changeset keys

Keys on each changeset

Filtering

Identifying Duplicates

  • Obtain all OSM items that contain addr:housenumber
  • Find centroid of ways (buildings) to use as position for proximity testing.
  • Generate table of node/way id, obj type, position, addr:street, addr:housenumber
  • Convert LINZ positions to WGS84 SRID=4326 for comparison with OSM positions
  • Match with LINZ data on proximity + housenumber (+ street if present).

Other oddities involve relations, interpolations, etc. Initially, relations can be ignored. Members that are points or polygons with addr:housenumber can be identified as duplicates by number and by proximity to a LINZ address with the same number.
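The matching steps above could look something like the following Python sketch. It is an assumption-laden illustration rather than the project's code: the names `way_position`, `distance_m` and `find_duplicates`, the record layout, and the 50 m threshold are all hypothetical, and positions are assumed to be in WGS84 degrees already. The way position is a plain vertex average, a rough stand-in for a true polygon centroid.

```python
import math

def way_position(coords):
    """Approximate a way's position as the mean of its (lat, lon) vertices."""
    # Closed ways repeat the first vertex at the end; drop the repeat.
    pts = coords[:-1] if len(coords) > 1 and coords[0] == coords[-1] else coords
    lat = sum(p[0] for p in pts) / len(pts)
    lon = sum(p[1] for p in pts) / len(pts)
    return lat, lon

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def find_duplicates(linz_rows, osm_items, max_dist_m=50.0):
    """Return (address_id, osm_type, osm_id, distance) for likely duplicates.

    Brute force for clarity; a spatial index would be needed at full scale.
    """
    matches = []
    for linz in linz_rows:
        for osm in osm_items:
            # House numbers must agree (case-insensitive, e.g. "3a" vs "3A").
            if osm["housenumber"].strip().upper() != linz["housenumber"].strip().upper():
                continue
            # Street is compared only when the OSM item has one.
            if osm.get("street") and osm["street"].lower() != linz["street"].lower():
                continue
            d = distance_m(linz["lat"], linz["lon"], osm["lat"], osm["lon"])
            if d <= max_dist_m:
                matches.append((linz["address_id"], osm["type"], osm["id"], d))
    return matches
```

Items with a matching number and street that fall outside the threshold are not reported here; in the workflow described below they would go to manual review instead.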


What to do with the duplicates?

Eventually, all items that are real duplicates would have LINZ id attributes added.

Nodes that have only addr:housenumber and aren't part of a relation: add addr:street.

Addresses that seem to be duplicates, but are far away from the LINZ point location would be reviewed by a person. E.g. sometimes houses get tagged with the correct number, but the wrong street name.

Polygons (i.e. buildings) still need to be discussed. EliotB's opinion is to add the address as a node, which applies to the location independent of what is built on it. The building can be demolished, but the address remains valid.


An idea of how many duplicates there might be:

# Get NZ items that contain addr:housenumber
> wget -O nz_addr.osm "http://www.overpass-api.de/api/xapi_meta?*[addr:housenumber=*][bbox=157.5,-59.0,179.9,-25.5]"
> spatialite_osm_raw -d nz_address-osm_raw.spatialite -o nz_addr.osm

Result: 12727 nodes, 33009 ways, 22 relations.

Non Duplicates

Roughly 2 to 3% of NZ addresses are already in OSM in some form (about 46K tagged items against about 1.9 million LINZ addresses). The rest will be new.

In the first pass, potential duplicates will be identified, and saved as a separate dataset for later processing. The remaining non-duplicates will be uploaded in batches.

Addresses that share the same town_city and suburb_locality will form one batch.

A quick delve into the data gives

  • Water addresses 160, Road 1.9 million
  • Localities (town_city = NULL) 1930 distinct. 1 to 2800 items per locality. 25 localities with > 1000 addresses
  • town + suburb 1182 distinct, 258 of which have town=suburb. 645 have <1K addresses, 907 < 2K, 6 > 10K

Using the above distinction would give about 3100 separate batches.
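The batch count follows from the figures above: 1930 localities plus 1182 town and suburb pairs gives 3112, i.e. about 3100. A minimal sketch of the grouping, assuming rows are dicts keyed by LINZ column name (the helper names `batch_key` and `group_into_batches` are hypothetical):

```python
from collections import defaultdict

def batch_key(row):
    """Batch identity: (town_city, suburb_locality).

    For localities town_city is NULL in the source, so the key is
    (None, locality); empty strings are treated the same as NULL.
    """
    town = row.get("town_city") or None
    return (town, row.get("suburb_locality"))

def group_into_batches(rows):
    """Group address rows into one list per (town_city, suburb_locality)."""
    batches = defaultdict(list)
    for row in rows:
        batches[batch_key(row)].append(row)
    return dict(batches)
```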

An osmchange file will be generated for each place. After review, each will be uploaded as a separate changeset.
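As a rough illustration of what generating such a file might involve, the sketch below builds an osmChange document with one new node per address, using Python's standard xml.etree.ElementTree. The function name `build_osmchange`, the generator string, and the input layout (lat, lon, and a tags dict) are assumptions. New nodes get negative placeholder ids; a changeset id is not set here and would normally be filled in by the editor or upload script.

```python
import xml.etree.ElementTree as ET

def build_osmchange(addresses, generator="linz-address-import-sketch"):
    """Return an osmChange XML string that creates one node per address.

    `addresses` is a list of dicts with "lat", "lon" (WGS84 degrees)
    and "tags" (dict of OSM key/value strings). Hypothetical layout.
    """
    root = ET.Element("osmChange", version="0.6", generator=generator)
    create = ET.SubElement(root, "create")
    for i, addr in enumerate(addresses, start=1):
        node = ET.SubElement(
            create,
            "node",
            id=str(-i),  # negative id marks an object not yet in OSM
            lat=f"{addr['lat']:.7f}",
            lon=f"{addr['lon']:.7f}",
        )
        # Sorted for stable, reviewable output.
        for key, value in sorted(addr["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```

Writing one such string per batch to its own file keeps each place reviewable on its own before upload.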

Maintenance

Record the version of the LINZ database used to generate each import.

Periodically retrieve a new version of the LINZ dataset and obtain the list of additions, deletions, and changes with respect to the previous set. The list would be checked manually against OSM. Potentially a bot could do the checking; this can be investigated after the manual process has been trialled.
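One way the comparison of two dataset versions might be done is sketched below. It assumes that address_id is stable between versions and that change_id differs whenever an address has been modified, which is how the data dictionary in the Notes describes the two columns; the function name `diff_versions` is hypothetical.

```python
def diff_versions(old_rows, new_rows):
    """Compare two LINZ dataset versions by address_id and change_id.

    Returns three sorted lists of address_id: added, deleted, changed.
    """
    old = {row["address_id"]: row for row in old_rows}
    new = {row["address_id"]: row for row in new_rows}

    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # Present in both versions, but the address version id differs.
    changed = sorted(
        address_id
        for address_id in old.keys() & new.keys()
        if old[address_id]["change_id"] != new[address_id]["change_id"]
    )
    return added, deleted, changed
```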

How to find new things entered on OSM side? Addressed items lacking linz:address_id would be candidates.

Software Tools

  • Code for data conversion, exploration, changeset generation by EliotB: git

Related Discussions

Notes

Details of source database

Column Name | Data Type | Length | Precision | Scale | Example | Description
address_id | integer |  | 32 | 0 | 505588 | AIMS unique identifier for an address.
change_id | integer |  | 32 | 0 | 1304726 | AIMS unique identifier for the address version.
address_type | varchar | 20 |  |  | Road | The type of address. Includes: Road and Water.
unit_value | varchar | 70 |  |  |  | Alpha numeric value for a unit
address_number | integer |  | 32 | 0 | 1 | Address number
address_number_suffix | varchar |  |  |  | A | Alpha numeric characters that may follow the address number.
address_number_high | integer |  | 32 | 0 |  | High address number of a ranged address.
water_route_name | varchar | 100 |  |  |  | Name of the beach the water address relates to. Currently this contains the captured segment of coastline. This will be blank for ROAD addresses.
water_name | varchar | 100 |  |  |  | Water body the address relates to. This will be blank for ROAD addresses.
suburb_locality | varchar | 80 |  |  | Dannemora | Suburb/Locality from the NZ Localities (NZ Fire Service owned dataset).
town_city | varchar | 80 |  |  | Auckland | Town/City from the NZ Localities (NZ Fire Service owned dataset).
full_address_number | varchar | 100 |  |  | 1A | All number components concatenated for an address.
full_road_name | varchar | 250 |  |  | Joe Bloggs Road | All road name components concatenated for an address. This has been derived from the ‘Landonline: Roads’ data and will move to using the new ‘Roads’ data tables when they become available.
full_address | varchar | 400 |  |  | 1A Joe Bloggs Road, Dannemora, Auckland | All address components concatenated for an address.
road_section_id | integer |  | 32 | 0 | 199943 | Landonline Road Centreline ID (RCL_ID).
gd2000_xcoord | numeric |  | 12 | 8 | 174.9255518167 | NZGD2000 X-coordinate for the address in metres.
gd2000_ycoord | numeric |  | 12 | 8 | -36.9246773 | NZGD2000 Y-coordinate for the address in metres.
shape | geometry |  |  |  | <geometry> | Spatial geometry for the point in long/lat GD2000, EPSG 4167.

The _ascii variants of these columns are not listed, as they will not be used.