LINZ/Address Import

From OpenStreetMap Wiki
This page describes a historic artifact in the history of OpenStreetMap. It does not reflect the current situation, but instead documents the historical concepts, issues, or ideas.
About
This page documents the original LINZ Address Import from 2017. The more recent import is documented at Import/New Zealand Street Addresses (2021).
Captured time: 2017


Initial import complete

The import described below was completed in June 2018, with nearly two million addresses imported. (Note that "complete" doesn't mean that every address was imported, just that the process ended.)

The import is summarised in this spreadsheet

The source data contained some sets of addresses with many addresses at exactly the same coordinate, e.g. apartments or flats. Coincident points must not be added to OSM, so the points were either manually distributed in space or deleted, perhaps replaced with a single address (e.g. "1/3 A Street" to "100/3 A Street" replaced with "3 A Street").
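As an illustration of the clean-up described above, here is a minimal sketch (not the code actually used for the import) that finds groups of coincident points and, where every point in a group is a unit of the same street number, collapses the group into one base address. It assumes each record is a dict with hypothetical keys `lon`, `lat`, `housenumber` (e.g. "1/3") and `street`; groups that do not fit that pattern are left for manual handling.

```python
from collections import defaultdict


def coincident_groups(records):
    """Group records sharing exactly the same (lon, lat) pair.

    Only groups with more than one record are returned, since a single
    point at a location causes no upload warning.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["lon"], rec["lat"])].append(rec)
    return [g for g in groups.values() if len(g) > 1]


def collapse_units(group):
    """Replace unit addresses ("1/3", "2/3", ...) with a single base address.

    Returns one record if every housenumber has the form "unit/number"
    with the same number and street; otherwise returns None so the group
    can be handled by hand (e.g. distributed in space).
    """
    if not all("/" in rec["housenumber"] for rec in group):
        return None
    bases = {rec["housenumber"].split("/", 1)[1] for rec in group}
    streets = {rec["street"] for rec in group}
    if len(bases) != 1 or len(streets) != 1:
        return None
    first = group[0]
    return {"lon": first["lon"], "lat": first["lat"],
            "housenumber": bases.pop(), "street": streets.pop()}
```

For 100 flats "1/3" to "100/3 A Street" at one point, `collapse_units` yields a single record with housenumber "3".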

The source data was split using the name of city/suburb or hamlet. During import, it was found that there are some hamlets with identical names in different parts of the country: Aramoana, Awatuna, Belmont, Blue Mountains, Clifton, Dalefield, Karamu, Kinloch, Longbush, Matahiwi, Muriwai, Ngahape, Otara, Owhiro, Purangi, Tokanui, Waikawau, Woodside. Any future import/update should consider this.
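One way a future import could catch identically named hamlets before splitting is to flag any suburb_locality name whose addresses are spread over an implausibly large area. The sketch below assumes records with `suburb_locality`, `lon` and `lat` keys and uses an arbitrary 50 km threshold; both the function names and the threshold are hypothetical.

```python
import math
from collections import defaultdict


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def ambiguous_place_names(records, max_spread_km=50.0):
    """Return locality names whose addresses span more than max_spread_km.

    A large spread suggests that one name covers two separate places.
    """
    by_name = defaultdict(list)
    for rec in records:
        by_name[rec["suburb_locality"]].append((rec["lon"], rec["lat"]))
    flagged = []
    for name, pts in by_name.items():
        # Distance from the first point is a cheap proxy for the full spread
        ref = pts[0]
        spread = max(haversine_km(*ref, *p) for p in pts)
        if spread > max_spread_km:
            flagged.append(name)
    return sorted(flagged)
```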

Source data

Data is sourced from the LINZ simple street address layer, NZ Street Address.

The structure of the data is documented in the Data dictionary. Details are shown in the Notes section at the bottom of this page.

Conversion and Tagging

Subsets of addresses are to be processed differently:

address_type: Water | Road (water addresses will initially be ignored)

town_city + suburb_locality: Suburb+City | Locality | Town

We won't use the _ascii fields.
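The subsets above can be expressed as a small classification function. This is a hedged sketch: the field names come from the LINZ data dictionary, but reading "Town" as "town_city equal to suburb_locality" is an assumption (inferred from the town=suburb count in the data analysis further down), and the function name is hypothetical.

```python
def classify(rec):
    """Return the processing subset for one LINZ record (a dict of fields).

    Water addresses are set aside for now; Road addresses are split by
    whether town_city is present and whether it equals suburb_locality
    (the latter being an assumed reading of "Town").
    """
    if rec.get("address_type") == "Water":
        return "water"
    town = rec.get("town_city")
    if not town:
        return "locality"
    if town == rec.get("suburb_locality"):
        return "town"
    return "suburb+city"
```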

Node keys

In the generated osmChange file:

LINZ field | OSM key | Comment
address_id | LINZ:address_id=* or ref:linz:address_id=* | Explicit connection to source data, to be used for maintenance.
full_road_name | addr:street=* | E.g. "Open Map Street".
full_address_number | addr:housenumber=* | E.g. "2" or "3A" or "4/56B". This follows the NZ addressing convention of a unit number prefix.
<coordinates> | <location> | Datum converted from NZGD2000 to WGS84, as used by OSM.
suburb_locality | addr:suburb=* or addr:hamlet=* | addr:suburb when town_city is also present, otherwise addr:hamlet.
town_city | addr:city=* | When present in the source, suburb_locality is always present too.
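The mapping in the table can be sketched as a function that turns one LINZ record into a tag dict. This is an illustration only; the function name is hypothetical, and `id_key` defaults to ref:linz:address_id although the table also lists LINZ:address_id as an option.

```python
def linz_to_osm_tags(rec, id_key="ref:linz:address_id"):
    """Map one LINZ NZ Street Address record (a dict) to OSM node tags.

    suburb_locality becomes addr:suburb when town_city is present,
    otherwise addr:hamlet. Coordinates are handled separately.
    """
    tags = {
        id_key: str(rec["address_id"]),
        "addr:street": rec["full_road_name"],
        "addr:housenumber": rec["full_address_number"],
    }
    if rec.get("town_city"):
        tags["addr:suburb"] = rec["suburb_locality"]
        tags["addr:city"] = rec["town_city"]
    elif rec.get("suburb_locality"):
        tags["addr:hamlet"] = rec["suburb_locality"]
    return tags
```

Using the example row from the data dictionary (505588, "1A Joe Bloggs Road, Dannemora, Auckland"), this gives addr:suburb=Dannemora and addr:city=Auckland; a record without town_city gets addr:hamlet instead.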

NOTE 1: the inclusion of addr:city, addr:hamlet, addr:suburb keys is mainly for the mapper to verify against the underlying map.

When this information is already present in e.g. place=* on an area (very likely in urban areas), the redundant information will be removed by the mapper doing the upload. The mapper may also choose to create or adjust a suburb or place boundary or POI manually.

NOTE 2: About 5k addresses are "ranged", i.e. they have an "address_number_high" in the source database. A few also have unit numbers, so the full address in the source data looks like "4B/22-26 High Street". This import proposal uses only "address_number" and ignores "address_number_high" as redundant.
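To show what ignoring address_number_high means in practice, here is a hedged sketch that rebuilds addr:housenumber from the individual LINZ components (unit_value, address_number, address_number_suffix) and simply never reads the high number. The helper name is hypothetical, and it assumes the result should follow the "unit/number+suffix" form shown in the tagging table.

```python
def housenumber(unit_value, address_number, address_number_suffix=None):
    """Build addr:housenumber from LINZ components, ignoring address_number_high.

    Follows the NZ unit-prefix convention: "unit/number" plus any suffix.
    """
    number = f"{address_number}{address_number_suffix or ''}"
    return f"{unit_value}/{number}" if unit_value else number
```

So "4B/22-26 High Street" (unit 4B, number 22, high number 26) would be tagged addr:housenumber=4B/22.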

Changeset keys

Keys on each changeset:

Filtering

Identifying Duplicates

  • Obtain all OSM items that contain addr:housenumber
  • Find centroid of ways (buildings) to use as position for proximity testing.
  • Generate table of node/way id, obj type, position, addr:street, addr:housenumber
  • Convert LINZ positions to WGS84 SRID=4326 for comparison with OSM positions
  • Match with LINZ data on proximity + housenumber (+ street if present).
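The matching step could look roughly like the sketch below, which pairs LINZ points with OSM items (way centroids already computed) on housenumber and proximity, and also on street when the OSM item has one. It swaps in a simple equirectangular distance approximation, which is adequate over tens of metres; the 50 m threshold and all names are hypothetical.

```python
import math


def distance_m(lon1, lat1, lon2, lat2):
    """Approximate ground distance in metres (equirectangular, short distances only)."""
    k = 111_320.0  # approximate metres per degree of latitude
    dx = (lon2 - lon1) * k * math.cos(math.radians((lat1 + lat2) / 2))
    dy = (lat2 - lat1) * k
    return math.hypot(dx, dy)


def find_duplicates(linz, osm, max_dist_m=50.0):
    """Pair LINZ addresses with OSM items on housenumber + proximity (+ street if present).

    linz: dicts with id, lon, lat, housenumber, street (WGS84 positions)
    osm:  dicts with id, type, lon, lat, housenumber and optional street
          (for ways, lon/lat is the centroid)
    Returns a list of (linz_id, osm_type, osm_id, distance in metres).
    """
    by_number = {}
    for item in osm:
        by_number.setdefault(item["housenumber"], []).append(item)
    matches = []
    for addr in linz:
        for item in by_number.get(addr["housenumber"], []):
            # Only compare streets when the OSM item actually has one
            if item.get("street") and item["street"] != addr["street"]:
                continue
            d = distance_m(addr["lon"], addr["lat"], item["lon"], item["lat"])
            if d <= max_dist_m:
                matches.append((addr["id"], item["type"], item["id"], d))
    return matches
```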

Other odd cases involve relations, interpolations, etc. Initially, relations can be ignored. Members that are points or polygons with addr:housenumber can be identified as duplicates by number and by proximity to a LINZ address with the same number.


What to do with the duplicates?

Eventually, all items that are real duplicates would have LINZ id attributes added.

Nodes with only addr:housenumber that aren't part of a relation: add addr:street.

Addresses that seem to be duplicates, but are far away from the LINZ point location would be reviewed by a person. E.g. sometimes houses get tagged with the correct number, but the wrong street name.
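The rules above for handling a matched duplicate can be summarised in a small decision function. This is a sketch under stated assumptions: the 30 m review threshold is a placeholder, the tag dicts use ref:linz:address_id, and the function name is hypothetical.

```python
def duplicate_action(osm_tags, linz_tags, dist_m, in_relation, review_dist_m=30.0):
    """Decide what to do with an OSM item matched to a LINZ address.

    Returns (action, tags_to_add): "review" for far-away or street-mismatched
    items, otherwise "tag" with the LINZ id (and addr:street for bare nodes
    that are not relation members).
    """
    if dist_m > review_dist_m:
        return "review", {}
    street = osm_tags.get("addr:street")
    if street and street != linz_tags["addr:street"]:
        # e.g. right number, wrong street: a person should check it
        return "review", {}
    add = {"ref:linz:address_id": linz_tags["ref:linz:address_id"]}
    if street is None and not in_relation:
        add["addr:street"] = linz_tags["addr:street"]
    return "tag", add
```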

Polygons (i.e. buildings) need discussion. EliotB's opinion is to add the address as a node, which applies to the location independent of what is built on it. The building can be demolished, but the address remains valid.

An idea of how many duplicates there might be:

# Get NZ items that contain addr:housenumber
> wget -O nz_addr.osm "http://www.overpass-api.de/api/xapi_meta?*[addr:housenumber=*][bbox=157.5,-59.0,179.9,-25.5]"
> spatialite_osm_raw -d nz_address-osm_raw.spatialite -o nz_addr.osm

Analysis: nodes 12727, ways 33009, relations 22
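Counts like these can also be reproduced without SpatiaLite. The sketch below uses Python's standard xml.etree.ElementTree to count nodes, ways and relations carrying addr:housenumber in an OSM XML document; for a file such as nz_addr.osm one would pass the text of the file (or adapt it to ET.parse). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_addressed(osm_xml):
    """Count nodes, ways and relations with an addr:housenumber tag in OSM XML text."""
    root = ET.fromstring(osm_xml)
    counts = Counter()
    for elem in root:
        if elem.tag in ("node", "way", "relation"):
            if any(t.get("k") == "addr:housenumber" for t in elem.findall("tag")):
                counts[elem.tag] += 1
    return counts
```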

Non Duplicates

Roughly 2 to 3% of NZ addresses are already in OSM in some form (about 40K of 2M). The rest, around 97%, will be new.

In the first pass, potential duplicates will be identified, and saved as a separate dataset for later processing. The remaining non-duplicates will be uploaded in batches.

Batch membership will be determined by town_city & suburb_locality in common.

A quick look at the data gives:

  • Water addresses 160, Road 1.9 million
  • Localities (town_city = NULL) 1930 distinct. 1 to 2800 items per locality. 25 localities with > 1000 addresses
  • town + suburb 1182 distinct, 258 of which have town=suburb. 645 have <1K addresses, 907 < 2K, 6 > 10K

Using the above distinction would give about 3100 separate batches (1930 localities + 1182 town/suburb combinations).
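The batching rule can be sketched as grouping on the (town_city, suburb_locality) pair, with an empty town for localities. This is illustrative only; the function names are hypothetical. Note that keying on names alone would merge the identically named hamlets listed near the top of this page into one batch.

```python
from collections import Counter


def batch_key(rec):
    """Batch on town_city + suburb_locality; localities have no town_city."""
    return (rec.get("town_city") or "", rec["suburb_locality"])


def batch_sizes(records):
    """Return a Counter mapping each batch key to its number of addresses."""
    return Counter(batch_key(r) for r in records)
```

With the counts above, `len(batch_sizes(records))` would be expected to come out near 1930 + 1182 = 3112.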

An osmchange file will be generated for each place. After review, each will be uploaded as a separate changeset.
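Generating a per-place osmChange file might look like the sketch below, which writes a `<create>` block with one node per address and negative placeholder ids, as is usual for data not yet uploaded. It uses only the standard library; the generator string and function name are hypothetical, and this is not EliotB's generator.

```python
import xml.etree.ElementTree as ET


def osmchange_create(points, generator="linz-address-sketch"):
    """Build an osmChange document that creates one node per address.

    points: iterable of (lat, lon, tags) tuples in WGS84.
    New nodes get negative placeholder ids (-1, -2, ...).
    """
    root = ET.Element("osmChange", version="0.6", generator=generator)
    create = ET.SubElement(root, "create")
    for i, (lat, lon, tags) in enumerate(points, start=1):
        node = ET.SubElement(create, "node", id=str(-i),
                             lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        for k, v in sorted(tags.items()):
            ET.SubElement(node, "tag", k=k, v=v)
    return ET.tostring(root, encoding="unicode")
```

The returned string could then be written to a file named after the batch, e.g. The_Place.osc.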

Using JOSM to import a dataset

  1. Ensure that your import-specific OSM user ID is active (Edit/Preferences...).
  2. Load the generated file (File/Open...), e.g. The_Place.osm.
  3. Download the existing OSM map data (File/Download data...) and select 'Download as new layer'. Do not use (File/Download in current view), because it downloads into the same layer as the local data. Alternatively, download into an existing OSM map layer. You may also want to enable an imagery layer, e.g. LINZ NZ Aerial Imagery.
  4. Check that the roads implied by the address data are present.
  5. For small villages or localities, consider adding (for instance) place=hamlet at the centre.
  6. (Step removed: it previously deleted addr:suburb where a suburb boundary exists.)
  7. Check for conflicts, e.g. run the Conflation plugin with the OSM data as Reference and the new layer as Subject.
  8. Copy the contents of The_Place.changeset_tags to the clipboard.
  9. Upload the data (File/Upload Data; make sure the imported data layer is selected first). Go to the Tags of New Changeset tab and paste the changeset tags by clicking the button with three plus signs.

Common problems/solutions

  • Sometimes the LINZ data has multiple addresses at exactly the same location. This will result in a warning when you try to upload. To solve this, move one of the points a short distance away, then select all the points and use (Tools/Distribute Nodes) to spread them evenly along a line OR delete all but one of the points before upload.
  • A few place names are shared by two different localities, which can cause confusion: Aramoana, Awatuna, Belmont, Blue Mountains, Clifton, Dalefield, Karamu, Kinloch, Longbush, Matahiwi, Muriwai, Ngahape, Otara, Owhiro, Purangi, Tokanui, Waikawau, Woodside.
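As a scripted alternative to nudging one point and running Tools/Distribute Nodes, coincident points could be spread evenly along a short line centred on the original location. This is a hedged sketch; the 2 m spacing, east-west bearing and function name are arbitrary assumptions, and it does not reproduce JOSM's own algorithm.

```python
import math


def distribute(lon, lat, n, spacing_m=2.0, bearing_deg=90.0):
    """Spread n coincident points evenly along a short line centred on (lon, lat).

    Returns a list of (lon, lat) tuples spacing_m apart along bearing_deg
    (degrees clockwise from north).
    """
    m_per_deg_lat = 111_320.0
    m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(lat))
    dx = math.sin(math.radians(bearing_deg)) * spacing_m  # metres east per step
    dy = math.cos(math.radians(bearing_deg)) * spacing_m  # metres north per step
    offset = (n - 1) / 2  # centre the line on the original point
    return [(lon + (i - offset) * dx / m_per_deg_lon,
             lat + (i - offset) * dy / m_per_deg_lat) for i in range(n)]
```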

Maintenance

Record the version of the LINZ database used to generate each import.

Periodically retrieve a new version of the LINZ dataset and obtain a list of additions, deletions and changes relative to the previous set. The list would be checked manually against OSM (a bot could potentially do the checking; this can be investigated after the manual process has been trialled).
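Computing the list of additions, deletions and changes between two LINZ snapshots reduces to set operations on address_id. The sketch below assumes each snapshot is a dict mapping address_id to a record dict; the function name is hypothetical. Comparing the change_id field (the AIMS address version identifier) instead of whole records might be a cheaper way to detect changes.

```python
def diff_versions(old, new):
    """Compare two LINZ snapshots keyed by address_id.

    old, new: dicts mapping address_id -> record (a dict of fields).
    Returns (added_ids, deleted_ids, changed_ids) as sorted lists.
    """
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    deleted = sorted(old_ids - new_ids)
    changed = sorted(i for i in old_ids & new_ids if old[i] != new[i])
    return added, deleted, changed
```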

To find addresses added on the OSM side: addressed items lacking ref:linz:address_id=* would be candidates.
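A minimal sketch of that filter, assuming OSM items have already been loaded as dicts with a `tags` dict (a hypothetical structure, e.g. built from an Overpass extract):

```python
def unlinked_addresses(osm_items, id_key="ref:linz:address_id"):
    """Return OSM items that have addr:housenumber but no LINZ id tag."""
    return [item for item in osm_items
            if "addr:housenumber" in item["tags"] and id_key not in item["tags"]]
```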

Software Tools

  • Code for data conversion, exploration and changeset generation by EliotB: git (dead link)

Related Discussions

Notes

Details of source database

Name | Data Type | Length | Precision | Scale | Example | Description
address_id | integer | | 32 | 0 | 505588 | AIMS unique identifier for an address.
change_id | integer | | 32 | 0 | 1304726 | AIMS unique identifier for the address version.
address_type | varchar | 20 | | | Road | The type of address. Includes: Road and Water.
unit_value | varchar | 70 | | | | Alphanumeric value for a unit.
address_number | integer | | 32 | 0 | 1 | Address number.
address_number_suffix | varchar | | | | A | Alphanumeric characters that may follow the address number.
address_number_high | integer | | 32 | 0 | | High address number of a ranged address.
water_route_name | varchar | 100 | | | | Name of the beach the water address relates to. Currently this contains the captured segment of coastline. This will be blank for ROAD addresses.
water_name | varchar | 100 | | | | Water body the address relates to. This will be blank for ROAD addresses.
suburb_locality | varchar | 80 | | | Dannemora | Suburb/Locality from the NZ Localities (NZ Fire Service owned dataset).
town_city | varchar | 80 | | | Auckland | Town/City from the NZ Localities (NZ Fire Service owned dataset).
full_address_number | varchar | 100 | | | 1A | All number components concatenated for an address.
full_road_name | varchar | 250 | | | Joe Bloggs Road | All road name components concatenated for an address. This has been derived from the 'Landonline: Roads' data and will move to using the new 'Roads' data tables when they become available.
full_address | varchar | 400 | | | 1A Joe Bloggs Road, Dannemora, Auckland | All address components concatenated for an address.
road_section_id | integer | | 32 | 0 | 199943 | Landonline Road Centreline ID (RCL_ID).
gd2000_xcoord | numeric | | 12 | 8 | 174.9255518167 | NZGD2000 X-coordinate (longitude) for the address, in decimal degrees.
gd2000_ycoord | numeric | | 12 | 8 | -36.9246773 | NZGD2000 Y-coordinate (latitude) for the address, in decimal degrees.
shape | geometry | | | | <geometry> | Spatial geometry for the point in long/lat NZGD2000, EPSG 4167.

The _ascii variants of the fields are not going to be used.