Import/LINZ Topo50 Continuation

From OpenStreetMap Wiki
LINZ Data Import
Author: Kylenz
License: MIT License
Platform: Web
Status: Active
Version: 1.0.0 (2021-03-09)
Language: multiple languages
Website: https://osm-nz.github.io/RapiD
Source code: osm-nz/linz-address-import
Programming language: TypeScript

Modification of RapiD to compare and update OSM data based on LINZ data

Background

Data from LINZ's Topo50 maps was imported into OSM between 2009 and 2016. Not all the datasets were imported during this time. This page documents the process used to import data since 2021. The main wiki page contains details about the tagging and the current status.

Source data & source code

This project is just a small modification of the LINZ Address Import system. Most of the code is the same.

How it works

Every imported feature has the tag ref:linz:topo50_id=* or ref:linz:place_id=*, which is the unique UFID used by LINZ. This allows the data to be easily conflated, just like NZ Street Addresses. This works as follows:

  1. During the import, a script runs daily and downloads the OSM Planet file (Geofabrik lets you download just Oceania + Antarctica).
  2. Every occurrence of the tags ref:linz:topo50_id=* and ref:linz:place_id=* is extracted from the OSM Planet file.
  3. The list of topo50_ids to ignore is downloaded from the Google Sheet.
  4. Each incomplete LINZ layer is processed: the features that are neither in OSM nor in the Google Sheet are converted to GeoJSON.
  5. The GeoJSON files are split up into geographic regions, depending on the size of the dataset:
    • Datasets with very few features are crudely split into 8 large areas (roughly equivalent to NZ Regions)
    • Datasets with a moderate number of features are split into 33 areas according to this map (roughly the size of NZ Districts)
    • Datasets with a large number of features are split into 'sectors', such as K15. The mainland is divided into 26 columns (A-Z) and 26 rows (1-26). Sectors span roughly 0.5 degrees of latitude and 0.5 degrees of longitude.
  6. The segmented GeoJSON files are uploaded to the CDN, along with the GeoJSON files from the LINZ Address Import.
  7. --
  8. The list of datasets that were uploaded can be seen in the fork of RapiD and on the JOSM download page.
  9. When you select a dataset, it becomes 'locked' for an hour.
  10. If you upload some or all of that dataset, it becomes 'locked' until the next daily conflation (step 1).
  11. If you use RapiD and click 'Ignore this feature', the feature gets added to the Google Sheet from step 3. Otherwise you would get prompted to add that feature forever.
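
Steps 2 to 5 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the LinzFeature shape, the function names, and in particular GRID_ORIGIN (the north-west corner of the sector grid) and the north-to-south row numbering are assumptions made for the example.

```typescript
// Hypothetical minimal shape of a LINZ feature for this sketch.
interface LinzFeature {
  id: string; // the LINZ UFID, stored in OSM as ref:linz:topo50_id
  lat: number;
  lon: number;
}

// Steps 2-4: keep only the features that are neither in OSM
// nor on the ignore list from the Google Sheet.
function findMissingFeatures(
  linzFeatures: LinzFeature[],
  idsInOsm: Set<string>,
  idsToIgnore: Set<string>,
): LinzFeature[] {
  return linzFeatures.filter(
    (f) => !idsInOsm.has(f.id) && !idsToIgnore.has(f.id),
  );
}

// Step 5 (large datasets): name the 0.5 x 0.5 degree sector containing a point.
// ASSUMPTION: the grid origin and row direction are invented for illustration.
const GRID_ORIGIN = { lat: -34.0, lon: 166.0 };
const SECTOR_SIZE_DEG = 0.5;

function sectorOf(lat: number, lon: number): string | undefined {
  const column = Math.floor((lon - GRID_ORIGIN.lon) / SECTOR_SIZE_DEG);
  const row = Math.floor((GRID_ORIGIN.lat - lat) / SECTOR_SIZE_DEG);
  // Only 26 columns (A-Z) and 26 rows (1-26) exist.
  if (column < 0 || column > 25 || row < 0 || row > 25) return undefined;
  return String.fromCharCode(65 + column) + String(row + 1);
}

// Group features by sector so each group can be written to its own GeoJSON file.
function groupBySector(features: LinzFeature[]): Map<string, LinzFeature[]> {
  const groups = new Map<string, LinzFeature[]>();
  for (const f of features) {
    const sector = sectorOf(f.lat, f.lon) ?? 'outside-grid';
    const bucket = groups.get(sector) ?? [];
    bucket.push(f);
    groups.set(sector, bucket);
  }
  return groups;
}
```

Using Set lookups keeps the filter linear in the number of LINZ features, which matters when the list of IDs extracted from the planet file is large.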

This process is a small part of the pipeline for LINZ Addresses. This page has a flowchart which shows the entire system.

Continuing partially completed layers

Some layers were partially imported between 2009 and 2016, but without the ref:linz:topo50_id=* tag. This makes conflation more difficult.

However, it is possible to continue these layers. This is still a work in progress.

Method A:

  1. Use overpass-turbo to identify which parts of the country are already imported. Define these areas as bounding boxes (bboxes).
  2. Update the code for that layer to skip features within those bboxes.
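
A minimal sketch of the bbox filter in Method A, assuming point features and bboxes in the GeoJSON order [minLon, minLat, maxLon, maxLat]. The type and function names are hypothetical, and the sketch ignores bboxes that cross the antimeridian.

```typescript
// [minLon, minLat, maxLon, maxLat], the same order GeoJSON uses for bboxes.
type BBox = [number, number, number, number];

// Hypothetical minimal feature shape for this sketch.
interface PointFeature {
  id: string;
  lat: number;
  lon: number;
}

// True if the point lies inside (or on the edge of) the bbox.
function isInsideBBox(lat: number, lon: number, bbox: BBox): boolean {
  const [minLon, minLat, maxLon, maxLat] = bbox;
  return lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
}

// Drop every feature that falls inside an area that was already imported.
function skipAlreadyImported(
  features: PointFeature[],
  importedAreas: BBox[],
): PointFeature[] {
  return features.filter(
    (f) => !importedAreas.some((bbox) => isInsideBBox(f.lat, f.lon, bbox)),
  );
}
```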

Method B:

  1. The data that is already in OSM is extracted using overpass-turbo, and downloaded as GeoJSON.
  2. We loop through every feature in OSM, and find the closest feature in the LINZ data.
    • If the nearest LINZ feature is within 3 metres of the OSM feature, we add the tag ref:linz:topo50_id=* to the OSM feature. This is done in bulk using Level0 (exact steps TBC).
    • The rule above is more complicated for ways, areas, and multipolygons. We check whether 80% of the nodes in OSM are within 3 metres of a node in that LINZ feature.
  3. The next day, the conflation process will pick up the existing features in OSM, since they now have the ref:linz:topo50_id=* tag.
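
The matching rules in Method B could look something like the sketch below. It is an illustration under stated assumptions rather than the code that is actually used: distances are computed with the haversine formula on a spherical Earth, "within 3 metres" is read as less than or equal to 3 m, "80%" is read as at least 80%, and all names are hypothetical.

```typescript
type LatLon = { lat: number; lon: number };
interface LinzPoint extends LatLon {
  id: string; // value for ref:linz:topo50_id
}

const EARTH_RADIUS_M = 6371008.8; // mean Earth radius
const MAX_DISTANCE_M = 3; // threshold from the rule above
const MIN_MATCHING_SHARE = 0.8; // 80% of OSM nodes must match

// Great-circle distance between two points (haversine formula).
function distanceInMetres(a: LatLon, b: LatLon): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const sinLat = Math.sin(toRad(b.lat - a.lat) / 2);
  const sinLon = Math.sin(toRad(b.lon - a.lon) / 2);
  const h =
    sinLat * sinLat +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * sinLon * sinLon;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Nodes: return the ID of the closest LINZ point, if it is within 3 metres.
function closestLinzId(
  osmNode: LatLon,
  linzPoints: LinzPoint[],
): string | undefined {
  let bestId: string | undefined;
  let bestDistance = Infinity;
  for (const candidate of linzPoints) {
    const d = distanceInMetres(osmNode, candidate);
    if (d < bestDistance) {
      bestDistance = d;
      bestId = candidate.id;
    }
  }
  return bestDistance <= MAX_DISTANCE_M ? bestId : undefined;
}

// Ways, areas, multipolygons: at least 80% of the OSM nodes must be
// within 3 metres of some node of the LINZ feature.
function wayMatchesLinzFeature(
  osmNodes: LatLon[],
  linzNodes: LatLon[],
): boolean {
  if (osmNodes.length === 0) return false;
  const closeCount = osmNodes.filter((o) =>
    linzNodes.some((l) => distanceInMetres(o, l) <= MAX_DISTANCE_M),
  ).length;
  return closeCount / osmNodes.length >= MIN_MATCHING_SHARE;
}
```

The nested loops are quadratic in the number of nodes, which is fine for a sketch; a real run over a whole layer would probably want a spatial index.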

How do I contribute?

The tool is available here; anyone can import data. If you prefer using JOSM, you can download osmChange files from here (however, this is not the recommended option).

Potential issues

This table will be updated as the project progresses.

Issue | Mitigation
Duplicate data being imported | The fork of RapiD has an added feature that prevents duplicate features from being imported, based on the ref:linz:topo50_id=* tag. This conflation happens in real time, in the browser.
Multiple people editing the same dataset at the same time | Users will be presented with a warning if someone else is editing, or was editing, that dataset in the last hour.
Duplicate nodes when importing data that abuts existing features | RapiD will intelligently re-use nodes that are already in OSM. If this is not good enough, iD#8671 will make it easy to join abutting ways.
Imported rivers/roads are disconnected from existing features | Same as the row above.
Imported rivers cross roads | RapiD's validator will warn you about this.
LINZ's data uses way too many nodes at corners | We use the Douglas-Peucker algorithm to simplify the geometry during processing. The original import did not do this, so if a way imported after 2020 abuts a way imported before 2020, there may be a gap between the two ways.
Ways with over 2000 nodes break OSM | If an Area has >2000 nodes, it gets split into a MultiPolygon with multiple outer ways, each with at most 495 nodes. If a MultiPolygon has >2000 nodes in one of its rings, that ring gets split into segments with up to 495 nodes each.
LINZ's data is out of date | This hasn't been an issue yet, but mappers can press CTRL + B to cycle through LINZ Aerial Imagery (2017), Maxar (2021), and the LINZ Topo50 map.
No aerial imagery available for parts of the Ross Dependency | You need to use the standard OSM-Carto tileserver as your background imagery, and reference a separate map like LINZ's Ant50 series.
Merging in new features destroys the OSM object's history | Fixed in iD#8708
Working with hex colours in our custom iD presets is confusing | Fixed in iD#8782
Duplicate hydrographic data due to overlapping charts | We only consider data from the most detailed chart available for that area. Features that cross multiple charts will be flagged and manually merged in RapiD.
Hydrographic data crosses the antimeridian | We download the OSM planet extract in two chunks, west and east of the antimeridian, and we split all datasets into east and west of the antimeridian.
Some lines and areas cross the antimeridian | For lines, we will use type=multilinestring. For areas, we will use a type=multipolygon with closure_segment=yes on the virtual boundaries.
A small number of hydrographic features reference the legend of the nautical chart. These legends are not available from LDS. | We will still import these features, with the tag description=see XXXXXX.txt. If these descriptions are made available, we can easily add them to the features.
The seamark tagging schema is very complicated for mappers | We have created our own iD presets and rendering styles for the most common seamark tags.
Some obvious data is missing (e.g. fairways, ski access lanes, coast guard stations, surf-life-saving bases, patrolled beaches) | This data is managed by the local harbourmaster, and isn't included on nautical charts. We will create iD presets to make it easy to map these features.
LINZ's Topo50 data generally does not associate topographic features with names. | Names are downloaded as a separate layer from the NZGB dataset. This means there will be two layers in the tool (e.g. 'Peaks' and 'Named Peaks').
type=multilinestring is not a first-class data type in OSM and is not supported by any known software. | No solution; there is no other way to represent a discontiguous linear feature.
MultiPoints (site relations) are not a first-class data type in OSM and aren't supported by the planet-extraction software we use. | No current solution; these features are skipped by the conflation tool (e.g. relation Redwood Station).
Hydrographic data in some areas is completely missing | This issue is unresolved and still under investigation.
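
Two of the mitigations in the table, simplifying geometry with Douglas-Peucker and splitting long rings into segments of up to 495 nodes, can be sketched as below. This is an illustrative sketch only, not the project's code: the function names are hypothetical, the tolerance is measured in planar coordinate units (degrees here) rather than metres, and making adjacent segments share their end node is an assumption about how the pieces stay connected.

```typescript
type Coord = [number, number]; // [lon, lat]

// Distance from p to the segment a-b, in planar coordinate units.
function distanceToSegment(p: Coord, a: Coord, b: Coord): number {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const lengthSquared = dx * dx + dy * dy;
  if (lengthSquared === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  const raw = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / lengthSquared;
  const t = Math.max(0, Math.min(1, raw));
  return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
}

// Douglas-Peucker: keep the point furthest from the chord if it exceeds
// the tolerance, then recurse on both halves; otherwise keep only the ends.
function simplifyLine(points: Coord[], tolerance: number): Coord[] {
  if (points.length <= 2) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];
  let maxDistance = 0;
  let splitIndex = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = distanceToSegment(points[i], first, last);
    if (d > maxDistance) {
      maxDistance = d;
      splitIndex = i;
    }
  }
  if (maxDistance <= tolerance) return [first, last];
  const left = simplifyLine(points.slice(0, splitIndex + 1), tolerance);
  const right = simplifyLine(points.slice(splitIndex), tolerance);
  return left.slice(0, -1).concat(right); // drop the duplicated split point
}

// Split a long way or ring into segments of at most maxNodes nodes.
// ASSUMPTION: adjacent segments share one node so the pieces stay joined.
function splitIntoSegments(nodes: Coord[], maxNodes: number = 495): Coord[][] {
  if (maxNodes < 2) throw new Error('maxNodes must be at least 2');
  const segments: Coord[][] = [];
  for (let start = 0; start < nodes.length - 1; start += maxNodes - 1) {
    segments.push(nodes.slice(start, start + maxNodes));
  }
  return segments;
}
```

Simplifying before splitting means some rings may drop below the 2000-node limit and need no splitting at all.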