Maa-amet building geometry update

From OpenStreetMap Wiki

Maa-amet building geometry update is an import of building data from Maa-amet's Estonian Topographic dataset (ETAK), covering Harjumaa in Estonia. As of 2021-06-03 the import is in progress. The import account is fghj753_import. (Documentation will be updated after the import is done.)

Background to problem

In September 2008, user Verbatium performed an undocumented building import around Tallinn, Estonia, which later became known as the Verbatium import. The import ran from 6 to 8 September and added at least 50 thousand new buildings. It is virtually undocumented, and the community has made no significant effort to address the issue because of concerns about the legality of the source used. Although Estonian building addresses are updated automatically, the building shapes from 2008 still remain. As of 20 May 2021, 37653 unmodified buildings remained from that import[1].

This year I started investigating publicly available records about user Verbatium and his import. Long story short: while the community believed the data originated from Garmin (because of the use of Type=0x13), the imported footprints are remarkably similar to the Estonian Land Board's (Maa-amet) 1996-2007 Basic Map. There is even a screenshot uploaded by Verbatium's wiki account that shows the same map layer in use in JOSM. As further evidence, the import added lots of small non-existent buildings such as way 26905970, which were possibly artefact noise and were mostly removed in changeset/13479101[2].

For the record, attempts to contact Verbatium / Kartograff were unsuccessful, as his email address kartograff (at) hot.ee (retrieved from the talk-ee mailing list) is no longer operational. I have also consulted Regio and EOMap (prominent local mapping companies), the Estonian Rescue Board (Verbatium's employer), the Estonian Land Board (the likely source of the import) and local community members (including the Rescue Board's former IT manager and the talk-ee mailing list). The mapping organizations found no proof that the data originates from them, but admitted that even if it did, the incident took place so long ago that they don't consider acting on it worth the hassle.

If Verbatium's data did originate from Maa-amet, then Maa-amet gave formal permission to use their data for OSM in 2009[3]. Moreover, while discussing the old import with them I was told that the WMS services had been meant for public use even earlier, ever since the service was first published in 2007. Vector data has been public since July 2018 and has been used for OSM since 2019[4].

Screenshot of table showing most likely import changesets

Goals

Objective: update as many of Verbatium's building shapes as possible while...

  • keeping potential collateral damage (such as modifying a node shared between two ways) minimal.
  • preserving the tags and history of buildings.

Schedule

The upload script takes about 9-11 hours to run, and the source dataset is updated weekly. Estonian OSM sees the least activity on weekdays, so it makes most sense to run the import on a workday. To avoid affecting normal mapping activity, the import should run at night, starting around 7-9 PM UTC; most Estonian mappers are active 8 AM-9 PM UTC[5].

I also looked at the load statistics of the OSM API database server[6], which will probably be the component most affected by the import. Since the logging had been made public only about two weeks earlier, all 16 days of available data were used to find the most suitable time for the import. Assuming the upload takes 10 hours, I calculated, for each possible start time, the average relative load over the following 10 hours and looked for the minimum. Relative load simply means how a metric compares against its own daily maximum.

On the chart below, gathered from OSM's new Prometheus/Grafana server on 21 May, it can be seen that the best time to start a 10-hour resource-intensive task is around 19:00 UTC.

Average server load vs average load during 10h upload
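
The calculation described above can be sketched roughly as follows. This is only an illustration of the idea, not the code used to produce the chart: it assumes one averaged load value per hour of day, and the function names and the example profile are made up.

```python
def relative_load(hourly_load):
    """Scale each hourly value by the daily maximum of the same metric."""
    peak = max(hourly_load)
    return [value / peak for value in hourly_load]


def best_start_hour(hourly_load, window_hours=10):
    """Return (start_hour, mean_relative_load) for the quietest window.

    The window wraps around midnight, so a start at 19:00 covers 19:00-05:00.
    """
    relative = relative_load(hourly_load)
    hours = len(relative)

    def window_mean(start):
        total = sum(relative[(start + i) % hours] for i in range(window_hours))
        return total / window_hours

    start = min(range(hours), key=window_mean)
    return start, window_mean(start)


# Made-up example profile: quiet from 19:00 to 04:59 UTC, busy otherwise.
example_load = [10.0 if (h >= 19 or h <= 4) else 50.0 for h in range(24)]
start_hour, mean_load = best_start_hour(example_load)
```

With several metrics, the same relative scaling would let them be averaged together before searching for the quietest window.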

Import Data

Background

The same dataset has previously been used for multiple imports, most notably the fully automatic imports run by SviMik since 2019.

Data source site: https://geoportaal.maaamet.ee/est/Ruumiandmed/Eesti-topograafia-andmekogu-p79.html
Data license: https://geoportaal.maaamet.ee/docs/Avaandmed/ETAK_ruumiandmete_litsentsileping.pdf
Link to permission (if required): Mailing list reference url - https://lists.openstreetmap.org/pipermail/talk-ee/2009-September/000171.html
OSM attribution (if required): Previous imports have added attribution as source=Maa-amet 2021 to changeset directly.
ODbL Compliance verified: yes

OSM Data Files

To prevent conflicts caused by objects edited shortly before the import, the data in OSM-compatible format is generated automatically at run time, based on the state of the OSM map when the script is started. Source data is derived from E_401_hoone_ka.shp, available on Maa-amet's Geoportal; the file can be downloaded as part of "Buildings SHP-file (~120M)". The OSM data is generated with Osmapi (a Python library).

Import Type

This import is currently planned as a single one-time run, but the script was designed to be capable of running multiple times. With a minimal rewrite it could be reused in 2023/24 to update the documented 2013 building import, which also used vectorization to generate building footprints. Running the script in 2030 to update the 2019 import is not currently considered necessary, as those buildings were already added from the same dataset.

Edits are uploaded using the Osmapi Python library.

Data Preparation

Data Reduction & Simplification

The script will NOT add any new ways. Nodes are added only as a last resort, when the OSM building shape has fewer nodes than the source dataset. To avoid modifying bystander buildings through shared nodes, or modifying tagged elements, the following conditions must be met before the script modifies a way.

  • Filtering done as part of the Overpass query[7]
    • All nodes of a building must have been last modified at least 10 years ago
    • No node can be shared between a building and a non-building
  • Filtering done by the script itself (using the OSM API)[8]
    • Every node of a building can be part of only one way (the building)
    • None of the nodes may have any tags (some buildings had an entrance defined)
    • The building in OSM must not have more nodes than the reference dataset (some buildings were added as 5-node rectangles, such as way 26860921).
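
The script-side checks could look roughly like the sketch below. It is not taken from the actual script: the `node_info` structure (node ID mapped to its tags and the IDs of ways using it) is a hypothetical stand-in for data that would be read through the OSM API, and the reason strings are made up.

```python
def safety_check_failures(way_id, way_node_ids, node_info, reference_corner_count):
    """Return a list of reasons why a building must be skipped (empty if OK).

    node_info is a hypothetical mapping:
        node_id -> {"tags": {...}, "way_ids": [ids of ways using the node]}
    """
    failures = []
    # A closed way lists its first node twice, so count distinct nodes only.
    distinct_nodes = set(way_node_ids)

    # Every node may belong to exactly one way: the building itself.
    if any(set(node_info[n]["way_ids"]) != {way_id} for n in distinct_nodes):
        failures.append("node shared with another way")

    # Tagged nodes (for example entrances) must not be moved.
    if any(node_info[n]["tags"] for n in distinct_nodes):
        failures.append("tagged node")

    # The OSM shape must not be more detailed than the reference shape.
    if len(distinct_nodes) > reference_corner_count:
        failures.append("more nodes than reference")

    return failures


# Illustrative data: a 4-corner building with an entrance on node 3.
example_nodes = {
    1: {"tags": {}, "way_ids": [100]},
    2: {"tags": {}, "way_ids": [100]},
    3: {"tags": {"entrance": "yes"}, "way_ids": [100]},
    4: {"tags": {}, "way_ids": [100, 200]},
}
example_failures = safety_check_failures(100, [1, 2, 3, 4, 1], example_nodes, 4)
```

Returning all failed reasons rather than stopping at the first one makes it possible to log combined cases, such as a tagged node together with an over-detailed shape.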

Tagging Plans

Not applicable in the usual sense, as the import focuses on untagged nodes and will not add significant tags to buildings. However, removal of selected deprecated, duplicated and undocumented tags is planned.

Key | Action | Description
maaamet:ETAK=* | Updated, added if missing | Tag used by previous imports. Acts as a foreign key to the source dataset.
CityIdx=213 | Replace (46 uses) with addr:city=Tallinn | Undocumented tag added in a later phase of the original import. All buildings with this tag are in Tallinn, but most of them don't have addr:city=Tallinn.
created_by=* | Remove (60 uses) | Added to some ways during the last changeset of the original import.
name=* | Remove if duplicated (40..80 uses) | The Estonian address tagging convention identifies individual buildings by either addr:housenumber=* or addr:housename=*. If name=* duplicates the address tags, the name will be removed.
height=* | Added | The dataset also contains lidar-measured building height data for some buildings. While the script's source code features replacement of building:height=*, that tag is already rare enough that this import won't affect its usage. Height is only added if any nodes of the building were modified, height=* is not yet present and the height recorded in the dataset is at least 3 m.

On the other hand, this key should not be added, because adding height=* disables the building levels quest, and consequently the roof shape quest, in StreetComplete.
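
As a rough illustration of the tag changes listed in the table, the sketch below applies them to a plain dictionary of way tags. It is not the import script's code. In particular, treating a name as "duplicated" only when it is exactly equal to addr:housenumber or addr:housename, and writing the height as a plain decimal string, are assumptions made for this example.

```python
MIN_HEIGHT_M = 3.0  # threshold taken from the table above


def clean_tags(tags, etak_id, ref_height_m, nodes_modified, add_height=True):
    """Return a new tag dictionary with the planned changes applied."""
    cleaned = dict(tags)

    # Foreign key to the source dataset: updated, or added if missing.
    cleaned["maaamet:ETAK"] = str(etak_id)

    # Undocumented CityIdx=213 becomes addr:city=Tallinn (kept if already set).
    if cleaned.get("CityIdx") == "213":
        del cleaned["CityIdx"]
        cleaned.setdefault("addr:city", "Tallinn")

    # created_by on objects is deprecated.
    cleaned.pop("created_by", None)

    # Assumption: "duplicated" means an exact match with an address tag.
    name = cleaned.get("name")
    if name is not None and name in (
        cleaned.get("addr:housenumber"),
        cleaned.get("addr:housename"),
    ):
        del cleaned["name"]

    # Height only if geometry changed, no height yet, and at least 3 m.
    if (
        add_height
        and nodes_modified
        and "height" not in cleaned
        and ref_height_m is not None
        and ref_height_m >= MIN_HEIGHT_M
    ):
        cleaned["height"] = str(ref_height_m)

    return cleaned


example_tags = {
    "building": "yes",
    "CityIdx": "213",
    "created_by": "JOSM",
    "name": "12",
    "addr:housenumber": "12",
}
example_result = clean_tags(example_tags, 123456, 7.5, nodes_modified=True)
```

The `add_height` switch is included because of the StreetComplete concern: setting it to False leaves height=* untouched while still applying the other changes.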

Changeset Tags

Key | Value
created_by | osmapi/1.2.2
source | Maa-amet 2021
import | yes
comment | Maa-amet building geometry import #MA-geom-21-05
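
For illustration, these tags could be kept in one place as a dictionary. The helper name and the `run_id` parameter are hypothetical; in osmapi 1.x a dictionary of this shape is, as far as I know, what `ChangesetCreate` accepts, but that call is not made here.

```python
def build_changeset_tags(run_id="MA-geom-21-05"):
    """Return the changeset tags from the table above as a dictionary."""
    return {
        "created_by": "osmapi/1.2.2",
        "source": "Maa-amet 2021",
        "import": "yes",
        "comment": f"Maa-amet building geometry import #{run_id}",
    }


changeset_tags = build_changeset_tags()
```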

Data Transformation

The only significant transformation is converting the source's EPSG:3301 coordinates to OSM's WGS84 coordinates, following the algorithm of a PHP function provided by the Land Board[9]. Link to the script used for modification: https://github.com/kallejre/geo/tree/main/building%20geometry%20importer/building%20footprint%20updater. Note: code comments are mostly in Estonian.
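
The Land Board's PHP function is not reproduced here. As a rough, independent sketch of what such a conversion involves, the code below implements the standard Lambert Conformal Conic (two standard parallels) formulas with the projection parameters that EPSG:3301 is, to my knowledge, defined with. The constants and function names are assumptions of this example, and the small difference between the ETRS89-based Estonian datum and WGS84 is ignored.

```python
import math

# Assumed EPSG:3301 (L-EST97) parameters: LCC 2SP on the GRS80 ellipsoid.
SEMI_MAJOR = 6378137.0
INV_FLATTENING = 298.257222101
LAT_ORIGIN = math.radians(57 + 31 / 60 + 3.194148 / 3600)
LON_ORIGIN = math.radians(24.0)
LAT_STD_1 = math.radians(59 + 20 / 60)
LAT_STD_2 = math.radians(58.0)
FALSE_EASTING = 500000.0
FALSE_NORTHING = 6375000.0

_f = 1 / INV_FLATTENING
ECC_SQ = 2 * _f - _f * _f
ECC = math.sqrt(ECC_SQ)


def _m(lat):
    return math.cos(lat) / math.sqrt(1 - ECC_SQ * math.sin(lat) ** 2)


def _t(lat):
    s = ECC * math.sin(lat)
    return math.tan(math.pi / 4 - lat / 2) / ((1 - s) / (1 + s)) ** (ECC / 2)


# Cone constant, scaling factor and radius at the latitude of origin.
_N = (math.log(_m(LAT_STD_1)) - math.log(_m(LAT_STD_2))) / (
    math.log(_t(LAT_STD_1)) - math.log(_t(LAT_STD_2))
)
_F = _m(LAT_STD_1) / (_N * _t(LAT_STD_1) ** _N)
_RHO_0 = SEMI_MAJOR * _F * _t(LAT_ORIGIN) ** _N


def lest97_to_wgs84(easting, northing):
    """Inverse projection: metres -> (latitude, longitude) in degrees."""
    dx = easting - FALSE_EASTING
    dy = _RHO_0 - (northing - FALSE_NORTHING)
    rho = math.hypot(dx, dy)
    t = (rho / (SEMI_MAJOR * _F)) ** (1 / _N)
    lon = math.atan2(dx, dy) / _N + LON_ORIGIN
    lat = math.pi / 2 - 2 * math.atan(t)
    for _ in range(10):  # fixed-point iteration, converges in a few steps
        s = ECC * math.sin(lat)
        lat = math.pi / 2 - 2 * math.atan(t * ((1 - s) / (1 + s)) ** (ECC / 2))
    return math.degrees(lat), math.degrees(lon)


def wgs84_to_lest97(lat_deg, lon_deg):
    """Forward projection, included only to check the inverse by round trip."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    rho = SEMI_MAJOR * _F * _t(lat) ** _N
    theta = _N * (lon - LON_ORIGIN)
    easting = FALSE_EASTING + rho * math.sin(theta)
    northing = FALSE_NORTHING + _RHO_0 - rho * math.cos(theta)
    return easting, northing


# A point in central Tallinn, given in approximate L-EST97 coordinates.
tallinn_lat, tallinn_lon = lest97_to_wgs84(542000.0, 6589000.0)
```

A library such as pyproj would normally be a simpler choice for this; the hand-written version is shown only because it mirrors the idea of porting a standalone conversion function.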

Data Merge Workflow

Team Approach

This import is carried out solo.

References

List all factors that will be evaluated in the import. Considering documentation of previous import, this page seems to be far too detailed and I don't know what to write here. For additional references to older mailing lists, see "See also" below.

Workflow

Detail the steps you'll take during the actual import.

Information to include:

  • Step by step instructions (assuming this means script's algorithm)
    • Download OSM buildings with an Overpass query
      • Since Kumi was said to sometimes have integrity issues and the script makes just one Overpass query, the regular Overpass-api.de server will be used.
      • To prevent editing conflicts due to multiple users editing the same building, see below.
    • Each OSM building is matched with building(s) from Maa-amet's ETAK dataset.
      • Currently every pair of buildings that has at least 15% bounding box coverage is recorded. This approach works because most buildings to be processed are detached houses located in suburbs. Map analysis from 2019.
      • To speed up processing, the script implements a spatial index using a 1x1 km grid (hence the 1 km below)
    • All buildings where more than one ETAK building matches the OSM footprint by at least 15% are ignored.
    • Run additional safety checks while using the OSM API for reading.
      • Every node of a building can be part of only one way (the building)
      • None of the nodes may have any tags (some buildings had an entrance defined)
      • The building in OSM must not have more nodes than the reference dataset.
        • Due to vectorization mistakes, some buildings were added as 5-node rectangles with an unnecessary 5th node on one side; these buildings won't be updated. Example: way 26860921
      • Neither the building nor any of its nodes may have been modified in the last 24 hours. This is meant to prevent editing conflicts: assuming Overpass has no more than 6 h of lag, the nodes are then guaranteed to be at least 10 years old.
    • Start updating the building on the OSM server
      • Starting from the first node of the way, match every existing node to the nearest unmatched corner of the reference dataset
      • Write down where new nodes need to be added
      • Update nodes
        • If a node doesn't need to be moved, don't move it
      • Update building
        • Update the way's references to nodes
        • Update maaamet:ETAK=* and height=*
        • Optional: Remove 3 predefined deprecated tags
  • Changeset size policy
    • Changes are grouped into a grid with a cell size of 1 km
    • Grid processing starts from the south-west corner, moving 10 squares east, then 1 row up.
    • After finishing the top row, the changeset is closed and a new one is opened from the bottom row.
      See illustrative screenshot. Note: This is not the actual distribution of changesets, but it helps to understand how changeset regions are created.
    • Real changesets are uploaded in 10 km wide zones, with up to 10,000 changes per changeset. To keep the geographical size of changesets under control, no changeset will be wider than 10 km.
      Screenshot of changesets distribution from testing VM. There's a total of 40 changesets.
  • Revert plans
    • In the testing environment, rollbacks were done at hypervisor level by resetting the entire VM back to a previous snapshot.
    • For the live environment, simple-revert appears to be best suited for the task.
    • The script logs the number of every completed grid cell. If the network connection fails or the process is aborted, the script can be resumed from the unfinished cell.
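
The matching step in the workflow above (15% bounding box coverage, sped up by a 1x1 km grid index) could be sketched as follows. This is an illustration rather than the script's code: it assumes all coordinates are already in metres in a projected system, and it interprets "coverage" as the share of the OSM building's bounding box that the reference bounding box covers, which is only one possible reading.

```python
import math
from collections import defaultdict

CELL_SIZE = 1000.0  # metres, matching the 1x1 km grid


def bbox_of(points):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def coverage(osm_box, ref_box):
    """Share of osm_box's area that ref_box covers (0.0 to 1.0)."""
    width = min(osm_box[2], ref_box[2]) - max(osm_box[0], ref_box[0])
    height = min(osm_box[3], ref_box[3]) - max(osm_box[1], ref_box[1])
    area = (osm_box[2] - osm_box[0]) * (osm_box[3] - osm_box[1])
    if width <= 0 or height <= 0 or area <= 0:
        return 0.0
    return width * height / area


def cells_touched(box):
    """Yield every grid cell (column, row) that the box overlaps."""
    for cx in range(math.floor(box[0] / CELL_SIZE), math.floor(box[2] / CELL_SIZE) + 1):
        for cy in range(math.floor(box[1] / CELL_SIZE), math.floor(box[3] / CELL_SIZE) + 1):
            yield (cx, cy)


def build_grid_index(ref_boxes):
    """Map each grid cell to the IDs of reference buildings touching it."""
    index = defaultdict(set)
    for ref_id, box in ref_boxes.items():
        for cell in cells_touched(box):
            index[cell].add(ref_id)
    return index


def match_building(osm_box, ref_boxes, index, threshold=0.15):
    """Return the single matching reference ID, or None if zero or several."""
    candidates = set()
    for cell in cells_touched(osm_box):
        candidates |= index.get(cell, set())
    hits = sorted(r for r in candidates if coverage(osm_box, ref_boxes[r]) >= threshold)
    return hits[0] if len(hits) == 1 else None


# Made-up reference boxes: A and C overlap each other, B stands alone.
example_refs = {
    "A": (0.0, 0.0, 10.0, 10.0),
    "B": (2000.0, 2000.0, 2010.0, 2010.0),
    "C": (5.0, 0.0, 15.0, 10.0),
}
example_index = build_grid_index(example_refs)
```

Only reference buildings in the cells an OSM bounding box touches are compared, which is what keeps the pairwise check cheap for tens of thousands of buildings.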

Conflation

As described above, the need for conflation is kept minimal by ensuring that exactly one OSM building matches each building shape to be imported. New geometry is applied to existing OSM buildings by moving their nodes, adding new nodes only if needed. Each OSM node is matched to the closest node in the source dataset, to keep nodes roughly at the same corners.
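
A minimal sketch of this node matching, assuming planar coordinates and a simple greedy pass in way order, is shown below. The function name and the tie-breaking by index are choices made for this example, and how the actual script inserts the leftover corners into the way is not covered.

```python
import math


def match_nodes_to_corners(osm_nodes, ref_corners):
    """Greedily pair each OSM node with its nearest unmatched reference corner.

    Returns (assignment, leftover): assignment[i] is the index of the
    reference corner chosen for osm_nodes[i]; leftover lists the reference
    corners for which new nodes would have to be created.
    """
    if len(osm_nodes) > len(ref_corners):
        raise ValueError("OSM shape has more nodes than the reference shape")

    unmatched = set(range(len(ref_corners)))
    assignment = []
    for x, y in osm_nodes:
        nearest = min(
            unmatched,
            key=lambda i: (math.hypot(ref_corners[i][0] - x, ref_corners[i][1] - y), i),
        )
        assignment.append(nearest)
        unmatched.remove(nearest)
    return assignment, sorted(unmatched)


# Made-up example: a 4-node OSM rectangle and a 5-corner reference shape.
osm_shape = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
ref_shape = [(0.5, 0.2), (10.3, -0.1), (10.1, 5.0), (9.8, 10.2), (0.1, 9.9)]
example_assignment, example_leftover = match_nodes_to_corners(osm_shape, ref_shape)
```

In the example, the four existing nodes move to the four nearest corners and the fifth reference corner is reported as needing a new node.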

Results

This chapter is based on 5 import tests run on a local testing instance of the OSM API. No public servers were harmed in the process. The script maintains two logs for statistical purposes: error messages encountered while processing buildings, and node stats for successfully modified footprints. A summary of the former is listed in the table below; the latter is simply a basic table with way ID, nodes and number of nodes added. The following picture shows screenshots of some of the buildings to which the import added the most nodes.

Amount Output code
32364 Building updated successfully
5198 No suitable building with at least 15% overlap found
2594 OSM shape is more detailed than source dataset
1897 Building has a node shared with other building
1424 Choice between multiple buildings
442 Multiple reasons (e.g building has tagged node (entrance) + OSM has more nodes than source)
6 Version conflict error (script tried to update same node twice)
11561 Modification failed
32364 Modification successful
43925 Total buildings processed
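
The totals in the table can be cross-checked with a few lines of arithmetic. The dictionary keys below are shortened labels made up for this example; the numbers are copied from the table.

```python
failure_counts = {
    "no suitable overlap": 5198,
    "OSM shape more detailed": 2594,
    "shared node": 1897,
    "multiple candidates": 1424,
    "multiple reasons": 442,
    "version conflict": 6,
}
updated = 32364

failed = sum(failure_counts.values())   # buildings left unmodified
processed = failed + updated            # all buildings seen by the script
success_share = updated / processed     # fraction updated successfully
```

The sums agree with the table: 11561 failed, 43925 processed, and roughly 74% of processed buildings updated.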

Node estimate: a bit over 4000 nodes will be created and 275000 nodes will be modified. For reference: the 2008 import made 400 thousand changes in a single day, the 2013 import added around 50 thousand nodes per day for a month, and the 2019 import had around 50-90 thousand changes on its most active days.

QA

The most likely errors occur when updated geometry starts overlapping with nearby buildings that were not updated, or, in a few cases, when a building with a courtyard was initially imported without the courtyard but the new dataset contains the courtyard as just a part of the building. These situations need manual post-processing with the aid of QA tools. Osmose seems best suited for the current situation, as it supports both self-intersecting and overlapping buildings. Overlapping is sometimes also caused by older imports.

Due to the safety checks meant to prevent the script from modifying unrelated elements (such as a footway leading to a building entrance), only 32 thousand out of 50 thousand of the original import's footprints will be updated. Of the 18 thousand left unmodified, 11 thousand were included in the Overpass query[1] but didn't pass the checks described above, and 7 thousand were filtered out by Overpass because they either share a node with a non-building or have been partially modified since 2011.

See also

The email to the Imports mailing list was sent on 2021-05-30/31 and can be found in the archives of the mailing list.

The local talk-ee mailing list is mirrored on the Google Groups website. Notice of this import was sent to talk-ee on 2021-05-27 (talk-ee (OSM), mirror (Google)) in the same thread that discussed the source of the Verbatium data in April 2021. Discussions about replacing the Verbatium data with something more modern date back to 2013 (talk-ee, mirror); the first concerns about the quality of the data were voiced two months after the import took place. In 2019 there was a proposal to delete the buildings and add new ones using the standard building import process, but it didn't get strong support, as a large portion of buildings would have been deleted (talk-ee, mirror).


References