AutoAWS

From OpenStreetMap Wiki

autoAWS is a PHP script that automatically synchronises postal addresses of Denmark in OpenStreetMap with the official government database of postal addresses.

autoAWS was initially announced on the Danish Talk-dk mailing list in April 2018.

For questions or comments regarding this project, please get in touch with OSM user atcomapper or follow the Talk-dk mailing list.

The script runs on a dedicated server, and the update frequency ensures that all Danish addresses are synchronised once daily (Mon-Fri). Changes done to OpenStreetMap data by the script are performed as the OSM user autoAWS.

Background

Address data in Denmark has been maintained more or less automatically since the initial import in 2009. The Danish government publishes official address data of very high quality which is easily accessible via an open API.

Previously, a script run under the user account AWSbot maintained Danish addresses. The source code for this script is available on GitHub: https://github.com/AWSbot/PHPscript. The AWSbot script was, however, turned off in mid-2017 after years of complaints about various problems with it.

AWSbot served as the main inspiration for autoAWS, and, although a completely new script was written, some of the general logic from the AWSbot script was reused.

Goals

The goal of this automatic data import is to maintain the existing very high quality of address tags in Denmark by adding newly created addresses, deleting addresses that no longer exist, and updating tags on existing address nodes that contain outdated data.

Schedule

The script runs once every minute and updates one postcode per run. Some postcodes take only a few seconds to check, while some of the larger ones can take a few minutes, depending on the number of updates. The main bottleneck is submitting the changes to the OSM API. A day has 1440 minutes, so even if a few updates fail and have to restart, or take longer than one minute, this frequency ensures that all Danish postcodes (1089 at the time of writing) are updated at least once daily. The script does not run on Saturdays and Sundays, since it is assumed that the government database of addresses is not updated during the weekend. Any changes that do happen during a weekend are imported on the following Monday.
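
The capacity argument above can be checked with a little arithmetic. The following is only an illustrative sketch: the function names are made up for this page, and it assumes the simplest case where each postcode occupies one one-minute slot.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 one-minute slots in a day
POSTCODES = 1089           # number of postcodes quoted above


def spare_slots(postcodes: int, slots: int = MINUTES_PER_DAY) -> int:
    """Slots left over if every postcode needs exactly one slot."""
    return slots - postcodes


def max_average_minutes(postcodes: int, slots: int = MINUTES_PER_DAY) -> float:
    """Longest average run time (in minutes) that still fits within one day."""
    return slots / postcodes


print(spare_slots(POSTCODES))                    # 351
print(round(max_average_minutes(POSTCODES), 2))  # 1.32
```

In other words, roughly a quarter of the daily slots are available for retries and for postcodes that take longer than a minute.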

Imported Data

Data will be imported from Danmarks Adresseregister[1] (acronym: DAR, English: The Danish Address Register[2]), via a web service known as Danmarks Adressers Web API[3] (acronym: DAWA, English: Web API of Danish Addresses). DAWA itself is a part of a suite of address services known as SDFE's Adresse Web Services (acronym: AWS) aka. AWS Suiten[4]. DAWA is run and maintained by the state agency SDFE ("Styrelsen for Dataforsyning og Effektivisering", English: The Danish Agency for Data Supply and Efficiency[5]). Note that the terms DAR, AWS and DAWA might be used interchangeably on this page.

The license terms under which data from AWS Suiten is distributed (unfortunately only available in Danish) grant a worldwide, no-cost, non-exclusive and otherwise unlimited right to use the data, which can be freely copied, distributed, published, changed, combined with other data and used both commercially and non-commercially. To identify the data source in OpenStreetMap, the tag source=Danmarks Adresseregister will be added to all imported data. However, any existing source=* tag on a node will never be overwritten.

The API documentation, in Danish only, can be found at dawa.aws.dk/dok/api. The code implementing the DAWA API is itself open source software, distributed under the terms of the MIT license. In-code documentation is written in English, but fields and values are still in Danish (see e.g. packages/server/apidoc/schema.json).

Data Preparation

The old address tagging scheme in Denmark contained a number of system-specific tags that were of little or no use to data consumers. As addresses were updated, some of these tags were removed/replaced by more recognisable tags. For example, osak:municipality_name=* was replaced by addr:municipality=*. For a complete list of tags, please see the following sections.

Including OIS fixes

Some addresses in DAR contain errors. In particular, some street names are abbreviated in the database. This is the result of a limitation in an earlier version of DAR where the street name could not exceed 20 characters.

An effort has been made to make a list of any such errors so that they can be corrected - both in OSM and eventually directly in DAR.

The list of fixes, available here, is used to correct data imported from DAR.

Anyone noticing errors in the imported data is strongly encouraged to contact the local municipality (kommune), and request that they correct the error directly in DAR. This has the added benefit that any other organization using data from DAR will also receive improved data. Correcting mistakes directly in OSM, without contacting the local municipality, is discouraged.

Description of the Script Workflow

The autoAWS script is written in PHP and constructed around an SQL database. PHP is used as an interface between the various APIs and the database layer, while as much of the data processing as possible happens in the database itself using SQL. All updates are automatically submitted to the OSM API (node PUT and DELETE calls) using cURL.

An update is run for one Danish postcode at a time. At the time of writing, Denmark is divided into 1,089 different postcodes.

Overpass API is used to identify existing address nodes in OSM by searching for all nodes containing the osak:identifier=* tag and the addr:postcode=nnnn tag where nnnn is the postcode currently being updated.
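
A query of this kind could look roughly like the sketch below. It is written in Python for illustration and is not taken from the autoAWS source: the function name and the timeout value are assumptions, and the real script may restrict the search further (for example to an area covering Denmark).

```python
def build_overpass_query(postcode: str) -> str:
    """Build an Overpass QL query for address nodes in one postcode.

    Matches nodes that carry any osak:identifier value and the given
    addr:postcode value, and asks Overpass to return node IDs only.
    """
    # Danish postcodes are four digits; reject anything else so that no
    # unexpected characters end up inside the query string.
    if not (len(postcode) == 4 and postcode.isascii() and postcode.isdigit()):
        raise ValueError(f"not a four-digit postcode: {postcode!r}")
    return (
        "[out:json][timeout:180];"  # timeout value is an arbitrary assumption
        'node["osak:identifier"]'
        f'["addr:postcode"="{postcode}"];'
        "out ids;"
    )


print(build_overpass_query("8000"))
```

Returning only IDs keeps the Overpass response small, which fits the workflow described here, where the full node data is fetched from the OSM API in the next step.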

All nodes found via Overpass API are downloaded using the OSM API.
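
The OSM API (v0.6) can return several nodes in one request via its `nodes?nodes=id1,id2,...` call. The sketch below shows how a list of node IDs might be split into such requests. It is an illustration in Python, not the script's actual code; the function name is hypothetical, and the default chunk size of 500 is taken from the 1.3 entry in the version history.

```python
from typing import Iterator, List

OSM_API = "https://api.openstreetmap.org/api/0.6"


def node_fetch_urls(node_ids: List[int], chunk_size: int = 500) -> Iterator[str]:
    """Yield one multi-fetch URL per chunk of at most chunk_size node IDs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(node_ids), chunk_size):
        chunk = node_ids[start:start + chunk_size]
        yield f"{OSM_API}/nodes?nodes=" + ",".join(str(node_id) for node_id in chunk)


# 1200 IDs are fetched with three requests (500 + 500 + 200 nodes).
urls = list(node_fetch_urls(list(range(1, 1201))))
print(len(urls))  # 3
```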

The corresponding address data for the given postcode is downloaded from the DAWA API.

Address data from OSM is compared with address data from DAR using SQL. The two database tables are joined on the osak:identifier=* key.
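
The comparison can be pictured with a small, self-contained example. The sketch below uses Python's built-in sqlite3 module with made-up table names, column names and sample rows. The real script uses its own database schema, which is not documented on this page. The `id` column stands in for the osak:identifier=* value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE osm (id TEXT PRIMARY KEY, street TEXT, housenumber TEXT);
CREATE TABLE dar (id TEXT PRIMARY KEY, street TEXT, housenumber TEXT);
INSERT INTO osm VALUES ('a', 'Nygade', '1'), ('b', 'Nygade', '2'), ('c', 'Gl. Vej', '3');
INSERT INTO dar VALUES ('a', 'Nygade', '1'), ('c', 'Gammel Vej', '3'), ('d', 'Nygade', '4');
""")

# Present on both sides, but at least one compared value differs: update.
to_update = [row[0] for row in conn.execute("""
    SELECT osm.id FROM osm JOIN dar ON osm.id = dar.id
    WHERE osm.street <> dar.street OR osm.housenumber <> dar.housenumber
    ORDER BY osm.id""")]

# In OSM but no longer in DAR: candidate for deletion.
to_delete = [row[0] for row in conn.execute("""
    SELECT osm.id FROM osm LEFT JOIN dar ON osm.id = dar.id
    WHERE dar.id IS NULL ORDER BY osm.id""")]

# In DAR but missing from OSM: candidate for creation.
to_add = [row[0] for row in conn.execute("""
    SELECT dar.id FROM dar LEFT JOIN osm ON osm.id = dar.id
    WHERE osm.id IS NULL ORDER BY dar.id""")]

print(to_update, to_delete, to_add)  # ['c'] ['b'] ['d']
```

Doing the comparison with joins keeps the per-address logic in the database, in line with the stated design goal of letting SQL do as much of the processing as possible.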

Updating addresses

Any one of the following conditions will trigger an update. However, any node with the autoaws=ignore tag will be ignored:

  • The position (lat and lon) of the node is not equal to the DAR address position (coordinates are rounded to 6 decimal places)
  • addr:city=* is not equal to the DAR city name
  • addr:housenumber=* is not equal to the DAR house number
  • addr:street=* is not equal to the DAR street name (corrected version using the OIS fixes list)
  • addr:municipality=* is not equal to the DAR municipality name
  • addr:place=* is not equal to the DAR supplementary city name

The mismatching tag values will be updated, and the node is uploaded via the OSM API. Any additional tags on the node (non address tags, for example shop=*) will be preserved without change.
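
The update conditions listed above can be expressed as a single check. The following Python sketch is only an illustration: the dictionary layout (`lat`, `lon`, `tags`) is an assumption made for this example, and the DAR street name is assumed to have already been corrected with the OIS fixes list.

```python
from typing import Any, Dict

# Tags compared between the OSM node and the DAR address.
COMPARED_TAGS = (
    "addr:city",
    "addr:housenumber",
    "addr:street",
    "addr:municipality",
    "addr:place",
)


def needs_update(osm_node: Dict[str, Any], dar_address: Dict[str, Any]) -> bool:
    """Return True if the OSM node differs from DAR and may be updated."""
    tags = osm_node["tags"]
    if tags.get("autoaws") == "ignore":
        return False  # nodes marked with autoaws=ignore are never touched
    # Positions are compared after rounding to 6 decimal places.
    for axis in ("lat", "lon"):
        if round(osm_node[axis], 6) != round(dar_address[axis], 6):
            return True
    dar_tags = dar_address["tags"]
    return any(tags.get(key) != dar_tags.get(key) for key in COMPARED_TAGS)


node = {"lat": 55.6760001, "lon": 12.5683001,
        "tags": {"addr:street": "Nygade", "shop": "bakery"}}
dar = {"lat": 55.6760004, "lon": 12.5683004,
       "tags": {"addr:street": "Nygade"}}
print(needs_update(node, dar))  # False
```

In the example, the coordinates differ only beyond the sixth decimal place and shop=* is not an address tag, so no update is triggered.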

When updating an address, the following tags, if present, will be removed:

Deleting addresses

The data is tested for any osak:identifier=* values that exist in OSM data but do not exist in DAR data. Such nodes are added to a database table for deletion. Nodes with the autoaws=ignore tag will not be deleted, even if the address no longer exists in DAR.

Using the osak:identifier=* tag for this comparison means that if, for some reason, an address node has a wrong osak:identifier value, the address node will be deleted. However, the script will then notice the missing address and add a new node, with the correct ID.

There are three possible outcomes when an address is to be deleted:

  • If the node does not have any additional (non address) tags, a DELETE call is sent to the OSM API, deleting the entire node.
  • If the node does contain additional tags, it is first checked if a new address is going to be added in the same position (lat and lon pair) as the address being deleted. (This can happen if the address ID has, for some reason, been changed in DAR.) If a new address is being added in the same position, the old node will be deleted and the additional tags will be transferred to the new address node.
  • If the node contains additional tags, but no new address is going to be added in the same position, all address tags are removed, and an updated node containing only the non-address tags is submitted to the OSM API. A fixme=* tag will be added to such nodes, explaining that the address no longer exists, and the additional tags should be manually verified.
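
The three outcomes can be summarised as a small decision function. This Python sketch is illustrative only. Which keys count as address tags is an assumption made here (keys starting with addr:, osak: or ois:, plus the source tag when it has the value added by the import), and the outcome names are invented for the example. The fixme text is the one quoted in the version history.

```python
from typing import Any, Dict, Set, Tuple

FIXME_TEXT = ("This address no longer exists. "
              "Please check if the tags on this node are still valid")
ADDRESS_PREFIXES = ("addr:", "osak:", "ois:")  # assumption, see lead-in


def is_address_tag(key: str, value: str) -> bool:
    """Decide whether a tag belongs to the address (assumed rule)."""
    if key == "source":
        return value == "Danmarks Adresseregister"
    return key.startswith(ADDRESS_PREFIXES)


def plan_deletion(node: Dict[str, Any],
                  new_positions: Set[Tuple[float, float]]) -> Tuple[str, Dict[str, str]]:
    """Return (outcome, tags), where tags are the non-address tags to keep."""
    extra = {k: v for k, v in node["tags"].items() if not is_address_tag(k, v)}
    if not extra:
        return "delete", {}
    position = (round(node["lat"], 6), round(node["lon"], 6))
    if position in new_positions:
        # A new address is created at the same position: move the extra tags.
        return "delete_and_transfer", extra
    # Keep the node, drop the address tags and flag it for manual review
    # (in this sketch, an existing fixme value would be replaced).
    extra["fixme"] = FIXME_TEXT
    return "strip_address_tags", extra


node = {"lat": 55.676, "lon": 12.5683,
        "tags": {"addr:street": "Nygade", "osak:identifier": "x", "shop": "bakery"}}
print(plan_deletion(node, set())[0])                # strip_address_tags
print(plan_deletion(node, {(55.676, 12.5683)})[0])  # delete_and_transfer
```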

Adding addresses

Similarly to how addresses are picked for deletion, any address ID that exists in DAR data but does not exist in OSM data is saved as a new address to be added to OSM. A new node will be created and pushed to the OSM API. The new node will contain the following tags:

Reverting changes

A procedure for automatically reverting changes done by autoAWS currently does not exist.

A few examples have been observed of the script adding duplicate data (new addresses added twice). This rare issue was caused by a race condition in the script, which has been fixed in later versions, combined with the fact that the data in Overpass API is lagging slightly behind the data in the OSM database. It is important to note that in case duplicate addresses are added, the script will automatically detect and remove them the next day. No manual intervention should be required at any time.

Version history

autoAWS 0.1

17 April 2018

  • First published draft.


autoAWS 0.2

30 April 2018

  • Addresses with the ois:fixme=* tag will not be updated (but will still be deleted if the address no longer exists)
  • Street name fixes from https://oisfixes.iola.dk/ are now included when downloading AWS data
  • Added error handling in case overpass API is down
  • Functions for node handling (update, create, delete) added
  • Logic added to handle cases where tags from an address being deleted need to be transferred to a new address:
      • If an address is changed but the position remains the same, additional tags will be kept
      • If an address is deleted, address tags will be removed but additional tags will be kept. The following is added: fixme=This address no longer exists. Please check if the tags on this node are still valid

First tests on live data run 2 May 2018. See https://www.openstreetmap.org/changeset/58616759 & https://www.openstreetmap.org/changeset/58617113.


autoAWS 0.3 rc1

6 May 2018

  • Fixed a bug where nodes were not being correctly updated
  • Improved performance significantly by bulk-downloading nodes from the OSM API instead of downloading nodes individually
  • Fixed a bug where additional (non-address) tags were not being correctly handled on deleted addresses
  • The Supplerende Bynavn from DAR is now imported and added as the addr:place=* tag. The osak:subdivision=* tag is removed when an address is updated
  • osak:municipality_name=* will now be replaced with addr:municipality=* when an address is updated. New addresses will also get the addr:municipality=* tag
  • Added a debug-mode to make testing easier
  • Fixed an encoding bug where special characters (such as letters æ, ø, å) were garbled
  • Following the discussion here, nodes with ois:fixme=* will no longer be ignored by the script. Instead, nodes with autoaws=ignore will be ignored.
  • addr:country=DK is now added to nodes where the addr:country=* tag is missing
  • Added logic to handle cases where a postcode is discontinued or a new postcode is established


autoAWS 0.4

13 May 2018

  • Added handling of very large edits (will be split into multiple changesets if needed; OSM does not accept changesets larger than 10,000 edits)
  • Added an option to manually start an update for a given postcode
  • Fixed a bug where extra tags were not always transferred to new address nodes in very large postcodes
  • Improved handling of addresses moving between postcodes, including cases where a postcode is split into multiple postcodes (new postcodes being established) and cases where multiple postcodes merge into one (old postcodes being discontinued)
  • Supplementary city name will be discarded if equal to the postcode city name. So addr:city=* and addr:place=* tags with equal values will be avoided.
  • Fixed a bug where duplicate address nodes were not being deleted
  • Code refactoring, additional error handling, minor performance improvements
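
The rule for the supplementary city name introduced in this version can be sketched as follows. This is an illustrative Python snippet with a hypothetical function name, not the script's own code.

```python
from typing import Dict, Optional


def city_tags(postcode_city: str, supplementary_city: Optional[str]) -> Dict[str, str]:
    """Return addr:city, plus addr:place only when it adds information."""
    tags = {"addr:city": postcode_city}
    # Discard the supplementary city name if it is missing or simply
    # repeats the postcode city name.
    if supplementary_city and supplementary_city != postcode_city:
        tags["addr:place"] = supplementary_city
    return tags


print(city_tags("Odder", "Saksild"))  # {'addr:city': 'Odder', 'addr:place': 'Saksild'}
print(city_tags("Odder", "Odder"))    # {'addr:city': 'Odder'}
```

The place names in the example are only sample input values and have not been checked against DAR.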

First larger edits (more than 10 addresses): https://www.openstreetmap.org/changeset/58825242


autoAWS 0.5 rc2

12 June 2018

  • Fixed a bug where changes were not submitted to the last 99 nodes in a postcode if the last node downloaded had an invalid (more than 32 characters) osak:identifier=*
  • The address ID (osak:identifier=* tag) is now returned in the same format used in DAR (lower case, with a dash after 8, 12, 16 and 20 characters). For unknown reasons, the script previously converted IDs to uppercase and removed the dashes, which made it hard to look up the address in DAR.
  • osak:revision=* tag retired
  • Database optimization and minor code refactoring
  • Fixed a bug where existing duplicate address nodes in OSM were not deleted until the second update of a postcode
  • Added handling of cases where creating a changeset fails
  • ois:fixme=* tags will now also be removed from a node if an address is deleted
  • If an address node needs to be updated, and the node is a member of a way, a new address node will now be created with the updated values, instead of updating the old node. This is to prevent ways becoming deformed because an address node is moved.
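
The ID format change described in this version (lower case, with a dash after 8, 12, 16 and 20 characters) corresponds to a conversion like the one sketched below. The Python function name is hypothetical and the sample ID is made up for illustration.

```python
import re


def format_address_id(raw: str) -> str:
    """Convert a 32-character hex ID to the dashed, lower-case DAR format."""
    hex_digits = raw.replace("-", "").lower()
    if not re.fullmatch(r"[0-9a-f]{32}", hex_digits):
        raise ValueError(f"not a 32-character hexadecimal ID: {raw!r}")
    # Insert dashes after 8, 12, 16 and 20 characters.
    return "-".join((
        hex_digits[0:8],
        hex_digits[8:12],
        hex_digits[12:16],
        hex_digits[16:20],
        hex_digits[20:32],
    ))


print(format_address_id("0A3F50A0B2E632B8E0440003BA298018"))
# 0a3f50a0-b2e6-32b8-e044-0003ba298018
```

Because existing dashes are stripped before the check, the function also accepts IDs that are already in the DAR format and returns them unchanged apart from lower-casing.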

autoAWS 1.0

No changes compared to 0.5.

Source code: tag named v1.0 in GitLab repository


autoAWS 1.1

16 August 2018

First version to be run continuously on a dedicated server.

  • Fixed a bug related to the deletion of postcodes
  • Increased max execution time to prevent the script from terminating during the processing of very large changesets (needed because the processing time of individual addresses was increased by some of the changes introduced in 0.5)
  • Added an option to force an update of all addresses, useful when changing the tagging scheme
  • Added an option to force an update of an address node if a certain tag is present, useful when changing the tagging scheme

Source code: tag named v1.1 in GitLab repository


autoAWS 1.2

6 December 2018

  • Small fix that automatically resets the script if it got stuck because it was terminated unexpectedly

autoAWS 1.3

13 August 2019

  • Minor performance improvements (nodes from OSM are now downloaded in chunks of 500 instead of 100)
  • Various improvements to reduce memory usage (previous version was crashing on a server with 512MB RAM)
  • autoAWS is now once again running continuously on a server, ensuring extremely frequent, fully automated updates

autoAWS 1.3a

2 June 2020

  • Small bugfix to catch an error in address data that caused the script to crash

autoAWS 1.4

2 June 2020

  • Added additional data validation to prevent crashes in case of invalid data in DAR
  • Added logging to make debugging easier

autoAWS 1.5

3 August 2020

  • Addresses with status Preliminary (Foreløbig) in DAR will no longer be included. Only valid and active addresses (Gældende) will be submitted to OSM. At the time of writing, about 3,400 addresses in DAR have the Preliminary status.

References