autoAWS is a PHP script that automatically synchronises postal addresses in Denmark with the official government database of postal addresses.
autoAWS was initially announced on the Danish Talk-dk mailing list in April 2018.
For questions or comments regarding this project, please get in touch with OSM user or follow the Talk-dk mailing list.
The script runs on a dedicated server, and the update frequency ensures that all Danish addresses are synchronised once daily (Mon-Fri). Changes done to OpenStreetMap data by the script are performed as the OSM user .
- 1 Background
- 2 Goals
- 3 Schedule
- 4 Imported Data
- 5 Data Preparation
- 6 Description of the Script Workflow
- 7 Reverting changes
- 8 Version history
- 9 References
Address data in Denmark has been maintained more or less automatically since the initial import in 2009. The Danish government publishes official address data of very high quality which is easily accessible via an open API.
Previously, user was the account used to run a script used to maintain Danish addresses. The source code for this script is available on GitHub: https://github.com/AWSbot/PHPscript. The AWSbot script was, however, finally turned off in mid 2017 after years of complaints over various problems with it.
AWSbot served as the main inspiration for autoAWS, and, although a completely new script was written, some of the general logic from the AWSbot script was reused.
The goal of this automatic data import is to maintain the existing very high quality of address tags in Denmark by adding newly created addresses, deleting addresses that no longer exist and updating tags on existing address nodes that contain old data.
The script is automatically run on the server using a cron job. It is run every minute from the beginning of an hour until minute 50 of that hour. A small cleanup script is then run within the 10-minute window until the next hour starts, which essentially ensures that the script is restarted in case it encounters problems or gets stuck. One postcode (zip) is synchronised per run. With up to 51 runs per hour, or 1,224 runs per day, all Danish postcodes (1,089 at the time of writing) are updated at least once daily. Note that the script does not run on Saturday and Sunday, since it is assumed the government database of addresses is not updated during the weekend. Any changes that do happen during a weekend are imported on the following Monday.
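The coverage claim can be checked with a little arithmetic. The following is a minimal sketch in Python (not the PHP of the script itself); the crontab line in the comment and the file name `autoaws.php` are assumptions made for illustration, not the actual server configuration.

```python
# Assumed crontab line for the main script (hypothetical, for illustration only):
#   0-50 * * * 1-5  php autoaws.php
RUN_MINUTES = range(0, 51)   # minutes 0..50 inclusive
POSTCODE_COUNT = 1089        # Danish postcodes at the time of writing

runs_per_hour = len(RUN_MINUTES)
runs_per_day = runs_per_hour * 24


def is_run_slot(weekday: int, minute: int) -> bool:
    """Return True if a run would start in this slot.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    return weekday < 5 and minute in RUN_MINUTES


# One postcode per run, so a full pass fits in a day if this prints True.
print(runs_per_hour, runs_per_day, runs_per_day >= POSTCODE_COUNT)
```

With one postcode per run, 1,224 daily slots leave some headroom over the 1,089 postcodes, which is why a stuck run cleaned up in the last 10 minutes of an hour does not prevent a full daily pass.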
Data will be imported from Danmarks Adresseregister (acronym: DAR, English: The Danish Address Register), via a web service known as Danmarks Adressers Web API (acronym: DAWA, English: Web API of Danish Addresses). DAWA itself is a part of a suite of address services known as SDFE's Adresse Web Services (acronym: AWS) aka. AWS Suiten. Note that the terms DAR, AWS and DAWA might be used interchangeably on this page.
The license terms under which data from AWS Suiten is distributed (unfortunately only available in Danish) grant worldwide, no-cost, non-exclusive and otherwise unlimited rights to use the data, which can be freely copied, distributed, published, changed, combined with other data and used both commercially and non-commercially. To identify the data source in OpenStreetMap, the tag source=Danmarks Adresseregister will be added to all imported data; however, any existing source=* tag on a node will never be overwritten.
The old address tagging scheme in Denmark contained a number of system-specific tags that were of little or no use to data consumers. As addresses were updated, some of these tags were removed/replaced by more recognisable tags. For example, osak:municipality_name=* was replaced by addr:municipality=*. For a complete list of tags, please see the following sections.
Including OIS fixes
Some addresses in DAR contain errors. In particular, some street names are abbreviated in the database. This is the result of a limitation in an earlier version of DAR where the street name could not exceed 20 characters.
An effort has been made to make a list of any such errors so that they can be corrected - both in OSM and eventually directly in DAR.
The list of fixes, available at https://oisfixes.iola.dk/, is used to correct data imported from DAR.
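Applying such a list amounts to a lookup before the street name is compared or written. The sketch below is in Python and assumes the fixes are available as a simple mapping from the abbreviated DAR name to the corrected name; the real list may well be keyed differently (for example by municipality and street code), and the sample entry is invented.

```python
# Invented sample entry; the structure of the real fixes list is an assumption.
STREET_NAME_FIXES = {
    "Gl. Kirkevej": "Gammel Kirkevej",
}


def corrected_street_name(dar_name: str) -> str:
    """Return the corrected street name, or the DAR name if no fix is listed."""
    return STREET_NAME_FIXES.get(dar_name, dar_name)
```

Names without an entry pass through unchanged, so the lookup is safe to apply to every address.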
Anyone noticing errors in the imported data is strongly encouraged to contact the local municipality (kommune), and request that they correct the error directly in DAR. This has the added benefit that any other organization using data from DAR will also receive improved data. Correcting mistakes directly in OSM, without contacting the local municipality, is discouraged.
Description of the Script Workflow
The autoAWS script is written in PHP and constructed around an SQL database. PHP is used as an interface between the various APIs and the database layer, whereas as much of the data processing as possible happens in the database itself using SQL. All updates are automatically submitted to the OSM API (node PUT and DELETE calls) using cURL.
An update is run for one Danish postcode at a time. At the time of writing, Denmark is divided into 1,089 different postcodes.
Overpass API is used to identify existing address nodes in OSM by searching for all nodes containing the osak:identifier=* tag and the addr:postcode=nnnn tag where nnnn is the postcode currently being updated.
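A query of this kind could be built as follows. This is a Python sketch of one plausible Overpass QL query, not the query the script actually sends; the function name is hypothetical.

```python
def build_overpass_query(postcode: str) -> str:
    """Build an Overpass QL query for address nodes in one postcode.

    Selects nodes that carry an osak:identifier tag (any value) and the
    given addr:postcode, and returns only their IDs.
    """
    if not (len(postcode) == 4 and postcode.isdigit()):
        raise ValueError("Danish postcodes are four digits")
    return (
        "[out:json];"
        f'node["osak:identifier"]["addr:postcode"="{postcode}"];'
        "out ids;"
    )
```

Requesting `out ids;` keeps the response small, which fits the workflow described here, where the full node data is fetched from the OSM API in a later step.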
All nodes found via Overpass API are downloaded using the OSM API.
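The OSM API 0.6 supports fetching several nodes in one request via `/api/0.6/nodes?nodes=<id>,<id>,...`. The Python sketch below shows how node IDs might be grouped into such requests; the chunk size of 500 matches the value mentioned in the version history, while the function names are hypothetical.

```python
from typing import Iterator, List, Sequence

OSM_API = "https://api.openstreetmap.org/api/0.6"


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def node_fetch_urls(ids: Sequence[int], size: int = 500) -> List[str]:
    """Return one multi-fetch URL per chunk of node IDs."""
    return [
        f"{OSM_API}/nodes?nodes={','.join(str(i) for i in chunk)}"
        for chunk in chunked(ids, size)
    ]
```

Fetching nodes in bulk rather than one by one is the change credited with a significant performance improvement in the 6 May 2018 release.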
The corresponding address data for the given postcode is downloaded from the DAWA API.
Address data from OSM is compared with address data from DAR using SQL. The two database tables are joined on the osak:identifier=* key. A node is marked for update if any of the following is true:
- The position (lat and lon) of the node is not equal to the DAR address position (coordinates are rounded to 6 decimal places)
- addr:city=* is not equal to the DAR city name
- addr:housenumber=* is not equal to the DAR house number
- addr:street=* is not equal to the DAR street name (corrected version using the OIS fixes list)
- addr:municipality=* is not equal to the DAR municipality name
- addr:place=* is not equal to the DAR supplementary city name
The mismatching tag values will be updated, and the node is uploaded via the OSM API. Any additional tags on the node (non address tags, for example shop=*) will be preserved without change.
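The comparison described above can be illustrated with a small join. This is a sketch using Python's built-in `sqlite3` module; the table and column names are assumptions, the rows are made-up sample data, and the municipality and supplementary city name columns are left out for brevity. The real script's schema and SQL dialect may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE osm_addr (identifier TEXT PRIMARY KEY, lat REAL, lon REAL,
                           street TEXT, housenumber TEXT, city TEXT);
    CREATE TABLE dar_addr (identifier TEXT PRIMARY KEY, lat REAL, lon REAL,
                           street TEXT, housenumber TEXT, city TEXT);
""")
conn.executemany("INSERT INTO osm_addr VALUES (?, ?, ?, ?, ?, ?)", [
    ("id-a", 55.6761001, 12.5683, "Nørregade", "1", "København K"),
    ("id-b", 55.7, 12.5, "Hovedgade", "7", "København K"),
    ("id-c", 55.71, 12.51, "Torvet", "2", "København K"),  # not in DAR
])
conn.executemany("INSERT INTO dar_addr VALUES (?, ?, ?, ?, ?, ?)", [
    ("id-a", 55.6761, 12.5683, "Nørregade", "1", "København K"),
    ("id-b", 55.7, 12.5, "Hovedgaden", "7", "København K"),
])

# Inner join on the identifier; IS NOT is SQLite's NULL-safe inequality.
MISMATCH_QUERY = """
    SELECT o.identifier
    FROM osm_addr AS o
    JOIN dar_addr AS d ON d.identifier = o.identifier
    WHERE ROUND(o.lat, 6) <> ROUND(d.lat, 6)
       OR ROUND(o.lon, 6) <> ROUND(d.lon, 6)
       OR o.street IS NOT d.street
       OR o.housenumber IS NOT d.housenumber
       OR o.city IS NOT d.city
    ORDER BY o.identifier
"""
nodes_to_update = [row[0] for row in conn.execute(MISMATCH_QUERY)]
```

In this sample, `id-a` differs from DAR only beyond the sixth decimal and is left alone, `id-b` has a different street name and is selected, and `id-c` has no DAR counterpart, so the inner join skips it.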
When updating an address, the following tags, if present, will be removed:
- osak:subdivision=* (replaced by addr:place=*)
- osak:house_no=* (duplicate of addr:housenumber=*)
- osak:municipality_no=* (unique ID of the municipality in DAR, of little or no use to map users)
- osak:street_name=* and osak:street=* (duplicate of addr:street=*)
- osak:street_no=* (unique ID of the street in DAR, of little or no use to map users)
- osak:revision=* (date/time of the last change of the address in DAR, superfluous in OSM since all nodes have a version and full history)
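Removing these keys from a node's tags is a simple filter. The Python sketch below covers only the tags listed above and represents tags as a plain dictionary, which is an assumption about the data model.

```python
# Tags removed when an address node is updated (from the list above).
OBSOLETE_TAGS = frozenset({
    "osak:subdivision",
    "osak:house_no",
    "osak:municipality_no",
    "osak:street_name",
    "osak:street",
    "osak:street_no",
    "osak:revision",
})


def strip_obsolete_tags(tags: dict) -> dict:
    """Return a copy of the tags without the retired osak:* keys."""
    return {key: value for key, value in tags.items() if key not in OBSOLETE_TAGS}
```

Note that osak:identifier=* is not in the set, since it remains the key used to match nodes against DAR.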
The data is tested for any osak:identifier=* values that exist in OSM data but do not exist in DAR data. Such nodes are added to a database table for deletion. Nodes with the autoaws=ignore tag will not be deleted, even if the address no longer exists in DAR.
Using the osak:identifier=* tag for this comparison means that if, for some reason, an address node has a wrong osak:identifier value, the address node will be deleted. However, the script will then notice the missing address and add a new node, with the correct ID.
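In set terms, the deletion candidates are the OSM identifiers that are missing from DAR, minus the nodes marked to be ignored. A Python sketch, assuming each node is a dictionary with hypothetical `id` and `tags` fields:

```python
def deletion_candidates(osm_nodes: list, dar_ids: set) -> list:
    """Return IDs of OSM nodes whose address no longer exists in DAR.

    Nodes tagged autoaws=ignore are never selected.
    """
    return [
        node["id"]
        for node in osm_nodes
        if node["tags"].get("osak:identifier") not in dar_ids
        and node["tags"].get("autoaws") != "ignore"
    ]
```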
There are three possible outcomes when an address is to be deleted:
- If the node does not have any additional (non address) tags, a DELETE call is sent to the OSM API, deleting the entire node.
- If the node does contain additional tags, it is first checked if a new address is going to be added in the same position (lat and lon pair) as the address being deleted. (This can happen if the address ID has, for some reason, been changed in DAR.) If a new address is being added in the same position, the old node will be deleted and the additional tags will be transferred to the new address node.
- If the node contains additional tags, but no new address is going to be added in the same position, all address tags are removed, and an updated node containing only the non-address tags is submitted to the OSM API. A fixme=* tag will be added to such nodes, explaining that the address no longer exists, and the additional tags should be manually verified.
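The three outcomes can be summarised as a small decision function. This Python sketch makes two assumptions that the page does not spell out: which keys count as address tags (here the `addr:`, `osak:` and `ois:` prefixes plus `source`), and that positions are compared after rounding to 6 decimal places. The node structure and function names are hypothetical; the fixme text is the one quoted in the version history.

```python
FIXME_TEXT = ("This address no longer exists. "
              "Please check if the tags on this node are still valid")


def is_address_tag(key: str) -> bool:
    """Assumed definition of an address tag, for illustration only."""
    return key.startswith(("addr:", "osak:", "ois:")) or key == "source"


def plan_deletion(node: dict, new_positions: set) -> tuple:
    """Return (action, tags) for a node whose address is gone from DAR.

    new_positions holds (lat, lon) pairs, rounded to 6 decimals, of
    addresses that are about to be created.
    """
    extra = {k: v for k, v in node["tags"].items() if not is_address_tag(k)}
    if not extra:
        return ("delete", {})            # plain address node: delete it
    position = (round(node["lat"], 6), round(node["lon"], 6))
    if position in new_positions:
        return ("transfer", extra)       # move extra tags to the new node
    return ("strip", {**extra, "fixme": FIXME_TEXT})  # keep node, drop address
```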
Similarly to how addresses are picked for deletion, any address IDs that exist in DAR data, but do not exist in OSM data, are saved as new addresses to be added to OSM. A new node will be created and pushed to the OSM API. The new node will contain the following tags:
- source=Danmarks Adresseregister
- addr:place=* (only if DAR contains a supplementary city name)
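Selecting new addresses mirrors the deletion check, and the conditional addr:place=* tag is a simple test on the DAR record. The Python sketch below uses hypothetical field names (`id`, `city`, `supplementary_city`) and builds only the two tags listed above; it also applies the rule from the version history that a supplementary city name equal to the postcode city name is discarded.

```python
SOURCE_VALUE = "Danmarks Adresseregister"


def new_addresses(dar_rows: list, osm_ids: set) -> list:
    """Return DAR rows whose address ID is not yet present in OSM."""
    return [row for row in dar_rows if row["id"] not in osm_ids]


def source_and_place_tags(row: dict) -> dict:
    """Build the source tag and, where applicable, the addr:place tag."""
    tags = {"source": SOURCE_VALUE}
    place = row.get("supplementary_city")
    if place and place != row.get("city"):
        tags["addr:place"] = place
    return tags
```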
A procedure for automatically reverting changes done by autoAWS currently does not exist.
A few examples have been observed of the script adding duplicate data (new addresses added twice). This rare issue is caused by a race condition in the script, combined with the fact that the data in Overpass API is lagging slightly behind the data in the OSM database. It is important to note that in case duplicate addresses are added, the script should automatically detect and remove them the next day. No manual intervention should be required at any time.
17 April 2018
- First published draft.
30 April 2018
- Addresses with the ois:fixme=* tag will not be updated (but will still be deleted if the address no longer exists)
- Street name fixes from https://oisfixes.iola.dk/ are now included when downloading AWS data
- Added error handling in case overpass API is down
- Functions for node handling (update, create, delete) added
- Logic added to handle cases where tags from an address being deleted need to be transferred to a new address:
- - If an address is changed but the position remains the same, additional tags will be kept
- - If an address is deleted, address tags will be removed but additional tags will be kept. The following is added: fixme=This address no longer exists. Please check if the tags on this node are still valid
autoAWS 0.3 rc1
6 May 2018
- Fixed a bug where nodes were not being correctly updated
- Improved performance significantly by bulk-downloading nodes from the OSM API instead of downloading nodes individually
- Fixed a bug where additional (non-address) tags were not being correctly handled on deleted addresses
- The Supplerende Bynavn from DAR is now imported and added as the addr:place=* tag. The osak:subdivision=* tag is removed when an address is updated
- osak:municipality_name=* will now be replaced with addr:municipality=* when an address is updated. New addresses will also get the addr:municipality=* tag
- Added a debug-mode to make testing easier
- Fixed an encoding bug where special characters (such as letters æ, ø, å) were garbled
- Following the discussion here, nodes with ois:fixme=* will no longer be ignored by the script. Instead, nodes with autoaws=ignore will be ignored.
- addr:country=DK is now added to nodes where the addr:country=* tag is missing
- Added logic to handle cases where a postcode is discontinued or a new postcode is established
13 May 2018
- Added handling of very large edits (will be split into multiple changesets if needed; OSM does not accept changesets larger than 10,000 edits)
- Added an option to manually start an update for a given postcode
- Fixed a bug where extra tags were not always transferred to new address nodes in very large postcodes
- Improved handling of addresses moving between postcodes, including cases where a postcode is split into multiple postcodes (new postcodes being established) and cases where multiple postcodes merge into one (old postcodes being discontinued)
- Supplementary city name will be discarded if equal to the postcode city name. So addr:city=* and addr:place=* tags with equal values will be avoided.
- Fixed a bug where duplicate address nodes were not being deleted
- Code refactoring, additional error handling, minor performance improvements
First larger edits (more than 10 addresses): https://www.openstreetmap.org/changeset/58825242
autoAWS 0.5 rc2
12 June 2018
- Fixed a bug where changes were not submitted to the last 99 nodes in a postcode if the last node downloaded had an invalid (more than 32 characters) osak:identifier=*
- The address ID (osak:identifier=* tag) is now returned in the same format used in DAR (lower case, with a dash after 8, 12, 16 and 20 characters). For unknown reasons, the script previously used converted IDs to uppercase and removed the dashes, which made it hard to look up the address in DAR.
- osak:revision=* tag retired
- Database optimization and minor code refactoring
- Fixed a bug where existing duplicate address nodes in OSM were not deleted until the second update of a postcode
- Added handling of cases where creating a changeset fails
- ois:fixme=* tags will now also be removed from a node if an address is deleted
- If an address node needs to be updated, and the node is a member of a way, a new address node will now be created with the updated values, instead of updating the old node. This is to prevent ways becoming deformed because an address node is moved.
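The identifier format change in this release (lower case, with a dash after 8, 12, 16 and 20 characters) can be sketched in Python as below. The function name is hypothetical, and the identifier in the usage note is a made-up value, not a real address ID.

```python
import re


def to_dar_format(identifier: str) -> str:
    """Convert a 32-character hex ID to the lower-case, dashed DAR form."""
    hex_only = identifier.replace("-", "").lower()
    if not re.fullmatch(r"[0-9a-f]{32}", hex_only):
        raise ValueError(f"not a 32-character hex identifier: {identifier!r}")
    # Dashes go after characters 8, 12, 16 and 20.
    parts = (hex_only[0:8], hex_only[8:12], hex_only[12:16],
             hex_only[16:20], hex_only[20:32])
    return "-".join(parts)
```

For example, `to_dar_format("0A3F50A0A3F532B8E0440003BA298018")` returns `0a3f50a0-a3f5-32b8-e044-0003ba298018`. Rejecting values that are not exactly 32 hex characters also guards against the kind of invalid identifier mentioned in the first fix of this release.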
No changes compared to 0.5.
Source code: tag named v1.0 in GitLab repository
16 August 2018
First version to be run continuously on a dedicated server.
- Fixed a bug related to the deletion of postcodes
- Increased max execution time to prevent the script from terminating during the processing of very large changesets (needed because the processing time of individual addresses was increased by some of the changes introduced in 0.5)
- Added an option to force an update of all addresses, useful when changing the tagging scheme
- Added an option to force an update of an address node if a certain tag is present, useful when changing the tagging scheme
Source code: tag named v1.1 in GitLab repository
6 December 2018
- Small fix that automatically resets the script if it got stuck because it was terminated unexpectedly
13 August 2019
- Minor performance improvements (nodes from OSM are now downloaded in chunks of 500 instead of 100)
- Various improvements to reduce memory usage (previous version was crashing on a server with 512MB RAM)
- autoAWS is now once again running continuously on a server, ensuring extremely frequent, fully automated updates