AutoAWS

From OpenStreetMap Wiki

autoAWS is a web app for automatically maintaining address nodes within Denmark.

autoAWS is currently under development. This wiki page serves as an information hub for the project and will be updated regularly. It has been created to comply with step 4 of the import guidelines.

autoAWS was initially announced on the Danish Talk-DK mailing list in April 2018.

For questions or comments regarding this project, please get in touch with OSM user JKHougaard or follow the Talk-DK mailing list.


Background

Address data in Denmark has been maintained more or less "automatically" since the initial import in 2009. The Danish government publishes official address data of very high quality which is easily accessible via an open API.

Until recently, Danish addresses were maintained by a script run under the user account AWSbot. The source code for this script is available on GitHub: https://github.com/AWSbot/PHPscript. However, the AWSbot script was turned off in mid-2017 after years of complaints about various problems with it.

AWSbot has served as the main inspiration for autoAWS, and, although a completely new script has been written, some of the general working principles from the AWSbot script have been reused.

Goals

The goal of this automatic data import is to maintain the existing very high quality of address tags in Denmark by adding newly created addresses, deleting addresses that no longer exist and updating tags on existing address nodes that contain old data.

Schedule

The script will be designed so that it can be run automatically on a schedule. The rate will be adjusted so that each postcode is updated roughly every 3 months, perhaps with an option to manually force an update of a single postcode.
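As a rough illustration of the schedule described above, the sketch below selects the postcodes whose last update is older than the target interval. The function name `due_postcodes`, the 90-day interval (standing in for "roughly every 3 months") and the in-memory dictionary are assumptions made for illustration; how autoAWS actually schedules its runs is not documented on this page.

```python
from datetime import datetime, timedelta

# Assumption: "roughly every 3 months" is approximated as 90 days.
UPDATE_INTERVAL = timedelta(days=90)

def due_postcodes(last_updated, now, interval=UPDATE_INTERVAL):
    """Return postcodes that are due for an update, oldest first.

    last_updated maps postcode -> datetime of the last run, or None if
    the postcode has never been updated.
    """
    due = [
        postcode for postcode, stamp in last_updated.items()
        if stamp is None or now - stamp >= interval
    ]
    # Postcodes that were never updated sort before everything else.
    return sorted(due, key=lambda pc: last_updated[pc] or datetime.min)
```

A manual "force update" option could simply bypass this selection and process the requested postcode directly.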

Import Data

Data will be imported from the Danish Address Register, DAR (http://danmarksadresser.dk/dar), via the AWS suite (http://aws.dk/) address web API, DAWA (http://dawa.aws.dk/). Note that the terms DAR, AWS and DAWA might be used interchangeably on this page.

The data license (available in Danish only at http://aws.dk/licens) grants worldwide, free, non-exclusive, unlimited access to the data, which may be copied, distributed, published, changed, combined with other data and used both commercially and non-commercially. The tag source=Danmarks Adresseregister will be added to all imported data; however, any existing source=* tag on a node will not be overwritten.

Data Preparation

The current address tagging scheme in Denmark contains a number of system-specific tags that are of little or no use to data consumers. As addresses are updated, some of these tags will be removed or replaced by more recognisable tags. For example, osak:municipality_name=* will be replaced by addr:municipality=*. For a complete list of tags, please see the following sections.

Including OIS fixes

Some addresses in DAR contain errors. In particular, some street names are abbreviated in the database. This is the result of a limitation in an earlier version of DAR where the street name could not exceed 20 characters.

An effort has been made to make a list of any such errors so that they can be corrected - both in OSM and eventually directly in DAR.

The list of fixes, available at https://oisfixes.iola.dk/, is used to correct data imported from DAR.

Description of the Script Workflow

The autoAWS script is written in PHP and constructed around an SQL database. PHP is used as an interface between the various APIs and the database layer, whereas as much of the data processing as possible happens in the database itself using SQL. All updates are automatically submitted to the OSM API (node PUT and DELETE calls) using cURL. Note that the OSM API seems to be the main bottleneck in the script in terms of execution time.
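To make the node PUT call more concrete, the following sketch builds the XML body that OSM API 0.6 expects for a node update (PUT /api/0.6/node/<id>). It is written in Python rather than PHP purely for illustration; the function name `node_update_payload` and the sample values are hypothetical and are not taken from the autoAWS source.

```python
import xml.etree.ElementTree as ET

def node_update_payload(node_id, version, changeset_id, lat, lon, tags):
    """Build the XML body for an OSM API 0.6 node update."""
    osm = ET.Element("osm")
    node = ET.SubElement(osm, "node", {
        "id": str(node_id),
        "version": str(version),        # current version, used by the API to detect conflicts
        "changeset": str(changeset_id), # an open changeset owned by the uploading account
        "lat": f"{lat:.7f}",            # OSM stores coordinates with 7 decimal places
        "lon": f"{lon:.7f}",
    })
    # Every tag on the node must be sent, including untouched non-address tags.
    for key, value in sorted(tags.items()):
        ET.SubElement(node, "tag", {"k": key, "v": value})
    return ET.tostring(osm, encoding="unicode")
```

Because the payload replaces the node's complete tag set, preserving non-address tags means copying them into `tags` before uploading.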

An update is run for one Danish postcode at a time. At the time of writing, Denmark is divided into 1090 different postcodes.

Overpass API is used to identify existing address nodes in OSM by searching for all nodes containing the osak:identifier=* tag and the addr:postcode=nnnn tag where nnnn is the postcode currently being updated.
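A query of the kind described above could look roughly like the following Overpass QL string, built here in Python. This is a sketch only: the function name `overpass_query`, the timeout value and the choice to return IDs only are assumptions, and the query used by autoAWS itself may differ.

```python
def overpass_query(postcode, timeout=60):
    """Return an Overpass QL query for the address nodes of one postcode."""
    # The page describes postcodes as "nnnn", i.e. four digits.
    if not (postcode.isdigit() and len(postcode) == 4):
        raise ValueError(f"expected a four-digit postcode, got {postcode!r}")
    return (
        f"[out:json][timeout:{timeout}];"
        # Nodes that carry any osak:identifier value AND the given postcode.
        f'node["osak:identifier"]["addr:postcode"="{postcode}"];'
        # Only IDs are requested; the nodes themselves come from the OSM API.
        "out ids;"
    )
```

For example, `overpass_query("8000")` yields a query matching all nodes tagged with osak:identifier=* and addr:postcode=8000.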

All nodes found via Overpass API are downloaded using the OSM API.
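The OSM API offers a multi-fetch call (GET /api/0.6/nodes?nodes=<id>,<id>,...) that returns several nodes per request. The sketch below splits a list of node IDs into such requests. The batch size of 100 and the function name `bulk_node_urls` are assumptions for illustration, not values confirmed by the autoAWS source.

```python
OSM_API = "https://api.openstreetmap.org/api/0.6"

def bulk_node_urls(node_ids, batch_size=100):
    """Yield multi-fetch URLs that together cover all node IDs."""
    ids = list(node_ids)
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        yield f"{OSM_API}/nodes?nodes=" + ",".join(str(i) for i in batch)
```

Fetching nodes in batches keeps the number of HTTP round trips small, which matters when the OSM API is the slowest part of the run.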

The corresponding address data for the given postcode is downloaded from the DAWA API. The following data is loaded into the database:

Address data from OSM is compared with address data from DAR using SQL. The two database tables are joined on the osak:identifier=* key. Any one of the following conditions will trigger an update. However, any node with the autoaws=ignore tag present will be ignored:

  • The position (lat and lon) of the node is not equal to the DAR address position
  • addr:city=* is not equal to the DAR city name
  • addr:housenumber=* is not equal to the DAR house number
  • addr:street=* is not equal to the DAR street name (corrected version using the OIS fixes list)
  • addr:municipality=* is not equal to the DAR municipality name
  • addr:place=* is not equal to the DAR supplementary city name

The incorrect tag values will be updated, and the node is uploaded via the OSM API. Any additional tags on the node (non address tags, for example shop=*) will be preserved without change.
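The comparison described above can be sketched as a single SQL join. The example uses Python's built-in sqlite3 module with an in-memory database; the table names, column names and the `ignored` flag are hypothetical, and the real autoAWS schema and database engine may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE osm (identifier TEXT, lat REAL, lon REAL, city TEXT, housenumber TEXT,
                  street TEXT, municipality TEXT, place TEXT, ignored INTEGER DEFAULT 0);
CREATE TABLE dar (identifier TEXT, lat REAL, lon REAL, city TEXT, housenumber TEXT,
                  street TEXT, municipality TEXT, place TEXT);
"""

# "IS NOT" is SQLite's NULL-safe inequality, so a missing addr:place
# (NULL) on one side and a value on the other still counts as a difference.
FIND_UPDATES = """
SELECT osm.identifier
FROM osm JOIN dar ON osm.identifier = dar.identifier
WHERE osm.ignored = 0 AND (
       osm.lat IS NOT dar.lat OR osm.lon IS NOT dar.lon
    OR osm.city IS NOT dar.city OR osm.housenumber IS NOT dar.housenumber
    OR osm.street IS NOT dar.street OR osm.municipality IS NOT dar.municipality
    OR osm.place IS NOT dar.place)
ORDER BY osm.identifier
"""

def nodes_needing_update(osm_rows, dar_rows):
    """Return identifiers whose OSM data differs from DAR, skipping ignored nodes."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO osm VALUES (?,?,?,?,?,?,?,?,?)", osm_rows)
    db.executemany("INSERT INTO dar VALUES (?,?,?,?,?,?,?,?)", dar_rows)
    result = [row[0] for row in db.execute(FIND_UPDATES)]
    db.close()
    return result
```

Here `ignored` stands for the presence of autoaws=ignore on the node, and the street column on the DAR side is assumed to already contain the name corrected via the OIS fixes list.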

When updating an address, the following tags, if present, will be removed:

  • osak:subdivision=* (replaced by addr:place=*)
  • osak:house_no=* (duplicate of addr:housenumber=*)
  • osak:municipality_no=* (unique ID of the municipality in DAR, of little or no use to data consumers)
  • osak:street_name=* and osak:street=* (duplicate of addr:street=*)
  • osak:street_no=* (unique ID of the street in DAR, of little or no use to data consumers)
  • osak:revision=* (date/time of the last change of the address in DAR, superfluous in OSM since all nodes have a version and full history)
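A minimal sketch of this clean-up step, assuming the node's tags are held in a dictionary. The names `RETIRED_TAGS` and `strip_retired_tags` are made up for this example.

```python
# Keys removed when an address node is updated, as listed above.
RETIRED_TAGS = {
    "osak:subdivision",
    "osak:house_no",
    "osak:municipality_no",
    "osak:street_name",
    "osak:street",
    "osak:street_no",
    "osak:revision",
}

def strip_retired_tags(tags):
    """Return a copy of `tags` without the retired keys; all other tags are kept."""
    return {key: value for key, value in tags.items() if key not in RETIRED_TAGS}
```

Note that osak:identifier=* is deliberately not in the set, since it remains the key used to match OSM nodes with DAR addresses.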

Deleting addresses

The data is tested for any osak:identifier=* values that exist in OSM data but do not exist in DAR data. Such nodes are added to a database table for deletion. Nodes with the autoaws=ignore tag will not be deleted, even if the address no longer exists in DAR.

Using the osak:identifier=* tag for this comparison means that if, for some reason, an address node has a wrong osak:identifier value, the address node will be deleted. However, the script will then notice the missing address and add a new node again.
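The identifier comparison amounts to two set differences, one for deletions and one for the additions described later on this page. The sketch below assumes OSM nodes are held as a dictionary keyed by osak:identifier; the function name `plan_changes` is hypothetical.

```python
def plan_changes(osm_nodes, dar_ids):
    """Return (identifiers to delete, identifiers to add).

    osm_nodes maps osak:identifier -> tag dictionary of the OSM node;
    dar_ids is the collection of address IDs currently present in DAR.
    """
    osm_ids = set(osm_nodes)
    dar_ids = set(dar_ids)
    # In OSM but not in DAR: delete, unless the node opts out.
    to_delete = {
        ident for ident in osm_ids - dar_ids
        if osm_nodes[ident].get("autoaws") != "ignore"
    }
    # In DAR but not in OSM: create a new node.
    to_add = dar_ids - osm_ids
    return to_delete, to_add
```

A node carrying a mistyped identifier ends up in the first set, while the address it should have represented ends up in the second, which is the delete-then-re-add behaviour noted above.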

There are now three possible outcomes when an address is to be deleted:

  • If the node does not have any additional (non address) tags, a DELETE call is sent to the OSM API, deleting the entire node.
  • If the node does contain additional tags, it is first checked if a new address is going to be added in the same position (lat and lon pair) as the address being deleted. (This can happen if the address ID has, for some reason, been changed in DAR.) If a new address is being added in the same position, the old node will be deleted and the additional tags will be transferred to the new address node.
  • If the node contains additional tags, but no new address is going to be added in the same position, all address tags are removed, and an updated node containing only the non-address tags is submitted to the OSM API. A fixme=* tag will be added to such nodes, explaining that the address no longer exists, and the additional tags should be manually checked.
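The three outcomes above can be sketched as a small decision function. Which keys count as "address tags" is an assumption here (the prefixes addr:, osak: and ois: plus source), and the function and action names are invented for the example; the fixme text is the one quoted in the version history below.

```python
# Assumption: these prefixes/keys identify address tags; everything else is "additional".
ADDRESS_PREFIXES = ("addr:", "osak:", "ois:")
ADDRESS_KEYS = {"source"}

FIXME_TEXT = ("This address no longer exists. "
              "Please check if the tags on this node are still valid")

def is_address_tag(key):
    return key in ADDRESS_KEYS or key.startswith(ADDRESS_PREFIXES)

def deletion_outcome(tags, position, new_address_positions):
    """Return (action, tags to keep) for a node whose address has left DAR.

    position is the node's (lat, lon) pair; new_address_positions is the
    set of (lat, lon) pairs of the addresses about to be added.
    """
    extra = {k: v for k, v in tags.items() if not is_address_tag(k)}
    if not extra:
        return ("delete", {})                   # outcome 1: plain DELETE
    if position in new_address_positions:
        return ("delete_and_transfer", extra)   # outcome 2: move tags to the new node
    return ("strip_address", {**extra, "fixme": FIXME_TEXT})  # outcome 3
```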

Adding addresses

Similarly to how addresses are picked for deletion, any address ID that exists in DAR data but does not exist in OSM data is saved as a new address to be added to OSM. A new node will be created and pushed to the OSM API. The new node will contain the following tags:

  • source=Danmarks Adresseregister
  • osak:identifier=*
  • addr:city=*
  • addr:country=DK
  • addr:housenumber=*
  • addr:postcode=*
  • addr:street=*
  • addr:municipality=*
  • addr:place=* (only if DAR contains a supplementary city name)
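As an illustration, the tag set for a new node could be assembled as follows. The input field names (`id`, `city`, `supplementary_city` and so on) are hypothetical and do not reflect the actual DAWA response format. The check that skips addr:place=* when it equals the city name follows the 0.4 entry in the version history.

```python
def new_address_tags(dar):
    """Build the tags for a new address node from one DAR record (a dict)."""
    tags = {
        "source": "Danmarks Adresseregister",
        "osak:identifier": dar["id"],
        "addr:city": dar["city"],
        "addr:country": "DK",
        "addr:housenumber": dar["housenumber"],
        "addr:postcode": dar["postcode"],
        "addr:street": dar["street"],
        "addr:municipality": dar["municipality"],
    }
    # addr:place is optional and omitted when it would duplicate addr:city.
    place = dar.get("supplementary_city")
    if place and place != dar["city"]:
        tags["addr:place"] = place
    return tags
```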

Reverting changes

A procedure for automatically reverting changes done by autoAWS currently does not exist.

Version history

autoAWS 0.1

17 April 2018

  • First published draft.


autoAWS 0.2

30 April 2018

  • Addresses with the ois:fixme=* tag will not be updated (but will still be deleted if the address no longer exists)
  • Street name fixes from https://oisfixes.iola.dk/ are now included when downloading AWS data
  • Added error handling in case the Overpass API is down
  • Functions for node handling (update, create, delete) added
  • Logic added to handle cases where tags from an address being deleted need to be transferred to a new address:
      - If an address is changed but the position remains the same, additional tags will be kept
      - If an address is deleted, address tags will be removed but additional tags will be kept. The following is added: fixme=This address no longer exists. Please check if the tags on this node are still valid

First tests on live data run 2 May 2018. See https://www.openstreetmap.org/changeset/58616759 & https://www.openstreetmap.org/changeset/58617113.


autoAWS 0.3 rc1

6 May 2018

  • Fixed a bug where nodes were not being correctly updated
  • Improved performance significantly by bulk-downloading nodes from the OSM API instead of downloading nodes individually
  • Fixed a bug where additional (non-address) tags were not being correctly handled on deleted addresses
  • The Supplerende Bynavn from DAR is now imported and added as the addr:place=* tag. The osak:subdivision=* tag is removed when an address is updated
  • osak:municipality_name=* will now be replaced with addr:municipality=* when an address is updated. New addresses will also get the addr:municipality=* tag
  • Added a debug-mode to make testing easier
  • Fixed an encoding bug where special characters (such as letters æ, ø, å) were garbled
  • Following the discussion here, nodes with ois:fixme=* will no longer be ignored by the script. Instead, nodes with autoaws=ignore will be ignored.
  • addr:country=DK is now added to nodes where the addr:country=* tag is missing
  • Added logic to handle cases where a postcode is discontinued or a new postcode is established


autoAWS 0.4

13 May 2018

  • Added handling of very large edits (will be split into multiple changesets if needed; OSM does not accept changesets larger than 10,000 edits)
  • Added an option to manually start an update for a given postcode
  • Fixed a bug where extra tags were not always transferred to new address nodes in very large postcodes
  • Improved handling of addresses moving between postcodes, including cases where a postcode is split into multiple postcodes (new postcodes being established) and cases where multiple postcodes merge into one (old postcodes being discontinued)
  • The supplementary city name will be discarded if it is equal to the postcode city name, so that addr:city=* and addr:place=* tags with equal values are avoided.
  • Fixed a bug where duplicate address nodes were not being deleted
  • Code refactoring, additional error handling, minor performance improvements
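The changeset splitting mentioned in this release can be sketched as simple chunking. The limit is the figure quoted above; the function name and the representation of edits as a list are assumptions.

```python
MAX_CHANGESET_EDITS = 10_000  # limit quoted in the 0.4 notes

def split_into_changesets(edits, limit=MAX_CHANGESET_EDITS):
    """Split a list of edits into consecutive batches of at most `limit` items."""
    return [edits[i:i + limit] for i in range(0, len(edits), limit)]
```

Each batch would then be uploaded in its own changeset.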

First larger edits (more than 10 addresses): https://www.openstreetmap.org/changeset/58825242


autoAWS 0.5 rc2

12 June 2018

  • Fixed a bug where changes were not submitted to the last 99 nodes in a postcode if the last node downloaded had an invalid (more than 32 characters) osak:identifier=*
  • The address ID (osak:identifier=* tag) is now returned in the same format used in DAR (lower case, with a dash after 8, 12, 16 and 20 characters). For unknown reasons, the previously used script converted IDs to uppercase and removed the dashes, which made it hard to look up the address in DAR.
  • osak:revision=* tag retired
  • Database optimization and minor code refactoring
  • Fixed a bug where existing duplicate address nodes in OSM were not deleted until the second update of a postcode
  • Added handling of cases where creating a changeset fails
  • ois:fixme=* tags will now also be removed from a node if an address is deleted
  • If an address node needs to be updated, and the node is a member of a way, a new address node will now be created with the updated values, instead of updating the old node. This is to prevent ways becoming deformed because an address node is moved.
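The identifier format described in this release (lower case, dashes after 8, 12, 16 and 20 characters, 32 characters in total without dashes) can be sketched as below. The function name is hypothetical and the sample value in the usage note is made up.

```python
def normalise_identifier(raw):
    """Convert an identifier to the DAR layout: lower case, dashes after 8, 12, 16 and 20 characters."""
    compact = raw.replace("-", "").lower()
    if len(compact) != 32:
        raise ValueError(f"expected 32 characters without dashes, got {len(compact)}")
    cuts = (0, 8, 12, 16, 20, 32)
    return "-".join(compact[start:end] for start, end in zip(cuts, cuts[1:]))
```

For instance, the old-style value 0A3F50C4379F32B8E0440003BA298018 becomes 0a3f50c4-379f-32b8-e044-0003ba298018, and a value that is already in the DAR layout is returned unchanged.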

autoAWS 1.0

No changes compared to 0.5.

Source code: https://pastebin.com/KCRYfY3W

autoAWS 1.1

16 August 2018

First version to be run continuously on a dedicated server.

  • Fixed a bug related to the deletion of postcodes
  • Increased max execution time to prevent the script from terminating during the processing of very large changesets (needed because the processing time of individual addresses was increased by some of the changes introduced in 0.5)
  • Added an option to force an update of all addresses, useful when changing the tagging scheme
  • Added an option to force an update of an address node if a certain tag is present, useful when changing the tagging scheme