AutoAWS

From OpenStreetMap Wiki

autoAWS is a web app for automatically maintaining address nodes within Denmark.

autoAWS is currently under development. This wiki page serves as the information hub for the project and will be updated regularly. It was created to comply with step 4 of the import guidelines.

It was initially announced on the Danish Talk-DK mailing list in April 2018.

For questions or comments regarding this project, please get in touch with OSM user JKHougaard or follow the Talk-DK mailing list.


Background

Address data in Denmark has been maintained more or less "automatically" since the initial import in 2009. The Danish government publishes official address data of very high quality which is easily accessible via an open API.

Until recently, the user account AWSbot ran a script that maintained Danish addresses. The source code for this script is available on GitHub: https://github.com/AWSbot/PHPscript. The AWSbot script was, however, turned off in mid-2017 after years of complaints about various problems with it.

AWSbot has served as the main inspiration for autoAWS, and, although a completely new script has been written, some of the general working principles from the AWSbot script have been reused.

Goals

The goal of this automatic data import is to maintain the existing very high quality of address tags in Denmark by adding newly created addresses, deleting addresses that no longer exist, and updating tags on existing address nodes that contain old data.

Schedule

The script will be designed so that it can run automatically on a schedule. To avoid overloading the APIs used and creating excessive changesets, the update rate will be limited so that each Danish postcode can be updated at most once every 30 days.
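The 30-day limit can be expressed as a simple date check per postcode. The following is a minimal sketch only: the function name is_due and the idea of passing in a stored "last update" timestamp are assumptions made for illustration, not part of the autoAWS code.

```python
from datetime import datetime, timedelta
from typing import Optional

# Minimum interval between two updates of the same postcode (from the text above).
MIN_INTERVAL = timedelta(days=30)


def is_due(last_update: Optional[datetime], now: datetime) -> bool:
    """Return True if a postcode may be updated at time `now`.

    `last_update` is None for a postcode that has never been updated.
    """
    if last_update is None:
        return True
    return now - last_update >= MIN_INTERVAL
```

A scheduler could call such a check for every postcode and only start an update for those where it returns True.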

Import Data

Data will be imported from the Danish Address Register, DAR (http://danmarksadresser.dk/dar), via the AWS suite (http://aws.dk/) address web API, DAWA (http://dawa.aws.dk/). Note that the terms DAR, AWS and DAWA might be used interchangeably on this page.

The data license (available in Danish only at http://aws.dk/licens) grants worldwide, free, non-exclusive and unlimited access to the data, which may be copied, distributed, published, changed, combined with other data and used both commercially and non-commercially.

Data Preparation

The data will be imported using the existing tagging scheme for addresses in Denmark, found at Da:Adresser.

For a complete list of tags, please see below.

Including OIS fixes

Some addresses in DAR contain errors. In particular, some street names are abbreviated in the database. This is the result of a limitation in an earlier version of DAR, where the street name could not exceed 20 characters.

An effort has been made to make a list of any such errors so that they can be corrected - both in OSM and eventually directly in DAR.

The list of fixes, available at https://oisfixes.iola.dk/, is taken into account when updating addresses.

Description of the Script Workflow

The autoAWS script is written in PHP and built around an SQL database. PHP serves as the interface between the various APIs and the database layer, while as much of the data processing as possible happens in the database itself using SQL. All updates are submitted automatically to the OSM API (node PUT and DELETE calls) using cURL. Note that the OSM API appears to be the main bottleneck for the script's execution time.
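To illustrate what a node PUT call carries, the sketch below builds the XML body for updating a single node in the OSM API v0.6 format. It is written in Python rather than PHP purely for illustration, and the function name build_node_xml and its parameters are hypothetical; the actual script may assemble its requests differently.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_node_xml(node_id: int, version: int, changeset_id: int,
                   lat: float, lon: float, tags: Dict[str, str]) -> str:
    """Build the <osm><node>...</node></osm> payload for a node update.

    The current `version` of the node must be sent so that the API can
    detect conflicting edits.
    """
    osm = ET.Element("osm")
    node = ET.SubElement(osm, "node", {
        "id": str(node_id),
        "version": str(version),
        "changeset": str(changeset_id),
        # OSM stores coordinates with seven decimal places.
        "lat": f"{lat:.7f}",
        "lon": f"{lon:.7f}",
    })
    for key, value in sorted(tags.items()):
        ET.SubElement(node, "tag", {"k": key, "v": value})
    return ET.tostring(osm, encoding="unicode")
```

The resulting string would be sent as the body of a PUT request to /api/0.6/node/<id> within an open changeset.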

An update is run for one Danish postcode at a time. At the time of writing, Denmark is divided into 1090 different postcodes.

The Overpass API is queried for all nodes that carry both the osak:identifier=* tag and the addr:postcode=nnnn tag, where nnnn is the postcode currently being updated.
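A query of this kind might look as follows in Overpass QL. This is a sketch under assumptions: the function name build_overpass_query, the timeout value and the choice not to restrict the search to an area covering Denmark are all illustrative, and the query actually used by autoAWS is not documented here.

```python
def build_overpass_query(postcode: str) -> str:
    """Build an Overpass QL query for all address nodes in one postcode."""
    if not (len(postcode) == 4 and postcode.isascii() and postcode.isdigit()):
        raise ValueError("Danish postcodes consist of four digits")
    return (
        "[out:json][timeout:180];"
        # Nodes that have any osak:identifier and the given postcode.
        f'node["osak:identifier"]["addr:postcode"="{postcode}"];'
        # "out meta" includes the node version, which is needed for later updates.
        "out meta;"
    )
```

For example, build_overpass_query("2100") yields a query that selects every node tagged with osak:identifier=* and addr:postcode=2100.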

For each of the returned address nodes, the following values are loaded into the database: osak:identifier=* (unique key), Node ID, Node position (lat and lon), addr:city=*, addr:country=*, addr:housenumber=*, addr:postcode=*, osak:municipality_name=*, osak:revision=*.

Updating addresses

Next, the corresponding address data for the given postcode is downloaded from the DAWA API. The following data is loaded into the database:

  • Address ID (=osak:identifier=*)
  • Address position (lat and lon)
  • Address city
  • Address house number
  • Address postcode
  • Address street name (if a fixed street name is available in the OIS fix database, that one is used instead of the one returned from AWS)
  • Address municipality name
  • Address version (date)
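The street name substitution described in the list above amounts to a lookup with a fallback. The sketch below assumes, for illustration only, that the fixes are available as a mapping from the street name returned by AWS to the corrected name; the real OIS fix database may be keyed differently, and the example street name is made up for the demonstration.

```python
from typing import Dict


def resolve_street_name(aws_street: str, fixes: Dict[str, str]) -> str:
    """Return the corrected street name if a fix exists, else the AWS name."""
    return fixes.get(aws_street, aws_street)


# Illustrative fix entry: a name cut off at 20 characters and its full form.
example_fixes = {"Kronprinsesse Sofies": "Kronprinsesse Sofies Vej"}
```

With this mapping, resolve_street_name("Kronprinsesse Sofies", example_fixes) returns the full name, while a street without a fix entry is returned unchanged.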

Address data from OSM is now compared with address data from DAR using SQL. The two database tables are joined on the osak:identifier=* key. Any one of the following conditions will trigger an update; however, any node with the ois:fixme=* tag present will be ignored:

  • osak:revision=* is older than the AWS revision date (that is, it is neither equal to nor newer than it)
  • The position (lat and lon) of the node is not equal to the AWS address position
  • addr:city=* is not equal to the AWS city name
  • addr:housenumber=* is not equal to the AWS house number
  • addr:street=* is not equal to the AWS street name
  • osak:municipality_name=* is not equal to the AWS municipality name
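The comparison above can be sketched as a single SQL join. The example uses SQLite from Python for illustration; the table names osm and aws, the column names, the has_ois_fixme flag and the storage of revisions as ISO date strings are all assumptions, and positions are compared for exact equality, which a real implementation might relax.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE osm (identifier TEXT PRIMARY KEY, node_id INTEGER, lat REAL, lon REAL,
                  city TEXT, housenumber TEXT, street TEXT, municipality TEXT,
                  revision TEXT, has_ois_fixme INTEGER);
CREATE TABLE aws (identifier TEXT PRIMARY KEY, lat REAL, lon REAL,
                  city TEXT, housenumber TEXT, street TEXT, municipality TEXT,
                  revision TEXT);
"""

# "IS NOT" is SQLite's NULL-safe inequality, so a missing tag counts as a mismatch.
FIND_OUTDATED = """
SELECT o.node_id
FROM osm AS o
JOIN aws AS a ON a.identifier = o.identifier
WHERE o.has_ois_fixme = 0
  AND (o.revision IS NULL OR o.revision < a.revision
       OR o.lat IS NOT a.lat OR o.lon IS NOT a.lon
       OR o.city IS NOT a.city
       OR o.housenumber IS NOT a.housenumber
       OR o.street IS NOT a.street
       OR o.municipality IS NOT a.municipality)
ORDER BY o.node_id
"""


def find_outdated(conn: sqlite3.Connection) -> List[int]:
    """Return the IDs of OSM nodes whose address data differs from AWS."""
    return [row[0] for row in conn.execute(FIND_OUTDATED)]
```

Nodes carrying ois:fixme=* are excluded by the has_ois_fixme condition before any of the field comparisons apply.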

Any node failing one of the above tests is saved to a new table as a node needing an update. Each of these nodes is then downloaded via the OSM API, the incorrect tag values are updated, and the node is uploaded again via the OSM API. Any additional tags on the node (non-address tags, for example shop=*) are preserved unchanged.

Note that the osak:house_no=* and osak:street_name=* tags will be deleted when a node is updated, since these tags are simply duplicates of the addr:housenumber=* and addr:street=* tags respectively. In addition, it has been suggested that the osak:municipality_no=* and osak:street_no=* tags, currently present on a large number of Danish address nodes, also be deleted, since they probably serve little or no use to data consumers (both are keys used within DAR to uniquely identify municipalities and streets).
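The tag handling during an update can be sketched as a dictionary merge. The function name merge_tags is hypothetical, and the sketch removes only the two duplicate tags that the text says will be deleted; the osak keys whose removal is merely suggested are left untouched.

```python
from typing import Dict

# Tags that duplicate addr:housenumber and addr:street and are dropped on update.
REDUNDANT_KEYS = ("osak:house_no", "osak:street_name")


def merge_tags(existing: Dict[str, str], aws_tags: Dict[str, str]) -> Dict[str, str]:
    """Overwrite address tags with AWS values and keep all other tags."""
    merged = {k: v for k, v in existing.items() if k not in REDUNDANT_KEYS}
    merged.update(aws_tags)
    return merged
```

A node tagged shop=bakery keeps that tag after the merge, while its address tags take the values supplied from AWS.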

Deleting addresses

The data is tested for any osak:identifier=* values that exist in OSM data but do not exist in AWS data. Such nodes are added to a database table for deletion.

Using the osak:identifier=* tag for this comparison means that if, for some reason, an address node has a wrong osak:identifier value, the address node will be deleted. However, the script will then notice the missing address and add a new node again.
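In terms of sets, the selection of nodes to delete is a plain difference between the two lists of identifiers. The sketch below is illustrative (the function name diff_identifiers is an assumption) and also shows why a node with a wrong identifier ends up being deleted and then re-added.

```python
from typing import Iterable, List, Tuple


def diff_identifiers(osm_ids: Iterable[str],
                     aws_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (identifiers to delete from OSM, identifiers to add to OSM)."""
    osm, aws = set(osm_ids), set(aws_ids)
    to_delete = sorted(osm - aws)  # in OSM but no longer in AWS
    to_add = sorted(aws - osm)     # in AWS but missing from OSM
    return to_delete, to_add
```

If a node for address "b" is mistakenly tagged with the identifier "x", then "x" lands in the deletion list and "b" in the list of addresses to add.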

For each of the addresses to be deleted, the address node is downloaded from the OSM API. There are now three outcomes:

  • If the node does not have any additional (non address) tags, a DELETE call is sent to the OSM API, deleting the entire node.
  • If the node does contain additional tags, it is first checked if a new address is going to be added in the same position as the address being deleted. (This can happen if the address ID has, for some reason, been changed in AWS.) If a new address is being added in the same position, the old node will be deleted and the additional tags will be transferred to the new address node.
  • If the node contains additional tags, but no new address is going to be added in the same position, all address tags are removed, and an updated node containing only the non-address tags is submitted to the OSM API. A fixme=* tag will be added to such nodes, explaining that the address no longer exists, and the additional tags should be manually checked.
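The three outcomes can be written as a small decision function. This is a sketch under stated assumptions: address tags are taken to be source plus any key starting with addr: or osak:, positions are matched by exact equality of (lat, lon) pairs, and all names are hypothetical.

```python
from enum import Enum
from typing import Dict, Set, Tuple

Position = Tuple[float, float]


class Action(Enum):
    DELETE_NODE = "delete the node"
    TRANSFER_TAGS = "delete the node and move extra tags to the new address node"
    STRIP_ADDRESS = "keep the node, remove address tags and add a fixme tag"


def is_address_tag(key: str) -> bool:
    """Assumed definition of an address tag, for illustration only."""
    return key == "source" or key.startswith(("addr:", "osak:"))


def plan_deletion(tags: Dict[str, str], position: Position,
                  new_positions: Set[Position]) -> Action:
    """Choose what to do with a node whose address no longer exists in AWS."""
    extra = {k: v for k, v in tags.items() if not is_address_tag(k)}
    if not extra:
        return Action.DELETE_NODE
    if position in new_positions:
        return Action.TRANSFER_TAGS
    return Action.STRIP_ADDRESS
```

Here new_positions stands for the positions of all addresses that are about to be added in the same run.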

Adding addresses

Similarly to how addresses are picked for deletion, any address ID that exists in AWS data but does not exist in OSM data is saved as a new address to be added to OSM. A new node will be created and pushed to the OSM API. The new node will contain the following tags:

source=AWS, osak:identifier=*, addr:city=*, addr:country=DK, addr:housenumber=*, addr:postcode=*, addr:street=*, osak:municipality_name=*, osak:revision=*.
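As an illustration, the tag set for a new node could be assembled as follows. The field names of the input dictionary (id, city, housenumber and so on) are hypothetical placeholders for whatever the database returns; only the tag keys come from the list above, with the municipality key spelled osak:municipality_name as in the comparison rules.

```python
from typing import Dict


def build_new_node_tags(address: Dict[str, str]) -> Dict[str, str]:
    """Map one AWS address record to the tags of a new OSM address node."""
    return {
        "source": "AWS",
        "osak:identifier": address["id"],
        "addr:city": address["city"],
        "addr:country": "DK",  # fixed value for all imported addresses
        "addr:housenumber": address["housenumber"],
        "addr:postcode": address["postcode"],
        "addr:street": address["street"],
        "osak:municipality_name": address["municipality"],
        "osak:revision": address["revision"],
    }
```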

Reverting changes

Note: Subject to change. It still needs to be clarified if and how a useful revert procedure can be implemented.

The autoAWS script will keep a copy of every old address node affected by an edit for 2 days, making it easy to revert a change that contains major errors. Anyone will be able to revert a changeset by following a link in the changeset comments. When a changeset is reverted, all future updates of addresses in the given postcode are automatically disabled. They must be enabled again by hand after a manual review of the changeset in question and the reason for reverting it.

Version history

autoAWS 0.1

17 April 2018

  • First published draft.


autoAWS 0.2

Under development

Planned changes:

  • Any address node with the ois:fixme=* tag will be ignored
  • Street name fixes from https://oisfixes.iola.dk/ will be included when downloading AWS data
  • Error handling will be added in case the Overpass API is down