Durham County North Carolina Address Import

Goals

To add every missing address in Durham County to OpenStreetMap without creating duplicates, and to merge the addresses with existing features.

Schedule

The data has been converted to OSM XML format and duplicates of existing addresses in the OSM database have been removed. The data can be found here. The import is currently under way, using the OSM US Tasking Manager. Please see the project for more information.

Import Data

Background

Data source site: Durham Open Data

Data license: Open Database License

Type of license: Open Database License (ODbL)

Link to permission: N/A

ODbL Compliance verified: Yes.

OSM Data Files

Already transformed data

Import Type

A one-time import that will be completed in many small uploads via the OSM US Tasking Manager.

Data Preparation

Data Reduction & Simplification

Tagging Plans

For source tagging, "source:addr"="Durham Open Data" will be used on each address. "source:addr" was chosen over "source" because, once an address has been merged with a building, a plain "source" tag would make it look as if the building itself came from Durham Open Data, which is not true. Each address will also carry addr:housenumber, addr:street, addr:city, addr:state, and addr:postcode. addr:unit will be added to addresses that include a unit. addr:country will not be used. Some people argue that adding source tags to each address is unnecessary, but I believe it helps new mappers see where an address came from when they edit it, so they can make better decisions about merging, editing, or deleting it.
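As an illustration of the tagging plan, here is a minimal Python sketch of the tag set one imported address would carry. The function name and the sample values are hypothetical, and the two-letter value "NC" for addr:state is an assumption rather than something stated on this page.

```python
def build_address_tags(housenumber, street, city, postcode, unit=None, state="NC"):
    """Return the tags for one imported address (illustrative helper)."""
    tags = {
        "addr:housenumber": housenumber,
        "addr:street": street,
        "addr:city": city,
        "addr:state": state,  # assumption: two-letter state code
        "addr:postcode": postcode,
        # source:addr rather than source, so that a building the address is
        # later merged with is not itself attributed to Durham Open Data
        "source:addr": "Durham Open Data",
    }
    if unit:
        # addr:unit only appears on addresses that actually have a unit
        tags["addr:unit"] = unit
    # addr:country is deliberately never set
    return tags


# Hypothetical sample address, not taken from the dataset
example = build_address_tags("123", "North Example Street", "Durham", "27701", unit="B")
```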

Changeset Tags

"source"="Durham Open Data", "source:website"="https://opendurham.nc.gov/explore/dataset/addresses/export/", "source:date"="July, 2018", and a descriptive comment such as "Imported addresses in Durham County. (upload 3/15)" will be used on each of the 15 changesets.
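The changeset tags could be generated per upload along the following lines. This is only a sketch: the function name is hypothetical, and it assumes the comment is stored under the usual "comment" changeset key.

```python
TOTAL_UPLOADS = 15  # the import is split into 15 changesets


def build_changeset_tags(upload_number, total=TOTAL_UPLOADS):
    """Return the changeset tags for one of the uploads (illustrative helper)."""
    if not 1 <= upload_number <= total:
        raise ValueError(f"upload_number must be between 1 and {total}")
    return {
        "comment": f"Imported addresses in Durham County. (upload {upload_number}/{total})",
        "source": "Durham Open Data",
        "source:website": "https://opendurham.nc.gov/explore/dataset/addresses/export/",
        "source:date": "July, 2018",
    }


third_upload = build_changeset_tags(3)
```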

Data Transformation

All necessary data transformation has been completed using a combination of JOSM, the opendata plugin, and a custom-made Java program for editing XML.

  • First, the data was downloaded in KML format from Durham Open Data.
  • Second, the data was opened with JOSM using the opendata plugin.
  • Then, the file was saved in OSM XML format without changing the tags.
  • Every object (node, way, relation) in Durham County with both "addr:housenumber" and "addr:street" was then downloaded using the Overpass API and saved as well.
  • The address editing program created by Leif Rasmussen was run with the new data and the existing data as input. The program corrected casing (CHAPEL HILL -> Chapel Hill), built addr:street from its components ("streetDirection"="N", "streetName"="COLUMBIA", & "streetType"="ST" -> "addr:street"="North Columbia Street"), and removed duplicates of existing addresses by comparing the dataset to the existing addresses in the OSM database.
  • The file was then opened and the data was cleaned up.
  • The file was finally split into 15 manageable chunks and saved on Google Drive.
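The casing correction and addr:street construction described in the steps above can be sketched as follows. This is not the Java program that was actually used. It is a minimal Python illustration, and the lookup tables are assumed and deliberately incomplete.

```python
# Assumed, incomplete lookup tables for expanding abbreviations
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
STREET_TYPES = {
    "ST": "Street", "AVE": "Avenue", "RD": "Road", "DR": "Drive",
    "LN": "Lane", "CT": "Court", "BLVD": "Boulevard", "PL": "Place",
}


def title_case(text):
    """CHAPEL HILL -> Chapel Hill (naive: would also turn MCDONALD into Mcdonald)."""
    return " ".join(word.capitalize() for word in text.split())


def build_street(street_direction, street_name, street_type):
    """Combine streetDirection, streetName and streetType into addr:street."""
    direction = (street_direction or "").strip().upper()
    suffix = (street_type or "").strip().upper()
    parts = [
        DIRECTIONS.get(direction, ""),
        title_case(street_name),
        # Unknown street types are kept, but title-cased
        STREET_TYPES.get(suffix, title_case(suffix)),
    ]
    return " ".join(part for part in parts if part)


city = title_case("CHAPEL HILL")
street = build_street("N", "COLUMBIA", "ST")
```

With the inputs from the example above, `city` is "Chapel Hill" and `street` is "North Columbia Street". Names that need special casing would have to be handled separately.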

Data Transformation Results

Transformed data.

Data Merge Workflow

Team Approach

The import has been discussed on the imports mailing list, where the opportunity to grow the local mapping community was highlighted. The import is now using the tasking manager from OSM US (project 46). Anyone with enough experience is welcome to start mapping addresses!

Workflow

  • Open a square from the OSM US Tasking Manager.
  • Download the addresses in OSM format with duplicates of existing addresses already removed.
  • Merge layers.
  • Remove data outside the square.
  • Merge addresses with amenities and buildings manually.
  • Upload to OSM server.
  • Merge addresses with amenities and buildings manually.
  • Upload to OSM server.

Conflation

My data transformation Java program automatically removes duplicates of existing addresses from the dataset so that only missing addresses are added. It accounts for casing, abbreviations, and other inconsistencies in the existing data to make duplicate removal as accurate as possible. Conflation should therefore only be a problem where existing addresses contain incorrect information.
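The idea behind this duplicate removal can be illustrated with a short Python sketch. It is not the Java program itself: the abbreviation table, the choice of comparison key (house number, normalized street, unit), and the function names are assumptions made for the example.

```python
import re

# Assumed, incomplete abbreviation table used only for comparison
ABBREVIATIONS = {
    "n": "north", "s": "south", "e": "east", "w": "west",
    "st": "street", "ave": "avenue", "rd": "road", "dr": "drive",
    "ln": "lane", "ct": "court", "blvd": "boulevard",
}


def normalize_street(street):
    """Lower-case, drop periods and commas, and expand known abbreviations."""
    words = re.sub(r"[.,]", "", street.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def address_key(tags):
    """Comparison key for one address."""
    return (
        tags.get("addr:housenumber", "").strip().lower(),
        normalize_street(tags.get("addr:street", "")),
        tags.get("addr:unit", "").strip().lower(),
    )


def remove_duplicates(new_addresses, existing_addresses):
    """Keep only the new addresses whose key is not already in OSM."""
    existing_keys = {address_key(tags) for tags in existing_addresses}
    return [tags for tags in new_addresses if address_key(tags) not in existing_keys]


# Hypothetical sample data
existing = [{"addr:housenumber": "101", "addr:street": "N Columbia St."}]
new = [
    {"addr:housenumber": "101", "addr:street": "North Columbia Street"},
    {"addr:housenumber": "103", "addr:street": "North Columbia Street"},
]
missing = remove_duplicates(new, existing)
```

A table-driven expansion like this has blind spots: "St" meaning "Saint" would be expanded to "street", so such an address would fail to match and be kept rather than dropped. A misspelled existing address would also fail to match, which is the case where manual conflation is still needed.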

User accounts