King County Import

From OpenStreetMap Wiki

In Progress

This plan is currently being finalized. However, much of the original planning was carried out in the Seattle Import.

About

This page documents plans for an upcoming import of data from King County. The King County address import is a continuation of the original Seattle Import.

We plan to follow and update the plan found at the import checklist, but our general high-level plan is this:

  • Identify data to import
  • Translate, tag, and otherwise tenderize the data
  • Assemble a team of locals
  • Work together on a plan
  • Make sure the community - OSM (imports@), OSM-US (talk-us@), OSM-Seattle - are all on board with the plan.
  • Train the team
  • Divvy up the work using the HOT Tasking Manager
  • Do the work: Import/merge the data
  • QA the data
  • Beverage of choice & on to the next task

The intent is to begin this effort in earnest in early 2014.

Goals

The goal of this effort is to radically improve the quality of King County address information in OpenStreetMap.

Schedule

  • Planning: The bulk of the planning was completed in the original Seattle Import
  • Continuing Training: First JOSM import training will occur during the January 2014 #Editathon
  • Import: Resume importing the new King County data with new and existing local community members
  • QA: post-import

Import Data

Background

Data source site: http://www5.kingcounty.gov/gisdataportal/Default.aspx
Data license: http://www5.kingcounty.gov/gisdataportal/
Permissions: http://wiki.openstreetmap.org/wiki/Contributors#King_County.2C_Washington

Data Files

King County provides its information as shapefiles in Washington State Plane North.

Address Data Files: coming shortly


Import Type

This is an OSM Seattle community-based, one-time import.

There are currently no plans to script or automate this import.

We have decided to include the King County SITEID field to support subsequent updates. No process exists yet for updating the data; one needs to be developed.

Data Preparation

Tagging Plans

No source tags will be added. However, the SITEID field will be included as source:addr:id=SITEID.

Changeset Tags

We need to learn more about how to use changeset tags to make sure we set things up properly.

Tag in import-related changesets: import=King County GIS

Data Transformation

The source files are shapefiles (.shp) and need to be converted to OSM XML. The King County shapefiles will first be processed in PostgreSQL to expand street addresses. Code is available on GitHub.
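To illustrate what "expanding street addresses" means, here is a minimal Python sketch. It is not the import's actual code (that lives in PostgreSQL, per the paragraph above); the abbreviation tables, the function name expand_street, and the sample street names are all assumptions made for illustration.

```python
# Hypothetical abbreviation tables; the real PostgreSQL code may differ.
DIRECTIONS = {
    "N": "North", "S": "South", "E": "East", "W": "West",
    "NE": "Northeast", "NW": "Northwest", "SE": "Southeast", "SW": "Southwest",
}
STREET_TYPES = {
    "AVE": "Avenue", "ST": "Street", "PL": "Place", "RD": "Road",
    "DR": "Drive", "BLVD": "Boulevard", "CT": "Court", "LN": "Lane",
}


def expand_street(name):
    """Expand directional and street-type abbreviations in a street name."""
    words = [w.rstrip(".") for w in name.split()]
    if not words:
        return ""
    last = len(words) - 1
    # The street type is expected in the final word, or in the word
    # before a trailing direction (as in "1ST AVE S").
    type_idx = last - 1 if last > 0 and words[last].upper() in DIRECTIONS else last
    out = []
    for i, word in enumerate(words):
        key = word.upper()
        if key in DIRECTIONS and i in (0, last):
            out.append(DIRECTIONS[key])
        elif key in STREET_TYPES and i == type_idx and i > 0:
            out.append(STREET_TYPES[key])
        else:
            # Simple title-casing; names like "MCDONALD" would need special care.
            out.append(word.capitalize())
    return " ".join(out)


print(expand_street("NE 45TH ST"))  # Northeast 45th Street
print(expand_street("1ST AVE S"))   # 1st Avenue South
```

Directions are only expanded at the start or end of the name, and a street type only in the expected position, so that a name such as "E Street" is not mangled. A production version would need a fuller abbreviation list and manual review of edge cases.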

We have used Paul Norman's ogr2osm to convert from PostgreSQL to OSM XML. The translation script is:

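def filterTags(attrs):
  # Map King County attribute fields to OSM address tags.
  if not attrs:
    return
  tags = {}
  # House number
  if 'ADDR_HN' in attrs:
    tags['addr:housenumber'] = attrs['ADDR_HN']
  # Street name (already expanded in PostgreSQL)
  if 'ROAD' in attrs:
    tags['addr:street'] = attrs['ROAD']
  if 'CITY' in attrs:
    tags['addr:city'] = attrs['CITY']
  # King County site identifier, kept to support later updates
  if 'SITEID' in attrs:
    tags['source:addr:id'] = attrs['SITEID']
  # Five-digit ZIP code
  if 'ZIP5' in attrs:
    tags['addr:postcode'] = attrs['ZIP5']
  return tags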
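As a quick check of what the translation produces, the sketch below restates the function and applies it to a single record. The attribute values (and the extra OBJECTID field) are made up for illustration and are not taken from the King County data.

```python
def filterTags(attrs):
    """Same mapping as the translation script, restated so this example runs alone."""
    if not attrs:
        return
    tags = {}
    if 'ADDR_HN' in attrs:
        tags['addr:housenumber'] = attrs['ADDR_HN']
    if 'ROAD' in attrs:
        tags['addr:street'] = attrs['ROAD']
    if 'CITY' in attrs:
        tags['addr:city'] = attrs['CITY']
    if 'SITEID' in attrs:
        tags['source:addr:id'] = attrs['SITEID']
    if 'ZIP5' in attrs:
        tags['addr:postcode'] = attrs['ZIP5']
    return tags


# Hypothetical source record; values are invented for this example.
sample = {
    'ADDR_HN': '401',
    'ROAD': '5th Avenue',
    'CITY': 'Seattle',
    'ZIP5': '98104',
    'SITEID': '12345',
    'OBJECTID': '1',  # not mapped, so it is dropped from the output
}

tags = filterTags(sample)
for key in sorted(tags):
    print(key, '=', tags[key])
```

Fields that are not listed in the function are simply left out of the resulting tags, and an empty record yields no tags at all.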

Data Transformation Results

Output OSM XML files can be reviewed here: AWS S3 containing all 1500+ voting district files.

Data Merge Workflow

Team Approach

The work for this effort will be divided into sections, with each section constituting a voting district. Voting district data have been loaded into the US Tasking Manager.

References

Using JOSM, each volunteer will work through the data for their district, drawing on the following references:

  • Local knowledge
  • Bing aerial layer
  • Existing OSM data
  • Address data import

Workflow

  • Click on the task tab above and claim a task on the map to the right by clicking on an area you'd like to work in
  • Click the "JOSM" button; this opens the area in JOSM and loads the existing OSM data
  • Click the ".osm" button; this opens the new address data in a separate layer
  • Select the new addresses layer, validate it, and fix all issues (more in Tips & Tricks below)
  • Copy all geometry from addresses layer and paste it into the existing OSM data layer.
  • Run validation on the existing OSM data layer that now contains new data, resolve all issues emanating from collisions between existing and new data (see merge rules).
  • Do a sanity check on data: Do addresses correspond to adjacent roads? (more on imagery below)
  • Merge with existing address data. Use the JOSM "m" key to merge nodes. Check for addresses in POI and merge to new address data.
  • To merge with a building outline, use Replace Geometry (Ctrl+Shift+G) after selecting the existing building outline and the new address node. Note: Only merge a building outline with an address node if the building outline contains just one address node.
  • Upload data to OSM
  • Go back to the Tasking Manager and mark the task as done (but not as validated)
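The "just one address node" rule for building merges can be pictured with a small sketch. This is only an illustration of the rule, not part of the JOSM workflow: the function names and coordinates are hypothetical, and longitude/latitude are treated as plane coordinates, which is a reasonable simplification at building scale.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def addresses_inside(building, address_nodes):
    """Return the address nodes that fall inside the building outline."""
    return [node for node in address_nodes if point_in_polygon(node, building)]


def can_merge(building, address_nodes):
    """Merge only when exactly one address node lies inside the outline."""
    return len(addresses_inside(building, address_nodes)) == 1


# Hypothetical unit-square building and address nodes.
building = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(can_merge(building, [(0.5, 0.5), (2.0, 2.0)]))  # one node inside
print(can_merge(building, [(0.5, 0.5), (0.2, 0.2)]))  # two nodes inside
```

In practice the volunteer makes this judgment visually in JOSM; the sketch only spells out the condition being checked.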

Importer Quality Checks

Volunteers accepting tasks from the Tasking Manager should fix the following issues when importing addresses to help ensure a high-quality import:

  • Street addresses should match the nearby street name. If the addr:street does not match the street, verify the OSM tag against the latest TIGER road overlay. If there is still no match, the district should be skipped. Notify Clifford Snow of the discrepancy. Additionally, add a note asking for verification of the street names.
  • Look for duplicate addresses. Check POIs for address tags. Often POIs are added with incomplete addresses. Merge the imported address with the POI.
  • If a single address is inside of a building outline, merge the two. Leave multiple addresses inside building outlines as individual nodes.
  • Attempt to clean up misaligned roads by using the Bing background image as well as the "New & Misaligned TIGER Roads (TIGER 2013)" overlay.
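Two of the checks above (street-name mismatches and duplicate addresses) can be approximated outside JOSM with a short script. The sketch below uses only the Python standard library and assumes a merged layer saved as OSM XML; the function names and the sample data are hypothetical. It compares addr:street against every named highway in the file rather than only nearby ones, which is a simplification that works best on a task-sized extract.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def read_tags(element):
    """Collect the <tag k=... v=.../> children of a node or way into a dict."""
    return {t.get("k"): t.get("v") for t in element.findall("tag")}


def check_layer(osm_xml):
    """Return (duplicate addresses, addresses whose street matches no highway name)."""
    root = ET.fromstring(osm_xml)
    street_names = set()
    for way in root.findall("way"):
        tags = read_tags(way)
        if "highway" in tags and "name" in tags:
            street_names.add(tags["name"])

    by_address = defaultdict(list)
    unmatched = []
    for element in root.findall("node") + root.findall("way"):
        tags = read_tags(element)
        if "addr:housenumber" not in tags:
            continue
        street = tags.get("addr:street")
        by_address[(tags["addr:housenumber"], street)].append(element.get("id"))
        if street and street not in street_names:
            unmatched.append((element.get("id"), street))

    duplicates = {k: ids for k, ids in by_address.items() if len(ids) > 1}
    return duplicates, unmatched


# Hypothetical sample: one street, an imported address, an existing POI with
# the same address, and an imported address with a misspelled street name.
SAMPLE = """<osm version="0.6">
  <node id="-1" lat="47.60" lon="-122.33">
    <tag k="addr:housenumber" v="401"/><tag k="addr:street" v="5th Avenue"/>
  </node>
  <node id="100" lat="47.60" lon="-122.33">
    <tag k="amenity" v="cafe"/>
    <tag k="addr:housenumber" v="401"/><tag k="addr:street" v="5th Avenue"/>
  </node>
  <node id="-2" lat="47.61" lon="-122.33">
    <tag k="addr:housenumber" v="10"/><tag k="addr:street" v="Fith Avenue"/>
  </node>
  <way id="200">
    <tag k="highway" v="residential"/><tag k="name" v="5th Avenue"/>
  </way>
</osm>"""

duplicates, unmatched = check_layer(SAMPLE)
print("Duplicates:", duplicates)
print("Unmatched streets:", unmatched)
```

Anything the script reports still needs a human decision: a duplicate may be a POI to merge with, and an unmatched street may be an error in either the address data or the OSM road name.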


Conflation Tools

Since the JOSM Conflation plugin is still broken, users will conflate address points manually.

Known Dataset Conflation Issues

Local knowledge
Issue: this is a great opportunity to infuse a lot of richness into King County's OSM data.
Approach: volunteers will be asked to add information they know along the way. For example, if you know that the corner address of a building is a coffee shop, add that information while you're buzzing about in the import.

Dedicated Import Account

Users will be expected to obtain a unique import account; for example, user Glassman might pick Glassman_Import.

QA

Validation

  • Pre-import training
  • Use of validation tools in the Tasking Manager process
  • Group activities and IRC for question answering during the import.