Seattle Import

From OpenStreetMap Wiki

Neighborhood sign up sheet: Seattle Import/Work Table

In Progress

This plan is currently being implemented.

About

This page is intended to document plans for an upcoming import of data from data.seattle.gov.

NOTE: The term "import" is highly loaded in the OSM community; "a distributed and curated merge" is a more accurate description of what Seattle OSM is planning to do.

We plan to follow & update the plan found at the import checklist, but our general high-level plan is this:

  • Identify data to import
  • Translate, tag, and otherwise tenderize the data
  • Assemble a team of locals
  • Work together on a plan
  • Make sure the whole community - OSM (imports@), OSM-US (talk-us@), and OSM-Seattle - is on board with the plan.
  • Train the team
  • Divvy up the work
  • Do the work: Import/merge the data
  • QA the data
  • Beverage of choice & on to the next task

The intent is to begin this effort in earnest in 2013, but if we can complete planning by the Christmas season, we hope to take advantage of holiday leisure time to make good progress.

Goals

The primary goal of this effort is to radically improve the quality of City of Seattle address information in OpenStreetMap.

Secondary goals include improving the coverage of building shapes and taking better advantage of public domain data that is of interest to the Seattle OSM community.

Schedule

  • Planning: First half of December 2012 (or as long as this requires to gain consensus)
  • Training Meeting: tbd, before import kicks off, hopefully second half of December
  • Import: Starting the second half of December 2012 or the beginning of January 2013, ending: whenever... as long as it takes to do this carefully
  • QA: post-import

Import Data

Background

Data source site: http://data.seattle.gov
Data license: https://data.seattle.gov/page/data-policy
Permissions: http://wiki.openstreetmap.org/wiki/Contributors#City_of_Seattle.2C_Washington

Data Files

The City of Seattle provides its information as shapefiles in WGS84 or a local projection.

Address Data Files: https://data.seattle.gov/dataset/Master-Address-File/3vsa-a788
Building Data Files: https://data.seattle.gov/dataset/2009-Building-Outlines/y7u8-vad7

todo: Add information about size of these records.

Import Type

This is an OSM Seattle community-based, one-time import.

There are currently no plans to script or automate this import.

There are no plans for taking in or processing subsequent updates (e.g. feeds, diffs, etc.) that data.seattle.gov might provide. This would be a nice capability, but it is outside the scope of this immediate effort.

Data Preparation

Tagging Plans

Current plans for tagging:

  • Building outlines (ways only, not nodes) - source=data.seattle.gov/nameoffiletbd
  • Building outlines (ways only, not nodes) - building=yes
  • Addresses - source:addr=data.seattle.gov/nameoffiletbd
  • Addresses & outlines - doublechecked:no

We have discussed whether to add the individual source tags; this is still an open item. Ideally this information would be tagged in an automated fashion so that it is easy for a casual observer to see. We may add building-related information from other sources at a later date, and we would like to be able to identify these separate sources easily.

Changeset Tags

We need to learn more about how to use changeset tags in order to make sure we set things up properly.

Tag in import-related changesets: seattleimport=yes
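For reference, changeset tags live on the changeset object itself and can be set in most editors' upload dialogs (JOSM exposes them directly). A hypothetical changeset opened through the OSM API 0.6 would carry the tag like this (the comment text is illustrative):

```xml
<osm>
  <changeset>
    <tag k="created_by" v="JOSM"/>
    <tag k="comment" v="Seattle import: addresses and building outlines"/>
    <tag k="seattleimport" v="yes"/>
  </changeset>
</osm>
```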

Data Transformation

The source files are .shp-based and will need to be converted to OSM XML.

We have used ogr2osm from Paul Norman for this.

(Yes, GitHub or attachments work better than adding the code to a wiki page... this is likely temporary...)

SeattleBuildings Transformation

def filterTags(attrs):
  # Called by ogr2osm once per feature; the returned dict becomes the OSM tags.
  if not attrs:
    return
  tags = {}
  tags['building'] = 'yes'
  tags['source'] = 'data.seattle.gov'
  return tags
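ogr2osm calls filterTags() once per source feature, passing that feature's shapefile attributes as a dict, and merges whatever dict it returns into the output OSM XML. A minimal stand-alone sanity check of the translation above ('OBJECTID' is a placeholder attribute name, not a field from the Seattle files):

```python
# Quick check of the building translation: any feature with attributes
# gets the same two tags; attribute-less features return None and are
# skipped. 'OBJECTID' is just a placeholder attribute name.
def filterTags(attrs):
    if not attrs:
        return
    return {'building': 'yes', 'source': 'data.seattle.gov'}

print(filterTags({'OBJECTID': '1'}))
print(filterTags({}))
```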

SeattleAddress Transformation

# Data from observation and Appendix B (p271) of 
# http://www.census.gov/geo/www/tiger/tiger2006se/TGR06SE.pdf
def expandName(geoname):
    """Expand a TIGER-style street type or directional abbreviation."""
    if not geoname:
        return

    expansions = {
        'AL': 'Alley',
        'ALY': 'Alley',
        'ARC': 'Arcade',
        'AVE': 'Avenue',
        'BLF': 'Bluff',
        'BLVD': 'Boulevard',
        'BR': 'Bridge',
        'BRG': 'Bridge',
        'BYP': 'Bypass',
        'CIR': 'Circle',
        'CRES': 'Crescent',
        'CSWY': 'Crossway',
        'CT': 'Court',
        'CTR': 'Center',
        'CV': 'Cove',
        'DR': 'Drive',
        'ET': 'ET',
        'EXPY': 'Expressway',
        'EXPWY': 'Expressway',
        'FMRD': 'Farm to Market Road',
        'FWY': 'Freeway',
        'GRD': 'Grade',
        'HBR': 'Harbor',
        'HOLW': 'Hollow',
        'HWY': 'Highway',
        'LN': 'Lane',
        'LNDG': 'Landing',
        'MAL': 'Mall',
        'MTWY': 'Motorway',
        'OVPS': 'Overpass',
        'PKY': 'Parkway',
        'PKWY': 'Parkway',
        'PL': 'Place',
        'PLZ': 'Plaza',
        'RD': 'Road',
        'RDG': 'Ridge',
        'RMRD': 'Ranch to Market Road',
        'RTE': 'Route',
        'SKWY': 'Skyway',
        'SQ': 'Square',
        'ST': 'Street',
        'TER': 'Terrace',
        'TFWY': 'Trafficway',
        'THFR': 'Thoroughfare',
        'THWY': 'Thruway',
        'TPKE': 'Turnpike',
        'TRCE': 'Trace',
        'TRL': 'Trail',
        'TUNL': 'Tunnel',
        'UNP': 'Underpass',
        'VIS': 'Vista',
        'WKWY': 'Walkway',
        'XING': 'Crossing',
        ### NOT EXPANDED
        'WAY': 'Way',
        'WALK': 'Walk',
        'LOOP': 'Loop',
        'OVAL': 'Oval',
        'RAMP': 'Ramp',
        'ROW': 'Row',
        'RUN': 'Run',
        'PASS': 'Pass',
        'SPUR': 'Spur',
        'PATH': 'Path',
        'PIKE': 'Pike',
        'RUE': 'Rue',
        'MALL': 'Mall',
        'N': 'North',
        'S': 'South',
        'E': 'East',
        'W': 'West',
        'NE': 'Northeast',
        'NW': 'Northwest',
        'SE': 'Southeast',
        'SW': 'Southwest'}

    # Unknown abbreviations pass through unchanged rather than raising.
    return expansions.get(geoname, geoname).strip()
	
def filterTags(attrs):
    if not attrs:
        return
    tags = {}
    pre_di = ''
    suf = ''
    suf_di = ''
    addr = ''
    street = ''
    housenumber = ''

    if 'MAF_HOUSEN' in attrs:
        housenumber = attrs['MAF_HOUSEN'].strip()
        if 'MAF_HOUSEM' in attrs:
            housenumber = housenumber + attrs['MAF_HOUSEM'].strip()
        tags['addr:housenumber'] = housenumber

    if 'GEO_PRE_DI' in attrs:
        pre_di = expandName(attrs['GEO_PRE_DI'])

    if 'GEO_STRE_1' in attrs:
        suf = expandName(attrs['GEO_STRE_1'])

    if 'GEO_SUF_DI' in attrs:
        suf_di = expandName(attrs['GEO_SUF_DI'])

    if 'GEO_STREET' in attrs:
        # Title-case purely alphabetic names; lowercase mixed names
        # such as "1ST" so they come out as "1st".
        if attrs['GEO_STREET'].isalpha():
            street = attrs['GEO_STREET'].title()
        else:
            street = attrs['GEO_STREET'].lower()

    if pre_di:
        addr = pre_di + ' ' + street
    else:
        addr = street
    if suf:
        addr = addr + ' ' + suf
    if suf_di:
        addr = addr + ' ' + suf_di
    tags['addr:street'] = addr
    tags['source'] = 'data.seattle.gov'
    return tags
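As a worked example of the address assembly above, here is a condensed, self-contained sketch with an abbreviated expansion table and a made-up Master Address File record (the field names match the script; the values are illustrative, not a real Seattle address):

```python
# Condensed illustration of the address translation: an abbreviated
# expansion table plus the same assembly order (prefix directional,
# street name, street type, suffix directional).
EXPAND = {'N': 'North', 'S': 'South', 'AVE': 'Avenue', 'ST': 'Street'}

def to_tags(attrs):
    # Title-case alphabetic names; lowercase mixed ones like "1ST" -> "1st".
    raw = attrs['GEO_STREET']
    street = raw.title() if raw.isalpha() else raw.lower()
    parts = [EXPAND.get(attrs.get('GEO_PRE_DI', ''), ''),
             street,
             EXPAND.get(attrs.get('GEO_STRE_1', ''), ''),
             EXPAND.get(attrs.get('GEO_SUF_DI', ''), '')]
    return {'addr:housenumber': attrs['MAF_HOUSEN'].strip(),
            'addr:street': ' '.join(p for p in parts if p),
            'source': 'data.seattle.gov'}

sample = {'MAF_HOUSEN': '1916', 'GEO_PRE_DI': 'N',
          'GEO_STREET': 'PIKE', 'GEO_STRE_1': 'ST'}
print(to_tags(sample)['addr:street'])  # North Pike Street
```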

Data Transformation Results

Output OSM XML files can be reviewed here: https://www.dropbox.com/sh/bgwazxgsci2o2u3/NtWexk55Zr

Data Merge Workflow

Team Approach

The work for this effort will be divided up into sections, with each section constituting some neighborhood (or part of a neighborhood) in Seattle.

Sample reference files for Seattle neighborhoods:

Each section or neighborhood will be assigned to an import team volunteer. See the Seattle Import/Work Table.

We are also in the process of reviewing the use of the NZ Import Tools for this effort. These tools appear to be very helpful for distributed merge efforts like this project.

References

Using the editor of their choice, each volunteer will work through the data for their section, block by block, street by street, whatever makes sense for that person. Each volunteer will consider the following information when importing the data:

  • Local knowledge
  • Bing aerial layer
  • Existing OSM data
  • Address data import
  • Building outline import

Workflow

  1. Review overall area to be merged / imported / conflated.
  2. Identify subsection of area for immediate effort.
  3. Copy address points to the OSM data layer & remove them from the source layer.
  4. Copy building ways to the OSM data layer & remove them from the source layer.
  5. Conflate information in the copied area using guidance and training. Conflate manually until the conflation plugin bug is fixed. --Glassman (talk) 00:14, 1 February 2013 (UTC)
  6. Visually verify against Bing image. Remove buildings from vacant lots. Add in new buildings shown on Bing image.
  7. Ensure any non-import-related errors are removed.
  8. Validate the data using tools in the editors of choice.
  9. Commit the changeset.
  10. Move to the next subsection of work.
  11. Once complete, have another team member do QA, if appropriate.

Address Conflation

All team members will be asked to review the information at Key:addr and Addresses and Proposed_Features/Multiple_addresses as background for our discussion.

Conflation Tools

We are considering the use of the JOSM Conflation Plugin for volunteers using JOSM.

We will also consider testing Paul Norman's address conflation tool found at: https://github.com/pnorman/addressmerge. See: http://lists.openstreetmap.org/pipermail/imports/2012-December/001615.html.

Known Dataset Conflation Issues

There are a variety of address conflation issues to address in the data.seattle.gov files; below is an attempt to identify the various flavors we've seen so far:

Addresses as Nodes or Ways?
Issue: Addresses from the address file are tagged as nodes. Buildings are included as ways without addresses.
Planned Approach: Address node tags will be copied to building ways wherever possible. This won't always be possible; see the following discussions for the distinctions.

Multiple address nodes inside the same building
Issue: Buildings may also contain multiple address nodes.

Planned Approach:

  • Building ways containing 1 address node: (a) make sure it's the proper address node for that building, (b) copy the address node tags to the building way (e.g. OSX JOSM cmd-c on node, alt-cmd-c on way), (c) delete the node.
  • Building ways with more than 1 address node: (a) identify if there is a primary address for the building, (b) if so, copy the primary address node's tags to the building way and leave the other addresses as nodes inside that building, (c) if not, leave all address nodes inside the building way.
  • Method for identifying a primary address: for residences, use the primary mailing address (requires local knowledge); for office buildings, again use local knowledge.
  • If no primary address can be identified, all address nodes should be left inside the building way until this can be clarified later.

Multiple address nodes at the same geolocation
Issue: When there are multiple addresses at the same location, the files from data.seattle.gov tend to put them all at the same place. In addition, many corner residences have multiple addresses.
Planned Approach:

  • Movement: Address nodes whose information cannot be combined with their associated building's way will be moved, using local knowledge, to an approximate location within the building's outline.

Address nodes with no supporting building ways
Issue: many addresses are for vacant lots.
Planned approach: leave them as is. We should not be destroying data in this exercise.

Building files differ from Bing imagery, local knowledge, or existing OSM ways
Issue: the building data files are from 2009 and may not reflect ground truth.
Planned approach: volunteers will use their judgment about which building way to use. The building ways reviewed so far are fairly high quality and are usually more detailed than OSM contributors' artisanal vector shapes.

Entrances tbd

Local knowledge
Issue: this is a great opportunity to infuse a lot of richness into Seattle's OSM data.
Approach: volunteers will be asked to add information they know along the way. Easy Seattle joke & example: if you know the corner address for a building is a coffeeshop, add that information while you're buzzing about in the import.

Dedicated Import Account

The Seattle Import team feels that a dedicated import account is not needed. Each changeset is tagged with a unique changeset tag to allow for easy searching. Creating one import account to be shared by all importers would be a data security violation. Additionally, each importer is expected to visually verify the import against the Bing imagery. According to conversations with the DWG, the visual verification probably takes this process out of their classification as an import, so dedicated accounts are not required. SeattleImport will be used for high-volume importing to avoid conflict with the DWG. --Glassman (talk) 23:33, 23 April 2013 (UTC)

QA

In-process / first phase QA

This section needs to be fleshed out, but it will consist of:

  • Pre-import training
  • Use of validation tools during conflation (e.g. JOSM validation)
  • We are also interested in validating / reviewing any pre-existing OSM data as we continue this effort. For example, there are many streets marked as "Foo Avenue" that should be "South Foo Avenue." We will need to identify a process for this.
  • Group activities and IRC for question answering during the import.
  • Spot checking by team members of other team members' work.
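The "South Foo Avenue" review mentioned above could eventually be scripted. A minimal sketch of the idea, using made-up street names: flag any existing OSM name that matches an imported name except for a missing directional.

```python
# Hypothetical directional check: flag existing OSM street names that
# equal an imported name once its leading/trailing directional is
# stripped. Street names below are made up for illustration.
DIRECTIONALS = {'North', 'South', 'East', 'West',
                'Northeast', 'Northwest', 'Southeast', 'Southwest'}

def strip_directionals(name):
    words = name.split()
    if words and words[0] in DIRECTIONALS:
        words = words[1:]
    if words and words[-1] in DIRECTIONALS:
        words = words[:-1]
    return ' '.join(words)

imported = {'South Foo Avenue', 'Pine Street'}   # from data.seattle.gov
existing = {'Foo Avenue', 'Pine Street'}         # already in OSM

suspect = {name for name in existing
           if name not in imported
           and any(strip_directionals(i) == name for i in imported)}
print(suspect)  # {'Foo Avenue'}
```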

Post-initial pass QA

This section also needs to be fleshed out, but it is likely to consist of:

  • A follow-up pass / comparison of imported data to data contained in OSM - this may be scripted and automated.
  • A second set of eyes will be required for this QA - no one should double-check their own work.