Import/Catalogue/Address import for Allen County Indiana

From OpenStreetMap Wiki

Address import for Allen County Indiana is an import of the Allen County subset of the COUNTY_ADDRESS_POINTS_IDHS_IN dataset, a point dataset covering Allen County, Indiana. As of April 20, 2018, the import is at the planning stage.

Screenshot of address points
A screenshot showing all of the address points available for import, before processing. The colors indicate the zipcodes that will be used to split the import into smaller chunks.

Status

Import Guidelines steps
Status Notes
Step 1 - Prerequisites [Complete] Prerequisites reviewed.
Step 2 - Community Buy-in [Complete] Discussed on the imports mailing list and the OSM US Slack. The talk-us mailing list was notified, but no one responded.
Step 3 - License Approval [Complete] The data is licensed as public domain.
Step 4 - Documentation [Complete] This Wiki Page.
Step 5 - Import Review [Complete] Following a suggestion on the imports mailing list, the addr:city tag is now included in the import.
Step 6 - Uploading

Goals

The goal of this import is to include addresses in the OpenStreetMap database.

Schedule

Be sure to list the general timeframe of your project.

Import Data

Background

Provide links to your sources.

Data source site: http://maps.indiana.edu/layerGallery.html?category=Streets
Data license: Public Domain
Type of license (if applicable): Public Domain
Link to permission (if required): License confirmed via email
OSM attribution (if required): Contributors#IndianaMap.2C_Address_points (Not required, but requested)
ODbL Compliance verified: yes/no

OSM Data Files

The data is downloaded from the following file: http://maps.indiana.edu/download/Infrastructure/AddressPoints_SHP_Counties.zip. This archive contains a separate zip file for each county; this import uses only the one for Allen County.
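A minimal sketch of pulling the Allen County archive out of the statewide download with Python's standard zipfile module. The page does not give the name of the inner county file, so the sketch assumes it is a .zip whose name contains the county name; the function name extract_county_zip is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_county_zip(statewide_zip: str, county: str, out_dir: str) -> Path:
    """Extract one county's zip from the statewide archive and unpack it.

    Assumption: the inner member is a .zip whose file name contains `county`.
    Returns the directory holding the county's shapefile components.
    """
    with zipfile.ZipFile(statewide_zip) as outer:
        matches = [name for name in outer.namelist()
                   if name.lower().endswith(".zip")
                   and county.lower() in Path(name).name.lower()]
        if len(matches) != 1:
            raise ValueError(f"expected one zip for {county!r}, found {matches}")
        # extract() returns the path of the extracted member as a string
        inner_path = Path(outer.extract(matches[0], out_dir))
    # Unpack .shp/.shx/.dbf/... into a folder named after the inner zip
    target = inner_path.with_suffix("")
    with zipfile.ZipFile(inner_path) as inner:
        inner.extractall(target)
    return target
```

For example, extract_county_zip("AddressPoints_SHP_Counties.zip", "Allen", "data") would leave the Allen County shapefile components in a folder under data/.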

To prevent duplicate data, the existing OSM address data will be downloaded from the Overpass API using the following query:

/*
This has been generated by the overpass-turbo wizard.
The original search was:
“addr:housenumber=*”
*/
[out:json][timeout:25];
// gather results
(
  // query part for: “"addr:housenumber"=*”
  node["addr:housenumber"]({{bbox}});
  way["addr:housenumber"]({{bbox}});
  relation["addr:housenumber"]({{bbox}});
);
// print results
out body;
>;
out skel qt;
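Because the query uses [out:json] and follows the tagged objects with `>; out skel qt;`, the raw response also contains untagged skeleton nodes used only for way geometry. Step 5 works from a GeoJSON export; if the raw JSON is used instead, a minimal sketch of keeping only the tagged address objects could look like this. The function name address_elements and the sample values are made up for illustration.

```python
import json


def address_elements(overpass_json: str) -> list:
    """Return elements that carry addr:housenumber in an Overpass JSON response.

    Skeleton nodes printed by `>; out skel qt;` have no "tags" key and are skipped.
    """
    data = json.loads(overpass_json)
    return [el for el in data.get("elements", [])
            if "addr:housenumber" in el.get("tags", {})]


# Made-up sample response: one tagged node, one tagged way, one skeleton node
sample = json.dumps({"elements": [
    {"type": "node", "id": 1, "lat": 41.08, "lon": -85.14,
     "tags": {"addr:housenumber": "123", "addr:street": "Example Street"}},
    {"type": "way", "id": 2, "nodes": [3],
     "tags": {"addr:housenumber": "125", "addr:street": "Example Street"}},
    {"type": "node", "id": 3, "lat": 41.0, "lon": -85.0},
]})
print([(el["type"], el["id"]) for el in address_elements(sample)])
# [('node', 1), ('way', 2)]
```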

A sample of data that has been processed can be inspected at https://gitlab.com/jgon6/allen_county_address_import/blob/master/Sample-data/46835.osm

Import Type

This will be a one-time import using a combination of automated scripts and manual conversion. The process is described in more detail in the Workflow section: QGIS, custom-written scripts, and JOSM will be used to prepare the data, and the new data will be uploaded using JOSM.

Data Preparation

Data Reduction & Simplification

Describe your plans, if any, to reduce the amount of data you'll need to import.

Examples of this include removing information that is already contained in OSM or simplifying shapefiles.

Two steps will be taken to reduce the amount of data to be imported. The first is removing points that share the same coordinates. These duplicate points will be removed using the "loadshp.py" script found in the repository.
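The duplicate-coordinate rule can be sketched in a few lines. This is not loadshp.py itself: it assumes the points have already been read from the shapefile into dicts with hypothetical `x`/`y` keys, and that coordinates match exactly. As in the workflow below, every point at a shared location goes to the dupes output, and none of them is kept for import.

```python
from collections import defaultdict


def split_shared_coordinates(points):
    """Split points into (dupes, no_dupes) by exact (x, y) match.

    A point goes to dupes if any other point has the same coordinates;
    all points at such a location are withheld, not just the extras.
    """
    by_coord = defaultdict(list)
    for p in points:
        by_coord[(p["x"], p["y"])].append(p)
    dupes, no_dupes = [], []
    for group in by_coord.values():
        (dupes if len(group) > 1 else no_dupes).extend(group)
    return dupes, no_dupes


pts = [{"x": 1.0, "y": 2.0, "n": "10"},
       {"x": 1.0, "y": 2.0, "n": "12"},
       {"x": 3.0, "y": 4.0, "n": "14"}]
dupes, keep = split_shared_coordinates(pts)
print([p["n"] for p in dupes], [p["n"] for p in keep])
# ['10', '12'] ['14']
```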

The second step is removing the points that are already in OSM. These points will be removed using the "pre-exist.py" script and the GeoJSON file for the local area, downloaded as described in the OSM Data Files section.
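The page does not say how pre-exist.py decides that a point is already in OSM. One plausible rule, shown here only as a sketch, is to compare house number and street after normalizing case and whitespace. It assumes each candidate has already been given assembled `number` and `street` values, and that the GeoJSON export places addr:housenumber and addr:street directly in each feature's properties. The function names are hypothetical.

```python
import json


def normalize(number: str, street: str) -> tuple:
    """Case- and whitespace-insensitive key for comparing addresses."""
    return number.strip().upper(), " ".join(street.split()).upper()


def split_existing(candidates, osm_geojson: str):
    """Split candidate points into (exists, new) against OSM addresses.

    Assumes tags appear directly in each GeoJSON feature's "properties".
    """
    known = set()
    for feature in json.loads(osm_geojson).get("features", []):
        props = feature.get("properties") or {}
        if "addr:housenumber" in props and "addr:street" in props:
            known.add(normalize(props["addr:housenumber"], props["addr:street"]))
    exists, new = [], []
    for p in candidates:
        (exists if normalize(p["number"], p["street"]) in known else new).append(p)
    return exists, new
```

A rule like this only catches exact address matches; a proximity check would be needed to catch the same building mapped with a differently spelled street.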

Tagging Plans

Describe your plan for mapping source attributes to OSM tags.

The import will use these fields from the source file:

Field name notes
ADDR_HN This is the house number
ADDR_PD Prefix direction
ADDR_PT Prefix type
ADDR_SN Street name
ADDR_ST Suffix type
ADDR_SD Suffix direction
ZIPCODE This is the zipcode

The street type field (ADDR_ST) has the following unique values. Five of them are abbreviated; the full form is given in parentheses:

  • Road
  • Crest
  • NULL
  • Cliffs
  • Creek
  • Glen
  • Trail
  • Grove
  • Shores
  • Ave (Avenue)
  • Run
  • Calle
  • Cove
  • Square
  • Center
  • Pass
  • Row
  • Lane
  • Park
  • Commons
  • Circle
  • Way
  • Ford
  • St Extension (Street Extension)
  • Hills
  • Hill
  • Ridge
  • Boulevard
  • Knolls
  • Drive
  • Trace
  • Hollow
  • Place
  • Ave Extension (Avenue Extension)
  • Landing
  • Knoll
  • Point
  • Court
  • Plaza
  • St (Street)
  • Hwy (Highway)
  • Pike
  • Stream
  • Parkway
  • Shoals
  • Crossing
  • Path
  • Passage
  • Bend
  • Expressway
  • Terrace

The abbreviated street types must first be expanded; this is the first step taken by "final-convert.py".

After any abbreviated street type has been expanded, the street string is assembled from the street fields. This string is stored in the final shapefile and later renamed to become the "addr:street" tag:

ADDR_PD + ADDR_PT + ADDR_SN + ADDR_ST + ADDR_SD = street -> addr:street
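A minimal sketch of the expansion and concatenation described above, assuming each record is a dict keyed by the source field names. Empty values and the literal "NULL" street type are skipped so they do not leave stray spaces. The page does not say what form ADDR_PD and ADDR_SD take, so direction values are passed through unchanged; SUFFIX_EXPANSIONS and assemble_street are illustrative names, not taken from final-convert.py.

```python
# Expansions for the five abbreviated ADDR_ST values listed above
SUFFIX_EXPANSIONS = {
    "Ave": "Avenue",
    "St": "Street",
    "Hwy": "Highway",
    "Ave Extension": "Avenue Extension",
    "St Extension": "Street Extension",
}

STREET_FIELDS = ("ADDR_PD", "ADDR_PT", "ADDR_SN", "ADDR_ST", "ADDR_SD")


def assemble_street(record: dict) -> str:
    """Join the street fields in order, skipping empty or NULL parts.

    Abbreviated suffix types (ADDR_ST) are expanded first.
    """
    parts = []
    for field in STREET_FIELDS:
        value = (record.get(field) or "").strip()
        if not value or value.upper() == "NULL":
            continue
        if field == "ADDR_ST":
            value = SUFFIX_EXPANSIONS.get(value, value)
        parts.append(value)
    return " ".join(parts)


print(assemble_street({"ADDR_PD": "", "ADDR_SN": "Example", "ADDR_ST": "St", "ADDR_SD": None}))
# Example Street
```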

The ADDR_HN field is saved as "number" and later converted to the "addr:housenumber" tag.

The ADD_CITY field stores either the name of the city or a two-letter code representing it. If the point uses a code, the code is first converted into the full city name; the name is then stored in the city field and later renamed to "addr:city".

Code real name
AU Auburn
CH Churubusco
DE Decatur
FW Fort Wayne
GR Grabill
HA Harlan
HO Hoagland
HU Huntertown
LC Leo-Cedarville
MO Monroeville
NH New Haven
OS Ossian
RO Roanoke
SP Spencerville
WO Woodburn
YO Yoder
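The code table translates directly into a dictionary lookup. A minimal sketch, assuming ADD_CITY values are either one of the codes above or an already spelled-out name, which is passed through unchanged; city_name is an illustrative name.

```python
CITY_CODES = {
    "AU": "Auburn", "CH": "Churubusco", "DE": "Decatur", "FW": "Fort Wayne",
    "GR": "Grabill", "HA": "Harlan", "HO": "Hoagland", "HU": "Huntertown",
    "LC": "Leo-Cedarville", "MO": "Monroeville", "NH": "New Haven",
    "OS": "Ossian", "RO": "Roanoke", "SP": "Spencerville", "WO": "Woodburn",
    "YO": "Yoder",
}


def city_name(add_city: str) -> str:
    """Map a two-letter city code to its name; leave full names unchanged."""
    value = add_city.strip()
    return CITY_CODES.get(value.upper(), value)


print(city_name("FW"), "|", city_name("New Haven"))
# Fort Wayne | New Haven
```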

The ZIPCODE field is saved under the same name and later converted to the "addr:postcode" tag.

Changeset Tags

  • comment = Allen County Addresses Import <zipcode> -- <zipcode> will be replaced with the zipcode being uploaded, optionally with a letter added to the end if the zipcode needs to be broken up for upload.
  • created_by = JOSM -- use JOSM's default value here; it includes the version number
  • source = COUNTY_ADDRESS_POINTS_IDHS_IN: Address Points Maintained by County Agencies in Indiana, Twentieth Harvest (Indiana Department of Homeland Security, Point feature class), 20151217
  • source:license = Public Domain
  • type = import
  • url:en = https://wiki.openstreetmap.org/wiki/Import/Catalogue/Address_import_for_Allen_County_Indiana
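For reference, the tag set above can be written down once and reused for every upload. A minimal sketch: the optional split letter is assumed to be lowercase ("a", "b", ...), and created_by is left out because JOSM fills it in. The function name changeset_tags is illustrative.

```python
import string
from typing import Dict, Optional

SOURCE = ("COUNTY_ADDRESS_POINTS_IDHS_IN: Address Points Maintained by County "
          "Agencies in Indiana, Twentieth Harvest (Indiana Department of Homeland "
          "Security, Point feature class), 20151217")
WIKI_URL = ("https://wiki.openstreetmap.org/wiki/Import/Catalogue/"
            "Address_import_for_Allen_County_Indiana")


def changeset_tags(zipcode: str, part: Optional[int] = None) -> Dict[str, str]:
    """Changeset tags for one upload; part 0, 1, ... appends a, b, ... to the zipcode."""
    suffix = "" if part is None else string.ascii_lowercase[part]
    return {
        "comment": f"Allen County Addresses Import {zipcode}{suffix}",
        "source": SOURCE,
        "source:license": "Public Domain",
        "type": "import",
        "url:en": WIKI_URL,
    }


print(changeset_tags("46835", 1)["comment"])
# Allen County Addresses Import 46835b
```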

Data Transformation

The only data transformations used by this import are those that reduce the size of the import and the remapping of fields to tags. Both are described in the sections above.

Data Merge Workflow

Team Approach

This import will be performed by the user Jgon6. The import will be uploaded using the Indianamap_imports user.

References

List all factors that will be evaluated in the import.

Factors evaluated in this import are points that share the same coordinates and points that are already in the database.

Workflow

[Figure: Address import for Allen County Indiana process diagram]

Step 1: Download the data

Download the data and examine it.

Step 2: Split the data by zipcode

Split the data by zipcode so that each chunk is smaller to process. Without the split, each point would be compared against the nearly 170,000 points in the whole county.
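The split itself is a simple grouping. A minimal sketch, assuming the records have been read into dicts keyed by the source field names; ZIPCODE values are converted to trimmed strings so that numeric and text values land in the same chunk. split_by_zipcode is an illustrative name.

```python
from collections import defaultdict


def split_by_zipcode(records):
    """Group address records into chunks keyed by their ZIPCODE value."""
    chunks = defaultdict(list)
    for rec in records:
        chunks[str(rec["ZIPCODE"]).strip()].append(rec)
    return dict(chunks)


recs = [{"ZIPCODE": 46835}, {"ZIPCODE": "46825"}, {"ZIPCODE": "46835 "}]
print({z: len(rs) for z, rs in split_by_zipcode(recs).items()})
# {'46835': 2, '46825': 1}
```

Using the raw counts in the Data Counts section, the largest chunk (46835) has 15,718 points, so each point is compared against at most 15,717 others rather than about 168,000.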

Step 3: loadshp.py

Run loadshp.py for each zipcode's shapefile. The script doesn't take any arguments, so the file paths inside it must be edited between runs.

After running this script there will be two files per zipcode. The first, named dupes, holds the points that share their coordinates with at least one other address point. The second, no-dupes.shp, holds the points to feed into the next step.

Step 4: Download current OSM data

Download the existing address data for the county from Overpass using the query in the OSM Data Files section.

Step 5: pre-existing.py

Run pre-existing.py. This script takes the .shp file from step 3 and the OSM data saved as a GeoJSON file, and splits out the points that are already in the OSM database.

The output is two files: Exists.shp, a shapefile with the points already in OSM, and new.shp, which holds the new points that will be imported.

Step 6: final-convert.py

Run final-convert.py. This script writes the new points to a new shapefile, with properties on each point corresponding to OSM tags. Because shapefile field names are limited to 10 characters, the field names are placeholders.

Step 7: JOSM

Load the shapefile from step 6 into JOSM and save it as a .osm file. This step requires the OpenData plugin for JOSM.

Step 8: rename placeholder tags

A text editor will be used to rename the placeholder tags into the following OSM tags:

  • number -> addr:housenumber
  • postcode -> addr:postcode
  • street -> addr:street
  • city -> addr:city
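As a scripted alternative to the text editor, a minimal sketch using Python's xml.etree.ElementTree, which rewrites the k attribute of each matching <tag> element. The city key is included because the Tagging Plans section says the city field is later renamed to addr:city. ElementTree does not preserve the original whitespace or attribute order, which should not matter to JOSM. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER_TO_OSM = {
    "number": "addr:housenumber",
    "postcode": "addr:postcode",
    "street": "addr:street",
    "city": "addr:city",
}


def rename_placeholder_tags(osm_in: str, osm_out: str) -> int:
    """Rename placeholder keys in <tag k="..."> elements; return how many changed."""
    tree = ET.parse(osm_in)
    renamed = 0
    for tag in tree.iter("tag"):
        new_key = PLACEHOLDER_TO_OSM.get(tag.get("k"))
        if new_key is not None:
            tag.set("k", new_key)
            renamed += 1
    tree.write(osm_out, encoding="UTF-8", xml_declaration=True)
    return renamed
```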

Step 9: Upload

The .osm files will then be loaded into JOSM and checked for any visible errors. If they look good, the data will be uploaded to the OSM servers using JOSM. Six of the zipcode splits have over 10,000 points before processing and reduction; these files will be split in half using QGIS or JOSM before being uploaded.
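The page leaves the split to QGIS or JOSM. As a scripted alternative, a minimal sketch that halves one zipcode's points at the median longitude, so each upload covers a contiguous western or eastern part of the zipcode. The west/east rule, the `x`/`y` keys (assumed to be longitude/latitude), and the function name are assumptions.

```python
def split_in_half(points):
    """Split points into (west, east) halves ordered by longitude, then latitude.

    With an odd count the eastern half gets the extra point.
    """
    ordered = sorted(points, key=lambda p: (p["x"], p["y"]))
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


pts = [{"x": float(x), "y": 41.0} for x in (5, 1, 4, 2, 3)]
west, east = split_in_half(pts)
print([p["x"] for p in west], [p["x"] for p in east])
# [1.0, 2.0] [3.0, 4.0, 5.0]
```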


Step 10: Review discarded points

The import process skips addresses already in the OSM database, even where those existing entries lack data the source provides, such as a zipcode. This information can be added manually after the main import has taken place.
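To find candidates for this manual pass, the Overpass JSON response can be cross-checked against the source points that were set aside as already existing. A minimal sketch, assuming a case-insensitive match on house number and street and source records with hypothetical `number`, `street`, and `ZIPCODE` keys; it only lists suggestions for review and edits nothing.

```python
import json


def suggest_postcodes(overpass_json: str, source_points) -> list:
    """Return (type, id, zipcode) for OSM addresses lacking addr:postcode.

    A suggestion is made only when housenumber and street match a source point.
    """
    zip_by_addr = {(p["number"].upper(), p["street"].upper()): p["ZIPCODE"]
                   for p in source_points}
    suggestions = []
    for el in json.loads(overpass_json).get("elements", []):
        tags = el.get("tags", {})
        if "addr:postcode" in tags or "addr:housenumber" not in tags:
            continue
        key = (tags["addr:housenumber"].upper(), tags.get("addr:street", "").upper())
        if key in zip_by_addr:
            suggestions.append((el["type"], el["id"], zip_by_addr[key]))
    return suggestions
```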

There are also the points with duplicate coordinates. These points cannot be imported as-is and will need to be verified on the ground. The dupes shapefiles show where they are, so they can be used to find locations that need a closer look using traditional mapping techniques.

Reverts

If any changeset needs to be reverted, standard reverting procedures will be followed. Changesets from this import should be easy to revert because they only add points that do not already exist.

Conflation

A simple approach will be taken to conflation. Because points already in the database are not imported, there should be no conflicts; if any do occur, JOSM should catch them before uploading, and they will be resolved manually.


QA (TODO)

Add your QA plan here.

Data Counts

Numbers of nodes by zipcode
Zipcode Raw Feature Count Imported Feature Count Status
43526 3
45880 8
45832 28
45813 6
46777 74
46770 53
46763 1
46733 70
46725 29
46706 224
46802 5455
46803 5530
46804 14008
46805 10414
46806 11806
46807 7204
46808 9539
46809 4935
46819 3856
46835 15718
46825 13273
46815 11163
46845 9819
46818 9200
46816 7958
46774 7292
46814 5694
46748 2525
46765 2009
46797 1731
46773 1652
46741 1398
46783 1117
46798 1035
46743 965
46745 791
46788 767
46723 726
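The raw counts above can be checked against the upload plan in Step 9 with a short script. The dictionary simply transcribes the table; the 10,000-point threshold is the one stated in Step 9.

```python
RAW_COUNTS = {
    "43526": 3, "45880": 8, "45832": 28, "45813": 6, "46777": 74,
    "46770": 53, "46763": 1, "46733": 70, "46725": 29, "46706": 224,
    "46802": 5455, "46803": 5530, "46804": 14008, "46805": 10414,
    "46806": 11806, "46807": 7204, "46808": 9539, "46809": 4935,
    "46819": 3856, "46835": 15718, "46825": 13273, "46815": 11163,
    "46845": 9819, "46818": 9200, "46816": 7958, "46774": 7292,
    "46814": 5694, "46748": 2525, "46765": 2009, "46797": 1731,
    "46773": 1652, "46741": 1398, "46783": 1117, "46798": 1035,
    "46743": 965, "46745": 791, "46788": 767, "46723": 726,
}
SPLIT_THRESHOLD = 10_000  # zipcodes above this are uploaded in two parts

total = sum(RAW_COUNTS.values())
to_split = sorted(z for z, n in RAW_COUNTS.items() if n > SPLIT_THRESHOLD)
print(total)     # 168076
print(to_split)  # ['46804', '46805', '46806', '46815', '46825', '46835']
```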