Import/Catalogue/Address import for Allen County Indiana
Address import for Allen County Indiana is an import of the Allen County subset of the COUNTY_ADDRESS_POINTS_IDHS_IN dataset, a point dataset covering Allen County, Indiana. As of April 20th, 2018, the import is at the planning stage.
Status
Step | Status | Notes |
---|---|---|
Step 1 - Prerequisites | [Complete] | Prerequisites reviewed. |
Step 2 - Community Buy-in | [Complete] | Discussed on the imports mailing list and the OSM US Slack. The talk-us mailing list was notified, but no one responded. |
Step 3 - License Approval | [Complete] | Data is licensed as public domain. |
Step 4 - Documentation | [Complete] | This wiki page. |
Step 5 - Import Review | [Complete] | After a suggestion on the imports mailing list, the addr:city tag is now included in the import. |
Step 6 - Uploading | | |
Goals
The goal of this import is to add the Allen County address points to the OpenStreetMap database.
Schedule
Be sure to list the general timeframe of your project.
Import Data
Background
Data source site: http://maps.indiana.edu/layerGallery.html?category=Streets
Data license: Public Domain
Type of license (if applicable): Public Domain
Link to permission (if required): License confirmed via email
OSM attribution (if required): Contributors#IndianaMap.2C_Address_points (Not required, but requested)
ODbL Compliance verified: yes/no
OSM Data Files
The data is downloaded from the following file: http://maps.indiana.edu/download/Infrastructure/AddressPoints_SHP_Counties.zip. The archive contains a separate zip file for each county; this import only uses the one for Allen County.
To avoid duplicating data that is already mapped, the existing OSM address data will be downloaded from the Overpass API using the following query:
```
/*
This has been generated by the overpass-turbo wizard.
The original search was:
“addr:housenumber=*”
*/
[out:json][timeout:25];
// gather results
(
// query part for: “"addr:housenumber"=*”
node["addr:housenumber"]({{bbox}});
way["addr:housenumber"]({{bbox}});
relation["addr:housenumber"]({{bbox}});
);
// print results
out body;
>;
out skel qt;
```
A sample of the processed data can be inspected at https://gitlab.com/jgon6/allen_county_address_import/blob/master/Sample-data/46835.osm
Import Type
This will be a one-time import using a combination of automated scripts and manual conversion. The process is described in more detail in the Workflow section; data preparation uses QGIS, custom scripts, and JOSM. The new data will be uploaded using JOSM.
Data Preparation
Data Reduction & Simplification
Two steps will reduce the amount of data to be imported. The first is removing points that share the same coordinates; these duplicate points will be removed using the "loadshp.py" script found in the repository.
The second is removing points that are already in OSM. These points will be removed using the "pre-exist.py" script and the GeoJSON file for the local area described in the OSM Data Files section.
Tagging Plans
The import will use these fields from the source file:
Field name | Notes |
---|---|
ADDR_HN | House number |
ADDR_PD | Prefix direction |
ADDR_PT | Prefix type |
ADDR_SN | Street name |
ADDR_ST | Suffix type |
ADDR_SD | Suffix direction |
ADD_CITY | City name or two-letter city code |
ZIPCODE | Zipcode |
The street type field has the following unique values; five of them are abbreviated, with the full form shown in parentheses:
- Road
- Crest
- NULL
- Cliffs
- Creek
- Glen
- Trail
- Grove
- Shores
- Ave (Avenue)
- Run
- Calle
- Cove
- Square
- Center
- Pass
- Row
- Lane
- Park
- Commons
- Circle
- Way
- Ford
- St Extension (Street Extension)
- Hills
- Hill
- Ridge
- Boulevard
- Knolls
- Drive
- Trace
- Hollow
- Place
- Ave Extension (Avenue Extension)
- Landing
- Knoll
- Point
- Court
- Plaza
- St (Street)
- Hwy (Highway)
- Pike
- Stream
- Parkway
- Shoals
- Crossing
- Path
- Passage
- Bend
- Expressway
- Terrace
The abbreviated street types are expanded first; this is the first step performed by "final-convert.py".
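The expansion step can be sketched as a simple lookup table. This is a minimal sketch, not the code in "final-convert.py"; the function name `expand_street_type` is hypothetical, and treating both a missing value and the literal string "NULL" as "no street type" is an assumption about how the NULL value appears in the data.

```python
# Hypothetical sketch: expand the five abbreviated street types listed above.
# Not taken from final-convert.py; the mapping is copied from the list.
STREET_TYPE_EXPANSIONS = {
    "Ave": "Avenue",
    "St": "Street",
    "Hwy": "Highway",
    "St Extension": "Street Extension",
    "Ave Extension": "Avenue Extension",
}


def expand_street_type(street_type):
    """Return the full street type, or None when the value is missing."""
    # Assumption: NULL may arrive as None or as the string "NULL".
    if street_type is None or street_type.strip() in ("", "NULL"):
        return None
    value = street_type.strip()
    # Types that are already spelled out (e.g. "Road") pass through unchanged.
    return STREET_TYPE_EXPANSIONS.get(value, value)


print(expand_street_type("Ave Extension"))  # Avenue Extension
print(expand_street_type("Road"))           # Road
```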
After the street type has been expanded where needed, the street string is assembled from the street fields. This string is stored in the "street" field of the final shapefile, which is later renamed to the "addr:street" tag.
ADDR_PD + ADDR_PT + ADDR_SN + ADDR_ST + ADDR_SD = street -> addr:street
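The assembly above amounts to joining the non-empty street parts in a fixed order. The sketch below assumes each point is available as a dict keyed by the source field names, that the street type has already been expanded, and that empty fields show up as None, "" or "NULL"; `build_street` is a hypothetical name, and the example values are illustrative only.

```python
# Hypothetical sketch of the street-string assembly described above.
# Assumes one dict per point, keyed by the source field names.
STREET_FIELDS = ("ADDR_PD", "ADDR_PT", "ADDR_SN", "ADDR_ST", "ADDR_SD")


def build_street(record):
    """Join the non-empty street parts in order, separated by single spaces."""
    parts = []
    for field in STREET_FIELDS:
        value = record.get(field)
        if value is None:
            continue
        value = str(value).strip()
        # Assumption: empty strings and the literal "NULL" mean "no value".
        if value and value != "NULL":
            parts.append(value)
    return " ".join(parts)


print(build_street({"ADDR_PD": "West", "ADDR_SN": "Jefferson", "ADDR_ST": "Boulevard"}))
# West Jefferson Boulevard
```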
The ADDR_HN field is saved as "number" and later converted to the "addr:housenumber" tag.
The ADD_CITY field stores either the name of the city or a two-letter code for the city. If the point uses a code, the code is first converted to the full city name using the table below; the name is then stored in the "city" field and later renamed to "addr:city".
Code | City name |
---|---|
AU | Auburn |
CH | Churubusco |
DE | Decatur |
FW | Fort Wayne |
GR | Grabill |
HA | Harlan |
HO | Hoagland |
HU | Huntertown |
LC | Leo-Cedarville |
MO | Monroeville |
NH | New Haven |
OS | Ossian |
RO | Roanoke |
SP | Spencerville |
WO | Woodburn |
YO | Yoder |
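The code conversion can be sketched as a dictionary lookup built from the table above. The name `city_name` is hypothetical, and passing through any value that is not one of the listed codes rests on the assumption that such values already hold the full city name.

```python
# Hypothetical sketch of the ADD_CITY conversion; codes copied from the table.
CITY_CODES = {
    "AU": "Auburn",
    "CH": "Churubusco",
    "DE": "Decatur",
    "FW": "Fort Wayne",
    "GR": "Grabill",
    "HA": "Harlan",
    "HO": "Hoagland",
    "HU": "Huntertown",
    "LC": "Leo-Cedarville",
    "MO": "Monroeville",
    "NH": "New Haven",
    "OS": "Ossian",
    "RO": "Roanoke",
    "SP": "Spencerville",
    "WO": "Woodburn",
    "YO": "Yoder",
}


def city_name(add_city):
    """Map a two-letter code to its city; other values pass through unchanged."""
    value = (add_city or "").strip()
    return CITY_CODES.get(value, value)


print(city_name("FW"))          # Fort Wayne
print(city_name("Fort Wayne"))  # Fort Wayne
```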
The ZIPCODE field is saved under the same name and later converted to the "addr:postcode" tag.
Changeset Tags
- comment = Allen County Addresses Import <zipcode>, where <zipcode> is the zipcode being uploaded, with a letter appended if the zipcode has to be split into more than one upload.
- created_by = JOSM (use JOSM's default value, which includes the version number)
- source = COUNTY_ADDRESS_POINTS_IDHS_IN: Address Points Maintained by County Agencies in Indiana, Twentieth Harvest (Indiana Department of Homeland Security, Point feature class), 20151217
- source:license = Public Domain
- type = import
- url:en = https://wiki.openstreetmap.org/wiki/Import/Catalogue/Address_import_for_Allen_County_Indiana
Data Transformation
The only data transformations used by this import are the data reduction steps and the remapping of source fields to OSM tags, both described above.
Data Merge Workflow
Team Approach
This import will be performed by the user Jgon6. The import will be uploaded using the Indianamap_imports user.
References
Factors evaluated in this import are points that share the same coordinates and points that are already in the database.
Workflow
Step 1: Download the data
Download the data and examine it.
Step 2: Split the data by zipcode
Split the data by zipcode so the chunks are smaller to process. With nearly 170,000 points in the county, this avoids comparing every point against the whole dataset.
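The page does not say which tool performs the split, so the following is only a minimal sketch of the grouping itself, assuming the attribute records have already been read into dicts (with whatever shapefile reader is at hand). `split_by_zipcode` is a hypothetical name.

```python
from collections import defaultdict


def split_by_zipcode(records):
    """Group attribute records (dicts) by their ZIPCODE field."""
    groups = defaultdict(list)
    for record in records:
        # str() so numeric and text zipcodes land in the same group.
        groups[str(record["ZIPCODE"]).strip()].append(record)
    return dict(groups)


records = [
    {"ZIPCODE": "46835", "ADDR_HN": "1"},
    {"ZIPCODE": 46804, "ADDR_HN": "2"},
    {"ZIPCODE": "46835", "ADDR_HN": "3"},
]
for zipcode, group in split_by_zipcode(records).items():
    print(zipcode, len(group))
```

Each group can then be written to its own shapefile, so later steps only compare points within one zipcode.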
Step 3: loadshp.py
Run loadshp.py on each zipcode's shapefile. The script takes no arguments, so the file paths inside the script have to be edited before each run.
After running this script there are two files per zipcode. The first, named dupes, holds points that share their coordinates with at least one other address point. The second, no-dupes.shp, holds the points that are fed into the next step.
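The duplicate split can be sketched by grouping points on their exact coordinates. This is not loadshp.py itself: the dict layout with "x" and "y" keys and the name `split_duplicates` are hypothetical. Every point at a shared location goes to the dupes list, matching the description that these points are held back for review rather than imported.

```python
from collections import defaultdict


def split_duplicates(points):
    """Split points into (dupes, no_dupes) by exactly shared coordinates."""
    by_coord = defaultdict(list)
    for point in points:
        by_coord[(point["x"], point["y"])].append(point)
    dupes, no_dupes = [], []
    for group in by_coord.values():
        # All points at a shared location are held back, not just the extras.
        (dupes if len(group) > 1 else no_dupes).extend(group)
    return dupes, no_dupes


points = [
    {"x": -85.1, "y": 41.1, "ADDR_HN": "100"},
    {"x": -85.1, "y": 41.1, "ADDR_HN": "102"},
    {"x": -85.2, "y": 41.2, "ADDR_HN": "200"},
]
dupes, no_dupes = split_duplicates(points)
```

The comparison is exact, so two points only count as duplicates when their stored coordinates are identical.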
Step 4: Download current OSM data
Download the existing address data for the county from Overpass using the query above, and save it as a GeoJSON file.
Step 5: pre-existing.py
Run pre-existing.py. This script takes the no-dupes.shp file from step 3 and the OSM data saved as a GeoJSON file, and splits out the points that are already in the OSM database.
The output is two files. Exists.shp is a shapefile with the points already in OSM. new.shp holds the points that are new and will be imported.
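The page does not say how pre-existing.py decides that a point is already mapped. The sketch below assumes a match on house number plus street name (case-insensitive), assumes each point already carries "number" and "street" values built the same way as in the Tagging Plans section, and accepts tags either directly in the GeoJSON feature properties or nested under a "tags" key, since the layout depends on the exporter. All names are hypothetical.

```python
def existing_address_keys(geojson):
    """Collect (housenumber, street) keys from an OSM GeoJSON export."""
    keys = set()
    for feature in geojson.get("features", []):
        props = feature.get("properties") or {}
        # Assumption: tags are either flat in properties or under "tags".
        tags = props.get("tags", props)
        number = tags.get("addr:housenumber")
        street = tags.get("addr:street")
        if number and street:
            keys.add((number.strip().lower(), street.strip().lower()))
    return keys


def split_existing(points, geojson):
    """Split points into (exists, new) by matching against the OSM export."""
    keys = existing_address_keys(geojson)
    exists, new = [], []
    for point in points:
        key = (str(point["number"]).strip().lower(), point["street"].strip().lower())
        (exists if key in keys else new).append(point)
    return exists, new


osm = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": None,
     "properties": {"addr:housenumber": "100", "addr:street": "Main Street"}},
]}
points = [{"number": "100", "street": "Main Street"},
          {"number": "102", "street": "Main Street"}]
exists, new = split_existing(points, osm)
```

A spatial check (for example, only matching OSM addresses within some distance of the point) would be a stricter alternative; the key match is used here only because it is short.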
Step 6: final-convert.py
Run final-convert.py. This script writes the new points to a new shapefile, with attributes on each point that correspond to OSM tags. Because shapefile (DBF) field names are limited to 10 characters, the field names are placeholders.
Step 7: JOSM
Load the shapefile from step 6 into JOSM and save it as a .osm file. This step requires the OpenData plugin for JOSM.
Step 8: rename placeholder tags
A text editor will be used to rename the placeholder tags into the following OSM tags:
- number -> addr:housenumber
- street -> addr:street
- city -> addr:city
- postcode -> addr:postcode
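As an alternative to a text editor, the rename can be done on the parsed .osm XML with Python's standard ElementTree module, which only touches the k attribute of tag elements. This is a sketch, not part of the documented workflow; the placeholder names follow the list above plus "city", which the ADD_CITY description says is renamed to "addr:city", and `rename_placeholder_tags` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Placeholder field name -> OSM tag key.
PLACEHOLDER_TAGS = {
    "number": "addr:housenumber",
    "street": "addr:street",
    "city": "addr:city",
    "postcode": "addr:postcode",
}


def rename_placeholder_tags(osm_xml):
    """Return the .osm XML with placeholder tag keys replaced by OSM keys."""
    root = ET.fromstring(osm_xml)
    for tag in root.iter("tag"):
        new_key = PLACEHOLDER_TAGS.get(tag.get("k"))
        if new_key is not None:
            tag.set("k", new_key)
    return ET.tostring(root, encoding="unicode")


sample = ('<osm version="0.6"><node id="-1" lat="41.1" lon="-85.1">'
          '<tag k="number" v="100"/><tag k="street" v="Main Street"/>'
          '<tag k="postcode" v="46802"/></node></osm>')
print(rename_placeholder_tags(sample))
```

For a file on disk, the same loop can run over `ET.parse(path).getroot()`, and the tree can be saved with `tree.write(path, encoding="UTF-8", xml_declaration=True)`.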
Step 9: Upload
The .osm files will then be loaded into JOSM and checked for visible errors. If they look good, the data will be uploaded to the OSM servers using JOSM. Six of the zipcode splits have more than 10,000 points before processing and reduction; these files will be split in half using QGIS or JOSM before upload.
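The split and the matching changeset comments can be sketched together. This is a hypothetical helper, not part of the documented workflow: it divides a zipcode's points into as few near-equal chunks as keep each chunk at or under `max_points`, which for the six large zipcodes (all under 20,000 raw points) means two halves, and appends a, b, ... to the zipcode in the comment as described in the Changeset Tags section. The page applies the 10,000 threshold to raw counts; the sketch simply checks whatever list it is given.

```python
import math
import string


def plan_uploads(zipcode, points, max_points=10_000):
    """Return (changeset comment, chunk) pairs for one zipcode."""
    count = max(1, math.ceil(len(points) / max_points))
    if count == 1:
        return [(f"Allen County Addresses Import {zipcode}", list(points))]
    # Spread the remainder over the first chunks so sizes differ by at most one.
    base, extra = divmod(len(points), count)
    plans, start = [], 0
    for i in range(count):
        end = start + base + (1 if i < extra else 0)
        comment = f"Allen County Addresses Import {zipcode}{string.ascii_lowercase[i]}"
        plans.append((comment, list(points[start:end])))
        start = end
    return plans


for comment, chunk in plan_uploads("46835", list(range(15718))):
    print(comment, len(chunk))
```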
Changeset tags: the same as those listed in the Changeset Tags section above.
Step 10: Review discarded points
The import skips addresses that are already in the OSM database, even when the existing OSM data is missing some information, such as a zipcode. This information can be added manually after the main import.
The points with duplicate coordinates cannot be imported as they are and need to be verified on the ground. The dupes shapefiles show where these points are and can be used to find locations that need a closer look using traditional mapping techniques.
Reverts
If any changeset needs to be reverted, standard revert procedures will be followed. Changesets from this import should be easy to revert because they only add new address points and do not modify existing data.
Conflation
A simple approach will be taken to conflation. Because points already in the database are not imported, there should be no conflicts; any that do occur should be caught by JOSM before upload and will be resolved manually.
QA(TODO)
Add your QA plan here.
Data Counts
Zipcode | Raw Feature Count | Imported Feature Count | Status |
---|---|---|---|
43526 | 3 | ||
45880 | 8 | ||
45832 | 28 | ||
45813 | 6 | ||
46777 | 74 | ||
46770 | 53 | ||
46763 | 1 | ||
46733 | 70 | ||
46725 | 29 | ||
46706 | 224 | ||
46802 | 5455 | ||
46803 | 5530 | ||
46804 | 14008 | ||
46805 | 10414 | ||
46806 | 11806 | ||
46807 | 7204 | ||
46808 | 9539 | ||
46809 | 4935 | ||
46819 | 3856 | ||
46835 | 15718 | ||
46825 | 13273 | ||
46815 | 11163 | ||
46845 | 9819 | ||
46818 | 9200 | ||
46816 | 7958 | ||
46774 | 7292 | ||
46814 | 5694 | ||
46748 | 2525 | ||
46765 | 2009 | ||
46797 | 1731 | ||
46773 | 1652 | ||
46741 | 1398 | ||
46783 | 1117 | ||
46798 | 1035 | ||
46743 | 965 | ||
46745 | 791 | ||
46788 | 767 | ||
46723 | 726 | | |