Centre County Building and Address Import
The Centre County Building and Address Import is an import that was started and largely completed in January 2021. The goal is to add all missing buildings and addressable structures in Centre County, PA, as well as to add addresses to buildings where they are missing and to improve pre-existing building geometries. Currently, the project is in the process of cleaning up imported errors and manually adding address tags to buildings where they are missing.
The data will come from Centre County Open Data, which per their description is “for the free distribution of GIS Data”. It is also noted on the website that "Not all of the building address points have building polygons. Some landmark points do not have actual structure and do not contain a building polygon."
The subtype of building is defined through the FCODE attribute. For the time being, the following FCODEs will be the only ones imported:
|1905||Multiple Addressable Structure||building=yes|
|1940||Building w/ Pay Phone Inside||building=yes|
The reasons for not including the other FCODEs are (1) too many of the features are not actually buildings, or (2) PSU buildings and features have already been mapped extensively. The complete list of FCODEs can be found here.
City and Zip Code
While the Centre County data does have attribute columns for city and postal code, the columns are not populated. This data will instead come from the TIGER ZCTA5 (zip code tabulation areas). Note: In the US, there are not actually zip code polygons/boundaries, but rather linear features associated with roads and addresses. However, an approximate encompassing area by TIGER will suffice for applying the tags to the features within their "boundaries".
TIGER provides the zip code as the GEOID10 attribute. The accompanying city name can be found from here. For all Centre County zip codes except 16823, there is only one city name that the USPS will accept. Therefore, while there may be buildings with the same addr:postcode=* tag within multiple municipalities, there should only be one addr:city=* tag to accompany that specific addr:postcode=*.
Zip code 16823 is an exception: the primary preferred city is Bellefonte, but the USPS will also accept Pleasant Gap, Hublersburg and Wingate. Of the three secondary acceptable cities, Pleasant Gap is the only census designated area that currently has a census boundary mapped in OSM. For the sake of simplicity, all features within the Pleasant Gap boundary will have that name as their addr:city=* tag, and all other features with zip code 16823 will have Bellefonte as their addr:city=* tag. However, this is open for debate and modification in the future.
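The city rule described in this section can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure that was actually used (the tags were applied in JOSM). The function name `city_for`, the `primary_city` lookup table and the `in_pleasant_gap` flag are hypothetical. Only the 16823 entry is filled in here; the remaining zip codes would come from the USPS lookup.

```python
# Hypothetical lookup of zip code -> single USPS-preferred city.
# Only 16823 is listed; other Centre County zip codes are omitted here.
PRIMARY_CITY = {"16823": "Bellefonte"}


def city_for(postcode, primary_city, in_pleasant_gap=False):
    """Return the addr:city value for a feature.

    in_pleasant_gap is assumed to be True when the feature lies inside
    the Pleasant Gap census boundary mapped in OSM.
    """
    if postcode == "16823" and in_pleasant_gap:
        return "Pleasant Gap"
    # Every other case uses the one preferred city for the zip code.
    return primary_city[postcode]


tags = {"addr:postcode": "16823"}
tags["addr:city"] = city_for(tags["addr:postcode"], PRIMARY_CITY,
                             in_pleasant_gap=True)
```

Hublersburg and Wingate are deliberately not handled, matching the simplification described above.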
A fair amount of data manipulation was required before the data was ready to be imported. The process went as follows:
Address Tag Manipulation
The .dbf file was loaded into Excel to manipulate the address data to comply with OSM tagging scheme.
Excel's find and replace tool was used to replace all abbreviations. All replacements came from the USPS list of street abbreviations. The wiki also has a table of the most common ones found on TIGER roads.
Concatenation formulas (e.g. =cell1&" "&cell2) were used to combine the PRE_DIR, STR_NAME, STR_TYPE and POS_DIR attributes. However, some =if() functions and helper columns had to be used to ensure that the final addr:street=* column did not contain extra spaces from combining blank cells, since none of the street names have values under all four attributes.
The =proper() command worked well for converting the concatenated name from all caps to proper case, but created a few instances that required cleanup once loaded in JOSM (e.g. 1St Street vs. 1st Street, Mckee vs. McKee).
Once all modifications were complete, the last step was to remove all attributes except ADDRESSLAB (renamed addr:housenumber=*), addr:street=* (the concatenated street name), FCODE and OBJECTID. Finally, the tags building=yes, source=Centre County GIS, addr:state=PA and addr:country=US were added to all addressable structures, and noaddress=yes was added to all nonaddressable structures. A new workbook was created and all of the data was copied and pasted as values (Alt+HVV).
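For anyone repeating this step outside of Excel, the same street name assembly can be sketched in Python. This is an illustrative equivalent, not the spreadsheet formulas that were actually used. The function names are hypothetical, and the inputs are assumed to already have their abbreviations expanded.

```python
import re


def build_street(pre_dir, str_name, str_type, pos_dir):
    """Join the non-blank name parts with single spaces."""
    parts = [p.strip() for p in (pre_dir, str_name, str_type, pos_dir)]
    # Skipping empty parts avoids the extra spaces a plain join would create.
    return " ".join(p for p in parts if p)


def proper_case(name):
    """Title-case a name, keeping ordinal suffixes lowercase (1st, not 1St)."""
    titled = name.title()
    return re.sub(
        r"\b(\d+)(St|Nd|Rd|Th)\b",
        lambda m: m.group(1) + m.group(2).lower(),
        titled,
    )


street = proper_case(build_street("NORTH", "ATHERTON", "STREET", ""))
```

Like =proper(), `str.title()` capitalizes the letter after a digit, so the ordinal fix is applied afterwards. Names such as McKee are not handled by this sketch and would still need a manual review.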
Transfer Address Tags to Shapefile
The original shapefile and newly created Excel file were opened in QGIS3 as separate vector layers (Ctrl+Shift+V). All attributes except the OBJECTID were deleted from the shapefile layer by editing the attribute table.
The Excel table layer was joined to the shapefile layer by setting the "Join Field" and "Target Field" to be the OBJECTID (which is a unique value for each object). In addition, "Custom field name prefix" was enabled and set to be blank. This last step was not necessary since the keys will not appear as they were listed in the Excel table (":" are changed to "_"), but it made it easier to identify and edit in the JOSM tag window. More detailed instructions for joining layers can be found here.
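The join itself is a plain key lookup on OBJECTID. The sketch below shows the idea in Python with features represented as dictionaries. It is not how QGIS implements joins, and the function names are hypothetical. The second helper assumes that only the "addr_" keys need their colon restored.

```python
def join_on_objectid(shapes, rows):
    """Attach the attributes of each table row to the shape with the same OBJECTID."""
    by_id = {row["OBJECTID"]: row for row in rows}
    joined = []
    for shape in shapes:
        attrs = by_id.get(shape["OBJECTID"], {})
        # Shapes without a matching row are kept unchanged.
        joined.append({**shape, **attrs})
    return joined


def restore_osm_keys(attrs):
    """Turn keys such as addr_street back into addr:street."""
    return {
        ("addr:" + key[5:] if key.startswith("addr_") else key): value
        for key, value in attrs.items()
    }


shapes = [{"OBJECTID": 1, "geometry": "polygon-1"}]
rows = [{"OBJECTID": 1, "addr_street": "North Atherton Street"}]
features = [restore_osm_keys(f) for f in join_on_objectid(shapes, rows)]
```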
Prepare TIGER Zip Boundaries
The TIGER ZCTA5 shapefile covers the entire US, and as a result, the shapefile is over 38 MB. To greatly reduce the file size and remove all unnecessary data, the shapefile was opened in a new QGIS window and all features that were not in the vicinity of Centre County were deleted. This reduced the file size to 689 KB, which helped tremendously in the next step when merging with the building shapefile.
Add City and Zip Code Tags
The building and zip code shapefiles were opened in JOSM using the OpenData plugin, and merged into the same layer (Ctrl+A, then Ctrl+Shift+M). For some computers, it may be necessary to merge the zip code boundaries into the building layer one or a few at a time.
To select all of the buildings inside of the boundary, the boundary was selected, then all features inside were selected (Alt+Shift+I). Doing this will also select the boundary and all nodes, but the nodes can be unselected by pressing Shift+U. Both of these selection features are available through utilsplugin2. The addr:city=* and addr:postcode=* were then added inside one boundary at a time through this method.
One problem with this method is that any buildings that intersect the boundary were not included in the selections. This required a manual review of these buildings and a determination of the proper zip code through the USPS Zip Code Lookup.
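The selection behaviour and its limitation can be illustrated with a small point-in-polygon test. This is a simplified sketch in plain Python, not the algorithm used by utilsplugin2. It assumes buildings and boundaries are lists of (x, y) vertices and classifies a building by its vertices only, so unusual cases (for example a concave boundary cutting through a building without enclosing any vertex) are not detected.

```python
def point_in_polygon(pt, polygon):
    """Ray casting test: True if pt lies inside the polygon."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(building, boundary):
    """Return 'inside', 'crossing' or 'outside' based on the building's vertices."""
    flags = [point_in_polygon(v, boundary) for v in building]
    if all(flags):
        return "inside"
    if any(flags):
        return "crossing"
    return "outside"


boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(classify([(1, 1), (2, 1), (2, 2), (1, 2)], boundary))
print(classify([(9, 1), (11, 1), (11, 2), (9, 2)], boundary))
```

Buildings classified as "crossing" correspond to the ones that were left out of the selection and needed a manual zip code lookup.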
There are some ways with only 3 nodes that are present just for the address tags. These were removed for now, and can potentially be added later on as address nodes.
As mentioned above, there were some addr:street=* names that had to be cleaned up as a result of the limitations of the =proper() command. These were evaluated individually against the street name already in OSM. Additionally, many of the addr:housenumber=* tags contained a unit number that had to be transferred to the addr:unit=* tag.
Some buildings with shared walls did not properly interface with one another. These were detected using the JOSM validator, and fixed by merging the nodes of the buildings in contact with one another. The JOSM validator was also used to identify buildings that were overlapping highways and waterways. In most cases, the building positioning was correct and the highway or waterway needed to be adjusted.
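Moving the unit number into its own tag can be sketched as below. The exact format of the source values is not documented here, so this example assumes values shaped like "123 APT 4", "123 UNIT B" or "123 #4". That format, the regular expression and the function name are assumptions for illustration only. Values that do not match are returned unchanged for manual review.

```python
import re

# Assumed format: house number, a unit designator (APT, UNIT, STE or #), unit value.
_UNIT = re.compile(
    r"^\s*(\S+)\s+(?:(?:APT|UNIT|STE)\.?\s+|#\s*)(\S+)\s*$",
    re.IGNORECASE,
)


def split_unit(tags):
    """Return a copy of tags with the unit moved to addr:unit when one is found."""
    match = _UNIT.match(tags.get("addr:housenumber", ""))
    if not match:
        return dict(tags)
    result = dict(tags)
    result["addr:housenumber"] = match.group(1)
    result["addr:unit"] = match.group(2)
    return result


print(split_unit({"addr:housenumber": "123 APT 4"}))
```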
The last cleanup step before importing was removing the OBJECTID and FCODE keys.
Conflation was done manually. If the existing building had a more accurate geometry, the imported building was deleted. If the imported building had a more accurate geometry, the features were merged through the “Replace Geometry” function. All existing tags were preserved when doing this. Extreme care was taken when mapping sections in and near State College, as an extensive amount of mapping had already been done in this region by many users. However, many of the prior buildings in this area had recently been imported using MapWithAI (example changeset), and there was great opportunity to replace them with more accurate geometries and addresses.
Review and Cleanup
A number of cleanup tasks are underway to ensure that Centre County has the highest quality data possible. These include, but are not limited to:
- Reviewing street names and addr:street=* tags that do not match and correcting them accordingly, made possible through the address view on OSM Inspector.
- Correcting misformatted addr:housenumber=* tags.
- Dividing interconnected homes in and near State College into separate, addressable structures. These include semidetached houses, terraces and apartments.
- Reviewing buildings that are missing all or some address tags through the following Overpass query:
All buildings except those with FCODE 1901 and 1902 will be tagged with the following scheme: