Address Data Import for Fulton County, Georgia
The 2005 address data for Fulton County, Georgia may be freely used, copied, and distributed as long as credit is given (see licensing below). In addition, the 2005 building data from Fulton County GIS has no use restrictions beyond its liability statement. The plan is to import both datasets into OpenStreetMap, adding address and building data for Fulton County. Only addresses that do not already exist in OpenStreetMap and that meet the conditions below will be imported, along with buildings that do not touch existing OpenStreetMap buildings or other buildings in the dataset itself.
Converted OSM file (from main data set): https://drive.google.com/file/d/0B30vrP6AZTFybUR2dE50b2x5ajQ/edit
- Suburbs_errors.osc: https://drive.google.com/file/d/0B30vrP6AZTFyaFR0Sm5CaVVBakE/edit?usp=sharing
Converted OSM file(s) with areas (from building data set): https://drive.google.com/file/d/0B30vrP6AZTFyaEpIb1FNQk5EN0k/edit?usp=sharing (.tar.gz) or https://drive.google.com/file/d/0B30vrP6AZTFyY2NkY1ZjNm9hTUk/edit?usp=sharing (.tar.xz)
Converted OSM file (from tax parcels data set): https://drive.google.com/file/d/0B30vrP6AZTFyWTJIb0VpZFJNcjg/edit?usp=sharing
For the address dataset, the PDF file provided with the data states "Freely use, copy and distribute as long as credit is given". The metadata information that is on the main download page states no access restrictions, but includes the standard disclaimer for GIS data in the use restrictions. Below is the actual text:
DATA Disclaimer: Fulton County makes known to user, and user acknowledges notice by Fulton County that this Data contains known errors and inconsistencies. Fulton County in no way ensures, represents or warrants the accuracy and/or reliability for any purpose.
For the building dataset, Steve Williams stated in an email that "Fulton County has no objections to the incorporation of our publicly distributed data into OpenStreetMap."
The data is in a shapefile, which consists of 328,528 address points. However, because of how the Georgia GIS stores the address data, some addresses are duplicated to show the access point and the building itself. As a result, there are (at least) 296,458 unique addresses.
The shapefile was converted to OSM XML using ogr2osm. The projection was correctly detected automatically as the Georgia West projection.
NOTE: After the conversion, opening the file in JOSM yields only 321,528 nodes (confirmed by using grep to count the opening node tags), whereas QGIS and the dbf file (opened in LibreOffice Calc) both report 328,528 points.
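The node count can also be cross-checked without grep. The following is a minimal sketch, assuming the converted file is standard OSM XML; the function name `count_nodes` and the sample document are illustrative and not part of the import tooling.

```python
import io
import xml.etree.ElementTree as ET


def count_nodes(source):
    """Count <node> elements in an OSM XML file path or binary file object."""
    count = 0
    # iterparse streams the document, so a large export need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            count += 1
        elem.clear()  # drop children and attributes once the element is done
    return count


# Tiny hand-made sample standing in for the converted file.
sample = io.BytesIO(
    b'<osm version="0.6">'
    b'<node id="-1" lat="33.75" lon="-84.39"/>'
    b'<node id="-2" lat="33.76" lon="-84.38">'
    b'<tag k="addr:housenumber" v="10"/></node>'
    b'</osm>'
)
print(count_nodes(sample))
```

Comparing this count with the record count reported by QGIS would show whether points were dropped during conversion or only when the file was loaded.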
The tags in the shapefile are as follows (bolded tags are tags planned to be used in the conversion process):
- NAME: Street name; the quadrant, if any, is abbreviated and can be easily expanded.
- STR_NUM_LO and STR_NUM_HI: The low and high ends of the address number range for that point. Many of the points have the same number in both fields. Where the two numbers differ, the point is likely an apartment complex, an office building, or a building with multiple addresses or suite numbers recorded.
- STR_DIR: The direction of the street (N, S, E, W). This will likely not be used in the conversion and upload.
- STR_NAME: The name of the street, without any suffix or quadrant. This will likely not be used in the conversion and upload.
- STR_TYPE: The suffix of the street name. This will likely not be used in the conversion and upload.
- STR_SUFFIX: The quadrant of the street, if any. This will likely not be used in the conversion and upload.
- COMMUNITY: The city of the address.
- ZIP_CODE: The zip code of the address.
- STATUS: The status of the address point. If the status is not "A", this address will be skipped in the conversion and upload.
- FEAT_TYPE: The type of the address point. This is well-explained in the PDF provided with the data. The current plan is to use either the primary point ("driv") or the structural point ("stru") in the import of the addresses. If there is a structural point, then this will be used for the address. If there is no structural point, or if there are multiple structural points (as is the case for some houses), then the primary point will be used.
- SUNRISE and SUNSET: The month the address was added/removed.
- METHOD: This will likely not be used in the conversion and upload.
- SOURCE: The source from which the address was added to the data set. This will likely not be used in the conversion and upload.
- CITY_CODE: This will likely not be used in the conversion and upload.
- LAST_MOD: The date of the last modification. This will likely not be used in the conversion and upload.
- UID: The user responsible for adding/editing the address. This will likely not be used in the conversion and upload.
- FILEVER: This will likely not be used in the conversion and upload.
- GEO_OID: This represents an immutable ID. This will likely not be used in the conversion and upload.
The tags that will be used in the final upload are addr:housenumber, addr:street, addr:city, and addr:postcode.
The building data is also in a shapefile, and was converted to OSM XML using the OpenData plugin in JOSM. However, the resulting file was about 600 MB. Therefore, a splitter program was used to divide the OSM XML into several files of no more than 750,000 nodes each. This produced 8 files, the largest being 105 MB.
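The splitter program itself is not described here. As a rough illustration of the idea only, the sketch below groups building outlines into chunks that stay within a node limit. It assumes that buildings do not share nodes and that each outline is given as a list of node IDs; `split_ways` is a hypothetical name.

```python
def split_ways(ways, max_nodes=750_000):
    """Group ways into chunks whose total node count stays within max_nodes.

    `ways` is a list of node-ID lists, one per building outline.
    A single way larger than max_nodes still gets a chunk of its own.
    """
    chunks, current, current_nodes = [], [], 0
    for way in ways:
        size = len(set(way))  # a closed way repeats its first node at the end
        if current and current_nodes + size > max_nodes:
            chunks.append(current)
            current, current_nodes = [], 0
        current.append(way)
        current_nodes += size
    if current:
        chunks.append(current)
    return chunks


# Three small outlines with a limit of 7 nodes per chunk.
outlines = [[1, 2, 3, 1], [4, 5, 6, 7, 4], [8, 9, 10, 8]]
for chunk in split_ways(outlines, max_nodes=7):
    print(len(chunk), "ways")
```

Keeping each building whole within one chunk means every output file can be opened and validated independently.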
The tags in the shapefile are as follows:
- AreaSqFt: Note that some buildings have a value of 0 for this tag.
- FeatType: Whether it is a commercial or residential building
- LUCDesc: A short description of the type of building.
- Stories: Note that some buildings have a value of 0 for this tag.
- YearBuilt: What year the building was built. Note that some buildings have a value of 0 for this tag.
The LUCDesc tag will be used to add a building/amenity tag describing what it is. If the tag cannot be used, then the building will only have the tag building=yes.
Tax Parcels Dataset
The shapes in this dataset will be used to help merge the addresses into the appropriate buildings. Nothing within this dataset will be directly imported into OpenStreetMap.
A Qt program will be written to do the conversion (source code in progress). The program will produce an OSM change file that can be uploaded. The conversion conditions and tags will be as follows:
- addr:housenumber will contain the number in STR_NUM_LO if and only if STR_NUM_LO and STR_NUM_HI are the same number.
- This would mean that those address points that cover apartment complexes and other buildings with multiple address numbers are not included in the import. One possible solution to include these addresses is to just list out the address numbers between the two numbers given in the addr:housenumber field; however, the difference between some of the numbers is in the thousands.
- addr:street will contain the street name in NAME, but the quadrant will be expanded to the full name and the capitalization will be corrected to match the road name in OpenStreetMap.
- addr:postcode will contain the zip code in ZIP_CODE if the address is located inside the correct zip code polygon (based on ZCTA data).
- The STATUS value must be "A" (active). Otherwise, the address point will be skipped.
- The SUNSET must be either -1 or 999999. Otherwise, the address point will be skipped.
- If the FEAT_TYPE is not "driv" or "stru", the address point will be skipped.
- For each address, if there is exactly one address point that has a FEAT_TYPE of "stru", then this address point will be used. If there is either zero or more than one address point that has a FEAT_TYPE of "stru", then the address point that has a FEAT_TYPE of "driv" will be used.
- If an address in the dataset already exists in OSM (that is, if there is a node, way, or relation that has a matching addr:housenumber and an addr:street that matches the street name), then this address will be skipped from the import process. If there is a node that has an addr:housenumber, but not an addr:street, then the address will be considered as "not existing" and will be included in the import process.
- If an address is not between 5 and 100 meters from the street it is "linked" to, then that address will be skipped.
- If an address is less than 2 meters away from another address, then one of the two addresses will be skipped.
- If a building that is imported from the dataset intersects with another building in the dataset, then one of the buildings will be skipped. The one that is skipped is determined by the YearBuilt tag; the building with the highest YearBuilt value (i.e. the most recent) will be kept. If both have the same value, then either one may be skipped.
- If a building that is imported from the dataset intersects with another building already in OpenStreetMap, then the building will be skipped.
- If there is exactly one address and one building in the enclosing tax parcel, then the address and building will be merged. If there is exactly one address and multiple buildings in the enclosing tax parcel, then the largest building in terms of area will be determined, and the address will be merged into that building. If there is more than one address in the enclosing tax parcel, then nothing will be changed at this stage.
- If there is exactly one address point located inside the building (whether it is from the dataset or from OSM), then the address and building data will be merged into the building way. Otherwise, the nearest address point that is within 25 meters of the building will be merged into the building way.
- The following is the list of tag conversions for the building:
The OSM change file will contain additions of addresses and buildings, and modifications of existing buildings (where address data is added). The additions will have negative IDs.
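The address skip rules and the "stru"/"driv" selection from the list above can be sketched in Python as follows. This is a minimal illustration and not the Qt program itself. It assumes each record is a dict keyed by the shapefile field names, that SUNSET is stored as an integer, and that points belong to the same address when STR_NUM_LO and NAME match. The helper `rec` and the sample street name are made up for the example.

```python
from collections import defaultdict

ACTIVE_SUNSET = {-1, 999999}


def select_address_points(records):
    """Apply the skip rules, then pick one point per address."""
    groups = defaultdict(list)
    for r in records:
        if r["STATUS"] != "A":
            continue  # not an active address
        if r["SUNSET"] not in ACTIVE_SUNSET:
            continue  # address has been retired
        if r["FEAT_TYPE"] not in ("driv", "stru"):
            continue
        if r["STR_NUM_LO"] != r["STR_NUM_HI"]:
            continue  # address ranges are left out of the import
        # Assumption: number plus NAME identifies one address.
        groups[(r["STR_NUM_LO"], r["NAME"])].append(r)

    selected = []
    for points in groups.values():
        stru = [p for p in points if p["FEAT_TYPE"] == "stru"]
        driv = [p for p in points if p["FEAT_TYPE"] == "driv"]
        if len(stru) == 1:
            selected.append(stru[0])
        elif driv:
            # Assumption: if several "driv" points exist, take the first.
            selected.append(driv[0])
    return selected


def rec(lo, hi, feat, status="A", sunset=-1, name="MAIN ST"):
    """Build a sample record (hypothetical data, for illustration only)."""
    return {"STR_NUM_LO": lo, "STR_NUM_HI": hi, "FEAT_TYPE": feat,
            "STATUS": status, "SUNSET": sunset, "NAME": name}


records = [
    rec(10, 10, "stru"), rec(10, 10, "driv"),                       # one stru
    rec(12, 12, "stru"), rec(12, 12, "stru"), rec(12, 12, "driv"),  # two stru
    rec(14, 14, "stru", status="R"),                                # inactive
    rec(20, 40, "driv"),                                            # range
]
for r in select_address_points(records):
    print(r["STR_NUM_LO"], r["FEAT_TYPE"])
```

The zip code polygon check, the distance rules, and the building merge steps are not covered by this sketch.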
One issue regarding address number ranges was brought up above.
Another issue is that some amenities and buildings already have address information, and so adding the data from here as well will result in duplicates. If it is assumed that the existing address data is either just as accurate or more accurate than this data set, then the program could check to see if there is a point with the same address number and street name. If it finds one, then the address is skipped.
Another issue is that there may have been smaller address imports in Atlanta. The same guideline as above could apply here.
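The existing-address check described above could look roughly like the sketch below. It assumes OSM elements are available as plain tag dictionaries and that street names are compared case-insensitively. Both function names are hypothetical.

```python
def existing_address_keys(osm_elements):
    """Collect (housenumber, street) pairs from OSM tag dictionaries."""
    keys = set()
    for tags in osm_elements:
        number = tags.get("addr:housenumber")
        street = tags.get("addr:street")
        # An element without addr:street cannot be matched, so it is
        # treated as "not existing" and contributes no key.
        if number and street:
            keys.add((number.strip(), street.strip().lower()))
    return keys


def is_duplicate(housenumber, street, keys):
    """Return True if the address is already present in OSM."""
    return (str(housenumber).strip(), street.strip().lower()) in keys


# Hypothetical OSM elements for illustration.
osm = [
    {"addr:housenumber": "10", "addr:street": "Main Street"},
    {"addr:housenumber": "12"},  # no street: does not block the import
]
keys = existing_address_keys(osm)
print(is_duplicate(10, "MAIN STREET", keys))
print(is_duplicate(12, "Main Street", keys))
```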
Another issue is that the street names in OpenStreetMap may not match up with street names in this data set. In the event that a street name in the data set cannot be matched with a street in OpenStreetMap, the address will be skipped. More than likely, the street name in OpenStreetMap is correct, and the street name in the dataset needs to be updated.
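A minimal sketch of the street name handling follows, assuming the quadrant appears as the last word of NAME and that the set of OSM street names is already known. It expands the quadrant, matches case-insensitively, and returns the OSM spelling, or None when the address should be skipped. The function names and sample street names are illustrative, and abbreviated street types are not handled.

```python
QUADRANTS = {
    "NE": "Northeast",
    "NW": "Northwest",
    "SE": "Southeast",
    "SW": "Southwest",
}


def expand_quadrant(name):
    """Expand a trailing quadrant abbreviation, e.g. 'NE' to 'Northeast'."""
    words = name.split()
    if words and words[-1].upper() in QUADRANTS:
        words[-1] = QUADRANTS[words[-1].upper()]
    return " ".join(words)


def match_osm_street(dataset_name, osm_street_names):
    """Return the OSM spelling of the street, or None if nothing matches."""
    lookup = {n.lower(): n for n in osm_street_names}
    return lookup.get(expand_quadrant(dataset_name).lower())


osm_names = ["Peachtree Street Northeast", "Auburn Avenue Northeast"]
print(match_osm_street("PEACHTREE STREET NE", osm_names))
print(match_osm_street("UNKNOWN ROAD SW", osm_names))
```

Returning the OSM spelling rather than the dataset spelling keeps addr:street consistent with the name of the road already on the map.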
There are three issues that exist in the OSM data currently. One is that some nodes that have the addr:housenumber tag don't have the addr:street tag, and so the street name cannot be matched up. Another is that sometimes, addr:street doesn't match the street name. In some cases, this is just a matter of the street name being abbreviated here. A third issue is that there is at least one street name that is partially abbreviated.
Many of the buildings in the dataset have unnecessary nodes where there is no change in the angle (it's a straight line). These nodes will be removed from the building.
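One way to detect such nodes is to compare the directions of the two segments that meet at each vertex. The sketch below assumes the outline is given in projected (planar) coordinates without repeating the first vertex at the end. The function name and the tolerance value are assumptions and would need tuning for real data.

```python
def drop_straight_nodes(ring, tol=1e-9):
    """Remove vertices of a closed ring where the outline does not turn.

    `ring` is a list of (x, y) tuples; the closing vertex is not repeated.
    """
    n = len(ring)
    kept = []
    for i, (bx, by) in enumerate(ring):
        ax, ay = ring[i - 1]          # previous vertex (wraps around)
        cx, cy = ring[(i + 1) % n]    # next vertex (wraps around)
        ux, uy = bx - ax, by - ay     # incoming segment
        vx, vy = cx - bx, cy - by     # outgoing segment
        cross = ux * vy - uy * vx     # zero when the segments are parallel
        dot = ux * vx + uy * vy       # positive when they point the same way
        if abs(cross) <= tol and dot > 0:
            continue                  # straight line through this vertex
        kept.append((bx, by))
    return kept


# A rectangle with one extra node in the middle of its bottom edge.
outline = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]
print(drop_straight_nodes(outline))
```

The dot product test keeps vertices where the outline doubles back on itself, since those are not straight-through points even though the cross product is zero.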
Fulton County has been divided into 976 overlapping regions roughly according to Census block groups. Each region is expected to contain at most 4000 node/way additions/changes (although at least one region appears to have 5000). The upload itself will take place in JOSM.
Shapefile containing grid: https://drive.google.com/file/d/0B30vrP6AZTFyWVVRRkJfM2JIZ2s/edit?usp=sharing
The import will begin in the city of Atlanta, which is the largest city in Fulton County, and will extend north and southwest. The account used for the import is Saikrishna_FultonCountyImport.
Update 4/22/2014: Much of Atlanta has been done. The import will soon expand into the southwest and the north.
Update 6/11/2014: Nearly all of Fulton County has been completed. Some preliminary visual analysis of the results shows that there is some manual data cleaning which is necessary. See the Manual Data Cleaning section below for more details on the issues which have been identified and plans for how they can be remedied.
In the case of a future update, the address number and street can be used to match up addresses in OSM and the new dataset. A separate ID is not planned to be stored.
Manual Data Cleaning
The following data issues have been identified as a result of this import.
Building Footprint Error
At the Old Fourth Ward Park in Atlanta, an erroneous building footprint was added during the import. This area was recently redeveloped (circa 2012) into a park with a large water retention pond, and these features have been added to the map. The solution is to edit OSM and delete the erroneous building footprint.
Houses Missing Addresses
In residential areas, some address tags have been assigned to small outbuildings (e.g. garages or sheds), but not to the main house structure. The solution is to start an editing session in OSM and move the address tags from the outbuilding to the house building.