Hamilton County Building Import
This proposal describes the import, currently underway, of roughly 300,000 building footprints in Hamilton County, Ohio, United States. Hamilton County is the most populous county in the Cincinnati metropolitan area. Data is available from the Cincinnati Area Geographic Information System (CAGIS) via the City of Cincinnati under a public domain license. Imported building footprints include some address, height, and building use information.
There are two datasets involved in this import. Both are from the same source.
- Building footprints: contains attributes such as estimated height, number of units, zoning use category, and number of stories. The dataset contains 358,167 features.
- Parcel polygons: contains address information. We are not importing parcel boundaries; instead, we use them to add addresses to building footprints (prior to import) where buildings can be matched to parcels unambiguously. The dataset contains 419,342 features.
The data comes projected in NAD83 Ohio South State Plane (EPSG:3735).
On its website, CAGIS doesn't explicitly indicate the copyright status of the building and parcel datasets but only requires that a disclaimer be acknowledged. The City of Cincinnati, a CAGIS member agency, lists both datasets as being in the public domain.
If you agree with the disclaimer provided on our data site and provide CAGIS with a data creator credit, then you may use our data.
So, please make sure that if you want to use CAGIS data (available GIS layers), refer us properly and provide CAGIS disclaimer on your website.
THE PROVIDER MAKES NO WARRANTY OR REPRESENTATION, EITHER EXPRESSED OR IMPLIED WITH RESPECT TO THIS INFORMATION, ITS QUALITY, PERFORMANCE, MERCHANTABILITY, OR FITNESS A PARTICULAR PURPOSE. AS A RESULT THIS INFORMATION IS PROVIDED 'AS IS'. AND YOU, THE REQUESTER, ARE ASSUMING THE ENTIRE RISK AS TO ITS QUALITY AND PERFORMANCE.
IN NO EVENT WILL THE PROVIDER BE LIABLE FOR DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES RESULTING FROM ANY DEFECT IN THE INFORMATION. EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. IN PARTICULAR, THE PROVIDER SHALL HAVE NO LIABILITY FOR ANY OTHER INFORMATION, PROGRAMS OR DATA USED WITH OR COMBINED WITH THE REQUESTED INFORMATION, INCLUDING THE COST OF RECOVERING SUCH INFORMATION, PROGRAMS OR DATA.
"Available GIS layers" refers to the shapefiles available for download on CAGIS's main website. We use a subset from the CAGIS Open Data Portal and credit CAGIS in the form of source=CAGIS on changesets and have added their disclaimer to Contributors#CAGIS.
Code used to preprocess the data prior to import, as described in the following sections, is available on GitHub.
The import will have two phases, each with a separate tasking manager project:
- (Ongoing) Add CAGIS buildings that do not intersect with existing or recently deleted OSM buildings.
- (Planned) Manually conflate the remaining CAGIS buildings with OSM buildings. Where the CAGIS geometry and tags are superior to OSM data, this JOSM plugin will be used to transfer the geometry and tags to the existing way, preserving history.
When we started, there were already ~62,000 building footprints in the county contributed by OSM users. However, 82.5% of the import data (~295,000 buildings) did not intersect with these. Buildings in the central neighborhoods were largely complete, so the import has initially affected mostly suburban parts of the county. This makes things a bit easier, as most of the buildings to be imported do not share nodes with existing geometries.
Very often, a building is deleted from OSM when the physical building is demolished. To avoid restoring demolished buildings, the conflation process will account for deletions that occurred after the CAGIS dataset was last updated. In case the deletion merely represented the replacement of one way with another way, CAGIS buildings that intersect with recently deleted OSM buildings will be included in the second phase for manual conflation. Participants are also tagging many demolished buildings as demolished:building=*, to keep these buildings from being restored either in the import or by armchair mappers by accident.
Assigning addresses based on parcels is simple in some cases and more complex in others. To avoid using sliver parcels, we first only consider a building as belonging to a parcel if 90%+ of its footprint overlaps the parcel. We start by assigning addresses in the simplest case where there is a clear 1:1 correspondence between parcels and buildings.
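The 90% rule can be sketched as follows. This is a minimal illustration rather than the import's actual preprocessing code: it assumes the building's area and its overlap area with each parcel have already been computed by a GIS library, and the function and parameter names are hypothetical.

```python
def parcels_containing(building_area, overlap_by_parcel, threshold=0.9):
    """Return the IDs of parcels that cover at least `threshold` of a building.

    building_area: footprint area in any consistent unit (e.g. square feet).
    overlap_by_parcel: dict mapping parcel ID -> area of the building that
        falls inside that parcel (assumed to be precomputed elsewhere).
    """
    if building_area <= 0:
        return []
    return sorted(
        parcel_id
        for parcel_id, overlap in overlap_by_parcel.items()
        if overlap / building_area >= threshold
    )


# A house with a sliver of overlap onto a neighboring parcel belongs to "A".
print(parcels_containing(1500.0, {"A": 1470.0, "B": 30.0}))
# A building split roughly evenly across two parcels belongs to neither.
print(parcels_containing(1500.0, {"A": 800.0, "B": 700.0}))
```

A building is then a candidate for the simple 1:1 case when exactly one parcel is returned and that parcel contains no other building.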
Next we move to parcels that match to multiple buildings. Most of these contain minor outbuildings like sheds or garages that are subsidiary to a major building like a house. We want to assign an address only to the major (largest) building in these cases. Some large parcels have many buildings (e.g. > 20), so in this step we only look at parcels with 2 or 3 buildings.
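As a rough sketch of this step, the address goes to the largest building on a parcel with two or three buildings, and the others are recorded as deliberately unaddressed. The function name, the input layout, and the example address are hypothetical.

```python
def assign_major_building(parcel_address, buildings, max_buildings=3):
    """Assign a parcel's address to its largest building only.

    buildings: list of (building_id, footprint_area) tuples on one parcel.
    Returns a dict of building_id -> address (or None for outbuildings).
    Parcels outside the 2..max_buildings range return an empty dict,
    meaning they are left for a later step.
    """
    if not 2 <= len(buildings) <= max_buildings:
        return {}
    # On a tie in area, max() keeps the first building listed.
    major_id = max(buildings, key=lambda b: b[1])[0]
    return {
        building_id: (parcel_address if building_id == major_id else None)
        for building_id, _ in buildings
    }


lot = [("house", 1800.0), ("garage", 400.0), ("shed", 80.0)]
print(assign_major_building("123 Example Street", lot))
```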
What remains are cases where address assignment is more ambiguous.
- Parcels with more than three buildings
- Buildings that sit across multiple parcels
- Multiple parcels with identical geometry underlying one or more buildings
This last case is seemingly used to indicate multiple ownership on the same parcel, such as condos or duplexes. In these cases, we assign multiple semicolon-separated addresses to a single building sitting on multiple identical parcels.
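Combining house numbers from identical parcels might look like the following. The helper name is hypothetical, and dropping duplicate or empty values is an assumption not stated in the proposal.

```python
def merge_housenumbers(housenumbers):
    """Join house numbers from identical parcels into one tag value.

    Empty values and repeats are skipped (an assumption); the original
    order of first appearance is kept.
    """
    distinct = []
    for number in housenumbers:
        if number and number not in distinct:
            distinct.append(number)
    return ";".join(distinct)


# Two condo units recorded as stacked parcels, one of them listed twice
print(merge_housenumbers(["101", "103", "101"]))
```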
As one final attempt to match addresses, any buildings sitting across multiple parcels, but where only one parcel has an address, and where the building is larger than 1000 square feet to avoid outbuildings, are assigned addresses from that parcel.
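This fallback rule can be expressed as a small predicate. The sketch below assumes the building's area in square feet and the addresses of all overlapped parcels are already known; the names are hypothetical.

```python
MIN_AREA_SQFT = 1000  # size cutoff from the text, used to exclude outbuildings


def fallback_address(building_area_sqft, parcel_addresses):
    """Return the single available address for a multi-parcel building.

    parcel_addresses: one entry per overlapped parcel, with None (or "")
        where the parcel has no address.
    Returns None when the building is too small, sits on a single parcel,
    or when zero or several of its parcels carry an address.
    """
    if building_area_sqft <= MIN_AREA_SQFT:
        return None
    addressed = [address for address in parcel_addresses if address]
    if len(parcel_addresses) > 1 and len(addressed) == 1:
        return addressed[0]
    return None


print(fallback_address(1500, ["12 Example Road", None]))
```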
In total, the combination of these techniques, escalating from simplest to most complex, produces address assignments (or correct non-assignments) for 86.5% of all buildings. The contribution of each method is shown in the table below.
|Address assignment category||Number of buildings||% of total||Cumulative %|
|1:1 match with parcel||166,900||46.6%||46.6%|
|multiple buildings per parcel (major building, address assigned)||62,300||17.4%||63.9%|
|multiple buildings per parcel (minor outbuilding, no address)||67,795||18.9%||82.9%|
|single building, multiple identical parcels, multiple housenumbers with semicolon||900||0.25%||83.2%|
|building larger than 1000sqft overlapping single parcel with address||11,875||3.3%||86.5%|
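The percentages in the table can be rechecked from the counts. The sketch below takes the 358,167 features of the building footprint dataset as the denominator, which is an assumption: the proposal does not state which total was used.

```python
TOTAL_BUILDINGS = 358_167  # footprint dataset size (assumed denominator)

counts = [
    ("1:1 match with parcel", 166_900),
    ("major building, address assigned", 62_300),
    ("minor outbuilding, no address", 67_795),
    ("multiple identical parcels", 900),
    ("building > 1000 sq ft, single addressed parcel", 11_875),
]


def shares(rows, total):
    """Return (label, percent, cumulative percent) for each row."""
    running = 0
    result = []
    for label, count in rows:
        running += count
        result.append((label, 100 * count / total, 100 * running / total))
    return result


for label, percent, cumulative in shares(counts, TOTAL_BUILDINGS):
    print(f"{label}: {percent:.2f}% (cumulative {cumulative:.2f}%)")
```

Under that assumption the final cumulative share rounds to the 86.5% quoted above. Intermediate values may differ from the table in the last digit, since several of the counts appear to be rounded.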
The remaining buildings without addresses are scattered and present a variety of issues. They will need to be dealt with manually at a later date, or through a future import with a better address dataset.
Many buildings contain points midway along an essentially straight line. Such points are simplified away with a tolerance of 0.2 meters. Topology is maintained where buildings share nodes.
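One way to drop such points is to remove any vertex lying within the tolerance of the straight line between its two neighbors. The sketch below is an illustration of that idea, not the tool actually used for the import. It assumes coordinates in metres (the source data is in a state plane projection, so a unit conversion would be needed first), and it models shared nodes with a hypothetical `keep` set of vertex indices that must survive.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection so it stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def drop_midline_vertices(ring, tolerance=0.2, keep=frozenset()):
    """Remove vertices that sit on an essentially straight edge.

    ring: list of (x, y) vertices of a closed ring, first point not repeated.
    keep: indices into `ring` that must not be removed (e.g. shared nodes).
    """
    points = list(enumerate(ring))
    changed = True
    while changed and len(points) > 3:
        changed = False
        for i, (index, p) in enumerate(points):
            if index in keep:
                continue
            a = points[i - 1][1]
            b = points[(i + 1) % len(points)][1]
            if point_segment_distance(p, a, b) <= tolerance:
                del points[i]
                changed = True
                break  # restart the scan with the updated neighbors
    return [p for _, p in points]


square = [(0, 0), (5, 0.05), (10, 0), (10, 10), (0, 10)]
print(drop_midline_vertices(square))
```

Removing one vertex at a time and rescanning keeps the sketch simple, at the cost of speed on large rings. Passing `keep={1}` for the example above would leave the extra vertex in place.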
Only the building footprints are being imported, but some tags are drawn from the parcel dataset. The buildings dataset has information on building height (including levels) and use category. The parcel dataset has address information. The address information goes into two tags, addr:housenumber=* and addr:street=*. addr:street=* is derived from two fields: a name for the street and the street suffix, e.g. Avenue, Road, etc. Street names were expanded from abbreviated forms and checked for Title Case capitalization. Details on the method of address assignment from parcels are given in the discussion of address assignment.
|Source field||OSM tag|
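Building an addr:street=* value from the name and suffix fields could look roughly like this. The suffix table is a small hypothetical subset, the function name is invented for illustration, and simple capitalization will mishandle names such as "McMillan", which is why a manual check is still needed.

```python
# Hypothetical subset of suffix abbreviations; a real table would be longer.
SUFFIXES = {
    "AVE": "Avenue",
    "RD": "Road",
    "ST": "Street",
    "DR": "Drive",
    "LN": "Lane",
    "CT": "Court",
}


def build_addr_street(name, suffix):
    """Combine a street name and suffix into a Title Case addr:street value."""
    words = [word.capitalize() for word in name.split()]
    key = suffix.strip().upper().rstrip(".")
    # Unknown suffixes are kept, only capitalized.
    full_suffix = SUFFIXES.get(key, suffix.strip().capitalize())
    return " ".join(words + [full_suffix]).strip()


print(build_addr_street("MAIN", "ST"))
print(build_addr_street("spring grove", "Ave."))
```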
Buildings also have some information on use that could be mapped into various building=* tags.
|Source value||OSM value||Building Count||% of total|
Known Quality Issues
This is a list of data quality issues discovered during import. Keep an eye out for them as you validate.
- Some larger sheds/garages are tagged building:levels=2 or building:levels=3, which often seems implausible.
- Many buildings have one extra node that should have been removed by simplification. These should be removed if possible.
- Some buildings need squaring - orthogonality seems to vary a lot by neighborhood.
There is occasional disagreement between addr:street names and the names on streets, for example whether a way is named '...Street' or '...Road'. These can be hard to catch during editing; a query after the import may be the best way to catch these cases. This usually affects all buildings on a street.
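A post-import check along these lines could flag likely suffix disagreements. The sketch assumes the addr:street values and the street names have already been extracted from OSM (for example with an Overpass query, not shown here); the function name and the sample street names are hypothetical.

```python
def suffix_mismatches(addr_streets, way_names):
    """Find addr:street values that differ from a street name only in the last word.

    Returns a dict mapping each unmatched addr:street value to the sorted
    list of street names that share everything but the final word.
    """
    way_names = set(way_names)
    by_base = {}
    for name in way_names:
        base = name.rsplit(" ", 1)[0]
        by_base.setdefault(base, set()).add(name)

    result = {}
    for street in set(addr_streets):
        if street in way_names:
            continue  # exact match, nothing to report
        candidates = by_base.get(street.rsplit(" ", 1)[0])
        if candidates:
            result[street] = sorted(candidates)
    return result


print(suffix_mismatches(["Elm Road", "Oak Street"], ["Elm Street", "Oak Street"]))
```

Each reported pair still needs a human decision about which name is correct, since either the address data or the street name may be wrong.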
Tasking manager projects:
- Initial import (underway)
- Manual conflation (planned)
The first phase of the import began in December 2018 and is ongoing.
The second phase can begin after the first is completed. Notice will be given on the imports mailing list when the second phase is ready.
Contributors will use special-purpose import accounts with names ending in
- Nate Wessel Nate_Wessel_cincyimport
- Minh Nguyen Minh Nguyen_cincyimport
- doktorpixel14 doktorpixel14_import
- jonsger jonsger_import
- Add your name here!