Pittsburgh Building Import

The Pittsburgh Building Import is a planned import of buildings in Pittsburgh and the surrounding area. It continues the Allegheny County Building Import, which was mostly complete when it was abandoned due to software changes. As a resident and active mapper in the Pittsburgh area, I wish to improve the map by adding buildings from a government dataset.

Goals

Improve OSM's coverage and completeness by finishing the import of Allegheny County government building data, which currently stands at 78% complete. Finishing the import will add roughly 100,000 buildings to OSM. This matters because in areas where buildings haven't been imported, most buildings are missing. Generally, only notable buildings like schools, industrial buildings, and larger shopping plazas are traced, while smaller businesses, organizations, and homes are mostly absent. Importing these buildings is also preferable to hand-tracing, which is tedious at this scale. An import will free up future mappers for other tasks and let them place POIs more accurately.

Schedule

The schedule is flexible and depends on my free time and availability. My current goal is to resume the import by July 1, 2019.

Import Data

Data Source: http://openac-alcogis.opendata.arcgis.com/datasets/allegheny-county-building-footprint-locations

The data contains accurate building footprints for all of Allegheny County, plus a small buffer around it. It contains approximately 500,000 buildings!

Data License

In 2017 I emailed the county government for permission to use this data, but the data has changed since then. I will ask for permission again before importing, just to be safe.

Type of License:

ODbL Compliance Verified:

OSM Data Files

Will be available once I undertake the task of processing the data.

Import Type

A one-time import, with data merged into OSM via JOSM.

Data Issues

Our data is good but not perfect. Most buildings are traced accurately and neatly, while a small fraction are missing. There are also a few invalid geometries.

Position Offset

The buildings have a small offset relative to Bing imagery. I originally believed this offset was variable, but I now suspect it is roughly constant, because every zone I imported needed about the same correction when the imagery came from the same source. Last time, I left the offset in the data and asked editors to fix it in JOSM, which worked out poorly: contributors often forgot. This time, I will correct the offset in the processing stage as much as possible.
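
A constant correction of this kind could be applied to every node during processing. The following is only a minimal sketch, assuming the offset is expressed in meters east and north; the function name and the offset values are hypothetical placeholders, not the correction actually measured for this dataset.

```python
import math

# Approximate length of one degree of latitude in meters (spherical Earth assumption).
METERS_PER_DEGREE = 111_320.0

def shift_node(lat, lon, east_m, north_m):
    """Return (lat, lon) moved by east_m meters east and north_m meters north."""
    new_lat = lat + north_m / METERS_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    new_lon = lon + east_m / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return new_lat, new_lon

# Placeholder offset, NOT the measured value for this dataset.
EXAMPLE_EAST_M = 1.0
EXAMPLE_NORTH_M = -0.5

# A made-up footprint fragment near downtown Pittsburgh, as (lat, lon) pairs.
footprint = [(40.4406, -79.9959), (40.4407, -79.9959), (40.4407, -79.9957)]
shifted = [shift_node(lat, lon, EXAMPLE_EAST_M, EXAMPLE_NORTH_M) for lat, lon in footprint]
```

Over an area the size of one county, a flat meters-to-degrees approximation like this should be adequate for offsets of a few meters; a proper reprojection in QGIS would be the more rigorous route.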

Cut-Off Buildings

Our dataset extends slightly beyond the borders of Allegheny County, which is a small bonus. On the other hand, the data cuts off at the edges in an inconvenient manner: if a building straddles the edge of the data coverage, it is simply cut in half. These cut-off buildings occur only right at the edge of the area, and they are a tiny fraction of the total import. Before making the data available to the community in OSM Tasking Manager, I will manually remove the bad buildings in JOSM. This is practical because the buildings are in known locations and few in number.

Data Warnings

A small fraction of imported buildings create warnings or errors in JOSM when validated. The causes include self-intersecting ways and overlapping nodes. To counter this, I will ask all contributors to validate the import data layer in JOSM and fix all warnings and errors before adding it to OSM.
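
For illustration, the two problem types named above can also be detected outside JOSM. This is a rough sketch, not a replacement for the JOSM validator: it assumes a building is a closed ring of (x, y) tuples whose last node repeats the first, all function names are hypothetical, and it only finds edges that properly cross (it ignores edges that merely touch or overlap along a line).

```python
from itertools import combinations

def duplicate_nodes(ring):
    """Return coordinates that occur more than once in a closed ring."""
    seen, dupes = set(), set()
    for point in ring[:-1]:  # the last node repeats the first to close the way
        if point in seen:
            dupes.add(point)
        seen.add(point)
    return dupes

def _orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)

def _segments_cross(p1, p2, q1, q2):
    """True if the segments cross at a point interior to both of them."""
    return (_orientation(p1, p2, q1) * _orientation(p1, p2, q2) < 0
            and _orientation(q1, q2, p1) * _orientation(q1, q2, p2) < 0)

def is_self_intersecting(ring):
    """True if any two non-adjacent edges of the closed ring properly cross."""
    edges = list(zip(ring[:-1], ring[1:]))
    for (i, e1), (j, e2) in combinations(enumerate(edges), 2):
        # Adjacent edges share a node by construction, so skip them.
        if j == i + 1 or (i == 0 and j == len(edges) - 1):
            continue
        if _segments_cross(*e1, *e2):
            return True
    return False

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
```

The pairwise edge comparison is quadratic in the number of nodes, which should be acceptable for individual building outlines but would be slow if run naively across very large ways.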

Out of Date Data

In some places the government data is out of date, even older than the aerial imagery. Recently demolished buildings may still appear in the data, and recently constructed ones may be missing. This mostly shows up in declining urban areas where buildings in disrepair have been torn down, and in wealthy suburban areas where new housing is being put up. We will mitigate this by asking contributors not to import anything that contradicts the aerial imagery.

Data Preparation

The data preparation method will be similar to the old import.

Tagging Plans

The data contains only two useful attributes: the shape of each building and the building type (residential, unknown, outbuilding, industrial/commercial, or public building). We will translate the tags in the same way as the old import. Residential and industrial/commercial are mostly compatible with our definitions of these terms, but I don't think outbuilding or public building fit into our tagging scheme. Thus, I will leave those as building=yes, just like unknown buildings. Here's the tagging scheme:

FEATURECOD              Government Tag          OSM Tag
210                     Residential             building=residential
NULL, 200, 250 or 295   Unknown                 building=yes
240                     Outbuilding             building=yes
220                     Industrial/Commercial   building=commercial
230                     Public                  building=yes
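
The table above could be expressed in a translation file roughly as follows. This is a hedged sketch rather than the actual translation file from the repository: it assumes the attribute arrives under the key FEATURECOD as a string such as "210", the helper featurecod_to_tags is hypothetical, and filterTags follows the hook name that older ogr2osm translation scripts use as far as I know.

```python
# FEATURECOD values that map to something more specific than building=yes.
BUILDING_VALUES = {
    "210": "residential",
    "220": "commercial",
}

def featurecod_to_tags(featurecod):
    """Map a county FEATURECOD (string, int, or None) to a dict of OSM tags."""
    key = "" if featurecod is None else str(featurecod).strip()
    # Unknown (NULL, 200, 250, 295), outbuilding (240), and public (230)
    # codes all fall back to building=yes.
    return {"building": BUILDING_VALUES.get(key, "yes")}

def filterTags(attrs):
    """Translation hook in the style of ogr2osm scripts (assumed, not the author's file)."""
    if not attrs:
        return {}
    return featurecod_to_tags(attrs.get("FEATURECOD"))
```

Using building=yes as the default means any code not listed in the table is treated as unknown instead of being dropped.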

Changeset Tags

Add source=Allegheny County GIS, as well as a hashtag in changeset comments.

Data Transformation

This is a thorough description of how I prepared the data for import. I tried to make it detailed enough that someone could reproduce my work if they wanted.

Necessary Software

I did this on a PC running Ubuntu 18.10. Some adaptations might be needed for other platforms.

QGIS, ogr2osm, JOSM, osmconvert

Gather Sources

Download the building footprints for Pennsylvania and unzip the zip file. Now you can inspect the shapefile in QGIS. Also download my GitHub repository, which contains several files I created.

Conversion to OSM Format

Now we need to convert the buildings shapefile to OSM format. We can do this with the ogr2osm utility, using the translation file I wrote:

./ogr2osm.py -t ./allegheny-translation.py "./building footprints/Allegheny_County_Building_Footprint_Locations.shp" -o "allegheny county buildings.osm"

Be advised that this command takes several minutes to run and requires around 4 GiB of free RAM. After the conversion, you can open the resulting OSM file in JOSM, given sufficient RAM (about 2 GiB).
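
Before opening the file in JOSM, it may be worth checking that the conversion produced the expected tags. The sketch below counts building=* values in an OSM XML file using only the Python standard library; the function name is hypothetical, and it assumes the building tag sits on ways or multipolygon relations.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_building_values(osm_path):
    """Count building=* values on ways and relations in an OSM XML file."""
    counts = Counter()
    # iterparse streams the file instead of loading the whole tree at once.
    for _event, elem in ET.iterparse(osm_path, events=("end",)):
        if elem.tag in ("way", "relation"):
            for tag in elem.findall("tag"):
                if tag.get("k") == "building":
                    counts[tag.get("v")] += 1
        if elem.tag in ("node", "way", "relation"):
            elem.clear()  # drop attributes and children we no longer need
    return counts
```

Calling count_building_values("allegheny county buildings.osm") returns a Counter keyed by building value, so the totals for residential, commercial, and yes can be compared against the feature counts shown in QGIS.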

Data Merge Workflow

Not done yet. Will probably resemble the previous import.

Current Status

This project is in the early planning phases. I still have to document it, process the data, and get community approval.