TM Open Buildings
TM Open Buildings is a dataset of manually-drawn building outlines covering 12 Philippine cities with detailed annotations on building and roof attributes as seen over satellite imagery.
Download Links
The full TM Open Buildings dataset, contributed in December 2023, may be downloaded as a single file through the following links:
Source | Page |
---|---|
Kaggle | link |
HDX | link |
Alternatively, you may download a single tile by looking up its coordinates in our table of locations and using them in a query in Overpass or any other OSM tool.
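As a minimal sketch of the single-tile route, the snippet below builds an Overpass QL query for all building ways and relations inside one bounding box. The function name and the example coordinates are hypothetical and are not taken from the table of locations; substitute the bounds of the tile you need.

```python
def overpass_building_query(south, west, north, east, timeout=25):
    """Build an Overpass QL query for buildings inside a bounding box.

    Overpass expects the box as (south, west, north, east) in
    EPSG:4326 degrees.
    """
    if not (south < north and west < east):
        raise ValueError("expected south < north and west < east")
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout}];\n"
        "(\n"
        f'  way["building"]({bbox});\n'
        f'  relation["building"]({bbox});\n'
        ");\n"
        "out geom;"
    )


# Hypothetical tile bounds, for illustration only.
query = overpass_building_query(14.58, 121.03, 14.5823, 121.0323)
print(query)
```

The resulting text can be pasted into Overpass Turbo or sent to an Overpass API instance as the `data` parameter. Note that it returns every OSM building in the box, not only those contributed by TM.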
The organization
Thinking Machines Data Science, Inc. (Thinking Machines or TM) is a technology consultancy based in Southeast Asia that builds AI and data platforms for large organizations. We work with decision-makers in both public and private sectors to bring them data-driven insights.
Background
Ensuring data completeness is an ongoing challenge across the globe. In the Philippines, many areas and points of interest remain unmapped on OpenStreetMap, especially in more rural regions and informal settlements.
At TM, we are avid users of open geospatial data from map data like OpenStreetMap to freely available satellite imagery. For many of our social impact projects like estimating poverty levels in the Philippines, mapping solar suitability for renewable energy projects, and identifying suitable aquaculture areas for mangrove restoration, OSM data is a key data input for gaining geospatial insights. After years of actively using OSM, our team at TM is excited to also be data contributors and help improve data completeness for vulnerable communities in the Philippines.
With support from the Lacuna Fund, TM and local partners from climate, health, and informal housing sectors are developing open-source datasets to support the currently limited research on health impacts for vulnerable communities caused by climate change.
Data contribution
After receiving feedback from the local mapping community, we made corrective edits to our OSM data contributions so that they adhere to the community's defined tagging standards and align better with local mapping conventions. Specifically, we removed the following tags from all of the buildings we uploaded:
building:part=roof
roof:material=*
roof:shape=*
roof:levels=*
residential=gated
We are releasing TM Open Buildings, a dataset of manually-drawn building outlines covering 12 Philippine cities with detailed annotations on building and roof attributes as seen over satellite imagery. The dataset was developed by TM's Geospatial Team, in collaboration with data annotators, to support open-source computer vision models for building detection. The outlines have been uploaded to OpenStreetMap starting Q4 2023 as part of Project CCHAIn (Climate Change, Health, and Artificial Intelligence).
Outlines are drawn over 250x250m tiles within the following cities: Dagupan City, Palayan City, Navotas City, Mandaluyong City, Muntinlupa City, Legazpi City, Iloilo City, Mandaue City, Tacloban City, Cagayan de Oro City, Zamboanga City and Davao City. These tiles are selected to target primarily residential areas over a variety of neighborhoods, terrain, and climate types. Data on the locations of the tiles are given in the following table.
We drew the outlines using Mapbox imagery as of August 2023. The team has consulted HOTOSM Philippines to ensure that our contributions are documented properly. We will take into consideration the feedback from local mappers, as local knowledge always takes precedence, and will always provide changeset comments that comply with OSM changeset guidelines.
For more information, our team's mapping project, annotation, and data processing steps are documented publicly in our GitHub repository. Please use the Issues tab to raise concerns about our data.
Data attributes
We used the building roof to define the extent of our outlines and the attributes we tagged. Based on named places in the basemap and local knowledge, we were able to classify buildings into 2 categories:
- Settlement - purely residential buildings, or commercial/mixed-use establishments with residential upper floors
- Nonsettlement - purely commercial/mixed, industrial, and institutional buildings
Settlement
We further divide settlements into these two categories:
Single settlement
These are individual buildings that are (1) visually separate from neighboring buildings, or (2) adjacent to neighboring buildings but with a clearly visible separation of roofs. The following attributes and their equivalent OSM keys and tags were used for single settlements as of March 2024. (Please see the note under Data contribution for the tags that were later removed.)
Attribute | Type | Characteristics | OSM Key and Tag |
---|---|---|---|
Roof Material | | Looks rusty when old, silver/gray when new, lines and patches are usually evident | roof:material=metal_sheet |
Roof Material | Metal Sheet/Tiled | Whole roof is usually one solid color, tiled roofs have texture | roof:material=roof_tiles |
Roof Material | Concrete | Flat, usually has raised white edges, no visible roof “folds”, may be smooth or have objects on top | roof:material=concrete |
Roof Layout | Single Layer Basic | No complex architecture. Plain flat or rectangular roof. At most 4 faces are visible. 1 single layer visible. | roof:shape=gabled |
Roof Layout | Single Layer Intricate | Complex shapes on rooftop and multiple vertices, more than 4 faces visible, but still 1 single layer | roof:shape=hip-and-gabled |
Roof Layout | Multilayer | One roof on top of another. A shadow separating the roofs is seen. | roof:levels=2 |
Within gated community | | Uniform roof and lot sizes, structured street layout, “themed” street names, development name given in address | residential=gated |
Dense settlement
These are clusters of buildings that are drawn as an area, and are typically:
- Densely packed with overlapping rooftops, impossible to distinguish from above
- Have ≤50 sqm roof area
- Roof materials mostly natural/light, mixed, galvanized iron (GI)
- Very narrow streets, or no visible streets between houses
The following attributes and their equivalent OSM keys and tags were used for dense settlements:
Attribute | Type | Characteristics | OSM Key and Tag |
---|---|---|---|
Is a dense settlement | - | Densely packed small houses with overlapping rooftops. Rooftop materials mostly natural/light, mixed, galvanized iron (GI). Narrow one-lane streets, or no visible streets between houses. | residential=irregular_settlement |
Nonsettlement
We only have one attribute tag, building height, for nonsettlements.
Attribute | Type | Characteristics | OSM Key and Tag |
---|---|---|---|
Building height | Low | 1-5 storeys | note="This building has 1-5 storeys" |
Building height | Medium | 6-15 storeys | note="This building has 6-15 storeys" |
Building height | High | >15 storeys | note="This building has more than 15 storeys" |
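For readers who want to reproduce the height classes programmatically, here is a small sketch that maps a storey count to the `note` value in the table. The function name `height_note_tag` is hypothetical, and the original annotation was done visually from imagery rather than from known storey counts.

```python
def height_note_tag(storeys):
    """Return the OSM note tag for a nonsettlement building height class."""
    if storeys < 1:
        raise ValueError("a building needs at least 1 storey")
    if storeys <= 5:
        text = "This building has 1-5 storeys"        # Low
    elif storeys <= 15:
        text = "This building has 6-15 storeys"       # Medium
    else:
        text = "This building has more than 15 storeys"  # High
    return {"note": text}


print(height_note_tag(8))
```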
Process Flow
We describe our annotation and quality check processes below.
Phase 1: Data Annotation
The data annotation process begins with drawing outlines on a tile. After the outlines are drawn, the annotator adds specific attributes to each of these outlines. To ensure the quality and accuracy of the annotations, there are two levels of quality checks. The first quality check is done by the annotators themselves, after which the tile is either accepted or rejected. If accepted, it undergoes a second quality check on a sample of 10% of all buildings. For both quality checks in this phase, a rejected tile goes back to the annotator for the needed corrections.
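The second-level check can be illustrated with a short sketch that draws a 10% sample of building IDs from a tile. It assumes simple random sampling without replacement, which the process description does not specify, and the function name, `percent` and `seed` parameters are hypothetical.

```python
import random


def sample_for_second_check(building_ids, percent=10, seed=None):
    """Pick roughly `percent`% of buildings (rounded up, at least one)."""
    ids = list(building_ids)
    if not ids:
        return []
    # Integer ceiling division avoids floating-point rounding surprises.
    k = max(1, (len(ids) * percent + 99) // 100)
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return sorted(rng.sample(ids, k))


# Hypothetical tile with 250 annotated buildings.
sample = sample_for_second_check(range(1, 251), seed=42)
print(len(sample))
```

Fixing the seed lets a reviewer and an annotator refer to the same sample when a tile is sent back for corrections.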
Phase 2: Post-processing
Post-processing is done to ensure all manually drawn outlines comply with OSM standards. First, TM fills in any missing data that might still be present. Next, TM flags outlines which are likely mapping errors, which include the following:
- Small houses <20 sqm
- Thin houses (area <50 sqm and any side <4m)
- Sharp corners (<60 degrees for all houses, and additionally >120 degrees for bigger houses)
- Intersecting outlines
- Anomalous attributes
TM checks all of the flags and corrects the outlines and attributes to ensure that the data is accurate and consistent.
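The geometric flags above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not TM's actual post-processing code: outline coordinates are assumed to be already in metres (a projected CRS), all function and flag names are hypothetical, and the >120 degree rule, intersecting outlines, and anomalous attributes are left out because their exact definitions are not given here.

```python
import math


def _ring(points):
    """Return the vertices without a repeated closing point."""
    pts = list(points)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]
    return pts


def area_sqm(points):
    """Polygon area via the shoelace formula (coordinates in metres)."""
    pts = _ring(points)
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def side_lengths(points):
    """Length in metres of every edge of the outline."""
    pts = _ring(points)
    return [math.dist(a, b) for a, b in zip(pts, pts[1:] + pts[:1])]


def corner_angles(points):
    """Unsigned angle (0-180 degrees) between the two edges at each vertex."""
    pts = _ring(points)
    angles = []
    for i in range(len(pts)):
        px, py = pts[i - 1]
        cx, cy = pts[i]
        nx, ny = pts[(i + 1) % len(pts)]
        a = math.atan2(py - cy, px - cx)
        b = math.atan2(ny - cy, nx - cx)
        diff = abs(math.degrees(a - b)) % 360.0
        angles.append(min(diff, 360.0 - diff))
    return angles


def flag_outline(points):
    """Return the list of likely-error flags for one building outline."""
    flags = []
    area = area_sqm(points)
    if area < 20:
        flags.append("small")
    if area < 50 and min(side_lengths(points)) < 4:
        flags.append("thin")
    if min(corner_angles(points)) < 60:
        flags.append("sharp_corner")
    return flags


print(flag_outline([(0, 0), (10, 0), (10, 3), (0, 3)]))
```

A flag only marks an outline for manual review; as described above, a person still decides whether the outline or its attributes need correcting.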
Phase 3: Upload
During the upload phase, TM first obtains the existing OSM buildings data for each tile. If the tile already contains OSM buildings, TM resolves conflicts through conflation procedures and adds building attributes. Otherwise, TM uploads the buildings directly.
Frequently Asked Questions
What does the data include?
TM Open Buildings is a dataset of building footprint outlines of settlements and nonsettlements, with annotated physical characteristics as seen on satellite imagery.
What is the data intended to be used for and why is the data being released?
The data was created with funding from the Lacuna Fund as part of the datasets developed under Project CCHAIn (Climate Change, Health, and Artificial Intelligence) which aims to address the knowledge gap on health impacts from climate change for informal settlements in the Philippines. For example, overlaying this data with various hazard information such as flooding, landslides, and fault lines can bring much more granular and targeted insights to disaster risk reduction research and response.
How updated is the data?
The data was developed from August to October 2023. In keeping with the nature of the OSM platform, we welcome users to actively participate in continuously updating and improving the data based on local knowledge.
How can I download the data?
You may download it from Kaggle or HDX (see Download Links), or view it in OpenStreetMap.
How is the data created?
Thinking Machines collaborated with data annotators to draw and annotate satellite imagery in select areas around the country, in line with the project’s geographical focus. These annotations were then quality checked, post-processed, and conflated by TM with any existing OSM tags before uploading on the OSM platform.
What is the coordinate reference system?
We used EPSG:4326 to draw the outlines.
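Because EPSG:4326 coordinates are in degrees, measurements in metres or square metres require a projection first. The sketch below uses a simple local equirectangular approximation, which is an assumption chosen for brevity and is only reasonable for small shapes such as single buildings; a proper projected CRS is preferable for real analysis. The function names and the example outline are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, an approximation


def to_local_metres(lonlat_points):
    """Project (lon, lat) degrees to x/y metres around the points' mean."""
    pts = list(lonlat_points)
    lon0 = sum(lon for lon, _ in pts) / len(pts)
    lat0 = sum(lat for _, lat in pts) / len(pts)
    # East-west distances shrink with the cosine of the latitude.
    scale_x = EARTH_RADIUS_M * math.cos(math.radians(lat0))
    return [
        (math.radians(lon - lon0) * scale_x,
         math.radians(lat - lat0) * EARTH_RADIUS_M)
        for lon, lat in pts
    ]


def shoelace_area(points):
    """Planar polygon area from vertices given in order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical outline: a square 0.0001 degrees on each side.
outline = [
    (121.0300, 14.5800),
    (121.0301, 14.5800),
    (121.0301, 14.5801),
    (121.0300, 14.5801),
]
area = shoelace_area(to_local_metres(outline))
print(round(area, 1), "sqm (approximate)")
```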
How can I use the data for a machine learning project?
You can use the outlines in combination with the basemap imagery and the tile bounding boxes provided in this table to create annotated tiles for training a computer vision model. Such a model can detect buildings and/or their roof attributes in other areas we have not yet covered.
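One way to turn outlines into training labels is to rasterize them into a binary mask aligned with the tile image. The following is a minimal pure-Python sketch, assuming a north-up tile with a linear mapping from lon/lat to pixels (a fair approximation at this tile size); the function names are hypothetical, and fetching the imagery itself is not shown. In practice a geospatial raster library would do this faster.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon's vertex ring?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, bbox, width, height):
    """Return a height x width mask (1 = building) for a tile.

    bbox is (west, south, east, north); row 0 is the northern edge.
    Each pixel is labelled by testing its centre point.
    """
    west, south, east, north = bbox
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        lat = north - (row + 0.5) * (north - south) / height
        for col in range(width):
            lon = west + (col + 0.5) * (east - west) / width
            if any(point_in_polygon(lon, lat, p) for p in polygons):
                mask[row][col] = 1
    return mask


# Toy example in arbitrary units: one square building in a 4 x 4 tile.
building = [(1, 1), (3, 1), (3, 3), (1, 3)]
mask = rasterize([building], bbox=(0, 0, 4, 4), width=4, height=4)
for line in mask:
    print(line)
```

Pairing each mask with the image cropped to the same bounding box gives one training sample for a segmentation model.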
Will there be more data coming for new areas?
There are currently no plans to release more data within the scope of the project, but TM is interested in continuously contributing to the OpenStreetMap community.
I want to add details and/or make changes. Do I need to inform Thinking Machines?
No, you do not need to inform us, although improvements to our datasets are always welcome. In accordance with OSM policies, you do not need to ask permission to modify existing data, but please keep in mind OSM’s Code of Conduct and mapping best practices at all times.
How large is this dataset?
I know this building from our location. Why does it seem smaller in your outlines?
We acknowledge this possibility. Our outlines may be smaller than the actual building because we drew outlines using the building’s roof rather than its walls or lot extent. Those are usually the same shape but there are a few cases where the roofs are smaller.
Why do some buildings appear misaligned from the satellite imagery?
We used a Mapbox basemap as of August 2023. While Mapbox updates its imagery, there may be instances where the images used to identify buildings are not the same as the current ones due to different timeframes of the source imagery. Additionally, when there is a sizable difference in the satellite camera vantage angle, buildings may seem to be offset compared to the current imagery.
Can I create a derivative dataset and release it?
Yes, the Open Data Commons Open Database License (ODbL) allows you to use our dataset for a derivative product. However, if you create a derivative work using ODbL-licensed data, you must release that derivative work under the same ODbL license, i.e., it must also be open and share-alike.
Legal Notices
The License granted for TM Open Buildings does not grant rights to use the name, logo, or trademarks of Thinking Machines.
Thinking Machines reserves all other rights, whether under its copyrights, patents, trademarks, or other intellectual property, and whether by implication, estoppel, or otherwise.
TM Geospatial team
Active contributors
Planned future contributors
- TMPH-Anica
- TMPH-Abby
Feedback
For specific problems with the building outlines, kindly file an issue on our GitHub repository. For more information, suggestions, or general feedback, contact the TM Geospatial team at data-for-development@thinkingmachin.es.