From OpenStreetMap Wiki

I decided to try and vectorise the OS Opendata StreetView product for buildings in the London Borough of Southwark, with the intention of manually merging buildings into OpenStreetMap to give a reasonable coverage across the local area.

After doing some very detailed building and address work in East Dulwich I realised that StreetView isn't particularly accurate for building outlines, and that manually surveying them to get the precise building shapes in takes ages. So I don't see any problem with an automatic import of the rough building outlines from the rough StreetView tiles as an interim solution until fellow obsessives use ground surveys, aerial photography and the OS data to get the buildings right.

I did a lot of this with the help of Scott Day from the London Borough of Southwark, Robert Scott and others at the Hack weekend & technical workshop June 2010.

Step 1 - Getting a simplified raster image

First, Scott used Photoshop (GIMP would do the job just as well) to pick out the orange buildings and turn the image into a big grey and black tile. This makes it much easier for tracing software to pick out the building shapes and ignore all the other features.

Swk buildings sample.png
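The colour selection in this step can also be scripted. The following is a minimal Python sketch of the idea, not what Scott actually did in Photoshop: it marks every pixel close to a target colour as "building" and writes the result as a plain PBM bitmap, which is one of the formats potrace reads. The `ORANGE` and `TOLERANCE` values are hypothetical and would need to be sampled from a real StreetView tile, and the output is black and white rather than grey and black.

```python
# Sketch only: ORANGE and TOLERANCE are assumed values, not taken from
# the StreetView product; sample them from a real tile before use.
ORANGE = (255, 200, 150)   # hypothetical building fill colour (R, G, B)
TOLERANCE = 40             # hypothetical per-channel tolerance


def is_building(pixel, target=ORANGE, tol=TOLERANCE):
    """True if every channel of `pixel` is within `tol` of `target`."""
    return all(abs(p - t) <= tol for p, t in zip(pixel, target))


def building_mask(rows):
    """Map rows of (R, G, B) pixels to rows of 1 (building) / 0 (other)."""
    return [[1 if is_building(px) else 0 for px in row] for row in rows]


def to_pbm(mask):
    """Serialise a 0/1 mask as a plain-text PBM (P1) image; 1 means black."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    lines = ["P1", f"{width} {height}"]
    lines += [" ".join(str(v) for v in row) for row in mask]
    return "\n".join(lines) + "\n"
```

The pixel rows would come from whatever image library is to hand; the sketch deliberately leaves image loading out so that it has no dependencies.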

Step 2 - Vectorising the raster image

Next, I used potrace (with the KDE GUI) to vectorise the buildings. After a little experimentation these settings seemed to work quite well:

  • Black level: 0.2
  • Corner threshold: 0.34
  • Optimisation tolerance: 0.2
  • All other options as default

The resulting EPS looks a bit like this:

Swk buildings eps.png
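For batch work the same trace can probably be run without the GUI. The Python sketch below only builds a potrace command line; it does not run it. It assumes that the GUI's "Corner threshold" corresponds to potrace's `--alphamax` option and "Optimisation tolerance" to `--opttolerance`, which I have not checked against the KDE front end, and the file names are made up.

```python
import shlex


def potrace_command(src, dst, black_level=0.2, corner_threshold=0.34,
                    opt_tolerance=0.2):
    """Build (but do not run) a potrace argument list for an EPS trace."""
    return [
        "potrace",
        "--blacklevel", str(black_level),      # GUI "Black level"
        "--alphamax", str(corner_threshold),   # assumed: GUI "Corner threshold"
        "--opttolerance", str(opt_tolerance),  # assumed: GUI "Optimisation tolerance"
        "--eps",                               # write encapsulated PostScript
        "--output", dst,
        src,
    ]


# Hypothetical file names, for illustration only.
cmd = potrace_command("swk_buildings.pbm", "swk_buildings.eps")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run` once the option mapping has been confirmed on a small tile.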

I then converted the EPS to a DXF file (a CAD format) using pstoedit, and used QGIS to import the DXF and save it as a shapefile. I also used some circuitous postgis wizardry to turn the multilines into polygons as follows:

 # Create the SQL file from the shapefile
 shp2pgsql -s 27700 swk_buildings swk_buildings > buildings.sql
 # Import the sql into postgis
 psql --cluster 8.4/main -f buildings.sql swk_buildings
 # Dump the postgis database with the two-step conversion
 pgsql2shp -f swk_buildings_poly -p 5433 swk_buildings "SELECT gid , myid , ST_MakePolygon ( ST_LineMerge ( the_geom ) ) FROM swk_buildings;"

UPDATE: I've just realised you can choose to create polygons rather than multilines when doing the DXF-SHP conversion in QGIS. Whoops.
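To show what the line-merge and make-polygon combination in the PostGIS query is doing, here is a minimal pure-Python sketch of the same idea. It is not PostGIS's implementation: it only joins lines whose end points match exactly, only extends a chain at its tail, and silently drops anything that does not close into a ring, whereas ST_MakePolygon raises an error for an unclosed line.

```python
def merge_lines(lines):
    """Join line strings that share end points into longer chains."""
    remaining = [list(line) for line in lines]
    chains = []
    while remaining:
        chain = remaining.pop(0)
        extended = True
        # Keep extending the tail until the chain closes or nothing fits.
        while extended and chain[0] != chain[-1]:
            extended = False
            for i, line in enumerate(remaining):
                if line[0] == chain[-1]:
                    chain += line[1:]        # same direction
                elif line[-1] == chain[-1]:
                    chain += line[-2::-1]    # reversed
                else:
                    continue
                del remaining[i]
                extended = True
                break
        chains.append(chain)
    return chains


def make_polygons(lines):
    """Keep only chains that close into a ring of at least 3 distinct nodes."""
    return [c for c in merge_lines(lines) if len(c) >= 4 and c[0] == c[-1]]
```

Points need to be tuples so that end points can be compared directly; coordinates that differ by rounding error will not be joined.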

Step 4 - Re-projecting the shapefile

Scott used ESRI software to re-project the shapefile, since at this point it was just a set of polygons with no real-world co-ordinates and, strangely, was about 100 times smaller than it should be. I haven't yet worked out how to do this step with free software on Linux. This is the result before the reprojection anyway, which I think is pretty good:

Swk buildings shp.png
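One possible free-software route for this step is to georeference the polygons by hand from two known points, such as opposite corners of the tile. The Python sketch below is an untested idea rather than what Scott did: it assumes the traced output is only scaled and shifted relative to the OSGB grid (no rotation or skew), and the control point values shown are invented for illustration.

```python
def fit_scale_offset(src_a, src_b, dst_a, dst_b):
    """Derive per-axis scale and offset from two control points.

    Assumes no rotation or skew, and that the two points differ in
    both x and y (otherwise a division by zero occurs).
    """
    sx = (dst_b[0] - dst_a[0]) / (src_b[0] - src_a[0])
    sy = (dst_b[1] - dst_a[1]) / (src_b[1] - src_a[1])
    ox = dst_a[0] - sx * src_a[0]
    oy = dst_a[1] - sy * src_a[1]
    return sx, sy, ox, oy


def georeference(ring, params):
    """Apply the fitted scale and offset to every (x, y) point in a ring."""
    sx, sy, ox, oy = params
    return [(sx * x + ox, sy * y + oy) for x, y in ring]


# Hypothetical control points: two corners of the traced tile and the
# OSGB eastings/northings they are supposed to sit on.
params = fit_scale_offset((0, 0), (50, 50), (530000, 175000), (535000, 180000))
print(georeference([(10, 20)], params))
```

With these made-up corners the fitted scale comes out at 100 on both axes, which is the kind of factor described above.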

Since Scott's software produced a shapefile in OSGB projection I then re-projected this into EPSG:4326. At first I tried using ogr2ogr but the result came out about 50m out of position. So we used postgis instead, which is a bit of a palaver but it worked perfectly. First create a postgis database (in this case called swk_buildings) then process the shapefile:

 # Create the SQL file from the shapefile
 shp2pgsql -s 27700 swk_buildings_t1_region swk_buildings > buildings.sql
 # Import the sql into postgis
 psql --cluster 8.4/main -f buildings.sql swk_buildings
 # Dump the postgis database into the right projection
 pgsql2shp -f swk_buildings_4326 -p 5433 swk_buildings "SELECT gid , id , ST_Transform ( the_geom , 4326 ) FROM swk_buildings;"

Step 5 - Converting the shapefile to an OSM XML file

Finally, you can turn that shapefile into an OSM XML file using shp2osm.
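For anyone curious what the conversion has to produce, here is a minimal Python sketch that writes closed rings out as OSM XML. It is not how shp2osm works internally. It assumes the rings are already (lon, lat) pairs in EPSG:4326, that every building is tagged `building=yes`, and it uses negative ids, which JOSM treats as objects that have not been uploaded yet.

```python
import xml.etree.ElementTree as ET


def rings_to_osm(rings, tags=None):
    """Build an OSM XML document from closed rings of (lon, lat) pairs."""
    tags = tags or {"building": "yes"}    # assumed tagging
    root = ET.Element("osm", version="0.6", generator="sketch")
    next_id = -1                          # negative ids: not yet uploaded
    ways = []
    for ring in rings:
        refs = []
        for lon, lat in ring[:-1]:        # last point repeats the first
            ET.SubElement(root, "node", id=str(next_id),
                          lat=f"{lat:.7f}", lon=f"{lon:.7f}")
            refs.append(next_id)
            next_id -= 1
        refs.append(refs[0])              # close the way on its first node
        ways.append((next_id, refs))
        next_id -= 1
    # Ways are written after all nodes.
    for way_id, refs in ways:
        way = ET.SubElement(root, "way", id=str(way_id))
        for ref in refs:
            ET.SubElement(way, "nd", ref=str(ref))
        for key, value in tags.items():
            ET.SubElement(way, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```

The returned string can be saved with a `.osm` extension and opened in JOSM for checking before anything is uploaded.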

Using the simplify way and orthogonalise tools in JOSM made most buildings a pretty good match for the StreetView tiles, and considering that StreetView is pretty inaccurate anyway I think the result is worth merging into OSM. I'm not going to do this until Scott and I have improved steps 2 and 3, however.
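Some of the simplification could be done before the data ever reaches JOSM. The Python sketch below is not JOSM's simplify-way algorithm; it is a simpler stand-in that repeatedly drops any node lying within a tolerance of the straight line between its two neighbours. The tolerance is in whatever units the co-ordinates use, and a sensible value would have to be found by experiment.

```python
def point_line_distance(p, a, b):
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = (dx * dx + dy * dy) ** 0.5
    if length == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def drop_redundant_nodes(ring, tolerance):
    """Remove nodes of a closed ring that sit (nearly) on a straight edge."""
    nodes = list(ring[:-1])               # drop the repeated closing node
    changed = True
    while changed and len(nodes) > 3:
        changed = False
        for i in range(len(nodes)):
            prev, nxt = nodes[i - 1], nodes[(i + 1) % len(nodes)]
            if point_line_distance(nodes[i], prev, nxt) <= tolerance:
                del nodes[i]
                changed = True
                break                     # restart with the shorter ring
    return nodes + [nodes[0]]             # re-close the ring
```

On a square traced with extra nodes along its edges this leaves just the four corners; it does nothing to square up corners that are slightly off, which is what the orthogonalise tool is for.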

Comparison with Mapseg

Mapseg is a handy Python tool written by TimSC that does a similar job. I ran the script on the same area and compared the results.

In terms of the time they both take, mapseg is far slower. On my computer (and with Scott's work computer for the one step I can't yet do myself) it took at most 20 minutes to complete the trace for a dense StreetView tile. With mapseg it took at least four hours; I actually left my computer on overnight to complete so I'm not 100% sure how long it took.

In terms of quality, I think this approach using potrace is much more accurate. In the example below you can see the mapseg output on the left and the potrace-etc. output on the right. My output is closer to StreetView in most areas, and in particular is better at picking up curved edges. The polygons are more complex and do require a bit of simplification.

You can see some results in Peckham in OSM here.

This looks like a pretty good result by potrace. It is probably the best option for inner city automatic tracing. Mapseg doesn't like buildings that are hard up against roads for some reason. On the other hand, potrace has many redundant nodes that could be removed. A square building should only need 4 nodes while it has used 10 nodes in some cases.--TimSC 10:35, 3 August 2010 (BST)

Mapseg tc comparison.png