User:SK53/OS OpenData

From OpenStreetMap Wiki
Jump to: navigation, search

Thoughts, ruminations, notes to self (and anyone else if interested).

Contents

Tools

A minimum toolset (i.e., w/o programming ones own) to manipulate OS data under MS windows is as follows:

Users in other environments (Linux, MacOS) etc will use pretty much the types of tools. The list above is specifically about working within the constraints imposed by Windows.

OS Locator

This is a large CSV file (colon separated) without headers. I have found it most useful to add a header line (essential if you want the Lat/Long conversion to work in qGIS etc).

Getting bits of the file

Subsections can be selected by searching for council names:

grep "NOTTINGHAM" 

or in a more sophisticated way:

awk '{FS=":"} /$11 == "NOTTINGHAM"/}

Conversion to WGS84

A different process to that below is described here - OS Locator#Create GPX file of OSL data. This uses PHP which I am not familiar with so I tried something else. (See Discussion page.)

+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +a=6377563 +b=6356256.161 
+towgs84=446.448,-125.157,542.06,0.1502,0.247,0.8421,-20.4894 +units=m +no_defs

Vector Data ShapeFiles

Projections

Editing the .prj file to add these (see the example on the wikipedia link, but the parameters given there are incorrect), makes a big difference to the .OSM output. Note these parameters still only give an accuracy in the order of 5m, but they do seem to allow GDAL-based utilities to be used.

Here's the format used, the addition being the TOWGS84 section:

PROJCS["British National Grid (ORD SURV GB)", 
GEOGCS["unnamed", DATUM["D_OSGB_1936", 
SPHEROID["Airy - 1848",6377563,299.319997677743], 
TOWGS84[446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894]], 
PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]], 
PROJECTION["Transverse_Mercator"], 
PARAMETER["latitude_of_origin",49],  
PARAMETER["central_meridian",-2], P
ARAMETER["scale_factor",0.9996012717],  
PARAMETER["false_easting",400000], 
PARAMETER["false_northing",-100000], 
UNIT["METER",1]]

Its useful to have a script which copies a template file and overwrites the files delivered in the OSGB download.

Things to Watch

Using OS Locator to name roads in VMD

I've started playing around with this in postGIS, so here are my first thoughts /notes.

I've done everything in EPSG 27700 (the uncorrected OSGB36 projection). Using a CSV file with header from the OS Locator data, and using QGIS to upload a single 10km tile of Vector Map District (SK53 of course)with the Sit plugin. The PostGIS populate_geometry _columns function needs to be called to ensure all the data is accessible to PostGIS manipulation (other techniques such as manual population of the geometry_columns table don't work for me).

My intial idea was just to attach a name based on the centroid. To do this I found all lines in the VMD data which were in the bounding box of the locator road and then found that closest to the centroid:

CREATE TABLE blabla AS
SELECT a.rowid
     , a.osgb_name
     , a.bbox_min_e
     , a.bbox_max_e
     , a.bbox_min_n
     , a.bbox_max_n
     , a.osgb_bbox_geom 
     , b.gid
     , b."FEATDESC"
     , b.osgb_line_geom
     , a.osgb_point_geom
FROM SK53_osgb_Locator a, "SK53_Road_Line" b
WHERE b.osgb_line_geom && a.osgb_bbox_geom 

The various non-geometry columns are carried over to make it easier to investigate the data. Next we determine the distance of the centroid (osgb_point_geom) to the VMD road line (osgb_line_geom):

CREATE TABLE SK53_roads_distance AS
SELECT a1.rowid
      , a1.osgb_name
      , a1.gid
      , a1."FEATDESC"
      , ST_Distance(a1.osgb_point_geom, a1.osgb_line_geom) as distance
     FROM blabla a1 
;
CREATE TABLE new_road_line AS
select a.rowid
     , a.osgb_name
     , d.*
FROM 
     SK53_roads_distance c,
     (SELECT a1.rowid, a1.osgb_name, min(a1.distance) as distance
        FROM SK53_roads_distance a1
    GROUP BY a.rowid, a.osgb_name) a,
     "Road_Line" d
WHERE c.distance= a.distance
AND c.gid = d.gid;

This crude code works but two main problems emerge:

The basic principle being established the following approach seems more productive:

The initial matching is reasonably fast for this size dataset, and seems to substantially improve the ST_WITHIN step (probably because I have no idea how to tune/index for this type of query).

More to follow...

Personal tools
Namespaces
Variants
Actions
site
Toolbox