OslVosm

From OpenStreetMap Wiki

OslVosm (http://osm.tiiiim.com/oslVosm) is a script that compares OS Locator (OSL) data with OpenStreetMap (OSM) data for a particular area. It is not recommended for comparing the entire UK OSM database; for a script that does that, please see http://humanleg.org.uk/code/oslmusicalchairs.

About OS Locator

OS Locator is a file listing road names and positions for the UK. Each road name has, amongst other things, a waypoint for its centroid. Comparing the road names with the data in the OSM database highlights the following issues:

  • Missing roads in OSM
  • Roads with missing names in OSM
  • Roads with mis-spelt names in OSM

OslVosm will perform these comparisons on a particular area.

Prerequisites

This script has been tested using Ubuntu 9.04. It requires the following:

Preparation

Before using this script, the raw OSL data file must be massaged so that it contains only the road names for the area of interest. This is done using the method described on the OS_Locator page. Note that this only needs to be done once each time the OS Locator data is updated; it does not need to be done every time this script is run!

Script

Download

Old updates

  • Changed url in wiki output. The url pointing to the OpenStreetMap location of the discrepancy now uses mlat/mlon, which puts a marker on the map at the correct position.
  • Changed the level of error reporting. The script will now only report level 1 (i.e. critical) errors: all lower level warnings are suppressed (as they highlight my lazy coding...!).


Once downloaded, it must be made executable:

chmod +x oslVosm

Usage

  • Normal use, providing both the OSM and OSL data, and outputting a GPX file:
./oslVosm osm_data.osm osl_data.gpx --gpx
  • Normal use, providing both the OSM and OSL data, and outputting a GPX, KML and WIKI file:
./oslVosm osm_data.osm osl_data.gpx --gpx --kml --wiki
  • Smart use, providing the OSL data and letting the script download the OSM data. Also outputting a GPX, KML and WIKI file:
./oslVosm osl_data.GPX --gpx --kml --wiki
  • The options (--wiki, --gpx, --kml) are not case sensitive (so you can use --WIKI, --gpx, --KmL if you really want!).
  • The ordering of the file arguments and options is not important, BUT:
    • The OSL data must have a *.gpx file extension.
    • The OSM data must have a *.osm file extension, else it will not be recognised and the script will attempt to auto-download the OSM data.
    • The file extensions are case-insensitive, so can be *.GPX and *.OSM if required.
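The argument rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the script's actual code: the function name classify_args and the dictionary keys are hypothetical.

```python
# Hypothetical sketch of the argument rules described above;
# not taken from the oslVosm source.
KNOWN_OPTIONS = {"--gpx", "--kml", "--wiki"}

def classify_args(args):
    """Sort arguments into OSM file, OSL file and output options."""
    result = {"osm": None, "osl": None, "outputs": set()}
    for arg in args:
        lowered = arg.lower()  # options and extensions are case-insensitive
        if lowered in KNOWN_OPTIONS:
            result["outputs"].add(lowered.lstrip("-"))
        elif lowered.endswith(".gpx"):
            result["osl"] = arg  # OSL data must be a *.gpx file
        elif lowered.endswith(".osm"):
            result["osm"] = arg  # OSM data must be a *.osm file
    # Without a recognised *.osm file, the OSM data has to be downloaded.
    result["download_osm"] = result["osm"] is None
    return result
```

Because each argument is classified on its own, the order in which files and options are given does not matter.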

Manually downloading OSM data

Normally, you can just provide the script with the OSL GPX file, and the script will download the relevant OSM data for the same bounding box. If you know the bounding box you want to compare and want to download it manually, use wget as follows (for Bath):

wget "http://api.openstreetmap.org/api/0.6/map?bbox=-2.45299059999,51.33705469999,-2.2772094,51.4249453" -O Bath_Data.osm
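If the bounding box is not known in advance, it can be derived from the waypoints in the OSL GPX file. The following Python sketch shows one way to do that. The function names gpx_bbox and map_url are hypothetical, and the approach is an assumption about how such a box could be computed rather than a description of what oslVosm does internally.

```python
import xml.etree.ElementTree as ET

# Same API endpoint as in the wget example above.
API_URL = "http://api.openstreetmap.org/api/0.6/map"

def gpx_bbox(gpx_text):
    """Return (min_lon, min_lat, max_lon, max_lat) of all <wpt> elements."""
    root = ET.fromstring(gpx_text)
    lats, lons = [], []
    for elem in root.iter():
        # Match <wpt> regardless of the XML namespace in use.
        if elem.tag.split("}")[-1] == "wpt":
            lats.append(float(elem.get("lat")))
            lons.append(float(elem.get("lon")))
    if not lats:
        raise ValueError("no waypoints found in GPX data")
    return min(lons), min(lats), max(lons), max(lats)

def map_url(bbox):
    """Build the map request URL (bbox order: left,bottom,right,top)."""
    return API_URL + "?bbox=" + ",".join(str(v) for v in bbox)
```

The resulting URL can be passed to wget as shown above.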

Script workings

  • The primary aim of the script is to compare the road names in the OSL data with those in the OSM data, and provide feedback in the form of road discrepancy files. This is achieved in the following manner:
    • Parse the OSM data, and determine a bounding box for each OSM highway.
    • Parse the OSL data, and see if the OSL waypoint sits inside any of the OSM way bounding boxes.
    • If an OSL road fits inside an OSM way bounding box, the OSL road name is compared to the OSM way name using the PHP similar_text() function.
      • If the two names match perfectly, the function returns a 100% match. It is assumed that the OSM way is the same as the OSL road, and does not need editing.
      • If the two names are 89% to 99% similar, it is assumed that there is a spelling discrepancy between the OSL and OSM names. This needs looking into, so the road is saved and output to the relevant files.
      • If the two names are <89% similar, it is assumed that this OSM way is not the same as the OSL road.
    • If an OSL road does not fit inside any OSM way bounding box, OR if an OSL road does fit inside an OSM way bounding box but there are no name matches of 89% or more, it is assumed that the road does not exist in OSM. The road is saved and output to the relevant files.
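The decision logic above can be sketched in Python. Note the assumptions: Python's difflib.SequenceMatcher stands in for PHP's similar_text(), which is a related but not identical measure, so scores may differ slightly from the real script. The function names and the (name, bbox) data layout are hypothetical.

```python
from difflib import SequenceMatcher

def similarity_percent(a, b):
    """Similarity of two names in percent (stand-in for PHP similar_text())."""
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()

def inside(bbox, lat, lon):
    """True if the point lies within (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def classify(osl_name, lat, lon, osm_ways, threshold=89.0):
    """Classify one OSL road against a list of (osm_name, bbox) pairs.

    Returns ("match", []), ("misspelt", [candidate names]) or ("missing", []).
    """
    near_misses = []
    for name, bbox in osm_ways:
        if not inside(bbox, lat, lon):
            continue  # OSL waypoint is outside this way's bounding box
        score = similarity_percent(osl_name, name)
        if score == 100.0:
            return "match", []  # identical names: nothing to edit
        if score >= threshold:
            near_misses.append(name)  # probable spelling discrepancy
    if near_misses:
        return "misspelt", near_misses
    return "missing", []  # no location match, or no sufficiently similar name
```

In this sketch, both the "misspelt" and "missing" results correspond to roads that would be written to the output files.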

OSM way bounding box

For each OSM way, all of its constituent nodes are collected. The latitude and longitude values of the nodes are compared to find the minimum and maximum latitude and longitude of the way, and thus its bounding box. A little bit extra (~15 metres) is added to each bound as a slight 'fudge-factor'.
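A minimal Python sketch of this calculation follows. The conversion from metres to degrees (roughly 111,320 metres per degree of latitude, scaled by the cosine of the latitude for longitude) is an assumption made for illustration, as are the function and constant names. The source does not say how the script converts its ~15 metre margin.

```python
import math

# Approximate length of one degree of latitude; an assumption for this sketch.
METRES_PER_DEGREE_LAT = 111_320.0

def way_bbox(nodes, margin_m=15.0):
    """Bounding box of a way, padded by margin_m metres on every side.

    nodes is a list of (lat, lon) tuples.
    Returns (min_lat, min_lon, max_lat, max_lon).
    """
    lats = [lat for lat, _ in nodes]
    lons = [lon for _, lon in nodes]
    min_lat, max_lat = min(lats), max(lats)
    min_lon, max_lon = min(lons), max(lons)
    # A degree of longitude shrinks with latitude, so scale by cos(latitude).
    mid_lat = (min_lat + max_lat) / 2.0
    dlat = margin_m / METRES_PER_DEGREE_LAT
    dlon = margin_m / (METRES_PER_DEGREE_LAT * math.cos(math.radians(mid_lat)))
    return (min_lat - dlat, min_lon - dlon, max_lat + dlat, max_lon + dlon)
```

The margin means that an OSL centroid lying just off a straight, narrow way can still fall inside that way's box.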

Special cases

The word 'Saint' may have been abbreviated in the OSL data. According to Editing Standards and Conventions#Street Names, 'Saint' must never be abbreviated, so all OSL roads with 'St' or 'St.' at the start of their names have it expanded to 'Saint' (it is assumed that no road name begins with 'St' or 'St.' meaning 'Street'!). Any 'St' or 'St.' abbreviations in the OSM data are left alone, as these are then assumed to be spelling errors.
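The expansion rule can be illustrated with a short regular expression in Python. This is a sketch under the assumptions stated above: only a leading 'St' or 'St.' followed by whitespace is expanded, and letter case is not normalised. The function name expand_saint is hypothetical.

```python
import re

# Leading "St" or "St." followed by whitespace, at the start of the name only.
_LEADING_ST = re.compile(r"^St\.?\s+")

def expand_saint(name):
    """Expand a leading 'St' / 'St.' in an OSL road name to 'Saint'."""
    return _LEADING_ST.sub("Saint ", name.strip())
```

Names such as 'Station Road' are unaffected because the pattern requires whitespace after the abbreviation, and a trailing 'St' (as in 'High St') is not touched.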

Ignoring some discrepancies

Say a road in the OSL data is Michaels Way. However, in OSM it is Michael's Way, and this has been verified by surveying the area and checking the street signs. To stop the script reporting this road as one with an incorrect name, ITO use the not:name=* tag, which holds the OSL name. oslVosm will pick up those highways with a not:name=* tag and not include them in any of the comparisons.

The old method of ignoring OSM ways was to add osl_ignore to the note=* tag. The script will strip the punctuation and spaces from the two road names, compare them, and if a 100% match is found (a location match must also be found) then the OSL spelling will be ignored and the OSM data presumed correct. This is still currently supported, but it is best to move over to the not:name=* convention, as future releases of oslVosm will no longer support note=osl_ignore.
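Both conventions can be sketched in Python as follows. The function names and the use of a plain dictionary for a way's tags are hypothetical, and the location match mentioned above is assumed to have been checked by the caller.

```python
import string

# Translation table that deletes punctuation and whitespace.
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)

def normalise(name):
    """Strip punctuation and spaces from a road name."""
    return name.translate(_STRIP)

def should_skip(osl_name, tags):
    """Decide whether an OSM way (given as a tag dict) should be ignored.

    Assumes the OSL waypoint has already been matched to this way by location.
    """
    if "not:name" in tags:
        return True  # current convention: way is left out of all comparisons
    if "osl_ignore" in tags.get("note", ""):
        # Old convention: ignore only if the stripped names match exactly.
        return normalise(osl_name) == normalise(tags.get("name", ""))
    return False
```

With these rules, 'Michaels Way' and 'Michael's Way' compare equal once the apostrophe is stripped, so the way is skipped under either convention.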

Script output files

  • The script can output the following file types:
    • GPX file. This can be loaded into JOSM or Google Earth (GE), or onto a GPS receiver. The GPX file contains all the OSL roads which were not found in the OSM data for the given area as waypoints. The description field of each waypoint hints at any road mis-spellings (if the script finds any).
    • KML file. This can be loaded into GE, or used as a layer within an OpenLayers web application (such as here). The file contains all the OSL roads which were not found in the OSM data for the given area. It also provides names of roads (if any) which almost fit those of the OSL file, but not quite (i.e. a spelling error in OSM or OSL).
    • WIKI file. This plain-text file contains a wiki-formatted table of all the road name discrepancies found by the script, ready for copy/pasting to the OSM wiki. The table contains links to maps for each discrepancy, and if the script thinks that a road is mis-spelt, this is also included in the table along with a link to Potlatch for easy editing. See Bath/OSLocator_Comparison for an example.

Other features

  • If no OSM data file is provided to the script, the script will download the OSM data for the same bounding box as used for the OSL GPX file.

Script processing time

For an area the size of Bath, with 2486 OSM ways, the script takes around 20 seconds to compare all the OSM data to the OSL data, and provide the requested files.