OS Locator Musical Chairs

From OpenStreetMap Wiki
Musical Chairs showing the effects of the May 2013 OS Locator update
OS Locator Musical Chairs ([1]) is a map tool for Quality Assurance and completeness measurement in the UK. It is a browsable map interface to the results of a "smart" fuzzy-matched musical chairs algorithm, highlighting disagreements between OS Locator and OSM.

http://ris.dev.openstreetmap.org/oslmusicalchairs

Bookmarklet

For a little added convenience, here's a bookmarklet that can be used to jump from the openstreetmap.org main map page to the same view on musical chairs:

javascript:(function(){var a=/map=(\d+)\/(-?\d+(\.\d*)?)\/(-?\d+(\.\d*)?)/.exec(window.location.hash);if(!a)return;window.location="http://ris.dev.openstreetmap.org/oslmusicalchairs/map?zoom="+Math.min(18,parseInt(a[1],10))+"&lat="+a[2]+"&lon="+a[4]}())

To install it, create a new bookmark and paste the above code into its URL field. Run it by opening the bookmark while viewing the desired location on openstreetmap.org. This bookmarklet works at the time of writing, but could in theory break if the front page's hash scheme changes significantly.
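For readability, here is the same hash-to-URL conversion the bookmarklet performs, written out as an ordinary function. It is only a sketch: the name `oslmcUrlFromHash` is made up for illustration, and it assumes the `#map=zoom/lat/lon` hash format that the bookmarklet itself relies on.

```javascript
// Hypothetical helper mirroring the bookmarklet: turn an openstreetmap.org
// location hash such as "#map=15/52.1356/-0.4685" into a musical chairs URL.
// Returns null when the hash has no map= fragment.
function oslmcUrlFromHash(hash) {
  // zoom / latitude / longitude; lat and lon may be negative and decimal
  const m = /map=(\d+)\/(-?\d+(?:\.\d*)?)\/(-?\d+(?:\.\d*)?)/.exec(hash);
  if (m === null) return null;
  // Clamp the zoom to 18, as the bookmarklet does
  const zoom = Math.min(18, parseInt(m[1], 10));
  return "http://ris.dev.openstreetmap.org/oslmusicalchairs/map" +
    "?zoom=" + zoom + "&lat=" + m[2] + "&lon=" + m[3];
}
```

In a browser you would pass it `window.location.hash` and only navigate when the result isn't null, which is what the bookmarklet's `if(!a)return;` guard does.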

FAQ

What is the difference between the circles and the rectangles?

Musical chairs shows both authoritative and non-authoritative views. Non-authoritative views are shown when there are too many results in a particular view to show or load at once. They only show the first n (currently 1024) results, and hence are non-authoritative: they are not showing you everything.

Once there are fewer than n results in a view (usually because you are zoomed in enough), musical chairs can show you all of them. This is an authoritative view.

Non-authoritative views show results as small circular points, mainly because these behave better when zoomed far out. Authoritative views show the raw OS Locator bounding boxes.

So basically: when you're seeing circles, you're not seeing every result; when you're seeing rectangles, you are.

I don't find these non-authoritative views useful. Get rid of them.

No. If you want, you can effectively turn them off by choosing the "only authoritative views" mode. In this mode, if there are more than n (1024) results in a view, it will simply show nothing. You'll have to zoom in further before you see anything.
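The rule behind circles, rectangles and the "only authoritative views" mode fits in a few lines. This is a hedged sketch rather than musical chairs' actual code: `chooseView`, the shape of the returned object, and the handling of exactly n results are assumptions; only the limit of 1024 comes from the answers above.

```javascript
const LIMIT = 1024; // "n" in the answers above

// results: Locator entries falling in the current view
// authoritativeOnly: true in "only authoritative views" mode
function chooseView(results, authoritativeOnly) {
  // Assumption: exactly LIMIT results still counts as "everything fits"
  if (results.length <= LIMIT) {
    // Authoritative: every result, drawn as its bounding-box rectangle
    return { authoritative: true, shape: "rectangle", shown: results };
  }
  if (authoritativeOnly) {
    // Too many results and non-authoritative views are switched off: draw nothing
    return { authoritative: false, shape: null, shown: [] };
  }
  // Non-authoritative: only the first LIMIT results, drawn as circles
  return { authoritative: false, shape: "circle", shown: results.slice(0, LIMIT) };
}
```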

What do the different colours mean?

Red is used for entries that have no match in OSM.

Entries that have matches vary from green (near perfect match) to blue (unlikely match).

Entries that have no name (i.e. just a ref) and don't exactly match an OSM way with that ref and no name are marked in orange. Technically these are a "no match", but they are perfectly possible and valid situations. They exist due to a slight logical impossibility in the matching algorithm, which I may fix sometime.

Entries whose matches have a "not:name" tag exactly the same (case-insensitively) as the entry's name are bright pink.

I see some of the boxes/circles have dashed lines. What does that mean?

Dashed lines are used for entries whose matches don't have matching refs.
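Putting the colour and dash rules together, a styling function might look like the following. It is a sketch under stated assumptions: the object shapes, the name `styleFor`, the order in which the rules are checked, and the green-to-blue scale (driven by a hypothetical `MAX_LDIST`) are all illustrative, since the FAQ doesn't give any thresholds.

```javascript
// Hypothetical object shapes (assumptions, not the real data model):
//   entry = { name, ref }   name is "" for ref-only entries
//   match = { name, ref, notName, ldist }, or null when unmatched
const MAX_LDIST = 10; // hypothetical ldist at which the colour is fully blue

// Case-insensitive equality that treats missing values as ""
const same = (a, b) => (a || "").toLowerCase() === (b || "").toLowerCase();

function styleFor(entry, match) {
  // Dashed outline: there is a match, but its ref differs from the entry's
  const dashed = match !== null && !same(match.ref, entry.ref);

  if (!entry.name) {
    // Ref-only entry: orange unless it exactly matches an unnamed way with that ref
    const exact = match !== null && !match.name && same(match.ref, entry.ref);
    if (!exact) return { colour: "orange", dashed };
  }
  if (match === null) return { colour: "red", dashed };
  if (entry.name && same(match.notName, entry.name)) {
    return { colour: "pink", dashed };
  }
  // Green (hue 120) for a near-perfect match, shading towards blue (hue 240)
  const t = Math.min(match.ldist / MAX_LDIST, 1);
  return { colour: `hsl(${Math.round(120 + 120 * t)}, 100%, 40%)`, dashed };
}
```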

How often are the matches updated?

As of 2010-08-10 they're updated twice a day.

Why does it say "Near perfect match" for matches that are perfect?

The normalization step of the algorithm [2] completely ignores things such as capitalization and punctuation, and partially ignores common abbreviations. So a match that the algorithm considers effectively identical might not be literally identical. For a start, OS Locator supplies all of its street names in full caps, so the two are probably not identical capitalization-wise.
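To illustrate why a non-identical pair can still rank as "near perfect", here is a minimal normalization sketch. It assumes the behaviour described above (case and punctuation dropped, some abbreviations expanded); the `ABBREVIATIONS` table and the function names are hypothetical, and the real algorithm's abbreviation handling is only partial and may well differ.

```javascript
// Hypothetical sample of abbreviations to expand. "st" is deliberately left
// out because it can mean either "street" or "saint".
const ABBREVIATIONS = new Map([
  ["rd", "road"], ["ave", "avenue"], ["ln", "lane"], ["cl", "close"],
]);

function normalize(name) {
  return name
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ") // punctuation becomes whitespace
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((w) => (ABBREVIATIONS.has(w) ? ABBREVIATIONS.get(w) : w))
    .join(" ");
}

// Identical after normalization, even if the raw strings differ
// (for example OS Locator's all-caps names versus mixed-case OSM names).
function isNearPerfect(locatorName, osmName) {
  return normalize(locatorName) === normalize(osmName);
}
```

The Unicode-aware character class keeps letters such as Welsh "ŵ" intact rather than treating them as punctuation.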

What OS Locator version is this based on?

As of 2015-05-21 this is based on the OS Locator May 2015 release (OS_Locator2015_1_OPEN.txt).

What's this "id" you're using for Locator entries?

The Locator database doesn't come with any sort of primary key to uniquely identify its entries, so I had to come up with my own. Beyond that, I had to try and assign ids that were "stable" across Locator releases, so that match history wouldn't be lost and references wouldn't be broken.

The scheme is based on every OS Locator release since the initial release on 2010-04-01 (OS_Locator2009_2.txt), successively updated to cope with each newer release. Locator entries in the first release were given ids based on their line number in the C(olon)SV file.

For each entry in release n, release n-1 is checked for an entry with identical bounding box, name and ref fields. If one is found, the entry is assumed to be the same and is assigned its old id. All remaining entries in release n that weren't found in release n-1 are assumed to be new and are given new ids, counting up from the highest previously-assigned id, in order of appearance in release n. This process is repeated for each release. The resulting data files are available at http://ris.dev.openstreetmap.org/oslmusicalchairs/data/oslocator on the off chance anyone finds them useful.
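The id scheme above can be sketched as follows. Several details are assumptions not stated in the original: entries are objects with `bbox`, `name` and `ref` strings; line numbers start at 1; "counting up from the highest previously-assigned id" means new ids continue at one above it; and when several identical entries exist, their old ids are handed out in file order. The function names are hypothetical.

```javascript
// Identity key: bounding box, name and ref must all be identical.
const keyOf = (e) => [e.bbox, e.name, e.ref].join("\u0000");

// First release: ids from line numbers (assumed to start at 1).
function initialIds(entries) {
  const out = entries.map((e, i) => ({ ...e, id: i + 1 }));
  return { entries: out, nextId: entries.length + 1 };
}

// prev: release n-1 with ids; next: release n without ids.
// nextId: one above the highest id assigned so far, across all releases.
function carryIds(prev, next, nextId) {
  // Old ids grouped by identity key, kept in file order
  const pool = new Map();
  for (const e of prev) {
    const k = keyOf(e);
    if (!pool.has(k)) pool.set(k, []);
    pool.get(k).push(e.id);
  }
  const out = next.map((e) => {
    const ids = pool.get(keyOf(e));
    // Reuse an unclaimed old id if an identical entry existed in release n-1;
    // otherwise the entry is new and takes the next id, in order of appearance
    const id = ids && ids.length > 0 ? ids.shift() : nextId++;
    return { ...e, id };
  });
  return { entries: out, nextId };
}
```

Because `nextId` is threaded through every release, an entry that disappears and later reappears gets a fresh id rather than its old one, matching the rule that only release n-1 is consulted.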

Can I filter the streets that I see by ref or name?

Funny you should ask that, imaginary person. As a matter of fact, you can now pass the map view a little cheat code: name= and ref= parameters. The map will then only show you Locator entries that exactly (case-insensitively) match the specified string. I had to add this to help me figure out wtf the A6 was now doing around Bedford with a new Locator version, and it's very useful for things like that. I used the URL:

http://ris.dev.openstreetmap.org/oslmusicalchairs/map?ref=a6

Or to only show streets named "high street" you'd use:

http://ris.dev.openstreetmap.org/oslmusicalchairs/map?name=high+street
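A small sketch of the filter behaviour: building a filtered URL with the standard `URLSearchParams` API (which encodes spaces as "+", as in the URL above), and the exact, case-insensitive comparison the map applies. The helper names are hypothetical, and combining `name=` and `ref=` in one URL (treated here as "both must match") is an assumption the FAQ doesn't confirm.

```javascript
const BASE = "http://ris.dev.openstreetmap.org/oslmusicalchairs/map";

// Build a filtered map URL; omitted filters are left out of the query.
function filterUrl({ name, ref } = {}) {
  const params = new URLSearchParams();
  if (name !== undefined) params.set("name", name);
  if (ref !== undefined) params.set("ref", ref);
  const qs = params.toString();
  return qs ? BASE + "?" + qs : BASE;
}

// Whole-string, case-insensitive comparison (not a substring search).
function passesFilter(entry, { name, ref } = {}) {
  const eq = (a, b) => a.toLowerCase() === b.toLowerCase();
  if (name !== undefined && !eq(entry.name, name)) return false;
  if (ref !== undefined && !eq(entry.ref, ref)) return false;
  return true;
}
```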

Can I have a dump of the data?

Sure. The only data oslmc generates itself is a correspondence list; all other data is sourced straight from OSM or OS Locator. So once a week I generate a pretty basic dump to http://ris.dev.openstreetmap.org/oslmusicalchairs/data/matchdumps/ listing the latest match status for each OS Locator street. It's a simple UTF-8 encoded CSV file (as generated by the default options of Python's csv module) with the field order:

1. OS Locator id
2. OS Locator name
3. OS Locator ref
4. OS Locator bounding box centroid, WGS84 longitude
5. OS Locator bounding box centroid, WGS84 latitude
6. ldist (a sort of hybrid Levenshtein distance) of the latest match
7. latest match OSM way id
8. latest match OSM way version
9. latest match OSM way name
10. latest match OSM way ref

Obviously OS Locator streets with no current match will have the latter fields blank.

I've included only the bare minimum of information needed to be both useful and easily readable; further data fields can be pulled from OS Locator or OSM.
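To read the weekly dump outside Python, a dependency-free parser is enough: Python's csv module with default options writes comma-separated fields, quotes them with `"` only when needed, doubles any embedded quotes, and ends rows with `\r\n`. The camelCase column names below are our own labels for the documented field order, and the sketch assumes the file has no header row.

```javascript
// Our own labels for the ten documented columns, in order.
const FIELDS = ["locatorId", "locatorName", "locatorRef", "lon", "lat",
  "ldist", "wayId", "wayVersion", "wayName", "wayRef"];

// Minimal parser for Python csv default output: "," separators, optional
// "..." quoting, "" for an embedded quote, \r\n (or \n) row endings.
function parseCsv(text) {
  const rows = [];
  let row = [], field = "", inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c !== '"') field += c;
      else if (text[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else inQuotes = false;                               // closing quote
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      row.push(field); field = "";
    } else if (c === "\r" || c === "\n") {
      if (c === "\r" && text[i + 1] === "\n") i++;         // treat \r\n as one break
      row.push(field); rows.push(row); row = []; field = "";
    } else {
      field += c;
    }
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Assumes no header row; unmatched streets have the match columns blank.
function parseDump(text) {
  return parseCsv(text).map((cols) => {
    const rec = {};
    FIELDS.forEach((f, i) => { rec[f] = cols[i] !== undefined ? cols[i] : ""; });
    rec.lon = Number(rec.lon);
    rec.lat = Number(rec.lat);
    rec.matched = rec.wayId !== "";
    return rec;
  });
}
```

In Node you could feed it the contents of a downloaded dump read as UTF-8; for very large files a streaming parser would be kinder on memory.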

Is there an RSS feed?

Yup. Three of them.

Each takes an optional bbox= parameter to restrict the covered area, but you don't need to build these links by hand: they are generated automatically if you open the RSS feeds from the top-right corner menu of musical chairs while viewing the area.
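If you do want to build a restricted feed link by hand, a helper like the one below appends the parameter using the standard `URL` API. The feed URLs themselves aren't reproduced here, so copy one from the musical chairs menu. The name `withBbox` is hypothetical, and the "min lon, min lat, max lon, max lat" order is an assumption borrowed from the usual OSM API convention, so compare against an automatically generated link before relying on it.

```javascript
// Hypothetical helper: add bbox= to a feed URL copied from musical chairs.
// Assumed order: minLon,minLat,maxLon,maxLat (left,bottom,right,top).
function withBbox(feedUrl, [minLon, minLat, maxLon, maxLat]) {
  if (!(minLon < maxLon && minLat < maxLat)) {
    throw new RangeError("bbox must have min < max on both axes");
  }
  const url = new URL(feedUrl);
  // searchParams.set replaces any existing bbox and percent-encodes the commas
  url.searchParams.set("bbox", [minLon, minLat, maxLon, maxLat].join(","));
  return url.toString();
}
```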

They aren't amazingly fast at the moment, especially for big areas.