User:Harry Wood/OAF discussion

An IRC chat with RichardF and SK53 about UK Open Address File ideas, Friday 10th July 2014

IRC log:

15:23 harry-wood: The ODI folks are asking me to do an OpenStreetMap talk at this: http://www.eventbrite.co.uk/e/open-addresses-symposium-tickets-12175542375
15:25 harry-wood: I think I'll have to poach something from SK53's blog. Some clever GIS analysis related to address data
15:37 PeterMead: The Open Addresses Symposium sounds interesting.
15:40 PeterMead: I was intending to create an open address database myself with data primarily sourced from OSM. But then I started adding addresses to OSM instead and that has filled what little free time I had.
15:58 Firefishy: harry-wood: or ask SK53 if he is willing? ;-)
16:01 harry-wood: Yeah! Come along and do a talk about OSM addresses data SK53!
16:09 ris: oh yes a jerry rant would be good
16:10 harry-wood: One thing about the OAF plans, is that they're very much hoping to avoid a sharealike clause in the licensing of the dataset
16:11 harry-wood: So that kind of rules out the use of OpenStreetMap as one of the seed sources of data, and also as a platform for allowing community contribution/correction of the data
16:13 RichardF: well, good
16:13 harry-wood: except... a suggestion I floated, was that they could do some analysis of how many UK contributors of address data there actually are (I suspect fewer than 1000) and maybe we could look at asking those people for dual licensing permissions
16:14 harry-wood: complicated I know. Both from a tech perspective (intertangled data contributions) and from a legal perspective
16:14 harry-wood: but at least it's not a total dead end
16:15 harry-wood: RichardF is that treading on your toes with any plans for a "go public domain!" campaign?
16:16 RichardF: I don't have any plans for a "go PD" campaign! I just think that you'd be fairly mad to start a sharealike address database, so I'm pleased that the OAF people have ruled that out
16:17 RichardF: what it and OSM need is a competent, user-friendly mobile surveying app, but I've banged on about that before ;)
16:17 harry-wood: What I'm thinking is... I can well imagine even just asking the top 30 address mappers in the UK for dual licensing permission would probably actually get you 50% of the OSM UK address data
16:18 RichardF: I can see some logic in that, yes
16:18 harry-wood: Need to run some number crunching analysis on that kind of idea, but I reckon it might well work out that way
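The number crunching harry-wood has in mind could be sketched roughly as follows: count address nodes per contributor and see what share the top N mappers account for. The function name `top_n_share` and the sample data are hypothetical and are not real OSM statistics. The sketch also credits each node to a single user, so it ignores the intertangled-contributions problem mentioned above.

```python
from collections import Counter


def top_n_share(node_owners, n):
    """Return the fraction of address nodes contributed by the n most prolific users.

    node_owners: iterable with one username per address node (hypothetical input;
    in practice this would have to be extracted from an OSM data dump).
    """
    counts = Counter(node_owners)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Made-up sample: four contributors with very uneven output.
owners = ["alice"] * 50 + ["bob"] * 30 + ["carol"] * 15 + ["dave"] * 5
print(top_n_share(owners, 2))
```

Running this against real data with n=30 would show whether the "top 30 gets you 50%" guess holds.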
16:19 EdLoach: I can't give you postcodes dual licenced, but the housenumbers and streetnames are from survey.
16:19 RichardF: I suspect there's a lot you could do with clever analysis of OS OpenData, too
16:39 PeterMead: The Land Registry Prices Paid is a good source for determining the correct postcode for roads with more than one. It's usually fairly obvious if all odd numbers are one postcode and all even numbers are another, or if they're in ranges of numbers going down the street.
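The odd/even check PeterMead describes could look something like this minimal sketch. It assumes the Price Paid records have already been reduced to (postcode, house number) pairs with integer house numbers; the function name `classify_postcodes` and the postcodes in the sample are made up for illustration.

```python
from collections import defaultdict


def classify_postcodes(records):
    """Label each postcode 'odd', 'even' or 'all' based on the house numbers seen in it.

    records: iterable of (postcode, housenumber) pairs, housenumber as int.
    """
    numbers = defaultdict(set)
    for postcode, housenumber in records:
        numbers[postcode].add(housenumber)

    result = {}
    for postcode, nums in numbers.items():
        parities = {n % 2 for n in nums}
        if parities == {1}:
            result[postcode] = "odd"
        elif parities == {0}:
            result[postcode] = "even"
        else:
            result[postcode] = "all"
    return result


# Hypothetical sample data, not real Land Registry records.
sample = [
    ("ZZ1 1AA", 1), ("ZZ1 1AA", 3), ("ZZ1 1AA", 7),
    ("ZZ1 1AB", 2), ("ZZ1 1AB", 4),
    ("ZZ1 1AC", 10), ("ZZ1 1AC", 11),
]
print(classify_postcodes(sample))
```

A postcode with only a handful of records can be misclassified, so a real version would probably want a minimum sample size and some screening of bad records.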
16:51 osmbot-test: adtyne just started editing near Powys, Wales, United Kingdom with changeset http://osm.org/changeset/24084951
17:01 SK53: ris: do I rant :-)
17:02 SK53: harry-wood: have signed up for the day so happy to talk
17:03 SK53: harry-wood: actually my main plan would be to try & partially geocode Land Registry & Social Housing addresses as a start; OSM would be best source, but OS OpenData would also be possible as a starting point
17:04 SK53: harry-wood: dual licensing OSM data is a non-starter because you don't know how much OSM data was used to assign an address (e.g., use OSM to find a road name & assign addresses thereafter)
17:12 harry-wood: SK53: Ah great
17:12 harry-wood: I mean... great that you're coming
17:12 harry-wood: Not so great that you're saying it's a non-starter :-(
17:15 harry-wood: The openaddresses file idea is specifically about that final part of the address... the house number.... and the coordinates thereof.
17:15 harry-wood: We've got open data on postcodes, and open data on streetnames
17:15 harry-wood: The issue everyone's het up about now is that Royal Mail was sold off along with PAF
17:15 harry-wood: database of every address (housenumbers and positions)
17:18 harry-wood: I guess strictly there's some derivation entanglement for that data in OSM. Someone comes along and adds the housenumbers in, they're positioning those nodes based on the positions of the skeleton road network, or other data, so strictly it's all tangled up together even if a node only has one edit by someone who's willing to dual license
17:20 harry-wood: To me that feels like a "loose" derivation though.
17:21 harry-wood: And of course we have to hope it is a "loose" enough derivation given our redaction bot algorithm made no account of that kind of thing :-O
17:22 RichardF: in the UK, because of sweat-of-the-brow, I suspect SK53 is right. in any sane jurisdiction, you could point to the fact that you could have used OS OpenData to get _exactly_ the same result without any sharealike contamination, and you'd be ok
17:29 SK53: PeterMead: LRPP data has a fair number of errors which need to be screened out
17:30 harry-wood: With these dual licensing ideas, is there also an issue around whether you're allowed to take data out of the OpenStreetMap database and relicense it, even with the permission of the person who put it into the OSM database?
17:30 SK53: RichardF: harry-wood: actually I think the immediate difficult issue is not OSM SA clause but restrictions on use of postcode centroids
17:31 SK53: harry-wood: well probably OK, but how many roads have not been tweaked multiple times by folk
17:32 : achadwick left the room (quit: Quit: Ex-Chat).
17:32 harry-wood: Well that's the thing. I'm not interested in the roads, or the postcodes. I'm just interested in address nodes. The only reason I *might* be interested in who edited a road, is if I was worried about the loose derivation of node positions based on the position of roads
17:34 SK53: harry-wood: the road data is critical for helping build sensible address data without having to survey every single address
17:34 SK53: harry-wood: unless you want an address file w/o geolocation
17:34 SK53: harry-wood: and I query just how open postcode data are
17:35 ris: SK53: in a good way
17:40 harry-wood: Other folks in this Open Address File consortium will be looking at what can be done to fill in gaps via some guestimation work using whatever datasets (probably a bunch of OS opendata stuff). So by this I mean guestimation to service a use case of finding an address, when the database doesn't have an exact location of that house number.
17:42 harry-wood: They probably aren't interested in using OpenStreetMap for feeding into that process. But the interesting opportunity. The thing which is a shame if we can't help them with... is if we have a node in our database for an exact location of a housenumber... *if*... obviously everyone's aware that OSM doesn't have anything like complete coverage of that
17:43 harry-wood: And maybe it's two different questions. 1) Will we be able to feed our existing address nodes data into OAF? 2) will we be able to be the platform for community contribution to that
17:43 : PeterMead left the room (quit: Quit: Leaving.).
17:44 SK53: harry-wood: this post http://sk53-osm.blogspot.co.uk/2013/12/assigning-addresses-from-land-registry.html for instance is mainly about general issues about address assignment, only partially OSM relevant
17:44 harry-wood: I think a worst case scenario might be that we give up on (1) and only do (2) as a one-way thing where some data collection app feeds into both OSM and OAF
17:45 harry-wood: But in that scenario we'd also end up with the annoying problem that people would sometimes be contributing addresses which OSM already knows about
17:45 SK53: harry-wood: LRPP has about 12 million addresses, we have 0.5 million, personally I'd start with former, the interpolation issues will be the same
17:46 SK53: harry-wood: from an OSM perspective I'd rather contribute to solving how to get LRPP addresses suitably coded in an Open but non-SA form, as then they can be included in OSM
17:47 harry-wood: So there's a challenge there of positioning the individual houses within a postcode area right?
17:48 harry-wood: Perhaps not impossible through some automated means, guessing how addresses are arranged along a road
17:51 RichardF: harry-wood: especially if you can recognise the shapes from OS StreetView - i.e. there are 12 distinct buildings along here, you know that the one at 51.5,-0.2 is number 5, so...
17:59 SK53: harry-wood: the challenge is to know which way house numbers run (odds/evens or clockwise/anticlockwise) & then you can do a lot of decent interpolation
18:00 SK53: harry-wood: I have a table derived from LRPP which attempts to estimate whether a postcode covers addr:interpolation all/odd/even
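As a rough illustration of the interpolation being discussed, the sketch below places the missing house numbers between two known addresses. It assumes the houses are evenly spaced along a straight line, which real streets often are not, and that the numbering step is already known (2 for an odd-only or even-only side, 1 for "all"). The function name `interpolate_addresses` and the coordinates are hypothetical.

```python
def interpolate_addresses(start, end, step=2):
    """Estimate positions for the house numbers strictly between two known addresses.

    start, end: (housenumber, lat, lon) tuples for the known addresses.
    step: 2 for odd-only or even-only numbering, 1 for consecutive numbering.
    Returns a list of (housenumber, lat, lon) tuples.
    """
    n0, lat0, lon0 = start
    n1, lat1, lon1 = end
    if n1 <= n0 or (n1 - n0) % step != 0:
        raise ValueError("end number must exceed start and match the step")

    estimated = []
    for n in range(n0 + step, n1, step):
        t = (n - n0) / (n1 - n0)  # fraction of the way along the segment
        estimated.append((n, lat0 + t * (lat1 - lat0), lon0 + t * (lon1 - lon0)))
    return estimated


# Made-up coordinates: numbers 1 and 9 are known, so 3, 5 and 7 are estimated.
print(interpolate_addresses((1, 52.0, -1.0), (9, 52.0008, -1.0016)))
```

This is simple linear interpolation in latitude and longitude, not anything taken from SK53's own analysis; snapping the estimates to building outlines, as RichardF suggests, would be a separate step.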
18:02 : blackadder left the room (quit: Quit: Gone).
18:04 SK53: harry-wood: the other thing which is needed is a much more comprehensive street gazetteer than provided by OS Locator
18:05 harry-wood: why? Is it missing lots of streets?
18:05 : ris left the room (quit: Quit: Konversation terminated!).
18:06 chillly: judging by the way extra streets are being added to OS Locator that have been there for years, there may be lots missing
18:06 osmbot-test: Rostranimin posted a new note near Aberargie, Perth and Kinross, Scotland, United Kingdom http://osm.org/note/198600
18:07 chillly: OS Locator also repeats streets as they cross admin boundaries which is not great, as well as the obvious errors that I estimate run at around 3%
18:24 : EdLoach left the room (quit: Remote host closed the connection).
18:36 SK53: harry-wood: OS Locator does not contain any 'streets' which only have pedestrian access; I'd estimate this is ~5%+ of streets in Nottingham
18:36 SK53: chillly: actually the admin boundary thing is OK with Locator for address assignment (I have an equivalent OSM highway dataset chopped up with Boundary Line)
18:38 harry-wood: SK53 RichardF Chillly If you don't mind I'm going to stick a record of this conversation somewhere on the wiki. Some good thoughts which I should feed into the OAF discussions
18:39 SK53: harry-wood: where are OAF discussions taking place?
18:40 harry-wood: Well JeniT got in touch with me a little while ago. And there's a google doc somewhere
18:41 SK53: harry-wood: there's a list of suitable sources on github somewhere, but other than that I'm not aware of stuff
18:52 SK53: harry-wood: hmm seem to have been some commits of mongo-db based code on github, but it presupposes all sorts of things about address data structure https://github.com/theodi/open-addresses
18:53 harry-wood: Oh yeah. That's from a year ago though.
18:57 SK53: harry-wood: OK, glad of that, looked like a really lousy way to start off
18:59 harry-wood: perfect is the enemy of the good with this kind of thing though. ....especially when it comes to designing a datamodel
19:00 harry-wood: but yeah I guess they've had a few different thoughts since that early experiment
19:08 : shaunmcdonald left the room (quit: Quit: shaunmcdonald).
19:12 SK53: harry-wood: no, a fucked up data model means that you fight everything for ever more; getting a data model right is important (and one reason why OSM works); a straight replication of PAF is a reasonable short-term target, but as it's a delivery point-based addressing system it falls down on lots of points (not least in not providing addresses for all buildings in the country)
19:15 harry-wood: Yeah but on the flipside designing a datamodel by committee to account for lots of use cases... That's what they did with TransXChange, and it ain't pretty :-)
19:18 SK53: harry-wood: well obviously a data model should be designed by a data modeller taking account of various use cases; but the beauty of a good data model is that it can handle multiple use cases
19:20 SK53: harry-wood: look at the total crap in PAF regarding postal town & you see what happens when you try to shoehorn data into a poor model; with Royal Mail insisting that people live somewhere they don't
19:21 SK53: harry-wood: and also PAF handles named terraces with housenumbers very poorly indeed (overloading the building name); on the other hand PAON/SAON works quite well
19:23 : shaunmcdonald [~shaunmcdo@97e27940.skybroadband.com] entered the room.
19:31 chillly: SK53: Postal towns are deliberately misleading. Then there's the county. Royal Fail say you don't need a county in your address, yet still call every address in East Yorkshire 'North Humberside'. Humberside was disbanded in 1996, and there never was a 'North' Humberside.
19:48 : ris [~ris@host-78-147-49-148.as13285.net] entered the room.
19:52 SK53: chillly: exactly, there's no reason why an open replacement has to be as brain-dead
19:53 SK53: harry-wood: perhaps Matt Williams would be a good person to talk; his code might be usefully repurposed http://milliams.dev.openstreetmap.org/postcodefinder/about/
19:55 harry-wood: Thanks for all the info
19:55 harry-wood: Gotta go now