Talk:Import/Madison County, Kentucky Addresses

From OpenStreetMap Wiki

In the resulting data, the addr:city 'Mt Vernon' and the addr:street 'Mt Vernon Road' should probably be expanded to use 'Mount'. Blackboxlogic (talk) 14:09, 24 July 2020 (UTC)

Good catch, @Blackboxlogic. We made these types of changes for other abbreviations but didn't catch this one. We will update the "Mt" abbreviations as described above so they are corrected for future edits made by OSM mappers, and we will check whether any earlier edits need to be corrected.

Update: the data has been updated to replace 'Mt ' with 'Mount ' in both the addr:street and addr:city fields, and seems to be displaying as expected in the layer. Thanks for reporting the issue. --Dkensok (talk) 19:47, 24 July 2020 (UTC)
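The fix described in the update amounts to a bulk string replacement on two tags. A minimal Python sketch of it follows, assuming each record is a plain dict of OSM tags; `expand_mount` is a hypothetical helper, and its word-boundary regex is a slightly stricter variant of the plain 'Mt ' to 'Mount ' replacement described above, so that tokens such as 'Mtn' are left alone.

```python
import re

# Hypothetical pattern: a standalone "Mt" token, optionally followed by a period.
# The \b boundaries keep tokens like "Mtn" or "Smt" unchanged.
MT_PATTERN = re.compile(r"\bMt\b\.?")


def expand_mount(value: str) -> str:
    """Replace the abbreviation 'Mt' (or 'Mt.') with 'Mount'."""
    return MT_PATTERN.sub("Mount", value)


# Apply the fix to both tags mentioned in the thread.
record = {"addr:city": "Mt Vernon", "addr:street": "Mt Vernon Road"}
for key in ("addr:city", "addr:street"):
    record[key] = expand_mount(record[key])
print(record)  # {'addr:city': 'Mount Vernon', 'addr:street': 'Mount Vernon Road'}
```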


I'm sorry I gave you a tiny bit of feedback, waited for you to respond, and now I have more feedback. I promise this is all of it. I hadn't realized that the "export" of the data that I analyzed was only a small fraction of it, and I've now scraped the rest from the API. I think the house numbers and street names are worth the effort to improve; for the rest, I trust your judgement. Please don't assume all of this needs to be fixed! This import will be a major contribution even with some data errors.

Madison Data Set - Volunteer Quality Review

Geometry

Looks great!

House numbers

55 house numbers are empty, ' ', or '0' (I would filter these elements out or flag them for extra review, maybe with a fixme tag, though... importing fixme tags is frowned on). Some house numbers have letters on the end; perhaps those letters should be the unit?

- - - These features have been removed from the dataset. --Jshimota (talk) 23:00, 7 August 2020 (UTC)
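The house-number checks suggested above can be sketched in Python as follows. This assumes raw values arrive as strings (or None); `clean_housenumber` is a hypothetical helper, and moving a trailing letter into the unit follows the reviewer's suggestion only. In some addressing schemes the letter is legitimately part of the house number, so the split should be checked against local convention.

```python
import re

# Values the review flagged as unusable (after trimming whitespace).
BAD_HOUSENUMBERS = {"", "0"}

# Digits followed by trailing letters, e.g. "123A" -> ("123", "A").
NUMBER_WITH_SUFFIX = re.compile(r"^(\d+)\s*([A-Za-z]+)$")


def clean_housenumber(raw):
    """Return (housenumber, unit) or None if the element should be dropped or reviewed."""
    value = (raw or "").strip()
    if value in BAD_HOUSENUMBERS:
        return None
    match = NUMBER_WITH_SUFFIX.match(value)
    if match:
        # Hypothetical split: letters go to the unit, digits stay as the house number.
        return match.group(1), match.group(2)
    return value, None


print(clean_housenumber(" "))     # None
print(clean_housenumber("123A"))  # ('123', 'A')
```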

Street names

39 distinct street names end in a space (I'm told iD trims whitespace from tag values; if true, this is a non-issue). Some street names have unexpanded suffixes such as 'Rd' and 'Ln'. Some street names have an unexpanded pre-direction 'N'. Some street names have an unexpanded post-direction 'N'. Some street names have double spaces in them. Some street names seem to include addresses (they contain ' no ' or '#' followed by a number). Some street names start with 'st', which should be 'Saint'.

- - - The features have been edited to expand abbreviations, remove trailing blanks and double spaces, and fix miscellaneous errors. --Jshimota (talk) 23:03, 7 August 2020 (UTC)
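The street-name issues listed above are mostly mechanical, so a bulk pass can handle the common cases before manual review. Below is a minimal Python sketch; the `SUFFIXES` and `DIRECTIONS` tables and both functions are hypothetical and deliberately short. A real import would need a fuller suffix list, and single-letter tokens such as 'E' or 'W' can be ambiguous, so the output still deserves a look.

```python
import re

# Hypothetical, deliberately short lookup tables.
SUFFIXES = {"Rd": "Road", "Ln": "Lane", "St": "Street", "Dr": "Drive", "Ave": "Avenue", "Ct": "Court"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}

# Flags names such as "Lot no 5 ..." or "Apt #12" that seem to contain an address.
EMBEDDED_NUMBER = re.compile(r"(?:\bno\s+|#\s*)\d", re.IGNORECASE)


def looks_like_address(name: str) -> bool:
    return bool(EMBEDDED_NUMBER.search(name))


def normalize_street(name: str) -> str:
    # split() with no argument drops trailing blanks and collapses double spaces.
    words = [w.rstrip(".") for w in name.split()]
    if len(words) < 2:
        return " ".join(words)
    # A leading "St" is read as "Saint", per the review.
    if words[0].lower() == "st":
        words[0] = "Saint"
    # Otherwise a leading direction letter is a pre-direction ("N Main St").
    elif words[0] in DIRECTIONS:
        words[0] = DIRECTIONS[words[0]]
    # A trailing direction letter is a post-direction ("Main St N");
    # in that case the suffix sits just before it.
    suffix_index = len(words) - 1
    if words[-1] in DIRECTIONS:
        words[-1] = DIRECTIONS[words[-1]]
        suffix_index -= 1
    if suffix_index > 0 and words[suffix_index] in SUFFIXES:
        words[suffix_index] = SUFFIXES[words[suffix_index]]
    return " ".join(words)


print(normalize_street("St Marys  Rd "))  # Saint Marys Road
```

Names that `looks_like_address` flags are better set aside for a human than rewritten automatically.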

Unit number

There are only a few junk values, so... low impact. If easy, consider using a blacklist for this field: 0, 4 plex, rrr, Multiple boarding, llll, ekutemp, COULD BE A OR B, ANNEX, 4 APTS IN 2 DUPLEXES.

- - - Removed junk from unit field. --Jshimota (talk) 23:04, 7 August 2020 (UTC)
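A blacklist like the one suggested above takes only a few lines. This Python sketch uses exactly the values quoted in the review; comparing case-insensitively after trimming is an assumption made here so that variants such as "Annex" are caught too. `clean_unit` is a hypothetical helper.

```python
# Junk unit values quoted in the review, stored lower-case for comparison.
UNIT_BLOCKLIST = {
    "0", "4 plex", "rrr", "multiple boarding", "llll", "ekutemp",
    "could be a or b", "annex", "4 apts in 2 duplexes",
}


def clean_unit(raw):
    """Return the trimmed unit, or None if it is empty or blacklisted."""
    value = (raw or "").strip()
    if not value or value.lower() in UNIT_BLOCKLIST:
        return None
    return value


print(clean_unit("ANNEX"))  # None
print(clean_unit("B"))      # B
```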

Name

Overall, the names look to be high quality. I'm surprised there are so many. Some end in a *, to indicate that they have been truncated to 50 characters (no idea what to do with that, unless you did the truncating, then... stop it?). Duplicated names could be dealt with; I've seen this when multiple buildings are part of one organization (see "North Ridge Apts" as an example, where that would probably be better in 'operator' than 'name'. Ideally, a mapper would turn it into a 'place=neighbourhood' with 'name=North Ridge Apartments', or something similar). Some names have unexpanded abbreviations like 'apts', 'apt', 'co', 'ctr', 'ct', 'st' (probably hard to fix reliably in a programmatic way).

- - - Unfortunately, the names are truncated in the source; however, the abbreviations have been expanded. --Jshimota (talk) 23:06, 7 August 2020 (UTC)
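Because some of these abbreviations have more than one plausible meaning, one cautious approach is to expand only the unambiguous ones and flag everything else for a human. The Python sketch below does that, and it also flags names ending in '*' as truncated. The split between `UNAMBIGUOUS` and `AMBIGUOUS` is an assumption made for illustration ('st' can mean Saint or Street, and 'co' can mean Company or County), and `expand_name` is a hypothetical helper.

```python
import re

# Hypothetical tables built from the abbreviations listed in the review.
UNAMBIGUOUS = {"apts": "Apartments", "apt": "Apartment", "ctr": "Center"}
AMBIGUOUS = {"st", "ct", "co"}  # context-dependent: flag, don't expand

# A whole word, optionally followed by a period ("Apts.").
TOKEN = re.compile(r"\b([A-Za-z]+)\b\.?")


def expand_name(name: str):
    """Return (expanded_name, needs_review)."""
    needs_review = name.rstrip().endswith("*")  # truncated to 50 chars in the source

    def replace(match):
        nonlocal needs_review
        key = match.group(1).lower()
        if key in UNAMBIGUOUS:
            return UNAMBIGUOUS[key]
        if key in AMBIGUOUS:
            needs_review = True
        return match.group(0)  # leave the token (and any period) untouched

    expanded = TOKEN.sub(replace, name)
    return expanded, needs_review


print(expand_name("North Ridge Apts"))  # ('North Ridge Apartments', False)
```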

Duplicate Addresses

Some addresses are highly duplicated, some only doubled. The worst case is 4375 Boonesborough Road, with 29 instances, all named "Fort Boonesborough Stae Park". I don't have any advice for how to deal with these efficiently, other than manually and case by case, perhaps after the "import" is "finished". Consider a process to merge elements that have identical tags and are "near" each other.

- - - The duplicate addresses of buildings will be reviewed, keeping in mind that there are many cases (e.g. multi-building complex, mobile home park) where duplicate addresses are valid. --Jshimota (talk) 23:07, 7 August 2020 (UTC)
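The merge process suggested above can be sketched in Python as two steps. First, group elements by their full tag set. Second, within each group, report pairs that lie within some distance of each other. Given the caveat that many duplicates are valid, the sketch only lists candidate pairs for manual review and never merges anything itself. The feature dict shape, the `merge_candidates` helper, and the 50 m threshold are all hypothetical, and distance uses the haversine formula on a spherical Earth.

```python
import math
from collections import defaultdict


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (spherical Earth, radius 6,371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def merge_candidates(features, max_distance_m=50.0):
    """Return (id_a, id_b) pairs with identical tags lying within max_distance_m.

    Each feature is assumed to be a dict: {"id", "lat", "lon", "tags": {...}}.
    """
    groups = defaultdict(list)
    for f in features:
        # Identical tag sets produce identical keys regardless of insertion order.
        groups[tuple(sorted(f["tags"].items()))].append(f)
    pairs = []
    for members in groups.values():
        # Pairwise check is fine here: even the worst group in the review has 29 members.
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                a, b = members[i], members[j]
                if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_distance_m:
                    pairs.append((a["id"], b["id"]))
    return pairs
```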

Other improvements

I don't know if the source data had more information about the type/class of objects (for example, whether something is a post office or a park). If it did, it would be valuable to try to translate that class information into OSM tags as well. Here's the translation I used for a different dataset, for inspiration: https://github.com/blackboxlogic/OsmPipeline/blob/master/Data/PLACE_TYPE.json

Blackboxlogic (talk) 13:32, 30 July 2020 (UTC)

- - - Unfortunately there was no additional information on feature type in the source data. --Jshimota (talk) 23:09, 7 August 2020 (UTC)
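Although this dataset turned out to have no type field, the translation idea above may still be useful for other communities. It is essentially a lookup table from source classes to OSM tags. The Python sketch below is illustrative only. The source class names are invented, and they are not taken from the linked PLACE_TYPE.json. The OSM tags (amenity=post_office, leisure=park, amenity=school) are standard ones. `tags_for` is a hypothetical helper.

```python
# Hypothetical source classes mapped to standard OSM tags.
TYPE_TO_TAGS = {
    "POST OFFICE": {"amenity": "post_office"},
    "PARK": {"leisure": "park"},
    "SCHOOL": {"amenity": "school"},
}


def tags_for(source_type):
    """Return OSM tags for a source class, or an empty dict if it is unknown."""
    key = (source_type or "").strip().upper()
    # Return a copy so callers can add tags without mutating the shared table.
    return dict(TYPE_TO_TAGS.get(key, {}))


print(tags_for("Post Office"))  # {'amenity': 'post_office'}
```

Unknown classes map to no tags at all, which errs on the side of importing less rather than guessing.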

---

@Blackboxlogic, thanks for the additional feedback! No worries on it coming in parts; it's very useful to have more eyes on the data and for us to fix what we can in bulk before it's used by many OSM mappers. We will go through all of the items above and see how much we can fix in bulk. We'll also update our procedures doc for any issues that might recur in other communities. --Dkensok (talk) 14:49, 30 July 2020 (UTC)