Mechanical edits/ke9tv NYbuildings repair


Purpose

There are many (about 130,000) buildings in New York State that appear to have been imported by OSM user miluethi (and a couple of aliases, NYbuildings and AlexCleary). The data sources for the import are not clear, and it appears that the import was not discussed in the usual places. The buildings appear to have originated from a conflation of the Microsoft AI-generated building footprints[1] and the address points from the New York State GIS Street and Address Maintenance Program.[2]

Unfortunately, the translation of the address points contains several systemic errors leading to some 60,000 buildings with incorrect addresses. Two major root causes have been identified:

  1. The street name in addr:street=* for about 31,000 buildings was extracted from a database field that contains only part of the name, missing all prefixes and suffixes. An address like "100 East Main Street," if it encountered this bug, would have been entered into OpenStreetMap with addr:street=Main, which is clearly not a usable address.
  2. The city name in addr:city=* was extracted from a database field that contains the containing municipality or Census-Designated Place (CDP), not the 'postal city', the city that should go on a mailing address. This problem affects another 34,000 buildings (a small set have both issues), which are either at the borders of a postal service area, located in postal service areas that have no corresponding CDP (so that the township name attaches to the address in place of the name of a post office), or located in municipalities that do not have their own postal service areas.
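The first root cause can be illustrated with a short sketch. It is not taken from the project scripts; the function name and the word-matching heuristic are assumptions made for illustration. It flags an imported addr:street=* value that looks like the full street name with its leading or trailing words dropped.

```python
def is_truncated_street(imported: str, full: str) -> bool:
    """Return True if `imported` looks like `full` minus prefix/suffix words.

    Hypothetical heuristic: the comparison ignores case and checks whether
    the imported words appear as a contiguous run inside the full name.
    """
    imp = imported.casefold().split()
    ref = full.casefold().split()
    # An empty value, or one at least as long as the reference, is not a truncation.
    if not imp or len(imp) >= len(ref):
        return False
    width = len(imp)
    return any(ref[i:i + width] == imp for i in range(len(ref) - width + 1))


# The example from the text: "East Main Street" imported as "Main".
print(is_truncated_street("Main", "East Main Street"))
```

A check like this only detects the first root cause. The second (a municipality or CDP name in place of the postal city) cannot be recognized from the strings alone and needs the reference address data.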

In addition, there are a few hundred buildings that have either had their E911 address points altered in the NYSGIS data since the defective import, or else have been imported with apparent memory corruption in the data (weird city names like 733e+001 WARR01094012915, 20170510JLevandowskiII, and ek have appeared in the imported data).

Who is making the change

Kevin Kenny (ke9tv) has proposed a mechanical edit to repair this issue. The detailed project plan, together with all scripts that make the changes, can be found on GitHub.[3]

Details of the process

A detailed plan, with computer code, can be found on GitHub.[3] A basic summary is:

  1. Examine all changesets belonging to the suspect user IDs, looking for created ways with building=* tags.
  2. For each identified way, inspect the current version of the way in OSM, checking whether any of the tags addr:housenumber=*, addr:street=*, addr:city=* or addr:postcode=* has been changed by another user after the import.
  3. For each unchanged way, inspect whether the current version of the way has the tag nysgissam:addresspointid=*, indicating that the building has been visited by the import of New York State E911 address points[4] and conflated with one or more addresses from that source.
  4. Any of the four address tags that is unchanged from the original defective import and belongs to a building that has been conflated successfully shall have that tag replaced with the version from NYSGIS.
  5. The resulting sets of changes shall be grouped according to what tags have been modified, and then clustered geographically in small enough groups to make for manageable changesets (say, no more than five hundred affected ways in an area at most a few km across).
  6. The identified ways can be loaded into JOSM with the load_object remote control command, retagged, reviewed and uploaded to OSM. The upload will be conducted from a dedicated import account, ke9tv_NYbuildings_repair. All changesets will have a changeset source containing the script name, version and a link to the project repository, changeset tags including bot=yes and description=* including a link to this page, and changeset comments describing what tagging was applied.
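The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the project repository: the function names, the grid size CELL_DEG, and the use of plain tag comparison to stand in for "changed by another user" are all hypothetical. The sketch also takes the conservative reading that a way is skipped entirely if any of its four address tags differs from the imported version. The JOSM remote control URL assumes the default local listener on port 8111 and uses only the objects and new_layer parameters.

```python
import math
from collections import defaultdict
from urllib.parse import urlencode

ADDRESS_TAGS = ("addr:housenumber", "addr:street", "addr:city", "addr:postcode")
MAX_WAYS = 500    # upper bound per changeset suggested in step 5
CELL_DEG = 0.05   # hypothetical grid cell size, roughly a few km


def plan_repairs(imported, current, nysgis):
    """Steps 2-4 for one way: return {tag: new_value}, or {} to leave it alone."""
    # Step 2 (assumed reading): skip the way if any address tag was edited.
    if any(current.get(t) != imported.get(t) for t in ADDRESS_TAGS):
        return {}
    # Step 3: only ways conflated with a NYSGIS address point qualify.
    if "nysgissam:addresspointid" not in current:
        return {}
    # Step 4: replace the tags whose NYSGIS value differs from the current one.
    return {t: nysgis[t] for t in ADDRESS_TAGS
            if t in nysgis and nysgis[t] != current.get(t)}


def group_changes(planned):
    """Step 5: `planned` is a list of (way_id, lat, lon, changes) tuples.

    Ways are bucketed by the set of modified tags and by grid cell, then
    each bucket is split into batches of at most MAX_WAYS ways.
    """
    buckets = defaultdict(list)
    for way_id, lat, lon, changes in planned:
        cell = (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))
        buckets[(frozenset(changes), cell)].append(way_id)
    batches = []
    for ids in buckets.values():
        batches.extend(ids[i:i + MAX_WAYS] for i in range(0, len(ids), MAX_WAYS))
    return batches


def load_object_url(way_ids):
    """Step 6: build a JOSM remote control load_object URL for one batch."""
    query = urlencode({"objects": ",".join(f"w{i}" for i in way_ids),
                       "new_layer": "true"})
    return f"http://127.0.0.1:8111/load_object?{query}"
```

A simple grid is used here only to keep the example short. Any clustering that keeps each changeset within a few km would serve the same purpose.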

Consultation

The proposal has been floated in a less formal form in the imports-us and talk-us mailing lists. It has also been discussed on the relevant Slack channels such as #local-newyorkstate on osmus.slack.com. Nobody appears to have had any great heartburn with it yet.

Timing

The plan is to circulate a few sample changesets, wait a week or so to garner feedback, apply the sample changesets, wait another week or so to gather feedback, and then begin running the bot in earnest. This edit should be a one-time-only affair, although the framework it provides suggests possibilities for reconflating future versions of the NYS address points.

How to opt out

Contact Kevin Kenny (ke9tv) with any concerns about the changes.



  1. These computer-generated building footprints are licensed for use in OSM and are publicly available from Cornell University Geospatial Information Repository.
  2. NYSGIS Street and Address Maintenance (SAM) Program releases E911 address points under a CC0 license.
  3. See the project page for kennykb/NYbuildings_repair.
  4. New York (state)/NYS GIS SAM Address Points Import