Talk:New York (state)/NYS GIS SAM Address Points Import

As mentioned below in the "no matching street nearby" topic, this import added some addresses that do not correspond to real-world features. Mostly these were imported as nodes, and they are most apparent in undeveloped areas and developed public space like parks, but some are also on buildings and private lots. Many of these are not tagged with nysgissam:review so they'll only be dealt with as mappers notice them over time.

Based on Dead10ck's advice, these nodes should simply be deleted. Personally I don't feel comfortable deleting them based on aerial imagery alone, since most of it is years old and has tree cover that can obscure smaller buildings. I've also surveyed buildingless lots with posted addresses, eg, a place to park an RV over the summer, or a future construction site. So I'm only deleting these when the address's nonexistence is confirmed by some combination of local knowledge, recent streetside imagery, county data, and survey.

There are other variations of this issue, such as:

  • nonexistent addr:* tags added to building way (either a building with no address, or a multi-address building whose real addresses were already mapped as nodes -- presumably just remove the addr:* and nysgissam:* tags I guess, but cautiously because it's certainly possible to have a real address that's not adequately signed)
  • incorrect addr:* tags added to building way (fix the addr:* tags, but what to do with nysgissam:nysaddresspointid?)
  • extra addresses added to an existing building whose address(es) were already mapped (should we try to match them up and copy over the nysgissam:nysaddresspointid tags from the imported addresses? What would the match criteria be?)

Because this is planned as an ongoing import it would be great to have some more granular guidance. One overarching question is: What are the guidelines for keeping/removing/combining the nysgissam:nysaddresspointid tags when fixing addresses? --Jmapb (talk) 17:49, 21 June 2021 (UTC)

These are some really good points, very well thought out. I'd agree that some of the iffy cases, such as address points in the middle of undeveloped land, would be best served by a field survey and local knowledge. In some of Kevin's cases, he had local knowledge that some of the addresses were from areas where development plans had been approved, but later abandoned. This is a case where a mapper has reasonably high confidence that the addresses do not and will not exist. However, without that local knowledge, it becomes more murky. Even if you do a field survey, and it's just undeveloped land and you don't know how long it's been there, it's still possible it will get developed. I think the only thing to do in this case is see if you can uncover some local approved development plans to see how long it's been sitting in limbo. Or otherwise, just wait a year or two, and if still nothing has happened, you've gained local knowledge that likely nothing is going there.

It's hard to say in a general way what to do with the nysgissam:nysaddresspointid=*. These are nuanced grey areas, and I think each deserves its own consideration.

nonexistent addr:* tags added to building way (either a building with no address, or a multi-address building whose real addresses were already mapped as nodes -- presumably just remove the addr:* and nysgissam:* tags I guess, but cautiously because it's certainly possible to have a real address that's not adequately signed)

The last sentence gets to the grey area here. Not all addresses are signed, so how does one determine if an address "doesn't exist"? The only way I can think of is to check the USPS web site's address finder, but I don't know if that is a fair oracle to use. Others might know more about this than me.

However, if it really doesn't exist, then yeah, just delete all the nysgissam:*=* tags.

incorrect addr:* tags added to building way (fix the addr:* tags, but what to do with nysgissam:nysaddresspointid?)

I think this really has to be decided case by case. If there was just an obvious typo in the addr:street=* tag, for example, then just fix the tag and keep the nysgissam:nysaddresspointid=*. If the address is correct but the node sits at the driveway, and recent enough imagery has become available, then move it to the roof of the building; if the building is already mapped, move the tags over to the building and delete the node.

However, if it's a more material error, like an incorrect house number, then it becomes a bit more complicated. Think about it purely from the perspective of the source database, which consists only of an address and a point coordinate, and put OSM out of mind. For this address point, which part is wrong: the coordinate or the house number? It's really hard to say. So I think in this case it's best to avoid the ambiguity: just fix the tags and delete the nysgissam:nysaddresspointid=* tag. In the future, this will be treated as a deletion by the importer, and any updates to it will be skipped. My hope is that these will be statistically on the rare side, but I won't know for sure until I start working on updates.

extra addresses added to an existing building whose address(es) were already mapped (should we try to match them up and copy over the nysgissam:nysaddresspointid tags from the imported addresses? What would the match criteria be?)

Do you have any concrete cases here, or is this just a hypothetical? The only cases I can think of where this might happen are when the existing address was only an addr:housenumber=* and fell under the "partial data" issue, or when an existing building had an entirely different address, in which case it should be marked as a conflict. In the former case, I'd say delete the node with only the addr:housenumber=*.

What are everyone's thoughts? I'd be happy to add these to the QA section.

-- Dead10ck (talk) 04:34, 22 June 2021 (UTC)

All of my questions here concern real observations, and now I wish I'd taken better notes about each one. Also, in some cases, I wish I'd left the "incorrect" addresses alone pending discussion, rather than fixing them by deleting tags or nodes. But here's one I saved more or less in situ: https://www.openstreetmap.org/way/815753899. This is a strip mall building on NY 28. The on-the-ground addresses of the businesses were already added from survey, and the import added additional nodes. Some I can see why they didn't match (5571, 5573 and 5577 don't exist, except on paper I suppose, and two different business POI nodes use 5575 ... that's another topic we'll need to discuss) but the others I feel should have matched. Should I copy over the nysgissam:nysaddresspointid values?

Another shopping plaza I remember (but can't find) had imported nodes that landed near each business, but used a different addr:street and addr:housenumber than the existing mapped business POIs. Did the strip mall choose its own unofficial addressing scheme? Is it good to keep the imported address nodes too, or should we assume the import's simply wrong?

Many of these anomalies probably would have ended up with a nysgissam:review="existing element's addr* has different addr*" tag in less dense circumstances, ie, single-address buildings.

My gut says there's no value to trying to retain nysgissam:nysaddresspointid if the address has changed radically (and even a single digit change in housenumber counts as radical IMO), but it's good to hear that confirmed. I'll keep an eye out for further examples of things I'm unsure how to handle. And I for one would definitely appreciate some more explicit guidance in the QA section.

--Jmapb (talk) 19:32, 25 June 2021 (UTC)

But here's one I saved more or less in situ: https://www.openstreetmap.org/way/815753899. This is a strip mall building on NY 28. The on-the-ground addresses of the businesses were already added from survey, and the import added additional nodes. Some I can see why they didn't match (5571, 5573 and 5577 don't exist, except on paper I suppose, and two different business POI nodes use 5575 ... that's another topic we'll need to discuss) but the others I feel should have matched. Should I copy over the nysgissam:nysaddresspointid values?

This looks like a case of the "State Route" issue. The previously existing addresses are "State Route 28", and in the import data, they are simply "Route 28". I think the right thing to do here is move the nysgissam:nysaddresspointid values to the existing POIs and delete the imported nodes. Keep everything else the same (I think "State Route" is clearer and matches the nearby street). They were not marked as "no matching street nearby" because I purposely chose not to for these cases.

Another shopping plaza I remember (but can't find) had imported nodes that landed near each business, but used a different addr:street and addr:housenumber than the existing mapped business POIs. Did the strip mall choose its own unofficial addressing scheme? Is it good to keep the imported address nodes too, or should we assume the import's simply wrong?

I'd consider this just like any other conflict: it's hard to say who is right without extra research. I'm no expert in how the USPS handles addresses, but maybe one is old and needs to be updated, or maybe both are acceptable. I can say from experience that sometimes a business lists its address incorrectly.

Many of these anomalies probably would have ended up with a nysgissam:review="existing element's addr* has different addr*" tag in less dense circumstances, ie, single-address buildings.

I would say it has more to do with the fact that it's impossible to divine what DB elements correspond to which OSM elements when the others have different tag values. When they don't match, it's impossible to tell in an automated way if a nearby element is the thing that matches up with it, or if it's legitimately a different address. It does make it harder when there are multiple addresses in one building to mark it for manual review, but this is a problem even in the absence of a building.

My gut says there's no value to trying to retain nysgissam:nysaddresspointid if the address has changed radically (and even a single digit change in housenumber counts as radical IMO), but it's good to hear that confirmed. I'll keep an eye out for further examples of things I'm unsure how to handle. And I for one would definitely appreciate some more explicit guidance in the QA section.

Yeah, I think that might be the best rule of thumb: if the house number is wrong, just disregard the nysgissam:nysaddresspointid tag. If the street name is totally different, but the house number is right, treat it as a conflict and try to find out which one is right. I'll write these up in the wiki.

-- Dead10ck (talk) 03:08, 26 June 2021 (UTC)
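
The rule of thumb agreed above could be written down roughly as follows. This is only an illustrative sketch, not part of the importer: the function name, the `Action` values and the `street_fix_is_minor` flag (the mapper's own judgement that the street difference is just a typo or spelling variant) are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    KEEP_ID = "fix the tags if needed and keep nysgissam:nysaddresspointid"
    DROP_ID = "fix the tags and delete nysgissam:nysaddresspointid"
    CONFLICT = "treat as a conflict and research which address is right"


def classify_fix(imported_number, imported_street,
                 correct_number, correct_street,
                 street_fix_is_minor=False):
    """Suggest what to do with the ID tag when correcting an imported address.

    Hypothetical helper: street_fix_is_minor is the mapper's call that the
    street names differ only by a typo or an equally valid spelling.
    """
    # A wrong house number is a material error: drop the ID.
    if imported_number != correct_number:
        return Action.DROP_ID
    # Same house number and same (or trivially different) street: keep the ID.
    if imported_street == correct_street or street_fix_is_minor:
        return Action.KEEP_ID
    # Same house number but a totally different street: needs research.
    return Action.CONFLICT
```

For instance, an import of "5575 Route 28" against a surveyed "5575 State Route 28" would keep its ID if the mapper passes `street_fix_is_minor=True`, whereas any change to the house number drops it.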

nysgissam:review=no matching street nearby

There are some streets about where nobody seems to know the correct spelling, including the DOT. In a handful of cases, I've resorted to showing different spellings in name=* and addr:street=* for individual blocks, because the signage, the tax rolls, and the DOT disagree on the spelling, and I've followed the signage in the field. -- Kevin Kenny (ke9tv) 2021-06-15

Good points, I'll add a tidbit about following the on the ground principle. -- Dead10ck (talk) 14:34, 15 June 2021 (UTC)

While spelling errors, and as-yet-unbuilt streets in new developments, predominate, Frank Winters's explanation is not the entire story. I've encountered some addresses in my town where a planned development was abandoned before construction began, the platted street was never built, and the land was subsequently repurposed. I've found some of these traces of never-built subdivisions in undeveloped land belonging to the town's parks department and used as public green space, including some that have obviously been woodland for a century or more. I don't have any misgivings about removing imported address points where the street does not exist and there is no matching parcel on the tax rolls. -- Kevin Kenny (ke9tv) 2021-06-15

Fully agree, reality is always more messy. If there are addresses that are obviously non-existent, do not hesitate to remove them. -- Dead10ck (talk) 14:34, 15 June 2021 (UTC)
I'm out of the loop re Frank Winters, but this issue of an over-abundance of unlikely addresses in some locations is not specific to the addresses tagged with "no matching street nearby." See the old Senate House site in Kingston for example, which got 17 new addresses, some landing on buildings that don't use them but most just sitting in the park. And the one "real" address, 296 Fair Street, does not seem to have been imported. My inclination is to delete these extras, but what's the best way to indicate that they shouldn't be added back in future updates? -- Jmapb (talk) 20:49, 17 June 2021 (UTC)
Don't worry about them coming back, just delete them. When I implement updates, I will take all user edits into account, including deletions. A deletion in itself is the best way to indicate it shouldn't be added back. -- Dead10ck (talk) 21:10, 17 June 2021 (UTC)
Ok, I've cleaned up the Senate House (changeset 106589801) but there's plenty more to do. Most of the current QA documentation focuses on the nysgissam:review tags, but from what I see, handling these iffy addresses is going to be a large part of the QA process, so I've created a separate Talk topic for that. --Jmapb (talk) 16:01, 21 June 2021 (UTC)
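
To illustrate why a deletion is enough of a signal, here is a rough sketch of how an update run could respect it. Updates had not been implemented at the time of this discussion, so this is purely an assumption about one possible approach; `plan_update` and its three ID sets are hypothetical names, and the sketch assumes the importer keeps a record of which nysgissam:nysaddresspointid values it has already imported.

```python
def plan_update(source_ids, previously_imported_ids, ids_in_osm):
    """Split address point IDs into add / update / skip sets (hypothetical helper).

    source_ids: IDs present in the new SAM data release
    previously_imported_ids: IDs that earlier import runs added to OSM
    ids_in_osm: IDs still found on OSM elements today
    """
    # Imported earlier but no longer carrying an ID in OSM: a mapper deleted
    # the element or removed its nysgissam:nysaddresspointid tag.
    removed_by_mappers = previously_imported_ids - ids_in_osm
    return {
        "add": source_ids - previously_imported_ids,
        "update": source_ids & ids_in_osm,
        "skip": source_ids & removed_by_mappers,
    }
```

Under this scheme, a node that was deleted, or an element whose nysgissam:nysaddresspointid tag was removed, falls into the "skip" set and is never re-added.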

I've run into "no matching street nearby" plenty -- a couple of roads actually missing, but mostly spelling variations. So far in all cases but two (Ed West Road and Mount Ava Maria Drive, mentioned on the mailing list) the SAM addresses have had what I believe to be either the correct spelling, or an equally correct spelling. In addition to "letting what's on the street sign win" as far as the value of the addr:street tag, I've also taken the opportunity to populate the highway segments with alt_name values when there's a realistic alternate. And roads I haven't yet been able to survey or find via roadside imagery, I'm leaving alone for now, with the nysgissam:review tags untouched.

Regarding the numbered routes... I'm not Rust-literate, but it looks like the street matching code uses a regex that treats any Highway/Route/Road name (optionally prefixed with State/County) followed by a number as matching any other such name followed by the same number. I'm sure this prevented many thousands of "no matching street nearby" false positives given the unpredictable naming of numbered routes. I haven't yet seen any false negatives. -- Jmapb (talk) 20:49, 17 June 2021 (UTC)

Regarding the numbered routes, you are correct, I added a regex that specifically skips the nearby matching street check for these addresses. There is simply no consistency or standard on the naming convention for these roads, even on official sources, so it's simply not helpful to point out that they're different. Maybe I'll add a bit about this under this section too. -- Dead10ck (talk) 22:49, 17 June 2021 (UTC)
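
For readers who want a concrete picture of the numbered-route matching discussed above, here is a minimal Python sketch. It follows Jmapb's reading of the behaviour and is not the importer's actual Rust code; the pattern and the helper names `route_number` and `same_numbered_route` are assumptions made for illustration.

```python
import re

# Assumed pattern: optional State/County prefix, then Highway/Route/Road,
# then a number. The importer's real regex may differ.
NUMBERED_ROUTE = re.compile(
    r"^(?:(?:state|county)\s+)?(?:highway|route|road)\s+(\d+)$",
    re.IGNORECASE,
)


def route_number(street_name):
    """Return the route number if the name looks like a numbered route, else None."""
    match = NUMBERED_ROUTE.match(street_name.strip())
    return match.group(1) if match else None


def same_numbered_route(name_a, name_b):
    """True when both names are numbered routes that share the same number."""
    number_a = route_number(name_a)
    number_b = route_number(name_b)
    return number_a is not None and number_a == number_b
```

With this sketch, "State Route 28" and "Route 28" count as the same street, while ordinary names such as "Ed West Road" are not treated as numbered routes and would still go through the normal nearby-street check.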

nysgissam:review=existing element's addr:* has different addr:*

In the early stages of the import, this one came up a lot when mapping street addresses served by a post office that serves multiple cities. USPS strongly prefers that a post office have a single city name, and that the name be that of the largest community served. This rule causes addressing anomalies where a suburban post office serves a few addresses inside the bounds of a larger city (example: the post office in Niskayuna, New York is titled "Schenectady" because it serves a few addresses in Schenectady), or where an unincorporated community has a strong local identity (example: the citizens of Rexford, New York, successfully petitioned to have the name of their post office restored after the bureaucracy decided that it needed to be named "Clifton Park" because Rexford was an unincorporated hamlet in the Town of Clifton Park). USPS mostly deals with the problem by maintaining lists of acceptable and unacceptable city names to use in addresses for each post office. While 12302 is formally labeled "Schenectady", "Scotia" and "Glenville" are acceptable alternatives for addresses in those communities.

This anomaly caused every building that I'd hand-drawn in Niskayuna (plus some in Rexford, Scotia, and Glenville, at least) to be flagged with "existing element's addr:city has different addr:city". In Niskayuna, this affected enough buildings that I resorted to a mechanical edit to put it right. Rather than simply revert or bulk-update, I set up a database query for address points carrying this warning, mechanically edited "Schenectady" to "Niskayuna" if addr:postcode was 12309 and the address was within the boundaries of the Town of Niskayuna, and deleted the warning for the points edited. Then, in a separate step, I used JOSM's address conflation tool to merge the address points into the buildings. (In doing so I detected a few of my own blunders, where in surveying a block I had mistakenly extended the city name across the city line.)
-- Kevin Kenny (ke9tv) 2021-06-15
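
The mechanical edit described above could be sketched roughly like this. It is not the query that was actually run: the function names, the exact nysgissam:review value, and the representation of the town boundary as a simple list of (lon, lat) vertices are all assumptions made for illustration.

```python
# Assumed review value, taken from the warning text quoted above.
CITY_WARNING = "existing element's addr:city has different addr:city"


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does this edge cross the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def fix_city(tags, lon, lat, town_boundary):
    """Return corrected tags, or the original tags if the rule does not apply."""
    if (
        tags.get("nysgissam:review") == CITY_WARNING
        and tags.get("addr:city") == "Schenectady"
        and tags.get("addr:postcode") == "12309"
        and point_in_polygon(lon, lat, town_boundary)
    ):
        fixed = dict(tags)
        fixed["addr:city"] = "Niskayuna"
        del fixed["nysgissam:review"]  # warning resolved by the edit
        return fixed
    return tags
```

A real boundary would come from the Town of Niskayuna relation rather than a hand-written vertex list, and any edit on this scale would still need to follow the usual rules for mechanical edits.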

Thanks again for your help with identifying this issue. For the record for others that come across this here, after Kenny described these problems, I changed the importer to ignore conflicts on city name, to avoid this kind of issue. So for data imported after 2021-02-09, the data should be such that an existing addr:city=* is left alone, and is not marked as a conflict if it differs from the import data. -- Dead10ck (talk) 14:38, 15 June 2021 (UTC)