Talk:Santa Cruz County, California
Note that we are using the convention "add more recent comments from the top down" on this Talk page. This keeps "recent at the top" and "scroll down to see older history" as the method to "see most current entries most easily."
As glebius (first name Gleb) and stevea (first name Steve) had a lengthy changeset discussion which Steve considers important and suggests it be moved here, I (Steve) am doing exactly that: copy-pasting that discussion here. Hopefully it will engage the wider Santa Cruz County OSM community (and that is MANY additional people besides Gleb and Steve!) to become involved in this discussion. Here goes:
Gleb included the following phrase in his changeset comment: "some minimal cleanup of exported polygon mess." And then the exchange begins:
Comment from stevea 3 days ago Gleb, I'm disappointed that after all of our good communication you continue to characterize the three versions of imported SCCGIS data which were carefully curated over several years as "exported polygon mess." We agree they are not "perfect," as very few things in OSM are. We agree they are (maybe about 98% or 99%) correct, as polygons and multipolygons are perfectly valid OSM data structures, and the data provided by the County are/were perfectly valid based on cadastral/parcel data. We agree that you "prefer" to use your JOSM tool reltoolbox to assist you in conflating edges, and Frederik Ramm, mailing lists and I agree that "to take perfectly valid data and convert it to another format as this tool does" is essentially a senseless waste of time. I understand that "you live here" (and for many decades, I have lived here, too) and want to see data improve in our County. However, it remains true that there ARE existing data surrounding us in OSM. If/as/when data are "just plain wrong," of course, I encourage you to correct them. But to disparage the existing, correct data with mean comments like "mess" is not in the spirit of OSM. Thank you for your improvements.
Comment from glebius 2 days ago The exported data is of very bad quality. I would dare to say of extremely bad quality, since even roads sometimes do not reflect reality. In this particular area the curve of Comstock Mill Rd didn't follow the real road. Robinridge Lane didn't even have a single point matching reality; it was going to the north-east of Comstock, crossing several private lots, while in reality it goes south of Comstock. I fixed that 6 months ago. The polygon data is also shit. The county data (at least the data they give to you) do not reflect reality. A wood where in reality there is a meadow - multiple examples. Vice versa - multiple examples. Soquel Demo Forest going over private property. I know it is incorrect, I can give you names of people whose land was mapped as SDF. Finally, even the most official, not outdated cadastral/parcel data is incorrect. Let me explain why. It is data of the last land survey that was converted to Lon/Lat in a very rough manner. The actual borders of lots are defined not by lat/lon but in a text manner, e.g. "600 feet from junction", "300 feet along canyon". When the county does surveys they encounter lots of problems - streams changed their curve, a road has moved due to mudslide, etc. After a survey is done, new data enters the database. I believe for surveys made in the last 10 years it has very good precision. But for some lots the last survey was performed in the 19th century! Until a survey is done, the actual border is where neighbors agree it to be. Where a fence is, or where a road or stream is now. Which means TRUTH ON THE GROUND. Not in the county database, sorry. We already discussed quality of data at database level: adjacent polygons overlap, are not connected to each other. An attempt to improve a forest borderline pulls in a ton of problems. This all just blocks editing of data by inexperienced mappers. So what is our goal: to fill the map with something so that the map looks nice when zoomed out?
Or maybe to maintain a copy of the county cadastral/parcel db? I believe our goal is to have the best damn map in the world, which is true and precise at any zoom level! In some regions of the world OSM is already damn best. In the SC county we are like stuck in a clinch, we are in a limbo. New mappers open data in JOSM, see the mess and run away. Those who are willing to resolve the mess are being asked to stay away. But still edits are coming in, so your endeavors to do continuous exports become more and more difficult. Maybe it is time to stop exporting? Exporting was a good thing 10 years ago, when OSM was all virgin and white. Now it does more evil than good. I really wish we would escalate this disagreement to a larger level, so that more people can participate.
Comment from stevea 2 days ago Gleb, there are two imports we are talking about here. One was OSM-US' TIGER import of roads and rail in 2007-8, which many agree was of poor quality, but we are more-or-less "stuck" with it, and the solution is to improve it with better-and-better developing strategies. Some people estimate it might take thirty years to clean up TIGER in the USA. OK, maybe it will.
The other import was the SCCGIS landuse import that nmixter performed (a friend of mine whom I and many others have reviled and even ridiculed for his poor and wide-ranging imports all over California). If you read our County wiki page, you'll see HE "made the mess of the initial import" and I am the one (with some others) who has spent many years, thousands of edits and countless hours improving these data to a state of "decency." (You have every right to disagree with that word). I have also said that when the present v3 discovers that SCCGIS offers newer data (2020? 2021?) and might become v4, I endeavor to enter the data with shared multipolygon boundaries where/as it makes sense to do so. This is highly ambitious, shows my continuing dedication to improving the map in ways that it naturally evolves, keeps me and others engaged in how best to improve our County data and opens the gigantic discussion of the difficulties about how best to conflate.
When you say "the exported data are of very bad quality" I don't know if you are talking about TIGER, SCCGIS or both. I think "both." I agree that TIGER's rural roads in the mountains are atrocious, closer to "a hallucination" than to reality. Please, as you know the reality better, fix these. I believe you are doing fine so far; I and the map greatly appreciate your good, solid work.
I, too, would like others to join this discussion. I, too, would like OSM to become "the best damn map in the world." I was NOT the person who imported TIGER, nor did I import the SCCGIS data; rather, I painstakingly improved the SCCGIS data from the hideous mess that arose from nmixter's "trigger happy import finger" as best I could, and this untangling took years of my best efforts. People like you who seem "closer to the action" (you live here, too, as you say) DO improve the map, and rightly so; I am very glad of that. It isn't that you and/or others are being asked to be "hands off" the imported data; I and others WANT you and others to improve the imported data, and you are. (You complain about it, I hear you loud and clear). What we asked you to be "hands off" about was the "reltoolbox" automation of polygon edge conflation that you were doing, which made bad data that were getting better MUCH more confusing, especially for novice editors (and we need all the new mappers we can get).
I agree with you that decades-old data like TIGER and SCCGIS imports were an early "first draft" to get SOME data into the map so it wasn't such a blank canvas. We are beyond doing such things today in areas as richly data-populated as our County (though I'd honestly call it "medium data populated" rather than "richly"). There are no new imports proposed, except the possible improvements a v4 SCCGIS landuse import MIGHT offer; we haven't had that discussion yet as newer data won't be available for a couple/few years. Simultaneously, we can and should improve these data with, yes, I agree with you 100%, a "truth on the ground" approach. Even our wiki says that the landuse import (again, not my idea, not my doing, though my passion to clean up, yes) was "a first step to getting some landuse data into the map" and that more details and better data would follow. That is exactly what you are doing, and correctly so.
Regarding "A wood where in reality there is a meadow - multiple examples. Vice versa - multiple examples." Here is my (partial) answer: there continues to be misunderstanding/debate about landuse vs. landcover. The SCCGIS data did tag as landuse=forest areas which were imported as TP (Timber Production, that is clearly "forest" as OSM means it). Sometimes, a clearing (meadow) could be seen upon this land (trees were felled in a timber forest, nothing strange about that) and so we would superimpose a landuse=meadow on top of that. Amazingly, mapnik/Standard (now Carto) rendered that quite pleasingly. Likewise, many areas which the County zones (remember, zoning is only a "first step" at accurate landuse mapping, and landcover is not landuse) as "farmland" may largely be covered with trees. Many of these areas are orchards, vineyards or simply "still trees" but the land COULD be used for agricultural cultivation, which technically makes the "landuse=farmland" 100% accurate, even as a visual/aerial/satellite/on-the-ground observation might say "hey, there's nothing but TREES here, why is this tagged FARMland?!" Because that's what its landuse actually is, that's why. Let's not (between simply the two of us) debate landuse vs. landcover, we are not going to solve it without the help of the much larger OSM community.
I suggest we take this to the Discussion page of Santa Cruz County's wiki: https://wiki.osm.org/wiki/Talk:Santa_Cruz_County,_California . That page hasn't been touched in 8-1/2 years and it is time to take it there so it is more public than in a Changeset Comment like here.
Honestly, I believe our goals are much more in line with each other than they are at odds with each other. We have some "chunks of junk" that need improvement/clean-up (just like TIGER, with much grumbling/complaining but also some good strategies, and we are cleaning that up, too). You are doing good work at cleaning up the imported data around here (both TIGER roads and SCCGIS landuse polygons). Yes, it is slow going, it will take years. (It took me about five years to clean up Nathan/nmixter's SCCGIS v1 data through v2 cleanup and v3 re-import). The map is a big place, the world is a big place. Our county is a finite amount of data in OSM. It is a substantial amount of data, but it isn't so large that a project like ours, with cooperation and consensus among its participants (like us), can't get it done. We CAN get it done and we ARE getting it done.
Best regards, Steve
stevea's reply to DanHomerick
Yes, nmixter has confirmed (http://wiki.openstreetmap.org/wiki/Santa_Cruz_County) that the intended uploads have concluded. Now it is up to the greater OSM community to adjust/edit/refine those data. The most important efforts will be the removal of duplicate data and the correction of data which are clearly wrong: misspellings which cause failure to render, such as Name= instead of name=, or Park_Type instead of park:type, and not following rules or agreed-upon guidelines/conventions, as with park:type=county_park vs. park:type=city_park, and so on. Next might be the identification and correction of data which are correct as officially uploaded, but which contradict actual facts on-the-ground, as for example a "public_facility" which has recently become an obvious public park. There are less-important categories of corrections and additions, too. But the ones identified are the most important after a large import such as what we just had happen in Santa Cruz County.
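As a rough illustration of the key-misspelling corrections described above, here is a minimal Python sketch. It assumes an object's tags are already available as a plain dictionary; the function name fix_tag_keys and the KEY_FIXES table are hypothetical and only cover the two misspellings named in this discussion. It is not a JOSM or OSM API feature, just one way the idea could be expressed.

```python
# Hypothetical table of misspelled keys -> agreed-upon keys.
# Only the two examples from this discussion are listed.
KEY_FIXES = {
    "Name": "name",
    "Park_Type": "park:type",
}


def fix_tag_keys(tags, key_fixes=KEY_FIXES):
    """Return a copy of `tags` with known misspelled keys corrected.

    If the corrected key already exists on the object, the misspelled
    key is left untouched so a human can review the conflict.
    """
    fixed = {}
    for key, value in tags.items():
        target = key_fixes.get(key, key)
        if target != key and target in tags:
            # Correct key is already present: keep both for manual review.
            fixed[key] = value
        else:
            fixed[target] = value
    return fixed


example = {"Name": "Example Park", "Park_Type": "county_park", "leisure": "park"}
corrected = fix_tag_keys(example)
```

Value-level problems (for example park:type=county_park where city_park is meant) are not handled here; those need local knowledge rather than a lookup table.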
It is my understanding that the 'special_use' areas MUST be manually annotated, as there was/is no obvious OSM landuse tag to map them to. These could be a wide variety of landuses, some of which may, and some of which may not, map to OSM landuse tags. Other sources and methods (like on-the-ground surveys) must be used to determine landuse on these parcels, and even if and when this can be determined and entered/corrected into OSM, it is possible OSM may not render the landuse tag chosen. A descriptive tag may be helpful even if it doesn't render, as "special_use" is not very helpful but something more specific likely is, especially as renderers get upgraded to include more landuses. OSM's attitude of "liberal (but be accurate) tagging" guides here.
We should delete stray nodes when we come across them. The Validator plug-in notices and can rather effortlessly clean these up, if you have it installed, pay attention to its notifications at upload time, and "Fix" (button) reported Errors.
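For anyone who wants to see what "stray" means in data terms, here is a minimal Python sketch, assuming the nodes and ways of an area have already been loaded into plain dictionaries (the function name find_stray_nodes and this in-memory layout are hypothetical, not part of JOSM or the Validator). It treats a node as stray when it has no tags and no way references it; nodes that are members of relations are not considered in this sketch and would need a separate check before deleting anything.

```python
def find_stray_nodes(nodes, ways):
    """Return sorted ids of nodes with no tags that no way references.

    nodes: dict mapping node id -> dict of tags (empty dict if untagged)
    ways:  dict mapping way id  -> list of node ids making up the way
    """
    # Collect every node id used by at least one way.
    referenced = {node_id for refs in ways.values() for node_id in refs}
    return sorted(
        node_id
        for node_id, tags in nodes.items()
        if not tags and node_id not in referenced
    )


nodes = {
    1: {},                      # untagged, part of way 100
    2: {},                      # untagged, part of way 100
    3: {},                      # untagged, unused -> stray
    4: {"amenity": "bench"},    # tagged, unused -> keep
}
ways = {100: [1, 2]}
stray = find_stray_nodes(nodes, ways)
```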
"Landuse areas that are apparently meaningless" is a matter of subjectivity. There are the official data which have been uploaded, there are what the OSM community thinks about these, and there are what the renderers do with them. Clearly the former and latter have tried to become one during and through the harmonization process of the import itself. It is the middle case of "what do we think of official data (polygons with landuse tags) that don't get rendered" that gives rise to a debate of "bulldoze as clutter" vs. "leave as factual data, perhaps for future renderers." And there are certainly other potential points of view that can be injected into this multifaceted discussion. Data in OSM which are not rendered do not "hurt" rendered maps. Some of those are just plain wrong (or are duplicates, or outdated/obsolete), and those ought to be removed. CORRECT data which do not render certainly do cause some visual clutter while editing (e.g. in JOSM), but if they are true and factual data, I believe they should be left in. Think about it this way: if a renderer comes along and suddenly becomes able to render these data, but you removed them, are you going to suddenly add them back in? Likely not. This is why OSM allows (and even encourages) rather liberal tagging, even (and especially) when it is not explicitly, currently, prettily rendered. It is not "some particularly subtle meaning that someone else might be able to figure out" but rather some particular future user/use/renderer that has not yet been imagined or which has not yet been written into software that will figure this out. stevea 19:54, 3 October 2009 (UTC)
Is the import, slated to be finished on Sept 20th, 2009, done?
Are there any plans to add meaningful tags to 'special use' areas in an automated way, or should we start manually annotating?
Any clue what happened here? There's a large landuse area that is marked as farm, when it couldn't possibly be. Is it really marked as agricultural in the original dataset, or was there an error in the translation to OSM format?
In the same area, you can see a large number of nodes that do not have tags and aren't part of any ways. Will there be any work done to automatically clean those, or should we manually delete them when we come across them?
And finally, what should be done with landuse areas that are apparently meaningless? Should we bulldoze them away as clutter, or is it preferred that we leave them, in case they have some particularly subtle meaning that someone else might be able to figure out? --DanHomerick 16:21, 21 September 2009 (UTC)