Talk:National Hydrography Dataset
Dec 2012 Cleanup Request and Notes
Added below are notes from an email from Paul Norman in Dec 2012. Jeffmeyer 07:06, 31 December 2012 (UTC)
---------- Forwarded message ----------
From: Paul Norman <email@example.com>
Date: Sat, Dec 29, 2012 at 5:20 AM
Subject: RE: NHD testing for King County?
To: Jeff Meyer <firstname.lastname@example.org>
No one should be importing data with it. The current best route appears to be to make it available through snapshot-server as a vector layer in P2 (Potlatch 2). Unfortunately, JOSM doesn't have great vector layer support for cases like this. I talked about this in Nov at http://lists.openstreetmap.org/pipermail/talk-us/2012-November/009515.html. Note that the snapshot-server instance is not currently running, as it only runs if I manually start it.
After having worked a lot with it, I can confidently say that a wholesale import of NHD would be a colossally stupid idea. It would only take me a few hours to finish the few remaining issues with the translations, and then a few days of CPU time to convert all of NHD and make it available as .osm files, but that's only a small part of the picture. The main current blockers for proposing it as an import are:
- snapshot-server: There are some indexing and efficiency issues with snapshot-server that impact ways with a large number of nodes (e.g. >40k). There are also some other issues that impact data sets this large (i.e. larger than OSM is right now).
- Way simplification: It is essential that the geometries are simplified, either with pre-processing or post-processing. http://lists.openstreetmap.org/pipermail/talk-us/2012-October/009359.html and the surrounding discussion detail this. Pre-processing offers other advantages, so I'm leaning towards that.
- ogr2osm: ogr2osm does not have a way to specify a PostGIS datasource. OGR supports these; it's just a matter of specifying them.
- Translations: The translations could use verification. As there are no well-established tags for some of the features in NHD, the tagging will require discussion.
- Policy: The current import policies do not clearly address imports using vector layers. I expect work on these policies to be on the DWG's workplan for 2013.
- P2: P2 background layers only support bounding boxes to indicate coverage, not polygons. Depending on deployment issues (which in turn depend on simplification issues and how ogr2osm performs with 32 GB of RAM), this may be a moderate or a severe issue. Vector layers are also not well exposed.
- P2 docs: P2 docs on vector layers are lacking.
- Server resources: NHD is a large data set. Without simplification and including FCode 46003 streams, it is 10x-15x the size of planet-latest.osm. I hope to bring this down to a reasonable size, but it's still likely to be a 0.5 TB or larger database.
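The email doesn't say which simplification algorithm is intended. One common choice is Ramer-Douglas-Peucker; the sketch below is a minimal illustration of it, not Paul's pre-processing code. It assumes coordinates have already been projected to a planar system in meters and that `tolerance` is a value the importer picks; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y), assumed already projected to meters


def _segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed way's endpoints
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], tolerance: float) -> List[Point]:
    """Ramer-Douglas-Peucker: keep only vertices farther than `tolerance` from the simplified line."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    # Explicit stack instead of recursion, so very long ways can't hit the recursion limit.
    stack = [(0, len(points) - 1)]
    while stack:
        start, end = stack.pop()
        max_dist, index = 0.0, start
        for i in range(start + 1, end):
            d = _segment_distance(points[i], points[start], points[end])
            if d > max_dist:
                max_dist, index = d, i
        if max_dist > tolerance:
            keep[index] = True
            stack.append((start, index))
            stack.append((index, end))
    return [p for p, k in zip(points, keep) if k]
```

For example, `simplify(coords, tolerance=1.0)` would drop vertices that deviate less than about a meter from the simplified line. The explicit stack is a deliberate choice given the >40k-node ways mentioned above.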
It seems that it would be good to get this in as soon as possible, as people are adding waterways, which is both potentially unnecessary work, and a potential source of conflicts on an import. Is there anyone who has a good suggestion on how to get started - surely most of the needed code is already written somewhere, and would just take some tweaking for the various types of tags, etc.? --Liber 00:45, 30 January 2009 (UTC)
I would like to begin adding NHD data to OSM, but do not know the standard process. Can someone outline the steps necessary to go from downloaded NHD shape files to OSM data? - Jared Campbell 22:43, 21 June 2009 (UTC)
It would seem to me that waterway=canal would be better than waterway=drain for CanalDitch. The NHD description is:
Artificial open waterway constructed to transport water, to irrigate or drain land, to connect two or more bodies of water, or to serve as a waterway for watercraft
--Liber 01:34, 30 January 2009 (UTC)
- The problem with waterway=canal and waterway=ditch is that NHD doesn't differentiate between the two. Some areas have many ditches close together. Canals are rendered the same way as rivers and tend to clutter up a map if there are too many of them. I would suggest tagging everything as a ditch and then manually changing the true canals, i.e. those you can't cross by jumping over them. --Srmixter September 30, 2010
- I spent some time putting together a Frequency count of FCodes observed in a distributed sample set to get an idea of what's actually in the dataset, to guide us. When I get some more time, I'll go look at the rare features and maybe render some samples of them. --Davetoo 18:59, 16 August 2009 (UTC)
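A frequency count like the one Davetoo describes could be reproduced with something along these lines, given attribute records already read from the shapefiles (the reading step is left out). The `FCODE` field name and the 1% cut-off for "rare" are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def fcode_frequencies(records: Iterable[Dict[str, object]], field: str = "FCODE") -> Counter:
    """Count how often each FCode appears in a sample of NHD feature attributes."""
    return Counter(rec[field] for rec in records if rec.get(field) is not None)


def rare_fcodes(counts: Counter, max_share: float = 0.01) -> List[Tuple[object, int]]:
    """FCodes that make up at most `max_share` of the sample, rarest first."""
    total = sum(counts.values())
    rare = [(code, n) for code, n in counts.items() if n / total <= max_share]
    return sorted(rare, key=lambda item: item[1])
```

The rare list is the natural place to start when rendering samples of unusual features.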
Etowah River Watershed
I have converted the Etowah River Watershed (147876) to OSM format, and am planning on uploading it shortly as a test. Please let me know if anyone has any objections. --Liber 22:44, 14 February 2009 (UTC)
- I am starting the upload. It looks as though it will take me a week or so to complete it, though. Please look at the data (which is to the NW of Atlanta), and let me know if you have any comments on the OSM conversion. --Liber 14:49, 15 February 2009 (UTC)
Managing the process/data
I've been working on some ideas to help manage the complexity of this project; one of the things I've done this weekend is create a map of the 18 continental HUC regions.
There are 334 basins within those 18 regions; a basin might be a good chunk size for people to sign up for if they're doing manual conflict resolution with JOSM, etc. I'm putting together a spreadsheet with the codes and names for those basins, a link to The Map centered on each of the basins, and fields for user sign-ups, status, etc. However, I'm not sure that a table of that size would really "work" here in the wiki. Of course I could break it into 18 separate tables. What I'm wondering is how people would feel about using a shared Google spreadsheet to manage this?
There are 2,104 sub-basins in the lower 48; that's not going to fit on the main page as we're doing now :).
--Davetoo 04:42, 9 August 2009 (UTC)
- Thanks for your work on this! I'd take some basins in Michigan to merge into the already existing data, if someone can upload the NHD data for them. While I don't mind a Google spreadsheet, it might be nice to have that kind of data in the Wiki as well. Maybe one subpage per region with a table of all basins in that region? --Abgandar 15:55, 9 August 2009 (UTC)
- I'm thinking of three pages to list the regions (East, Central, and West) with six tables each. I don't want to make the hierarchy too deep to navigate easily, but this should stop the wiki warnings about too much text to edit. Then we can figure out how we want to keep track of completed sub-basins. --Davetoo 23:11, 11 August 2009 (UTC)
- Sounds good as well, that way it'll also be easier to look for the basins in a specific area.--Abgandar 18:09, 14 August 2009 (UTC)
I've got a new page up, the first of the three, covering the five Western regions. Let me know what you think. I should have time to finish the other two over the weekend. I do realize that this format doesn't allow for "signups" for areas smaller than a basin. Let me know if y'all think we need something more granular. I'm also looking for scalable ideas to record the status of all 2,100+ sub-basins. --Davetoo 04:53, 14 August 2009 (UTC)
- This looks great! I don't think we need higher resolution. I think mappers could indicate the sub-basins they've completed in the notes. Jumbanho 22:41, 15 August 2009 (UTC)
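One scalable way to record the status of all 2,100+ sub-basins is to key it by 8-digit HUC and roll it up by prefix, since each HUC-8 code contains its region (first 2 digits), subregion (4) and basin (6), e.g. 02070010 sits in basin 020700 and region 02. A minimal sketch; the status strings and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def huc_levels(huc8: str) -> Dict[str, str]:
    """Split an 8-digit HUC into the nested region/subregion/basin/sub-basin codes."""
    if len(huc8) != 8 or not huc8.isdigit():
        raise ValueError(f"expected an 8-digit HUC, got {huc8!r}")
    return {"region": huc8[:2], "subregion": huc8[:4], "basin": huc8[:6], "subbasin": huc8}


def basin_progress(status: Dict[str, str]) -> Dict[str, str]:
    """Summarize per-sub-basin status as 'done/total' for each basin."""
    done: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for huc8, state in status.items():
        basin = huc_levels(huc8)["basin"]
        total[basin] += 1
        if state == "done":  # hypothetical status value
            done[basin] += 1
    return {basin: f"{done[basin]}/{total[basin]}" for basin in sorted(total)}
```

The per-basin summary could then go into the basin tables, while the full sub-basin list lives in a single flat file or spreadsheet.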
Providing pre-converted NHD files
I have one more question for the experts: How complicated would it be for someone to automatically convert each one of the 2,104 sub-basins into a corresponding zipped OSM file? We could then place a link to the already converted files in the table as well (if necessary, I can host the files on my server). That way, people who don't want to mess with the Python script and its dependencies can still contribute by just downloading the ready made OSM file for their region, merging it with existing data, and then uploading everything. This might speed up the overall import significantly. --Abgandar 03:35, 14 August 2009 (UTC)
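If the whole dataset were converted into one .osm file per sub-basin, packaging them for download could look roughly like this. It assumes a hypothetical layout where each converted file is named `<HUC8>.osm` in a single directory; the conversion itself is not shown, and the function name is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def package_subbasins(osm_dir: Path, out_dir: Path) -> List[Path]:
    """Zip each converted <HUC8>.osm file in osm_dir into its own <HUC8>.osm.zip in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for osm_file in sorted(osm_dir.glob("*.osm")):
        archive = out_dir / (osm_file.name + ".zip")
        # ZIP_DEFLATED compresses the XML; store only the file name, not the local path.
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(osm_file, arcname=osm_file.name)
        archives.append(archive)
    return archives
```

Usage would be something like `package_subbasins(Path("converted"), Path("downloads"))`, after which the table could link each sub-basin to its archive.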
- It would take somebody with the entire dataset on CD/DVD, for one thing. But I don't know that we've had enough peer review of the results of the work so far to know that we're doing the best we can do. For example, depending on the tools people are using, I've seen some imports that had the artificial paths on the centerline of polygonal waterways and water bodies. --Davetoo 04:53, 14 August 2009 (UTC)
- Getting all data from them somehow should not be too hard, I hope. As for the quality of the converted data: I also noticed some issues. But for exactly that reason I think it would be good if we used one common conversion script instead of each user building her/his own homebrew solution. That way, later (semi-)automated fixes will be much easier, as we would deal with data with a uniform structure (cf. the TIGER point data removal running right now).
- Could we maybe settle for a clearly defined subset of all the FCODEs that we know how to map to OSM features? Then later imports can deal with the left over funky objects (like artificial paths) once we have more experience with those.
- Looking at the map in Michigan, I think it would really help to get even just lakes and rivers imported rather quickly. People are investing lots of time in local areas, mapping those things manually. The longer we wait for a mass import, the more work will have to go into merging. Thus, time is wasted twice: once when mapping these features from Yahoo! satellite imagery and again when merging. --Abgandar 18:09, 14 August 2009 (UTC)
- For those without conversion resources and active mappers in their area of interest, contact me with a small number of sub-basins (less than 3). I'll download and create the .OSM files and get them to you for editing and upload. MikeN 13:53, 20 August 2009 (UTC)
- Regarding those artificial paths, the wiki strongly suggests putting artificial paths inside polygonal waterways, and it seems as though that would be useful for future routing; that is why I have been uploading them. Jumbanho 13:22, 24 August 2009 (UTC)
- Has anyone created a rules file for the Java shp-to-osm? Doesn't seem like it would be too hard to do if it hasn't been done. --Srmixter 19 December 2009
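As a starting point for the "clearly defined subset of FCODEs" idea, a translation could map only the FCodes we're confident about and return nothing for the rest, artificial paths included. The FCode meanings in the comments are believed to match the NHD FCode list but should be checked, and the tag choices (CanalDitch defaulting to a ditch, as suggested above) are open for discussion. This is a plain Python function, not the hook that ogr2osm or the Java shp-to-osm rules file actually expect; the `FCODE`/`GNIS_NAME` field names are assumptions.

```python
from typing import Dict, Optional

# Illustrative subset only: FCode -> OSM tags. Anything missing is left for a later pass.
FCODE_TAGS: Dict[int, Dict[str, str]] = {
    46006: {"waterway": "stream"},                         # Stream/River, perennial
    46003: {"waterway": "stream", "intermittent": "yes"},  # Stream/River, intermittent
    33600: {"waterway": "ditch"},                          # CanalDitch: default to ditch
    39004: {"natural": "water"},                           # Lake/Pond, perennial
    39001: {"natural": "water", "intermittent": "yes"},    # Lake/Pond, intermittent
    46600: {"natural": "wetland"},                         # SwampMarsh
}


def translate(attrs: Dict[str, object]) -> Optional[Dict[str, str]]:
    """Return OSM tags for a supported FCode, or None to skip the feature."""
    fcode = attrs.get("FCODE")
    if fcode not in FCODE_TAGS:
        return None  # e.g. artificial paths (55800) are deliberately skipped for now
    tags = dict(FCODE_TAGS[fcode])  # copy so the shared table is never modified
    name = attrs.get("GNIS_NAME")
    if name:
        tags["name"] = str(name)
    return tags
```

Keeping the table separate from the function makes it easy to review the tagging choices on the wiki without reading the code.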
I'm not able to do the whole conversion process. So I'm going to be manually importing features one at a time, just as if I were tracing from aerials. --NE2 22:23, 24 July 2010 (UTC)
Data Quality?
The already imported parts that I have seen so far bear only a remote resemblance to what aerial imagery shows. Based on that, I would strongly advise against importing this data; better to use it only as a source of river/stream names and trace the features from aerials or on the ground. --Lyx 13:24, 23 October 2010 (BST)
- It really depends on the area. --NE2 15:59, 23 October 2010 (BST)
Comments on process
I haven't imported any data myself, but have looked at some of the files, and intend this section to include comments that will be compiled in a new process document to assist new users in importing NHD data. This is specifically based on HUC_8 02070010 (in Potomac basin, Washington D.C. metropolitan area).
- Spatial accuracy can vary significantly, though the accuracy standards require "ninety percent of well-defined features to lie within 40 feet of their true geographic position" (40 ft is about 12 m). For my region, many waterbodies had the correct geometry (shape, scale) but were simply offset (~16 meters in many cases). Of course, you need to know the spatial accuracy of the aerial imagery you're using if you plan to correct for this. To improve the accuracy of the shape, consider using the ImproveWayAccuracy plugin for JOSM.
- Data recency can vary significantly: some features may be missing from NHD, and others may no longer exist in the real world (e.g. due to new developments). Check against recent aerial imagery if possible.
- Many swimming pools have been imported as natural=water since the same FCode is used. Change this to leisure=swimming_pool as appropriate.
-- Joshdoe 18:24, 20 June 2011 (BST)
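For the constant-offset case, shifting a way by a measured east/north offset in meters is simple arithmetic. A minimal sketch using a spherical-Earth approximation, which should be adequate for offsets of a few tens of meters; the offset itself has to be measured against imagery of known accuracy, and the function name is hypothetical. For scale, the 40 ft standard works out to 40 × 0.3048 = 12.19 m, so the ~16 m offsets noted above exceed it.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation
FEET_TO_M = 0.3048

accuracy_m = 40 * FEET_TO_M  # NHD's 40 ft horizontal standard, about 12.2 m


def shift_lonlat(points: List[Tuple[float, float]], east_m: float, north_m: float) -> List[Tuple[float, float]]:
    """Shift (lon, lat) points by a constant offset in meters (small offsets only)."""
    shifted = []
    for lon, lat in points:
        dlat = math.degrees(north_m / EARTH_RADIUS_M)
        # A degree of longitude shrinks with cos(latitude).
        dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
        shifted.append((lon + dlon, lat + dlat))
    return shifted
```

A mapper who measured a waterbody sitting 16 m too far south would call `shift_lonlat(coords, 0.0, 16.0)` on its nodes.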
- Before importing NHD data, consider contacting local environmental, stormwater, and GIS agencies which may have more recent and higher quality data that they are willing to share.
- Some features in NHD that are tagged as natural=water may no longer hold water on a regular basis due to changes in development, diversion of streams, etc. Perhaps these should be re-tagged as natural=wetland if land use clearly hasn't changed to residential, commercial, etc.
-- Joshdoe 18:32, 20 June 2011 (BST)
As of this writing, there are a few issues with conversion of the raw data still being worked out:
- The conversion work-flow generates nodes that contain only import-reference tags and not actual OSM tags (e.g. water=gross, etc.). There have been suggestions on the mailing list to remove these import-reference-only tags (e.g. nodes with only an fcode because the fcode had no OSM import mapping).
- The conversions are split by node, which may separate ways and create duplicate nodes. The final partitioning of the data may depend on the recommended workflow.
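Both issues could be handled by a post-processing pass over the .osm output. The sketch below assumes the import-reference tags share a prefix (`NHD:` here is hypothetical), that duplicate nodes have textually identical `lat`/`lon` attributes, and that the split pieces have already been combined into one file with non-colliding IDs. It only merges untagged nodes, so tagged features are never collapsed; a real pass would also need to drop consecutive repeated refs that merging can create within a way.

```python
import xml.etree.ElementTree as ET

REFERENCE_PREFIX = "NHD:"  # hypothetical prefix used for import-reference tags


def clean_osm(xml_text: str) -> str:
    root = ET.fromstring(xml_text)

    # 1. Strip tags from nodes that carry nothing but import-reference tags.
    for node in root.findall("node"):
        tags = node.findall("tag")
        if tags and all(t.get("k", "").startswith(REFERENCE_PREFIX) for t in tags):
            for t in tags:
                node.remove(t)

    # 2. Merge untagged nodes that share exactly the same coordinates.
    first_at = {}  # (lat, lon) -> id of the first untagged node seen there
    replace = {}   # duplicate node id -> surviving node id
    for node in list(root.findall("node")):
        if node.find("tag") is not None:
            continue  # leave tagged nodes alone
        key = (node.get("lat"), node.get("lon"))
        if key in first_at:
            replace[node.get("id")] = first_at[key]
            root.remove(node)
        else:
            first_at[key] = node.get("id")

    # Point way members at the surviving node.
    for nd in root.iter("nd"):
        ref = nd.get("ref")
        if ref in replace:
            nd.set("ref", replace[ref])
    return ET.tostring(root, encoding="unicode")
```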
One question that has come up is the intended use: import or editing source. The two uses would have different implications: import-targeted data needs to be ready for OSM with as few modifications after import as possible, while editing-ready data needs to be in a format that is friendly to the various editing tools. Personally, I am in favor of an editing-centric workflow, but let's see where the discussion goes. --Bsupnik 19:11, 20 June 2011 (BST)