NHD Subbasin 01080105 Import

From OpenStreetMap Wiki

This page contains my findings and procedure for importing the National Hydrography Dataset data for the White River watershed in the state of Vermont, U.S.A.


I hope to perform this import in a way which best supports the OpenStreetMap community, particularly local mappers both current and future. Having worked with some data entered via imports (primarily TIGER and MassGIS), I believe there are a few ways in which I can learn from these experiences. My primary goal is to do my best to ensure that the imported data follows the K.I.S.S. principle as much as possible. I hope the result of this will be data which is simple for the casual mapper to work with and improve.

The first and primary thing I have noticed is that many attributes are often pulled into OSM that have little or no relevance to OSM's primary goal of creating a map. This data often has no meaning to the casual (and critical local) mapper and only serves to confuse and slow the mapping process.

One good example is IDs that do not identify features at a scale appropriate for OpenStreetMap. As ways are later joined or split, these IDs end up merged or otherwise mangled, which undermines their presumed benefit: the ability to link back to the source data at some point in the future. More importantly, automated edits are discouraged (as are many imports), so including IDs which serve only to facilitate those edits also seems like a bad idea. Another example is feature dimensions, which can (and should) be calculated from the feature itself.

A second problem arises when either too many very short ways or too few very long ways are imported, making the data hard to manipulate. This is made worse when the input data uses too many points.

Anyway, I hope this page serves to document my procedure and provide sound supporting arguments for my methods.

Identified Conflicts with existing OSM data

  • waterway=riverbank areas in Hartford leading to the mouth of the White River.
    • Solution: I have cut these areas from the input data before converting to OSM
    • Post-Import Cleanup: Based on aerial imagery, I will fill the small, intentional gap between my import data and the existing data, merging nodes as necessary.
  • Silver Lake in Barnard
    • Solution: Remove this data from my import data before running the import.
    • Post-Import Cleanup: Check for and resolve duplicate nodes on entering and exiting waterways.
    • Post-Import Cleanup: Potentially add NHD:ReachCode tag to the lake from my source data.
  • Skylight Pond
    • See Silver Lake, above.
  • Pomfret Border (ways 150612711, 150612707 and 150612706)
    • A section of Pomfret's border seems to be defined as the centerline of the river I will import. I am unsure of the technicalities of this.
    • Solution: Resolve directly with the user who recently modified these ways.
    • Post-Import Cleanup: Do any post-import cleanup as agreed on with the other mapper.
  • Pittsfield border with Rochester
    • Solution: After resolving the Pomfret issue, above, repeat in this instance as well.
    • Post-Import Cleanup: Follow process defined by the Pomfret issue, above.
  • Various waterway=dam nodes from import of GNIS data.
    • There are between 15 and 20 of these nodes
    • The chance that any of these nodes duplicates my import data seems minimal. Many of the nodes are not accurately placed to begin with.
    • These nodes can be discovered using XAPI.
    • I believe this can be taken care of as an optional post import cleanup step, converting them to ways (and possibly waterway=weir) and deleting them or moving them to their proper locations based on aerial imagery.
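The dam nodes are found with XAPI above. As an offline alternative, here is a minimal sketch that assumes you have already downloaded an OSM XML extract covering the watershed (the input is hypothetical). It lists every waterway=dam node along with its tags, so each one can be checked against aerial imagery during the optional cleanup.

```python
import xml.etree.ElementTree as ET


def find_dam_nodes(osm_xml):
    """Return (id, lat, lon, tags) for every node tagged waterway=dam.

    osm_xml is the text of an OSM XML document, e.g. a previously
    downloaded extract of the watershed (hypothetical input).
    """
    root = ET.fromstring(osm_xml)
    dams = []
    for node in root.iter('node'):
        # Collect the node's <tag k="..." v="..."/> children into a dict
        tags = {t.get('k'): t.get('v') for t in node.findall('tag')}
        if tags.get('waterway') == 'dam':
            dams.append((int(node.get('id')), float(node.get('lat')),
                         float(node.get('lon')), tags))
    return dams
```

Pass it the text of a saved .osm file. It returns one tuple per dam node, including any GNIS tags left over from the earlier import.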

Identifiers in N.H.D. - ReachCodes, ComIDs and Permanent IDs

In short, ReachCodes will be imported, while ComIDs and Permanent IDs will not.

In trying to follow a K.I.S.S. strategy while retaining a potentially useful link to the NHD database, I have chosen to import only the ReachCodes of elements. They will be imported as 'NHD:ReachCode'. This is based on my own experience with the data and on the advice quoted below. ReachCodes identify data at a scale well suited to OSM, so there should be little need to join ways with different ReachCodes, as happens with tiger:tlid values, for example.

I have reviewed a number of other NHD imports and have yet to see anyone import the Permanent ID, which exists to facilitate database replication. That makes it essentially irrelevant to OSM.

From the NHD FAQ:

A reach has both a reach code, common identifier (ComID), and a permanent identifier (PermID). What is the difference?

A reach code uniquely identifies each reach. This 14-digit code has 2 parts: the first 8 digits are the hydrologic unit code for the subbasin (formerly, known as cataloging unit) in which the reach exists; the last 6 digits are a sequence number assigned in arbitrary order to the reaches within that subbasin. Each reach code occurs only once throughout the Nation. Once assigned, a reach code is associated with its reach permanently. If a reach is deleted, its reach code is retired.

Reach codes facilitate geocoding or linking of observations, [...], to reaches. Reach codes form the basis of a national linear referencing system which supports linking such observations to a point along a reach, an entire reach, or groups of reaches.

Reach codes are stored in the data element named "ReachCode". [...]. The only link between NHD at different resolutions is the reach code. Once a reach is defined and assigned a reach code, only mapping errors and changes to the hydrography will cause the reach code to change. As reach codes change, they are tracked in a special NHD cross-reference table.

The common identifier (ComID) is a 10-digit integer value that uniquely identifies the occurrence of each NHD feature (including reaches). Each value occurs only once throughout the Nation. Once assigned, the value is associated permanently with its feature. When features are deleted or split or merged, their ComIDs are retired. The common identifier is stored in a data element named "COM_ID". ComIDs are different between medium resolution and high resolution. Changes to common identifiers are not tracked.

Permanent Identifiers (PermIDs) [...]

Can I link my own data to the NHD?

Yes, the best way to link data to the NHD is using the reach code. Reach codes are permanent, tracked, multi-scale, and form the basis of a controlled linear referencing system. [...]
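The FAQ describes the ReachCode as an 8-digit subbasin code followed by a 6-digit sequence number. The minimal sketch below splits a ReachCode into those two parts and checks that it belongs to subbasin 01080105. The helper names are hypothetical. Because this subbasin's code begins with a zero, ReachCodes must be handled as strings. Reading them as integers would drop the leading zero.

```python
def split_reach_code(reach_code):
    """Split a 14-digit NHD ReachCode into (subbasin HUC8, sequence number).

    First 8 digits: hydrologic unit code of the subbasin.
    Last 6 digits: sequence number of the reach within that subbasin.
    """
    code = str(reach_code).strip()
    if len(code) != 14 or not code.isdigit():
        raise ValueError(f"not a 14-digit ReachCode: {reach_code!r}")
    return code[:8], code[8:]


def in_subbasin(reach_code, huc8="01080105"):
    """True if the ReachCode belongs to the given subbasin (default: White River)."""
    return split_reach_code(reach_code)[0] == huc8
```

For example, in_subbasin() can flag any NHD:ReachCode value in the converted data that does not start with 01080105.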

For the NHDArea layer, there is one shape which encompasses the White River and six of its branches. The ComID for that shape is 174342708.

Attributes in the N.H.D. Source Data

  • FDate - Last Feature update date
    • Imported: No
    • User Import Page: Yes
    • Will not be imported as this value equals 2011-09-26 on every feature.
  • Resolution - Source resolution
    • Imported: No
    • User Import Page: Yes
    • Will not be imported, as this value equals 'High' for every feature; also, importing high-resolution NHD seems to be standard.
  • GNIS_ID - Unique identifier assigned by GNIS
    • Imported: Yes, where exists.
    • Imported As: 'NHD:GNIS_ID'
    • User Import Page: No
    • This corresponds to USGS GNIS, which uses the 'gnis:feature_id' tag. Once I have run the import, I will begin to create river relations, one for each of the primary rivers. I will move the 'NHD:GNIS_ID' on each individual way to a single 'gnis:feature_id' tag on the relation, which will ensure that this is a 1:1 correspondence.
  • GNIS_Name - Proper name ... by which a particular geographic entity is known
    • Imported: Yes, where exists
    • Imported As: 'name'
    • User Import Page: No
    • Maps nicely to the name of the way. Also used for stream/river classification; see waterway=(river/stream) classification, below.
  • FTYPE - Feature Type Text Field String enumeration
    • Imported: No
    • User Import Page: Unlikely
    • NHDFlowline layer:
      • Equals 'ArtificialPath' where vector is inside of a feature in nhdwaterbody or nhdarea. Equals 'StreamRiver' otherwise. See FTYPEs and FCodes
    • NHDArea layer:
      • Equals 'StreamRiver' for the one imported feature in this layer.
    • NHDWaterbody layer:
  • FCode - int representing Feature Type & characteristics
    • Imported: No
    • User Import Page: Unlikely
    • NHDFlowline layer:
      • The two values in the source data are 55800 and 46006, corresponding to 'ArtificialPath' and 'StreamRiver' in FTYPE, respectively. See FTYPEs and FCodes
    • NHDArea layer:
      • Equals '46006' for the one imported feature in this layer.
  • LengthKM - See Dimension Attributes, below
  • AreaSqKm - See Dimension Attributes, below
  • Elevation - Vert. distance from a given datum
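The GNIS_ID plan above can be sketched as follows: after import, build one river relation per primary river and move the per-way NHD:GNIS_ID values to a single gnis:feature_id on that relation. This minimal sketch only groups ways by NHD:GNIS_ID and proposes relation tags. It assumes ways are given as (way_id, tags) pairs (a hypothetical input format) and assumes type=waterway relation tagging. Actually creating the relations and removing NHD:GNIS_ID from the ways, e.g. in JOSM, is not shown.

```python
from collections import defaultdict


def plan_river_relations(ways):
    """Group ways by NHD:GNIS_ID and propose tags for one relation per group.

    ways: iterable of (way_id, tags_dict) pairs (hypothetical input format).
    Returns {gnis_id: (sorted member way ids, proposed relation tags)}.
    """
    groups = defaultdict(list)
    names = defaultdict(set)
    for way_id, tags in ways:
        gnis_id = tags.get('NHD:GNIS_ID')
        if not gnis_id:
            continue  # ways without a GNIS_ID stay out of any relation
        groups[gnis_id].append(way_id)
        if tags.get('name'):
            names[gnis_id].add(tags['name'])

    plan = {}
    for gnis_id, members in groups.items():
        # Assumed tagging: a type=waterway relation carrying the GNIS ID once
        rel_tags = {'type': 'waterway', 'gnis:feature_id': gnis_id}
        # Copy the name only if every member way agrees on it
        if len(names[gnis_id]) == 1:
            rel_tags['name'] = next(iter(names[gnis_id]))
        plan[gnis_id] = (sorted(members), rel_tags)
    return plan
```

Each GNIS ID maps to exactly one proposed relation, which gives the intended 1:1 correspondence between gnis:feature_id and river.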

Dimension Attributes (such as LengthKM and AreaSqKm)

These will not be imported, as there is no equivalent in OSM and the values can be calculated from the shape itself.

    • NHDArea
      • As there is only one shape in the input data for this layer, I will simply note here that the value is 6.226 in the source data.
      • I will break up that one big area into multiple smaller areas and remove a couple of portions that already exist in OSM, so that value will be wrong anyway.
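To illustrate that a length attribute such as LengthKM can be recomputed from the geometry, here is a minimal sketch that sums haversine great-circle distances along a way. It treats the Earth as a sphere, which is an approximation and not necessarily the method NHD used, so small differences from the source values are expected.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def way_length_km(coords):
    """Length of a way given as [(lat, lon), ...]: the sum of its segment lengths."""
    return sum(haversine_km(*p, *q) for p, q in zip(coords, coords[1:]))
```

A data consumer can derive the same number from the way's nodes at any time, and it stays correct after the way is edited, which a stored LengthKM tag would not.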

NHDFlowline data

The NHDFlowline data represents all linear water features.

Data Conversion and Import Process

  1. Start with the 'NHDFlowline' shapefile.
  2. Using GRASS, import 'NHDFlowline' and simplify the shapes. Many minor streams in the input data have a very large number of vertices; simplifying reduces the vertex count by 43%. Run the following command:
    v.generalize input=NHDFlowline output=NHDFlowline_simp method=douglas threshold=0.000015 -c
  3. Export the simplified shapefile from GRASS.
  4. Use ogr2osm to convert simplified shapefile to OSM file. Using the translation method defined below, run the following command:
    python ogr2osm/ogr2osm/ogr2osm.py -t ogr2osm/nhdflowline.py -o converted/nhdflowline.osm simplified_shapes/NHDFlowline_simp/NHDFlowline_simp.shp
  5. Open the OSM file in JOSM and save it. Do not upload yet. This simply re-saves the OSM file with line breaks, which the next step needs.
  6. Using the merge-ways.pl tool, merge ways with matching attributes. Repeat the following steps, using the previous output as input, until satisfied (I ran them 4 times):
    1. Run merge-ways.pl with the tags_compatible routine modified as listed below.
    2. Open the output in JOSM and run the validator on the entire data set. If merge-ways.pl actually merged any ways, the validator will report 'Duplicated Way Nodes'. Use the 'Fix' option to fix these.

ogr2osm translation

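def translateAttributes(attrs):
    """ogr2osm translation hook: map NHDFlowline attributes to OSM tags."""
    if not attrs:
        return

    tags = {}

    # GNIS_Name may be empty/null; normalise it to an empty string
    gnisName = (attrs.get('GNIS_Name') or '').strip()

    # OSM 'name' tag
    if gnisName:
        tags['name'] = gnisName

    # OSM 'waterway' tag: names ending in "River" are rivers, all else streams
    if gnisName.endswith("River"):
        tags['waterway'] = 'river'
    else:
        tags['waterway'] = 'stream'

    # OSM 'NHD:GNIS_ID' tag
    if attrs.get('GNIS_ID'):
        tags['NHD:GNIS_ID'] = attrs['GNIS_ID']

    # OSM 'NHD:ReachCode' tag (kept as a string to preserve the leading zero)
    if attrs.get('ReachCode'):
        tags['NHD:ReachCode'] = attrs['ReachCode']

    # OSM 'source' tag: the value is not recorded on this page, so
    # SOURCE_VALUE is a hypothetical placeholder to fill in before running
    SOURCE_VALUE = None
    if SOURCE_VALUE:
        tags['source'] = SOURCE_VALUE

    return tags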

merge-ways.pl tool

The tags_compatible routine in merge-ways.pl needs to be modified as follows. I probably don't actually need the waterway check, only the ReachCode comparison, but it won't cause any harm.

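sub tags_compatible {
    # Two ways may be merged only if both of these tags match.
    # $x/$y avoid shadowing Perl's special sort variables $a and $b;
    # "// ''" treats a missing tag as an empty string.
    my ($x, $y) = @_;
    return ($x->{waterway} // '') eq ($y->{waterway} // '') &&
           ($x->{'NHD:ReachCode'} // '') eq ($y->{'NHD:ReachCode'} // '');
}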
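The validator step reports 'Duplicated Way Nodes' after merging, which suggests that merge-ways.pl appends one way's nodes to another's without dropping the node they share. That is an inference; the script's internals are not documented here. The minimal Python sketch below mirrors the compatibility test and shows how a naive end-to-start join produces a node repeated consecutively. It then removes the repeat, which is roughly what the JOSM 'Fix' does for that warning. All function names are hypothetical.

```python
def tags_compatible(a, b):
    """Python mirror of the modified Perl routine: same waterway and ReachCode."""
    return (a.get('waterway', '') == b.get('waterway', '') and
            a.get('NHD:ReachCode', '') == b.get('NHD:ReachCode', ''))


def join_ways(nodes_a, nodes_b):
    """Naively append way B's node list to way A's.

    Requires A's last node to be B's first node; the shared node then
    appears twice in a row in the result.
    """
    if nodes_a[-1] != nodes_b[0]:
        raise ValueError("ways are not connected end-to-start")
    return nodes_a + nodes_b


def drop_consecutive_duplicates(nodes):
    """Remove consecutive repeats of the same node id."""
    out = []
    for n in nodes:
        if not out or out[-1] != n:
            out.append(n)
    return out
```

Only consecutive repeats are removed, so a way that legitimately revisits a node later (for example a closed loop) is left intact.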

Additional attributes not imported from the source data:

  • FlowDir - Direction of flow relative to vector
    • Not imported as all values equal 'WithDigitized', meaning all ways point in direction of flow, matching OpenStreetMap standards.
  • WBAreaComID - ComId of equivalent waterbody in nhdwaterbody or nhdarea layer
    • Not imported, as ComIDs are generally not being imported; also, OSM has no concept of layers.
  • WBAreaPer - ?
    • Value is empty/null for all features.
  • Enabled - "All features should be set to true"
    • Not imported as all values are 'True'. No equivalent in OSM.
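Several attributes above, such as FDate, Resolution, FlowDir, Enabled and WBAreaPer, are dropped because they hold a single value across the whole data set. A minimal sketch of that check follows. It assumes the features' attributes have already been read into dicts, for instance with an OGR-based reader that is not shown here. All features in a shapefile layer share the same fields, so a key missing from some features is not handled specially.

```python
def constant_attributes(features):
    """Return {attribute: value} for attributes identical on every feature.

    features: iterable of attribute dicts, one per feature. Empty strings
    and None count as values, so an all-null field is reported as constant.
    """
    seen = {}
    varying = set()
    for attrs in features:
        for key, value in attrs.items():
            if key in varying:
                continue
            if key not in seen:
                seen[key] = value  # first value observed for this attribute
            elif seen[key] != value:
                varying.add(key)  # differs somewhere, so not constant
    return {k: v for k, v in seen.items() if k not in varying}
```

Any attribute this returns carries no per-feature information and is a candidate for not importing.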

NHDArea data

The NHDArea data represents the surface areas (shapes) of the larger linear features, i.e. those that connect to the larger network (rivers).

Attribute | Value | Definition | Comments | On User Import Attribution Page
Elevation | (null/empty) | Vert. distance from a given datum | Will not be imported; there is no value to import. | No

NHDWaterbody data

The NHDWaterbody data is a collection of the isolated water features (ponds, lakes, wetlands, etc.)

Attribute | Definition | Import As | Comments | On User Import Attribution Page
ReachCode | Unique identifier; the first 8 digits are the subbasin code, the next 6 are unique within the subbasin | 'NHD:ReachCode' (copy value) | Will be imported, as this will allow me to merge ways having this key before import. Also retains a link to NHD at a level reasonable for OSM. | No
FTYPE | Feature Type Text Field String enumeration | - | Equals 'ArtificialPath' where the vector is inside a feature in nhdwaterbody or nhdarea; equals 'StreamRiver' otherwise. See FTYPEs and FCodes, below. | Yes
FCode | int representing Feature Type & characteristics | - | The two values in the source data are 55800 and 46006, corresponding to 'ArtificialPath' and 'StreamRiver' in FTYPE, respectively. See FTYPEs and FCodes, below. | Yes

waterway=(river/stream) classification

From looking at this data and from personal knowledge of the area, I know that there are no canals and few to no drains in this data. The vast majority of these features, probably 98%, are natural waterways (streams or rivers). The remaining 2% are ditches (waterway=ditch) on farms, or places where streams have been redirected under human development.

I decided not to use FTYPE/FCode to classify streams vs. rivers in this data. The only way I could have done that would be to map ArtificialPath as river and StreamRiver as stream. However, this would have meant that a pond fed by a stream, which NHD represents inside the pond as ArtificialPath, would have a stream leading into and out of it but a river inside it.

In the end, it turned out that using the value of the GNIS_Name attribute provided the best mapping for this. Any NHDFlowline lines which have a GNIS_Name ending in 'River' were mapped to waterway=river. Everything else is waterway=stream. This seems to work very well, based on my local knowledge.