USGS GNIS

From OpenStreetMap Wiki
Part of the United States mapping project.

The USGS Geographic Names Information System (GNIS) is a database that contains millions of names for geographic features in the United States and Antarctica. The system is run by the Board on Geographic Names, a United States Geological Survey group. It is the authoritative set of geographic names for the US. It contains features that appear on no other map or in any other spatial database.

Portions of the US GNIS data were bulk-imported into OSM in 2009. While GNIS data for natural features is generally of very good quality, GNIS data for administrative and man-made features was historically of poor quality, and vast swathes of incorrect data imported into OSM still need to be tracked down and corrected. Unfortunately, many GNIS records for administrative and man-made features were imported into OSM without regard to whether those features still exist, so there are tens of thousands of churches, schools, etc., that have long since disappeared, pre-dating the Interstate system in many (obvious) cases.

Status of GNIS imports

The initial imports of GNIS were in 2009. At this time, selected GNIS records in certain feature classes were imported into OSM and tagged as OSM features. However, the imports only covered a subset of GNIS data. In some feature classes (such as the Summit class), the majority of GNIS records were imported into OSM. In other feature classes, few or no GNIS records were imported. The bar chart below shows the observable progress in importing GNIS records as of January 2023.

[Figure: GNIS Features in OSM by Feature Class]

Note that many of the GNIS records that are "missing" from OSM may correspond to features that have been mapped independently without the use of GNIS data. In this case, the features may be present in OSM but without the gnis:feature_id=* or synonymous tags that would allow us to correlate the feature with a GNIS record. However, in many of the GNIS feature classes, very few GNIS records were imported at all!

In some cases, the decision not to import the GNIS records for certain feature classes may have been based on the limited geometry in GNIS records that does not match the expectations for mapping features in OSM. For example, records in the Stream class in GNIS have only two coordinates, one for the mouth of the stream or river, and another for its source. Understandably, it would make little sense to import these records without filling in the rest of the geometry. On the other hand, GNIS records in the Spring class have all the data necessary to map the features in OSM and yet only 11% of these records have been mapped. As of January 2023, there are more than 600,000 GNIS records from currently maintained feature classes for which there is no corresponding feature in OSM with a gnis:feature_id=* tag.
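The comparison behind figures such as the 11% for the Spring class can be sketched as a set lookup on Feature IDs. The following is a minimal illustration in Python, not the method used to produce the chart; the function name `import_coverage` and the sample records are hypothetical.

```python
from collections import Counter


def import_coverage(gnis_records, osm_feature_ids):
    """Return {feature_class: fraction of GNIS records whose ID is tagged in OSM}.

    gnis_records    -- iterable of (feature_id, feature_class) pairs
    osm_feature_ids -- set of IDs found in gnis:feature_id=* tags in OSM
    """
    totals = Counter()
    matched = Counter()
    for feature_id, feature_class in gnis_records:
        totals[feature_class] += 1
        if feature_id in osm_feature_ids:
            matched[feature_class] += 1
    return {cls: matched[cls] / totals[cls] for cls in totals}


# Hypothetical sample data: four GNIS records, three of them tagged in OSM.
sample_gnis = [("1", "Spring"), ("2", "Spring"), ("3", "Summit"), ("4", "Summit")]
sample_osm_ids = {"2", "3", "4"}
coverage = import_coverage(sample_gnis, sample_osm_ids)
```

A low fraction does not prove the features are absent from OSM, because features mapped independently carry no gnis:feature_id=* tag.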

Neither our environment nor the GNIS catalog of named natural features is static. USGS is continually improving the GNIS data set, correcting errors in records, reconciling duplicate records, updating records where names have changed, and adding new records as new features are named. In general, OSM has not kept up with these updates so there are many cases where the data imported into OSM in 2009 is now stale and no longer correct.

Legacy tags from GNIS imports

Like many other early imports, the GNIS imports brought a lot of additional tags into OSM. The only GNIS field that is useful in OSM is the Feature ID in the gnis:feature_id=* tag. This tag allows us to look up the corresponding GNIS record and compare the GNIS data to what's mapped in OSM. The majority of the other GNIS tags are direct copies of other fields from the GNIS records, and for the most part this information does not need to be in OSM.

Key(s) Notes
gnis:fcode Key used in some National Hydrography Dataset imports for the NHD FCode value. This value is useful in verifying that a hydrographic feature has been correctly mapped, but the more common key for this value is nhd:fcode=*.
gnis:ftype Key used in some National Hydrography Dataset imports for the NHD FType value. Some NHD imports put this value in the nhd:ftype=* tag. This value is redundant if the NHD FCode value is present.
gnis:feature_type, gnis:Class, gnis:class The GNIS Feature Class value. This is not typically useful information in OSM as the other tags on the feature will be more specific and appropriate. These keys can be safely deleted.
gnis:created The GNIS Date Created value, which is the date that the record was created in GNIS. This is not useful information in OSM and can be safely deleted.
gnis:county_id, gnis:County_num The GNIS County Numeric value, which is a numeric identifier of the county. OSM identifies which county a feature is in by area enclosure, so this can be safely deleted.
gnis:state_id, gnis:ST_num The GNIS State Numeric value, which is a numeric identifier of the US state. OSM identifies which state a feature is in by area enclosure, so this can be safely deleted.
gnis:county_name, gnis:County, gnis:county The GNIS County Name value, which is the name of the county. OSM identifies which county a feature is in by area enclosure, so this can be safely deleted.
gnis:ST_alpha, gnis:ST_alph, gnis:state The GNIS State Alpha value, which is the two-letter abbreviation of the US state. OSM identifies which state a feature is in by area enclosure, so this can be safely deleted.
gnis:name The GNIS Name value, which is the official name of the feature. For records in the currently maintained feature classes, GNIS is a definitive source of feature names, so if this value is different from the name=* value something might be wrong. For records in the archived feature classes, the accuracy of the GNIS names is questionable. In either case, if the value is correct it belongs in the name=*, alt_name=*, official_name=* or other similar name tag.
gnis:Cell The GNIS Map Name value, which is the name of the USGS topographic map containing the feature. This information is not needed in OSM, so this can be safely deleted.
gnis:edited The GNIS Edited Date value, which is the last date the GNIS record was edited (before the data was imported into OSM). This data is not useful in OSM and is likely stale, so it can be safely deleted.
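As an illustration of the table above, the keys described as safe to delete can be collected into a set and filtered out of a feature's tags. This is only a sketch: the function name `strip_legacy_gnis_tags` and the sample feature are hypothetical, and any real cleanup should be reviewed by a person rather than applied blindly.

```python
# Keys the table describes as safe to delete. gnis:name is deliberately
# absent because its value should first be compared with name=*, and
# gnis:fcode / gnis:ftype are absent because they carry NHD values.
SAFE_TO_DELETE = {
    "gnis:feature_type", "gnis:Class", "gnis:class",
    "gnis:created", "gnis:edited",
    "gnis:county_id", "gnis:County_num",
    "gnis:state_id", "gnis:ST_num",
    "gnis:county_name", "gnis:County", "gnis:county",
    "gnis:ST_alpha", "gnis:ST_alph", "gnis:state",
    "gnis:Cell",
}


def strip_legacy_gnis_tags(tags):
    """Return a copy of an OSM tag dict without the safe-to-delete GNIS keys."""
    return {key: value for key, value in tags.items() if key not in SAFE_TO_DELETE}


# Hypothetical feature as it might look after the 2009 import.
imported = {
    "name": "Example Peak",
    "natural": "peak",
    "gnis:feature_id": "12345",
    "gnis:county_id": "037",
    "gnis:created": "09/10/1979",
    "gnis:name": "Example Peak",
}
cleaned = strip_legacy_gnis_tags(imported)
```

The gnis:feature_id=* tag always survives, since it is the only link back to the GNIS record.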

Alternate keys containing the GNIS Feature ID

Historically, imports used several different keys for the GNIS Feature ID value. The gnis:feature_id=* tag was most common, but the gnis:id=* and ref:gnis=* tags were also used. The National Hydrography Dataset has GNIS Feature IDs in its records for named hydrographic features and these values were imported into the nhd:gnis_id=*, NHD:GNIS_ID=*, and other similar tags. The TIGER data sets have GNIS Feature IDs for civil boundaries and Census Designated Places and these values were imported into tiger:PLACENS=* and similar tags.

The "Cleanup and normalization of GNIS imports to only use gnis:feature_id for the id tag" effort by Watmildon in 2023 moved all the GNIS Feature ID values to the gnis:feature_id=* key. As of August 2023, only the gnis:feature_id=* tag should be used for GNIS Feature IDs.
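The key normalization can be pictured as follows. This sketch is not the procedure used in the 2023 cleanup; the function name `normalize_feature_id_key` is hypothetical, and the list of alternate keys is limited to the ones named in this section.

```python
# Alternate keys named above that historically held a GNIS Feature ID.
ALTERNATE_ID_KEYS = ("gnis:id", "ref:gnis", "nhd:gnis_id", "NHD:GNIS_ID", "tiger:PLACENS")


def normalize_feature_id_key(tags):
    """Return a copy of tags with alternate ID keys moved to gnis:feature_id.

    If an alternate key disagrees with an existing gnis:feature_id value,
    both tags are left in place so that a person can review the conflict.
    """
    result = dict(tags)
    for key in ALTERNATE_ID_KEYS:
        if key not in result:
            continue
        value = result[key]
        existing = result.get("gnis:feature_id")
        if existing is None:
            result["gnis:feature_id"] = value
        elif existing != value:
            continue  # conflicting IDs: keep both for manual review
        del result[key]
    return result
```

For example, `normalize_feature_id_key({"gnis:id": "123"})` yields a dict whose only key is `gnis:feature_id`.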

Sources of GNIS data

USGS provides access to GNIS data that can be used to update or correct existing features in OSM that already have the gnis:feature_id=* tag, or to add GNIS features that were not previously imported into OSM. It is important to understand the different GNIS data sets that USGS makes available. See "How can I acquire or download Geographic Names Information System (GNIS) data?" for an overview of the available data sets and data sources.

As of 2023, USGS makes GNIS data available in the following forms:

1. The Archived Data Set

This is the historical GNIS data format and contains records in the archived feature classes. The original GNIS imports in 2009 were based on the version of this data set available at that time. USGS archived this data set on August 25, 2021 and has not modified any of the data since then. The quality of data in the archived data set varies greatly by feature class. Features in the archived feature classes (see table below) had historically not been well maintained, and GNIS data for these features was often incorrect or out of date. Features in the other classes were well maintained and the data is generally of good or very good quality.

Many of the GNIS records imported into OSM in 2009 were from the now archived (then poorly maintained) feature classes. These records for churches, schools, post offices, other buildings, mines, towers, and other man-made or administrative features were often incorrect, so the data in OSM needs to be cleaned up. Some of the cleanup has already happened, but there are many examples of old, incorrect GNIS data still in OSM. On the other hand, GNIS records for natural features imported in 2009 were generally of very good quality. This data was correct at the time but may have been modified or updated by GNIS in the years since the original OSM imports. GNIS data for the archived feature classes can only be found in this archived data set. Records for the archived feature classes are no longer available in the other GNIS data sets.

The archived GNIS data can be found in the Archive folder in The National Map Staged Products Directory. This data set consists of several flat files:

  • NationalFile_20210825.txt (315 MB) and associated files for individual states which contain the main data set of domestic US names maintained by GNIS. These are the primary GNIS records in the archived data set.
  • NationalFedCodes_20210825.txt (22 MB) and associated files for individual states which contain the Census Code (formerly FIPS55 Place Code), Census Class Code (formerly FIPS55 Class Code), GSA Geolocation Code, OPM Duty Station Code, and INCITS 38:200x (Formerly FIPS 5-2) State codes and INCITS 31:200x (Formerly FIPS 6-4) county codes for features in the Populated Place, Civil, and Census classes. These files are primarily useful to cross reference GNIS features against other US government sources.
  • AllNames_20210825.txt (701 MB) contains alternate and historical names of GNIS features as well as notes about the historical sources of the names. To be useful, the records in this file need to be matched against records in NationalFile_20210825.txt using the Feature ID as a primary key.
  • Feature_Description_History_20210825.txt (27 MB) contains the additional Description and History text fields for records in NationalFile_20210825.txt, using the Feature ID as a primary key.
  • GOVT_UNITS_20210825.txt (253 KB) contains GNIS data identifying US states and counties (or their equivalent). These records also appear in the Civil class in NationalFile_20210825.txt.
  • ANTARCTICA_20210825.txt (6.5 MB) contains GNIS-maintained names for features in Antarctica in a data set that is distinct from the GNIS data for domestic US names.

The archived GNIS data also includes several flat files that are subsets of the primary set of GNIS records:

  • HIST_FEATURES_20210825.txt (23 MB) contains GNIS records for features that no longer exist. GNIS actively collects and maintains records of historical features. These features are also present in NationalFile_20210825.txt and can be identified by the "(historical)" text at the end of the name field.
  • POP_PLACES_20210825.txt (26 MB) contains GNIS records for all the features in the Populated Place class in NationalFile_20210825.txt. See below for discussion about the issues with this class of features.
  • US_CONCISE.txt (5 MB) is a subset of records from NationalFile_20210825.txt containing arbitrarily prominent features that GNIS suggests should be labeled on maps at a scale of 1:250,000 or smaller.
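The match on Feature ID described for AllNames_20210825.txt can be sketched with the standard csv module. This sketch assumes that the flat files are pipe-delimited with a header row and that the relevant columns are named FEATURE_ID and FEATURE_NAME; check the header of the downloaded files before relying on it. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_pipe_delimited(handle):
    """Read a delimited text file with a header row into a list of dicts."""
    # QUOTE_NONE keeps quotation marks inside names from being treated as CSV quoting.
    return list(csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE))


def attach_alternate_names(primary_rows, name_rows):
    """Group name rows by FEATURE_ID and attach them to the primary records."""
    names_by_id = defaultdict(list)
    for row in name_rows:
        names_by_id[row["FEATURE_ID"]].append(row["FEATURE_NAME"])
    joined = {}
    for row in primary_rows:
        feature_id = row["FEATURE_ID"]
        alternates = [
            name for name in names_by_id.get(feature_id, [])
            if name != row["FEATURE_NAME"]
        ]
        joined[feature_id] = {"name": row["FEATURE_NAME"], "alternate_names": alternates}
    return joined


# Hypothetical stand-ins for the primary file and the all-names file.
primary_text = "FEATURE_ID|FEATURE_NAME|FEATURE_CLASS\n100|Example Peak|Summit\n101|Sample Spring|Spring\n"
names_text = "FEATURE_ID|FEATURE_NAME\n100|Example Peak\n100|Old Example Mountain\n"

joined = attach_alternate_names(
    read_pipe_delimited(io.StringIO(primary_text)),
    read_pipe_delimited(io.StringIO(names_text)),
)
```

The same lookup pattern would apply to the Description and History file, which is also keyed on the Feature ID.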

2. The Current Data Set

In 2023, USGS made a new version of the GNIS records available in flat files. This version of the GNIS data is maintained and current, but omits all of the features in the archived feature classes. The quality of data in this data set is generally very good, partly because the problematic records in the archived feature classes are no longer present, but also because USGS has been updating GNIS records to improve location accuracy and correct other errors. If there were problems with a record in the archived data set, it is likely that the problems have been fixed in the current data set. And because these records are maintained, it is possible to contact USGS to get remaining errors corrected.

The current GNIS data can be found in the Geographic Names folder in The National Map Staged Products Directory. As with the archived data set, this data set consists of several flat files:

  • DomesticNames_National.txt (147 MB) and associated files for individual states which contain the main data set of domestic US names maintained by GNIS. These are the primary current GNIS records.
  • FederalCodes_National.txt (28 MB) and associated files for individual states which contain the Census Code (formerly FIPS55 Place Code), Census Class Code (formerly FIPS55 Class Code), GSA Geolocation Code, OPM Duty Station Code, and INCITS 38:200x (Formerly FIPS 5-2) State codes and INCITS 31:200x (Formerly FIPS 6-4) county codes for features in the Populated Place, Civil, and Census classes. These files are primarily useful to cross reference GNIS features against other US government sources.
  • FeatureDescriptionHistory_National.txt (15 MB) contains the additional Description and History text fields for records in DomesticNames_National.txt, using the Feature ID as a primary key.
  • GovernmentUnits_National.txt (249 KB) contains GNIS data identifying US states and counties (or their equivalent). These records also appear in the Civil class in DomesticNames_National.txt.

The current GNIS data also includes flat files that are subsets of the primary set of GNIS records:

  • HistoricalFeatures_National.txt (4 MB) contains GNIS records for features that no longer exist. GNIS actively collects and maintains records of historical features. These features are also present in DomesticNames_National.txt and can be identified by the "(historical)" text at the end of the name field.
  • PopulatedPlaces_National.txt (26 MB) contains GNIS records for all the features in the Populated Place class in DomesticNames_National.txt. See below for discussion about the issues with this class of features.

As of September 2023, USGS has not yet released the file containing the alternate and historical names for features. The "concise" subset of GNIS records is no longer available.

3. The National Map

USGS provides a web interface to The National Map which allows detailed searches for GNIS records based on attributes and/or the visible extent of the map. This is an excellent way to look up GNIS records by Name or Feature ID, or to search for GNIS records in a certain area by zooming and panning the map and selecting the "Visible in current extent" option. The data available from this web site is based on the current GNIS data set, including the alternate names, description, and history for each record. Records in the archived feature classes are no longer accessible on the USGS web site. GNIS still retains these records, but USGS does not provide access to them except in the archived data set.

4. The Geographic Databases

USGS also makes the current GNIS data available as ESRI Geodatabase (.gdb) and GeoPackage (.gpkg) files. These files are also available in the Geographic Names folder in The National Map Staged Products Directory. The Geodatabase and GeoPackage files contain all the data in the current flat files with one significant difference: these files may contain multiple coordinates for each GNIS record. Both the archived and current flat files contain Primary and Source coordinates for features.

For features mapped as a single point, only the Primary coordinate is present. Features mapped as areas also have only the Primary coordinate, typically at the place where the label would have been located on a USGS Topo map. Linear features such as waterways, valleys, etc. have both the Primary and Source coordinates. When USGS updated the GNIS schema, they made it possible for each GNIS record to contain a list of coordinates. The first coordinate in the list is still the Primary coordinate, and for linear features the last coordinate in the list is the Source coordinate. Current GNIS records now contain one coordinate for each of the USGS Primary series 1:24,000-scale quad maps in which the feature is present. So linear features that cross several quads now have one coordinate for each quad, and area features that span several quads likewise have one coordinate for each quad. The location of these intermediate coordinates is essentially at the "center" of the portion of the feature contained in the quad.

The additional coordinates for the GNIS records can also be viewed on The National Map web site.
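The ordering rule for coordinate lists (first is Primary, last is Source for linear features, the rest are per-quad coordinates) can be written down as a small helper. This is a sketch of the rule as described in this section only; the function name `split_coordinates` is hypothetical and it does not read the Geodatabase or GeoPackage files themselves.

```python
def split_coordinates(coordinates, is_linear):
    """Split a GNIS coordinate list into (primary, intermediate, source).

    coordinates -- non-empty sequence of (lat, lon) pairs in GNIS order
    is_linear   -- True for linear features such as streams and valleys
    """
    if not coordinates:
        raise ValueError("a GNIS record needs at least one coordinate")
    primary = coordinates[0]
    if is_linear and len(coordinates) > 1:
        # Linear feature: the last coordinate is the Source.
        return primary, list(coordinates[1:-1]), coordinates[-1]
    # Point or area feature: no Source coordinate.
    return primary, list(coordinates[1:]), None


# Hypothetical stream crossing three quads.
stream = [(47.0, -121.0), (47.1, -121.1), (47.2, -121.2)]
primary, intermediate, source = split_coordinates(stream, is_linear=True)
```

For a stream, `primary` corresponds to the mouth and `source` to the source, so the three points alone still do not describe the waterway's geometry.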

Populated Places

The Populated Place class of features in GNIS is intended to represent "a named community with a permanent human population, usually not incorporated and with no legal boundaries, ranging from rural clustered buildings to large cities and every size in between." However, the data in the Populated Place class in both the archived and current data sets is problematic for several reasons:

  • Many of the Populated Place records were interpreted from historical USGS topographic maps. On these maps, the labels for different types of features often only had very subtle differences in typography and there were many errors in interpreting the USGS map labels that resulted in unpopulated locales being recorded as Populated Places.
  • Many Populated Place records were derived from train stations that may or may not have been permanent communities.
  • Many Populated Place records were derived from other unreliable sources.

Both Wikipedia and OpenStreetMap have had problems with imported GNIS records of Populated Places. The Wikipedia:Reliability of GNIS data page describes some of the problems with GNIS Populated Place records imported as Wikipedia article stubs.

Not all Populated Place records in GNIS are bad. Where a Populated Place corresponds to an established community, the Populated Place record from GNIS should be mapped in OSM as a node with the gnis:feature_id=* from the Populated Place record. If the boundary of the incorporated community is mapped in OSM, the Populated Place node should be present in the boundary relation with the "label" role. However, many questionable Populated Place records are present in OSM tagged as place=hamlet. Both mvexel and samely have attempted to encourage cleanup of these features in OSM.

Cleaning up GNIS

Archived features

Some GNIS feature classes have been archived (see table below). Looking up these features on the USGS GNIS web site using the value of the gnis:feature_id=* or name=* tag will not find any results. However, these GNIS records are still available in the archived data set (see above).

The data quality of records in the archived feature classes was generally poor. If you encounter one of these features mapped in OSM, consider correcting it by:

  • Verifying all of the attributes using reliable current data sources
  • Correcting the location and geometry using current aerial imagery or other reliable sources
  • Removing features that are no longer present

If the feature in the real world no longer matches the data in the archived GNIS record (e.g. a former church that is no longer a place of worship), removing the gnis:feature_id=* tag is appropriate. However, if the feature in OSM simply corrects errors in the archived GNIS record (e.g. a post office that still exists and is in use but is now mapped at a more correct location than the information in GNIS), the gnis:feature_id=* tag should be retained.

Removing historical features

If you come across a feature that no longer exists in the real world, feel free to delete it.

As of 2023, roughly 2.6% of the features in the current GNIS database (or 7.4% in the 2021 data set) are designated as historical, typically meaning those features no longer exist. The GNIS name of historical features ends with "(historical)". Historical features were imported in 2009.

Example: "Leschi Glacier (historical)" (Feature ID 1522032) was destroyed following the eruption of Mount St. Helens in 1980. It was imported as node Leschi Glacier.
  • Many features in the archived classes of administrative and man made features no longer exist. Buildings, churches, schools, and hospitals that were in GNIS are frequently gone in the real world. Where the buildings remain, the names and purposes of the buildings recorded in the archived GNIS classes are frequently out of date.
  • Many features tagged landuse=quarry are not quarries. Most can be changed to historic=mine if they're obviously disused. The ones that are in fact quarries should be mapped as areas, possibly with disused=yes, but only if they are still visible on the ground. If it is not obvious that the location was ever a mine, use the "razed" namespace or similar, or just remove the feature.
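Because historical records are marked by the "(historical)" text at the end of the name, they can be picked out with a simple suffix test. The helper below is a minimal sketch with a hypothetical name, `split_historical`; it only inspects the name and cannot tell whether a feature without the suffix still exists on the ground.

```python
HISTORICAL_SUFFIX = " (historical)"


def split_historical(gnis_name):
    """Return (name without the suffix, True) for historical records,
    or (name unchanged, False) otherwise."""
    if gnis_name.endswith(HISTORICAL_SUFFIX):
        return gnis_name[: -len(HISTORICAL_SUFFIX)], True
    return gnis_name, False


base_name, is_historical = split_historical("Leschi Glacier (historical)")
```

Here `base_name` is the name under which the feature may still appear in OSM, which helps when searching for imported nodes that should be reviewed.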

Converting GNIS nodes to areas

While the GNIS dataset includes only nodes, some of the features it represents are better mapped as areas (e.g. islands, parks, buildings). When creating or editing an area that is also represented by a GNIS node, the GNIS tags should be copied to the area and the node should be deleted. In the JOSM editor the "paste tags" function is quite useful for this purpose.

Merging duplicate nodes

Some GNIS imports have created multiple redundant nodes for a single GNIS feature. (Example: node 288650069/node 356549620 and other summits in that area.) If there's no need for two nodes, then copy the tags onto the node you wish to keep and then delete the unnecessary nodes. In JOSM you can select the nodes and press "M" or go to Tools > Merge Nodes.
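The tag-copying step can be sketched as a merge that reports conflicts instead of silently overwriting values. This is an illustration only and not what JOSM does internally; the function name `merge_tags` and the sample tags are hypothetical.

```python
def merge_tags(keep, duplicate):
    """Merge the tags of a duplicate node into the node being kept.

    Returns (merged, conflicts). Keys present only on the duplicate are
    copied; keys with differing values are left as they are on the kept
    node and reported in conflicts as {key: (kept_value, duplicate_value)}.
    """
    merged = dict(keep)
    conflicts = {}
    for key, value in duplicate.items():
        if key not in merged:
            merged[key] = value
        elif merged[key] != value:
            conflicts[key] = (merged[key], value)
    return merged, conflicts


# Hypothetical pair of duplicate summit nodes.
kept_node = {"natural": "peak", "name": "Example Peak", "ele": "1200"}
extra_node = {"natural": "peak", "gnis:feature_id": "12345", "ele": "1201"}
merged, conflicts = merge_tags(kept_node, extra_node)
```

Any entry in `conflicts` needs a human decision before the unnecessary node is deleted.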

Adding newer features

A large number of entries were added to GNIS from May 2009 through 2013, after the import took place, so none of them were imported. Search GNIS for features in an area of interest, then sort descending by entry date to see missing features that you can map. You can also query Sophox for an interactive map of missing or untagged GNIS features in a given state.
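The same filter-and-sort can be applied offline to downloaded records. The sketch below assumes a date column named DATE_CREATED in MM/DD/YYYY form (the format used by the imported gnis:created tag); the column name, the function name `created_on_or_after`, and the sample records are assumptions to verify against the actual files.

```python
from datetime import date, datetime


def created_on_or_after(records, cutoff):
    """Return records created on or after cutoff, newest first."""
    dated = [
        (datetime.strptime(record["DATE_CREATED"], "%m/%d/%Y").date(), record)
        for record in records
    ]
    newer = [(created, record) for created, record in dated if created >= cutoff]
    newer.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in newer]


# Hypothetical records; only the last two postdate the 2009 import.
sample_records = [
    {"FEATURE_ID": "1", "DATE_CREATED": "09/10/1979"},
    {"FEATURE_ID": "2", "DATE_CREATED": "06/15/2010"},
    {"FEATURE_ID": "3", "DATE_CREATED": "01/02/2013"},
]
candidates = created_on_or_after(sample_records, date(2009, 5, 1))
```

Each candidate still needs to be checked against OSM, since a feature may have been mapped independently without a gnis:feature_id=* tag.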

Contributing fixes

One advantage of using the USGS GNIS data set is that USGS offers a way for the public to feed changes, additions, and deletions back into it. To facilitate this, all nodes were imported with the gnis:feature_id tag that corresponds to the FEATURE_ID column in the USGS database. This is the GNIS primary key and allows anyone to submit changes back to the GNIS public websites. When merging a GNIS-tagged map feature in OSM with a duplicate feature, be sure to include the gnis:feature_id tag in the merged feature.

The method to submit changes to GNIS data is through The National Map Corps, via their Web-based editor.

(In the past, you could email the GNIS administrator with improvements or to get feedback; but due to budget cuts, they no longer accept contributions to the GNIS database in this way.)

Tagging

Feature ID

The Feature ID uniquely identifies a feature in the GNIS database and is thus the most important thing to tag when relating an OSM feature to a GNIS feature. The tag gnis:feature_id is by far the most commonly used for this purpose. Other tags for GNIS Feature IDs include gnis:id and tiger:PLACENS.

Sometimes these Feature IDs have leading zeroes. The online GNIS interface handles leading zeroes just fine.

Multiple GNIS features can be represented by a single feature in OSM. The semicolon is most commonly used to list multiple IDs. In some cases these might be actual duplicates in the GNIS database (perhaps because the feature appears at slightly different coordinates on different maps).
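When comparing tag values against GNIS records, both the leading zeroes and the semicolon-separated lists need handling. The parser below is a minimal sketch with a hypothetical name, `parse_feature_ids`; it treats IDs as integers so that "01522032" and "1522032" compare equal, and it skips any part that is not purely numeric.

```python
def parse_feature_ids(tag_value):
    """Parse a gnis:feature_id=* value into a list of integer Feature IDs.

    Leading zeroes are dropped, duplicates are removed while keeping the
    original order, and non-numeric parts are ignored.
    """
    ids = []
    for part in tag_value.split(";"):
        part = part.strip()
        if part.isascii() and part.isdigit():
            number = int(part)
            if number not in ids:
                ids.append(number)
    return ids


single = parse_feature_ids("01522032")
several = parse_feature_ids("123; 0456;123")
```

Converting to integers is a design choice for matching only; the tag value in OSM does not need to be rewritten.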

Feature class

There is a single FEATURE_CLASS column in the data set that is a key for the type of record and for the OSM tags that were applied.

FEATURE_CLASS GNIS status OSM Tag(s)
Beach In use natural=beach
Cemetery Archived amenity=grave_yard
Church Archived amenity=place_of_worship
Cliff In use natural=cliff
Crater In use natural=crater
Dam Archived waterway=dam
Forest Archived landuse=forest
Geyser / Spring In use natural=geyser
Glacier In use natural=glacier
Harbor Archived waterway=dock
Hospital Archived amenity=hospital
Island In use place=island
Park Archived leisure=park
Post Office Archived amenity=post_office
Rapids In use waterway=rapids
Reservoir In use landuse=reservoir
School Archived amenity=school
Summit In use natural=peak
Tower Archived man_made=tower
Mine Archived landuse=quarry
Airport Archived aeroway=aerodrome
Bay In use natural=bay
Swamp In use natural=wetland + wetland=swamp
Woods In use natural=wood
Military In use landuse=military
Plain In use natural=heath
Building Archived Various tags based on building name, including amenity=library, amenity=townhall, amenity=public_building, amenity=fire_station, tourism=museum, and others.
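For scripted checks, the table above can be expressed as a lookup from feature class to GNIS status and import tags. The excerpt below copies only a few rows from the table and is not a complete mapping; the names `FEATURE_CLASS_TAGS` and `import_tags_for` are hypothetical.

```python
# Excerpt of the table: feature class -> (GNIS status, tags applied by the import).
FEATURE_CLASS_TAGS = {
    "Beach": ("In use", {"natural": "beach"}),
    "Summit": ("In use", {"natural": "peak"}),
    "Swamp": ("In use", {"natural": "wetland", "wetland": "swamp"}),
    "Island": ("In use", {"place": "island"}),
    "Church": ("Archived", {"amenity": "place_of_worship"}),
    "School": ("Archived", {"amenity": "school"}),
    "Mine": ("Archived", {"landuse": "quarry"}),
}


def import_tags_for(feature_class):
    """Return (status, tags) for a feature class, or None if it is not in the excerpt."""
    entry = FEATURE_CLASS_TAGS.get(feature_class)
    if entry is None:
        return None
    status, tags = entry
    return status, dict(tags)  # copy so callers cannot alter the table


mine_status, mine_tags = import_tags_for("Mine")
```

An "Archived" status is a hint that the OSM feature came from a poorly maintained class and deserves closer verification.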

Not all feature classes have been imported. Significant imports involving GNIS include:

  1. Many classes: The big 2009 import. More than a million features.
  2. "Stream" class: NHD imports have included the gnis:id tag (the Stream class was not imported in 2009).
  3. "Populated Place" class: Imported in 2007. See Changeset 85362 for one part of that.
  4. "Civil" class: From imports involving TIGER place boundaries with the tiger:PLACENS key. Done in 2009 (see Changeset 378277 as one small example). Commonly tagged on ways in some states and relations in others.

Other tags

The 2007 import included several additional tags:

  • gnis:Class = Feature Class name
  • gnis:County = County name
  • gnis:County_num = County FIPS code
  • gnis:ST_alpha = State name (2-Letter abbreviation)
  • gnis:ST_num = State FIPS code

The 2009 import included:

  • ele = Elevation from the GNIS record
  • gnis:county_id = County FIPS code
  • gnis:created = MM/DD/YYYY when the GNIS entry was created
  • gnis:state_id = State FIPS code

In some cases mappers have converted the GNIS county/state tags to is_in=* or addr=* format. Beware that GNIS tags often specify only one county/state per feature; if the feature crosses or forms a boundary, as many do, the GNIS tags may be insufficient to create proper is_in/addr tags.

Example: node Lemah Mountain is part of the boundary between Kittitas and King counties in Washington. The associated GNIS Feature [1] shows both counties; however, only Kittitas is mentioned in the feature's OSM tags.
