User:B1tw153/recoGNISer

From OpenStreetMap Wiki

The recoGNISer is software that automatically matches GNIS records to features in OSM and identifies discrepancies between OSM and GNIS data.

Origins of the Project

Secretary's Order 3404, issued on November 19, 2021, declared the word “sq___” to be derogatory and directed a series of Department of the Interior (DOI) actions to replace the word in all geographic names used by the federal government. As a result, the U.S. Board on Geographic Names issued a list of 650 GNIS records where names containing "sq___" were changed to more suitable names. Suddenly, any of these features that had been mapped in OSM were out of date and needed to be renamed. Starting in September of 2022, there were some ad hoc and semi-organized efforts to manually correct all of the OSM features whose names had been changed. Matt Whilden and I worked together on one of the semi-organized efforts.

Much of this work was tedious and the process went something like this:

  1. Start with a GNIS record for a feature that had been renamed
  2. Find the feature in OSM if it has been mapped
  3. If the feature has been mapped, update it with the new name
  4. If the feature has not been mapped, map it with the new name
  5. Rinse and repeat

Repeating these steps over and over led to some observations about where there might be opportunities to automate the process.

Finding Features in OSM

The obvious way to do this is to search OSM for the `gnis:feature_id` value, since the GNIS Feature ID is the unique and persistent identifier for every GNIS record. However, quite a few GNIS features are mapped in OSM but are either missing the `gnis:feature_id` tag or carry an incorrect value.

When searching by `gnis:feature_id` fails, we have to search for features based on their names and geometry and hope to find the relevant features in OSM. Sometimes there's a feature in OSM of the right type in the right place, but without the right tags. Sometimes, only part of the feature has been mapped in OSM (which happens frequently for waterways). But very often the feature is simply not mapped in OSM.
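The two-step search described above can be sketched as a pair of Overpass QL queries: one by `gnis:feature_id`, and a fallback by name near the GNIS coordinates. This is only an illustration, not the recoGNISer's actual matching code. The function names, the timeout, and the 2000 m search radius are assumptions chosen for the example.

```python
def escape_ql(value: str) -> str:
    """Escape backslashes and double quotes for an Overpass QL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def query_by_feature_id(feature_id: str) -> str:
    """Build a query for any node, way, or relation tagged with this GNIS ID."""
    return (
        "[out:json][timeout:60];\n"
        f'nwr["gnis:feature_id"="{escape_ql(feature_id)}"];\n'
        "out tags center;"
    )


def query_by_name_near(name: str, lat: float, lon: float, radius_m: int = 2000) -> str:
    """Fallback: build a query for same-named features near the GNIS location.

    radius_m is an arbitrary example value, not a setting taken from the recoGNISer.
    """
    return (
        "[out:json][timeout:60];\n"
        f'nwr["name"="{escape_ql(name)}"](around:{radius_m},{lat},{lon});\n'
        "out tags center;"
    )
```

A name search like this only finds features that already carry the right `name` tag, so features with the right type in the right place but the wrong tags would still need a separate search on primary tags and geometry.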

Updating Existing OSM Features

GNIS gives us both the `name` and `gnis:feature_id` values, so we can update the feature to include these tags.

GNIS also gives us a location for features mapped as points (e.g. summits) or areas (e.g. lakes), and a start and end point for linear features (e.g. valleys or streams). We can use this data to confirm that the feature is mapped in the right place. This is particularly helpful for mapping waterways where it's easy to mistakenly map a waterway up the wrong tributary, which puts the name in the wrong place on the map.

Although the GNIS Feature Class doesn't always map directly to OSM tagging, we can use it to make sure the primary tags on the feature are reasonably correct.
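As a rough illustration of these two checks, the sketch below compares a GNIS coordinate with an OSM feature's position using the haversine distance, and tests whether the OSM tags are plausible for a GNIS Feature Class. The class-to-tag table is a small illustrative sample and the 500 m tolerance is an arbitrary assumption; neither is taken from the recoGNISer.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

# Illustrative sample only: plausible OSM primary tags for a few GNIS Feature Classes.
EXPECTED_TAGS = {
    "Summit": {("natural", "peak")},
    "Lake": {("natural", "water")},
    "Stream": {("waterway", "stream"), ("waterway", "river")},
    "Valley": {("natural", "valley")},
}


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_matches(gnis: tuple, osm: tuple, tolerance_m: float = 500.0) -> bool:
    """True if the OSM position lies within tolerance_m of the GNIS position."""
    return haversine_m(gnis[0], gnis[1], osm[0], osm[1]) <= tolerance_m


def tags_plausible_for_class(feature_class: str, osm_tags: dict) -> bool:
    """True if any expected primary tag for the class is present.

    Unknown classes return False, meaning the feature needs manual review.
    """
    expected = EXPECTED_TAGS.get(feature_class, set())
    return any(osm_tags.get(key) == value for key, value in expected)
```

For linear features, the same distance check could be applied separately to the GNIS start and end points against the ends of the mapped way.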

Mapping Missing Features

One major lesson of the sq___ renaming is that many GNIS records have never been mapped in OSM at all. During the renaming, we decided to map these features with their new names so that nobody would later map them under the old sq___ names.

The `name` and `gnis:feature_id` come directly from GNIS. The location or start and end points from GNIS are a starting point for mapping the feature, but linear features always need manual editing. And it helps to verify the location of features mapped as nodes too.

Again, the GNIS Feature Class gives us a good idea what the tags should be in OSM. But because the categorization in GNIS doesn't always map directly to OSM's data model, this usually needs manual editing too.

Development

The initial stages of development focused on using GNIS records to identify OSM features. This code was then adjusted to handle some special cases and to prefer false negatives over false positives.

Because much of the editing requires manual review, the next stage of development was to have the code output MapRoulette challenges. Some changes to OSM are relatively easy to make based on the GNIS data (e.g. name updates), so the MapRoulette challenges can be created with cooperative tasks that automate the updates in JOSM.
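To give a feel for the kind of file a MapRoulette challenge can be built from, here is a minimal sketch that writes one GeoJSON FeatureCollection per line, with one task per GNIS record. It assumes MapRoulette's line-by-line GeoJSON upload format and leaves out the cooperative-task payload entirely. The function name and the choice of properties are hypothetical, not the recoGNISer's actual output.

```python
import json


def task_line(feature_id: str, name: str, lat: float, lon: float) -> str:
    """Serialize one task as a single-line GeoJSON FeatureCollection."""
    feature = {
        "type": "Feature",
        # GeoJSON coordinate order is longitude, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"gnis:feature_id": feature_id, "name": name},
    }
    return json.dumps({"type": "FeatureCollection", "features": [feature]})


def write_tasks(records: list, path: str) -> None:
    """Write one task per line; each record is a dict with id, name, lat, lon."""
    with open(path, "w", encoding="utf-8") as out:
        for rec in records:
            out.write(task_line(rec["id"], rec["name"], rec["lat"], rec["lon"]) + "\n")
```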

The MapRoulette workflow is good for interactive editing, but the pace can be relatively slow when there are a large number of similar changes to make. So, the recoGNISer can also output OsmChange XML files that bundle all the changes together. This works well for limited tasks like adding large numbers of missing features in bulk, where each of the features can still be manually reviewed in JOSM.
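The general shape of an OsmChange file that creates new nodes can be sketched as follows. This is a simplified illustration and not the recoGNISer's output: the `generator` value is a placeholder, the record layout is hypothetical, and a real file may need further attributes or tags before an editor accepts it. New objects use negative placeholder IDs, which is the usual convention for features that do not yet exist in OSM.

```python
import xml.etree.ElementTree as ET


def build_osmchange(records: list) -> str:
    """Return OsmChange XML that creates one tagged node per record.

    Each record is a dict with lat, lon, and a tags dict (hypothetical layout).
    """
    root = ET.Element("osmChange", version="0.6", generator="example-sketch")
    create = ET.SubElement(root, "create")
    for index, rec in enumerate(records, start=1):
        # Negative IDs mark objects that are new and not yet uploaded.
        node = ET.SubElement(
            create, "node", id=str(-index), lat=str(rec["lat"]), lon=str(rec["lon"])
        )
        for key, value in rec["tags"].items():
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")
```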

Using the Software

The recoGNISer is a command-line executable that takes a portion of a GNIS data file as input, matches the GNIS records to OSM by querying an Overpass server, and outputs one or more of the result file types. The current GNIS data set has records for nearly one million individual features. Of these, about 40% have been mapped in OSM. In theory, you could ask the recoGNISer to process the entire GNIS data set. However, that would take a lot of CPU time for both the recoGNISer and Overpass, and the output would contain more data than any one person would likely want to work with. In practice, it makes more sense to use the recoGNISer to process a small subset of the GNIS data set.

The typical workflow is:

  1. Identify an interesting subset of GNIS data that is small enough to work with. This is often a single GNIS Feature Class in a single US state. But sometimes it makes sense to work with several GNIS Feature Classes in a single county, or all the GNIS records in a smaller area.
  2. Extract a subset of the GNIS records as the starting data set. The recoGNISer package includes a PowerShell script that can slice GNIS data files by Feature Classes, states and counties, or a bounding box.
  3. Run the recoGNISer to process the data. It's best to use a private Overpass instance for this so that the recoGNISer isn't abusing the public Overpass servers.
  4. Upload the GeoJSON output to MapRoulette to create a new challenge, or open the OsmChange XML output in JOSM.
  5. Manually review and edit each of the features, either by working through the tasks in MapRoulette or by reviewing each feature in JOSM.
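Step 2 of the workflow above, extracting a subset, amounts to filtering a delimited text file. The package ships a PowerShell script for this; the Python sketch below only illustrates the idea and is not that script. It assumes a pipe-delimited file with a header row, and the column names `feature_class` and `state_name` are assumptions that should be checked against the header of the actual GNIS file.

```python
import csv
from typing import Iterable, Iterator


def slice_gnis(
    lines: Iterable[str],
    feature_class: str,
    state: str,
    class_col: str = "feature_class",  # assumed column name
    state_col: str = "state_name",  # assumed column name
) -> Iterator[dict]:
    """Yield rows whose Feature Class and state both match exactly."""
    # QUOTE_NONE keeps quote characters inside names from being treated as quoting.
    reader = csv.DictReader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row.get(class_col) == feature_class and row.get(state_col) == state:
            yield row
```

With a real file, this could be called as `slice_gnis(open(path, encoding="utf-8-sig"), "Summit", "Montana")`; the `utf-8-sig` encoding strips a byte-order mark if the file starts with one.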

Source Code

The official recoGNISer repository is on GitHub. The repository also includes detailed instructions for configuring the software and using the command-line options.