Comparing OSM with other datasets

From OpenStreetMap Wiki

There are several methods of comparing OpenStreetMap with external datasets, such as authoritative government data (for example TIGER or Ordnance Survey), in order to find discrepancies between the two. Discrepancies can include missing features, differences in geographic location or shape, and differences in attributes such as missing or misspelled names. Reconciling these differences is the next step and belongs to the topic of conflation.

Methods and implementations

  • Create tiles for the external dataset with a simple style, such as a transparent background and a single color for the highways. These tiles can be overlaid on an OSM map to see which roads are missing in the external data, or shown under an OSM road layer or in an OSM editor to see which roads are missing in OSM.
    • The DC Sidewalk Project uses this type of method by generating tiles from government sidewalk data.
  • Similar to the above method, external data can be converted to OSM format and loaded into JOSM as an inactive layer behind the current OSM data. This also allows a form of conflation by merging missing features. TODO: Expand on this, from the "[Talk-us] Grand Junction + TIGER 2010" discussion.
  • Compare street names to find missing streets and spelling inconsistencies.
    • The OSM and OSL differences analysis from ITO World goes one step further by taking into account the approximate street location provided by the external dataset, the U.K. Ordnance Survey OS Locator data. OS Locator data consists of a list of street names and their approximate geographic locations.
    • OS Locator Musical Chairs also analyses OS Locator data, and shows discrepancies on a web map.
    • OslVosm is a script that similarly analyses OS Locator data for missing roads, missing names, and misspelled names, and produces GPX, KML, or MediaWiki-formatted output.
    • It would be useful to develop a generic toolset that could work with any data set, e.g. TIGER.
  • Find roads in one dataset that lie outside a buffer around the roads of the other dataset, and flag them as missing. See this page.
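
The name comparison and buffer methods in the list above can be illustrated with a minimal Python sketch. This is not the code used by any of the tools mentioned. The function names (compare_names, roads_outside_buffer), the similarity cutoff, the 20 m buffer, and the rule "at least half of the vertices outside the buffer" are all assumptions made for illustration. Coordinates are assumed to be already projected to a planar system in metres, and only vertices are tested against the buffer, so long segments between vertices are not checked.

```python
import difflib
import math


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def compare_names(external_names, osm_names, cutoff=0.85):
    """Sort external street names into matched, probably misspelled, and missing.

    The cutoff is an illustrative similarity threshold, not a recommended value.
    """
    osm_by_key = {normalize(n): n for n in osm_names}
    matched, misspelled, missing = [], [], []
    for name in external_names:
        key = normalize(name)
        if key in osm_by_key:
            matched.append(name)
            continue
        # Look for the single closest OSM name above the similarity cutoff.
        close = difflib.get_close_matches(key, list(osm_by_key), n=1, cutoff=cutoff)
        if close:
            misspelled.append((name, osm_by_key[close[0]]))
        else:
            missing.append(name)
    return matched, misspelled, missing


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def roads_outside_buffer(candidates, reference, buffer_m=20.0, min_outside=0.5):
    """Return ids of candidate roads that mostly lie outside the reference buffer.

    candidates and reference map a road id to a list of (x, y) vertices in metres.
    """
    segments = [
        (line[i], line[i + 1])
        for line in reference.values()
        for i in range(len(line) - 1)
    ]
    flagged = []
    for road_id, line in candidates.items():
        outside = sum(
            1
            for p in line
            if all(point_segment_distance(p, a, b) > buffer_m for a, b in segments)
        )
        if line and outside / len(line) >= min_outside:
            flagged.append(road_id)
    return flagged


external = ["High Street", "Station Road", "Mill Lane"]
osm = ["High Street", "Staton Road"]
matched, misspelled, missing = compare_names(external, osm)

osm_roads = {"osm1": [(0, 0), (100, 0)]}
external_roads = {"ext1": [(0, 5), (100, 5)], "ext2": [(0, 200), (100, 200)]}
flagged = roads_outside_buffer(external_roads, osm_roads)
```

With this sample data, "Station Road" is reported as a probable misspelling of "Staton Road", "Mill Lane" is reported as missing, and only ext2 is flagged as lying outside the buffer. For real datasets, a GIS library with a spatial index would likely be needed, since this sketch compares every vertex with every reference segment.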

See also