Import/Past Problems

From OpenStreetMap Wiki

Imports raise a complex set of issues within OSM. Past OSM imports have created problems at several levels, including the data and the community. This page outlines a few case studies and lessons learned from those imports, with the goal of bringing OSM newcomers up to speed on these issues.

You can find more detail in the past threads on the imports@ mailing list, found here: https://lists.openstreetmap.org/listinfo/imports. Here are some selected threads worth reading:


General Import Problems

Not all data should be imported

Example: Parcel data

  1. Parcels change. A major task for a tax assessor's office is keeping the parcel layer up to date. Lots are merged and split all the time.
  2. Parcels are not "surveyable." You can't go look for the parcel lines on the ground or in aerial imagery to see if they are correct.
  3. Parcels aren't useful on a map. When you are going somewhere, you go to an address, not a parcel.

Parcels are also a poor source of addresses: sometimes a parcel has one address, sometimes hundreds (e.g. an apartment complex or mobile home park). Even when a parcel has only one address, the location can be ambiguous if the lot is very large (a farm, for example).

Example: Terrain data

  1. Terrain shape is not mapped in OSM, with the exception of peaks (natural=peak, with some use of natural=hill) and elevation data (typically ele=*) on some objects.

Not all external data should be trusted

Example: Parcel data

Explanation of why it shouldn't be imported.


Imports might reduce mapper responsibility for/ownership of the imported data

Example: Imports: technical method & social impact - thread at imports@ mailing list.


Node Tag duplication

Example: Israel / Gaza Import

Example: KSJ2 Import - see: Japan KSJ2 Import - thread from imports@ mailing list.

Verifiability

The address catalog of Moscow, Russia (one of the layers of the Digital Atlas of Moscow), published by the city government in 2016, contains building outlines with a "document date" attribute. Dates after 2004 seem to reflect the date of official registration (roughly the construction completion date), but this cannot be verified for every object. It is therefore better not to copy information like this: without a third-party source, there is no way to confirm that a specific date is correct.

Stale Data

The address catalog of Moscow, Russia (one of the layers of the Digital Atlas of Moscow), published by the city government in 2016, contains a significant number of outlines of buildings that have not existed for a couple of years. Those buildings were demolished or rebuilt, but the dataset does not reflect this: current regulation requires the government to publish open data, but sets no formal quality requirements and assigns no responsibility for publishing poor-quality data.

Therefore, every outline should be checked against an independent source to confirm it still exists.
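As a minimal sketch of how such a check could be organized, assume each source outline carries an identifier and that a mapper has compiled the set of identifiers confirmed from an independent source (a survey or recent imagery). The `Outline` class, its fields, and `split_by_confirmation` are hypothetical names for illustration, not part of any OSM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outline:
    source_id: str      # hypothetical identifier taken from the source dataset
    document_date: str  # attribute as published; not independently verifiable


def split_by_confirmation(outlines, confirmed_ids):
    """Separate outlines confirmed by an independent source from the rest.

    confirmed_ids: set of source_id values checked against something other
    than the dataset itself (survey, recent imagery).
    """
    confirmed, needs_review = [], []
    for outline in outlines:
        if outline.source_id in confirmed_ids:
            confirmed.append(outline)
        else:
            needs_review.append(outline)
    return confirmed, needs_review


if __name__ == "__main__":
    candidates = [
        Outline("b-1", "2006-03-01"),
        Outline("b-2", "2011-09-15"),
        Outline("b-3", "2014-05-20"),
    ]
    ok, review = split_by_confirmation(candidates, {"b-1", "b-3"})
    print([o.source_id for o in ok])      # outlines with independent confirmation
    print([o.source_id for o in review])  # outlines to hold back for checking
```

Only the confirmed list would be considered for upload; the rest stays out of OSM until someone has checked it.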

Other GIS data formats don't always translate well to OSM data formats

Example: large polygons

Massive Polygons - thread from imports@ mailing list.


Fixing later can be very difficult

Example:

And we've also shown that finding errors is far harder than entering new data. So starting with a clean slate and adding data is easier than starting from a dataset and discovering its errors, especially when (as in this example) you'd be starting from the same dataset.

In addition, fixing a badly done import is typically less interesting than entering new data.

An unmapped area is more inviting for new mappers than a sea of misaligned, badly imported, low-quality data.

Cross-referencing databases

Identifiers from external databases.

  • It is difficult to know how to maintain them when objects are modified (e.g. split or merged)
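The splitting case can be illustrated with a small sketch. It assumes that when a way is split, every tag (including the external identifier) is copied to both halves, so one external record now matches two OSM objects. The tag key `ref:example`, the dict layout, and both function names are hypothetical.

```python
from collections import Counter

EXTERNAL_ID_KEY = "ref:example"  # hypothetical tag key holding the external ID


def split_way(way, index):
    """Split a way at the node at `index`; both halves share that node.

    Assumption: every tag, including the external ID, is copied to both halves.
    """
    if not 0 < index < len(way["nodes"]) - 1:
        raise ValueError("split point must be an interior node")
    first = {"nodes": way["nodes"][: index + 1], "tags": dict(way["tags"])}
    second = {"nodes": way["nodes"][index:], "tags": dict(way["tags"])}
    return first, second


def duplicated_ids(ways, key=EXTERNAL_ID_KEY):
    """Return external IDs that appear on more than one way."""
    counts = Counter(w["tags"][key] for w in ways if key in w["tags"])
    return {value for value, n in counts.items() if n > 1}


if __name__ == "__main__":
    road = {
        "nodes": [1, 2, 3, 4],
        "tags": {"highway": "residential", EXTERNAL_ID_KEY: "X-100"},
    }
    a, b = split_way(road, 2)
    print(duplicated_ids([a, b]))  # the ID no longer identifies a single object
```

Merging has the opposite problem: two objects with different external identifiers become one, and it is unclear which identifier the result should keep.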

Data Density Problems for Editors

As the data in OSM becomes richer and richer and information density increases, editing can become more difficult.

Past Import Lessons Learned

US: TIGER Import

Import page:

Issues: see: TIGER_fixup

French Parcel Imports

Import page: French parcels

Issues:

Corine Data

Import page

Issues:

UK Ordnance Survey

Import page

Issues: Consider, for the moment, Ordnance Survey. A study once compared OSM and OS in terms of data quality. The researcher running the study took OSM data, compared it to OS data (before OS data was released under a free license), and checked where they differed. He then went out and manually surveyed those areas.

It turned out that OS and OSM had approximately the same number of errors, but not the same errors. In other words, the error rate for the two datasets was the same. But here's the difference: Once you find an error in OSM, you can fix it.


US National Hydrography Dataset

Import page

Issues: See: Talk:National_Hydrography_Dataset#Dec_2012_Cleanup_Request_and_Notes

US Government Land Ownership data

Data imported from http://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.BasicOwnership.zip should not be used to represent land use or vegetation cover without extensive correction. It represents a land ownership boundary rather than the actual border of the forest. The imported forests have a chessboard appearance, probably derived from the allocation of land on a one-mile-square grid. That is, there are regular squares of forest where none exists on the ground. For a good example, see the area NE of Grass Valley, California, U.S.A. This data needs to be edited to reflect ground reality, or tagged to represent land ownership rather than land use or vegetation cover. Sample way: https://www.openstreetmap.org/way/370128450#map=15/39.3965/-120.4460
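The retagging option could look roughly like the sketch below. It drops forest land-cover tags from an ownership polygon and adds whatever ownership tags the caller supplies. The function name is hypothetical, and the `ownership=national` value in the example is an assumption for illustration; check current tagging guidance before applying anything like this to real data.

```python
# Tag/value pairs that claim forest cover rather than ownership.
LAND_COVER_TAGS = {("landuse", "forest"), ("natural", "wood")}


def retag_as_ownership(tags, ownership_tags):
    """Return a copy of `tags` without forest land-cover tags, plus ownership tags.

    ownership_tags is supplied by the caller; this sketch does not decide
    which ownership tagging scheme is correct.
    """
    kept = {k: v for k, v in tags.items() if (k, v) not in LAND_COVER_TAGS}
    kept.update(ownership_tags)
    return kept


if __name__ == "__main__":
    before = {"landuse": "forest", "name": "Example National Forest"}
    # Assumed ownership tagging, for illustration only.
    print(retag_as_ownership(before, {"ownership": "national"}))
```

The other option, editing the geometry to match the real forest edge, cannot be automated this way and requires imagery or a survey.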

Alternate/Non-Import Approaches to Adding Data to OSM Maps

Map Layers / Overlays / Mashups

See also