Imports raise a complex set of issues within OSM. Past OSM imports have created problems at several levels, including the data itself and the community around it. This page outlines a few case studies and lessons learned from those imports, with the goal of bringing OSM newcomers up to speed on these issues.
You can find more detail in the past threads on the imports@ mailing list, found here: https://lists.openstreetmap.org/listinfo/imports. Here are some selected threads worth reading:
- Import guidelines - official or not? was: parcel data in OSM - thread at imports@ mailing list
General Import Problems
Not all data should be imported
Example: Parcel data
- Parcels change. A major task for a tax assessor's office is keeping the parcel layer up to date. Lots are merged and split all the time.
- Parcels are not "surveyable." You can't go look for the parcel lines on the ground or in an aerial to see if they are correct.
- Parcels aren't useful on a map. When you are going somewhere you go to an address not a parcel.
They are also not good for addresses, because sometimes a parcel has one address and sometimes hundreds (e.g. an apartment complex or mobile home park). Even when a parcel has only one address, a very large lot (a farm, for example) can make that address ambiguous.
Example: Terrain data
- Terrain shape is not mapped in OSM, with the exception of peaks (natural=peak, with some use of natural=hill) and elevation data (typically ele=*) on some objects.
Not all external data should be trusted
Example: Parcel data
Explanation of why it shouldn't be imported.
Imports might reduce mapper responsibility for/ownership of the imported data
Example: Imports: technical method & social impact - thread at imports@ mailing list.
Node Tag duplication
Example: Israel / Gaza Import
Example: KSJ2 Import - see: Japan KSJ2 Import - thread at imports@ mailing list.
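As an illustration only, the following sketch assumes that "node tag duplication" means member nodes carrying copies of tags already present on their parent way. The function name, the dictionary layout and the sample tags are hypothetical and are not taken from either import.

```python
def find_duplicated_node_tags(way_tags, member_nodes):
    """Return {node_id: tags} for nodes that repeat a key=value pair of the way.

    way_tags: dict of tags on the way.
    member_nodes: dict mapping node id -> dict of tags on that node.
    Both structures are hypothetical, chosen only for this sketch.
    """
    duplicates = {}
    for node_id, node_tags in member_nodes.items():
        # A tag counts as duplicated only if key AND value match the way.
        repeated = {k: v for k, v in node_tags.items() if way_tags.get(k) == v}
        if repeated:
            duplicates[node_id] = repeated
    return duplicates


# Made-up sample data.
way_tags = {"highway": "residential", "source": "example_import"}
member_nodes = {
    1: {"source": "example_import"},  # repeats the way's source tag
    2: {"highway": "crossing"},       # same key, different value: kept
    3: {},                            # untagged node
}
dups = find_duplicated_node_tags(way_tags, member_nodes)
```

A check like this could be run on a converted file before upload, so that repeated tags are stripped from nodes instead of being cleaned up by hand afterwards.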
The address catalog of Moscow, Russia (one of the layers of the Digital Atlas of Moscow), published by the city government in 2016, contains building outlines with a "document date" attribute. Dates after 2004 appear to reflect the date of official registration (roughly the construction completion date), but this cannot be verified for every object. It is therefore better not to copy such information: without a third-party source, there is no way to confirm that a specific date is correct.
The same address catalog of Moscow also contains a significant number of outlines of buildings that have not existed for several years. Those buildings were demolished or rebuilt, but the dataset does not reflect this: current regulation requires the government to publish open data, but sets no formal requirements for its quality and no responsibility for publishing poor-quality data.
Therefore, every outline should be checked against an independent source to confirm it still exists.
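A minimal sketch of that kind of triage is shown below. It assumes each outline is a plain dictionary with a hypothetical `id` field, that the unverifiable attribute is stored under a hypothetical `document_date` key, and that `confirmed_ids` comes from some independent check such as a ground survey. None of these names come from the Moscow dataset.

```python
# Attribute keys that cannot be verified independently (hypothetical name).
UNVERIFIABLE_KEYS = {"document_date"}


def triage_outlines(outlines, confirmed_ids):
    """Split outlines into (keep, review) and drop unverifiable attributes.

    outlines: list of dicts, each with at least an "id" key.
    confirmed_ids: set of ids confirmed to exist by an independent source.
    """
    keep, review = [], []
    for outline in outlines:
        cleaned = {k: v for k, v in outline.items() if k not in UNVERIFIABLE_KEYS}
        if outline["id"] in confirmed_ids:
            keep.append(cleaned)
        else:
            review.append(cleaned)  # needs a manual check before any upload
    return keep, review


# Made-up sample data.
outlines = [
    {"id": "a", "building": "yes", "document_date": "2006-05-01"},
    {"id": "b", "building": "yes"},
]
keep, review = triage_outlines(outlines, confirmed_ids={"a"})
```

Only the `keep` list would be candidates for import; everything in `review` stays out until someone has confirmed it.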
Other GIS data formats don't always translate well to OSM data formats
Example: large polygons
Massive Polygons - thread from imports@ mailing list.
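One concrete mismatch is size: OSM API 0.6 limits a way to 2000 nodes by default, while a polygon in another GIS format can have far more vertices. The sketch below is a hedged illustration and is not taken from the thread. It splits a closed ring of node ids into consecutive ways that share their end nodes; the function name and the sample ring are hypothetical.

```python
MAX_WAY_NODES = 2000  # default per-way node limit of OSM API 0.6


def split_ring(node_ids, max_nodes=MAX_WAY_NODES):
    """Split a closed ring (first id == last id) into ways of <= max_nodes nodes.

    Consecutive ways share one end node, so together they still form the ring.
    """
    if max_nodes < 2:
        raise ValueError("a way needs at least two nodes")
    if len(node_ids) < 2 or node_ids[0] != node_ids[-1]:
        raise ValueError("expected a closed ring")
    ways = []
    start = 0
    while start < len(node_ids) - 1:
        end = min(start + max_nodes, len(node_ids))
        ways.append(node_ids[start:end])
        start = end - 1  # reuse the last node as the first node of the next way
    return ways


# Made-up ring with 5000 distinct nodes, closed by repeating the first id.
ring = list(range(5000)) + [0]
ways = split_ring(ring)
```

The resulting ways would typically become the outer members of a multipolygon relation. Whether a polygon that large belongs in OSM at all is a separate question that splitting does not answer.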
Fixing later can be very difficult
It has also been shown that finding errors is far harder than entering new data. Starting with a clean slate and adding data is therefore easier than starting from a dataset and discovering its errors, especially when (as in this example) you would be starting from the same dataset.
In addition, fixing a badly done import is typically less interesting than entering new data.
An unmapped area is more inviting for new mappers than a sea of misaligned, badly imported, low-quality data.
Identifiers from external databases.
- Difficult to understand how to manage when modifying (e.g. splitting, merging) objects
Data Density Problems for Editors
As the data in OSM becomes richer and richer and information density increases, editing can become more difficult.
Past Import Lessons Learned
US: TIGER Import
Issues: see: TIGER_fixup
French Parcel Imports
Import page: French parcels
UK Ordnance Survey
Issues: Consider, for the moment, Ordnance Survey. A study was once done comparing OSM and OS in terms of data quality. The person running the study took OSM data, compared it to OS data (before OS data was released under a free license), and checked where they differed. He then went out and manually surveyed those areas.
It turned out that OS and OSM had approximately the same number of errors, but not the same errors. In other words, the error rate for the two datasets was the same. But here's the difference: Once you find an error in OSM, you can fix it.
US National Hydrography Dataset
US Government Land Ownership data
Data imported from http://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.BasicOwnership.zip should not be used to represent land use or vegetation cover without extensive correction. It represents a land ownership boundary rather than the actual border of the forest. The resulting forests have a chessboard appearance, probably derived from the allocation of land on a one-mile-square grid: there are regular squares of forest where none exists on the ground. For a good example, see the area NE of Grass Valley, California, U.S.A. This data needs to be edited to reflect ground reality, or tagged to represent land ownership rather than land use or vegetation cover. Sample way: https://www.openstreetmap.org/way/370128450#map=15/39.3965/-120.4460