Fi:Maastotietokanta/Road Import Stage1 Plan

From OpenStreetMap Wiki

Finnish Road Import Stage 1 Plan


The goal of this import is to add the parts of the road dataset which DO NOT OVERLAP with existing road/building objects in OSM. Stage 1 only considers objects that have no overlap! The rest of the data will be handled later in a separate import. Conflation and other data joining will be performed at a later stage in another import (if/when that is deemed necessary and reasonably safe).

The dataset in this phase includes only the road centerline, highway=* (some manual classification will be necessary), ref=*, names (name=*, name:fi=*, and name:sv=*), surface=*, oneway=yes, bridge=*, tunnel=*, layer=* (when appropriate), route=ferry, and type=cable. The Samish names (only 263 in total) need to be handled separately from this import (due to the complexity of character and language encodings, it does not seem wise to include them in this stage of the import).


The checking and importing process will begin as soon as the proper copyright notice is in place. As checking and completing a reasonably sized subnet takes only a limited time, the whole stage 1 import is expected to complete in a few months, if not sooner.

Import Data


Data source site: (Roads with addresses is the subset of topographic-database used for this import). In addition, boundaries were trivially used during minor name cleanup in Ahvenanmaa prior to subnet generation. The exact data extract used is (at least currently) available from:
Data license: + attribution clarification:
Type of license (if applicable): CC-BY like with clarifications on attribution requirement (essentially: "OSM copyright page is enough like with the other sources")
Link to permission (if required):
OSM attribution (if required): and
ODbL Compliance verified: yes

Import Type

The import is one-time. Current (or very recent) OSM data is checked against the dataset so that empty areas in OSM are located and the corresponding road subnets for the empty parts are built. No conflation or similar data matching is necessary for this stage, as the objects should not match any geometry currently in OSM (unless the OSM data is very far off from the right position; such way-off OSM data will be found and handled in a later phase). Currently around 165k subnets have been located. The largest networks are around 600 km long in total, but the size drops off dramatically to more manageable networks. Most of the networks are rather small.
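As an illustration of what a "subnet" means here, the sketch below groups road segments into connected networks by shared endpoints using union-find. This is a simplified stand-in, not the actual process (which, as described under Data Transformation, builds subnet outlines with PostGIS unions and buffers); the function name and the input layout are assumptions made for the example.

```python
def group_into_subnets(segments):
    """Group segments into connected subnets.

    segments: dict mapping a segment id to its (start_node, end_node) pair.
    This layout is hypothetical and chosen only for the illustration.
    Returns a list of sets of segment ids, one set per subnet.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root
        # while shortening the path (path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Segments that share an endpoint end up in the same component.
    for start, end in segments.values():
        union(start, end)

    groups = {}
    for seg_id, (start, _end) in segments.items():
        groups.setdefault(find(start), set()).add(seg_id)
    return list(groups.values())


segments = {
    "a": ((0, 0), (1, 0)),
    "b": ((1, 0), (2, 0)),      # shares a node with "a"
    "c": ((10, 10), (11, 10)),  # isolated
}
subnets = group_into_subnets(segments)
```

With the sample input, "a" and "b" form one subnet and "c" forms another.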

.osm files are created for each subnet. The subnets are distributed among the mappers participating in the import using a modified OSM Tasking Manager (OSMTM). The data is compared against imagery (orthoimagery and/or Bing) where available (coverage is quite good, although our current TMS source lacks ~30% of the tiles for an unknown reason, which is to be fixed as soon as possible). Some road classification cannot be handled automatically by the output script and needs to be handled by the mapper. As the data contains a quite significant number of small geometry errors (unattached highway endpoints, at least, seem rather common), it's important that the JOSM validator is run for each subnet and that all errors are reviewed by the mapper before uploading.

The matching tool can also provide auto-validation data, and therefore manual validation in OSMTM is disallowed. If auto-validation fails for some particular subnet (unlikely but possible), the admin user can still set the validated state manually.

Data Preparation

Data Reduction & Simplification

The road geometries in the dataset are split at every intersection. Prior to generation of the .osm files, highway parts with identical tags are passed through ST_LineMerge to reduce object splits where possible. The initial impression was that there would be no over-noding; however, it was recently discovered that forest tracks in particular have quite a high number of nodes even on straight segments. Therefore a 0.2m/0.4m simplify step was added to reduce the number of unnecessary nodes. The effect on the geometries is very limited, and the lower value is selectively used for ways which don't match those forest tracks.

Removing information that is already in OSM is a key preparation step for this import. There should be no need for manual removal of information already in OSM in any of the subnets because we are essentially importing to empty canvases only.
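To give a feel for how little a 0.2m/0.4m tolerance changes a geometry, here is a minimal sketch of a Douglas-Peucker simplification in plain Python. It assumes planar coordinates in metres and assumes that the simplify step behaves like Douglas-Peucker (the algorithm behind PostGIS ST_Simplify); the plan itself does not name the exact function used, and the helper names are made up for the example.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates, metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop nodes closer than `tolerance` to the kept line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = point_segment_distance(points[i], first, last)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest node and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# A nearly straight forest track with two redundant nodes.
track = [(0.0, 0.0), (10.0, 0.05), (20.0, -0.1), (30.0, 0.0)]
simplified = simplify(track, 0.4)
```

Nodes that deviate from the straight line by only centimetres are dropped, while a real bend (a deviation of metres) survives the same tolerance.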

Significant exceptions to the no-overlap rule

Water bodies and snowmobile routes are not considered in the overlap checking.

Water bodies have largely been drawn from misaligned Landsat imagery (and will need to be fixed later, possibly by importing replacements of significantly higher quality), so an overlap with them very likely only means that the water is incorrect, not the road data.

The snowmobile crowd in Finland has been very active in the past two years (a good thing!), which means that there are a lot of snowmobile-accessible ways in OSM where the canvas is otherwise empty. Some of these snowmobile routes actually use ways that are in the dataset, but this is not always the case (the route could run next to the way, the GPS trace could be inaccurate, etc.). Therefore each such overlap needs to be addressed by the snowmobile community using survey, but that can only happen after the chicken-and-egg problem with the highways has been resolved by adding the highways, which are invisible under wintertime snow cover. Including snowmobile-enabled ways in the overlap detection was also tested earlier, and it reduced the effectiveness of the non-overlap detection quite significantly (as said earlier, there are already quite many snowmobile routes all around in OSM).

Tagging Plans

Highway and Ref

The highway classification in Finland is mostly based on ref=*, with some exceptions in city centers where the importance of a road conflicts with a high ref=*. However, almost all major roads already appear in OSM and will therefore not be included in the non-overlapping subnets in the first place. Thus, mostly highway=tertiary or lower in the hierarchy is expected to appear in the subnets (fewer than 2k intersection-to-intersection segments with primary/secondary appear in the subnets, probably mostly in the sparsely populated and sparsely mapped Northern Finland).

As highway classification cannot be fully performed from the dataset alone, some manual classification will be necessary after the automated process. Such ways appear as highway=road in the subnet .osm files.

Winter roads (class=12312) are excluded from this import. Such ways are mostly ice roads (i.e., shortcuts across ice during wintertime) and snowmobile routes. As snowmobile routes are already quite well covered in OSM and are intentionally excluded from the overlap checks, it would not be wise to duplicate them in this import. Sadly, it also means that ice roads cannot be handled in this stage but need to be done separately later (as far as we know, there is no way to differentiate these two in the source dataset).
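The exclusion described above amounts to a simple filter on the class code. The sketch below assumes each source feature is available as a dict with a "class" key; that layout, the function name, and the second class value in the sample are assumptions for illustration, and only the code 12312 comes from this plan.

```python
WINTER_ROAD_CLASS = 12312  # winter roads: ice roads and snowmobile routes


def drop_winter_roads(features):
    """Return the features that are not winter roads.

    features: iterable of dicts with a "class" key (hypothetical layout).
    """
    return [f for f in features if f.get("class") != WINTER_ROAD_CLASS]


features = [
    {"id": 1, "class": 12312},  # winter road, excluded
    {"id": 2, "class": 11111},  # placeholder code standing in for any other class
]
kept = drop_winter_roads(features)
```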


Ferries have their own class in the dataset and are mapped to route=ferry and type=cable. Non-cable ferries are excluded because their GPS-measurable routes often differ significantly from the route drawn on the map. Matching non-cable ferries can be done manually later, as it only affects a small number of objects (427 objects).


Ways are tagged with oneway=yes; when the dataset indicates the equivalent of oneway=-1, the geometry is reversed so that oneway=yes still applies.
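A minimal sketch of this normalization follows. The direction values "forward" and "backward" are hypothetical stand-ins for whatever the source field actually contains, and the function name is made up for the example.

```python
def normalize_oneway(nodes, direction):
    """Return (nodes, tags) so that any one-way restriction reads oneway=yes.

    nodes: list of coordinates in drawing order.
    direction: "forward", "backward" or None (hypothetical source values).
    """
    nodes = list(nodes)
    tags = {}
    if direction == "forward":
        tags["oneway"] = "yes"
    elif direction == "backward":
        # Equivalent of oneway=-1: flip the geometry instead of tagging -1.
        nodes.reverse()
        tags["oneway"] = "yes"
    return nodes, tags


way = [(0, 0), (5, 0), (5, 5)]
reversed_way, reversed_tags = normalize_oneway(way, "backward")
```

Reversing the node order keeps the tagging uniform, so data consumers never have to interpret oneway=-1.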

Bridges and tunnels

Bridges and tunnels are simple to match to the OSM model.


surface=paved and surface=unpaved will be used.

Names

The dataset provides name:fi=*, name:sv=*, and Samish names.

The name=* selection is somewhat complicated in Finland. Street name signs in Finland are laid out so that the topmost position is determined by the mother-tongue shares of the local population. In OSM we try to follow this custom when selecting the name=* tag. As the administrative borders in Finland are currently somewhat lacking in OSM (and some restructuring of administrative domains is currently under way in Finland too), determining which of the languages (name:fi=* or name:sv=*) should be selected is more complicated than it could be (at least some CC-BY datasets exist for the mother-tongue shares per place, but it would be undesirable to take on additional license dependencies at this point). Luckily, changing the selection of the name=* tag afterwards is quite trivial, and will be necessary anyway because of the upcoming administrative domain changes. It can easily be done in a separate task, so for the purposes of this import name:fi=*, the most dominant language, is used as the default. When there is no name:fi=* in the NLS dataset, name:sv=* is used automatically (these highways are in areas where Swedish has such a significant majority that Finnish names have not even been given).

Samish names are ignored in this import; only very few of them appear in the data in the first place (263 in total), and they can be dealt with separately later. Verifying the handling of Samish language encodings and the selection of language ISO codes would unnecessarily complicate the import (there are three recognized Samish variants in Finland).
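The default rule above (name:fi=* first, name:sv=* as fallback) can be sketched as follows. The function name and the sample street names are made up for illustration; Samish names are left out, as in this import.

```python
def select_names(name_fi=None, name_sv=None):
    """Build the name tags for a way from its Finnish and Swedish names."""
    tags = {}
    if name_fi:
        tags["name:fi"] = name_fi
    if name_sv:
        tags["name:sv"] = name_sv
    # Finnish is the default; Swedish is used only when no Finnish name exists.
    default = name_fi or name_sv
    if default:
        tags["name"] = default
    return tags


bilingual = select_names("Kirkkotie", "Kyrkogatan")
swedish_only = select_names(None, "Kyrkogatan")
```

Because name:fi=* and name:sv=* are both kept, a later task can re-select name=* per municipality without going back to the source data.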

Width and lanes data (not imported now)

The road classification system includes limited information on the number of lanes and a width range for the road. Because neither is a simple single value, their usefulness is limited, and the current width=* specification has no way to indicate a range.

Construction/Planned stage objects (not imported now)

There is a VALMAS field in the NLS dataset which indicates the completeness of an object. Value 1 probably means highway=construction and 3 possibly indicates highway=proposed, but the timestamps of those objects are quite old, which probably means they are stale anyway. It is better to handle these incomplete objects later (only around 500 objects). Besides, some of them are already in OSM anyway (such as the ref=7 construction).

Address data (not imported now)

The dataset also includes low and high addr:housenumber=*+addr:street=* values for each way (at intersection-to-intersection granularity, similar to addr:interpolation=* in OSM); however, the quality of the addresses has not been confirmed. Therefore it is better to postpone handling of the addresses to a later point. In addition, better-quality address datasets for Finland exist, but acquiring them is currently too expensive to be practical (it seems that license-wise the data might be ok).

Source dataset identification

We believe that there is no need to include arbitrary ID numbers from a foreign dataset (for matching in the future). The geometries themselves provide a much more accurate and well-defined identification of each object. Besides, there's no specification which would guarantee any stability of the foreign IDs, nor anything explaining how they behave when objects are changed.

Current import guidelines mandate a separate account for the import. We feel no need to have source attribution on each object because the dedicated import account should be enough to distinguish the import from other edits. Besides, such per-object attribution would just bloat the database unnecessarily.

Changeset Tags

Key          Value
comment      Import road subnet ID
source       Topographic Database
source:date  2013-08-27
import       yes

Data Transformation

PostGIS DB operations are used to calculate the subnets.


  1. Clean up NLS geometries that do not terminate at an intersection by splitting them (a few hundred objects)
  2. Remove very broken geometries with massive overlaps (24 objects). This step does not exclude objects which overlap fully.

Core process:

  1. OSM buildings are cleaned with ST_Buffer(..., 0) to avoid failures in overlap detection due to invalid geometries
  2. Overlap detection with 40m ST_DWithin "buffer" for highways and railways and without buffer for buildings. Operation is done per source data cell.
  3. ST_Union with 1m ST_Buffer for non-overlaps constructs outline of each subnet.
  4. Subnets are merged using another ST_Union when touching cell edges. Subnets not touching edges are passed on as is
  5. Components of the original dataset that fall within subnet outlines are calculated
  6. Tails connecting to the outside of a subnet are calculated and split so that at least a 40m buffer zone is retained. Some connecting ways would be rather long to draw manually if this step were not done (which was the original plan)
  7. Networks can be analyzed/dumped directly from the tables created by the previous steps; during dumping, 0.2m or 0.4m max simplification is applied to the geometries.
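The 40m overlap test in step 2 can be illustrated without a database. The sketch below mimics the semantics of ST_DWithin (true when two geometries are at most the given distance apart) for polylines in planar metre coordinates. It is an assumption-laden illustration, not the actual toolset: all function names are made up, and the real process works per source data cell on PostGIS tables.

```python
from math import hypot


def _point_segment_distance(p, a, b):
    """Distance from point p to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def _orient(a, b, c):
    """Signed area test: > 0 if c lies left of a->b, < 0 if right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(a, b, c, d):
    """True if segments a-b and c-d properly cross each other."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def segment_distance(a, b, c, d):
    """Minimum distance between segments a-b and c-d."""
    if _segments_cross(a, b, c, d):
        return 0.0
    # Without a crossing, the minimum is reached at one of the endpoints.
    return min(_point_segment_distance(a, c, d), _point_segment_distance(b, c, d),
               _point_segment_distance(c, a, b), _point_segment_distance(d, a, b))


def line_distance(line1, line2):
    """Minimum distance between two polylines given as lists of points."""
    return min(segment_distance(a, b, c, d)
               for a, b in zip(line1, line1[1:])
               for c, d in zip(line2, line2[1:]))


def non_overlapping(candidates, existing, buffer_m=40.0):
    """Keep candidate lines farther than buffer_m from every existing line."""
    return {cid: line for cid, line in candidates.items()
            if all(line_distance(line, other) > buffer_m for other in existing)}


osm_ways = [[(0.0, 0.0), (100.0, 0.0)]]
nls_roads = {
    "near": [(0.0, 30.0), (100.0, 30.0)],         # 30 m away: overlaps
    "far": [(0.0, 50.0), (100.0, 50.0)],          # 50 m away: kept
    "crossing": [(50.0, -100.0), (50.0, 100.0)],  # crosses: overlaps
}
importable = non_overlapping(nls_roads, osm_ways)
```

Only the road more than 40 m away from all existing OSM ways remains a candidate for a subnet.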

PostGIS DB operations related to auto-validation:

  1. Compare old vs new subnet sets to find out if a whole subnet is now complete. Also detect if tails were left dangling (the mapper failed to extend them as per the post-import process steps); in that case don't mark the subnet as validated.
  2. Compare old vs new subnet sets to find out if a subnet is not changed in any way
  3. Rest of the subnets are known to be changed
  4. Perform state changes and possible subnet split/merges to OSMTM tables
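The comparison logic in steps 1 to 3 can be sketched with plain sets. Everything about the data layout here is an assumption: subnets are modelled as sets of segment ids, the recalculation is assumed to yield the set of segments that are still missing from OSM, and the state names are invented; the real tool works on PostGIS and OSMTM tables.

```python
def classify_subnets(old_subnets, remaining_segments, dangling_tails=frozenset()):
    """Classify each previously computed subnet after a recalculation.

    old_subnets: dict subnet id -> set of segment ids (hypothetical layout).
    remaining_segments: segment ids still absent from OSM after the update.
    dangling_tails: ids of subnets whose tails were not extended by the mapper.
    """
    states = {}
    for subnet_id, segments in old_subnets.items():
        still_missing = segments & remaining_segments
        if not still_missing:
            # Whole subnet imported; validate only if no tails were left dangling.
            if subnet_id in dangling_tails:
                states[subnet_id] = "complete_not_validated"
            else:
                states[subnet_id] = "validated"
        elif still_missing == segments:
            states[subnet_id] = "unchanged"
        else:
            states[subnet_id] = "changed"
    return states


old = {1: {"a", "b"}, 2: {"c"}, 3: {"d", "e"}, 4: {"f"}}
states = classify_subnets(old, remaining_segments={"c", "d"}, dangling_tails={4})
```

In the sample, subnet 1 is fully imported, subnet 2 is untouched, subnet 3 is partially imported, and subnet 4 is complete but held back because of a dangling tail.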

The toolset is available here:

The OSM dataset tables are based on an hstore-enabled osm2pgsql DB, which is kept up to date using hourly diffs. Initially the update/validation process cannot run hourly because the subnet calculation time exceeds one hour. It is unknown how the import process itself will affect the run time. Nevertheless, the complex geometry unions are likely to get significantly faster as fewer subnets remain while the import progresses.

Data Transformation Results

Check the resulting data in subnets through the (new) coordination tool: (old coordination tool was at

Data Merge Workflow

Team Approach

The generation of the subnet .osm files is automated. The subnets themselves are distributed to the team handling the import using an OSMTM-based tool, which also keeps track of the process.


See: Fi:Maastotietokanta/Road_Import_Stage1_Workflow


A trivial no-overlap criterion with buffers (40m) is used to find areas which do not overlap with existing OSM data. No other conflation, automated or manual, is necessary in this stage.


  • Reverting an import changeset should be trivial because of the no-overlap condition
  • Some improvements to the quality of the NLS dataset were made prior to the calculations
  • The auto-validation tool allows checking whether some of the process items were performed or not (e.g., tails extended or not as per the post-import process steps)