Import/Catalogue/Central place name register import (Norway)

From OpenStreetMap Wiki

Introduction

Description (no): http://www.kartverket.no/Kart/Stedsnavn/Sentralt-stadnamnregister-SSR/

For information about other imports from Kartverket, see No:Kartverket import (page in Norwegian only).

Source information

Type of license: CC-BY 4.0 http://kartverket.no/Kart/Kartverksted/Lisens/
Data download site: http://www.kartverket.no/Kart/Kartdata/Stedsnavndata/
Permission to use the data: Kartverket has confirmed that the data can be integrated into OSM. A copy of the e-mail is archived here (in Norwegian only): [1].
OSM attribution: http://wiki.openstreetmap.org/wiki/Contributors#Kartverket_.28Norwegian_Mapping_Authority.29
ODbL Compliance verified: yes
Data format and schema documentation: http://www.kartverket.no/Documents/Standard/SOSI-standarden%20del%201%20og%202/SOSI%20standarden/SOSI%20standarden%204.3/SOSIStedsnavn_4_3_20111005.pdf

The data set can also be accessed through a web service, for example:
LOD factsheet: http://faktaark.statkart.no/SSRFakta/faktaarkfraobjektid?enhet=314537 (uses ssrId, not ssrObjId)
XML LOD factsheet: http://faktaark.statkart.no/SSRFakta/faktaarkfraobjektid?enhet=314537&format=xml

Data File description

The data is currently available in GeoJSON and SOSI (Norwegian geostandard) format. Conversion to .osm is possible with ogr2ogr, and may be required for the JOSM approach.

SOSI

The data format documentation linked above applies to this format.

Navneenhet (object) contains single name units. Fields: name, municipality, name_type, language, SSR_ID, SSR_OBJID, geometry.

Skrivemåte (spelling) contains multiple permitted alternative names. Fields: name, municipality, SSR_ID, geometry.

SSRForekomst (occurrence) contains the multiple occurrences in different map products. Fields: name, name_type, map_product, SSR_ID, geometry.

The data requires different import and tagging routines depending on the name type (name_type).

GeoJSON

GeoJSON data is provided as a single file for the entire country with more than 1M entries. Its structure and properties are not documented by Kartverket, but the properties can be inferred from their similarity to the SOSI property names. User:huftis is working on a conversion to .osm to allow conflation with JOSM. The API tools can access GeoJSON directly.

The GeoJSON data set is a flat list of point geometries with the following properties attached:

name description
enh_ssr_id spelling id
enh_ssrobj_id object id (one object may have multiple spellings)
enh_snspraak language of the name (Norwegian bokmål and nynorsk share one code; other codes cover Sami, Finnish, North Sami and Lule Sami)
enh_navntype type of object (long list of codes and translation table below)
enh_snavn name for the object without abbreviations
enh_snmynd naming authority for the object ("SK" = Statens Kartverk)
enh_sntystat status code for the object: main name, side name, sub-name.
enh_komm municipality where the object lies
skr_snskrstat status code for the spelling: accepted, declined, appealed, private, international
skr_sndato date when spelling status (skr_snskrstat) was last updated
for_regdato date when the occurrence was first registered in the database
for_sist_endret_dt last updated date for something (fixme: find out what, field is not in SOSI documentation)
for_snavn place name used in occurrence
for_kartid reference/id to the map product of the occurrence
kpr_tekst title/name of the map product of the occurrence
nty_gruppenr the id of the group/category to which the spelling/occurrence belongs
kom_fylkesnr the number of the county where the object belongs, useful for splitting data files
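
Since kom_fylkesnr is suggested for splitting the data, the following is a minimal sketch of such a partitioning step. It assumes the features come from a GeoJSON FeatureCollection whose properties use the names listed above. The function name split_by_county and the sample features are hypothetical, and the actual value type of kom_fylkesnr (number or string) has not been verified.

```python
from collections import defaultdict


def split_by_county(features):
    """Group GeoJSON features by their kom_fylkesnr property."""
    groups = defaultdict(list)
    for feature in features:
        # Features without a county number end up under the key None.
        county = feature.get("properties", {}).get("kom_fylkesnr")
        groups[county].append(feature)
    return dict(groups)


# Hypothetical sample; real entries carry many more properties.
sample_features = [
    {"type": "Feature", "geometry": None,
     "properties": {"enh_snavn": "Example A", "kom_fylkesnr": 6}},
    {"type": "Feature", "geometry": None,
     "properties": {"enh_snavn": "Example B", "kom_fylkesnr": 3}},
    {"type": "Feature", "geometry": None,
     "properties": {"enh_snavn": "Example C", "kom_fylkesnr": 6}},
]

by_county = split_by_county(sample_features)
```

Each group could then be written to its own file with json.dump, wrapped in a new FeatureCollection.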

Trimming and partitioning the data set

Invalid, historic, rejected names within the data set

The data set contains place name spellings and occurrences from various map products, and these are not necessarily the official, correct names. If the property skr_snskrstat[2] has any of the following values, the entry should be ignored:

  • U (uvurdert): the spelling has not been evaluated
  • A (avslått): the spelling has been evaluated, and has been rejected.
  • F (foreslått): the spelling is a proposal which has not yet been decided
  • K (vedtak påklaget): the spelling has been evaluated and accepted but a complaint has been filed; final decision pending
  • I (internasjonalt): the spelling concerns an object outside Norwegian territory, and is not subject to the provisions in stadnamnlova (Norwegian Place Name Act)
  • H (historisk): the spelling concerns an object that has changed name or has been deleted

The remaining entries have the following codes:

  • V (vedtatt): the spelling has been evaluated and accepted, and is required for official use
  • S (samlevedtak): a part of the spelling has been evaluated and accepted as part of some "batch" decision, like "-sæter"->"-seter"
  • G (godkjent): the spelling was used in an official context before 1991-07-01
  • P (privat): the spelling has been decided by a private entity and not by the authorities
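
The status rule above can be sketched as a simple filter. This is only an illustration: it assumes each entry is a dict of properties as in the GeoJSON data, and the names USABLE_STATUSES and has_usable_spelling are hypothetical.

```python
# Status codes that remain after trimming (V, S, G, P).
# Entries with U, A, F, K, I or H are ignored.
USABLE_STATUSES = {"V", "S", "G", "P"}


def has_usable_spelling(properties):
    """Return True if skr_snskrstat marks the spelling as usable."""
    return properties.get("skr_snskrstat") in USABLE_STATUSES


# Hypothetical sample entries.
entries = [
    {"enh_snavn": "Example A", "skr_snskrstat": "V"},
    {"enh_snavn": "Example B", "skr_snskrstat": "A"},
    {"enh_snavn": "Example C", "skr_snskrstat": "H"},
    {"enh_snavn": "Example D", "skr_snskrstat": "G"},
]

usable = [e for e in entries if has_usable_spelling(e)]
```

Entries with a missing or unknown status code are treated as not usable, which seems the safer default for an import.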

Elements covered by other data sets

Highway and road names, including names for ferry routes, are to be imported from Elveg [3], where they are already associated with geometry. This means entries with a name type (navntype) of 140 Address name (road/street), 145 Ferry route or 240 Other roads must be skipped in the SSR data set.
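
A minimal sketch of this skip rule follows. It assumes the name type is found in the enh_navntype property and may arrive either as a number or as a numeric string. The names ELVEG_NAME_TYPES and is_covered_by_elveg are hypothetical.

```python
# Name types that are imported from Elveg instead of SSR:
# 140 address name (road/street), 145 ferry route, 240 other roads.
ELVEG_NAME_TYPES = {140, 145, 240}


def is_covered_by_elveg(properties):
    """Return True if the entry's name type must be skipped in SSR."""
    try:
        code = int(properties.get("enh_navntype"))
    except (TypeError, ValueError):
        # Missing or non-numeric type code: not a known road type.
        return False
    return code in ELVEG_NAME_TYPES


# Hypothetical sample entries with arbitrary type codes.
entries = [
    {"enh_snavn": "Example street", "enh_navntype": 140},
    {"enh_snavn": "Example ferry", "enh_navntype": "145"},
    {"enh_snavn": "Example lake", "enh_navntype": 31},
]

to_import = [e for e in entries if not is_covered_by_elveg(e)]
```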

Import plan

The following types of import may be possible:

  • Recurring import
  • Manual one-time association of OSM objects with names, IDs and metadata from the place name register.
  • Possibly automatic tracking of updates in the place name register.
  • Semi-manual conflation approach using JOSM or API access (see tools below)

Element Tags

key value (example) description
name = Peskanuten main accepted and recommended name from dataset
alt_name = second accepted and recommended name from dataset, if more than one exists
source:name += Sentralt stadnamnregister, Kartverket
no-kartverket-ssr:url = http://faktaark.statkart.no/SSRFakta/faktaarkfraobjektid?enhet=314537 URL to factsheet (use spelling ID)
no-kartverket-ssr:objid = 313225 Object ID
no-kartverket-ssr:date = YYYY-MM-DD Last-modified date, used to find out-of-date items by comparing SSR and OSM (fixme: this must be the last-modified date of the spelling's status field, please verify)
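
As an illustration of the tag table above, the sketch below builds the tags for a single entry. The mapping from GeoJSON properties to tags is an assumption based on the property descriptions (enh_snavn for the name, enh_ssr_id for the factsheet URL, enh_ssrobj_id for the object ID, skr_sndato for the date). The date mapping in particular is still marked fixme above, and the date is passed through unchanged on the assumption that it is already in YYYY-MM-DD form. The function name build_element_tags is hypothetical.

```python
FACTSHEET_URL = (
    "http://faktaark.statkart.no/SSRFakta/faktaarkfraobjektid?enhet={}"
)


def build_element_tags(properties):
    """Build OSM tags for one SSR entry (assumed property mapping)."""
    tags = {
        "name": properties["enh_snavn"],
        "source:name": "Sentralt stadnamnregister, Kartverket",
        # The factsheet URL uses the spelling ID, not the object ID.
        "no-kartverket-ssr:url": FACTSHEET_URL.format(properties["enh_ssr_id"]),
        "no-kartverket-ssr:objid": str(properties["enh_ssrobj_id"]),
    }
    # Assumption: skr_sndato is the right source for the date tag.
    date = properties.get("skr_sndato")
    if date:
        tags["no-kartverket-ssr:date"] = date
    return tags


# IDs and name taken from the examples in the table above.
example_tags = build_element_tags({
    "enh_snavn": "Peskanuten",
    "enh_ssr_id": 314537,
    "enh_ssrobj_id": 313225,
})
```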

In addition, each created or updated element should carry the proper tags that reflect its type. The dataset contains two numbers, enh_navntype and nty_gruppenr, which identify the type of object that the spelling/occurrence refers to. These must be mapped to OSM-appropriate tag=value combinations.

  • User:huftis is working on a translation table, to be inserted here.

Multiple names

One object can have more than one name, or the same name can have multiple spelling variants. In these cases the following tags are used:

Where there are multiple names or spelling variants, the merge should be done manually. name=* should be set either to the name/variant that is "recommended" (NO: anbefalt) in the central place name register or to the name/variant that is preferred locally.

If there are more than two names, alt_name=* should hold all the additional names, separated by semicolons (;).
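
The semicolon rule can be sketched as follows. The choice and order of names remains a manual decision, as described above. The helper only formats the result, and both its name (multi_name_tags) and the sample names are hypothetical.

```python
def multi_name_tags(names):
    """Format name/alt_name tags from a manually ordered list of names.

    The first entry is the name chosen for name=*; all remaining
    entries are joined with semicolons into alt_name=*.
    """
    if not names:
        raise ValueError("at least one name is required")
    tags = {"name": names[0]}
    if len(names) > 1:
        tags["alt_name"] = ";".join(names[1:])
    return tags


# Hypothetical object with three names/variants.
three_names = multi_name_tags(["Example A", "Example B", "Example C"])
one_name = multi_name_tags(["Example A"])
```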

Example of an object with three names/variants [4]:

Names in multiple languages

The place name register contains about 2000 objects with names in the Kven language and about 25000 objects with names in Sami. For objects with names in more than one language, the following tags are used:

name=* (without language code) should always be set. name=* should be set to the Norwegian name, unless the object is situated within samisk forvaltningsområde (in which case it should be set to the Sami name). Samisk forvaltningsområde consists of the following municipalities [5] (numbers from [6]):

  • 2021 Kárášjohka/Karasjok
  • 2011 Guovdageaidnu/Kautokeino
  • 2027 Unjárga/Nesseby
  • 2020 Porsáŋgu/Porsanger/Porsanki
  • 2025 Deatnu/Tana
  • 1940 Gáivuotna/Kåfjord
  • 1920 Loabák/Lavangen
  • 1850 Divtasvuodna/Tysfjord
  • 1739 Raarvihke/Røyrvik
  • 1736 Snåase/Snåsa.
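
The rule for the untagged name=* can be sketched like this. It assumes the municipality number is available for each object (for example from enh_komm). The names SAMI_ADMIN_MUNICIPALITIES and default_name are hypothetical, and the second call uses an illustrative municipality outside the list.

```python
# Municipality numbers of samisk forvaltningsområde, from the list above.
SAMI_ADMIN_MUNICIPALITIES = {
    2021, 2011, 2027, 2020, 2025, 1940, 1920, 1850, 1739, 1736,
}


def default_name(municipality, norwegian_name, sami_name=None):
    """Pick the value for name=* (without language code)."""
    if sami_name and int(municipality) in SAMI_ADMIN_MUNICIPALITIES:
        return sami_name
    # Outside the area the Norwegian name wins; fall back to the
    # Sami name only if no Norwegian name exists.
    return norwegian_name or sami_name


inside = default_name(2021, "Karasjok", "Kárášjohka")
outside = default_name(1902, "Tromsø", "Romsa")
```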

Example of an object (outside samisk forvaltningsområde) with one official name in Norwegian and two in Northern Sami [7]:

If the object has name(s) in one language only, use name=* without language code.

Changeset Tags

As elements are to be conflated one-by-one, a changeset should be restricted to changes affecting one ssr/osm element combination.

The following tags should be used in the changeset:

source:name=Sentralt stadnamnregister, Kartverket
no-kartverket-ssr:url=http://url-to-factsheet

Any approach should reflect the following checklist:

  • Imports must create or update the element tags specified above
  • Imports should not modify existing geometries (placement of nodes); this is primarily a name import. Node geometries may be created from the coordinates in the dataset.
  • ...

Names may be used in one or several OSM elements, but only the original element should carry the no-kartverket-ssr:objid tag. Derived place names (e.g. the bus stop in front of the place) are not currently considered in this import.

In the case of a name conflict with existing data, the official name in the place name register should be used. Norwegian signage is known to contain many errors and dialect forms, which is one of the reasons the place name register was created in the first place. See No:Map_Features#Spelling_of_street_names.

Import workflows

JOSM

...

API

User:relet has some utility scripts that import and LOD-enable elements at https://github.com/relet/ssr and has manually done some test imports in Buskerud fylke.

The utility loads an entire SSR GeoJSON file (optionally skips to a certain point), and imports the SSR elements in order:

  • Names in the general area of the element are compared using Levenshtein distance, and the best match presented
  • The exact location of the element is scanned for tags matching the element type (e.g. natural=water for lakes)
  • Previous SSR imports are identified by their no-kartverket-ssr (and earlier deprecated) tags
  • Links to the exact location in OpenStreetMap, the iD editor, and beta.norgeskart.no are presented for comparison and/or manual edits.
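
The name comparison step can be illustrated with a small, self-contained Levenshtein implementation. This is a sketch of the general technique, not the code used in the utility scripts. The function names and the case-insensitive comparison are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def best_match(ssr_name, nearby_names):
    """Return the nearby OSM name closest to the SSR name, or None."""
    if not nearby_names:
        return None
    return min(
        nearby_names,
        key=lambda name: levenshtein(ssr_name.lower(), name.lower()),
    )


# Hypothetical nearby names found in OSM around the SSR coordinates.
candidate = best_match("Peskanuten", ["Peskanut", "Storenut"])
```

A real workflow would probably also apply a maximum distance, so that unrelated names are not offered as matches.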

The user is then presented with a choice:

  • To create an element of the proper type if there is none present [1].
  • To extend the metadata of one of the suggested elements, or another element that can be manually entered
  • To add a note to the location, asking for conflict resolution.

[1] Ways and areas are created as a short stub or triangle. It is therefore recommended to trace the element first, then extend the metadata of the newly created element. If there is no aerial coverage of the location, areas and ways should not be imported.

Team Approach

  • The data set covers the whole country
  • Elements have to be manually checked.
  • Some elements may not be ready for import because no corresponding geometry can be created in OSM. (Way or area elements require a secondary source if they are not present in OSM already.)

Hence, the import will require some teamwork, and the initial conflation will be a work in progress. Subsequent updates may be done by an automated process or by a smaller group of people. We will track progress on this page.

People interested: relet, huftis, flumsen, [insert name here]

The SSR dataset is currently split by fylke (county); further subdivisions are possible. See below for the import status.

QA

The original dataset is the official place name register of Norway. The approved spellings in this register must be used in all official contexts by the authorities. We can therefore assume a high data quality in the approved names.

The no-kartverket-ssr:objid tag makes it possible to trace changes in the official register and reflect them in the map.

Some common mistakes (tag usage) can be corrected by the API tools.

Import status

Relevant mailing list discussions
NUUG: http://www.mail-archive.com/kart%40nuug.no/msg00959.html
OSM Import: http://thread.gmane.org/gmane.comp.gis.openstreetmap.imports/1698/

No regular import yet. Known partial imports include contributions from users grekvard, grekvard_import (fully automated), relet (semi-manual). The metadata in these imports has to be updated to conform to the final tag specifications above.

  • 01 Østfold - 0% - 22k names
  • 02 Akershus - 0% - 28k names
  • 03 Oslo - 0% - 4k names - good secondary coverage
  • 04 Hedmark - 0% - 62k names
  • 05 Oppland - 0% - 56k names
  • 06 Buskerud - 0% - 41k names - good secondary coverage, some test imports named above (ca 10%, to be retagged)
  • 07 Vestfold - 0% - 15k names
  • 08 Telemark - 0% - 61k names
  • 09 Aust-Agder - 0% - 60k names
  • 10 Vest-Agder - 0% - 57k names
  • 11 Rogaland - 0% - 35k names
  • 12 Hordaland - 0% - 70k names
  • 14 Sogn og Fjordane - 0% - 56k names
  • 15 Møre og Romsdal - 0% - 68k names
  • 16 Sør-Trøndelag - 0% - 82k names
  • 17 Nord-Trøndelag - 0% - 66k names
  • 18 Nordland - 0% - 119k names
  • 19 Troms - 0% - 62k names
  • 20 Finnmark - 0% - 53k names
  • 21 Svalbard - 0% - ~70 names
  • 22 Jan Mayen - 0% - 4 names

Total: 0% - ca 900k names.

Keeping the data up to date

...

Links