Proposal:Add languages: tags for name rendering

Add languages: tags for name rendering
Proposal status: Proposed (under way)
Proposed by: Aseigo
Tagging: languages:official=*, languages:preferred=*
Definition: Introduce new official and preferred language tags to be added to administrative boundaries which would define how renderers should select and display correctly localized names.

Draft started: 2024-11-13
RFC start: 2024-11-13


Problem Statement

OSM stores the "default" or "native" name of a feature using the `name` tag. This is sufficient when a feature has one name in a single language, but fails in other cases such as in multilingual communities.

Localized names can be recorded in `name:<language-code>` tags, but there is currently no mechanism for an OSM renderer to determine which of those tags should be shown, or in which order. As a result, editors record multiple official names for the same location in the `name` tag, which causes a variety of problems.

Current OSM Data

Currently, there are various approaches to addressing these challenges, and there is little consistency between the approaches. For example:

Winnipeg, Canada

Winnipeg is a predominantly English-speaking city with a historical French quarter, and French Canadian speakers still live there. Because of this, some street names are recorded in the `name` field in both French and English at once, following the chimera pattern seen on street signs, e.g. "Avenue Nerville Avenue" or "Rue John Road".

In actuality, these streets do not exist! "Avenue Nerville" (French) does, as does "Nerville Avenue" (English), but nobody uses "Avenue Nerville Avenue": not Canada Post, not the City of Winnipeg, nor residents when talking to each other.

Making matters even more inconsistent, not all streets in Winnipeg have French names. For those streets, only the English name is recorded in the `name` tag.

Biel/Bienne, Switzerland

Biel/Bienne is a highly bilingual city, with most residents speaking both French and German fluently, and conversations often held with each participant speaking their preferred language.

Every street, and the city itself, has official names in both French and German, each a full translation of the other, e.g. "Rue du Moulin" in French and "Mühlestrasse" in German.

In OSM, both names are included in the `name` field, separated by a `/`: "Rue du Moulin / Mühlestrasse".

The canton of Bern in which Biel/Bienne is found is also officially bilingual, though with more German than French, while the country of Switzerland as a whole has two additional official languages: Italian and Romansh.

Brussels, Belgium

Brussels is also a bilingual city, with French and Dutch being the two official languages. Belgium itself has three official languages, however: French, Dutch, and German.

Similar to Biel/Bienne, mappers in Brussels combine both official language names into the `name` tag, but use `-` as a separator rather than a `/`.

The administrative boundary of the bilingual territory contains `default_language=fr - nl`, and multilingual names always follow the same convention.

Haida Gwaii, Canada

In recent years, some location names in the archipelago of Haida Gwaii off the north-west coast of British Columbia, Canada, have been officially changed from English colonial names to their Haida language names. In 2024 the British Columbia government officially recognized the Haida nation's land title throughout the archipelago, suggesting this naming trend will almost certainly continue.

Currently, these changes need to be made manually in the OSM dataset by altering the `name` tag of each feature, even though the `name:hai` tag exists and is in use.

South Africa

The country of South Africa has eleven official languages, though one is nationally dominant and the others are used in specific regions of the country. In these regions, names in the regional language are used and should also be reflected on maps.

However, this is currently reflected very little in the OSM dataset.

Weaknesses Of The Current Use of `name` Tags

  • inconsistent data across OSM, such as differences in separator characters used
  • multiple names are merged into the `name` tag, e.g. "Rue Broyère - Broyèrestraat", even when the same localized names exist in `name:<language code>` tags
  • it leads to falsehoods in the dataset, such as "Avenue Nerville Avenue" in Winnipeg, Canada, a street name that does not exist
  • the multi-lingual `name` entries do not always match the `addr:street` tags, especially when the latter are mono-lingual
  • this causes poor results in visual and audio renderings as there is no mechanism provided for a renderer to understand which language(s) are in a `name` tag
  • it is error-prone and labour-intensive to update regional data to reflect evolving regulatory standards, as changing the default rendering requires editing every affected `name` tag, causing the OSM dataset to lag behind real-world usage even when the correct data exists in the relevant `name:<language code>` tags
  • it creates disputes among contributors over the content of the default `name` tag, with discussions often driven by regional language politics and local editing customs, rather than focused on usability or data quality
  • editing software has no good way to detect missing localized names

The root of these problems is that data in the `name` tag is expected to be localized, with no information as to how that localization was done. This loss of information leads to inconsistent and confusing results.

Localization is better left to the renderer, which should be able to use the OSM database to identify and use the correct set of localized names.

The OSM schema already provides suitable language-specific data in `name:<language code>` tags, but there is not enough information in the OSM schema for renderers to know how to use those entries.

Proposal

By treating OSM as a database rather than a map, a fully backwards-compatible solution can be achieved with minimal changes to existing OSM data.

NEW: `languages:official` tag

The `languages:official` tag contains a semicolon-separated list of ISO 639 language codes.

This tag distinguishes endonyms from exonyms among a feature's `name:<language code>` tags.

NEW: `languages:preferred` tag

The `languages:preferred` tag contains a semicolon-separated list of ISO 639 language codes.

This tag defines which `name:<language code>` tags are preferred when rendering, including their order of appearance.
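As a rough illustration of the value format shared by both tags, the sketch below splits a semicolon-separated value into an ordered list of codes. The function name `parse_language_list` is hypothetical, and the tolerance for stray whitespace, empty entries, and repeated codes is an assumption of this sketch, not something the proposal specifies.

```python
def parse_language_list(value: str) -> list[str]:
    """Split a semicolon-separated tag value into an ordered list of codes.

    Assumptions of this sketch (not part of the proposal): whitespace around
    codes is ignored, empty entries are dropped, and a repeated code keeps
    only its first position.
    """
    codes: list[str] = []
    for raw in value.split(";"):
        code = raw.strip()
        if code and code not in codes:
            codes.append(code)
    return codes


print(parse_language_list("fr;de"))      # ['fr', 'de']
print(parse_language_list(" de ; fr;"))  # ['de', 'fr']
```

Order is preserved because it carries meaning: for `languages:preferred`, it is the order in which names would be displayed.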

Defined on Administrative Boundary Features

These new `languages:*` tags would be added to features with the `boundary=administrative` tag to encode regional language standards.

Map features would inherit their language settings from the nearest enclosing administrative boundary.

For `languages:*` tags that are not present on the nearest enclosing `boundary=administrative` feature, renderers will recursively consult the next enclosing administrative boundary until both a `languages:official` and a `languages:preferred` tag are found, or until the country administrative boundary is reached.

This allows, for example, official languages to be defined at the country level, while regions within the country define their own official and/or preferred languages only where they differ.
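The lookup described above can be sketched as a walk outward through the enclosing boundaries. This is a minimal illustration, not a reference implementation: the function name `resolve_language_tags` and the input shape (a list of tag dictionaries, nearest boundary first and country last) are hypothetical, and the sketch assumes each of the two tags is resolved independently to the nearest boundary that defines it.

```python
from typing import Optional

TAGS = ("languages:official", "languages:preferred")


def resolve_language_tags(
    boundary_chain: list[dict[str, str]],
) -> dict[str, Optional[str]]:
    """Return the nearest value of each languages:* tag, or None if unset.

    boundary_chain holds the tags of each enclosing administrative boundary,
    nearest first and country last (a hypothetical input shape).
    """
    resolved: dict[str, Optional[str]] = {tag: None for tag in TAGS}
    for tags in boundary_chain:
        for tag in TAGS:
            if resolved[tag] is None and tag in tags:
                resolved[tag] = tags[tag]
        if all(value is not None for value in resolved.values()):
            break  # both tags found, no need to look further out
    return resolved


# Illustrative boundary tags, nearest boundary first.
haida_gwaii = {"languages:preferred": "hai"}
british_columbia: dict[str, str] = {}
canada = {"languages:official": "en;fr"}

print(resolve_language_tags([haida_gwaii, british_columbia, canada]))
# {'languages:official': 'en;fr', 'languages:preferred': 'hai'}
```

A tag that no boundary in the chain defines stays `None`, leaving the caller to decide how to fall back.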

Special Case: Defined on a Feature

A feature may carry its own `languages:*` tags if, and only if, its "correct" name, whether by official decree or common local usage, is in a language other than the preferred or official languages it would otherwise inherit.

This sometimes happens in the case of monuments, streets with historical names still in modern use despite language changes, or specific geographic features. This is uncommon, however, and nearly all features which are not administrative boundaries should not have `languages:*` tags.

Name Resolution Algorithm

The algorithm used to resolve which name tags are used is captured in the following flowchart.

Figure: Language tag resolution flowchart

Use By Renderers

A renderer would display all `name:<language code>` tags which match languages in the `languages:preferred` tag, respecting the order they appear in the tag.

If no such tags are found, a renderer would instead display all `name:<language code>` tags which match those in the `languages:official` tag, respecting the order they appear in that tag.

Any languages in the preferred and/or official language lists which do not have a matching `name:<language code>` entry on the feature being rendered would be skipped.

If no matching `name:<language code>` tags are found, a renderer would default to the current default of using the `name` tag.

The renderer is free to choose an appropriate separator character, audio language pack, etc. to render the names.
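The selection rules above might look like the following in a renderer. This is a sketch under stated assumptions: the function name `select_names` is hypothetical, the language lists are assumed to be already resolved and parsed, and "no such tags are found" is read as "no `name:<language code>` tag on the feature matches a preferred language".

```python
def select_names(
    feature_tags: dict[str, str],
    preferred: list[str],
    official: list[str],
) -> list[str]:
    """Pick the names to display for one feature, in display order."""
    # Try the preferred languages first, then the official ones.
    for languages in (preferred, official):
        names = [
            feature_tags[f"name:{code}"]
            for code in languages
            if f"name:{code}" in feature_tags  # skip languages with no name
        ]
        if names:
            return names
    # No localized name matched: fall back to the plain name tag, if any.
    return [feature_tags["name"]] if "name" in feature_tags else []


street = {
    "name": "Rue du Moulin / Mühlestrasse",
    "name:fr": "Rue du Moulin",
    "name:de": "Mühlestrasse",
}
print(select_names(street, preferred=["fr", "de"], official=["de", "fr"]))
# ['Rue du Moulin', 'Mühlestrasse']
print(select_names(street, preferred=[], official=["de", "fr"]))
# ['Mühlestrasse', 'Rue du Moulin']
```

Returning a list rather than a joined string keeps the choice of separator, or of audio language, with the renderer.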

Use By Editing Software

Editing software should fetch the lists of preferred *and* official languages to use as guides for those editing data. Instead of only the `name` tag, users can be prompted to fill in names in the official and preferred languages for a given feature.

Editors would also be able to detect missing language data, allowing tools such as StreetComplete and Vespucci to proactively prompt users to add missing localized names.
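A check for missing localized names could be as small as the sketch below. The function name `missing_name_languages` is hypothetical, the language lists are assumed to be already resolved and parsed, and treating every preferred and official language as "expected" is an assumption of this sketch; an editor might reasonably prompt for fewer.

```python
def missing_name_languages(
    feature_tags: dict[str, str],
    preferred: list[str],
    official: list[str],
) -> list[str]:
    """List the language codes for which the feature has no name:<code> tag."""
    # Merge both lists, preferred first, without repeating a code.
    wanted: list[str] = []
    for code in preferred + official:
        if code not in wanted:
            wanted.append(code)
    return [code for code in wanted if f"name:{code}" not in feature_tags]


feature = {"name": "Rue du Moulin", "name:fr": "Rue du Moulin"}
print(missing_name_languages(feature, preferred=["fr", "de"], official=["de", "fr"]))
# ['de']
```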

Benefits

  • Allows renderers rather than OSM contributors to define how multi-language names are displayed to the user, including visual separators, which localized audio to use, etc.
  • Improved renderer accuracy and routing instructions as the language(s) would be explicitly defined
  • Allows changes in policy to be immediately reflected in rendered maps, without having to go through entire datasets and edit each `name` tag individually
  • Editing software could prompt users for localized names
  • Respects regional policies and preferences, with minimal data duplication
  • Prevents well-meaning contributors working around the lack of proper rendering hints by putting chimeras in the `name` tag
  • May open ways to improve searching by address, as `addr:street` tags could be more reliably matched to street `name:<language code>` tags, particularly in areas where `addr:street` tags are not multi-lingual

Practical Examples

Biel/Bienne, Bern, Switzerland

The Switzerland administrative boundary may contain this `languages:official` tag:

 languages:official=de;fr;it;rm

While the bilingual canton of Bern might set it to:

 languages:official=de;fr


And the city of Biel/Bienne may swap that ordering to match current OSM rendering:

 languages:preferred=fr;de

As a result, the name of the country of Switzerland would be shown in all four languages (as it currently is). Locations within the canton of Bern would show German followed by French (where such names exist), while in Biel/Bienne that order would be reversed to show French first (as is currently the case).

If a `name:it` tag is provided for a place in Biel/Bienne, it would not be rendered by default, even though it is an official language defined on the Switzerland administrative boundary.

Update: it has come to our attention that the names in OSM are French/German, while the actual street signage in the city is German/French. So in this case, the OSM data is "wrong" if the intent is to mimic on-the-ground signage. Fixing this would be a non-trivial edit using the current approach of multiple names in the `name` field, while it would be a single edit of the `languages:preferred` field under this proposal.

Haida Gwaii, British Columbia, Canada

The Canada administrative boundary may define:

 languages:official=en;fr

This would allow consistent combined English/French name rendering wherever both exist. Naturally, Quebec would reverse this order:

 languages:preferred=fr;en

While an administrative region covering Haida Gwaii may define:

 languages:preferred=hai

This would result in the Haida language name being used when rendering features within Haida Gwaii. Where a Haida name is missing, rendering would gracefully fall back to the English and French names, because of the `languages:official` tag on the Canada administrative boundary.

Open Questions

  • Would renderer authors be willing to adopt this approach? It is a moot exercise without renderer buy-in.
  • Are there real-world edge cases that cannot be modeled with the `languages:official` and `languages:preferred` tags?
  • There are places where the language order used in the administrative boundary's own name differs from the language ordering of the features within it. While an edge case, this is an actual issue; the example provided in the discussion was Vitoria-Gasteiz. Possible solutions are being discussed, and any conclusions will appear as amendments to the proposal.
  • Some have found the tag name `languages:official` problematic, as it implies an "official-ness" that may be disputed in the location itself. Discussion is ongoing about whether a better tag name could avoid this issue entirely. Whether such a name exists is uncertain, and suggestions are welcome in the discussion thread.

Features/Pages affected

Multilingual_names

External discussions

Comments

Please comment on the discussion page.