Proposal:Add languages: tags for name rendering

Add languages: tags for name rendering
Proposal status: Proposed (under way)
Proposed by: Aseigo
Tagging: languages:presentation_order:self=<List of ISO 639 language codes>
Applies to:
Definition: Introduce new official and preferred language tags to be added to administrative boundaries which would define how renderers should select and display correctly localized names.
Statistics:

Draft started: 2024-11-13
RFC start: 2024-11-13

This is the second revision of this proposal. Changes from the first revision include:

  • The name of the proposed tag changed to `languages:presentation_order`
  • Dropped the `preferred` tag for more-local tag ordering; this use case is instead covered in the proposed rendering algorithm.
  • Added the `languages:presentation_order:self` tag to cover an edge case discovered in the first round of discussion
  • Added a clarification that user preferences in applications are not affected by this proposal, as this was not clear to many reviewers.
  • Added `boundary=aboriginal_lands` alongside `boundary=administrative` as features to consider in language metadata discovery.
  • Various clarifications and simplifications made in response to the first round of discussion.

Problem Statement

OSM stores the "default" name of a feature using the `name` tag. This is sufficient when a feature has one name in a single language, but fails in other cases such as in multilingual communities.

However, there is currently no mechanism for an OSM renderer to determine which `name:<language-code>` tags should be shown, or in which order. This causes OSM editors to record multiple official names for the same location in the `name` tag, and this causes a variety of problems including:

  • inconsistent data across OSM, such as differences in separator characters used between different language names (e.g. '/' vs '-')
  • multi-lingual `name` entries do not always match the `addr:street` tags, especially when the latter are mono-lingual
  • sometimes the content of the `name` tag is a mashup of different language names that does not actually exist as a name
  • this causes poor results in visual and audio renderings as there is no mechanism provided for a renderer to understand which language(s) are in a `name` tag
  • it is error-prone and labour-intensive to update regional data to reflect evolving regulatory standards, as it requires editing all affected `name` tags to change the default rendering of the data, causing the OSM dataset to lag behind real world usage even when the correct data exists in the relevant `name:<language code>` tags.
  • it creates disputes among contributors over the content of the default `name` tag, with discussions often driven by regional language politics and local editing customs, rather than focused on usability or data quality
  • editing software has no good way to detect missing localized names

Current OSM Data: Use Cases For Multi-Lingual Naming

Currently, there are various approaches to addressing these challenges, each with its own weaknesses. What follows is a selection of examples of how multi-lingual naming occurs today in OSM. This is not an exhaustive list of all places with multi-lingual names, but a representative cross-section of them.

 NOTE: If a use case is not represented in the examples that follow, please leave a comment on the forum thread describing the use case along with a real-world example of it.

Consistent Multilingualism: Biel/Bienne, Switzerland and Brussels, Belgium

Biel/Bienne is a completely bilingual city, with most residents speaking both French and German fluently, and conversations often held with each participant speaking their preferred language.

All street names, and the city itself, have both official French and German names which are full translations into each language, e.g. "Rue du Moulin" in French and "Mühlestrasse" in German.

In OSM, both names are included in the `name` field, separated by a `/`: "Rue du Moulin / Mühlestrasse".

The canton of Bern, in which Biel/Bienne is found, is also officially bilingual, though with more German than French, while the country of Switzerland as a whole has two additional official languages: Italian and Romansh.

Brussels is also a bilingual city, with French and Dutch being the two official languages. Belgium itself has three official languages, however: French, Dutch, and German. Similar to Biel/Bienne, mappers in Brussels combine both official language names into the `name` tag, but use `-` as a separator rather than a `/`.

The administrative boundary of the bilingual territory contains `default_language=fr - nl`, and multilingual names always follow the same convention.

So while both cities are internally consistent, they are different from their enclosing region (canton and country, respectively) and render differently from each other on the same map due to inclusion of the literal separator character.

  Use-case Requirement: It must be possible to define consistent language policies for regions on the map so they adhere to local policy and custom. At the same time, these policies should not overspecify choices such as the separator character as it leads to inconsistencies.

Different Policies In The Same Area: Winnipeg, Canada

Winnipeg is a predominantly English-speaking city, but there is a historical French quarter of the city and there are French Canadian speakers living in Winnipeg. Due to this, street names in some areas of the city are noted in both French and English, while in much of the city street names are English-only.

In OpenStreetMap, multilingual names in Winnipeg are recorded in the `name` field simultaneously in both French and English following the same chimera pattern seen on the physical street signs, e.g. "Avenue Nerville Avenue" or "Rue John Street".

In actuality, these streets do not exist: "Avenue Nerville" (French) does, as does "Nerville Avenue" (English), but "Avenue Nerville Avenue" is a chimera that is not used in that form by Canada Post, the City of Winnipeg, or residents when talking to each other. In the rest of Winnipeg, English-only (and, though rarer, French-only) names are also recorded in the `name` tag.

This is a very common pattern seen around the world in ethnic or language enclaves within larger settlements.

  Use-case Requirement: It must be possible to note language policies in a specific area which are different from enclosing regions. Such policies can vary from neighbourhood to neighbourhood, but are expected by users to be consistent within these areas.

Language Quilts: South Africa and Papua New Guinea

The country of South Africa has 11 official languages, though one is nationally dominant and the others are used in specific regions of the country. In these regions, names in the regional language are used and should also be reflected on maps.

This is an extremely common pattern in countries with large numbers of local languages, including those with very large numbers of languages such as Papua New Guinea which has some 840 languages in use within its borders, often a dozen or more in any given administrative region, though it is common for names or signage to be in only one or two.

However, there is currently little reflection of this variety of language in the OSM data set, and getting movement in that direction is made more difficult by the need to edit any and all relevant `name` tags manually. OSM editing software currently has no insight into the language requirements of the different regions, and so cannot offer useful guidance to editors either.

  Use-case Requirement: It must be convenient to note distinct language policies within a given region (e.g. country). This language information should also be available to OSM editing software to allow guidance to be offered to editors.

Policy Changes: Haida Gwaii, Canada

In recent years, some location names in the archipelago of Haida Gwaii off the north-west coast of British Columbia, Canada, have been officially changed from English colonial names to their Haida language names. In 2024 the British Columbia government officially recognized the Haida nation's land title throughout the archipelago, suggesting this naming trend will almost certainly continue.

To track this policy change, edits need to be made manually in the OSM dataset by altering the `name=` tag, even though the `name:hai` tag exists and is in use. This is a significant amount of work, cannot easily be automated (if at all), and is error-prone.

This is not as rare as one might expect, as efforts to track Gaelic language areas in Ireland, or areas in Wales that have adopted a stronger representation of the Welsh language in recent years, can attest.

  Use-case Requirement: A way to respond to language policy changes would be beneficial. They may not happen often, but when they do they are often extremely important to the people who live there and should be reflected in their maps quickly and with as little opportunity for error as possible.

Outliers and Unique Cases: Vitoria-Gasteiz

While far less common than regional policies, there are locations which need special language policies which are different from their enclosing region, or features within their boundaries.

An example of this edge case is the Basque capital of Vitoria-Gasteiz, which has a different language ordering for features within its boundary, but which also differs from its own enclosing administrative boundary. Other examples are memorials or specific destinations which for historical or origin reasons have distinct language requirements.

  Use-case Requirement: While uncommon, it must be possible to define language preferences specific to an individual feature.

Weaknesses Of The Current Use of `name` Tags

The root of the problems around relying only on `name` for default name display in multiple languages is that the data in the `name` tag is expected by renderers to already be localized, but there is no information in the OSM database as to how that localization was arrived at. This loss of information leads to the inconsistent results currently seen in OSM maps.

Localization is better left to the renderer which can make dynamic choices at runtime. However, to do so the OSM database must be able to identify what the correct set of localized names to use are, and the definitions of "correct set" must be left to the local OSM editors (as it currently is).

The OSM schema already provides suitable language-specific data in `name:<language code>` tags, but there is not enough information in the OSM schema for renderers to know how to use those entries, and so they fall back to using `name` in most cases.

 NOTE: This proposal does not seek to codify the content of the `name` tag, as it is orthogonal to the goals of this proposal, and already receives extensive discussion elsewhere, often without resolution. This proposal instead focuses on novel approaches to handling the presentation of multilingual names in a way that maps to use cases that exist in the real world.

Proposal

By treating OSM as a database rather than a completed map, a fully backwards-compatible solution can be achieved with minimal changes to existing OSM data.

NEW: `languages:presentation_order` tag

The `languages:presentation_order` tag contains a semicolon-separated list of ISO 639 language codes. It would be set on administrative boundaries, and would be applied during rendering to all features within that boundary.

For the purpose of language presentation order, administrative boundaries are considered to be ways tagged with either `boundary=administrative` or `boundary=aboriginal_lands`. In future, other tags could be added if widely used and identified as being a form of tag relevant to language metadata, though the list should be kept as small as possible to keep complexity low.

This tag distinguishes endonyms from exonyms among a feature's `name:<language code>` tags. This makes it possible to determine that e.g. French names are local in a French-speaking town, even when there are `name:<language code>` tags in other languages.

It also defines the locally preferred order for rendering multi-lingual names, so that renderers may select from the appropriate `name:<language code>` tags (if they exist) to render the names fully.
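As an illustration of the value format only, a consumer might split the tag value as in the following Python sketch. The function name `parse_presentation_order` is hypothetical, and tolerating stray whitespace or empty entries is an assumption made here, not something the proposal specifies.

```python
def parse_presentation_order(value: str) -> list[str]:
    """Split a semicolon-separated tag value into an ordered list of codes."""
    codes = []
    for part in value.split(";"):
        code = part.strip()
        # Assumption: ignore empty entries such as those left by a trailing ';'.
        if code:
            codes.append(code)
    return codes


print(parse_presentation_order("de;fr;it;rm"))
```

The order of the returned list is the presentation order, so no further sorting should be applied by the consumer.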

NEW: `languages:presentation_order:self` tag

The `languages:presentation_order:self` tag contains a semicolon-separated list of ISO 639 language codes.

This tag covers the edge case where an administrative boundary has a different language selection than the features within it. In that case, the `languages:presentation_order` tag would still be defined, but the `languages:presentation_order:self` tag would be used to render the administrative boundary's name.

 NOTE: Features (e.g. restaurants) whose official names happen to be in a language differing from the language(s) commonly used elsewhere in the region should still have this official name appear in the `name` tag. While it is possible to use `languages:presentation_order:self` even in these cases, it is not necessary where there is no other translated name in common use.

User Preferences

End-user preferences, such as a locale or language set in a mobile mapping application, are not affected by this proposal. This proposal only affects default rendering before any application-specific user preferences are applied.

Defined on Administrative Boundary Features

These new `languages:*` tags would be added to features with the `boundary=administrative` tag to encode regional language standards.

Map features would inherit their language settings from the nearest enclosing administrative boundary.

For `languages:*` tags that are not present on the nearest enclosing `boundary=administrative` way, renderers will recursively consult the next enclosing administrative boundary until either a `languages:presentation_order` tag is found or the country administrative boundary is reached.

This allows e.g. official languages to be defined at the country level, while regions within the country can define their own language preferences where they differ.
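The inheritance rule can be sketched in Python as follows. This is a minimal illustration, not a renderer implementation: the function name `inherited_order` is hypothetical, and it assumes the caller has already worked out the chain of enclosing boundaries, ordered nearest first and ending at the country boundary, with each boundary given as a plain dictionary of its tags.

```python
from typing import Optional

ORDER_TAG = "languages:presentation_order"


def inherited_order(enclosing_boundaries: list[dict[str, str]]) -> Optional[str]:
    """Return the first tag value found, walking from the nearest boundary outward."""
    for boundary_tags in enclosing_boundaries:
        value = boundary_tags.get(ORDER_TAG)
        if value:
            return value
    # No boundary up to the country level defines the tag.
    return None


# Hypothetical chain: a municipality without the tag, its canton, the country.
chain = [
    {"boundary": "administrative"},
    {"boundary": "administrative", ORDER_TAG: "de;fr"},
    {"boundary": "administrative", ORDER_TAG: "de;fr;it;rm"},
]
print(inherited_order(chain))
```

Here the municipality inherits the canton's value because it does not define one of its own; the country value is only consulted if nothing nearer is set.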

Special Case: Defined on a Feature

A feature that is not an administrative boundary may include the `languages:presentation_order` tag if, and only if, that specific feature has a name in a language that is different from the administrative boundary's, and which is also the "correct" name for it (either due to official decree or common local usage).

This sometimes happens in the case of monuments, streets with historical names still in use, or specific geographic features. This is uncommon, however, and nearly all features which are not administrative boundaries should not have `languages:*` tags.

Use By Renderers

A renderer would display all `name:<language code>` tags which match languages in the `languages:presentation_order` tag, respecting the order the languages appear in the tag.

Any languages in the `languages:presentation_order` list which do not have a matching `name:<language code>` entry on the feature would be skipped. Example: rendering a shop with only a `name:fr` name when applying `languages:presentation_order=de;fr` would use the `name:fr` tag.

If no matching `name:<language code>` tags are found, a renderer would default to the `name` tag.

The renderer is free to choose an appropriate separator character, audio language pack, etc. to render the names.
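A minimal Python sketch of this selection step is shown below. The function name `default_label` and the default separator are hypothetical choices made for illustration; the proposal deliberately leaves the separator to the renderer.

```python
from typing import Optional


def default_label(tags: dict[str, str], order: list[str],
                  separator: str = " / ") -> Optional[str]:
    """Build a label from the name:<code> tags listed in `order`."""
    # Languages without a matching name:<code> tag are skipped.
    names = [tags[f"name:{code}"] for code in order if f"name:{code}" in tags]
    if names:
        return separator.join(names)
    # No localized name matched: fall back to the plain name tag.
    return tags.get("name")


street = {
    "name": "Rue du Moulin / Mühlestrasse",
    "name:fr": "Rue du Moulin",
    "name:de": "Mühlestrasse",
}
print(default_label(street, ["de", "fr"]))
```

Because the separator is a parameter rather than part of the data, two renderers can present the same feature differently without any change to the OSM tags.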

Use By Editing Software

Editing software should fetch the list of presentation languages to use as guides for those editing data. Instead of only the `name` tag, users can be prompted to fill in the official and preferred languages for a given feature.

Editors would also be able to detect missing language data, allowing tools such as StreetComplete and Vespucci to pro-actively prompt users to add missing localized names.
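How an editor might detect such gaps can be sketched in Python as follows. The function name `missing_name_languages` is hypothetical, and skipping features that carry no name at all is an assumption made here to avoid prompting for unnamed features; the proposal does not prescribe this behaviour.

```python
def missing_name_languages(tags: dict[str, str], order: list[str]) -> list[str]:
    """List the presentation languages for which the feature has no name:<code> tag."""
    has_any_name = any(key == "name" or key.startswith("name:") for key in tags)
    if not has_any_name:
        # Assumption: an unnamed feature should not trigger any prompt.
        return []
    return [code for code in order if f"name:{code}" not in tags]


shop = {"name": "Rue du Moulin", "name:fr": "Rue du Moulin"}
print(missing_name_languages(shop, ["de", "fr"]))
```

An editor could use the returned list to decide which language fields to offer, in the order the region prefers.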

Benefits

  • Allows renderers rather than OSM contributors to define how multi-language names are displayed to the user, including visual separators, which localized audio to use, etc.
  • Improved renderer accuracy and routing instructions as the language(s) would be explicitly defined
  • Allow for changes in policy to be immediately reflected in rendered maps, without having to go through entire datasets and edit each `name` tag individually
  • Editing software could prompt users for localized names
  • Respects regional policies and preferences, with minimal data duplication
  • Prevents well-meaning contributors working around the lack of proper rendering hints by putting chimeras in the `name` tag
  • It may open ways to improve searching by address, as `addr:street` tags could be more reliably matched to street `name:<language code>` tags, particularly in areas where `addr:street` tags are not multi-lingual

Name Resolution Algorithm

  • If the feature has a `languages:presentation_order:self` tag, use that tag to define language selection and rendering order.
  • Otherwise, find the regional `languages:presentation_order`:
    • Use the first `languages:presentation_order` tag set in the nearest enclosing administrative border, up to the country administrative boundary.
  • If one or more of the languages defined in the `languages:presentation_order` tag are set on the feature, render those languages in the order defined in the tag.
    • If the languages defined in the `languages:presentation_order` tag are not set, reference the next nearest enclosing administrative border, up to the country administrative boundary feature, until a `languages:presentation_order` tag that can be rendered is found.
  • If no applicable `languages:presentation_order` tag is found, then render the contents of the `name` tag. This ensures something is always rendered, and provides backwards-compatibility.
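
The steps above can be sketched in Python as follows. This is one possible reading rather than a reference implementation: the function names are hypothetical, the enclosing boundaries are assumed to be supplied nearest first as plain tag dictionaries, and falling through to the regional tags when a `languages:presentation_order:self` tag matches no names on the feature is an assumption the list above does not spell out.

```python
ORDER_TAG = "languages:presentation_order"
SELF_TAG = "languages:presentation_order:self"


def parse_order(value: str) -> list[str]:
    """Split a semicolon-separated tag value into language codes."""
    return [code.strip() for code in value.split(";") if code.strip()]


def names_in_order(tags: dict[str, str], order: list[str]) -> list[str]:
    """Collect the feature's name:<code> values for the given languages, in order."""
    return [tags[f"name:{code}"] for code in order if f"name:{code}" in tags]


def resolve_names(feature_tags: dict[str, str],
                  enclosing_boundaries: list[dict[str, str]]) -> list[str]:
    """Return the names to render by default, in presentation order."""
    candidates = []
    # The feature's own override is considered first.
    if SELF_TAG in feature_tags:
        candidates.append(feature_tags[SELF_TAG])
    # Then each enclosing boundary, nearest first, up to the country.
    candidates.extend(b[ORDER_TAG] for b in enclosing_boundaries if ORDER_TAG in b)

    for value in candidates:
        names = names_in_order(feature_tags, parse_order(value))
        if names:
            return names
    # Nothing could be rendered from any order: fall back to the name tag.
    fallback = feature_tags.get("name")
    return [fallback] if fallback else []


# Placeholder names; the boundary values follow the Haida Gwaii example.
boundaries = [{ORDER_TAG: "hai"}, {}, {ORDER_TAG: "en;fr"}]
feature = {"name": "Default name", "name:en": "English name", "name:fr": "French name"}
print(resolve_names(feature, boundaries))
```

The function returns a list rather than a joined string so that the choice of separator, or of audio language, stays with the renderer.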

Practical Examples

Biel/Bienne, Bern, Switzerland

The Switzerland administrative boundary may contain this `languages:presentation_order` tag:

 languages:presentation_order=de;fr;it;rm

While the bilingual canton of Bern might set it to:

 languages:presentation_order=de;fr

And the city of Biel/Bienne may swap that ordering to match current OSM rendering:

 languages:presentation_order=fr;de

As a result, the name of the country of Switzerland would be shown in all four languages (as it currently is), locations within the canton of Bern would show German followed by French (where such names exist), and in Biel/Bienne that order would be reversed to show French first (as is currently the case).

If a `name:it` tag is provided for a place in Biel/Bienne, it would not be rendered by default, even though it is an official language defined on the Switzerland administrative boundary.

Currently, the street names in OSM within Biel/Bienne are French/German in order, while the actual street signage in the city is German/French. If at some point the local mappers wanted to mimic on-the-ground signage, fixing this would be a non-trivial edit using the current approach of multiple names in the `name` field, while it would be a single edit of the `languages:presentation_order` field under this proposal.

Haida Gwaii, British Columbia, Canada

The Canada administrative boundary may define:

 languages:presentation_order=en;fr

This would allow consistent combined English/French name rendering wherever both exist. Naturally, Quebec would reverse this order:

 languages:presentation_order=fr;en

While an administrative region covering Haida Gwaii may define:

 languages:presentation_order=hai

This would result in the Haida language name being used when rendering features within Haida Gwaii, gracefully falling back to English and French names where it is missing due to the `languages:presentation_order` tag on the Canada administrative boundary.

Open Questions

  • Would renderers be willing to adopt this approach? It is a moot exercise without renderer buy-in. After discussion, it appears there are at least a few blockers to adoption in some renderers. Before these issues can be tackled, the tags must be available for renderers to use and, of course, someone needs to do the work of adapting the renderers. However, there seems to be appetite for this among those working on renderers. Most importantly, a standardized approach to recording the needed language metadata in the OSM dataset is required as a precondition.

Features/Pages affected

Multilingual_names

External discussions

Comments

Please comment on the forum thread.