User:Nebulon42/Multilingual names

From OpenStreetMap Wiki

What it is

The term multilingual name refers to a name that exists in more than one language, where each of these languages is spoken by a significant number of people living in the places where the name is used. Such names often occur where language regions overlap. Often a minority lives in those regions, which may or may not be officially recognized. The status of multilingualism can be disputed, so naming decisions can provoke emotional reactions and should be handled with care.

Multilingual names can refer to

  • place names (city, town, village, places) including street names and addresses
  • administrative entities such as countries, states, municipalities
  • natural features (especially when occurring at borders) such as peaks, rivers and streams

What it is not

A name is not multilingual simply because it is known in other languages without significant use by the people who live where the name is used. E.g. Wien is known in English and Italian as Vienna, in Hungarian as Bécs, in Spanish as Viena and in Slovenian as Dunaj. But since German is the only official language there, Wien does not qualify as a multilingual name.

Status Quo in OSM

The status quo in OSM is inconsistent and differs from region to region. Below are some examples from Europe.

Carinthia, Austria

Slovenian minority, officially recognized. State of multilingual names and signage was long disputed, but was settled a few years ago.

Names in name:de and name:sl, name:de/name:sl in name tag, e.g. Bad Eisenkappel/Železna Kapla (https://www.openstreetmap.org/node/240110468)

Street names in German

Admin entities in German with name:sl tag

One river is tagged multilingually, e.g. Vellach/Bela (https://www.openstreetmap.org/way/122506913): the name tag holds the de/sl form, but name:sl and name:de are missing. Most probably there is no consensus on this.

Peaks at the border to Slovenia: name:de / name:sl e.g. Steinberg / Kamnati vrh (https://www.openstreetmap.org/node/474853269)

Brussels, Belgium

The metropolitan area of Brussels is officially a multilingual area - in Dutch and French.

Names in name:fr, name:nl with name:fr - name:nl or vice versa, e.g. Bruxelles - Brussel (https://www.openstreetmap.org/node/1635651356)

Street names: e.g. Rue des Bouchers - Beenhouwersstraat (https://www.openstreetmap.org/way/8511528)

Train stations: e.g. Gare Centrale - Centraal Station (https://www.openstreetmap.org/node/142700522)

Rivers: e.g. Senne - Zenne (https://www.openstreetmap.org/way/14840503)

Admin entities: e.g. Région de Bruxelles-Capitale - Brussels Hoofdstedelijk Gewest (https://www.openstreetmap.org/relation/54094)

Sorbian Area, Germany

The Sorbs are officially recognized as a minority. Multilingual names in the Sorbian area are, or have been, disputed in OSM, e.g. http://forum.openstreetmap.org/viewtopic.php?id=54294.

Names in name:de, name:hsb with name:de - name:hsb in name tag, e.g. Bautzen - Budyšin (https://www.openstreetmap.org/node/30361883)

Admin entities in German with name:hsb e.g. https://www.openstreetmap.org/relation/1309674

Train stations in German or bilingual e.g. Bautzen / Budyšin (https://www.openstreetmap.org/node/2796772650)

Rivers in German

Street names in German

South Tyrol, Italy

South Tyrol is an autonomous province in northern Italy with a German-speaking majority. Italian and Ladin are also spoken. This is the most complete multilingual region I was able to find (except perhaps for Brussels).

Names in name:de and name:it, name:de - name:it or name:it - name:de (according to majority of speakers) in name tag, e.g. Bruneck - Brunico (https://www.openstreetmap.org/node/64777001)

The same for street names: e.g. Stadtgasse - via Centrale (https://www.openstreetmap.org/way/271519135)

The same for peaks: e.g. Kronplatz - Plan de Corones (https://www.openstreetmap.org/node/283551177)

Train stations: e.g. Bruneck - Brunico (https://www.openstreetmap.org/way/132203851)

Rivers: e.g. Rienz - Rienza (https://www.openstreetmap.org/way/159043040)

Admin entities: e.g. Bruneck - Brunico (https://www.openstreetmap.org/relation/47317)

Names in name:de, name:it and name:lld, name:lld - name:de - name:it in name tag, e.g. Al Plan de Mareo - St. Vigil in Enneberg - San Vigilio di Marebbe (https://www.openstreetmap.org/node/64777128)

Southern Slovakia

Mostly Hungarian minority, sometimes even majority, officially recognized when a certain percentage threshold is crossed.

Names in name:sk and name:hu, official Slovakian name in name tag, e.g. Komárno (https://www.openstreetmap.org/node/26037382)

Street names in Slovakian

Admin entities in Slovakian

River Danube with name:hu / name:sk, e.g. Duna / Dunaj (https://www.openstreetmap.org/way/30453748)

Slovenia

Italian minority in Primorska region, officially recognized.

Names in name:sl and name:it, name:sl / name:it in name tag, e.g. Koper / Capodistria (https://www.openstreetmap.org/node/1640456909)

Some street names are multilingual, name:sl / name:it, e.g. Čevljarska ulica / Calegaria (https://www.openstreetmap.org/way/372046282)

Admin entities have name:sl and name:it, name:sl / name:it in name tag, e.g. Koper / Capodistria (https://www.openstreetmap.org/relation/541958)

Peaks at the border to Italy: name:it - name:sl, e.g. Piccolo Mangart - Rateški Mali Mangart (https://www.openstreetmap.org/node/2525474492)

Peaks at the border to Austria: name:de / name:sl e.g. Steinberg / Kamnati vrh (https://www.openstreetmap.org/node/474853269)

Switzerland

Switzerland has four official languages: German, French, Italian and Romansh. Despite that, few multilingual names are found in the name tag. At the borders of language regions the names tend to change without multilingual notation in the name tag. Multilingual variants are present in the name:* tags. Some exceptions:

Train station Sierre/Siders (https://www.openstreetmap.org/node/2859798310), official name?

Lake Lac de Morat / Murtensee (https://www.openstreetmap.org/relation/398830)

Canton Fribourg - Freiburg (https://www.openstreetmap.org/relation/1698314)

Drawbacks of Status Quo or What should be solved by this work

  • Naming inconsistencies (e.g. - vs. / as separator)
  • Names tend to get (too) long
  • No control for the renderer, e.g. over displaying names side by side or on top of each other, or over the choice of separator character
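
The separator problem can be made concrete with a small sketch. It assumes a data consumer that tries to recover the individual names from the name tag by splitting on the separators seen in the examples above. The function name split_combined_name and the splitting rules are hypothetical, not an existing tool.

```python
import re

# Separators seen in the examples above: " - ", " / " and "/" without spaces.
# Assumption: a hyphen without surrounding spaces belongs to the name itself.
_SEPARATOR = re.compile(r"\s+-\s+|\s*/\s*")


def split_combined_name(name):
    """Guess the individual names inside a combined name tag value."""
    return [part for part in _SEPARATOR.split(name) if part]


print(split_combined_name("Bad Eisenkappel/Železna Kapla"))
print(split_combined_name("Bruxelles - Brussel"))
print(split_combined_name(
    "Région de Bruxelles-Capitale - Brussels Hoofdstedelijk Gewest"))
```

Such a heuristic cannot tell which language each part belongs to, and it would break on a name that legitimately contains a slash or a spaced hyphen. This is the kind of guesswork an explicit tag would avoid.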

Thus this work is a purely technical, rendering-oriented change that seeks to make multilingual names more accessible to renderers, not to (significantly) change mapping practices. Multilingual names might also become more accessible to other data consumers such as Nominatim, but this work is mainly targeted at renderers.

What cannot be solved by this work

  • disputes on whether objects should get multilingual names or not
  • disputes on the order of languages (some order is always perceived, whether names are placed left to right or top to bottom)
  • renamed or additional names in the name tag that help non-native speakers understand it (mostly for non-Latin scripts), e.g. https://www.openstreetmap.org/relation/1473947; these are not multilingual names under the definition above

Possible solutions

Region matching

Objects within a designated region could be treated as having multilingual names, with the languages specified on the region. The drawback is that it is not possible to specify which types of objects should be included and which should not. Matching objects against a containing region is also quite an expensive operation and difficult to do on the fly.
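
As a rough illustration of region matching, the sketch below tests whether a point lies inside a region polygon (ray casting) and returns the languages attached to that region. The region structure, the function names and the rectangular example polygon are all assumptions made for illustration, not real boundary data. Note that every object has to be tested against every edge of the polygon, which hints at why this gets expensive.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Toggle when a horizontal ray from the point crosses this edge.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def languages_for(lon, lat, regions):
    """Return the language list of the first region containing the point."""
    for region in regions:
        if point_in_polygon(lon, lat, region["ring"]):
            return region["languages"]
    return []


# Hypothetical rectangular region, purely illustrative.
regions = [{
    "languages": ["de", "sl"],
    "ring": [(14.0, 46.4), (15.0, 46.4), (15.0, 46.7), (14.0, 46.7)],
}]
print(languages_for(14.6, 46.5, regions))  # ['de', 'sl']
print(languages_for(16.4, 48.2, regions))  # []
```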

Tag on the object

An additional tag on the object can serve both as an indicator that its names should be treated as multilingual and as a specification of the languages of which the multilingual name is composed. Objects that do not have this tag are left unchanged, and the name tag can still be used as a fallback.

The name of the tag is not important at this stage and has not been thought through yet. It could be something like name:multilingual or similar. A multilingual name consisting of German and Slovene would then be specified with name:multilingual="de;sl". A data consumer should then examine the contents of name:de and name:sl and present them accordingly. Note that the sequence may be perceived as an ordered list (see "What cannot be solved by this work").
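
A minimal sketch of how a data consumer might evaluate such a tag, assuming the tag is called name:multilingual as suggested above and that the tags are available as a plain dictionary. The function name multilingual_names is hypothetical, and silently skipping languages whose name:* tag is missing is an assumption, not part of the proposal.

```python
def multilingual_names(tags):
    """Return the names to display for one object, in the order of the tag."""
    codes = [c.strip() for c in tags.get("name:multilingual", "").split(";")]
    names = [tags[f"name:{c}"] for c in codes if c and f"name:{c}" in tags]
    if names:
        return names
    # Fallback: no tag, or none of the referenced name:* tags exist.
    return [tags["name"]] if "name" in tags else []


tags = {
    "name": "Bad Eisenkappel/Železna Kapla",
    "name:de": "Bad Eisenkappel",
    "name:sl": "Železna Kapla",
    "name:multilingual": "de;sl",
}
print(multilingual_names(tags))  # ['Bad Eisenkappel', 'Železna Kapla']
```

An object without the tag falls back to its plain name tag, so existing data keeps working unchanged.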

This concept could theoretically also be extended beyond name tags. E.g. addr:street:multilingual="de;sl" would reference addr:street:de and addr:street:sl.
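
The extension to other keys can be sketched by making the base key a parameter. This assumes the same "<key>:multilingual" naming pattern as in the example above; the function name resolve_multilingual is hypothetical and the address tags in the usage example are illustrative values, not taken from real data.

```python
def resolve_multilingual(tags, base_key):
    """Resolve <base_key>:multilingual into the values of <base_key>:<code>."""
    spec = tags.get(f"{base_key}:multilingual")
    if not spec:
        return []
    values = []
    for code in spec.split(";"):
        value = tags.get(f"{base_key}:{code.strip()}")
        if value:
            values.append(value)
    return values


# Illustrative address tags, modelled on the Koper street example above.
address = {
    "addr:street:multilingual": "sl;it",
    "addr:street:sl": "Čevljarska ulica",
    "addr:street:it": "Calegaria",
}
print(resolve_multilingual(address, "addr:street"))
```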

Possible implementation for renderers

Data

Preprocessing

This approach would process the data when importing it into the rendering database. In a Lua tag transform, name:multilingual would be evaluated (i.e. split into parts) and the respective name tags would be placed into name1, name2, name3 or similar. CartoCSS would then check whether name1, name2, name3 are empty and, if not, render them accordingly.
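
The logic of such a transform could look roughly like the following. The text mentions a Lua tag transform; Python is used here only to illustrate the steps, and the function name add_name_columns and the limit of three columns are assumptions based on the column names above.

```python
MAX_NAMES = 3  # the text proposes name1, name2, name3


def add_name_columns(tags):
    """Return a copy of tags with name1..name3 filled from name:multilingual."""
    out = dict(tags)
    codes = tags.get("name:multilingual", "").split(";")
    for i in range(MAX_NAMES):
        code = codes[i].strip() if i < len(codes) else ""
        # A missing language or a missing name:<code> tag leaves the column empty.
        out[f"name{i + 1}"] = tags.get(f"name:{code}") if code else None
    return out


row = add_name_columns({
    "name": "Koper / Capodistria",
    "name:sl": "Koper",
    "name:it": "Capodistria",
    "name:multilingual": "sl;it",
})
print(row["name1"], row["name2"], row["name3"])  # Koper Capodistria None
```

Positions are kept as listed in the tag, so a missing name:* tag leaves a gap. Whether such gaps should be closed is a design choice left open here.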

On-the-fly

This would require selecting columns in the SQL based on the value of another column (or, in the case of HSTORE, a tag). name:multilingual would need to be split into parts, and the parts would then be used to assemble the correct name columns.

Splitting can be achieved in PostgreSQL by using split_part, regexp_split_to_array or regexp_split_to_table, see https://www.postgresql.org/docs/current/static/functions-string.html for details.
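
To make the behaviour of split_part easier to reason about, here is a small Python imitation of it for positive field positions. It is only a sketch of the documented semantics (fields are counted from 1, a position past the last field yields an empty string) and assumes a non-empty delimiter.

```python
def split_part(string, delimiter, n):
    """Mimic PostgreSQL split_part for positive n: fields are counted from 1."""
    if string is None:
        return None  # SQL NULL in, NULL out
    if n < 1:
        raise ValueError("this sketch only handles positions greater than zero")
    parts = string.split(delimiter)
    return parts[n - 1] if n <= len(parts) else ""


print(repr(split_part("de;sl", ";", 1)))  # 'de'
print(repr(split_part("de;sl", ";", 2)))  # 'sl'
print(repr(split_part("de;sl", ";", 3)))  # ''
```

For a bilingual name the third part is an empty string, so a lookup built from it would ask for a tag literally named "name:", which normally does not exist.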

We will need PL/pgSQL and HSTORE for that:

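-- Look up the name:<code> tag (e.g. name:de) in an hstore of tags.
CREATE OR REPLACE FUNCTION name(_h hstore, _code text) RETURNS text AS
$body$
BEGIN
  RETURN _h->('name:' || _code);
END;
$body$ LANGUAGE plpgsql;

-- Map the languages listed in name:multilingual to name1..name3.
-- split_part returns '' for a missing field, so unused slots come out as NULL.
SELECT
  name,
  name(tags, split_part(tags->'name:multilingual', ';', 1)) AS name1,
  name(tags, split_part(tags->'name:multilingual', ';', 2)) AS name2,
  name(tags, split_part(tags->'name:multilingual', ';', 3)) AS name3
FROM planet_osm_point
WHERE osm_id = 1234;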

Why name1, name2, name3? First, the various name:* variants have to be mapped to fixed column names. Second, I think three names is the maximum that is feasible to display. Most multilingual names will be bilingual.

Rendering

Point or Area Features

Multiple names can be displayed either side by side with a separation character or (preferred) on top of each other. The relevant CartoCSS is:

text-name: "[name]";
[name1 != null] {
  text-name: [name1];

  [name2 != null] {
    text-name: [name1] + '\n' + [name2];

    [name3 != null] {
      text-name: [name1] + '\n' + [name2] + '\n' + [name3];
    }
  }
}

Line Features

Multiple names can be displayed either side by side with a separation character or (preferred) alternating when the label is repeated along the line.
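
The two options for line features can be sketched as follows. This only illustrates which text each repeated label would carry. Whether and how a particular renderer can do this is not covered here, and both function names are hypothetical.

```python
from itertools import cycle, islice


def alternating_labels(names, repetitions):
    """Return the label text for each repeated placement along a line."""
    names = [n for n in names if n]
    if not names:
        return []
    return list(islice(cycle(names), repetitions))


def side_by_side(names, separator=" / "):
    """Join all available names into a single label."""
    return separator.join(n for n in names if n)


print(alternating_labels(["Vellach", "Bela"], 5))
# ['Vellach', 'Bela', 'Vellach', 'Bela', 'Vellach']
print(side_by_side(["Rienz", "Rienza"], " - "))  # Rienz - Rienza
```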

Proof of Concept

Before (Multilingual names Proof of Concept - before.png): Here you see the rendering of the name tag with both the German and the Slovene variant separated by a /.

After (Multilingual names Proof of Concept - after.png): Here you see the rendering of name:de and name:sl on top of each other. The place node has an additional name:multilingual="de;sl" tag.