Talk:Multilingual names


Jordan Street Names

There is currently no established convention for naming streets in Jordan. The scheme that most mapping in Jordan currently follows is:

  • name=Latin-Lettered (International) Name
  • name:ar=Arabic Lettering

Kaart does ground surveys all over the world and we recently returned from our ground survey in Jordan. Before we go ahead and follow this naming scheme, we'd like to get a community consensus on the naming scheme and to make it official on this page.
Are there any thoughts from the community on the matter?

Transliteration or/and translation?

I'm not sure whether transliteration is better than translation. Usage is very tricky. I have in hand two of the rare recent paper city maps of Ulaanbaatar: on both maps, "Ulaanbaatar" is written like this, i.e. according to the ISO 9 transliteration from Cyrillic, and this is the usual spelling inside Mongolia for foreigners. But the French embassy, for instance, still uses "Oulan-Bator", the traditional French transliteration (following the French pronunciation of "ou"), which is much better known among French people living in France than "Ulaanbaatar". Airlines often use "Ulan Bator", the traditional English transliteration, and Russians write "Улан-Батор", a Russian phonetic spelling that exists because Cyrillic was not yet used in Mongolia when the town was given its name.
That is not all. Mongolian has been written in 10 different scripts over its history, and 2 of them are still in everyday use: Cyrillic in (independent) Mongolia, and the Mongol-Uighur script in Inner Mongolia and, as a cultural matter (it is taught to all pupils in junior high schools), in Mongolia too. The transliteration of the Mongol-Uighur spelling would be "Ulaganbagator", which is never used. The translation is "Red-Hero", which is also not used.
This is still not all. On my maps, made for tourists (locals almost never use maps), the main street is written "Peace avenue": a translation. The 2 main boulevards are named "Baga toyruu" and "Ih toyruu" on one map, "Baga toiruu" and "Ikh toiruu" on the other, both non-ISO 9 transliterations (the meanings are "small boulevard" and "big boulevard"). And this is indeed how foreigners usually talk: "Peace avenue", "Ih toiruu" etc. A secondary street is named "Zaluuchuud avenue" on both maps: the combination of a transliteration (without the grammatical case) and a translation. Both maps have a "Seoul street", where "Seoul" is written according to the usual English/French spelling of that town, not as a transliteration from the Mongolian spelling, which would be "Sôùl" (ISO 9). The 2 maps diverge on the name of the main bridge: "Enkhtaivany bridge" (non-ISO 9 transliteration + translation) on one, "Peace bridge" on the other.
The reason why the ISO 9 standard is not always followed is that it has not even been translated into Mongolian and, for some letters, leads to an English pronunciation too far from the Mongolian one, while an obvious non-standard transliteration is phonetically better. According to ISO 9, the transliteration of тойруу should be "tojruu", though nothing sounds like an English "j" in the Mongolian word. A problem with using ISO transliteration standards is that they are copyrighted (and expensive), so we cannot provide them freely on OpenStreetMap, and we cannot expect all contributors to buy them.
The first question is: "Is the usage of foreigners living in the place so important that it should be followed?". I'd say no. Should we follow the local cartographers' usage? I'm not sure it's always a good idea, and these usages sometimes differ from one cartographer to another. Should we follow the Post Office usage, since the Universal Postal Union makes it compulsory to write addresses in Latin characters for international mail? The U.P.U. doesn't say whether it has to be a transliteration (and which one) or a translation, and local post offices might accept several solutions. I propose that the following always be provided:

  • the original name in the original language,
  • a translation into English (if the name is not English and has a meaning in the present local language),
  • a transliteration into latin characters (if the original script is not latin) according to systematic rules among those usually accepted locally. By "systematic", I don't mean universal. For instance, the transliteration rules of Cyrillic could be different for Ukrainian and for Mongolian.
  • the traditional English name if there is one.

The reason for translation is that it's culturally interesting and sometimes gives geographical information. For instance, the county of Erdenet city (a name meaning "Precious", because it's a mining town) is called "Bayan-Ôndôr", which means "Rich-height", while the central municipality is called "Uurhaičid", meaning "Miners", which is even clearer. My proposal implies that the number of local names could be multiplied by up to 3. If there are 2 different local names with different meanings in non-Latin scripts, this makes 6 fields. It should then be clear which field is the translation/transliteration of which one. So "name:en" is not sufficient at all in this case, because it doesn't say whether it's a transliteration, a translation or a traditional name, and doesn't say, for a translation or transliteration, what it is the translation or transliteration of. "name:en" should only be used when English is indeed (one of) the local language(s), or if there is a proper traditional English name. I'd suggest "translation:mn/en" and "transliteration:mn-Cyrl/Latn". (The hyphen cannot be used between 2 languages because it already appears in dialect codes, such as "es-AR" for Argentinian Spanish, and in script codes, such as "mn-Cyrl" for Mongolian Cyrillic.) We also need a way to specify the usual local language(s) for each large zone. And we may also need a way to say, for each place, which of the translation, the transliteration and the traditional name is most used. For Ulaanbaatar, we could have:

  • name:mn-Cyrl=Улаанбаатар
  • translation:mn/en=Red-Hero
  • transliteration:mn-Cyrl/Latn=Ulaanbaatar
  • name:en=Ulan Bator

and optionally:

  • name:mn-Mong=(Sorry, I have no Mongolian-Uighur script keyboard to write this. See here)
  • transliteration:mn-Mong/Latn=Ulaganbagator
  • name:fr=Oulan-Bator
  • translation:mn/fr=Héros-Rouge
  • name:ru=Улан-Батор

etc. I prefer calling "Ulan Bator" and "Oulan-Bator" proper English and French names rather than (phonetic) transcriptions (which could be specified as "transcription:mn/en" and "transcription:mn/fr"), because they are not the result of a transcription system still in use nowadays.

For Lyons:

  • name:fr=Lyon
  • name:en=Lyons

"Lyon" has no meaning in the present-day French language, so it needs no translation.
--http://solages.site.voila.fr/index_en.html 17:41, 10 March 2009 (UTC)
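
The point that each field should say what it is a translation or transliteration of can be made concrete with a small sketch. It assumes the key shapes proposed in this comment ("translation:SRC/DST", "transliteration:SRC/DST", "name:LANG"); these are proposals from this thread rather than established OSM keys, and the function name parse_proposed_key is hypothetical.

```python
def parse_proposed_key(key):
    """Split a proposed key into its kind, source and target.

    For "name:LANG" keys the language code is returned as the source
    and the target is None. Unrecognised keys give None.
    """
    kind, _, rest = key.partition(":")
    if kind in ("translation", "transliteration") and "/" in rest:
        # The slash separates source from target, because the hyphen
        # is already used inside codes such as "mn-Cyrl".
        source, target = rest.split("/", 1)
        return {"kind": kind, "source": source, "target": target}
    if kind == "name":
        return {"kind": "name", "source": rest or None, "target": None}
    return None


# Example with keys taken from the Ulaanbaatar list above.
for key in ("name:mn-Cyrl", "translation:mn/en", "transliteration:mn-Cyrl/Latn"):
    print(key, "->", parse_proposed_key(key))
```

With keys shaped like this, a data consumer can tell a transliteration from a translation without guessing, which a bare "name:en" does not allow.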

Wales-centric (Greece mentioned)

This seems to be only about Wales. Could it be changed to be as generic as possible? Bruce89 21:02, 26 April 2007 (BST)

I agree. It would be nice to have the article extended a bit to include other examples. For Greece, for instance, the name= tag should use the Greek spelling, and so on. I guess drawing the line at the language most frequently used locally can sometimes be hard, though. Karlskoging1 21:41, 26 April 2007 (BST)
In Athens, I saw a "mess" of Greek and English street names in the name tag. This is a big problem, because you can't find street names when searching, so something should be done about it. I agree that the name tag should be in Greek, and name:en should be used for the English transliteration, which is on the street signs. --Willem1 19:22, 1 March 2009 (UTC)
I think a solution is to ask the developers of each renderer whether they can combine two tags in the displayed name of streets and other named objects. I've asked at mapnik-users today about it.

Missing ISO639-1 language codes

I just discovered that there are no ISO 639-1 language codes for at least two of the minority languages used in Sweden. I suggest that in such cases we use the ISO 639-2 language codes instead. Karlskoging1 22:05, 26 April 2007 (BST)

I am already using the ISO 639-2 language code for Old Norse (non) for some Scottish Islands. Bruce89 22:31, 26 April 2007 (BST)

information

I'm a bit concerned this doesn't preserve all needed information.

If we have, say,

  • name=[name in Welsh]
  • name:en=[name in English],

or indeed, if we have

  • name=[name in English]
  • name:cy=[name in Welsh]

then we lose the information as to which language the default name is in. If I'm rendering a map of the UK in English, I can easily pull out name:en before name, so this isn't a terrible problem. However, if you were specifying complex rules as to language preference order, this might be a problem, especially when rendering large areas where several different languages are likely to occupy the 'name' field: imagine that I wanted to show all places with Welsh names with that Welsh name, but to show English names in preference to Gaelic ones. On the Welsh nodes I'd want to pick out "name:cy", "name", then finally "name:en", but on Gaelic nodes I'd want to pick out "name:cy", "name:en", "name". If we never had a bare "name" field this wouldn't be a problem. Morwen 12:35, 27 April 2007 (BST)

This is a fair point, but ATM renderers ignore all name:code=* tags, hence name=* containing the default language. I'm not sure if it would be possible to tell a worldwide renderer to use the local languages if they weren't in the name=* tag. Bruce89 13:23, 27 April 2007 (BST)
Well, you could have a defaultlanguage=* tag, or declare that if you have a "name" you also need a "name:code". I can think of several other ways of doing this, with varying complexity. Morwen 14:09, 27 April 2007 (BST)

A way to keep all the information is badly needed. --http://solages.site.voila.fr/index_en.html 17:54, 10 March 2009 (UTC)

I tend to use lang=* - probably from a HTML analogy. --tms13 17:15, 13 November 2009 (UTC)
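
To illustrate the problem Morwen raises, here is a minimal sketch of a renderer-side lookup that needs to know the language of the bare name. It assumes the hypothetical defaultlanguage=* tag suggested above, assumes "gd" as the code for the Gaelic nodes, and uses placeholder values; none of this is an existing renderer feature.

```python
# Preference orders from the example above, keyed by the language of the
# bare "name" tag. "defaultlanguage" is the hypothetical tag from this
# thread; "gd" for Gaelic is an assumption made for the illustration.
ORDER_BY_DEFAULT_LANGUAGE = {
    "cy": ["name:cy", "name", "name:en"],
    "gd": ["name:cy", "name:en", "name"],
}
FALLBACK_ORDER = ["name:en", "name"]


def pick_name(tags):
    """Return the first non-empty name according to the preference order."""
    order = ORDER_BY_DEFAULT_LANGUAGE.get(tags.get("defaultlanguage"), FALLBACK_ORDER)
    for key in order:
        value = tags.get(key)
        if value:
            return value
    return None


welsh_node = {"defaultlanguage": "cy", "name": "[name in Welsh]", "name:en": "[name in English]"}
gaelic_node = {"defaultlanguage": "gd", "name": "[name in Gaelic]", "name:en": "[name in English]"}
print(pick_name(welsh_node))
print(pick_name(gaelic_node))
```

Without something like defaultlanguage=* (or a rule that every name also has a name:code twin), the two nodes are indistinguishable and only one fixed order can be applied.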

Street name

Surely this should be moved to Bilingual names, and the text changed accordingly. This doesn't only apply to street names. Bruce89 16:31, 8 May 2007 (BST)

Rendering names where there are two

Has there been any progress on rendering multilingual street and place names? In Brussels all name tags are now getting something like "Bruxelles - Brussel", since showing just one of them is not an option.

If I may make a suggestion, I'd like to see something like "name=Bruxelles;Brussel", where the renderer chooses an appropriate method for displaying both (each on a separate line for place names, say, or separated by a dash along a long street). We can already have multiple values for a key separated by semicolons, so why not make use of that here? The other option is to have no name key at all and a new tag like "display_languages=fr;nl", but I prefer the former. --Eimai 21:37, 3 January 2008 (UTC)

That would be really useful indeed. --Moyogo 16:29, 4 January 2008 (UTC)
Perhaps it would be best that the name tag is the default local language, if someone wants to render a map in English they can start with name:en=* tags and fallback to name=*, equally, if someone wants to produce a hybrid English/French map they can combine the name:fr and name:en tags into the rendered name... Changing the values purely for rendering such as this is dirty imho... dkt 11:00, 9 December 2008 (UTC)
Well the name tag should hold the default local language name yes. But I think the question was about situations where there *is* no default local language. The streets have two different equally valid names.
So looking in Brussels at the moment this tram stop for example has been set up with the correct values in 'name:fr' and 'name:nl' but the value in the main 'name' tag should (some might argue) use a Semi-colon value separator to indicate that it has two different names.
Anyone developing a renderer would then have to decide what to do about that (e.g. swap in a hyphen instead). I believe technically it would be easy to do that for Mapnik renderers e.g. it could be added as a new feature of osm2pgsql. Other rendering systems and mobile apps everywhere could make a similar change, and then we could slowly transition to a more technically correct set up of all the name tags in places like this
...and the end result would be no difference (at best, although anything not transitioned would show an ugly ';' in there) All in all, quite a lot of faffing around just to satisfy some tagging pedantry. I can see why this hasn't happened yet! :-)
-- Harry Wood 12:59, 18 March 2012 (UTC)
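
As a rough sketch of the renderer-side change discussed in this section, the function below splits a semicolon-separated name value and joins the parts with whatever separator suits the display. The function name display_name is hypothetical and this is not an existing osm2pgsql or Mapnik feature.

```python
def display_name(name, separator=" - "):
    """Join the parts of a semicolon-separated name value for display."""
    parts = [part.strip() for part in name.split(";")]
    # Drop empty parts left by stray or trailing semicolons.
    parts = [part for part in parts if part]
    return separator.join(parts)


print(display_name("Bruxelles;Brussel"))        # dash-separated, e.g. along a street
print(display_name("Bruxelles;Brussel", "\n"))  # one name per line, e.g. for a place label
```

A value without a semicolon passes through unchanged, so the same code path works for monolingual names.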

Accented/non-accented?

Is there really a point in making a difference between accented and non-accented names? One major highway in Brazil is named Rodovia Governador Mário Covas, how should this be tagged in other languages? Is name:en Rodovia Governador Mário Covas, Rodovia Governador Mario Covas, Governor Mario Covas Highway or Governador Mário Covas Highway?

Spain

Is "ga" the correct ISO code for Galego / Galician? "ethnologue" suggests "gl" or "glg", with "ga" being Gaelic / Irish. (SomeoneElse 20:28, 27 January 2009 (UTC))

Breizh - Brittany - Bretagne

The system indicated for Wales does not work in France for bilingual streets, in French and in the Breton language (Brezhoneg [1]) (br). Many towns and villages in the west of Brittany have bilingual street names.

If you do this: name:fr=[name in French] and name:br=[name in Breton], then we lose each name!

Today, the only way is to put the bilingual form into a single field: name=[name in French] - [name in Breton], which is not satisfactory. Do you have any other way to suggest? Thanks! --Rimael 19:47, 2 August 2009 (UTC)

For China

(Discussion moved from the main page Ouleyang 08:21, 11 August 2010 (BST))

The way that names have been tagged in Hangzhou and Shanghai so far is as follows...
name=<Chinese>
name:zh=<Chinese>
name:en=<English>
name:zh_py=<Chinese pinyin (toneless)>
name:zh_pyt=<Chinese pinyin (with tones)>

An example of the given methodology follows...
name=朝阳公园南路
name:zh=朝阳公园南路
name:en=Chaoyang Park South Road
name:zh_py=Chaoyanggongyuan Nanlu
name:zh_pyt=Cháoyánggōngyuán Nánlù

This gives us the ability to render maps useful to as many users as possible, default rendering would use Chinese, which is good from the point of view that most of the population in China reads Chinese, but it would be easy to render maps with English and pinyin with tones as well for particular uses... For someone who reads Chinese, they'd only need the Chinese, for someone who doesn't, the English could make the map more understandable, if they want to try to communicate a name to someone in Chinese, having pinyin would be very helpful, if they want to be understood, having pinyin with tones would be very important...

Dtucny 01:00, 15 October 2007 (BST)

zh_py and zh_pyt are not defined in any standard, so we should not use them. Furthermore, zh_py could be generated automatically from zh_pyt. I propose zh, zh-Hans and zh-Hant. For places in mainland China, zh=zh-Hans, otherwise zh=zh-Hant. (comment added by Python eggs)

Thanks for the comment Python eggs... As far as I'm aware, no standard language code exists for Pinyin, with tones or without... There has been discussion about creating language codes for it (in August 2008), but I'm not aware that this has been put into a standard as yet, and the discussion didn't even venture into the use of tones; when that happens, we can batch convert the tags if needed... Converting from pinyin including tones to pinyin without tones is possible, as you have said, though I think there is benefit to having both as options at data entry: it is definitely more difficult to add pinyin including tones than without, and I would expect that not everyone would want to, but where it is missing someone else could easily see that and add it, whereas with a single pinyin field that could include tones or not, that would be less easy to spot.

Your point about the use of zh-Hans and zh-Hant has definite merit though, and I have used them before. The ideal combination of names to capture would be Simplified Chinese, Traditional Chinese, Pinyin, Cantonese Pinyin and English, as there is no way to automatically convert between any of these without some pretty large lookup tables, and even then it likely wouldn't get everything correct...

So, for now, I stand by the proposal above, but, do admit there is room for improvement especially regarding handling of Simplified and Traditional Chinese... dkt 12:08, 21 April 2009 (UTC)
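
The conversion from toned to toneless pinyin mentioned above can be sketched with the Python standard library alone. The approach here is an illustration, not an agreed OSM tool: decompose the text, drop only the four combining marks used for tones, and recompose, so that the diaeresis of "ü" survives.

```python
import unicodedata

# Combining marks that carry the four pinyin tones after NFD decomposition:
# macron (1st), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Return pinyin with tone marks removed, keeping other diacritics."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that e.g. "u" + diaeresis becomes a single "ü" again.
    return unicodedata.normalize("NFC", kept)


print(strip_tones("Cháoyánggōngyuán Nánlù"))
```

Applied to the example above, this turns the name:zh_pyt value into the name:zh_py value. The reverse direction is not possible without a dictionary, which is why the toned form is the one worth entering by hand.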

There is also the issue of parts of China, such as the autonomous regions of Xinjiang and Tibet, where the first official language is not written in Chinese script, e.g. Uyghur or Tibetan. Here the name tag should probably have at least the Simplified Chinese and Uyghur names. We may also need Latin-script versions of these languages, such as Tibetan Pinyin (tb_py), although there are 5 versions of this for Uyghur, and a Cyrillic version too, since Uyghur areas previously governed by Russia use it. Of course, since Uyghur is written right to left, it will appear to be written first if included on the right-hand side of a bi/trilingual name tag. There are also often a range of older 'international names' for places in these areas. jamesks 31 July 2009 (UTC)

Tibetan : name:bo
Uyghur : name:ug
Manchu : name:mnc
Mongolian : name:mn
Russian : name:ru

name=<Chinese>
name:zh=<Simplified Chinese>
int_name=<latin script names>
name:en=<English>
name:ru=<Russian>
name:zh_py=<Chinese pinyin (toneless)>
name:ug=<Uyghur>
name:ug_??=<latin Uyghur>

Moved this from the main page

Ff5722 (talk) 21:26, 24 November 2016 (UTC)

Tags for the Pinyin romanisation in Chinese languages

Some important objections:

  • "zh-py" (or "zh_py", which is equivalent in BCP 47 syntax) is conforming and valid, but it does not encode what you expect: it would parse as "zh" (the Chinese macrolanguage) as spoken in the region with code "py" (i.e. Paraguay, both in the IANA registry of BCP 47 language subtags and in ISO 3166-1!). Don't use it!
  • "zh-pyt" (or "zh_pyt", which is equivalent in the BCP 47 syntax) is conforming but not valid: it would parse as "zh" (the Chinese macrolanguage) followed by the extlang subtag "pyt". Extlang subtags are deprecated: all of them in BCP 47 have been aliased to ISO 639-3 language codes, but there is no "pyt" subtag in the IANA registry, nor in ISO 639-2 or ISO 639-3 where such codes originate, so it is clearly invalid, even if its syntax is conforming. It may be assigned later to some unrelated language (e.g. some language in Papua New Guinea). Don't use it!
  • "zh_pinyin" and "zh-pinyin" are equivalent syntaxes in BCP 47 (which does not differentiate underscores and hyphens in codes); however, we prefer hyphens to underscores for all language codes, and although BCP 47 considers language tags to be case-insensitive, we opted for lowercase only.
  • Replacing "zh_pinyin" or "zh-pinyin" by "zh-latn_pinyin" does not change the syntax: it just inserts the script code "latn" at an appropriate place (but it is not needed for Pinyin, which is implicitly based on Latin).
  • The "-pinyin" or "_pinyin" part is the only question to ask: under BCP 47 it parses as a "language variant subtag", which should then be listed as such in the IANA registry of BCP 47 language subtags (variants are meant, for example, to mark different orthographic conventions or some regional dialectal differences).
  • However, a romanization system (a variant of a script, not a variant of a language) does not fit here. BCP 47 uses another scheme for script variants.
  • So let's look in the IANA registry of language subtags to see if (and how) "pinyin" is registered (not to be confused with the "Pinyin" language (pny) spoken in the Niger-Congo region). We see this:
    %%
    Type: variant
    Subtag: pinyin
    Description: Pinyin romanization
    Added: 2008-10-14
    Prefix: zh-Latn
    Prefix: bo-Latn
  • So effectively "pinyin" is registered as a language variant. Then we have to look at the registered "prefixes" (combinations of leading subtags) for which this variant is declared valid.
  • So yes, "pinyin" can only appear after "zh-latn" or "bo-latn". So the tag "zh-latn-pinyin" is not just "more standard", it is in fact "standard" and recommended. Why is the script code "latn" required in the prefix? Because Pinyin is not valid for any script other than Latin.
So yes, the ONLY correct and standard tag for the Pinyin romanization of Chinese is "zh-latn-pinyin" (and there's no alias "zh-pinyin" registered in the IANA registry to make it equivalent to "zh-latn-pinyin").
What is strange is the limitation to just "zh" and "bo", and not "cmn" (Mandarin), "yue" (Cantonese), and other languages included in the "zh" macrolanguage! But let's see how "zh" is declared in the IANA registry:
%%
Type: language
Subtag: zh
Description: Chinese
Added: 2005-10-16
Scope: macrolanguage
Yes it is effectively declared with a scope of "macrolanguage". Let's now look at entries for Mandarin (cmn) or Cantonese (yue):
%%
Type: language
Subtag: yue
Description: Yue Chinese
Description: Cantonese
Added: 2009-07-29
Macrolanguage: zh
What this means is that any tag that is valid for the macrolanguage "zh" is also valid for "yue". This includes variant tags using "zh" in their prefix. So "cmn-latn-pinyin" is also valid, as well as "yue-latn-pinyin".
IANA has currently registered 14 individual language codes as part of the "zh" macrolanguage (all of which can then use the "-latn-pinyin" extension): cdo, cjy, cmn, cpx, czh, czo, gan, hak, hsn, lzh, mnp, nan, wuu, yue. It has also registered the same 14 as "extlang" subtags (appended after "zh"), so "zh-cmn-latn-pinyin" is also valid and standard, but these extlangs are all registered as aliases (with a "preferred-value" mapped to the individual language without the "zh" macrolanguage prefix).
It should be noted that the usual script for "nan" (the Min Nan language) is Latin (not any ideographic Hani/Hans/Hant script code), so "nan-latn" may seem superfluous, but "nan" still does not specify "Latn" as its default script. Min Nan has multiple scripts ("Latn", "Hans" and "Hant" are common) and the "nan" code alone does not specify one. "nan-latn" is therefore distinct from "nan" alone, but it does not specify the romanization system in use: it is assumed to be the standard/common orthography of Min Nan in the Latin script, and this orthography is different from the Pinyin romanization system (which was made for rewriting the "Hans" script in Latin, and is not intended to reproduce the modern Latin orthography of Min Nan, which does NOT follow the Pinyin system). So "nan", "nan-latn" and "nan-latn-pinyin" are all different. But "zh-nan" and "nan" are both valid and equivalent (the latter being preferred); "zh-nan-latn" and "nan-latn" are both valid and equivalent (the latter being preferred); "zh-nan-latn-pinyin" and "nan-latn-pinyin" are both valid and equivalent (the latter being preferred). As a rule in OSM, we use only the preferred tags, so the Min Nan language written in Pinyin can only be "nan-latn-pinyin", while Min Nan written in its usual Latin orthography (or in its oral form) is just "nan".
In summary, yes, we should not use any underscore (our own rule for all language tags), and should insert "latn": all "zh_pinyin" tags in OSM have to be replaced by "zh-latn-pinyin" if we want strict conformance with BCP 47! But rewriting it only as "zh-pinyin" is very acceptable too (it conforms to the BCP 47 standard syntax and is correctly parsed as a language variant, only one not registered for the bare "zh" prefix), notably because Pinyin conveys a "reading" which is not very accurate for any Chinese language other than Mandarin (cmn), and because Pinyin in Chinese necessarily (implicitly) targets the Latin script, and only from the "simplified" variant of the Han script (Hans).
Note that some Chinese languages are also written with the Arabic script, such as Uighur (which is not considered part of the Chinese macrolanguage), which has separate romanization systems NOT based on Pinyin.
Note also that Pinyin is complete nonsense for Classical Chinese (lzh), which cannot be correctly written in the simplified variant (Pinyin does not work with the traditional variant "Hant" of the Han script, just as it does not work with the Kanji variant (Hani) used in Japanese or the Hanja variant used in Korean). Pinyin also does not work correctly with the Wu language (code wuu), Cantonese (code yue), or traditional Min Nan, which are only correctly written in the traditional Han variant (Hant). So forcing the insertion of "latn" in tags using "pinyin" is in fact absolutely not necessary.
So all we really have to do then is replace underscores by hyphens in language tags (and also deprecate the use of region subtags such as "zh-tw" or "zh-cn", which should be rewritten as "zh-hant" and "zh-hans").
But note also that "zh" alone is also an alias: its registration in the database says that when the **full** tag is "zh" it is equivalent to "cmn", so "zh" alone in a language tag without any extension subtags means Mandarin and not any other language of the Chinese macrolanguage.
And immediately replace, completely, invalid tags such as "zh-py" or "zh-pyt".

Verdy_p (talk) 21:06, 24 November 2016 (UTC)
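
To see why the short suffixes above are read as a region or an extlang rather than as "pinyin", here is a simplified sketch that assigns roles to subtags by their shape alone. It is an assumption-laden illustration: it checks no registry, ignores extensions and private-use subtags, and converts underscores to hyphens first because that spelling occurs in OSM keys. The function name parse_tag is hypothetical.

```python
import re


def parse_tag(tag):
    """Assign BCP 47 roles to the subtags of a language tag by shape only."""
    subtags = tag.replace("_", "-").lower().split("-")
    result = {"language": subtags[0], "extlang": [], "script": None,
              "region": None, "variants": []}
    rest = subtags[1:]
    # Up to three 3-letter extlang subtags may follow the language.
    while rest and len(result["extlang"]) < 3 and re.fullmatch(r"[a-z]{3}", rest[0]):
        result["extlang"].append(rest.pop(0))
    # A 4-letter subtag is a script code.
    if rest and re.fullmatch(r"[a-z]{4}", rest[0]):
        result["script"] = rest.pop(0)
    # A 2-letter or 3-digit subtag is a region code.
    if rest and re.fullmatch(r"[a-z]{2}|[0-9]{3}", rest[0]):
        result["region"] = rest.pop(0)
    # Variants are 5 to 8 characters, or 4 characters starting with a digit.
    while rest and re.fullmatch(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}", rest[0]):
        result["variants"].append(rest.pop(0))
    result["unparsed"] = rest
    return result


for tag in ("zh_py", "zh_pyt", "zh-pinyin", "zh-Latn-pinyin"):
    print(tag, "->", parse_tag(tag))
```

In this sketch "py" lands in the region slot and "pyt" in the extlang slot, while "pinyin" is long enough to be a variant. Whether a well-formed tag is also valid still depends on the IANA registry entries quoted above.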

Mongolian names

I have found most Mongolian names to be given in three writing systems. Which one is most commonly used by the Mongolian minority in China? Ff5722 (talk) 21:33, 24 November 2016 (UTC)

For Mongolian written in Mongolia, the Cyrillic script is standard and the most widely used. The traditional Mongolian script is used only in arts or religion. Few people in Mongolia can decipher it now.

In China, it was written administratively using only Han sinograms (in their simplified form), but now the traditional Mongolian script is accepted too. It has two forms, horizontal and vertical. The vertical top-to-bottom form, with rows ordered left to right, is the traditional one; the horizontal left-to-right form simply rotates the rows of text 90 degrees anticlockwise and reorders them top to bottom. Both are used: the vertical presentation in artistic products or books, and the horizontal one in modern applications, where it predominates in web browsers and mobile applications (due to limited or nonexistent support in browsers for the vertical layout); glyphs in fonts for the Mongolian script are also prerotated 90 degrees for the horizontal layout.

Mongolian people in China prefer using their traditional script, which is simpler to learn and use than Han sinograms. But many of them can also read Mongolian written in Cyrillic in media published in Mongolia (or Far-Eastern Russia). Very few use romanizations (except with the very unfriendly Pinyin input method for writing Mongolian with Han sinograms, which are too approximate anyway: Cyrillic is far better than Pinyin romanization for this language). Note that in past centuries Mongolian has also been written with the Turkic/Altaic Latin alphabet, with the Arabic script (in areas in contact with Uyghurs), or with Brahmic scripts (Buddhists generally use Devanagari or the old Brahmic script).

For our maps, I think that in the Mongolian Republic there's no doubt it should be written in Cyrillic (and this will likely never change to Latin). In Chinese Inner Mongolia the choice is between Han sinograms (even if they are not perfect) or, very rarely, Cyrillic, but some communities in China (and universities) are reviving the old traditional Mongolian script, with some adaptations for today's dialect, which has borrowed many Chinese Mandarin terms (so the script coexists with pure Han characters, generally in their simplified form).

In fact China now gives more active support than Mongolia to the traditional Mongolian script, just because there are more resources and money to support it in research and educational institutions than in the poor republic of Mongolia (where literacy is also very poor compared to China). I'd say that in Mongolia the old Mongolian script is now almost extinct and will not revive soon in the way that is happening now in China, after both Mongolia and China had banned the traditional script during the hardest time of Communism, before the recent adoption of a market economy. Mongolia still depends heavily on exchanges with Russia, more than with China, which still limits the interactions of Chinese Mongolian people with foreign countries: the border between China and Mongolia is more opaque than the one between Mongolia and Russia.

If you search for resources in Mongolian on the web, almost all of them use Cyrillic (including the Mongolian Wikipedia). And for supporting the oral language itself, Mongolia is definitely the source (in China, it is more difficult to speak and write in the Mongolian language than in Mandarin, which is mandatory for a lot of things in civil and administrative life; Chinese Mongolians are first educated in Mandarin and with the Han script, and the Mongolian language is still considered a "minority dialect" and devalued, except in arts). -- Verdy_p (talk) 02:09, 25 November 2016 (UTC)

Rename page or re-organise a little

We've had a section for a long time (although the page title moved recently): Names#Localization, which documents how to use 'name' with a language code. By comparison, this page carries only a small description followed by lots of country-by-country info. I think we should either

  • Make this page the primary documentation, meaning the Names#Localization details would move here, and that page would just have a short paragraph and a link to here.
  • Rename this page to something like 'Name tag use by country' ...not a great name, but you see what I mean.

I think the first option makes most sense.

-- Harry Wood 13:17, 18 March 2012 (UTC)

Standard language codes

Some of our language tagging does not follow the standards. Maybe some usage has to be grandfathered in, but I wonder if we can change to following the standards. New language tagging should follow the standards and renderers should be written to understand standard language-code format.

Examples of correct language codes. The names of romanizations are registered as IANA variant subtags – we shouldn’t just make up more codes unless we use an -x- private-use code.

  • bg – Bulgarian
  • bg-Latn – Bulgarian in Latin characters
  • zh – Chinese
  • zh-Latn – Chinese in Latin characters
  • zh-Latn-pinyin – Chinese in pinyin romanization (not zh_pinyin, zh_py, nor zh_pyt)
  • zh-Latn-wadegile – Chinese in Wade–Giles romanization
  • zh-Hans – Simplified Chinese
  • zh-Hant – Traditional Chinese
  • ja – Japanese
  • ja-Latn – Japanese in Latin characters (not ja_rm)
  • ja-Latn-hepburn – Japanese in Hepburn romanization
  • ja-Latn-alalc97 – Japanese in Library of Congress romanization
  • ja-Latn-x-osm – Japanese in Latin characters, according to some private OSM scheme

A few more romanization methods are registered in the Unicode CLDR,[2] and can be used with the t singleton and m0 separator. In this case, an ISO date can be added indicating a version of a standard.

  • mn-Latn – Mongolian in Latin characters
  • en-t-mn – English translated from Mongolian
  • mn-Latn-t-mn-Cyrl – Mongolian transliterated from Cyrillic into Latin
  • und-Latn-t-und-Cyrl – Text transliterated from Cyrillic to Latin (und = undetermined language)
  • ja-Latn-t-ja-Jpan-m0-alaloc – Japanese in Library of Congress romanization (equivalent to the shorter version above)
  • ja-Latn-t-ja-Jpan-m0-alaloc-1949 – Japanese in Library of Congress romanization, 1949 version

CLDR v24 transforms:

  • alaloc – American Library Association-Library of Congress
  • bgn – US Board on Geographic Names
  • buckwalt – Buckwalter Arabic transliteration system
  • din – Deutsches Institut für Normung
  • gost – Euro-Asian Council for Standardization, Metrology and Certification
  • iso – International Organization for Standardization
  • mcst – Korean Ministry of Culture, Sports and Tourism
  • satts – Standard Arabic Technical Transliteration System (SATTS)
  • ungegn – United Nations Group of Experts on Geographical Names

Language and script codes are governed by BCP 47: Tags for Identifying Languages and the IANA Language Subtag Registry. Also relevant might be BCP 47 Extension T: Transformed Content and BCP 47 Extension U (Unicode).

Codes should be kept as short as possible. Michael Z. 2013-11-06 07:28 z

Sardegna Edit War

There is an edit war on what the section about multilingual naming in Sardegna (Italy) should look like.

The subject is currently also discussed on talk-it.

I reverted the paragraph to a state that is compatible with the pre-edit-war content, which apparently is what the community on talk-it decided on: “Santo cielo, ancora? Abbiamo discusso per settimane di questa cosa, basta!” ("Good heavens, again? We discussed this for weeks, enough!", quoting Luca Meloni)

If you think the paragraph needs updating, please do not continue with the edit-war, but discuss here, at talk-it, or wherever, until you reach some kind of consensus with the community. Only after that, the paragraph should be updated.

--Tyr (talk) 19:12, 9 January 2014 (UTC)

This edit was added citing a nonexistent and unsourced "co-officiality" for six (!!!) different languages. This formulation is a one-off invented here, instead of using the rules already established for Friulian and all other local/regional languages. In addition, this formulation puts a local name in front of the official, main and commonly used name, both locally and internationally (e.g. Nuoro, comune.Cagliari and Cagliariturismo, comune.Alghero and Algheroturismo). In the community there is still the same debate, with no general consensus. --Drinz (talk) 15:49, 12 January 2014 (UTC)
  1. The local languages are official in Sardinia under Italian and regional law.
  2. This formulation is a standard proposed for all the other regions.
  3. The local names are put before the Italian ones because the majority of the population have those local languages as their native ones.
  4. In the community there is a general consensus on this standard. Consensus is different from unanimity; the latter is simply impossible to obtain. The discussion was closed months ago, and it is only because of you that it is now open again.
  5. Don't open a meaningless discussion here. The discussion place is the mailing list.--L2212 (talk) 17:19, 12 January 2014 (UTC)
+1 for the local mailing list as the "best" place to discuss this.
But also: Please note that the legal fine points of certain laws are not really the most important thing in OSM. Instead, we have always been mapping what is on the ground – here that would be the term(s) that local people use and what local signposts read.
-- Tyr (talk) 21:22, 12 January 2014 (UTC)

Tamazight language ISO code not supported

Hello, I noticed that the official language of Morocco is not supported when naming places in this country.

Name:ar and Name:fr work fine but Name:zgh should show ⵜⴰⵎⴰⵣⵉⵖⵜ

It does, at least on this wiki:
  • "{{Languagename|zgh}}" returns "ⵜⴰⵎⴰⵣⵉⵖⵜ ⵜⴰⵏⴰⵡⴰⵢⵜ" (with the additional precision currently in French, that this is a Moroccan standard for the new interdialectal language taught now in Morocco), and
  • "{{Languagename|tzm}}" returns "ⵜⴰⵎⴰⵣⵉⵖⵜ" (without the precision, for the "pure" historical Central Tamazight dialect spanning a region shared between Morocco and Algeria, which does not follow the new Moroccan standard and where some linguists disagree with the new choice made by the Moroccan authorities of mixing several other Berber languages in "zgh" to create in fact a new "hat" language).
    This decision made in Morocco is quite similar to the one made in the 17th century by King Louis 14 of France to create "Modern French" (with the foundation of the French Academy) as a new "hat" language mixing several regional "Oil languages" and some Occitan terms, plus many invented terms constructed from Latin and Greek origins, plus later many terms borrowed from various parts of the world, notably from Venetian, Genovese, Prussian, Dutch/Flemish, Norwegian, Norman, but also Arabic and Turkish, and later also from English and, in Southwestern France, Spanish. Modern French (fr/fra/fre) initially started like today's "zgh" in Morocco, or like "bs" alias "Bosnian" in Bosnia and Herzegovina, which mixes some Serbian and Albanian into a large Croatian substrate and then operates some simplifications/unifications.
    The French Academy was very successful for French, and Academic French (which was perceived only as "Parisian French") became the new national standard, replacing Latin and all the regional languages of France. But this standard was really adopted only during the Republic, with Jules Ferry's mandatory schooling, and after WW1, which mixed millions of French people from all regions on the battlefields: they came back to their regions speaking a unified language, and their regional languages phased out rapidly (including non-Oil languages such as Breton, Basque, Gascon, Occitan, Catalan, Auvergnat, Provençal, Niçard, Ligurian, Corsican, Alsatian, Franconian, Lorrain, and Flemish Dutch). Many of these regional languages are now seriously endangered, notably the other Oil languages (like Norman, Gallo, Poitevin, Angevin, Champenois and Picard/Ch'timi), but also Flemish. The southern regional languages (the langues d'Oc and Corsican), as well as Breton and Alsatian, are much more persistent and less endangered: they are resisting thanks to current standardization efforts that unify some of their own dialects and orthographies, notably in Breton and in Alsatian (with other Alemannic dialects in Northern Switzerland and Southwestern Germany). Attempts to save Norman in France have failed multiple times for lack of resources or political involvement by the two former Norman regions (the only significant efforts being made in Jersey, not in France); this may change now that Normandie is reunified in a single region.
Verdy_p (talk) 02:24, 25 November 2016 (UTC)

Here is the SIL report on the language code: http://www-01.sil.org/iso639-3/documentation.asp?id=zgh

The Ethnologue report: http://www.ethnologue.com/language/zgh

Gagnabil (talk) 02:13, 29 April 2015 (UTC)

Not sure what you mean by "not supported". The tags name:zgh and name:ber work fine. Tags are always written in the Roman/UK alphabet; the content of the tag can be written in Tamazight. So, for one example, Ben Slimane Airport is written ⴰⵣⴰⴳⵯⵣ ⵏ ⴱⵏ ⵙⵍⵉⵎⴰⵏ in Berber/Tamazight and tagged correctly. See https://www.openstreetmap.org/node/1042041497
Also, please sign your comments with four tildes.
Johnparis (talk) 12:01, 7 August 2016 (UTC)
About this node: it is tagged with the language code "ber"; however, this does not designate an individual language but a language family (not even a macrolanguage). "ber" should not be used at all (it is currently used everywhere in the Maghreb, from Morocco to Libya, but for unrelated languages, including Algerian Berber, which is most often written in the Latin script, based on the French alphabet).
But "zgh" (for the standard modern form adopted by Morocco) or "tzm" (for the historic/cultural Central Tamazight dialect) are perfectly valid. These should use the Tifinagh script by default; any romanization of these two forms should be appended with "-Latn", but these romanizations have no agreed standard for their orthography, just as there is no agreed orthography when Berber languages are written in the dominant Arabic script! -- Verdy_p (talk) 02:19, 25 November 2016 (UTC)
"ber" is present in ISO 639-2 and ISO 639-5 (as a collective code), and we use name:ber together with name:kab (Latin script) in Algeria. BoFFire (talk) 17:39, 10 March 2017 (UTC)

Edit warring

Could the people who have been editing this page today stop changing it please and explain why their preferred version is better?--Andrew (talk) 20:36, 3 August 2016 (UTC)

  • The user User:Fayor is editing a topic that's still being discussed in the community (because he reopened the issue) without the approval of the community, and without telling anyone about it. --L2212 (talk) 20:51, 3 August 2016 (UTC)
http://wiki.openstreetmap.org/w/index.php?title=Multilingual_names&action=history suggests that there are two sides to the edit war in the wiki. That edit war was also happening in OSM itself and the participants have both been blocked by the DWG temporarily to provide a "cool down" period. --SomeoneElse (talk) 21:18, 3 August 2016 (UTC)
The edit war is the restart of an old discussion, the one described in the "Sardegna Edit War" section above. The difference is that this time the user Fayor is in the role that was Drinz's.--L2212 (talk) 01:05, 4 August 2016 (UTC)

Iraq

Seems there has been no discussion or project on naming places in Iraq so far, so I'd like to start. For the time being, there's some write-up in WikiProject_Iraq#Place_names_.28proposal_for_handling_multiple_names_and_multiple_languages.29 Øukasz (talk) 18:21, 1 October 2016 (UTC).

Hello, I left a message on the project's talk page--Ghybu (talk) 14:37, 4 October 2016 (UTC)

Suffixes only for proper names?

Please mention whether the ":en", ":zh" suffixes apply only to proper names, or whether even e.g. Key:lamp_mount should use them. If they do not apply, should we always use English (lamp_mount=wall), or can one use lamp_mount=附壁式, or better yet lamp_mount:zh=附壁式 or lamp_mount:zh-tw=附壁式? Jidanni (talk) 16:15, 26 July 2017 (UTC)

Chinese (zh_CN vs. zh_TW vs. zh-Hans vs. zh-Hant ...)

":zh" is not a one-size fits all. Please mention what to do when one needs to distinguish between zh_CN vs. zh_TW vs. zh-Hans vs. zh-Hant etc. Say what is the right way to do it. Jidanni (talk) 16:22, 26 July 2017 (UTC)

This question may not get to anyone who has actually been tagging alternative Chinese spellings, or enough people to comment on what is good practice, here on the wiki. Asking on the tagging@ mailing list or a Chinese language mailing list or forum will get more knowledgeable replies.--Andrew (talk) 17:01, 26 July 2017 (UTC)
Ah, yes. Thanks! https://lists.openstreetmap.org/pipermail/tagging/2017-July/032903.html Jidanni (talk) 12:26, 27 July 2017 (UTC)

Kurdistan Regional Government

In Kurdistan Regional Government which language should be used for main language ("name")? Kurdish or Arabic? For example Erbil: "name=اربیل" (Arabic) or "name=Hewlêr / ھەولێر" (Kurdish)?--Ghybu (talk) 21:26, 5 August 2017 (UTC)

I believe Kurdish rather than Arabic is more appropriate as it is much more often used locally. I'm not sure however if the example "name=Hewlêr / ھەولێر" isn't tagging for the renderer by pasting two scripts in one line. By default, Sorani (ckb) uses Arabic script. Additionally, there are towns and villages where other languages are dominant - but that might be too much granularity? Øukasz (talk) 23:08, 6 August 2017 (UTC)

Germany

  • The Low German language (Plattdeutsch) is also used on official placename signs (de: "Ortsschilder") in Northern Germany. Example: "Buxtehude" - "Buxthu".
    • This should be added as name:nds=Buxthu.
    • Question: Should we add name:de=Buxtehude and name=Buxtehude - Buxthu, similar to the description for Sorbian names?
  • The Danish language is also used on official placename signs (de: "Ortsschilder") in South Schleswig. Example: "Flensburg" - "Flensborg".
    • This should be added as name:da=Flensborg.
    • Question: Should we add name:de=Flensburg and name=Flensburg - Flensborg, similar to the description for Sorbian names?

Personally, I do not like setting the name with a "space dash space" in the middle. It looks too similar to a double name like Henstedt-Ulzburg. I think the name should rather be "Buxtehude (Buxthu)".

I think there is a lack in the OpenStreetMap naming definition. International names and local multilingual names are mixed up.

My proposal:

  • there should be a marker for official names in multiple languages. Something like name:official:nds.
  • "name" should always be the first official name used. Which is always standard German in Germany.
    But: in South Tyrol (Italy) the German name "Deutschnofen" comes first on traffic signs, then the Italian "Nova Ponente". On the other hand, Italian is the major language of the country. So each country needs to define its own rules.

The advantage of my proposal is that renderers could then render names as they like. I would prefer a rendering like this.
______________
  Buxtehude
   (Buxthu)
______________

Hayo (talk) 12:05, 22 October 2017 (UTC)
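
As a rough sketch of how a renderer could build the two-line label shown above when the names are kept in separate tags, consider the following Python snippet. It uses the existing name and name:nds keys rather than the proposed name:official:nds marker, and the function name bilingual_label is invented for this example.

```python
def bilingual_label(tags, minority_lang):
    """Return 'Primary' or 'Primary' plus '(Secondary)' on a second line.

    tags is a plain dict of OSM tags; minority_lang is a language code
    such as 'nds' or 'da'. Hypothetical helper, not an existing renderer API.
    """
    primary = tags.get("name")
    secondary = tags.get(f"name:{minority_lang}")
    if primary is None:
        return secondary
    if secondary is None or secondary == primary:
        return primary
    return f"{primary}\n({secondary})"


if __name__ == "__main__":
    buxtehude = {"name": "Buxtehude", "name:de": "Buxtehude", "name:nds": "Buxthu"}
    print(bilingual_label(buxtehude, "nds"))
```

With the names in separate tags, the choice between "Buxtehude - Buxthu", "Buxtehude (Buxthu)" or a two-line label is left to the renderer instead of being baked into the name value.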

Slash, space, or spaced hyphen?

This page (and perhaps actual practice) is inconsistent in suggesting:

  • hyphens: name=Vitoria-Gasteiz (Basque Country)
  • slashes: name=L'Alguer/Alghero (New Zealand, Portugal, Sardinia)
  • spaced hyphens: name=Rue du Marché aux Poulets - Kiekenmarkt (Belgium, Spain)
  • spaces: name=干諾道中 Connaught Road Central (Hong Kong)
  • spaced slashes: name=Le Rhin / Rhein (shared boundaries)

Greater consistency would surely be advantageous? Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 13:47, 26 March 2018 (UTC)
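
To illustrate why the inconsistency matters for data consumers, here is a minimal Python sketch that tries to split a combined name value back into its parts. It only handles the separators that can be split with little ambiguity (spaced slash, spaced hyphen, plain slash); the function name and the separator order are choices made for this example.

```python
# Separators tried in order; " / " must come before "/" so that
# "Le Rhin / Rhein" is not split on the bare slash first.
SEPARATORS = [" / ", " - ", "/"]


def split_multilingual_name(value):
    """Split a combined name on the first separator style found in it.

    Plain hyphens and plain spaces are left alone, because they cannot be
    told apart from ordinary names without extra knowledge.
    """
    for separator in SEPARATORS:
        if separator in value:
            return [part.strip() for part in value.split(separator)]
    return [value]


if __name__ == "__main__":
    examples = [
        "Vitoria-Gasteiz",
        "L'Alguer/Alghero",
        "Rue du Marché aux Poulets - Kiekenmarkt",
        "干諾道中 Connaught Road Central",
        "Le Rhin / Rhein",
    ]
    for name in examples:
        print(name, "->", split_multilingual_name(name))
```

The hyphenated and space-separated styles come back unsplit: a consumer cannot distinguish "Vitoria-Gasteiz" from a genuine double name, or find the language boundary in the Hong Kong example, without consulting the name:* tags.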

Armenia

I was surprised, when visiting the Armenian capital Yerevan, recently, to find all the streets mapped with names in the Armenian script, only, when all the street signs (and the free paper map I was given by my hotel) were in both Armenian and Western script. Andy Mabbett (User:Pigsonthewing); Andy's talk; Andy's edits 12:16, 8 August 2018 (UTC)

Poland

Poland also officially has German as a regional language. The (remaining) official German population also seems to be larger than the cited Belorussian population. Towns that currently have bilingual names, which should be included in the names on the map, are Radłów/Radlau, Cisek/Czissek, Leśnica/Leschnitz, Tarnów Opolski / Tarnau, Chrząstowice/Chronstau, Izbicko/Stubendorf, Dobrodzień/Guttentag, Jemielnica/Himmelwitz, Kolonowskie/Colonnowska, Krzanowice/Kranowitz, Ujazd/Ujest, Biała/Zülz, Zębowice/Zembowitz, Strzeleczki / Klein Strehlitz, Komprachcice/Comprachtschütz, Dobrzeń Wielki / Groß Döbern, Głogówek/Oberglogau and Łubowice/Lubowitz. --User:Stan_Tincon

Hi, I have the same problem. I wrote something about it in https://wiki.openstreetmap.org/wiki/Talk:Names#Old_Names and hope for a discussion on how to resolve it. --JSmith (talk) 18:53, 8 March 2020 (UTC)

BCP 47 Extension U for measurement systems

A portion of Interstate 265 near Louisville, Kentucky, has distances signposted in both metric and U.S. customary units (or "English" units, as traffic engineers call them). [3] A single semicolon-delimited distance tag was inadequate for tagging the interchange sequence signs, which sport two distances per destination. [4] I used distance:en-US-u-ms-metric=* and distance:en-US-u-ms-ussystem=* to distinguish the two systems. [5] name:*=* keys are documented as conforming to BCP 47, but I needed to use this RFC 6067 extension to indicate the measurement system. This might be relevant in the future if anyone ends up needing to use Extension U in a name for some reason. – Minh Nguyễn 💬 21:40, 19 April 2022 (UTC)
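
For anyone wanting to process such keys, here is a small Python sketch that builds and reads keys of the form described above. The helper names distance_key and parse_measurement_system are invented for this example, and the parsing assumes the key contains at most one "u" extension and no private-use part.

```python
def distance_key(locale, measurement_system):
    """Build a key such as distance:en-US-u-ms-metric."""
    return f"distance:{locale}-u-ms-{measurement_system}"


def parse_measurement_system(key):
    """Return the value of the 'ms' keyword in the key's 'u' extension.

    Returns None when the key has no language tag or no 'ms' keyword.
    """
    _, _, tag = key.partition(":")
    subtags = tag.split("-")
    if "u" not in subtags:
        return None
    extension = subtags[subtags.index("u") + 1:]
    if "ms" in extension and extension.index("ms") + 1 < len(extension):
        return extension[extension.index("ms") + 1]
    return None


if __name__ == "__main__":
    for system in ["metric", "ussystem"]:
        key = distance_key("en-US", system)
        print(key, "->", parse_measurement_system(key))
```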