Proposal talk:Language information for name

From OpenStreetMap Wiki

tag name

Generally I really appreciate this proposal; we have been missing a way to tell which default language is used in the name tag. Minor nitpicks: why do you suggest an abbreviation (lang) rather than a word like name_language=* or name:language=*? This concept could also be extended to other name tag variations like loc_name_language=* or official_name_language=* (here the colon approach would make it clearer what this is about, e.g. loc_name:language=*). --Dieterdreist (talk) 12:34, 5 May 2017 (UTC)

Thanks for your feedback! Now the proposal uses “language” instead of “lang”. About the colon approach and extending it to other keys like loc_name: I was also thinking about that. When writing the proposal, I was a little reluctant about the colon approach, because usually what follows the colon in the key is the language code itself, with a localized name in the value. On the other hand, non-language-code suffixes already exist right now (:right and :left for border features), so even today a data consumer cannot rely on the assumption that everything after “name:” is a language code. I’ve changed the proposal to use the colon. Let’s see what the others think about it… --Sommerluk (talk) 08:28, 6 May 2017 (UTC)
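The point about suffixes can be illustrated with a minimal sketch of how a data consumer might separate localized names from other `name:` keys. The function name `localized_names`, the loose pattern for language codes and the denylist are assumptions made for illustration only; they are not part of the proposal.

```python
import re

# Assumption: a suffix counts as a language code if it looks like a 2-3 letter
# primary subtag with optional further subtags (a loose BCP 47 shape).
LANG_SUFFIX = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$")

# Suffixes mentioned in this discussion that are not language codes.
# The pattern above already rejects them; the set just makes the intent explicit.
NON_LANGUAGE_SUFFIXES = {"left", "right", "language"}


def localized_names(tags):
    """Return {language_code: name} for keys of the form name:<code>."""
    result = {}
    for key, value in tags.items():
        if not key.startswith("name:"):
            continue
        suffix = key[len("name:"):]
        if suffix in NON_LANGUAGE_SUFFIXES:
            continue
        if LANG_SUFFIX.match(suffix):
            result[suffix] = value
    return result


example = {
    "name": "Bruxelles - Brussel",
    "name:fr": "Bruxelles",
    "name:nl": "Brussel",
    "name:language": "fr;nl",
    "name:left": "hypothetical border name",
}
print(localized_names(example))
```

A shape check like this is only a heuristic: a real consumer would probably validate the suffix against an actual list of language codes.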

That proposition needs default languages to be defined first


You say "This tag is not always necessary" which, as often in OSM, is not really precise.
That means that if there is no such tag, the region's default language applies.
Alas, OSM seems to hate tagging default values inside its database.
Rather they are scattered in many other places and if one changes, all applications have to be updated instead of a transparent OSM map update.
See the nice Proposed features/Defaults that they removed (even though it is used).
So, your proposal puts the cart before the horse, and default languages should be defined first.
If you want to do that, call on my advice on how to do it.
Here in Belgium, we have defined the linguistic regions relations.
But, owing to that OSM shortcoming, they are of course not used as defaults. --Papou (talk) 22:47, 5 May 2017 (UTC)

Thanks for the feedback. After skimming the proposal you mentioned, I did not find anything about languages. It seems to be mostly about road speed limits. In any case, it tries to provide defaults for already existing keys such as maxspeed: following that proposal, if the real-world maxspeed differs from the default in a region, the mapper can of course add a maxspeed=* tag to the object. As there is not yet a key that describes the language of the name=* key, I cannot see how the proposal for defaults could help here. Also, the proposal for defaults has been abandoned since 2010… --Sommerluk (talk) 08:28, 6 May 2017 (UTC)
This proposal doesn't need to define a default language, it simply introduces a tag to say in which language(s) the name tag is given. --Dieterdreist (talk) 08:54, 6 May 2017 (UTC)

I don't see how you can say that def:conditions;new tag = default_value is limited to maxspeed.
It's about anything and if it doesn't speak of languages, it's because it can't speak of everything.
You still don't say how to determine the language of a name without a :language tag. Papou (talk) 13:00, 26 May 2017 (UTC)

multilingual areas

Can we have

name:language=fr - nl

for places where name is like this:

name=Rue Haute - Hoogstraat

--Polyglot (talk) 19:57, 23 May 2017 (UTC)

Thanks for your feedback. The proposal for multilanguage names is name:language=fr;nl, basically because the “;” character is already known from other keys as a value separator. In Belgium, multilanguage names are usually like “a - b”, but there are other regions of the world where the tagging convention is different: “a/b” or “a / b”. Nebulon42 has made a good overview, which is available at https://wiki.openstreetmap.org/wiki/User:Nebulon42/Multilingual_names#Status_Quo_in_OSM With name:language=* I want to propose a tag with a clean and strict syntax that can be processed easily (and unambiguously!). That’s the reason why the proposal uses the “;”. --Sommerluk (talk) 20:39, 23 May 2017 (UTC)
My preference would be to use the same separator in both the name field and the name:language field. That will be a lot easier to parse automatically. If it had made sense to use ; between both languages, then we would have used that, but in something that's going to be rendered on maps, as is, it would have looked extremely ugly.
The hyphen "-" is unsuitable as a separator (IMHO) because it also occurs in regular names (that are not multilingual), e.g. Dessau-Roßlau. Better use something that doesn't occur, I would suggest the slash "/" (the traditional OSM-multivalue-approach would be the semicolon ";", but it is ugly in rendered maps). --Dieterdreist (talk) 11:17, 24 May 2017 (UTC)
The separator is not "-", it is " - ", which is quite suitable. I'm a big proponent of using ";" as a separator for machine readable tags, but since no preprocessing will happen on name tags to convert it into something readable to humans, before rendering, it's better to create human readable strings for situations like in Brussels. --Polyglot (talk) 09:29, 27 May 2017 (UTC)
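To make the parsing question in this thread concrete, here is a minimal sketch of how a consumer might combine a strict `name:language=fr;nl` value with a human-readable `name`. The function name `split_multilingual_name`, the list of candidate separators and the assumption that the name parts appear in the same order as the language codes are all illustrative; the proposal itself does not specify any of them.

```python
def split_multilingual_name(tags, separators=(" - ", " / ", "/")):
    """Map each language code in name:language to its part of name.

    Assumptions: name:language is a semicolon-separated list, and the parts
    of name appear in the same order as the codes. Returns {} when the parts
    cannot be matched to the codes.
    """
    raw = tags.get("name:language", "")
    languages = [code.strip() for code in raw.split(";") if code.strip()]
    name = tags.get("name", "")
    if len(languages) <= 1:
        # Zero or one language: never split, so hyphenated names stay intact.
        return dict.fromkeys(languages, name) if name else {}
    for sep in separators:
        parts = [part.strip() for part in name.split(sep)]
        if len(parts) == len(languages):
            return dict(zip(languages, parts))
    return {}


print(split_multilingual_name(
    {"name": "Rue Haute - Hoogstraat", "name:language": "fr;nl"}
))
print(split_multilingual_name(
    {"name": "Dessau-Roßlau", "name:language": "de"}
))
```

Because the language count comes from `name:language`, a single-language name such as Dessau-Roßlau is never split, whichever separator the regional convention uses in `name`.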

Use geographic boundaries

This is the question you will almost certainly get, so I will ask: Why aren't you simply using geographic boundaries to determine the glyphs in the name tag? Everything in Japan is rendered with the Japanese glyphs, everything in China with the Chinese and everything in Korea in Korean Hanja. For the case where we have name:ja or name:ko, we already have the language information and can use those glyphs.

Partial answer: Yes, but there are special cases where it doesn't work. For example in Korea, the name:zh tag is sometimes used for Korean Hanja, which can be different from Chinese. There is currently no way to determine if the tag name:zh is to be rendered in actual Chinese or Korean Hanja. However: I'm not sure how this proposal will solve this issue, because it only seems to be aimed at the name= tag, not its language-specific name:xy= tags. --Panoramedia (talk) 14:21, 24 May 2017 (UTC)

In addition to what you have already mentioned, we can add this argument: The assumption that all name=* in China are Simplified Chinese is not true. In China there is more than one language; there are quite a few regional languages. The geographic boundary might be a not-so-bad approximation when looking at CJK characters only, but it will still be only an approximation, and it would only solve the CJK glyph issue. Having name:language=* potentially works for all languages, also when language boundaries and country boundaries differ. The name:language=* information can be used to tell an OpenType rendering engine to use the specific rendering rules for a particular language. Today we have OpenType fonts that are developed with internationalization in mind and that provide various glyph variants that can be chosen based on the language (also for various African languages that are written with the Latin alphabet but sometimes use different glyph variants). The corresponding OpenType feature is called “locl”. --Sommerluk (talk) 15:12, 24 May 2017 (UTC)
Thanks for the clarification. Just to make it super clear, please add info on how name:language will solve your examples and how they will be tagged: Node with name:en=Beijing name=北京市 name:ja=北京市 name:zh=北京市. Additionally, maybe a node in Korea with Korean Hanja and Chinese (it doesn't exist because there is no way to tag it, since both are tagged name:zh= at the moment, and this proposal doesn't address that). --Panoramedia (talk) 15:46, 24 May 2017 (UTC)
Node with name:en=Beijing name=北京市 name:ja=北京市 name:zh=北京市. The default map style at openstreetmap.org only renders name=北京市 and ignores the other tags like name:en=Beijing. (This means it uses local names by default, to show that OSM is an international project.) However, the map style cannot know what language name=北京市 is in. Currently, it defaults worldwide to Japanese (which is somewhat arbitrary) and uses Japanese glyphs if various glyph variants exist. When name:language=zh is available, the map style could instead adapt its rendering and use Chinese instead of Japanese glyphs. The other example: name=* and name:language=ko will work without problems, both if name=* is Hangul and if name=* is Hanja. OpenType smart fonts need language information, not script information (the script information is already contained in the string that will get rendered). It is up to the font to deal correctly with this information. The default style at openstreetmap.org uses the font “Noto”, which does indeed deal correctly with this information. So it will work in both cases. --Sommerluk (talk) 19:34, 24 May 2017 (UTC)
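As a rough sketch of the selection step described above, a map style could derive the language it passes to the font engine as follows. The function name `rendering_language`, the choice to fall back for multi-language values and the `"ja"` default (taken from the description of the current behaviour in this thread) are assumptions for illustration; how the result is handed to the renderer is deliberately left out.

```python
def rendering_language(tags, fallback="ja"):
    """Pick the language to use when shaping the text of the name tag.

    Assumption: name:language holds one code, or a semicolon-separated list.
    With exactly one code, use it; otherwise keep the style's fallback.
    """
    raw = tags.get("name:language", "")
    codes = [code.strip() for code in raw.split(";") if code.strip()]
    return codes[0] if len(codes) == 1 else fallback


beijing = {"name": "北京市", "name:en": "Beijing", "name:language": "zh"}
untagged = {"name": "北京市", "name:en": "Beijing"}

print(rendering_language(beijing))
print(rendering_language(untagged))
```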

OSM made an error by using a generic name key and this proposal provided a solution to it

In the early days the OSM community made an error by introducing the generic name key (usage example: `name=Hamburg`) without the need to suffix it with a language code (usage example: `name:de=Hamburg`). Now it is too late to change all occurrences of the `name` key to the corresponding `name:<lang code>` version. This is where this proposal comes in. Instead of touching `name`, it suggests using a suffix to indicate the default language used for the value of the `name` key.

Improving this proposal

This proposal was just not really right.

Instead of `Example: name=London and name:language=en` it should have been `Example: name=London and name:default=en` to not confuse `name:language` with the language placeholder `name:<location>` (converts to e.g.: `name:de`, `name:fr`, `name:nl`, ...)

Instead of `For double-names in the name=* tag (like “Bruxelles - Brussel”) a semicolon-separated list: name:language=fr;nl` which is unambiguous for data consumers and breaks the database normalization rules it should have been `Double-names in the name=* tag (like “Bruxelles - Brussel”) are discouraged and should go in separate tags: `name:fr=Bruxelles` `name:nl=Brussel` `name:de=Brüssel` leaving the `name` empty because in Belgium there seems not to be a single official language`

How data consumers can deal with it easily

If the user has defined a language preference like `de,nl,fr`, then the data consumer should look for `name:de` first, `name:nl` second and `name:fr` last. If none of these keys matches, look for the `name` key. If there is a `name` key, look for `name:default`. If `name:default` exists and its value is none of the preferred languages (`de`, `nl` or `fr`), translate the value of `name` from the language given by `name:default` into German (because the user's first preference is `de`). If `name:default` does not exist, display `name` as it is.

This is how a good data consumer, map provider etc. wants to deal with this key. The last option with the automatic translation approach is optional and just adds to the user experience as translation infrastructure is expensive (at least for free map providers).
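The lookup order described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `choose_display_name` is made up, and `translate` is a hypothetical callback standing in for whatever translation service a consumer might have; when it is not supplied, the optional translation step is skipped.

```python
def choose_display_name(tags, preferences, translate=None):
    """Pick a display name following the user's language preferences.

    preferences: ordered list of language codes, e.g. ["de", "nl", "fr"].
    translate:   optional hypothetical callback (text, source, target) -> str.
    """
    # 1. Prefer an explicit name:<code> tag, in order of preference.
    for code in preferences:
        localized = tags.get(f"name:{code}")
        if localized:
            return localized

    # 2. Fall back to the generic name tag, if there is one.
    name = tags.get("name")
    if name is None:
        return None

    # 3. Optionally translate when name:default gives a language the user
    #    did not ask for; otherwise show name as it is.
    source = tags.get("name:default")
    if source and preferences and source not in preferences and translate:
        return translate(name, source, preferences[0])
    return name


prefs = ["de", "nl", "fr"]
print(choose_display_name({"name:nl": "Brussel", "name:fr": "Bruxelles"}, prefs))
print(choose_display_name({"name": "London", "name:default": "en"}, prefs))
```

A name without `name:default`, such as a brand name, simply falls through to the last line and is shown unchanged.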

--Valor Naram (talk) 15:01, 29 January 2023 (UTC)

Which language is McDonald’s in? Or this shop in Berlin (there are millions of similar examples in the world), which is the default language for it, or how would it look if you tagged it according to your idea? https://www.openstreetmap.org/node/5531511394 --Dieterdreist (talk) 14:21, 31 January 2023 (UTC)
In such cases `name:default` should be skipped. But I forgot to mention that case. The case above was just an example of how data consumers can handle this. Will improve it --Valor Naram (talk) 14:29, 31 January 2023 (UTC)
One way to consume the data, synthesizing speech output, would clearly benefit from knowing the language the name is in, particularly useful for names in a different language than the regionally expected. It can get even more complicated if there are combinations of different languages in the same name, e.g. https://www.openstreetmap.org/way/23824579 --Dieterdreist (talk) 18:21, 31 January 2023 (UTC)
I don't get your point. If a feature has a name in more than one language because there is more than one official language in the region, then `name:<language>` should be used and not `name`. Valor Naram (talk) 18:40, 31 January 2023 (UTC)
Look at the example, Montreuxstrasse. --Dieterdreist (talk) 19:48, 31 January 2023 (UTC)
I think you refer to https://www.openstreetmap.org/way/23824579 . In this example: `name=Montreuxstrasse` `name:default=ch`. Valor Naram (talk) 19:58, 31 January 2023 (UTC)
This would be really confusing, “ch” is a language code, but for Chamorro, which isn’t a common language in Switzerland. --Dieterdreist (talk) 20:12, 31 January 2023 (UTC)
Sorry, I mean that in Switzerland there are four languages, but in each region there is only one common language, with some overlap at the internal borders (between the states). The region in the example is German-speaking, so the name for the street would be tagged as

`name=Montreuxstrasse` and `name:default=de` Valor Naram (talk) 20:23, 31 January 2023 (UTC)