Proposed features/Language information

From OpenStreetMap Wiki
Language information text tags
Status: Rejected (inactive)
Proposed by: sommerluk
Tagging: language:*=code
Applies to:
Definition: Describes the language of the text
Rendered as: Not rendered itself, but might improve the rendering of name=*
Drafted on: 2017-07-30
RFC start: 2017-08-05
Vote start: 2017-08-30
Vote end: 2017-09-13

Proposal text

The prefix language:*=<code> can be used to describe the language of the value of another tag. This only makes sense if the other tag has a free-text value. Allowed values for language:*=<code> are the standard BCP-47 codes.

It is not necessary to add language information to most objects and keys in OSM. But in regions where a different language/script combination makes a difference in rendering, it can be useful to add language:name=* to state the language of the name key. In any case, using language:name=* is not mandatory.

Example: For the Bulgarian city of Montana use:

name=Монтана

language:name=bg

Rendering engines usually default to Russian Cyrillic, but this city is in Bulgaria. Compare the Russian rendering (above) with the Bulgarian rendering (below); the difference is significant:

Montana.svg
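As a minimal sketch of how a data consumer might read this tag, the Python snippet below looks up language:<key> for a given key. The helper name language_of and the simplified pattern are assumptions made for illustration; the pattern only accepts the common language[-Script][-REGION] shape and is not the full BCP-47 grammar.

```python
import re

# Simplified shape check (an assumption for illustration only):
# language[-Script][-REGION], e.g. "bg", "zh-Hant", "sr-Cyrl-RS".
# The full BCP-47 grammar also allows variants, extensions and
# private-use subtags, which this pattern rejects.
_SIMPLE_BCP47 = re.compile(
    r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$"
)


def language_of(tags, key):
    """Return the language declared for tags[key], or None if absent or malformed."""
    if key not in tags:
        return None
    code = tags.get("language:" + key)
    if code is None or not _SIMPLE_BCP47.match(code):
        return None
    return code


# The example object from the proposal text.
montana = {"name": "Монтана", "language:name": "bg"}
print(language_of(montana, "name"))
```

An object without language:name simply yields None, which matches the proposal's statement that the tag is optional.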

Rationale

Many applications use the name=* tag in OSM. You will usually use the name=* tag when you intentionally want the name in the default language. (Example: OsmAnd lets you choose between “local names” and a specific language for map rendering. The default style at openstreetmap.org uses name=* exclusively because it always wants the default local names, so that names appear on the map as they are written locally in each place of the world. It intentionally does not use tags like name:en, name:ja, name:de…)

The content of name=* is plain Unicode. Problem: this is not enough to render the text correctly. There are glyphs (character shapes) that differ between the four variants (Japanese, Traditional Chinese, Simplified Chinese, Korean) of the CJK script, but Unicode encodes them at the same code point. This process is called “Han unification”. Similarly, some Cyrillic glyphs have four variants (Russian, Bulgarian, Serbian, Macedonian) that are encoded at the same Unicode code point. There are also rare cases in the Latin alphabet: Ŋ has different glyph forms in Sami languages and in African languages. On the web, this problem is easily solved: the HTML code contains a language attribute that gives the necessary information about the language, so the browser can display everything correctly. In OSM this information is missing.
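The point about identical code points can be illustrated with a short Python sketch. It prints the code points of a name, which are the same whatever language the text is in, and then wraps the text in an HTML element whose lang attribute carries the missing information. The helper names codepoints and html_span are hypothetical.

```python
import html


def codepoints(text):
    """List the Unicode code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def html_span(text, lang):
    """Wrap text in a span whose lang attribute tells the browser the language."""
    return f'<span lang="{html.escape(lang, quote=True)}">{html.escape(text)}</span>'


name = "北京市"
# The code points carry no language information at all.
print(codepoints(name))
# Same characters, two different language declarations.
print(html_span(name, "zh-Hans"))
print(html_span(name, "ja"))
```

A browser with suitable fonts may choose different glyphs for the two spans even though the character data is byte-for-byte identical.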

Deducing this information from the country in which the OSM element is located is not very reliable. Within the same country there may be (many) more languages than one. And within the same region there might be objects whose name is in a different language than the majority language of that region, for example shops that sell Bulgarian food in Ireland. So deducing the language from the geocoordinates is error-prone. That’s not an option.

Deducing this information by comparing name=* with the other name:en, name:ja, name:de … tags does not help either. Example: the node http://www.openstreetmap.org/node/25248662 (English: Beijing) has name=北京市 and name:ja=北京市 and name:zh=北京市. They are identical. We cannot reliably determine the language of the name value.
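The ambiguity described for the Beijing node can be reproduced with a small Python sketch. The function name candidate_languages is hypothetical, and the name:en value is an assumption added for contrast; the other three tags are the ones quoted above.

```python
def candidate_languages(tags):
    """Return the suffixes of all name:* keys whose value equals name=*."""
    name = tags.get("name")
    if name is None:
        return []
    return sorted(
        key[len("name:"):]
        for key, value in tags.items()
        if key.startswith("name:") and value == name
    )


beijing = {
    "name": "北京市",
    "name:ja": "北京市",
    "name:zh": "北京市",
    "name:en": "Beijing",  # assumed here for contrast
}
print(candidate_languages(beijing))  # ['ja', 'zh']
```

Two candidates come back, so the comparison alone cannot decide which glyph variant a renderer should use.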

This tag helps to respect the cultural heritage of the local writing.

Whether or not multiple values can be used for cases like “Bruxelles - Brussel” might (or might not) be the subject of another proposal…

A possible use case could look like this: a cartographic style uses this information for rendering names. It requests language:name directly in the SQL query. This value is simply passed as-is to Mapnik (Mapnik will likely support language tags starting with Mapnik 3.1). Mapnik uses HarfBuzz internally for text rendering, and HarfBuzz accepts BCP-47 values (if the value is invalid, it is silently ignored). BCP-47 is in widespread use, and it makes it possible to distinguish not only between Chinese and Japanese, but also between Traditional Chinese and Simplified Chinese.
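The data flow of this use case can be sketched in Python, leaving out the actual SQL and Mapnik parts. The function label_instructions and the sample features are hypothetical; the sketch only shows the pass-through idea, where the value of language:name is handed on unchanged and objects without the tag get no language hint.

```python
def label_instructions(features):
    """Yield (text, lang) pairs; lang is None when language:name is absent.

    The language value is passed on as-is, mirroring the idea that the
    text shaper, not the style, decides what to do with invalid values.
    """
    for tags in features:
        text = tags.get("name")
        if text:
            yield text, tags.get("language:name")


# Hypothetical sample data for illustration.
features = [
    {"name": "Монтана", "language:name": "bg"},
    {"name": "Dublin"},   # no language hint: renderer uses its default
    {"shop": "bakery"},   # no name: nothing to label
]

for text, lang in label_instructions(features):
    print(text, lang)
```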

Representation

Not rendered itself, but it can be used to make correct language-specific rendering of name=* possible.

This information can also be used by text-to-speech engines to correctly pronounce the default local name of a place.

Voting

Instructions for voting
  • Log in to the wiki if you are not already logged in.
  • Scroll down to voting and click 'Edit source'. Copy and paste the appropriate code from this table on its own line at the bottom of the text area:
      I approve this proposal: {{vote|yes}} --~~~~
      I oppose this proposal: {{vote|no}} reason --~~~~ (replace "reason" with your reason(s) for voting no)
      I abstain from voting but have comments: {{vote|abstain}} comments --~~~~ (use this if you don't want to vote but have comments; replace "comments" with your comments)

Note: The ~~~~ automatically inserts your name and the current date.


  • I approve this proposal. --Sommerluk (talk) 11:33, 30 August 2017 (UTC)
  • I approve this proposal. --Polarbear w (talk) 12:45, 30 August 2017 (UTC)
  • I have comments but abstain from voting on this proposal. This should only be tagged where the language is different from the reasonable default. In other words: The only example given (a Bulgarian city having a Bulgarian name) is one where the tag should not be used. The tag itself is sound, but it really needs explicitly stated limitations on its use to prevent well-meaning mappers from overusing it. --Tordanik 21:05, 30 August 2017 (UTC)
  • I oppose this proposal. I have two points for voting no: 1. the tag is better as *:language and 2. I think the existing tagging with name:de is clear enough, so that any language can be clearly assigned. --Foxxi59 (talk) 21:34, 30 August 2017 (UTC)
    @Foxxi59 - I do not get your first point, what is better than what? For your second point, it seems you have missed the problem the proposal wants to solve. It is about giving additional information to the principal "name=*" tag, it does not make the additional languages in "name:ISO" obsolete. --Polarbear w (talk) 22:43, 30 August 2017 (UTC)
    @Polarbear - I think the order of the tags must be from global to local: name:language instead of language:name. And I think that name:ISO maps all possible languages. I find the additional use of e.g. name:de for German names better than a new tag. --Foxxi59 (talk) 17:34, 31 August 2017 (UTC)
    This proposal is not to replace name:de, it is to give specific rendering information for specific cases. --Polarbear w (talk) 17:45, 31 August 2017 (UTC)
  • I approve this proposal. --Waldhans (talk) 08:35, 31 August 2017 (UTC)
  • I approve this proposal. --Kocio (talk) 18:16, 31 August 2017 (UTC)
  • I have comments but abstain from voting on this proposal. The name tag has many issues (it should be in only one language, it does not handle international objects such as oceans and rivers, it does not say exactly which language is used, etc.), so this proposal tries to fix one of them, but not all of the name tag's issues. --Tbicr (talk) 13:35, 1 September 2017 (UTC)
  • I oppose this proposal. language:name=bg is in the wrong order. It is the language of the name, so name:language=bg. "It is not necessary" or "not mandatory" is at odds with both the purpose of the proposal and the goal of avoiding the creation of millions of unnecessary tags. When the rendering of a name conforms to local use, a language tag MUST NOT be added to every object, for example in Bulgaria. Conversely, if a name has a rendering problem (a Bulgarian shop in Ireland with a local name in Bulgarian), it WOULD be necessary to add the language tag to get a correct rendering. The problem may require solutions other than the proposal (a default value for a country or a subarea instead of having to put it on all objects, a default value for all tags without a language code instead of having to set it many times on the same object, always using the default local language in tags like name, ...). I also disagree with a second vote on a motion that does not deal with the many opposing opinions that were published the first time. Marc CH (talk) 23:04, 1 September 2017 (UTC)
    Well, the first proposal missed 74% approval, but it clearly has more than 50% approval. Using the order language:name was proposed during the voting on the old proposal. The user who proposed language:name instead of name:language wanted to avoid breaking the assumption that every suffix after name: is always a language code. (Yes, I know that this assumption is not 100% correct even now, but anyway.) So the order “language:name” is the result of the discussion of the old proposal… Sommerluk (talk) 09:43, 3 September 2017 (UTC)
    The % is not the most important thing, a few friends could raise the %. The most important thing imho is to see whether it is considered a good solution or whether there are big criticisms that will complicate its implementation (because an approved tag with little use is not very useful). language:name=bg or language:name=bg shows that the system has a more general problem. The critics of the previous version showed that having to duplicate every "free text" tag of every object in a country is a problematic solution. Other solutions are possible, such as language=bg valid for all "free text" tags on an object, or better, default values at the scale of a country or region. I think you have made a new proposal too quickly, since this problem raised in the first vote is not resolved. Do you really have to state, in your country, for each store, for every building, that the shop name, the street name and the description are all in the same language? I think it would have been useful to find a general solution before a new vote, rather than proceeding without resolving this main point. Marc CH (talk) 17:16, 3 September 2017 (UTC)
  • I oppose this proposal, for the following reasons, which are fully explained in the message on <tagging> of 2017-09-05 13:13 in reply to this voting announcement (please note that I find this a good proposal, but with defects to be corrected, principally the lack of defaults):
      • The syntax is incorrect.
      • Missing short "why" in "proposed text" -> some people did not understand your proposal and they should.
      • Contrarily, the value of language=* is needed for every name.
      • Either we have to use :language=* for every name or there is a default mechanism for it. But OSM has just abandoned the proposal for defaults. Please restore that proposal to define a default :language mechanism first.
      • OSM.org does not use only the local language intentionally, but because it uses precomputed tiles.
      • No explanation of how to use the BCP-47 codes.
      • Bulgarian Cyrillic does not exist. Your example is just cursive Cyrillic.
      • No mention of how to specify alternative scripts, and that they must be used for name:??=* too.
    Papou (talk) 11:45, 5 September 2017 (UTC)
  • I have comments but abstain from voting on this proposal. I agree with Marc CH that "language:name=*" is in the wrong order. It should be name:language=*. I believe we should move or change this to a new proposal, name:language=*, and abandon language:name=*. --EzekielT (talk) 18:40, 6 September 2017 (UTC)
  • I oppose this proposal. Use Wikidata for reconciliation! --Kenji (talk) 21:24, 7 September 2017 (UTC)
  • I oppose this proposal. I see name as an established name, independent of language. If there are different languages, use name:* --Robybully (talk) 08:43, 8 September 2017 (UTC)
  • I oppose this proposal. Such a proposal should be able to cope with multilingual names. --Tyr (talk) 11:39, 8 September 2017 (UTC)
  • I oppose this proposal. Use Wikidata for reconciliation! -- User 5359 (talk) 06:44, 9 September 2017 (UTC)
  • I oppose this proposal. This only solves the problem for a few very specific situations and is not really suitable for general application. Also: Wikidata. --De vries (talk) 06:41, 10 September 2017 (UTC)
  • I have comments but abstain from voting on this proposal. Could we consider using codes other than BCP-47? E.g. "languagename=int_name" in disputed areas, to avoid conflicts over name rendering. --nyampire (talk) 01:43 September 2017 (UTC)
  • I approve this proposal. --ToniE (talk) 18:29, 11 September 2017 (UTC)
  • I approve this proposal. --★ → Airon 90 09:44, 13 September 2017 (UTC)