Proposed features/Default Language Format
|Default Language Format|
|Tagging:||default:language=ISO language code|
|Definition:||The default language format used for the majority of names of places and features within a region|
|Rendered as:||Properly rendered multilingual name labels|
Specify the default language format used for names, and recommend use of language-specific name tags.
By making it easier to use language-specific name:code=* tags instead of the default name=* tag, this proposal will encourage the use of name tags that include the language code for all features. This will improve the quality and utility of the database. It will become possible to display non-Western languages in their correct orientation and script, to properly display multilingual names, and to research the most commonly used language formats in a particular area.
The key default:language=* with a 2 or 3 letter ISO language code should be tagged on administrative boundary relations, such as countries, provinces and aboriginal communities. This is the language used for the majority of named features within a particular region, as indicated on public signs and in common oral language use by the local community. If more than one script is common in the area, a qualifier can be added to specify the script format. More than one language code can be listed, separated with a semicolon, if the local community uses more than one language on signs or by consensus.
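As a minimal sketch of how a data consumer might read this value, assuming only the semicolon-separated format described above, the following Python function splits a default:language=* value into a list of codes. The function name is hypothetical and not part of any existing OSM tool.

```python
from typing import List


def parse_default_language(value: str) -> List[str]:
    """Split a default:language=* value such as "fr;nl" into language codes.

    Assumes codes are separated by semicolons, as proposed here.
    Surrounding whitespace and empty or repeated entries are dropped;
    the original order is kept.
    """
    codes: List[str] = []
    for part in value.split(";"):
        code = part.strip()
        if code and code not in codes:
            codes.append(code)
    return codes


print(parse_default_language("fr;nl"))  # ['fr', 'nl']
print(parse_default_language("zh;zh_pinyin"))  # ['zh', 'zh_pinyin']
```

The order is preserved only so that the output is predictable; under this proposal, the order of the codes is not meant to imply that one language has primacy.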
The language tag should be applied to the largest boundary relation that accurately represents the language used for default names. When a smaller administrative boundary has a different default language format, this boundary should receive a language tag as well. This would include boundaries of provinces or aboriginal lands where a different language is used.
The language tag may also be applied to individual features when the name is in a different language than the default for the region, or when the feature crosses a border.
The tag name=* is one of the most frequent in the OSM database. Wiki documentation states that the default name should be the most commonly used local name for a feature or place. It is recommended that name:code=* should also be included, where code is the 2- or 3-letter ISO language code, but it is common for only a default name=* to be used, without any language code.
Unfortunately, the name=* tag does not make it clear what language is being used. This causes several problems:
1) Computer language processing is still at a limited stage, and place names are a particularly difficult problem. It is not possible to automatically determine the language used in a tag without local knowledge of the script and language. Therefore, it is not currently possible for a map or database query to return the most useful name for database users from different countries.
2) In places where bilingual names are common or compulsory for streets and businesses, or where the local community prefers a multilingual name to be displayed on international maps, the default name=* tag currently contains more than one name, often separated by a nonstandard symbol such as "-" or "/". While this allows the name to be displayed in a script and language preferred by the local community, it complicates use of the database and is difficult to interpret for people outside the community who cannot read the language.
3) Using the name=* value, renderers cannot properly render bilingual names when one of the languages should be written vertically and the other horizontally.
4) Localized and personalized maps, such as those used on smartphone apps or those optimized for a certain language, cannot show the user's preferred language as well as the locally preferred name. For example, the French map style currently renders name:fr=* when available. But this loses the locally used name, which is likely to be found on signs, reducing the utility of the map for orientation and routing. If the map attempts to display both name:fr=* and name=* together, this will lead to rendering the French name twice, when French is included in the name=*, for example in Morocco or for mountains on the border with Italy. (See the table below for examples, in the third column)
5) Researchers cannot query the database for the languages used to name map features. Any use of the OSM database for linguistic or cultural study would require manually checking each name for the script and language used.
6) Pronouncing names correctly is difficult when the language is not specified, especially when there are two languages in one tag. This matters for routing applications that provide turn-by-turn directions, and for vision-impaired users.
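Once names carry name:code=* tags, the research query described in point 5 reduces to counting keys. The Python sketch below is illustrative only: it computes the share of named features in a region that have a name in each language. It assumes the features have already been extracted as tag dictionaries, and it treats every name:<suffix> key without a further colon as a language code. That is a simplification, since keys such as name:left are not languages and would need to be filtered out in a real analysis.

```python
from collections import Counter
from typing import Dict, Iterable


def language_shares(features: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """For each language code, the share of named features with name:<code>=*.

    A feature counts as named if it has name=* or any name:<code>=* tag.
    A feature named in two languages counts toward both.
    """
    counts = Counter()
    named = 0
    for tags in features:
        codes = {key[len("name:"):] for key in tags
                 if key.startswith("name:") and ":" not in key[len("name:"):]}
        if codes or "name" in tags:
            named += 1
        counts.update(codes)
    if named == 0:
        return {}
    return {code: n / named for code, n in counts.items()}


# Hypothetical sample data for a bilingual area.
sample = [
    {"name": "Rue Neuve - Nieuwstraat", "name:fr": "Rue Neuve", "name:nl": "Nieuwstraat"},
    {"name": "Cafe X", "name:fr": "Cafe X"},
    {"name": "Bakkerij Y"},  # only name=*: counted as named, language unknown
    {"highway": "residential"},  # unnamed, ignored
]
print(sorted(language_shares(sample).items()))  # fr: 2 of 3, nl: 1 of 3
```

The third sample feature shows the problem in miniature: it is named, but because it has only name=*, it cannot be attributed to any language.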
Examples of current rendering problem
Tiles from the OpenStreetMap Carto style ("standard" tile layer) and the French style (openstreetmap.fr) show the problems with the current system (see table below).
The label for Mont Blanc de Courmayeur / Monte Bianco di Courmayeur is produced from either the name=* tag, which has both names separated with a dash, or from name:fr=* for the French map. If a map were to attempt to show both the name:fr=* and the name=*, half of the name would be duplicated (see the third column). The same situation would happen in Hong Kong, where the local mapping community wants to show the name both in Chinese characters and in the Latin alphabet.
The area around Mount Everest ( 珠穆朗玛峰 - सगरमाथा) shows a mix of names in Nepali, Chinese and English. Many mappers have added names only in their preferred language, often English even when this is not appropriate. Note that the Tibetan names are also not rendered.
This proposal will solve these problems:
1) By associating a default language format in the database with administrative boundaries, it will be possible to automatically match a default name=* with the correct language and script. It will also be possible to provide a multilingual name label rendering without using a complicated value in the name=* tag, including rendering the names in different orientations based on the language and script.
2) Bilingual and trilingual places will have names displayed properly without implying a certain order of primacy. For example, names in Brussels, Belgium are currently written first in French, then in Flemish, with a dash as a dividing character, even though name:fr=* and name:nl=* are also specified, but ideally the order should be alternated, as on local street signs.
3) It will be possible for smartphone apps and localized maps to show the name in the user's language first, followed by the locally used name. For example, the French map could show name:fr=* plus the name or names that match default:language=*. If default:language=fr;nl, the French name will still be shown only once.
4) In the long term it will become possible to use only name:code=* without requiring an additional name=* tag. This will improve the consistency and quality of the database for everyone, while still allowing the local community to select the default language format for display.
5) Beyond the use in rendering name labels and improving the name data, this new language tag will provide additional linguistic data about each region, based on what is verifiable on-the-ground. The most common language used in local names is an important characteristic of each settled place. This characteristic is already implied in the OSM database in the names used for places, geographic features, structures and shops, but is not accessible while most local names are in the default name=* form. By encouraging consistent use of name:code=* in all names, and specifying the default language format used for the names of each region, it will be possible to analyze the proportion of names in each place that are in a particular language.
6) Most names and labels will be properly pronounced, even when more than one language is used. Routing applications with turn-by-turn directions will provide the correct pronunciation for most destination and name tags without mappers needing to add specific pronunciation tags for every name, and vision-impaired users will get better audio maps.
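Point 3 can be sketched in Python as follows. This is a hypothetical helper, not code from any existing renderer, and it assumes the default languages for the feature's location have already been looked up. It puts the user's language first, then the names matching default:language=*, and skips any name already shown, so that a French user sees the French name only once. Falling back to name=* when no matching name:code=* exists is an additional assumption.

```python
from typing import Dict, List


def label_names(tags: Dict[str, str], default_langs: List[str], user_lang: str) -> List[str]:
    """Names for a label: the user's language first, then the local name(s)."""
    names: List[str] = []
    for code in [user_lang] + default_langs:
        value = tags.get(f"name:{code}")
        if value and value not in names:  # never show the same name twice
            names.append(value)
    # Assumed fallback: if no local name:code=* exists, use the plain name=*.
    has_local = any(f"name:{code}" in tags for code in default_langs)
    if not has_local and tags.get("name") and tags["name"] not in names:
        names.append(tags["name"])
    return names


peak = {
    "name": "Mont Blanc de Courmayeur - Monte Bianco di Courmayeur",
    "name:fr": "Mont Blanc de Courmayeur",
    "name:it": "Monte Bianco di Courmayeur",
}
print(label_names(peak, ["it", "fr"], "fr"))
# ['Mont Blanc de Courmayeur', 'Monte Bianco di Courmayeur']
```

Because the combined name=* value is only used as a fallback, the duplicated half-name shown in the third column of the table does not occur.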
Notes: When a particular feature (such as a shop, village, river or road) has a name that is not in the default language format for the enclosing boundary, it should be tagged individually with default:language=code. This states that the feature's name should be displayed using the specified language format, found in the name:code=* tag with the matching language code. In general, this should be used for features where the on-the-ground oral name and written signage is in a language or script that is different from the script or language commonly used for other features in the region. It should also be used for features that cross an international or regional boundary with different default language formats used on each side. For example, a mountain or river that forms a border should be tagged with the correct language format for both bordering regions.
This proposal does not recommend creating new boundary relations to define the limits of languages. While many people would like to include maps of languages in OSM, this is not yet feasible, unfortunately. The existing administrative, protected_area and aboriginal_lands borders will be used when it is possible to verify the official language or default language used for naming features and places within the administrative boundary. The indefinite, fuzzy borders between language communities, such as in uninhabited wilderness and areas with sparse settlements, make a distinct border non-verifiable.
Add a tag in the form default:language=code, where code is the 2- or 3-letter ISO language code, possibly with an additional qualifier if needed to specify a non-standard script, or to more precisely identify dialects.
For example, default:language=en means that English in the Latin script is the default language format for commonly used names in this region, if tagged on a boundary. If tagged on an individual feature, it specifies the default language format for this feature only.
default:language=zh_pinyin specifies that Chinese written in the Latin alphabet is the preferred language format. This is not an ISO code, but it has been used extensively in name tags in the form name:zh_pinyin=*, so this language format would match the existing usage in names within OSM.
The language and script used for most local names should be tagged as the default language format. If two languages are always used, both may be tagged as defaults. These decisions should be made by the local community.
The language codes should be the same as those already in use for names in the form name:code=*. Generally these are the 2-letter ISO 639-1 codes for large languages, and 3-letter ISO 639-2 or ISO 639-3 codes for smaller local languages that do not have a 2-letter code. OSM currently uses several additional language codes, such as zh_pinyin for Chinese written in Latin characters ("Pinyin"). The codes should match between name:code=* and default:language=code. It has been suggested that script or dialectal variants could be specified by adding an additional qualifier to the ISO code, separated by an underscore, as is done for zh_pinyin.
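To illustrate how such codes could be checked, here is a hypothetical Python validator. It assumes the form described above: a 2- or 3-letter lowercase ISO 639 code, optionally followed by an underscore and a qualifier such as pinyin. Real OSM data also contains other forms (for example script subtags written with a hyphen), so this pattern is a deliberate simplification.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: ISO 639 code (2 or 3 lowercase letters),
# optionally followed by "_" and a script or dialect qualifier.
LANGUAGE_CODE = re.compile(r"([a-z]{2,3})(?:_([a-z0-9]+))?")


def split_language_code(code: str) -> Tuple[str, Optional[str]]:
    """Return (iso_code, qualifier) for values such as "zh_pinyin"."""
    match = LANGUAGE_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognised language code: {code!r}")
    return match.group(1), match.group(2)


def name_key(code: str) -> str:
    """The name:code=* key whose value holds the name in this language format."""
    split_language_code(code)  # raises ValueError if the code is malformed
    return f"name:{code}"


print(split_language_code("zh_pinyin"))  # ('zh', 'pinyin')
print(split_language_code("en"))  # ('en', None)
print(name_key("fr"))  # name:fr
```

The name_key helper reflects the rule that the code in default:language=* should match the suffix of the corresponding name:code=* key exactly.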
Administrative boundaries: Relation with type=boundary and boundary=administrative, admin_level=2 to 10 or 11
Boundaries of semiautonomous or designated indigenous communities: Relation with type=boundary and boundary=aboriginal_lands, or relation with type=boundary, boundary=protected_area and protect_class=24 (e.g. Native American reservations, Aboriginal lands)
The largest valid boundary should be tagged with a default language.
For example, if the country is tagged, the provinces do not require tags unless their default language is different. If a municipality is tagged, its neighborhoods or wards do not require separate tags unless they have a different default language format.
Individual named map features: If the default language format of an individual feature name is different than the normally used language format for the area, the feature should be tagged directly. Individual features that cross a boundary, such as an international border, or share nodes with the boundary, should also be tagged with the default language format for each side of the boundary.
Administrative subdivisions (e.g. provinces, counties, municipalities): Smaller administrative boundaries do not require a default:language=* tag if the value would be the same as that on the next higher relation, but the tag is required if there is a difference. For example, in Belgium the area of Flanders would be tagged default:language=nl. In a few municipalities on its borders the language used is French or German, and these municipalities would be individually tagged default:language=fr or default:language=de, or default:language=fr;nl for a municipality where both languages are used on signs and in conversation. The other municipalities in Flanders would not require a default:language=nl tag, because the default language would be the same as in the larger, enclosing administrative boundary.
Aboriginal communities (e.g. Native Americans, First Nations, minority ethnic communities): The lands of these communities are often delineated with an official boundary, such as a reservation, protected area, or aboriginal lands. Most of these boundaries are boundary=aboriginal_lands, but boundary=protected_area has also been used. These boundaries may be tagged with default:language=* if the local community chooses to use a different language, or combination of languages, as the default for names.
Neighborhoods: If the street signs and shop signs in a "Chinatown" business district are displayed in Chinese characters and Latinized Chinese (aka Pinyin), the administrative boundary corresponding with the neighborhood could be tagged default:language=zh;zh_pinyin to specify that both language formats should be displayed. If there is no appropriate administrative boundary, the individual shops and streets should be tagged.
These tags should be determined by the local community, as has been done for the default name=* tag in the past, and should correspond with majority usage on-the-ground.
Individual features: If a Hindu temple in Malaysia is commonly known by a Tamil-language name in Tamil script on public signs, the place of worship would be tagged with default:language=ta and name:ta=*. This allows the renderer to show the Tamil name in Tamil script, even though the default language format for Malaysia is Bahasa Malaysia, and also informs all database users to expect public signs to show the name in Tamil.
Mont Blanc, on the border between Italy and France, should be tagged with the appropriate default language format for both areas: default:language=it;fr. This would also be the case for a river which forms the border between two areas with different default language formats.
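The lookup rule in this section (a tag on the feature itself first, otherwise the smallest enclosing boundary that carries the tag) can be sketched in Python. This is an illustration under simplifying assumptions: the boundaries containing the feature are assumed to be already known and ordered from largest to smallest, and the tag dictionaries are hypothetical, loosely following the Flanders example.

```python
from typing import Dict, List, Optional

Tags = Dict[str, str]
KEY = "default:language"


def effective_default_language(feature: Tags, enclosing: List[Tags]) -> Optional[str]:
    """Return the default:language value that applies to a feature.

    `enclosing` holds the boundary relations containing the feature,
    ordered from largest (e.g. a country) to smallest (e.g. a municipality).
    A tag on the feature itself wins; otherwise the smallest boundary with
    the tag wins, so subdivisions only need it when their value differs.
    """
    if KEY in feature:
        return feature[KEY]
    for boundary in reversed(enclosing):
        if KEY in boundary:
            return boundary[KEY]
    return None


# Hypothetical, simplified tags modelled on the Flanders example.
flanders = {"name": "Vlaanderen", KEY: "nl"}
ordinary_municipality = {"name": "Municipality A"}
border_municipality = {"name": "Municipality B", KEY: "fr;nl"}

print(effective_default_language({}, [flanders, ordinary_municipality]))  # nl
print(effective_default_language({}, [flanders, border_municipality]))  # fr;nl
print(effective_default_language({KEY: "it;fr"}, [flanders]))  # it;fr
```

For a feature on a border, such as Mont Blanc, a single chain of enclosing boundaries is not well defined, which is why this approach relies on the feature's own tag in that case.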
Please do not create new boundaries that do not have official recognition. These tags should only be placed on verifiable administrative boundaries, boundaries declared by the government (including indigenous lands), or on individual features as described above.
Individual features should have a fully documented name, with name:code=* used for the locally preferred name, as found on signs and used by locals in conversation.
A general-purpose, international map renderer should query the default language values on the boundaries or places that contain a named object before a name label is rendered. If there is a single default language, the name:code=* value that matches the default language should be used to render the name. If there is more than one default language, then multiple labels should be placed. For example, in Brussels the administrative boundary would show fr and nl (French and Flemish) as default languages. Street names and other local names would be rendered by combining the values of name:fr=* and name:nl=*. To maintain neutrality, the order of the two names could be alternated on features such as highways and waterways where the name appears more than once.
Multilingual names should be displayed in the standard script and orientation for the language; for example, Arabic names would be in Arabic script, written right to left.
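A renderer following this description could build the label text roughly as in the Python sketch below, which is an assumption rather than any existing renderer's code. For a way that receives several labels, it rotates the order of the names from one label to the next, which alternates them when there are two languages. The separator is a hypothetical parameter (a line break is one option), and script direction and vertical text are not handled here.

```python
from typing import Dict, List


def labels_along_way(tags: Dict[str, str], default_langs: List[str],
                     n_labels: int, sep: str = "\n") -> List[str]:
    """Label texts for a way labelled n_labels times, rotating the name order."""
    names = [tags[f"name:{c}"] for c in default_langs if f"name:{c}" in tags]
    if not names:
        # Assumed fallback when no name:code=* matches the default languages.
        return [tags["name"]] * n_labels if "name" in tags else []
    labels = []
    for i in range(n_labels):
        k = i % len(names)  # start each label with the next language in turn
        labels.append(sep.join(names[k:] + names[:k]))
    return labels


street = {"name:fr": "Rue Neuve", "name:nl": "Nieuwstraat"}
for text in labels_along_way(street, ["fr", "nl"], 3, sep=" / "):
    print(text)
# Rue Neuve / Nieuwstraat
# Nieuwstraat / Rue Neuve
# Rue Neuve / Nieuwstraat
```

Rotating rather than strictly swapping keeps the same function usable for trilingual areas, where each language takes a turn in first position.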
"Clickable" maps, such as those used on smartphone apps, could show a list of all locally used names when a feature is selected, in addition to the name in the user's own language.
Routing services, navigation apps and maps for people with visual disabilities can use the default language format tag to improve the automated pronunciation of names.
- It is proposed that the Wiki pages above be edited to recommend that the default language format should be specified on the feature or on the enclosing administrative boundary. A short description of use of these tags in rendering and by database users will also be included.
It is recommended that the editors suggest adding a name with language code, in form name:code=*. Ideally, the editor application should suggest the most likely name:code=* language code based on the default language format for the region being mapped, or based on the encoding used.
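An editor could implement this suggestion along the lines of the following Python sketch. The function is hypothetical and assumes the applicable default:language=* value has already been resolved for the feature's location. When there is a single default language, it offers the existing name=* value as a pre-filled name:code=*; when there are several, it leaves the values empty for the mapper, because a combined name=* cannot be split reliably.

```python
from typing import Dict


def suggest_name_tags(tags: Dict[str, str], default_language: str) -> Dict[str, str]:
    """Propose name:code=* tags that an editor could offer to fill in."""
    codes = [c.strip() for c in default_language.split(";") if c.strip()]
    missing = [c for c in codes if f"name:{c}" not in tags]
    if len(codes) == 1 and missing and "name" in tags:
        # Single default language: assume name=* is written in it.
        return {f"name:{codes[0]}": tags["name"]}
    # Several default languages: ask the mapper for each missing name.
    return {f"name:{c}": "" for c in missing}


print(suggest_name_tags({"name": "Hauptstraße"}, "de"))
# {'name:de': 'Hauptstraße'}
```

Pre-filling is only offered as a suggestion; the mapper still confirms that the name is really in the default language before the tag is saved.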
This proposal has been discussed on the September 2018 Tagging mailing list: https://lists.openstreetmap.org/pipermail/tagging/2018-September/039048.html
A similar discussion previously occurred in April 2018: https://lists.openstreetmap.org/pipermail/tagging/2018-April/035855.html
Name of tag changed to default:language=* on 7-October-2018
It was found that the tag language:*=* has been used in the form language:en=main to describe the main language taught at a language school. It's not a very common tag, but there may be a small risk of confusion in using the same "namespace." Also, there is a proposal to have all defaults (e.g. maxspeed for highways) in the form default:*=*. A small number of people expressed support for changing the tag to this order, several others did not have an opinion (including the initial author of this proposal), and no one has yet spoken against the change. Therefore, the proposed tag has been changed.
Please comment on the talk page if you disagree with this change or if you have any other comments.
Summary of previous discussions prior to creation of this proposal:
This issue has been discussed on the Tagging mailing list in 2017 and twice in 2018. Several people opposed the creation of new boundary relations for languages, in April 2018 and in September 2018, because this would be an unnecessary proliferation of relations that are hard to maintain and difficult to verify on-the-ground. However, there was near-consensus that it would be useful to have a default language format for regions, to help solve the problem of interpreting the meaning of name=* when there is no name:code=* tag.
Christoph (@Imagico) suggested tagging the official language information on administrative boundary relations on his blog: http://blog.imagico.de/you-name-it-on-representing-geographic-diversity-in-names/
"In case of Germany the admin_level 2 boundary relation (51477) would get something like language_format=$de – and there would be no need for further format strings locally except maybe for a few smaller areas with a local language or individual features with only a foreign language name."
And Yuri Astrakhan wrote: "[Consider] an Italian user viewing  a feature in China with two tags: "name" and "name:fr". In this case, "name:fr" tag is preferred because "name" is likely to be in Chinese ...  Same tags, but the feature is in Italy -- now "name" tag is the better choice because the name is actually in the same language as the reader. ... [But] without knowing the language of the "name" tag, we cannot use it..."
I reviewed the Wikipedia documentation of language classification. It all looks too debatable and not useful for mapping or this database, except for Official Language, which is clearly defined by law. See Wikipedia articles on: Official Language, Regional Language (still based on country boundaries), Minority Language (only defined on admin_level=2 again), Heritage Language (controversial, not verifiable for OSM), Indigenous Language (terminology and definition debated), First Language / Second Language (well defined in linguistics, but only for individuals, not for places). There were comments against using native/indigenous/aboriginal for local languages, and support for "local".
Because this proposal is focused on the use of language in names, I believe following the current name classification system is useful. For example, there are official_name=* and loc_name=*, and name=* is defined to normally be the most common local name. Therefore, besides the default language format, the two suggested additional tags are language:official and language:local. This will accommodate languages that have local importance or official status even if they are not used as the default language for naming geographic features and places in a settlement or region.
The biggest issue was picking a good key and value for the tag, and deciding to focus on one tag or a range of options.
Local and Official Languages?
I initially included language:official=code and language:local=code in the draft proposal, but removed them to simplify the discussion. These are the reasons to consider including these two sub-tags in a future proposal:
While the name=* should indicate the common local name or names, it does not indicate if this is also the name in the official national language or languages. The official_name=* may contain this information, but this has the same problem of lacking information about the language format.
local:language=code: A local language that is not the majority language used for naming local features, but is used for naming a number of local features, should be tagged with local:language=code. Database users should assume that default:language=code represents local language format(s) as well.
The additional tags official:language=* and local:language=* would allow creation of maps that contain names in all official languages within a country or region. This will make the OSM database more useful for government ministries, and NGOs working in developing countries. Furthermore, a "clickable" map, such as those used on smartphone apps, could show a list of all locally used names in addition to the name in default language format and in the user's own language, when a feature is selected. For example, internationally renowned features such as the world's highest peaks, or the names of famous cities, may have a name:code=* in dozens of languages, but only those used locally or on official signs would be relevant to specific users.
If name tags commonly include additional local languages in a region, in addition to the name in the default language format, the boundary could be tagged with local:language=* or official:language=* to specify the other languages commonly used. More than one official or local language can be specified in the value by use of a semicolon as a divider.
However, the default language should always be specified when verifiable, to reduce the need to tag individual map features and to speed up searches of the database. Therefore, at this time these two tags are not included in this formal proposal, but they could be proposed separately in the future.
It is not recommended to use "historic" language values for languages that were used in the past, such as in the old_name=* tag, because these lack verifiability on-the-ground, but they would be appropriate additions on http://www.openhistoricalmap.org . The exception would be historic languages, such as Latin and Sanskrit, which are still used for names of places and features, despite no longer being spoken.
If a language used in a name is not local or official, it is presumed to be a foreign language, therefore a separate :foreign subtag is not required. Many internationally-known features, such as the highest mountains and largest rivers, have long lists of names with language codes, but the vast majority are foreign-language names not used locally.
Please comment on the discussion page.
- I approve this proposal. --Jeisenbe (talk) 13:08, 15 October 2018 (UTC)
- I approve this proposal. An overdue tag IMO. Thanks for your efforts! --SelfishSeahorse (talk) 15:12, 15 October 2018 (UTC)
- I oppose this proposal. The idea is good, but the proposal seems to cause a lot of confusion when discussed with other mappers. It also does not match how POIs are mapped in bi-lingual areas such as Brussels. Only official names (towns, streets, government, big tourist attractions) are mapped with a name tag containing both French and Dutch. All other POIs are only mapped with a name tag. --Escada (talk) 13:58, 16 October 2018 (UTC)
- I approve this proposal. For monolingual areas, this is the most sensible solution for mapping names' languages. I strongly prefer it compared to duplicating the value of name as name:??, or requiring the use of name:?? for features with only a single name. --Tordanik 16:45, 16 October 2018 (UTC)
- Note that this proposal (as I understand it) is not supposed to define language of a name=*. It is supposed to make replacing use of name=* tag by language-specific name tags easier and reduce need for putting multiple names into name tag in regions that use many languages Mateusz Konieczny (talk) 10:03, 17 October 2018 (UTC)
- I approve this proposal. I am not sure is it going to work, but it is an interesting idea and in the worst case some unused tags will appear on some administrative boundaries. Mateusz Konieczny (talk) 10:04, 17 October 2018 (UTC)
- I approve this proposal. following Mateusz' lead. Seems a no-risk proposal --Javbw (talk) 10:36, 17 October 2018 (UTC)
- I approve this proposal. The idea is good, however the proposal is very hard to read and needs a lot to reading to get to the point. It would be splendid if we could clear this up a bit before publishing. --Bkil (talk) 17:35, 18 October 2018 (UTC)
- I oppose this proposal. As if there weren't enough editwars over disputed territories in OpenStreetMap already, this will create another source of endless wheelwarring that we really do not have the community to overwatch and deal with. For example, should Tibetan be a standard language in China, and if so in which regions? Same with other minority languages, e. g. in Russia. -- Prince Kassad (talk) 19:57, 18 October 2018 (UTC)
- I oppose this proposal. There are many places in the world where value of this tag is questionable and ambiguous Murcik (talk) 08:24, 19 October 2018 (UTC)
- I oppose this proposal. Not convinced.--US Woods (talk) 09:01, 19 October 2018 (UTC)
- I oppose this proposal. This should be made with a settings by icon on the page of openstreetmap.org, not by default. LB3AM (talk) 10:22, 19 October 2018 (UTC)
- I oppose this proposal. This would mean, that OSM defines the language in wich an object likes to be named. I don't see any additional information regarding a name=* tag which will be used by default.--klik (talk) 12:47, 19 October 2018 (UTC)
- I oppose this proposal. Agree with US Woods Johnparis (talk) 20:13, 21 October 2018 (UTC)
- I oppose this proposal. I find it confusing. The proposal is not clear in what would be the real improvements (while some big downsides are known as above peoples already commented). --Anakil (talk) 07:24, 22 October 2018 (UTC)
- I approve this proposal. This does not make it easier to vandalize or start an edit war (those can already be easily done by changing name=* on highly visible objects). Realistically, the worst-case scenario is that another tag ends up in the database, and the best-case scenario is that it becomes less technically challenging to handle language fallbacks and to correctly display labels in multiple languages. (For example, in Hong Kong it would become possible to show labels with a line break between Chinese and English, which would look nicer than the current separation using a normal space character.) However, it might be a little difficult to tie this to administrative boundaries, especially in places like Chinatowns. Jc86035 (talk) 15:34, 22 October 2018 (UTC)
- I approve this proposal. --Romanf (talk) 05:57, 23 October 2018 (UTC)
- I oppose this proposal. I see no real improvement. name=* is already default and also conformable or disputed like this tag would. Let the user decide which language he want to see on his map. --Robybully (talk) 06:29, 23 October 2018 (UTC)
- Note that the purpose of default:language=* is not to force map renderers to display names in a particular language, but to store the information about what language is used in a particular area, that is, to indicate which name:* is the local name. --SelfishSeahorse (talk) 09:47, 23 October 2018 (UTC)
- I approve this proposal. Can help a lot and definitively does no harm. Sommerluk (talk) 09:34, 24 October 2018 (UTC)
- I approve this proposal. Very useful! Jotam (talk) 09:44, 25 October 2018 (UTC)
- I oppose this proposal. While I agree that this is probably sufficient for areas with only one, main language, it doesn't sufficiently specify and explain how to deal with areas with multiple languages, and in particular, how to present these to the map reader (IMHO a missed opportunity, compared to Christoph's proposal of a formatting tag). The critical areas with regard to languages are the multilingual areas, and if they are not solved the proposal is not of much practical use.--Dieterdreist (talk) 09:59, 25 October 2018 (UTC)
- I think it's clear enough that two language tags would be applied separated by semicolons; and maps would probably present the labels however they would want (although I'm guessing the table implies that they would be rendered separated by line breaks). To me, it makes more sense to let the map maker decide with the software than to have users introduce inconsistent formatting (space, hyphen-minus, slash, ...). The proposal for alternation is a bit too vague, but this should probably be left to another tag instead of being shoehorned into default:language=*. If the proposal fails (not unlikely right now, with 10 supports to 11 opposes) then I think it could be resubmitted with some more of the details filled in (e.g. how to indicate language importance; how alternation would work; whether multilingual labels would be rendered on ways separately or as one continuous label and how this interacts with the language importance thing). Jc86035 (talk) 10:12, 25 October 2018 (UTC)
- I oppose this proposal. While there are considerable merits to the proposal, I don't remember seeing sufficient discussion covering the wider multilingual geographic areas that might be affected. In particular that of Indian regions, where two official languages are the minimum, and more than two in most places. --Indigomc (talk) 16:52, 25 October 2018 (UTC)
- I oppose this proposal. Although this would be a step into the right direction it needs further discussion. Tagging an administrative boundary with a default language can be a very political action. Sometimes such boundaries are used to divide minorities. --Nacktiv (talk) 22:18, 28 October 2018 (UTC)
- I oppose this proposal. name:<lang> tags are enough to make multilingual maps Wowik (talk) 11:27, 29 October 2018 (UTC)
- I oppose this proposal. In multilingual areas it brings a lot of conflicts with language used from the Hotel/B&B/Restaurant by default or what is the default language in the region where the hotel is. Also it is hard to implement into street names or addresses. --Luschi (talk) 16:27, 30 October 2018 (UTC)