Talk:Taginfo/Parsing the Wiki

Wrong analysis

The following are generally NOT errors, and you should not instruct users to alter the content of the wiki when these are just limitations of the Taginfo website that are not even technically justified! — Verdy_p (talk) 09:06, 26 November 2016 (UTC)

description parameter should only contain plain text

"The description parameter containing the short description of this key, tag, or relation type should only contain plain text, not wiki syntax. This is important so that taginfo, but also other software outside the wiki, can use this text properly."

This analysis is COMPLETELY wrong.
A description DOES need to contain basic markup for various languages, and semantic markup such as "code", "br", "sup", "sub", or sometimes even small images/icons/diagrams.
Drop this. Taginfo should not have any problem with this markup, as the description is really intended to be displayed in HTML (including in the Wiki pane of Taginfo).
If you need plain text in some summary table showing only one line, use HTML code filtering (but be aware that this will break descriptions or even some languages: not all text can be encoded in HTML as plain text only).
Nobody wants to drop this basic markup, except the TagList site itself (even if it really does not need this "requirement" for its "wiki" information pane)! At least you should allow inline markup (including coloring, bold, italic, sub, sup, external links and wikilinks, and line breaks; some descriptions also need numbered lists and bulleted lists, or symbols not encoded in Unicode such as small road signs).
I've seen people dropping markup on the wiki and then creating meaningless descriptions. — Verdy_p (talk) 04:54, 26 November 2016 (UTC)
One of the longstanding problems with OSM is that there is no "one" description of tags that everybody can use in their software. Most software dealing with OSM tags has its own description for each tag (and also needs translations of it into every language). This has often been seen as a problem and many people have asked for a single source they can use. The only source for this that makes sense to me is the wiki. But using the descriptions from the wiki is very difficult if we don't constrain the format a bit. First, it is difficult to get this description out, but even if we could, all the markup, links to images, etc. will not work in every context. So it totally makes sense to restrict the description (and we are only talking about the one-sentence description in the infobox) to plain text. I can see no reason why this description should have markup and I see many benefits as described. Taginfo is only the "intermediate" goal here. But if taginfo can parse more of these descriptions, more programs can use them easily through the taginfo API. Again, this concerns only the one-line description in the infobox. For everything more, we have to link to the full text in the wiki anyway. Joto (talk) 10:19, 27 November 2016 (UTC)
I disagree: basic inline markup is also useful in single-line descriptions (and frequently needed for some languages).
If you just want to format data tables with only plain text (which may become meaningless, as this is destructive), it is trivial for you to parse inline HTML or wiki markup: few are permitted on the wiki (br, b, i, em, var, sub, sup, code, tt, span, all supported on all websites and wikis, and only three MediaWiki markups for italics, bold, and links).
Notably, italics and code/tt are frequently needed for critical semantic and linguistic distinctions (they are essential in description lines, where they should not be deleted), as are internal wikilinks or external links with URLs.
Note also that some languages will need some HTML character entities (such as nbsp, or for facilitating input or editing) and some <!--comments-->. Here too, this is basic HTML markup that no website should have a problem parsing correctly. Only data forms may seem "polluted" if these are not parsed but rendered as is. These markups are safe (no security problem), except possibly external links (you may want to check the URLs and restrict them, or place a warning alert box before going to random external sites, but this wiki has a policy on usable URLs to keep spammers from posting links to rogue sites). — Verdy_p (talk) 16:06, 27 November 2016 (UTC)
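To make the trade-off in this thread concrete, here is a minimal Python sketch of the kind of filtering mentioned above: it reduces a short description to plain text by dropping the inline HTML tags listed in the comment, HTML comments, and the three MediaWiki markups (italics, bold, links). The function name and the exact tag list are assumptions for illustration, not taginfo's actual code, and the conversion is lossy by design.

```python
import html
import re

# Inline HTML tags named in the discussion above; other tags are left untouched.
INLINE_TAGS = "br|b|i|em|var|sub|sup|code|tt|span"


def description_to_plain_text(text: str) -> str:
    """Reduce a short wiki description to plain text (lossy by design)."""
    text = re.sub(r"<!--.*?-->", "", text, flags=re.S)       # HTML comments
    text = re.sub(r"<br\s*/?>", " ", text, flags=re.I)        # line breaks become spaces
    text = re.sub(rf"</?(?:{INLINE_TAGS})\b[^>]*>", "", text, flags=re.I)
    # [[target|label]] or [[target]] -> label or target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # [https://example.org label] -> label
    text = re.sub(r"\[(?:https?:)?//\S+\s+([^\]]+)\]", r"\1", text)
    text = re.sub(r"'{2,5}", "", text)                        # ''italic'' / '''bold'''
    text = html.unescape(text)                                # &nbsp; and other entities
    return re.sub(r"\s+", " ", text).strip()                  # collapse whitespace


print(description_to_plain_text(
    "Use ''only'' with <code>highway=*</code>, see [[Key:highway|highways]]<br/>and&nbsp;more<!-- note -->"
))
```

Note what is lost: the italics, the code distinction, the link target, and the non-breaking space all disappear, which is exactly the objection raised above.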

has positional parameter

"In general, wiki templates can have positional parameters and named parameters. The description templates only use named parameters. When you see this error, it usually means that the taginfo parser got confused. Try to clean up the template parameters."

Here also the analysis is almost always broken: you only detect pipe characters within wikilinks present in descriptions, or braces used when calling a formatting template (e.g. links to Wikipedia or Wiktionary).
For description fields, keep a large freedom of markup: it has never been meant to be only one-line plain text, even if it is intended to be a short summary.
In other words: fix your wiki code parser, don't convince random users to change the wiki and break a lot of content. — Verdy_p (talk) 05:31, 26 November 2016 (UTC)
Yes, this error is often a result of the description or some other field containing some wiki syntax. Unfortunately this is difficult to detect correctly, so this error message is difficult to interpret. Joto (talk) 10:41, 27 November 2016 (UTC)
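As an illustration of why a pipe inside a wikilink or nested template can be misread as a positional parameter, here is a hedged Python sketch that splits template parameters only on top-level pipes. The helper names are hypothetical, the nesting counter is deliberately simple (it does not distinguish bracket types), and this is not how the taginfo parser is actually written.

```python
def split_top_level(text: str, sep: str) -> list[str]:
    """Split text on sep, ignoring separators nested inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1                 # entering a link or nested template
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1                 # leaving a link or nested template
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def positional_params(template_body: str) -> list[str]:
    """Return the parameters that have no top-level '=' (the template name is skipped)."""
    params = split_top_level(template_body, "|")[1:]
    return [p.strip() for p in params if len(split_top_level(p, "=")) == 1]


body = ("KeyDescription|key=highway"
        "|description=See [[Key:highway|roads]] and {{Tag|oneway|yes}}|lang=en")
print(positional_params(body))   # nesting-aware split
print(body.split("|")[1:])       # naive split, for comparison
```

With the naive split, fragments such as the link label and the arguments of the nested template show up as separate parameters without a name, which matches the false "has positional parameter" reports described above.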

invalid lang parameter

"The lang parameter should have the format xx (for example de for the German language) or xx_XX (for example pt_BR for Brazilian Portuguese)."

Wrong: the format should use hyphens (the OSM and BCP47 standard). BCP47 accepts underscores, but only because of legacy Java locale codes. So "zh-Hans" is the correct and standard form, just like "fr-CA"! You don't need to force the old broken Java locale codes (still used in its old ResourceLoader) on everyone: even Java now supports the BCP47 standard! Note that on the OSM wiki, all locale codes conform to BCP 47 (with only the "DE,FR,ES,IT,JA,NL,RU" locale codes using uppercase letters, for legacy reasons in these 7 wiki namespaces, and other codes starting with a single uppercase letter, all other letters being lowercase only, including in "De-ch", which is not a wiki namespace even if it is still German). — Verdy_p (talk) 05:02, 26 November 2016 (UTC)
You are right, we should use BCP47 here. I have fixed the description and will fix the taginfo code. Joto (talk) 10:39, 27 November 2016 (UTC)
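For reference, a small Python sketch of what checking and normalizing a lang parameter to the hyphenated form could look like. It covers only a simplified subset of BCP 47 (language, optional four-letter script, optional two-letter region); the function name, the acceptance of legacy underscores, and the casing applied are assumptions for illustration, not the fix that went into taginfo.

```python
import re
from typing import Optional

# Simplified subset of BCP 47: language[-Script][-REGION]. Longer valid tags
# (variants, extensions, numeric regions) are deliberately not covered.
LANG_TAG = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}))?$")


def normalize_lang(value: str) -> Optional[str]:
    """Return the tag in conventional casing, or None if it does not fit the subset."""
    # Assumption: legacy underscore forms such as "pt_BR" are accepted and converted.
    match = LANG_TAG.match(value.strip().replace("_", "-"))
    if match is None:
        return None
    lang, script, region = match.groups()
    parts = [lang.lower()]
    if script:
        parts.append(script.title())   # e.g. "hans" becomes "Hans"
    if region:
        parts.append(region.upper())   # e.g. "br" becomes "BR"
    return "-".join(parts)


for raw in ("de", "pt_BR", "zh-hans", "fr-CA", "english"):
    print(raw, "->", normalize_lang(raw))
```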

wrong lang format

"The language in the wiki page name should be of the format xx (for instance de for the German language), or xx_XX (for instance pt_BR for Brazilian Portuguese). Capitalization doesn't matter."

Wrong! The language codes can already have 6 forms on the wiki: ll (e.g. "FR" or "Ca"), lll (e.g. "Vec"), ll-cc (e.g. "De-ch", "Ro-md" or "Pt-br"; not recommended and in fact deprecated), lll-cc (e.g. "Tzm-ma"; also not recommended), ll-ssss (e.g. "Zh-hans"), or lll-ssss (e.g. "Shi-latn").
More details are in Template:Langcode, which parses the language codes currently admitted in wiki page names.
BCP47 (and OSM data as well) allows for longer codes, but they are still not used for naming translated wiki pages; however, all valid BCP47 standard codes are accepted in various language parameter values (to be used in pages that are partially multilingual in their listed examples or citations).
However, legacy non-standard language codes still used by Wikipedia are not accepted in OSM data and the wiki (such as "roa-rup"; or "nrm", which is completely wrong and conflicting in Wikipedia and Wikidata, where it should be "nrf"; or "en-simple" or "de-formal", which are also invalid; likewise "sr-ec" and "sr-el", still used in Wikipedia, conform syntactically but are completely wrong semantically, as they should be "sr-cyrl" and "sr-latn"). Note that "zh-yue" is both conforming and valid in BCP47, but deprecated and should be replaced by the preferred value "yue"; and "zh-classical" is both invalid and non-conforming and MUST be replaced by "lzh".
Wikipedia also supports a single "zh" language code for naming its wiki (merging "zh-hans" and "zh-hant" into a single Wikipedia edition), but only because it locally supports an automatic Hans/Hant transliterator, rarely supported elsewhere in applications and not supported on the OSM wiki; so "zh-hans" and "zh-hant" are distinguished on the OSM wiki and in OSM data. — Verdy_p (talk) 05:19, 26 November 2016 (UTC)
Looking at my code, this was already checking for the hyphen and not the underscore. I have corrected the description. Joto (talk) 10:47, 27 November 2016 (UTC)
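The six page-name forms listed above (ll, lll, ll-cc, lll-cc, ll-ssss, lll-ssss) can be matched with one case-insensitive pattern. The Python sketch below is only an illustration: the function name, the lowercase return value, and the small exclusion set for page-type prefixes that happen to be three letters long are all assumptions, and Template:Langcode remains the reference for what the wiki actually admits.

```python
import re
from typing import Optional

# ll or lll, optionally followed by a two-letter region or four-letter script.
# The wiki capitalizes these in several ways ("FR", "Pt-br", "Zh-hans"),
# so the match is case-insensitive.
LANG_PREFIX = re.compile(r"^[a-z]{2,3}(?:-(?:[a-z]{2}|[a-z]{4}))?$", re.IGNORECASE)

# Assumption: page-type prefixes that would otherwise look like three-letter codes.
NOT_LANGUAGES = {"key", "tag"}


def page_language(title: str) -> Optional[str]:
    """Return the lowercased language prefix of a page title, or None if there is none."""
    prefix, sep, _rest = title.partition(":")
    if not sep or prefix.lower() in NOT_LANGUAGES:
        return None
    return prefix.lower() if LANG_PREFIX.match(prefix) else None


for title in ("FR:Key:highway", "Pt-br:Key:highway", "Shi-latn:Tag:amenity=cafe",
              "Key:highway", "pt_BR:Key:highway"):
    print(title, "->", page_language(title))
```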