Talk:Taginfo/Parsing the Wiki

From OpenStreetMap Wiki

Please don't use this page to report bugs or other issues with the taginfo software; they will just get lost. Please report them on the bug tracker at https://github.com/taginfo/taginfo/issues.

Wrong analysis

The following are generally NOT errors, and you should not instruct users to alter the content of the wiki when these are just limitations of the TagInfo website, which are not even technically justified! — Verdy_p (talk) 09:06, 26 November 2016 (UTC)

description parameter should only contain plain text

"The description parameter containing the short description of this key, tag, or relation type should only contain plain text, not wiki syntax. This is important so that taginfo, but also other software outside the wiki, can use this text properly."

This analysis is COMPLETELY wrong.
A description DOES need to contain basic markup for various languages, and semantic markup such as "code", "br", "sup", "sub", or sometimes even small images/icons/diagrams.
Drop this. Taginfo should not have any problem with this markup, as the description is really intended to be displayed in HTML (including in the Wiki pane of Taginfo).
If you need plain text in some summary table showing only one line, use HTML code filtering (but be aware that this will break descriptions or even some languages: not all text can be encoded in HTML as plain text only).
Nobody wants to drop this basic markup except the TagList site itself (even though it really does not need this "requirement" for its "wiki" information pane)!!! At least you should allow inline markup (including coloring, bold, italic, sub, sup, external links and wikilinks, and line breaks); some descriptions also need numbered and bulleted lists, or symbols not encoded in Unicode such as small road signs.
I've seen people dropping markup on the wiki and then creating meaningless descriptions. — Verdy_p (talk) 04:54, 26 November 2016 (UTC)
One of the longstanding problems with OSM is that there is no "one" description of tags that everybody can use in their software. Most software dealing with OSM tags has its own description for each tag (and also needs translations of it into every language). This has often been seen as a problem, and many people have asked for a single source they can use. The only source for this that makes sense to me is the wiki. But using the descriptions from the wiki is very difficult if we don't constrain the format a bit. First, it is difficult to get this description out, and even if we could, all the markup, links to images, etc. will not work in every context. So it totally makes sense to restrict the description (and we are only talking about the one-sentence description in the infobox) to plain text. I can see no reason why this description should have markup, and I see many benefits as described. Taginfo is only the "intermediate" goal here. But if taginfo can parse more of these descriptions, more programs can use them easily through the taginfo API. Again, this concerns only the one-line description in the infobox. For anything more, we have to link to the full text in the wiki anyway. Joto (talk) 10:19, 27 November 2016 (UTC)
I disagree, basic inline markup is also useful in single line description (and frequently needed for some languages).
If you just want to format datatables with only plain text (which may become meaningless, as this is destructive), it is very trivial for you to parse inline HTML or wiki markup; not many of them are permitted on the wiki (br, b, i, em, var, sub, sup, code, tt, span, all supported on all websites and wikis, and only three MediaWiki markups for italics, bold, and links).
Notably, italics and code/tt are frequently needed for critical semantic and linguistic distinctions (they are essential in description lines, where they should not be deleted), as are internal wikilinks or external links with URLs.
Note also that some languages will need some HTML character entities (such as nbsp, or to facilitate input or editing) and some <!--comments-->. Here also, this is basic HTML markup that no website would have a problem parsing correctly. Only data forms may seem "polluted" if these are not parsed but rendered as is. These markups are safe (no security problem), except possibly external links (you may want to check the URLs and restrict them, or show a warning before going to random external sites, but this wiki has a policy on usable URLs to stop spammers from posting links to rogue sites). — Verdy_p (talk) 16:06, 27 November 2016 (UTC)
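As a rough illustration of the claim above that stripping the permitted inline markup is straightforward, here is a minimal Python sketch that turns a one-line description into plain text for a summary table. It assumes only the markup named in this thread (br, b, i, em, var, sub, sup, code, tt, span, HTML comments and entities, and MediaWiki bold, italics and links); `description_to_plain_text` is a hypothetical name, not part of taginfo or its API. The conversion is lossy by design: italics and code formatting disappear, which is exactly the trade-off being debated here.

```python
import html
import re

# Hypothetical helper (not part of taginfo): reduce a one-line wiki
# description to plain text. Only the inline markup named in this
# discussion is handled; anything else is left untouched.
INLINE_TAGS = ("b", "i", "em", "var", "sub", "sup", "code", "tt", "span")
TAG_RE = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(INLINE_TAGS), re.IGNORECASE)


def description_to_plain_text(text: str) -> str:
    # Drop HTML comments such as <!-- note -->.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    # <br>, <br/> and <br /> become a space.
    text = re.sub(r"<br\s*/?>", " ", text, flags=re.IGNORECASE)
    # Remove the inline tags themselves but keep their content.
    text = TAG_RE.sub("", text)
    # [[Target|label]] -> label, [[Target]] -> Target.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # [https://example.org label] -> label, [https://example.org] -> URL.
    text = re.sub(r"\[(https?://[^\]\s]+)\s+([^\]]*)\]", r"\2", text)
    text = re.sub(r"\[(https?://[^\]\s]+)\]", r"\1", text)
    # Bold/italic quote markup: '', ''' and '''''.
    text = re.sub(r"'{2,5}", "", text)
    # Decode entities (&nbsp; becomes U+00A0, kept on purpose for languages
    # that need it), then collapse ordinary spaces, tabs and newlines.
    text = html.unescape(text)
    return re.sub(r"[ \t\r\n]+", " ", text).strip()


print(description_to_plain_text(
    "A '''road''' for <code>motor_vehicle</code>s, see [[Key:highway|highway]]."
))  # A road for motor_vehicles, see highway.
```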

has positional parameter

"In general, wiki templates can have positional parameters and named parameters. The description templates only use named parameters. When you see this error, it usually means that the taginfo parser got confused. Try to clean up the template parameters."

Here also the analysis is almost always broken: you only detect pipe characters within wikilinks present in descriptions, or braces used when calling a formatting template (e.g. links to Wikipedia or Wiktionary).
For description fields, keep a large freedom of markup: the description has never been meant to be only one-line plain text, even if it is intended to be a short summary.
In other words: fix your wiki code parser, don't convince random users to change the wiki and break many contents. — Verdy_p (talk) 05:31, 26 November 2016 (UTC)
Yes, this error is often a result of the description or some other field containing some wiki syntax. Unfortunately this is difficult to detect correctly, so this error message is difficult to interpret. Joto (talk) 10:41, 27 November 2016 (UTC)
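One way to see how this error can arise: if the template body is split on every "|", the pipes inside links such as [[Key:highway|highway]] or templates such as {{wikipedia|en|Road}} produce fragments with no name= prefix, which look like positional parameters. Below is a minimal Python sketch contrasting that naive split with a depth-aware one. It assumes a single shared nesting counter for [[ ]] and {{ }} and ignores triple-brace parameters; all function names are hypothetical and this is not taginfo's parser.

```python
import re

# Hypothetical sketch (not taginfo's parser). A parameter counts as
# "named" if it starts with name=, where the name is letters, digits,
# "_", "-" or spaces, e.g. key=, lang=, osmcarto-rendering=.
NAMED = re.compile(r"\s*[\w -]+\s*=")


def naive_split(body: str) -> list:
    # Splits on every pipe, including those inside links and templates.
    return body.split("|")


def split_template_params(body: str) -> list:
    # Split on "|" only at nesting depth 0. [[...]] and {{...}} share one
    # depth counter, which is a simplification; {{{...}}} is not handled.
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def positional_params(params: list) -> list:
    # params[0] is the template name, so it is skipped.
    return [p for p in params[1:] if not NAMED.match(p)]


# Body of a {{KeyDescription ...}} call, without the outer braces.
BODY = (
    "KeyDescription\n"
    "|key=highway\n"
    "|description=Roads, see [[Key:highway|highway]] and {{wikipedia|en|Road}}\n"
    "|lang=en"
)
print(positional_params(naive_split(BODY)))            # 3 bogus fragments
print(positional_params(split_template_params(BODY)))  # []
```

With the naive split, the example reports three spurious positional parameters; with the depth-aware split it reports none, which matches the observation that wiki syntax inside the description is what confuses the check.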

invalid lang parameter

"The lang parameter should have the format xx (for example de for the German language) or xx_XX (for example pt_BR for Brazilian Portuguese)."

Wrong. The format should use hyphens (the OSM and BCP47 standard). Some software accepts underscores, but only for compatibility with legacy Java locale codes. So "zh-Hans" is the correct and standard form, just like "fr-CA"! You don't need to force the old broken Java locale codes (still used in its old ResourceLoader) on everyone: even Java now supports the BCP47 standard! Note that on the OSM wiki, all locale codes are BCP 47-conforming: only "DE, FR, ES, IT, JA, NL, RU" use uppercase letters (for legacy reasons, in these 7 wiki namespaces); other codes start with a single uppercase letter, all other letters being lowercase, including "De-ch", which is not a wiki namespace even though it is still German. — Verdy_p (talk) 05:02, 26 November 2016 (UTC)
You are right, we should use BCP47 here. I have fixed the description and will fix the taginfo code. Joto (talk) 10:39, 27 November 2016 (UTC)
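For illustration, a BCP 47-style check for the lang parameter might look like the minimal Python sketch below, which also turns legacy Java-style values such as pt_BR into pt-BR. It covers only a simplified subset (language, optional script, optional region) and, since BCP 47 tags are case-insensitive, normalizes to the conventional casing (lowercase language, title-case script, uppercase region). `normalize_lang_param` is a hypothetical name; this is not the actual taginfo fix.

```python
import re
from typing import Optional

# Hypothetical sketch, not the actual taginfo fix. Simplified BCP 47 subset:
# language (2-3 letters), optional script (4 letters), optional region
# (2 letters or 3 digits). Extended language subtags (zh-yue), variants,
# extensions and private-use subtags are outside this subset.
LANG_PARAM = re.compile(r"[a-z]{2,3}(?:-[A-Z][a-z]{3})?(?:-(?:[A-Z]{2}|[0-9]{3}))?")


def normalize_lang_param(value: str) -> Optional[str]:
    """Return the value in conventional case with hyphens, or None if it is
    outside the subset. Legacy Java-style "pt_BR" becomes "pt-BR"."""
    subtags = value.strip().replace("_", "-").split("-")
    out = [subtags[0].lower()]
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            out.append(sub.title())   # script, e.g. Hans
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())   # region, e.g. BR
        else:
            out.append(sub)           # left for the pattern to accept or reject
    tag = "-".join(out)
    return tag if LANG_PARAM.fullmatch(tag) else None


for v in ["pt_BR", "zh-hans", "fr-CA", "es-419", "zh-yue"]:
    print(v, "->", normalize_lang_param(v))
```

Rejecting extended language subtags such as zh-yue is a deliberate simplification; a production check would more likely rely on an existing BCP 47 library than on a hand-written pattern.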

wrong lang format

"The language in the wiki page name should be of the format xx (for instance de for the German language), or xx_XX (for instance pt_BR for Brazilian Portuguese). Capitalization doesn't matter."

Wrong! The language codes can already have 6 forms on the wiki: ll (e.g. "FR" or "Ca"), lll (e.g. "Vec"), ll-cc (e.g. "De-ch", "Ro-md" or "Pt-br"; not recommended and in fact deprecated), lll-cc (e.g. "Tzm-ma"; also not recommended), ll-ssss (e.g. "Zh-hans"), or lll-ssss (e.g. "Shi-latn").
More details in Template:Langcode that parses language codes currently admitted in wiki page names.
BCP47 (and OSM data as well) allows longer codes, but they are still not used for naming translated wiki pages; however, all valid BCP47 standard codes are accepted in various language parameter values, for use in pages that are partially multilingual in their listed examples or citations.
However, legacy non-standard language codes still used by Wikipedia are not accepted in OSM data or on the wiki (such as "roa-rup"; or "nrm", which is completely wrong and conflicting in Wikipedia and Wikidata, where it should be "nrf"; or "en-simple" and "de-formal", which are also invalid; likewise "sr-ec" and "sr-el", still used in Wikipedia, are syntactically conforming but completely wrong semantically, as they should be "sr-cyrl" and "sr-latn"). Note that "zh-yue" is both conforming and valid in BCP47, but deprecated and should be replaced by the preferred value "yue"; and "zh-classical" is both invalid and non-conforming and MUST be replaced by "lzh".
Wikipedia also supports a single "zh" language code for naming its wiki (merging "zh-hans" and "zh-hant" into a single Wikipedia edition), but only because it locally supports an automatic Hans/Hant transliterator, rarely supported elsewhere in applications and not supported on the OSM wiki; so "zh-hans" and "zh-hant" are distinguished on the OSM wiki and in OSM data. — Verdy_p (talk) 05:19, 26 November 2016 (UTC)
Looking at my code this was already checking the hyphen and not underscore. I have corrected the description. Joto (talk) 10:47, 27 November 2016 (UTC)
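The legacy Wikipedia codes discussed above could be handled with a small lookup table. This minimal Python sketch encodes only the mappings stated in that comment; codes the comment calls invalid without naming a replacement map to None. `osm_language_code` and the table are hypothetical, not an official registry and not taginfo code.

```python
from typing import Dict, Optional

# Legacy Wikipedia codes listed in the "wrong lang format" discussion.
# None means the comment calls the code invalid without naming a
# replacement; this mirrors that comment, not an official registry.
LEGACY_WIKIPEDIA_CODES: Dict[str, Optional[str]] = {
    "nrm": "nrf",
    "sr-ec": "sr-cyrl",
    "sr-el": "sr-latn",
    "zh-yue": "yue",        # valid BCP 47 but deprecated
    "zh-classical": "lzh",  # invalid and non-conforming
    "roa-rup": None,
    "en-simple": None,
    "de-formal": None,
}


def osm_language_code(wikipedia_code: str) -> Optional[str]:
    """Map a Wikipedia edition code to the code expected on the OSM side.

    Codes not in the table are returned lower-cased and otherwise unchanged;
    None means the table has no acceptable equivalent."""
    code = wikipedia_code.strip().lower()
    return LEGACY_WIKIPEDIA_CODES.get(code, code)


print(osm_language_code("zh-classical"))  # lzh
print(osm_language_code("en-simple"))     # None
```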

Tracking the count of errors

Over the past few days, I've tried to address a bunch of these, and I've got the count down to just over 500 as of today. I thought it'd be interesting to keep a running total here, so I'll do so below. JesseFW (talk) 17:08, 14 May 2023 (UTC)

2023-05-14

137	wrong lang format
128	has positional parameter
67	slash in key
65	parsing failed
36	slash in value
32	description parameter should only contain plain text
31	invalid lang parameter
14	value in key page
9	no value for tag page
6	invalid osmcarto-rendering parameter
1	lang is en
----
112	en
526 total

2023-05-15

Same.

2023-05-16

137	wrong lang format
128	has positional parameter
67	slash in key
59	parsing failed
36	slash in value
34	description parameter should only contain plain text
31	invalid lang parameter
14	value in key page
9	no value for tag page
6	invalid osmcarto-rendering parameter
1	invalid image parameter
1	lang is en
----
115	en
523 total

2023-05-17

137	wrong lang format
124	has positional parameter
67	slash in key
42	parsing failed
37	slash in value
34	description parameter should only contain plain text
29	invalid lang parameter
14	value in key page
9	no value for tag page
6	invalid osmcarto-rendering parameter
1	invalid image parameter
1	lang is en
----
113	en
501 total

2023-05-18

137	wrong lang format
124	has positional parameter
67	slash in key
39	parsing failed
37	slash in value
34	description parameter should only contain plain text
29	invalid lang parameter
14	value in key page
9	no value for tag page
6	invalid osmcarto-rendering parameter
1	lang is en
----
112	en
497 total

2023-05-19

137	wrong lang format
124	has positional parameter
67	slash in key
40	parsing failed
37	slash in value
34	description parameter should only contain plain text
29	invalid lang parameter
14	value in key page
9	no value for tag page
6	invalid osmcarto-rendering parameter
1	lang is en
----
112	en
498 total

2023-05-20

137	wrong lang format
124	has positional parameter
67	slash in key
37	slash in value
36	parsing failed
34	description parameter should only contain plain text
29	invalid lang parameter
14	value in key page
9	no value for tag page
6	invalid osmcarto-rendering parameter
1	lang is en
----
110	en
494 total

2023-05-21

137	wrong lang format
122	has positional parameter
67	slash in key
37	slash in value
34	description parameter should only contain plain text
32	parsing failed
29	invalid lang parameter
14	value in key page
9	no value for tag page
3	invalid osmcarto-rendering parameter
1	lang is en
----
103	en
485 total

2023-05-22

137	wrong lang format
122	has positional parameter
67	slash in key
37	slash in value
34	description parameter should only contain plain text
33	parsing failed
29	invalid lang parameter
14	value in key page
9	no value for tag page
1	lang is en
----
105	en
483 total

2023-05-23

137	wrong lang format
106	has positional parameter
67	slash in key
37	slash in value
34	description parameter should only contain plain text
33	parsing failed
29	invalid lang parameter
14	value in key page
9	invalid osmcarto-rendering parameter
9	no value for tag page
1	lang is en
----
91	en
476 total

Missed a couple of days...

2023-05-26

137	wrong lang format
96	has positional parameter
67	slash in key
37	slash in value
33	parsing failed
31	description parameter should only contain plain text
29	invalid lang parameter
14	value in key page
9	invalid osmcarto-rendering parameter
9	no value for tag page
1	lang is en
----
91	en
463 total

Excluding the unsolvable ones, that just leaves the following:

96	has positional parameter
33	parsing failed
31	description parameter should only contain plain text
29	invalid lang parameter
----
189 total
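Each day's total is the sum of the per-problem counts above the "----" line (the per-language count below it, such as "91 en", is not included), so the totals and the "excluding the unsolvable ones" subtotals can be checked mechanically. A minimal Python sketch, assuming the tab-separated format used on this page; `summarize` is a hypothetical helper, and the SOLVABLE set simply restates which problems this thread keeps after excluding the unsolvable ones, not anything taginfo reports.

```python
# Sketch, assuming the report format used on this page: "count<TAB>problem"
# lines, a "----" separator, then per-language counts (such as "91<TAB>en")
# that are not included in the total. SOLVABLE lists the problems kept in
# the "excluding the unsolvable ones" tallies; that split is this thread's
# judgement, not something taginfo reports.
SOLVABLE = {
    "has positional parameter",
    "parsing failed",
    "description parameter should only contain plain text",
    "invalid lang parameter",
}


def summarize(report: str) -> tuple:
    """Return (total, solvable) for one day's report."""
    total = solvable = 0
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("----"):
            break  # the per-language lines below the separator are not summed
        if not line:
            continue
        count, _, problem = line.partition("\t")
        total += int(count)
        if problem.strip() in SOLVABLE:
            solvable += int(count)
    return total, solvable


REPORT_2023_05_26 = """\
137\twrong lang format
96\thas positional parameter
67\tslash in key
37\tslash in value
33\tparsing failed
31\tdescription parameter should only contain plain text
29\tinvalid lang parameter
14\tvalue in key page
9\tinvalid osmcarto-rendering parameter
9\tno value for tag page
1\tlang is en
----
91\ten
"""
print(summarize(REPORT_2023_05_26))  # (463, 189)
```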

2023-05-27

137	wrong lang format
96	has positional parameter
67	slash in key
37	slash in value
33	parsing failed
31	description parameter should only contain plain text
15	invalid lang parameter
14	value in key page
9	invalid osmcarto-rendering parameter
9	no value for tag page
1	lang is en
----
90	en
449 total

Excluding the unsolvable ones, that just leaves the following:

96	has positional parameter
33	parsing failed
31	description parameter should only contain plain text
15	invalid lang parameter
----
175 total

2023-05-28

137	wrong lang format
88	has positional parameter
67	slash in key
37	slash in value
32	parsing failed
31	description parameter should only contain plain text
15	invalid lang parameter
14	value in key page
9	invalid osmcarto-rendering parameter
9	no value for tag page
1	lang is en
----
89	en
440 total

Excluding the unsolvable ones, that just leaves the following:

88	has positional parameter
32	parsing failed
31	description parameter should only contain plain text
15	invalid lang parameter
----
166 total

2023-05-29

137	wrong lang format
88	has positional parameter
67	slash in key
37	slash in value
33	parsing failed
31	description parameter should only contain plain text
14	value in key page
9	invalid osmcarto-rendering parameter
9	no value for tag page
5	invalid lang parameter
1	lang is en
----
89	en
431 total

Excluding the unsolvable ones, that just leaves the following:

88	has positional parameter
33	parsing failed
31	description parameter should only contain plain text
5	invalid lang parameter
----
157 total

2023-06-02

137	wrong lang format
67	slash in key
64	has positional parameter
37	slash in value
33	parsing failed
31	description parameter should only contain plain text
14	value in key page
9	invalid osmcarto-rendering parameter
9	no value for tag page
4	invalid lang parameter
1	lang is en
----
89	en
406 total

Excluding the unsolvable ones, that just leaves the following:

64	has positional parameter
33	parsing failed
31	description parameter should only contain plain text
4	invalid lang parameter
----
132 total

2023-06-30

Haven't updated these in a while, but made a LOT of progress, including taginfo fixes and corrections on the wiki.

65	slash in key
37	slash in value
20	has positional parameter
12	description parameter should only contain plain text
10	wrong lang format
9	non-file osmcarto-rendering parameter
6	parsing failed
3	invalid lang parameter
1	lang is en
----
74	en
163 total

Excluding the unsolvable ones, that just leaves the following:

20	has positional parameter
12	description parameter should only contain plain text
6	parsing failed
3	invalid lang parameter
----
41 total

2023-07-27

This is pretty much at the minimum (until/unless we get the slash ones addressed). If you notice any counts higher than this, the extra issues can probably be fixed easily.

65	slash in key
38	slash in value
14	has positional parameter
12	description parameter should only contain plain text
10	wrong lang format
9	non-file osmcarto-rendering parameter
6	parsing failed
2	invalid lang parameter
1	lang is en
----
77	en
157 total

2023-08-04

65	slash in key
38	slash in value
12	description parameter should only contain plain text
10	has positional parameter
10	wrong lang format
9	non-file osmcarto-rendering parameter
6	parsing failed
1	invalid value for onRelation parameter
1	lang is en
----
74	en
152 total

2023-08-09

65	slash in key
38	slash in value
11	description parameter should only contain plain text
10	wrong lang format
9	non-file osmcarto-rendering parameter
7	has positional parameter
6	parsing failed
1	lang is en
----
74	en
147 total

2023-09-04

65	slash in key
39	slash in value
11	description parameter should only contain plain text
10	wrong lang format
9	non-file osmcarto-rendering parameter
7	has positional parameter
6	parsing failed
1	lang is en
----
74	en
148 total

Sadly, we've got one more "slash in value" case (unfixable, because taginfo doesn't distinguish between redirects and non-redirects).

Solving slash in key / slash in value issues?

How should "slash in key" and "slash in value" issues be addressed? If taginfo ignored redirects, that would fix it, but it doesn't, which leaves only deleting the pages, and that seems excessive and unhelpful. @Joto:, any thoughts? JesseFW (talk) 20:53, 14 May 2023 (UTC)

The "value in key page" and "no value for tag page" problems are similar: it would be useful if taginfo only counted non-redirects, and until it does, these are pretty much unsolvable by non-admins. JesseFW (talk) 20:00, 26 May 2023 (UTC)

Added to the issue tracker, here: https://github.com/taginfo/taginfo/issues/416 -- JesseFW (talk) 22:38, 17 June 2023 (UTC)
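The change requested in issue 416 amounts to skipping redirect pages before the slash checks run. The minimal Python sketch below illustrates that idea under stated assumptions: the `WikiPage` record, its `is_redirect` field and `slash_problems` are hypothetical, not taginfo's data model or code, and titles are assumed to look like "Key:…" or "Tag:key=value" with an optional language prefix.

```python
import re
from dataclasses import dataclass


# Hypothetical sketch of the change requested in issue 416: skip redirects
# before reporting "slash in key" / "slash in value". WikiPage and its
# fields are assumptions for illustration, not taginfo's data model.
@dataclass
class WikiPage:
    title: str          # e.g. "Key:foo/bar" or "DE:Tag:highway=a/b"
    is_redirect: bool


# Optional language prefix, then the Key: or Tag: namespace.
TITLE_RE = re.compile(r"(?:[^:]+:)?(Key|Tag):(.+)")


def slash_problems(pages):
    problems = []
    for page in pages:
        if page.is_redirect:
            continue  # redirects have no description to fix, so never flag them
        m = TITLE_RE.fullmatch(page.title)
        if not m:
            continue
        namespace, name = m.groups()
        key, _, value = name.partition("=") if namespace == "Tag" else (name, "", "")
        if "/" in key:
            problems.append((page.title, "slash in key"))
        elif "/" in value:
            problems.append((page.title, "slash in value"))
    return problems


pages = [
    WikiPage("Key:foo/bar", True),         # redirect: ignored
    WikiPage("Key:foo/baz", False),
    WikiPage("DE:Tag:highway=a/b", False),
    WikiPage("Key:addr:street", False),    # colons are fine, no slash
]
print(slash_problems(pages))
```

The same redirect filter would apply to the "value in key page" and "no value for tag page" checks.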

"wrong lang format" -- bug in taginfo

Most of the "wrong lang format" issues are due to a bug in taginfo. It assumes all language codes have only two letters, while the actual standard also allows some 3-letter codes. (Described here among other places.) This is the source code line that needs to be updated. I can make a PR if desired. JesseFW (talk) 02:16, 15 May 2023 (UTC)

Added to issue tracker: https://github.com/taginfo/taginfo/issues/417 -- JesseFW (talk) 22:40, 17 June 2023 (UTC)
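To show the difference the fix would make, here is a Python sketch comparing a two-letter-only pattern (a hypothetical stand-in for the bug, not the actual taginfo source line, which is not quoted here) with one that accepts the six page-name forms listed in the 2016 "wrong lang format" discussion, using that discussion's own examples. Both patterns and `valid_prefix` are hypothetical.

```python
import re

# The six page-name forms listed in the "wrong lang format" discussion
# (ll, lll, ll-cc, lll-cc, ll-ssss, lll-ssss), compared case-insensitively
# because wiki prefixes look like "FR", "Ca", "Pt-br" or "Zh-hans".
PAGE_LANG = re.compile(r"[a-z]{2,3}(?:-(?:[a-z]{2}|[a-z]{4}))?", re.IGNORECASE)

# Hypothetical two-letter-only check, an approximation of the bug described
# above (not the actual taginfo source line).
TWO_LETTER_ONLY = re.compile(r"[a-z]{2}(?:-[a-z]{2})?", re.IGNORECASE)


def valid_prefix(prefix: str, pattern=PAGE_LANG) -> bool:
    return pattern.fullmatch(prefix) is not None


EXAMPLES = ["FR", "Ca", "Vec", "De-ch", "Ro-md", "Pt-br", "Tzm-ma", "Zh-hans", "Shi-latn"]
print([p for p in EXAMPLES if not valid_prefix(p, TWO_LETTER_ONLY)])
# ['Vec', 'Tzm-ma', 'Zh-hans', 'Shi-latn']
print(all(valid_prefix(p) for p in EXAMPLES))  # True
```

One design caveat: a 2-3 letter pattern also matches the namespace names "Key" and "Tag", so a check like this should only be applied to the part of a title before "Key:", "Tag:" or "Relation:", not blindly to the first colon-separated segment of every title.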

invalid image/osmcarto-rendering parameter -- needs to support non-images

As has been documented on the wiki template page since 2016, the osmcarto-rendering and osmcarto-rendering-size parameters (and the type-specific versions) can include non-image links to more detailed descriptions. taginfo should be modified to accept this, and not warn about it. @Joto: I'll see about making a PR if desired. JesseFW (talk) 23:39, 22 May 2023 (UTC)

Added to issue tracker: https://github.com/taginfo/taginfo/issues/418 -- JesseFW (talk) 22:41, 17 June 2023 (UTC)
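As a sketch of what "accept this, and not warn about it" could mean, the Python function below classifies an osmcarto-rendering value as an image, a link, or something else, and only the last case would still produce a warning. The rules are assumptions for illustration, not taken from taginfo or the template documentation: an image value is taken to be a file name ending in a common image extension (optionally prefixed with "File:" or "Image:"), and a non-image value a [[wiki link]] or an [https://… external link]. `classify_rendering` is a hypothetical name.

```python
import re

# Hypothetical classifier, not taginfo code. Assumptions: an image value is
# a file name ending in a common image extension, optionally prefixed with
# "File:" or "Image:"; a non-image value is a [[wiki link]] or an
# [https://... external link]. Neither rule is taken from the template docs.
IMAGE_EXT = re.compile(r"\.(?:png|jpe?g|gif|svg)$", re.IGNORECASE)
LINK = re.compile(r"\[\[[^\]]+\]\]|\[https?://[^\]\s]+[^\]]*\]")


def classify_rendering(value: str) -> str:
    v = value.strip()
    if not v:
        return "empty"
    name = re.sub(r"^(?:File|Image):", "", v, flags=re.IGNORECASE)
    if IMAGE_EXT.search(name):
        return "image"
    if LINK.fullmatch(v):
        return "link"      # documented non-image use: accept without warning
    return "unknown"       # only this case would still deserve a warning


print(classify_rendering("[[Key:highway#Rendering|see rendering]]"))  # link
```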