Talk:Namespace

From OpenStreetMap Wiki

The Problem

The current tagging methods produce data which does not contain very obvious contextual information since the object type and object attributes are mixed together. For example, consider the following two ways:

  • natural=cliff, rock=limestone, height=25
  • climbing=route, rock=limestone, height=25

Are both these objects of the same type (rock=limestone), or is one describing a cliff and one describing a climbing route? Without prior knowledge of the tags used, it is impossible to tell which tags are describing the object type, and which are merely describing attributes of the object.

Additionally, a tag does not necessarily have a single globally defined meaning, which means you must know the context in order to understand it. This is especially important when doing things like looking up the meaning of tags in the wiki.

This causes an even bigger problem for database queries, such as those made through the Overpass API: in some cases, the only way to query objects with the desired properties is to include a very complex context definition in the query, such as geometry conditions (say, "nodes only") and key list conditions. This noticeably degrades query performance.

Possible Solutions

Prefixes

One option is to prefix tags with some context information (so called "namespacing"). For example, you could have:

  • climbing=crag, climbing:crag:name=Great Tor
  • climbing=route, climbing:rock=limestone, climbing:height=25
  • piste=lift, piste:lift:occupancy=5

This means that each tag will have a single defined meaning rather than being dependent on the context in which it is used.

Pros

  • The tag name contains all the context data you need, making searching for a specific tag easier.
  • Software for viewing/editing the data can "fold away" groups of tags which the user has no interest in - e.g. for a building, the user probably cares about tags such as building:name= but not the architectural information in building:architecture:*=.

Cons

  • More typing (although auto-completion on the editors mitigates this to some extent).
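
A minimal Python sketch of the "fold away" idea from the Pros list above, assuming tags are held in a plain dict and that the first colon-separated component of a key is its namespace. The function name fold_by_namespace and the building:architecture:style key are made up for illustration; this is not how any particular editor implements it.

```python
from collections import defaultdict

def fold_by_namespace(tags):
    """Group tags by the first colon-separated component of their key."""
    groups = defaultdict(dict)
    for key, value in tags.items():
        namespace, sep, rest = key.partition(":")
        if sep:
            # Namespaced key: file the remainder under its namespace.
            groups[namespace][rest] = value
        else:
            # Plain key: keep it in the top-level ("") group.
            groups[""][key] = value
    return dict(groups)

tags = {
    "piste": "lift",
    "piste:lift:occupancy": "5",
    "building:architecture:style": "gothic",  # hypothetical key
}
folded = fold_by_namespace(tags)
# A viewer could now show folded[""] and collapse every other group.
```

The grouping only works because the prefix is a string convention that the code chooses to interpret; nothing in the data model itself marks these keys as belonging together.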

A type= Tag

Another option is to define a tag which always describes the object type for every object. e.g.:

  • type=climbing:crag, name=Great Tor
  • type=climbing:route, rock=limestone, height=25
  • type=piste:lift, occupancy=5

The value of the "type" tag would identify the schema of all tags on the object. As in the example above, some kind of namespacing within the value of the type tag may be beneficial, since it makes the tag more descriptive and groups related data together.
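
As a rough illustration of "the type tag identifies the schema", the following Python sketch looks up the set of expected keys from the type value and reports anything outside it. The SCHEMAS table and the function name unknown_keys are assumptions made up for this example; no such registry exists in OSM.

```python
# Hypothetical schema registry: type value -> keys that schema defines.
SCHEMAS = {
    "climbing:crag": {"name"},
    "climbing:route": {"rock", "height"},
    "piste:lift": {"occupancy"},
}

def unknown_keys(tags):
    """Return the keys that the schema named by the 'type' tag does not define."""
    schema = SCHEMAS.get(tags.get("type"), set())
    return sorted(k for k in tags if k != "type" and k not in schema)

route = {"type": "climbing:route", "rock": "limestone", "height": "25"}
lift = {"type": "piste:lift", "occupancy": "5", "rock": "limestone"}
```

Here unknown_keys(route) is empty, while unknown_keys(lift) flags "rock", because the meaning of every other key is resolved through the single type value.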

Pros

  • Tag names are short
  • Relations already tend to use a "type" tag, so this fits in with those.

Cons

  • Similar to the old class system used in OSM's infancy - this was eventually replaced with the current system, although I am not clear on the reasons why (maybe someone could write an explanation?)
  • The term "type" in names or suffixes doesn't add any new information. It is more useful to adopt more precise terms, to avoid confusion over the possible values.

A Hybrid Solution

A third option is to combine the above ideas: have a "type" tag and keep the important attributes un-namespaced. Less important attributes can be grouped into namespaces to allow the viewers/editors to fold them away. For example, information about a building's architecture can be put in tags prefixed with "architecture:".

Pros

  • Common tag names are short whilst allowing less important data to be grouped together and "folded away" by viewers/editors.

Technically correct solution

A namespace is not just a prefix - it's a tool of abstraction. It pays off most in new tagging schemes, when we are proposing not just a single tag but a system of tags. In this case, we can use a namespace to isolate keys within a single scheme (or several specific schemes).

For example, there is the usage=* tag, which has completely different meanings in several schemes (see Taginfo for it). There is no sense in querying objects by this tag alone, because in terms of abstraction it describes several different classes. To avoid situations like that, we can still introduce this tag in a new scheme, but we have to use a namespace for it, so it will look like newscheme:usage=*

At the same time, it's important to avoid excessive use of namespaces when a new key describes the same kind of property as an existing key.

For example, it would be completely wrong to propose "newscheme:operator=*" - the operator=* key has a fairly universal meaning. Therefore, it must not be isolated within a new scheme.

Comments

I currently consider the hybrid option to be the best idea, but let the discussion commence :) -- Steve Hill 19:45, 16 May 2008 (UTC)

I like the namespacing idea, but the type-tag idea has the problem that you would often have to create the same object several times if it belongs to more than one "type", and to avoid a lot of duplicate entries you would then need to regroup these objects, e.g. with relations. The hybrid concept would IMHO lead to endless discussions about which tags are "important" and which are better kept in namespaces. I'd go for prefixes (we already do this) -- Dieterdreist 23:25, 13 January 2011 (UTC)
I do like the prefix method. I found this page after publishing this mail:

Since I updated the Dutch Feature Page I analyzed the tagging system and found a typical evolution in the way OSM tags map objects.

1) <Key>=<Value>: Simple tagging system. Every new tag has to be created. Every tag and every value has to be documented. You cannot be creative in making new combinations.

2) <Key>=<Value>, <Key>:<Subkey>=<Value>: Advanced tagging system. Besides the old-fashioned key=value tags, different subkeys are approved. Those subkeys tell something about the value of the main tag. In our feature list a lot of subkeys are already defined, mostly without anyone realizing they were in fact subkeys.

Examples:

  • Main keys: highway, waterway, historic, traffic_calming, shop
    • highway=unclassified
    • cycleway=lane
  • Subkeys: maxspeed, right, left, surface, width
    • highway:surface=asphalt
    • cycleway:surface=paving_stones
    • cycleway:right=fanced

In advanced tagging there is a specific difference between main and sub tags, which can therefore solve problems like:

  • Telling something about the right and left lane of the road.
  • Telling something about night and day time access rules.

These are just two examples.

By introducing "NameSpacing" / "Advanced Tagging" more different elements of the object are taggable. --ZMWandelaar 06:51, 14 January 2011 (UTC)

noun order is a mess

Okay, so what is the order? <key>:<subkey>, you say, right?

How do you tag a seasonal barrier, like a huge bush?

Is the seasonality the subkey of the barrier? Or is the barrier one subcase of seasonality?

Oh, I see you have an answer. So does it match abandoned:shop=*, source:maxspeed=* or access:bicycle=*? Enlighten me. ;-) --grin 12:04, 5 July 2014 (UTC)

Technically, a namespace is always a prefix, giving a context for the key. And it's not key and subkey, it's namespace and key. Say we are going to propose two imaginary tagging schemes with the same "type" key. These two "types" have completely different meanings and values. So, we should use scheme1:type=* and scheme2:type=* respectively. Namespace is a technical term from XML technology. See the namespace tutorial at W3Schools for details.--BushmanK (talk) 20:21, 14 December 2014 (UTC)

This is not about XML namespaces. Namespaces in OSM keys can be prefix or infix and not always easy to decide which one it should be. For a few established namespaces it has already been established by convention. RicoZ (talk) 12:33, 27 March 2015 (UTC)

Nomenclature/" infix is used only once in Tags."

Anyone recalls what this was supposed to say? RicoZ (talk) 21:25, 2 August 2015 (UTC)

Is it clearer now?--Jojo4u (talk) 19:41, 4 August 2015 (UTC)
Yes, that is clear. Not sure if it is true and if it makes sense to make the postfix/infix distinction. Isn't it so that every suffix could end up as an infix if there are 2 or more suffixes useful/required to describe a special situation? RicoZ (talk) 09:33, 5 August 2015 (UTC)

Over-namespacing and Prefix-fooling

I'd like to back up the section "Over-namespacing" and even add "Prefix-fooling".

IMO values in namespaces/prefixes/suffixes should be "considered harmful", especially those having values yes/no like "service:bicycle:retail=yes/no". See https://lists.openstreetmap.org/pipermail/tagging/2018-December/041650.html .

There are already many bad examples around - like the mentioned "service:bicycle" or recycling - and there are now many tag proposals under way which need to be stopped, if possible in a constructive way. -- Preceding unsigned comment added by Geonick (talk • contribs) 27 December 2018

I read your post and it has some good points but I am not seeing any that would convince me that the practice should be avoided at any cost. We have problems with tools that don't have much support for namespaces and we have problems with tools that have zero support for multi-values and the k/v data model is simply too restricted to have an elegant solution for everything. RicoZ (talk) 23:31, 27 December 2018 (UTC)
I have listed about six points and tried to explain that these are technical no-go's. This prefix-fooling is not "key and value" anymore, it's "value in the key"! These are not just "problems" with tools, but plain wrong data structuring in computer science. --Geonick (talk) 22:31, 4 January 2019 (UTC)
I think rather than a list of negative points it would help to have a list of proposed alternative solutions with detailed examples. RicoZ (talk) 19:44, 28 December 2018 (UTC)
It's always good to point to options - given enough time. But having said this, remind that these are in fact not negative points, these are no-go's. --Geonick (talk) 22:31, 4 January 2019 (UTC)
Just for your info : service:vehicle [[1]] entries introduced by the ID admins, documentation quite "hidden" Key:service:vehicle. First noticed May 2018 [[2]] Meanwhile quite a mess of different "phantasy" values due to lack of structure / documentation.
rtfm Rtfm (talk) 22:49, 4 January 2019 (UTC)
Thanks for the hints. So I had to contemplate again and refined my concerns about namespaces. See again the thread I mentioned above. See https://lists.openstreetmap.org/pipermail/tagging/2019-January/041884.html . --Geonick (talk) 23:29, 6 January 2019 (UTC)
Seems like much of it concerns the iD editor, bad coordination and bad documentation. Oh well.. some time ago one of these editors invented tags like name_1 .. _17 and alt_name_1 and so forth, and that was later fixed by using namespaces instead ;) Obviously the data model is very bad at making good solutions intuitive.
Again, the best way to move forward is to document good alternatives. RicoZ (talk) 21:04, 11 January 2019 (UTC)
Geonick, I believe that calling these downsides "no-gos" is too strong. It's not a black-and-white situation, as using semicolon-separated lists of values also has its share of downsides, for example:
  • There's no canonical representation: foo;bar;baz, foo;baz;bar, baz;foo;bar and so on all mean the same thing.
  • One cannot easily set "no" values. While I'm not a fan of tagging defaults, there isn't always a clear default – and then both "yes" and "no" ought to be available.
  • It's harder to create usage statistics. If I want to find out how many recycling facilities are tagged as accepting batteries, I can look on the taginfo page for recycling:batteries=yes. This wouldn't be possible with semicolon-separated values. (Yes, that can be solved with smarter tools, but so can many of the downsides you mention.)
When there's a potentially unbounded set of values such as with brands in that thread a few months ago, I'm in full agreement with you. But for a small, finite set of "flags", I do not believe that it's incorrect to model them as boolean attributes rather than a list of values. --Tordanik 16:17, 28 March 2019 (UTC)
It seems hard to explain the consequences of modelling a "set of values" as a "set of separate keys with boolean - mostly yes - values", which look like they belong together when they are namespaced - but only to a human. So, unless somebody proves me wrong that some commonplace operations on those are impossible (searching for a "value" substring in a namespaced key), unpredictable (regex) or orders of magnitude slower (regex), I'm calling this a show-stopper. This hurts, because we have a proven alternative in semicolon-separated values. Let me answer the 3 cons you mentioned above:
  • "There's no canonical representation: foo;bar;baz, foo;baz;bar, baz;foo;bar...". Since this is about unordered sets, there is in fact no mandated order (although editors and consumers can easily order these sets). Any software in the OSM ecosystem has predicates and operators to compare sets in a single function call that confirms that "foo;bar;baz" is equal to "foo;baz;bar". Now, how would one do this comparison in the yes-valued namespaced version (somekey:foo=yes, somekey:bar=yes, somekey:baz=yes)? One has to look up some preset file (which someone had to update beforehand) that tells us which yes-valued namespaced keys belong together, and then do the comparison three times, once per key. Or one tries a regex on "somekey:*" - which is definitely orders of magnitude slower than the semicolon-separated alternative. Not to forget the unnecessary risk of catching unwanted keys with the regex.
  • "One cannot easily set "no" values." I think we agree that mapping something nonexistent is a rare situation. In semicolon-separated values one can model this as the value "no_foo". And defaults belong in user interfaces, if anywhere: defaults in data models are bad practice.
  • "It's harder to create usage statistics.". To stick to the example of "foo;bar;baz": it's harder to get statistics for "recycling:paper=yes; recycling:batteries=yes; recycling:aluminium=yes", or even statistics for all 139 recycling-key occurrences. --Geonick (talk) 21:34, 28 March 2019 (UTC)
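
To make the two representations in this exchange concrete, here is a small Python sketch that reads each form into a set so they can be compared. The helper names split_values and namespaced_flags are made up for illustration, and the sketch says nothing about the relative performance claimed above; it only shows what each comparison involves.

```python
def split_values(value):
    """Read a semicolon-separated value as an unordered set."""
    return {part.strip() for part in value.split(";") if part.strip()}

def namespaced_flags(tags, prefix):
    """Collect the suffix of every '<prefix>:<suffix>=yes' key as a set."""
    lead = prefix + ":"
    return {
        key[len(lead):]
        for key, value in tags.items()
        if key.startswith(lead) and value == "yes"
    }

# Semicolon form: order does not matter once the value is split.
same = split_values("foo;bar;baz") == split_values("foo;baz;bar")

# Namespaced form: the prefix has to be known in advance.
tags = {"somekey:foo": "yes", "somekey:bar": "yes", "somekey:baz": "no", "other": "yes"}
flags = namespaced_flags(tags, "somekey")
```

Both forms end up as sets, but the namespaced form needs the caller to supply the prefix and scans every key on the object, whereas the semicolon form reads a single known key.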

;-lists vs namespacing and yes/no values

A vote was recently cast on the Power substation functions proposal against the over-namespacing the proposal would introduce.
To sum up, it proposes, among other points, to use tags like substation:transformation=no on any power=substation (facility) with no power=transformer (device) seen inside.
The objection raised was that it's better not to use yes/no values and to replace all these keys with a single substation=* key holding a ;-list of values depending on what is installed inside the perimeter.
This logic is not restricted to power infrastructure; you may do the same for running tracks inside stadiums, for instance. Some have them and some don't.

I didn't get how a ;-list solves the indeterminacy of missing features in OSM. A missing feature may not actually exist, or may exist and not be mapped.
How should a value missing from the list be interpreted? The corresponding feature may be unmapped (and should be mapped) or may not exist at all.
Explicit no values on surrounding perimeters may solve this issue like no other paradigm can.

Here is a summary of possibilities and use cases. Tags are given for surrounding perimeter, not inside features
Use case: Two different features exist in the perimeter
  • ;-list with no namespace: domain_key=feature_category_A;feature_category_B
  • Namespaces and boolean flags: domain_key:feature_category_A=yes + domain_key:feature_category_B=yes
  • Comments: Apparently no difference; namespaces seem useless and overkill.
Use case: Two different features exist in the perimeter and an expected third is not seen in place
  • ;-list with no namespace: domain_key=feature_category_A;feature_category_B
  • Namespaces and boolean flags: domain_key:feature_category_A=yes + domain_key:feature_category_B=yes + domain_key:feature_category_C=no
  • Comments: The ;-list is identical to the first situation and the third feature is implicitly missing. Does it exist or not?
Use case: A given perimeter contains no feature while 2 categories are expected
  • ;-list with no namespace: domain_key=- (empty)
  • Namespaces and boolean flags: domain_key:feature_category_A=no + domain_key:feature_category_B=no
  • Comments: Again, only yes/no values and namespaces are suitable for stating that expected features don't exist in place.

Not to mention that yes values may be the default and omitted if the corresponding features are common in the observed perimeters.
My point deals with 2 independent concepts, namespaces and yes/no values. Namespaces are used because there are several subcategories to deal with, and yes/no values are introduced to explicitly tell mappers that they shouldn't look for an expected feature in place.
There is no point in dealing with all OSM features on every perimeter, only the expected ones, obviously. The proposal from which the discussion started deals with around 10 subcategories. Fanfouer (talk) 20:49, 20 March 2019 (UTC)
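
The indeterminacy argument above can be sketched in Python as a difference in how many states each form can express. The function names flag_state and list_state are made up for illustration, and feature_category_C stands for a hypothetical third expected category.

```python
def flag_state(tags, prefix, category):
    """Three-state reading of a namespaced flag: 'yes', 'no' or 'unknown'."""
    value = tags.get(f"{prefix}:{category}")
    return value if value in ("yes", "no") else "unknown"

def list_state(tags, key, category):
    """Two-state reading of a plain ;-list: listed means 'yes', otherwise 'unknown'."""
    values = {v.strip() for v in tags.get(key, "").split(";")}
    return "yes" if category in values else "unknown"

flags = {
    "domain_key:feature_category_A": "yes",
    "domain_key:feature_category_C": "no",  # explicitly stated as absent
}
listed = {"domain_key": "feature_category_A;feature_category_B"}
```

With these inputs, flag_state can answer "no" for feature_category_C, while list_state can only answer "unknown" for it, which is the gap the comment describes. A ;-list would need an extra convention for negation to close that gap.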

Just some answers on this data modeling discussion:
  • First, from a conceptual point of view: starting to map what is NOT there opens the Pandora's box of tagging discussions. It starts an endless argument about what else could be mapped, what is expected by someone, and what should be tagged (according to their expectations) as NOT existing at an object.
  • Second, from a technical data-modeling point of view: putting values in keys, as in over-namespaced proposals (especially those with yes/no), is just a no-go in data modeling. Compare a single tag on an object like "substation_group=valueA;valueB;valueC" with the three tags needed in the over-namespaced version: "substation:pseudogroup:valueA=yes", "substation:pseudogroup:valueB=yes", "substation:pseudogroup:valueC=yes". Now let's try to filter objects whose tags contain valueA and valueC of substation_group. In the properly modeled version you can simply say "if 'substation_group' contains '[valueA,valueC]'". In the over-namespaced version one has to write "if 'substation:pseudogroup:valuea=yes' and if 'substation:pseudogroup:valuec=yes'" (note the smell: people tend to think it's substation:pseudogroup:valueA with upper case, which is a no-go for key names!). In addition, one has to look up beforehand whether these tags belong together. And even worse: you can't check whether any key from "substation:pseudogroup:*=*" exists without exhaustive coding and lookup. I wrote "pseudogroup" because in database management systems it's crazy to think it's possible to easily group by a substring of a key (=attribute) name. That's why the idea of over-namespacing is a no-go in data management and detrimental to OSM.
  • Third, from a user's perspective: JOSM shows these just below each other in an endless list. The iD editor shows both versions in a user-friendly way - but in the case of the over-namespaced version only for those keys where someone (i.e. the maintainer of iD) has added to the iD config file by hand(!) that substation:pseudogroup:valuea, substation:pseudogroup:valueb and substation:pseudogroup:valuec belong together!! (All values from the ";"-separated list naturally and automatically belong to the same key.)
  • Fourth: another consequence of over-namespaced keys (which can be avoided by a single key with semicolon-separated values): namespaced keys need to be configured by hand in the preset file by an editor maintainer (in iD as "yes/no-multicombo fields") in order to show up as a single array of check boxes, as opposed to semicolon-separated tags, which need no special preset entry. So, these "over-namespaced yes/no-multicombo field entries" lead to unnecessary "noise" in the preset file. In fact this is because such entries group values baked into namespaced keys - they don't group different keys (as presets are supposed to do). --Geonick (talk) 18:30, 22 March 2019 (UTC) (edited)
Citing from a recent conversation on OSM IRC: "Nearly every name spaced key/tag could be replaced with a simpler tag which would work as well. See discussion on "generator:place" on the forum. We could just use the location tag; name:etymology replace with etymology (cause it's not a name & etymology applies to names & so forth)". And "We have lots of structured values in OSM and there is no reason _not_ to use semi-colons. But their use should be documented in a half consistent way." --Geonick (talk) 20:41, 22 March 2019 (UTC)
You mix 3 issues into one and the result is horrible.
* namespace : yes indeed name:etymology may be shortened into etymology and it's better to change generator:place into the common location key (it's sad that you weren't there when several users proposed removing of unneeded namespace for hydrants... the same argument leads to contradictory results between the proposals). So with this vision, yes, probably a lot of substation: could be removed but if this had been done first, perhaps there would have been a large number of opponents with the argument of grouping subkeys together.
* =no : calling it "Pandora's box" is putting it very lightly, and the box is already open. See toilets=no. If you can expect something and it is absent, some contributors record that. This is valid for exceptional cases as well as for more frequent cases (e.g. sidewalk=no). How will you encode with a ";-list" that a restaurant doesn't have toilets, without any key=no?
* coding, preset, db lookup : it depends so much on the tool used that the same argument can be used in the opposite direction. try to search the key "foo" for the value "b" with taginfo (= return objects with foo=b or foo=othervalue;b or foo=b;othervalue but not foo=bar) and compare with foo:b=yes. I'm not saying that we should favour one or the other for this reason, I'm just saying that the same argument works both ways depending on the context. so let's ignore this variable argument and focus on the 2 previous points.
Marc marc (talk)
Marc marc, Ad namespace: "Grouping" subkeys together with sub-namespaces is a pointless attempt to "tag for the user" (similar to "tag for the renderer"), since sub-namespaces are just strings in an attribute name and do not group anything in a machine-readable way - unless you (i.e. the iD/JOSM maintainer etc.) tell the software so explicitly. And again, this grouping requirement arises not because namespaced keys are "real keys", but just because these yes/no tags in fact share the same "(sub-)key".
* Ad no: I'm referring to yes/no-namespaced tags. And yes, Pandora's box is open - and it's up to us to close it again. At least don't open it even more.
* Ad coding, preset, db lookup: I'd like to bring in some informatics basics here: 1. Searching for a substring of a key (where the substring is part of an attribute name) is a problem for _all_ computer languages in the OSM ecosystem (from Python, C, C++, PHP, .NET, Lua, to DBMS - you name it). 2. Searching key "foo" for the value "b" (as part of a semicolon-separated value, like the values of key "cuisine") is a well-known exercise in computer languages where one uses "predicates". In a DBMS (e.g. PostgreSQL) this is a one-liner: "SELECT osm_id FROM osm_point WHERE string_to_array((tags->'foo'),';') @> array['b']" which is even supported by indexes. --Geonick (talk) 22:28, 24 March 2019 (UTC) (edited)
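
For comparison with the SQL one-liner, a rough Python counterpart of the same containment predicate might look like this. It assumes each object is a dict with a "tags" dict, and the names contains_all and objects are made up for illustration.

```python
def contains_all(tags, key, required):
    """True if the ;-list stored under `key` contains every required value."""
    present = set(tags.get(key, "").split(";"))
    return set(required) <= present

# Hypothetical sample objects.
objects = [
    {"id": 1, "tags": {"foo": "b"}},
    {"id": 2, "tags": {"foo": "othervalue;b"}},
    {"id": 3, "tags": {"foo": "bar"}},
    {"id": 4, "tags": {}},
]
matching_ids = [o["id"] for o in objects if contains_all(o["tags"], "foo", ["b"])]
```

Note that foo=bar does not match, because the comparison is on whole list elements rather than on substrings of the value.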
Ad coding, preset, db lookup: Note to ourselves: we need to patch TagInfo so it can count semicolon'd entries after splitting. This should probably be shown on a new tab next to "Values" with a name to the effect of "Multivalues" or "Semicolon split values". Bkil (talk) 06:33, 25 March 2019 (UTC)
* Ad no: How can we be freed from indetermination expressed at the start of the discussion? e.g: groups=groupA;groupC, groupB is missing. Should I look for it or it doesn't exist on ground? No solution is provided so far. Fanfouer (talk) 23:47, 24 March 2019 (UTC)
Fanfouer: As already mentioned above, tagging what's not there is a "data modeling smell". Some, like [3], even say it's fundamentally wrong. So, provided you can give enough evidence that the non-existence of something is clear and understandable worldwide, like sidewalk=no (which 1. has no namespace(!) and 2. saves you from having to duplicate the geometry of a highway), then one can think about how to tag this. One could e.g. tag it key=valueA;valueB;novalueB;valueC, where of course valueB and novalueB must not both appear in the same semicolon-separated value (just as with the over-namespaced version, where such inconsistencies are even harder to find!). --Geonick (talk)
I'm not happy that you consider only the "fundamentally wrong" point of view without looking at the use cases I provided before and after. Explicitly tagging that there is no transformer to look for in a substation is useful for preventing QA tools from warning you (or volunteer mappers from looking) about power=transformer devices, expected in 99% of cases, being missing in a given facility (enough evidence: they won't be seen on aerial imagery, it is explicitly stated in public documentation, or it is verified directly on the ground). The same goes for sidewalk=no. Regarding key=valueA;valueB;novalueB;valueC, how do we guarantee that novalueB is the exact negation of valueB and not another independent value? Fanfouer (talk) 00:21, 25 March 2019 (UTC)
How to guarantee that novalueB is the exact negation of valueB? With exactly the same means as with the namespaced value and key: by documenting it in the wiki. And even better: the chances that people write nonevalueB are smaller than with the yes/no-namespaced version. And be aware that this discussion is about yes/no-namespaced tagging attempts - and sidewalks are neither namespaced nor yes/no only. BTW, I just found "authentication:none=no", which shows how absurd yes/no-over-namespaced tags can get. -- Geonick (talk) 20:43, 25 March 2019 (UTC)
I guess we could introduce a negation symbol that does not occur in values as a prefix (for example, minus (-), exclamation point (!) or tilde (~)). We could define certain tags on the given wiki page such that we mandate indicating the nonexistence of these as well. This should be introduced gradually, supporting both forms in consumers. Bkil (talk) 06:33, 25 March 2019 (UTC)
You mean something like key=valueA;-valueB;valueC? Sounds interesting, though I'm sceptical and have to think about this. --Geonick (talk) 20:43, 25 March 2019 (UTC)
I think we can come to a conclusion now: We have two proposals for how to handle the edge case of negated values. And searching for a substring of a yes/no-valued key (where the substring is part of an attribute name, like "valueb" in key "substation:pseudogroup:valueb") is a problem for all computer languages in the OSM ecosystem (while searching semicolon-separated values is part of modern languages and databases). So, there are _fundamental_ reasons to avoid namespaced keys, and there is _no_ reason not to use semicolons (if the key and its main values are documented - like every key should be). --Geonick (talk) 20:43, 25 March 2019 (UTC)
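
A possible reading of the negation-prefix idea discussed above (key=valueA;-valueB;valueC), sketched in Python. The minus sign as the marker is only one of the symbols floated in the thread and is not an established OSM convention; the function name parse_signed_list is made up for illustration.

```python
def parse_signed_list(value):
    """Split 'a;-b;c' into (present, absent) sets; a leading '-' marks absence."""
    present, absent = set(), set()
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty elements such as a trailing ';'
        if part.startswith("-"):
            absent.add(part[1:])
        else:
            present.add(part)
    # A value listed as both present and absent is a tagging error.
    conflicts = present & absent
    if conflicts:
        raise ValueError(f"values both present and absent: {sorted(conflicts)}")
    return present, absent

present, absent = parse_signed_list("valueA;-valueB;valueC")
```

Because the marker is attached to the value itself, the contradiction check (valueB together with -valueB) stays within one tag, which addresses the consistency worry raised earlier in the thread.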

List of namespaced tags with yes-values

The following is a list of yes/no-(over-)namespaced tags currently mentioned in this wiki and/or currently in use in OSM. The list was originally entitled "What about these?" and compiled by user Bkil.

Although this one had a reasonable purpose of describing performance, most values seem to be "yes":

Some correctly documented ones are misused quite often:

TagInfo reveals many more combinations for the above documented ones, but here exist undocumented ones too:

Bkil (talk) 11:02, 24 March 2019 (UTC)