User:Arne Johannessen/Evaluation of tunnel name tagging practise

In a discussion on the [tagging] list regarding the use of name tags on road tunnels, a recent message quotes some TagInfo numbers. The author seems to imply some particular conclusion, but none is explicitly mentioned.

In the context of the discussion thread, those particular numbers don't appear to be meaningful. This post describes an attempt to obtain numbers that are more relevant to the discussion.

Analysis shows that both of the tagging variants under discussion are very common, and that significant regional differences exist. Consequently, neither variant should be dismissed out of hand.

Context of the Discussion

Questions under Discussion

Questions being considered in the discussion are:

  • Is it typical for a road tunnel name to be in common use?
  • How should a commonly used road tunnel name be tagged in case the road is unnamed and there is no man_made=tunnel object?
  • How should a commonly used road tunnel name be tagged in case the road is named and there is no man_made=tunnel object?
  • Should common names be duplicated in name=* and tunnel:name=* in order to help avoid ambiguity?

The phrase "common name" (and variants of it) is used a lot during the discussion, and also in this text. It is meant to be understood in the context of the wiki page for name=*, which currently reads (emphasis mine):

Note that OSM follows the On the Ground Rule. Names recorded in name=* tag are ones that are locally used, […]

It should be the most prominent signposted name or the most common name actually used to refer to a given object, […]

It follows that according to OSM policy, a tunnel name that is actually signposted at the tunnel entrance is in "common use" by definition, and should normally be tagged using name=*. Where no tunnel name is signposted, local knowledge is required to determine whether or not a common name for the tunnel exists (which may or may not be the case).

Significance of TagInfo Usage Numbers

A recent message in the discussion on [tagging] mentioned the following TagInfo usage numbers:

highway=* – 178 000 000
highway:name=* – 188
tunnel:name=* – 12 815

In the context of the discussion thread, I don't believe these numbers are particularly meaningful.

At the very least, when arguing how frequently those *:name tags are used on tunnels, one would need to consider tunnel objects, not highway objects, if one even wanted to entertain the notion that these numbers were meaningful.

tunnel=yes – 940 000

Even then, 12 815 uses of tunnel:name=* are little more than 1 % of all tunnel objects, which certainly doesn't bear out a position that use of this tag is common.
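The share quoted above can be checked directly from the two TagInfo counts given in the text. This is only a quick arithmetic sketch; the variable names are illustrative.

```python
# TagInfo counts as quoted in the text above.
tunnel_name_uses = 12_815   # tunnel:name=*
tunnel_objects = 940_000    # tunnel=yes

# Fraction of tunnel=yes objects that also carry tunnel:name=*.
share = tunnel_name_uses / tunnel_objects
print(f"{share:.1%}")  # prints "1.4%"
```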

Furthermore, editor presets have a significant impact on usage numbers. At least one editor has a preset for road tunnels that includes tunnel:name=*, but not highway:name=*. This means that the disparity in usage numbers between these two tags is to be expected and can't easily be used to argue that using one of them is correct or incorrect.

(I don't want to use this post to judge such editor presets. While several instances of problematic presets are well known to have occurred in the past, whether or not this is another one of them is out of scope here.)

So, if there is a conclusion to be drawn from the TagInfo numbers at all, then I don't see at the moment what it is. There appears to be a need for more relevant numbers.

Methodology

Questions for Analysis

Rather than looking at just how often certain tags are used, it seems more relevant to consider how they are used. To that end, let's ask the following questions:

  • For how many road tunnels is the tunnel's name actually tagged in some way (as opposed to just tagging the road's name)?
  • How many such road tunnels have the tunnel's name placed into name=*, and how many have it in tunnel:name=*?
  • How many of the latter have no name=* at all?
  • How many of all road tunnels have no name=* at all?
  • For how many road tunnels is the value of name=* duplicated in tunnel:name=*?
  • How many road tunnels have the road's name tagged, and how many of these have the tunnel's name as well?

To answer these questions, we first have to make a number of assumptions.

Counting Criteria and Assumptions

It turns out that counting road tunnels is not easy. Obviously, we look for highway=* and tunnel=yes. But that also includes features such as underground parking. Excluding highway=service as well as paths etc. probably solves that issue in most cases. Furthermore, some tunnels are broken into several ways, e. g. because of intersections or changing speed limits. While these cases probably could be accounted for by analysing the topology, they are probably sufficiently rare that they don't affect the overall numbers too much.

To simplify things, tunnel relations are not considered. (There are fewer than 600 of those globally.)

This leaves us with the following criteria for a road tunnel:

  • is a way
  • highway=* matches residential|unclassified|tertiary|secondary|primary|trunk|motorway
  • tunnel=yes
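The three criteria above can be sketched as a simple predicate. This is a Python illustration, not the Perl script used for the evaluation. The function name and the representation of an element as a type string plus a tag dictionary are assumptions. The sketch also assumes that highway=* must match one of the listed values exactly (so motorway_link, for example, is excluded); the text does not say whether the original match was anchored.

```python
# Highway classes listed in the criteria above.
ROAD_CLASSES = {
    "residential", "unclassified", "tertiary",
    "secondary", "primary", "trunk", "motorway",
}

def is_road_tunnel(element_type, tags):
    """Return True if an OSM element meets the road tunnel criteria.

    element_type: "node", "way" or "relation" (hypothetical representation)
    tags: dict mapping tag keys to values
    """
    return (
        element_type == "way"
        # Assumption: exact match on the highway value.
        and tags.get("highway") in ROAD_CLASSES
        and tags.get("tunnel") == "yes"
    )

print(is_road_tunnel("way", {"highway": "primary", "tunnel": "yes"}))  # True
print(is_road_tunnel("way", {"highway": "service", "tunnel": "yes"}))  # False
```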

That a tunnel has no name=* at all could mean that OSM is incomplete, or that it is part of a tunnel relation. It could also mean that there is in fact no common name for it at all. In theory, the following criteria should cover the latter scenario:

Identifying a name as a tunnel name, specifically, should be relatively easy, because many tunnel names actually include the term "tunnel", or some variant of it in the local language. In particular, testing with OSM data seems to confirm that the following regex matches almost every tunnel name in Scandinavian and German languages:

[Tt]unn?el|[Pp]ort(?:en)?\b|lokket\b|[Gg]alerie\b|Unterführung\b
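As a rough illustration, the pattern above can be applied as follows in Python. The pattern is copied verbatim from the text; the helper name is an assumption, and using an unanchored search (a match anywhere in the name) is likewise an assumption about how the pattern was meant to be applied.

```python
import re

# Pattern copied from the text above.
TUNNEL_NAME = re.compile(
    r"[Tt]unn?el|[Pp]ort(?:en)?\b|lokket\b|[Gg]alerie\b|Unterführung\b"
)

def looks_like_tunnel_name(name):
    """Return True if the name contains one of the tunnel generics."""
    return TUNNEL_NAME.search(name) is not None

for name in ("Neuer Elbtunnel", "Kirchenwaldtunnel", "Kirchenwald"):
    print(name, looks_like_tunnel_name(name))
```

Note that a name such as "Kirchenwald" is not recognised by this test, which is exactly the limitation discussed in the next section.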

Identifying Names as Road Names

Unfortunately, identifying a name as a road name is much harder, because of the large number of different generics (road, street, ave, lane, way etc.), some of which are quite rare.

A naïve approach might simply consider all names that can't be positively identified as tunnel names to be road names. However, some tunnel names in OSM actually leave out the phrase "tunnel" intentionally (e. g. Kirchenwald(tunnel)). (Many of these might be automatically detectable tagging errors, but it's not advisable to fix these without a new survey; see below.)

In the case of Germany, the principle of Grundwortanalogie suggests that "Straße" should be the most common generic for roads of some importance; based on the author's personal experience, this seems plausible. At the same time, given the cost of tunnelling, unimportant roads are probably less likely to be tunnelled.

Consequently, it might be fair to limit tunnel road names to "Straße" for German. For the Scandinavian languages, the same considerations result in "vej", "vei", "veg" and "väg". While this certainly will not catch all cases, it should give us a decent sample. Testing with OSM data confirms that this hypothesis is probably correct, with around 75 % of names in question successfully identified as road names.

Testing also shows that of the remaining generics, a few appear more often than others. When the following list of generics (including some spelling variants) is used, the success rate for Scandinavian and German increases to around 90 %, which seems good enough for our purposes:

  • veg, gate, strand, dal, bakke, allé, sti
  • Straße, Autobahn, Weg, Gasse, Graben, Allee, Ring, Umfahrung, Tal, Tangente, Bogen, Mühle
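The generics listed above could be turned into a matcher along the following lines. This is a sketch only: the text does not give the actual pattern or the spelling variants it covered, so matching each generic case-insensitively at the end of a word is an assumption, as are the helper name and the sample names. Inflected forms (for example definite forms in the Scandinavian languages) are not handled here and would have to be added to the list.

```python
import re

# Generics taken from the text ("vej", "vei", "veg", "väg" plus the two lists).
ROAD_GENERICS = [
    "vej", "vei", "veg", "väg", "gate", "strand", "dal", "bakke", "allé", "sti",
    "Straße", "Autobahn", "Weg", "Gasse", "Graben", "Allee", "Ring",
    "Umfahrung", "Tal", "Tangente", "Bogen", "Mühle",
]

# Assumption: a generic counts only if it ends a word, regardless of case.
ROAD_NAME = re.compile(
    "(?:" + "|".join(re.escape(g) for g in ROAD_GENERICS) + r")\b",
    re.IGNORECASE,
)

def looks_like_road_name(name):
    """Return True if some word in the name ends in a known road generic."""
    return ROAD_NAME.search(name) is not None

# Hypothetical sample names, for illustration only.
for name in ("Bergstraße", "Gamle Kongevei", "Kirchenwald"):
    print(name, looks_like_road_name(name))
```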

With this set of criteria, we are in a position to get some answers.

Results

The criteria defined above were used to evaluate two regions: One a large portion of southern Scandinavia (primarily including parts of NO+SE), the other an area centred on the northern Alps (primarily including parts of DE+AT+CH). Both of these areas are known to include a large number of both major and minor tunnels in various urban and rural settings.

The evaluation was performed using a short Perl script written for this purpose.
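That script is not reproduced here. Purely to illustrate the kind of tally involved, a Python sketch might look like the following. The counting rules are assumptions (for instance, that any tunnel:name=* value counts as a tunnel name, and that "both" means identical values in name=* and tunnel:name=*), the road pattern is shortened for brevity, and the sample data is hypothetical apart from tunnel:name=Kirchenwald. The figures such a sketch produces therefore need not match the tables below.

```python
import re
from collections import Counter

# Tunnel pattern from the text; road pattern shortened for illustration.
TUNNEL_RE = re.compile(r"[Tt]unn?el|[Pp]ort(?:en)?\b|lokket\b|[Gg]alerie\b|Unterführung\b")
ROAD_RE = re.compile(r"(?:vej|vei|veg|väg|straße)\b", re.IGNORECASE)

def tally(tunnels):
    """Count naming patterns over tag dicts, one dict per road tunnel way."""
    counts = Counter()
    for tags in tunnels:
        counts["total"] += 1
        name = tags.get("name")
        tunnel_name = tags.get("tunnel:name")
        name_is_tunnel = bool(name and TUNNEL_RE.search(name))
        name_is_road = bool(name and not name_is_tunnel and ROAD_RE.search(name))

        if name_is_tunnel or tunnel_name:
            counts["with_tunnel_name"] += 1
        if name_is_tunnel:
            counts["tunnel_name_in_name"] += 1
        if tunnel_name:
            counts["tunnel_name_in_tunnel_name"] += 1
        if name and name == tunnel_name:
            counts["tunnel_name_in_both"] += 1
        if name_is_road:
            counts["with_road_name"] += 1
            if tunnel_name:
                counts["road_and_tunnel_name"] += 1
        if name:
            counts["with_name"] += 1
        elif tunnel_name:
            counts["tunnel_name_only"] += 1
        if tags.get("noname") == "yes":
            counts["noname"] += 1
        if name and not name_is_tunnel and not name_is_road:
            counts["unmatched"] += 1
    return counts

# Hypothetical sample; only tunnel:name=Kirchenwald is taken from the text.
sample = [
    {"name": "Beispieltunnel"},
    {"tunnel:name": "Kirchenwald"},
    {"name": "Beispielstraße", "tunnel:name": "Beispieltunnel"},
    {},
    {"name": "Beispieltunnel", "tunnel:name": "Beispieltunnel"},
]
counts = tally(sample)
for key, value in sorted(counts.items()):
    print(f"{key}: {value}")
```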

These are the results for the southern Scandinavia area:

Total number of road tunnels:     3080
Road tunnels with tunnel name:    1130 (37 %)  /  1950 without (63 %)
   Tunnel name in name=*:          846 (75 %)
   Tunnel name in tunnel:name=*:   308 (27 %)
   Tunnel name in both:             15 (1 %)
Road tunnels with road name:      1142 (37 %)  /  1938 without (63 %)
   Both road and tunnel name:      190 (17 % of named roads)
Road tunnels with name=*:         2100 (68 %)  /  980 without (32 %)
   No name=*, but tunnel:name=*:    86 (8 % of named tunnels)
Road tunnels with noname=yes:        2
Road tunnels with name that couldn't be matched:  111 (4 %)

These are the results for the northern Alps area:

Total number of road tunnels:     3002
Road tunnels with tunnel name:     624 (21 %)  /  2378 without (79 %)
   Tunnel name in name=*:          265 (42 %)
   Tunnel name in tunnel:name=*:   486 (78 %)
   Tunnel name in both:             64 (10 %)
Road tunnels with road name:      1371 (46 %)  /  1631 without (54 %)
   Both road and tunnel name:       66 (5 % of named roads)
Road tunnels with name=*:         1825 (61 %)  /  1177 without (39 %)
   No name=*, but tunnel:name=*:   273 (44 % of named tunnels)
Road tunnels with noname=yes:       13
Road tunnels with name that couldn't be matched:  180 (6 %)

Aggregating both regions yields:

Total number of road tunnels:     6082
Road tunnels with tunnel name:    1754 (29 %)  /  4328 without (71 %)
   Tunnel name in name=*:         1111 (63 %)
   Tunnel name in tunnel:name=*:   794 (45 %)
   Tunnel name in both:             79 (5 %)
Road tunnels with road name:      2513 (41 %)  /  3569 without (59 %)
   Both road and tunnel name:      256 (10 % of named roads)
Road tunnels with name=*:         3925 (65 %)  /  2157 without (35 %)
   No name=*, but tunnel:name=*:   359 (20 % of named tunnels)
Road tunnels with noname=yes:       15
Road tunnels with name that couldn't be matched:  291 (5 %)
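The aggregated figures can be re-derived from the two regional tables. The following sketch sums the first few rows and recomputes the percentages; the dictionary keys and the helper name are illustrative, and rounding to whole percent is assumed.

```python
# Counts copied from the two regional tables above.
scandinavia = {"total": 3080, "tunnel_name": 1130, "in_name": 846,
               "in_tunnel_name": 308, "in_both": 15}
alps = {"total": 3002, "tunnel_name": 624, "in_name": 265,
        "in_tunnel_name": 486, "in_both": 64}

# Sum both regions row by row.
combined = {key: scandinavia[key] + alps[key] for key in scandinavia}

def pct(part, whole):
    """Percentage rounded to a whole number."""
    return round(100 * part / whole)

print(combined["total"], combined["tunnel_name"])        # 6082 1754
print(pct(combined["tunnel_name"], combined["total"]))   # 29
print(pct(combined["in_name"], combined["tunnel_name"])) # 63
print(pct(combined["in_tunnel_name"], combined["tunnel_name"]))  # 45
print(pct(combined["in_both"], combined["tunnel_name"])) # 5
```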

Analysis and Discussion

Going through the results, the following points stand out:

  • Two thirds of road tunnels in the test areas have name=* tags, one third don't. noname=yes is very rare.
    According to the description of name=* on this wiki, lack of both name=* and noname=yes indicates that it is not currently known whether these road tunnels have common names, and that further surveying is required. However, the very fact that there are so many objects without noname=yes indicates that these may be incomplete or tagging mistakes of some kind. This is supported by common knowledge that noname=yes isn't used everywhere it should be used, as it is a slightly more advanced tag. (It may not be advisable to try and fix these without a new survey; see below.)
  • Only two in five road tunnels have a road name tagged in some way.
    This fact is not surprising, as tunnels are known to be often used for motorways, major rural thoroughfares etc. for which people tend to primarily use route numbers rather than road names.
  • Between 21 % and 37 % of road tunnels have a tunnel name tagged in some way. Tunnels in Scandinavia are almost twice as likely to have their name tagged as tunnels in the Alps region.
    This fact is surprising, considering that OSM data often seems more complete in Germany than it does in e. g. Norway. It's possible that tunnels are generally considered more important features in Scandinavia and therefore more often are officially named, but this is pure speculation.
  • The practise of tagging the tunnel name in tunnel:name=* while leaving name=* empty is common in the Alps region (44 % of named tunnels), but unusual in Scandinavia (8 %).
    • Since names that are in common usage are supposed to use name=*, this would appear to support the earlier supposition that tunnels are more often considered important in Scandinavia than in the Alps region.
    • However, given the context of this analysis, it is also quite possible that these numbers are misleading due to high numbers of tagging errors (e. g., not tagging common names using name=*). This supposition is supported by the fact that even the Neuer Elbtunnel, which is located outside the study area but is also one of the most well-known road tunnels in all of Germany, doesn't use name=*. This is clearly a tagging error (but it may not be advisable to try and fix it without a new survey; see below).
  • 75 % of tunnel names that are tagged in the Scandinavia region use name=*.
    78 % of tunnel names that are tagged in the Alps region use tunnel:name=*.
    Given these high numbers, it doesn't seem appropriate to dismiss either method out of hand.
    I may have done so in one of my messages to [tagging], and I'd have to retract that in light of this new information. --Arne Johannessen (talk)
  • Duplicating tunnel names into both name tags is unusual, with 1 % (Scandinavia) to 10 % (Alps).
    While rarity by itself is not enough to conclude that a particular tagging practise shouldn't be used, these numbers do suggest that so far, few mappers seem to find it necessary to duplicate the name in both tunnel:name=* and name=*. Coming from the assumption that almost all tunnel names include the term "tunnel", it is only logical that typically, no ambiguity is perceived.
  • It is rather unusual for a road tunnel to be tagged with both the road name and the tunnel name (overall just 10 % of those tunnels with road names). It's more unusual in the Alps region (5 %) than in Scandinavia (17 %).
    • The first one of these two facts is somewhat surprising. It might indicate that roads inside named tunnels rarely have their own unique name (or vice-versa). It might also indicate that in cases where both the road and the tunnel do have unique names, mappers tend to find one of these names to be of negligible importance.
    • The second fact makes some sense given that every single tunnel on Norwegian highways has an official name, which is often signed at the tunnel entrance. However, it still seems surprising that tagging both names is so rare in the Alps region.

Conclusions

Results of the Analysis

We've learned that:

  • Both tagging variants (tunnel name in name=* vs. tunnel:name=*) are very common.
  • There are significant regional differences regarding which of these variants is more common.

Consequently, neither of these two tagging variants should be dismissed out of hand.

Furthermore, we've learned that tunnel names are more frequently tagged (be it with name=* or tunnel:name=*) in southern Scandinavia than they are in a region centred on the northern Alps. We don't know the reason for this, but we can surmise that it is in part due to tagging mistakes and in part due to differences in the way that names are actually used for road tunnels in these different regions.

We've also learned that it's unusual for a road tunnel to be tagged with both a tunnel name and a road name. When we combine this fact with the tiny number of highway:name=* tags globally, we can surmise that mappers often consider the names of roads unimportant inside named road tunnels. This seems very plausible, given that road names are of very little practical use inside of tunnels (no addresses etc.).

Answers to Questions under Discussion

In the Context section at the top of this article, the following questions were noted as being under discussion on [tagging]:

Is it typical for a road tunnel name to be in common use?
For the first question, our analysis is inconclusive. However, we can conclude that according to OSM data, it isn't atypical for a road tunnel name to be in common use.

At this point, I feel obliged to point out again the OSM policy of ground truthing: If a name appears on the ground, for example on a road sign, then that is the preferred name to use. In other words: If a tunnel name is signed at the tunnel entrance, then it's in "common use" by definition and therefore normally belongs in name=*. (However, lack of a sign does not necessarily mean that there is no common name. Local knowledge is normally required to resolve such cases. Trust the local community!)

How should a commonly used road tunnel name be tagged in case the road is unnamed and there is no man_made=tunnel object?
This question has already been answered in the Context section: Commonly used names should be recorded in name=*. Our analysis has shown that this type of tagging is in fact frequent.
How should a commonly used road tunnel name be tagged in case the road is named and there is no man_made=tunnel object?
This question cannot be easily answered based on our analysis. Generally, common tunnel names should be recorded in name=* (see above), but local knowledge would be required to determine how a tunnel is most commonly referred to locally. The analysis does suggest that it may not necessarily be wrong to allow for some regional variation in tagging practise.
Should names be duplicated in name=* and tunnel:name=* in order to avoid ambiguity?
We've learned that so far, it's unusual for a road tunnel to have its name tagged in both name=* and tunnel:name=*. If such a practise were to be recommended, it should probably be limited to cases where it is not immediately obvious that name=* is in fact the tunnel's name (as opposed to the road's name).

Further Study

  • It would be interesting to apply this same approach to more regions. However, mappers with local knowledge would be needed to adjust the regular expressions that match on tunnel names and road names accordingly.
  • The analysis provides several statistical facts that could not immediately be explained to satisfaction. It might be useful to take a closer look at these.
  • The analysis suggests that certain types of tagging errors may be common in OSM. However, as noted in the description of name=*, when dealing with names "common usage" must be considered. This requires local knowledge of some form, which armchair mappers often might not have.
      For example, consider the Neuer Elbtunnel. At the time of writing, it doesn't have a name=*, even though it clearly should have one. However, it's not immediately clear whether the correct value would be "Neuer Elbtunnel" or simply "Elbtunnel". Given that this particular tunnel name appears on road signs, first-hand knowledge of that signage should ideally be used to obtain an opinion (ground-truthing). This may typically be more of a job for the local community, rather than one for armchair mappers, even for a tunnel as high-profile as this one. For smaller tunnels, this is all the more true.
      Another example would be the Kirchenwald(tunnel). It's tagged with tunnel:name=Kirchenwald, which at first glance may appear to be an abbreviated form of "Kirchenwaldtunnel". However, it's entirely possible that this tunnel is in fact locally known simply as "the Kirchenwald". It's also possible that signage at the tunnel entrance never actually spells out the word "Kirchenwaldtunnel". Only a person with some local knowledge would be in a position to determine the correct tagging here. Just because it looks like a mistake doesn't mean it is one.
      Trust the local communities, and remember that local knowledge is one of the strengths of OSM: We have mappers everywhere.