Talk:Proposed features/Waterways classification

Comment from Carnildo

None of the proposed methods of classifying waterways seems appropriate for OSM:

  • "Classic" can be determined from an incomplete data set, which makes it possible to add it to OSM. But by the same token, an end-user can easily compute it from the OSM dataset, which means we shouldn't include it. (We don't include things like the lengths of roads or the areas of farms for exactly this reason.)
  • "Strahler" and "Shreve" require a complete set of a river's tributaries to compute. The vast majority of OSM's river networks are incomplete, so unless a mapper has an ODbL-compatible outside source, they can't add correct information. In the cases where the network is complete, the same objection to the "classic" ordering applies.

Yes, we need a way of differentiating the Mississippi River from the Elm River, or distinguishing a stream you can cross without breaking stride from one where you're going to get your feet wet, but this isn't it. Posted by Fanfouer on behalf of Carnildo (message on the tagging mailing list).
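For readers who want to test the completeness argument above, here is a minimal sketch of the Strahler and Shreve orders on a toy network. The `downstream` mapping, the segment names and the function name are hypothetical illustrations, not an existing OSM tool, and the sketch assumes a tree-shaped network with no braiding or loops.

```python
from collections import defaultdict


def stream_orders(downstream):
    """Return (strahler, shreve) dicts for a tree-shaped network.

    downstream maps each segment id to the segment it flows into,
    or to None for the outlet. This data model is an assumption.
    """
    tributaries = defaultdict(list)
    for seg, target in downstream.items():
        if target is not None:
            tributaries[target].append(seg)

    strahler, shreve = {}, {}

    def visit(seg):
        children = tributaries.get(seg, [])
        if not children:
            # A source segment has order 1 in both schemes.
            strahler[seg] = 1
            shreve[seg] = 1
            return
        for child in children:
            visit(child)
        orders = sorted((strahler[c] for c in children), reverse=True)
        top = orders[0]
        # Strahler: increase only when the two highest inflows are equal.
        tie = len(orders) > 1 and orders[1] == top
        strahler[seg] = top + 1 if tie else top
        # Shreve: sum of the inflowing magnitudes.
        shreve[seg] = sum(shreve[c] for c in children)

    for seg, target in downstream.items():
        if target is None:
            visit(seg)
    return strahler, shreve


# Sources a and b join into c; c and source d join into the outlet e.
complete = {"a": "c", "b": "c", "c": "e", "d": "e", "e": None}
# The same network with tributary b not yet mapped.
incomplete = {"a": "c", "c": "e", "d": "e", "e": None}

strahler_full, shreve_full = stream_orders(complete)
strahler_part, shreve_part = stream_orders(incomplete)
```

In this toy case, leaving out tributary b changes the Strahler order of c and the Shreve magnitude of e, which is the kind of effect the objection about incomplete networks is concerned with.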

Thanks for your comment (and for posting it here). I hope you can get through the captcha, because otherwise it will be hard to discuss.
1. I doubt that computing it would be "easy", but it's better if we check. We can compute many things, like the centroid of a shape, but city labels, for example, are sometimes placed manually (like Sidney) for some reason (because it's more predictable, precise etc.). I bet it's not always possible to tell whether a river reaches the sea through a bay or a lake, and a human decision would be needed.
2. None of these methods requires a complete data set. In classic you can start with the rivers going to the sea; with Strahler/Shreve you can start with the sources. This is still data that one can use.
3. For some rivers we already know the numbers. Guess what happens if somebody comes with such a complete data set for one river she knows well? She will create ad hoc tags, different from somebody else's, so we will have no way to compare them all. That's why having a tagging scheme is useful even for optional data.
--Kocio (talk) 14:50, 14 August 2017 (UTC)


Why require mappers to calculate a river order, while this information can be added with a script on a PostGIS database? I'd prefer something subjective, like splitting "river" into three subtypes. Like river=small, river=big and river=major. --Zverik (talk) 10:33, 14 August 2017 (UTC)

1. I believe that it's not that easy, but I would be happy if somebody made an actual implementation of it to test our intuitions. In the case of osm-carto, we have Lua preprocessing available, but I'm not a programmer.
2. The hardest part is defining what a small/medium/big river is. That's why I gave an example, so we could start with something reasonable, but there are many people who require objective criteria, so we might end up discussing it for years and see no gain.
--Kocio (talk) 10:52, 14 August 2017 (UTC)
I'll answer 2, since 1 would require a better programmer, like User:Komяpa. We can start with precise numeric limits for these, ranking by length (e.g. 200 and 1500 km) and width (e.g. 10 km from the estuary, 100 m and 1 km). Mappers haven't complained about such limits for setting a place=* tag. --Zverik (talk) 11:23, 14 August 2017 (UTC)
This is when using the complex measures gets tricky:
1. What if a river has a medium length, like 1000 km, but a big width, like 15 km (or the other way around)? How should we decide then? Maybe it's a logical OR (any value that meets the limit)?
2. People expect these values to be somehow significant. I was proposing to define a hill with something like max 300 m, and the response was: "why a 300 m limit?"
Road classification was done so early that we just accept it, as highly subjective and complex as it is, and make agreements on how these rules apply to the local road system (see my analysis of primary road definition). --Kocio (talk) 12:56, 14 August 2017 (UTC)
It is to be decided. I'm suggesting a general way of tagging, and we can refine it in a proposal. Regarding significance, everyone was okay with place types defined as round numbers (see history). I chose 200 km because of some rivers I know, and 1500 km after reading wikipedia:List of rivers by length. --Zverik (talk) 14:19, 14 August 2017 (UTC)
I'm ready to take part in developing your proposal in parallel to this one. Could you start with a draft, so we could discuss the details further? --Kocio (talk) 14:23, 14 August 2017 (UTC)
I can, but next week: got to prepare for SotM now. --Zverik (talk) 14:39, 14 August 2017 (UTC)
Great, it will still be as important by then as now. =} You can also discuss it there to know what people think about it. --Kocio (talk) 14:42, 14 August 2017 (UTC)
Did that: Proposed features/Rivers Classification. --Zverik (talk) 12:17, 29 August 2017 (UTC)
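To make the threshold idea from this thread concrete, here is a rough sketch of a classifier that uses the example limits quoted above (200 and 1500 km for length, 100 m and 1 km for width) and the logical OR rule floated as a question. Nothing here is agreed tagging: the function name, the reading of the width limits as metres measured near the estuary, and the treatment of values exactly on a limit are all assumptions made for illustration.

```python
# Example limits taken from the discussion; they are not agreed values.
LENGTH_LIMITS_KM = (200, 1500)
WIDTH_LIMITS_M = (100, 1000)
CLASSES = ("small", "big", "major")


def rank(value, limits):
    """Return 0, 1 or 2 depending on how many limits the value reaches."""
    return sum(1 for limit in limits if value >= limit)


def classify_river(length_km, width_m=None):
    """Classify with a logical OR: the higher of the two ranks wins.

    width_m is optional because many rivers have no width tagged.
    """
    best = rank(length_km, LENGTH_LIMITS_KM)
    if width_m is not None:
        best = max(best, rank(width_m, WIDTH_LIMITS_M))
    return CLASSES[best]


# The mixed case raised in the thread: medium length, very large width.
mixed_case = classify_river(1000, width_m=15000)
short_stream = classify_river(150)
medium_river = classify_river(1000)
```

With the OR rule, the mixed case ends up in the top class because of its width alone. A stricter AND rule would instead keep it in the middle class, which is exactly the choice the thread leaves open.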

Consistency requires avoiding manual input

Systematic stream order schemes should be calculated automatically regardless of whether or not it is easy or even possible. Manual input runs the risk that inconsistent stream order numbers will be used (e.g. due to different data sources). As a result, an upstream segment might be rendered even if a downstream river segment is not, which is clearly undesirable. If the inconsistent number is not on adjacent segments, then it might not even be noticed. Even if the inconsistent numbering is on adjacent segments, a large number of segments may need to be changed as a result, and it may be unclear which order number is correct/which data source should be used.
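The inconsistency described above can be detected mechanically. The following is a minimal sketch of such a check, assuming a hypothetical `downstream` mapping of segments and a dictionary of manually entered order values. It applies to Strahler and Shreve numbers, which should never decrease going downstream. The classic order runs the other way, so the comparison would have to be reversed for it.

```python
def find_order_conflicts(downstream, order):
    """List (upstream, downstream) pairs where the order drops downstream.

    downstream maps each segment to the segment it flows into (None for
    the outlet); order holds the tagged values. Segments with no tagged
    value are skipped, since missing data is not a contradiction.
    """
    conflicts = []
    for seg, target in downstream.items():
        if target is None or seg not in order or target not in order:
            continue
        if order[seg] > order[target]:
            conflicts.append((seg, target))
    return sorted(conflicts)


# Toy network: a and b join into c; c and d join into the outlet e.
downstream = {"a": "c", "b": "c", "c": "e", "d": "e", "e": None}
# Hypothetical manual tags where c was taken from a different source.
tagged = {"a": 1, "b": 1, "c": 3, "d": 1, "e": 2}

conflicts = find_order_conflicts(downstream, tagged)
```

A check like this only finds conflicts between adjacent segments. It cannot say which of the two values is the correct one, which is the harder part of the problem raised above.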

In answer to other issues:

  • Determining where a river flows involves testing whether its end node is part of another object. This has notably been done in: Depending on how the river network is segmented, the classic order corresponds to the 'level' column in e.g.
  • Shreve and Strahler could of course be computed using any available data at a point in time. The issue is that the stream order will change every time new tributaries are added. That makes the data difficult to use. We don't know whether a river is order=1 because it really has no tributaries and does not need to be rendered, or because data is missing, and it is actually an important river to render.
  • If the only reason for doing this is to allow cartographic generalisation, then I don't think small/big/major will necessarily be effective either. I think the necessary information is already (incompletely) in OSM in the form of network structure, naming, width, and links to wikidata to allow use of other properties. It might be worth looking further into best practice in cartographic generalisation.

--JoeG (talk) 17:12, 20 August 2017 (UTC)
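The end-node test mentioned in the first point can be sketched as follows. This is an illustration only and not the implementation referred to above. It assumes that each river is a single way given as a list of node ids in flow direction, that a way whose end node lies on no other way reaches the sea (which is the very case where a human decision may be needed), and that the classic order is 1 for such rivers and one higher for each level of tributary.

```python
def classic_order(ways):
    """Compute a classic order per way from end-node membership.

    ways maps a way id to its node ids in flow direction (assumed model).
    """
    node_to_ways = {}
    for wid, nodes in ways.items():
        for node in nodes:
            node_to_ways.setdefault(node, set()).add(wid)

    flows_into = {}
    for wid, nodes in ways.items():
        end = nodes[-1]
        # Receivers contain the end node but do not themselves end there.
        receivers = sorted(
            other for other in node_to_ways[end]
            if other != wid and ways[other][-1] != end
        )
        flows_into[wid] = receivers[0] if receivers else None

    order = {}

    def resolve(wid, trail):
        if wid not in order:
            target = flows_into[wid]
            if target is None or target in trail:
                # No receiver (or a loop in the data): treat as order 1.
                order[wid] = 1
            else:
                order[wid] = resolve(target, trail | {wid}) + 1
        return order[wid]

    for wid in ways:
        resolve(wid, frozenset())
    return order


ways = {
    "main": [1, 2, 3, 4],
    "trib": [10, 11, 3],
    "creek": [20, 21, 11],
}
orders = classic_order(ways)
```

If a river is split into several ways, the segments would first have to be merged (for example by name or relation), otherwise every split would look like a confluence.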

"Systematic stream order schemes should be calculated automatically regardless of whether or not it is easy or even possible." - this sounds very strange to me. We're using different QA tools which check the consistency of data, so maybe this is what we would need software for: as a helper.
  • Thanks for the link to osm-tests, but for me it's still not the final data I could use for rendering for example. Some proof of concept is needed to know if we really don't need manual tagging.
  • It's absolutely possible to have complete third-party data about waterways which are not yet completely drawn in OSM.
  • This is how we use roads classification already, even if in theory we could determine it automatically using their network structure and other properties.
--Kocio (talk) 20:22, 20 August 2017 (UTC)

Width vs Stream order

For comparison: Annual discharge of US rivers

I fully support the idea that a more fine-grained waterway classification system is needed. But I don't think that stream order is well suited for this task.

For example, the Missouri River has a Strahler order of 9 and the Ohio River just 8 ([1]), even though the Ohio has a much higher average discharge than the Missouri (7,957 m³/s vs 2,478 m³/s). And the Ohio River is even wider than the Mississippi at their confluence!

I'm not sure, but don't rivers in mountainous regions with a dendritic river network reach higher stream orders more quickly than rivers originating in lowlands with fewer confluences? --Ethylisocyanat (talk) 18:22, 18 August 2017 (UTC)

All of these systems are used for different things. Looking at the definitions, Shreve is better correlated to discharge volumes and Strahler is meant for other uses. It's good to have more of them to cover more needs.
But the fact is we don't know too much yet and I hope we'll try different approaches, including computing order by software. --Kocio (talk) 21:57, 18 August 2017 (UTC)
@Kocio It would be great to have a map with annotated stream orders as an example. I've tried with NHDplus, but wasn't able to process it. Do you have any example ready? --Ethylisocyanat (talk) 12:08, 19 August 2017 (UTC)
That would be great, indeed, but I'm not aware of such a map. --Kocio (talk) 21:27, 19 August 2017 (UTC)

GeoKitten's Opinion: This is unnecessary and would not contribute to OSM

I'm against this proposal because I don't see it as necessary. As far as stream order is concerned, this could easily be calculated from existing data, assuming it's reasonably complete. For tagging the importance of rivers as big, medium, or small, I don't agree with this because it's highly subjective and not verifiable. For determining the importance of rivers, I speculate that existing data like length of the river relation or width=* could be used.

This proposal comes from a discussion about rendering, and I don't see this tagging system as useful for much other than rendering. I certainly appreciate the hard work of the OSM-Carto developers but I don't agree with this tagging proposal because it seems like a whole tagging system designed for the renderer. GeoKitten (talk)

Thanks for your comment!
Well, this is a feeling that we have no proof for yet, and I feel the other way around (but there's no proof for my feeling either). Completeness is not the only thing that matters:
  • There is an example of a network (in the middle) which can be flawed for automatic calculation, but closer to reality - of course we don't tag boundaries of lakes and rivers as waterways, but split waterways and waterways not continued within lakes are popular problems. A Dutchman is warning about waterways complexity in his country (see here).
  • Do you mean width at which point? Only river mouth width is quite interesting, but it's not a property of the whole river.
  • Rendering is where the problem is most easily visible, but I hear that other people would like to have a river classification too. Even if osm-carto finds a way to render rivers, it will still be the same struggle for the others (see this opinion). It can be useful for any hydrological analysis besides rendering maps (showing a map is of course the most popular usage, and for a reason), but you could equally ask what the classifications of roads and cities are for. Besides, "tagging for the renderer" is the name of a general problem, which means cheating (in short: "Don't deliberately enter data incorrectly for the renderer"), and classification is about generalization, not fake data.
--Kocio (talk) 01:11, 25 August 2017 (UTC)

Where would the information be coming from?

Where would the average mapper retrieve the stream order information to add to OSM, and how would other people verify it? They can't get it from other maps because of copyright reasons. Getting it from the geometry in OSM would not make any sense, because then it would be better to calculate automatically. It's definitely not verifiable on-site. --Pbb (talk) 14:06, 28 August 2017 (UTC)

Some people claim that automatic calculation would be better or easier, but we have no proof of this yet and there are serious doubts about it (look here for some arguments). I'm not sure who can collect such data, but scientific data looks quite likely to be a source with no copyright restrictions. --Kocio (talk) 11:52, 29 August 2017 (UTC)
BTW: Paul Norman (who is one of the main coders behind osm-carto) also thinks that automatic order calculation wouldn't be easy. --Kocio (talk) 19:50, 29 August 2017 (UTC)
I agree that automatic calculation won't be easy, and has many pitfalls. That's why I specifically looked at the issue from the opposite direction: what are the challenges when entering the data manually? And the main problem to me seems to be that there is no way to observe or validate this information on-site. Do you have any information about how many waterways have scientific data about the order available? If this is, say, 1% of all the world's waterways, then that would be too little to be of use. --Pbb (talk) 14:47, 30 August 2017 (UTC)
I don't know, I'm just guessing. But whatever the numbers are, even 1%, it would be "too little" for what? For rendering the world's rivers, of course, but we don't know what use people may make of the data. I can imagine full data about one river being interesting for analyzing that particular river, because stream order is used in hydrology in general. --Kocio (talk) 15:00, 30 August 2017 (UTC)