Proposal talk:GTFS Tagging Standard

From OpenStreetMap Wiki

Identifying feeds

Resolved: The proposal now states that they are managed on a dedicated wiki page

I don't understand how feed codes are managed. Where is the association between a feed code and its URL stored, and who manages the creation/updates of feed codes? --Hypo808 (talk) 10:03, 30 November 2023 (UTC)

Currently the feed codes are created by ToniE, and each feed in PTNA has a shell script to get the URL (to allow for redirects / web scraping).
With this proposal I want to move this into OSM, where the feed relation contains both the URL and the feed code to refer to that feed.
Spaanse (talk) 10:18, 30 November 2023 (UTC)

Necessity

Resolved: cascading membership seems to address most of these concerns
Resolved: the feed relation has been removed from the proposal, replaced by the GTFS feeds wiki page

Can you explain why a relation is needed? If anything, there is already the also dubious type=network , unless there are different GTFS feeds for the same transit system.
—— Kovposch (talk) 09:08, 9 November 2023 (UTC)

A relation is needed because it provides a single location for information about the feed. I have added a section detailing more reasons. A GTFS feed does not necessarily cover the same objects as a type=network. For example, the Dutch GTFS feed covers the entirety of the Netherlands, of which the concessions mentioned as examples of type=network are only a small part. Spaanse (talk) 09:42, 9 November 2023 (UTC)
The main problem is you seem to be suggesting to include every transit object in your type=gtfs_feed . This will not be welcomed. Can you at least limit it to including type=network as an intermediate level, which is further nesting the type=route_master ?
The infrastructure should be contained in public_transport=stop_area and public_transport=stop_area_group . I don't have any ideas on how to relate them, if at all. Or they are already related in the type=route inside type=route_master inside type=network . (You can see why both methods have issues)
—— Kovposch (talk) 09:47, 9 November 2023 (UTC)
There is the alternative of listing this data on a wiki page , or file somewhere externally. This is metadata.
—— Kovposch (talk) 09:52, 9 November 2023 (UTC)
I have added cascading membership to the proposal. Now elements only need to be included if they are not part of any relation that is already included. Spaanse (talk) 11:23, 9 November 2023 (UTC)
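A minimal sketch of how cascading membership could be resolved, assuming relations are available as a plain mapping from relation id to member ids. The ids and the `feed_members` helper are hypothetical and only illustrate the idea; they are not part of the proposal.

```python
from collections import deque


def feed_members(relations: dict[str, list[str]], feed_id: str) -> set[str]:
    """Return every element reachable from feed_id by following relation members."""
    seen: set[str] = set()
    queue = deque([feed_id])
    while queue:
        current = queue.popleft()
        # Elements that are not relations simply have no entry in the mapping.
        for member in relations.get(current, []):
            if member not in seen:
                seen.add(member)
                queue.append(member)
    return seen


# Hypothetical data: the feed lists only the network and one stop that
# is not part of any other included relation.
relations = {
    "feed": ["network", "lonely_stop"],
    "network": ["route_master"],
    "route_master": ["route_1", "route_2"],
    "route_1": ["stop_a", "stop_b"],
    "route_2": ["stop_b", "stop_c"],
}
members = feed_members(relations, "feed")
```

The feed itself has only two direct members here, while all routes and stops are still found by walking down the tree.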

Size of relation

Resolved: added cascading membership
Resolved: the feed relation has been removed from the proposal

On Relation#Size they mention that there is a technical limit on the size of a relation at 32000 elements. I was not aware of this limit, which could be an issue for a GTFS feed relation. OVApi-nl already has double that number of stops. Spaanse (talk) 09:48, 9 November 2023 (UTC)

Please see the above.
—— Kovposch (talk) 09:49, 9 November 2023 (UTC)

Link and system format

Resolved: feed URL tag changed, character set expanded to printable ASCII, further format discussion considered not relevant for the proposal (descriptive instead of prescriptive)

First of all, let's use colon *:url=* instead of underscore *_url=* . This is again more standard.
Would it be possible to ask PTNA (sorry do you represent them, or are communicating with them?) etc to consider changing the format? It's not good style to mix the hyphen from the ISO subdivision standard with the delimiter for the system. It is confusing on how to interpret the suffix reliably. Surely there may be national, trans-subdivision, or unassigned subdivision systems with the commonly recognizable code or first-level component being 2-char around the world? How are they assigned?
And where does ref=ovapi-nl come from?
—— Kovposch (talk) 09:51, 9 November 2023 (UTC)

I chose gtfs:feed_url=* because it was already used (Winnipeg). I am fine with gtfs:url=*.
I do not represent PTNA, though have been talking with ToniE on the forum who seems to maintain PTNA.
I did not know there was an ISO subdivision standard, will change that.
National systems can use the national code, so DE for Germany. I suppose that trans-subdivision tags can use the country tag. Cross-country can use standardised codes in some cases (like EU), but otherwise maybe use the country code that has the majority of the service. I want to note that I do not see the format as part of the proposal, only that it should only use A-Z, a-z and dashes.
The tag ref=ovapi-nl comes out of my back pocket. If I were to actually make that relation I would leave it out since it is not an official ref. It is mainly to exemplify that it may be different from gtfs:feed=*. Spaanse (talk)
-
Yep, I'm maintaining PTNA. IIRC gtfs:feed_url=* can be seen only with CA-MB-WP (Winnipeg, Manitoba) where they coded the URL into gtfs:feed=* which was already suggested by PTNA for other purposes.
PTNA makes some suggestions on what to tag on route_master and route relations (example for a bus in Winnipeg).
I did choose the ISO 3166-2 Standard for naming the GTFS feeds in PTNA, plus a freely chosen, but reasonable suffix like "WT" (CA-MB-WT), "MVV" (DE-BY-MVV). However, that does not apply for: CH-Alle, NO-Hele, NL-Nationaal, ... where the GTFS data describes all traffic in that country; and AT-* where it does not make sense to include the 'state', the GTFS data describes all traffic in that state. However, this mapping/naming can be changed easily.
--ToniE (talk) 11:30, 9 November 2023 (UTC)
Oh sorry I forgot to explain clearly. I wanted to suggest using colon eg DE-BY:MVV=* . This avoids mixing the ISO and OSM standard. My motivation is from route=road network=* and cycle_network=* , where it is very disorganized with the opposite situation of only colons being used. It becomes "impossible" to tell what each part is for without knowing the codes already.
Is it really necessary to add a suffix if it is national? *=CH , *=NO , *=NL can do? As someone who doesn't know the language and data structure, I will have to look up "Alle", "Hele", "Nationaal" means first.
Indeed, the numeric subdivision codes defined in the ISO standard are one of the ugly parts. So what I believe is that using colons for the systems will allow dropping them clearly.
—— Kovposch (talk) 13:57, 9 November 2023 (UTC)
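A minimal sketch of the colon suggestion, assuming a feed code of the form `COUNTRY[-SUBDIVISION][:SYSTEM]`. The `split_feed_code` helper is hypothetical and this format is only what is being discussed here, not something PTNA or the proposal uses.

```python
from typing import Optional


def split_feed_code(code: str) -> tuple[str, Optional[str], Optional[str]]:
    """Split a code such as 'DE-BY:MVV' into (country, subdivision, system)."""
    # Everything before the first colon is the ISO-style region part.
    region, _, system = code.partition(":")
    # Within the region, the first hyphen separates country and subdivision.
    country, _, subdivision = region.partition("-")
    return country, subdivision or None, system or None


with_colon = split_feed_code("DE-BY:MVV")
national = split_feed_code("CH")
only_hyphens = split_feed_code("DE-BY-DING-SWU")
```

With only hyphens, the same function cannot tell where the subdivision ends and the system begins: `only_hyphens` ends up with `BY-DING-SWU` as the subdivision and no system at all.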
Yeah, I struggled with the hyphens also (DE-BY-DING-SWU, ...) and where to stop mapping '-' to a '/' in the file/directory structure. ':' would be possible to use but might be mapped internally again to '-' for file storage.
"CH" instead of "CH-Alle" would be OK. Just moving some files from CH/Alle/* to CH/* and some minor adjustments in the code.
I don't think that we can drop '03' from 'NO-03-Ruter' just because it is numeric. There might be a second 'Ruter' in a different state/part of Norway. Or do I misunderstand you?
--ToniE (talk) 14:45, 9 November 2023 (UTC)
Hmm, "Ruter" may be unfortunate if it is too generic. Can you follow Linz AG to use "Ruter AS"? But see my observations below. NO:RuterAS , NO:Ruter_AS , or something else.
—— Kovposch (talk) 14:54, 9 November 2023 (UTC)
For the record, comments for some issues as I glanced through your list. I'm not sure whether this is considered the scope of this proposal, but they can be considered if GTFS is to be used more widely in OSM, and in the development of future systems.
  • Spaces: AU-QD-South-East-Queensland doesn't fit to use hyphens. Throws off the hierarchy. Many alternatives. Otherwise, gets worse in FR-IDF-transdev-ile-de-france-conflans , FR-PAC-Alpes-de-Haute-Provence , FR-IDF-transports-daniel-meyer , and FR-PAC-Bandol-et-Sanary-sur-mer due to the language.
  • System vs municipality, abbreviating : You have both CA-MB-WT and CA-ON-Burlington-Transit . I suppose not using "BT" is avoiding Brampton Transit also in Ontario? But then depending on what it means, "Transit" doesn't add much. Is it spacing "Burlington Transit" (see spacing issue above) , or somehow the "Transit" system in Burlington? Either case, if there is only one system in the metropolitan area, you could consider CA-ON:Burlington , and CA-MB:Winnipeg hypothetically. Same thing to consider in AU-SA-Adelaide-Metro .
  • Single GTFS region: I have the above question after seeing both US-AK-Anchorage-PTD and US-CA-SantaCruz . Ok I suppose one is highlighting a specific department, and another including everyone in Santa Cruz. But the result is the same if you use US-AK:Anchorage . Similarly, the "All" in US-UT-All is quite redundant, and not the same logic as US-CA-SantaCruz when US-CA-SantaCruz-All is not used. (ignoring the issue of Santa Cruz County vs City here)
Only a few examples. Very pedantic. But there is a need to avoid repeating the arbitrary encodings.
—— Kovposch (talk) 14:48, 9 November 2023 (UTC)
I have extended the allowed characters to all printable ASCII.
That should give enough room for discussions on what format you want.
The only reason I constrain this value is that UTF-8 has no standardized way of lowercasing. (Example: FFI to ffi or ﬃ (U+FB03))
Apart from the technical constraint, my proposal only seeks to describe current usage.
Spaanse (talk) 17:44, 9 November 2023 (UTC)
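A minimal sketch of the constraint and of the case problem behind it. The exact allowed set is an assumption (printable ASCII including the space character), and `is_valid_feed_code` is a hypothetical helper, not part of the proposal.

```python
import string

# Assumption: "printable ASCII" means letters, digits, punctuation and space,
# without tabs, newlines and other control-like whitespace.
ALLOWED = set(string.printable) - set("\t\n\r\x0b\x0c")


def is_valid_feed_code(code: str) -> bool:
    """True if the code is non-empty and uses only printable ASCII."""
    return bool(code) and all(ch in ALLOWED for ch in code)


# The ligature U+FB03 uppercases to three ASCII letters, and lowercasing
# those letters does not give the ligature back.
ligature = "\ufb03"
round_trip = ligature.upper().lower()
```

Inside ASCII this ambiguity does not arise, since every letter has exactly one counterpart in the other case.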

Other tags

Resolved: clarified distinction between network:guid=* and gtfs:feed=*

Is *:guid=* considered part of GTFS tagging? I was confused at first as to whether it is a UUID. Can it be improved to a more common word that's not an abbreviation? I don't have good ideas yet. This is of interest to aforementioned route=road network=* and cycle_network=* alongside.
—— Kovposch (talk) 12:30, 10 November 2023 (UTC)

network:guid=* is not part of my proposal. It does have a lot of existing usage, which seems to be for PTNA. Its purpose is discoverability of feeds, but requires a list of feeds and their GUID - which only PTNA seems to have. I propose to have feed information stored within OSM - with the GTFS feed relation. This relation is discoverable from any object of the feed by climbing up the tree of relations. In the other direction - we can discover all objects of a feed by searching down the tree. This is not possible if the feed is some listing on a wiki page or on a PTNA page. Spaanse (talk) 09:16, 11 November 2023 (UTC)
Another note: the purpose of gtfs:feed=* is similar - it should be globally unique among all feed relations. However its main purpose is making sure that elements found by climbing down the tree indeed correspond to that feed (have the same value in gtfs:stop_id:(feed_code)=*). Spaanse (talk) 09:19, 11 November 2023 (UTC)
network:guid=* was probably introduced first by PTNA, or at least made visible by PTNA, just to have a unique identifier for 'network'. Some mappers set 'network' to a short form, where 'AVV' is not unique inside Germany ("Aachener ..." and "Augsburger ...") and 'VVM' not even inside Bavaria. The long form of 'network' often showed problems with Overpass' "sketch-line (example)" at that time. --ToniE (talk) 09:43, 13 November 2023 (UTC)
<sidenote> sketch-line is still not perfect: [1](example)</sidenote> Spaanse (talk) 10:24, 13 November 2023 (UTC)
sketch-line does not handle PTv2 correctly, where highway=bus_stop is set on the public_transport=platform --ToniE (talk) 10:42, 13 November 2023 (UTC)
Thanks for the clarification. I think this indicates a distinction between network:guid=* and gtfs:feed=*. The first is a unique identifier for a network. The second is a unique identifier for a feed. This can cause confusion, so I will clarify in the proposal that they are separate concepts. As such network:guid=* is not considered when handling a GTFS feed. In particular - the uniqueness constraint does not extend to cover both tags. Spaanse (talk) 10:23, 13 November 2023 (UTC)
Examples:
- several GTFS feeds for New York City but a single network
- single GTFS feed for Switzerland/Liechtenstein and the Netherlands and multiple networks - with the problem of having multiple routes with "route_short_name" being the same in the GTFS feed for different networks
--ToniE (talk) 10:42, 13 November 2023 (UTC)

Link contents

Resolved: relax requirement - now should only link to a permanent URL

The gtfs:url=* value is supposed to end with '.zip'. However, this cannot be guaranteed and is out of our hands: the feed owner decides how to make the data accessible. Many owners provide a perma-link as suggested in the GTFS specification but without the '.zip' suffix. Instead, their servers respond with "HTTP 302", pointing to the actual location of the ZIP file via the "Location:" header in the response. There are many examples: PTNA knows 22, as can be seen in some "get-release-url.sh" files of the "gtfs-feeds" repository. One such link is https://www.data.gouv.fr/fr/datasets/r/01c10bc3-da1a-4a3d-b72a-beadff6e9fef for FR-ARA-Montelibus. So, in this proposal, I would not require that the gtfs:url=* value ends with '.zip'. P.S.: there are some GTFS feeds where PTNA has to scan the contents of HTML data to find the latest valid URL. Example: https://www.dtpm.cl/index.php/noticias/gtfs-vigente --ToniE (talk) 14:31, 18 November 2023 (UTC)

Removed the requirement that the URL ends in .zip
Links that 302 redirect are now also allowed.
I don't think your last example is a good fit for the url tag, since it is not a perma-link.
In such a case I think a mapper should contact the operator and point to the GTFS recommendation and ask if there is a perma-link to the feed
-- Spaanse (talk) 08:20, 19 November 2023 (UTC)
agreed for the last sentence --ToniE (talk) 13:34, 19 November 2023 (UTC)
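A minimal sketch of how a data consumer could handle a perma-link without relying on the '.zip' suffix: follow the redirect and check the first bytes of the payload instead. `looks_like_zip` and `resolve_feed` are hypothetical helpers, and this is not how PTNA's "get-release-url.sh" scripts work.

```python
import urllib.request

# ZIP archives start with a local file header signature,
# or with the end-of-central-directory signature if the archive is empty.
ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06")


def looks_like_zip(first_bytes: bytes) -> bool:
    """Check the payload signature rather than the URL suffix."""
    return first_bytes.startswith(ZIP_MAGIC)


def resolve_feed(url: str, timeout: float = 30.0) -> tuple[str, bool]:
    """Follow redirects and return (final URL, whether the payload looks like a ZIP)."""
    # urlopen follows HTTP 302 responses by default.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl(), looks_like_zip(response.read(4))
```

A page such as the dtpm.cl example would fail the signature check, since it returns HTML instead of the archive.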

Encouraging gtfs:* tags over normal tags?

Resolved: Added a section on why I do not think it is a good idea to mix normal tags and gtfs tags

Hi, to me it seems like a bad idea to suggest using tags like gtfs:stop_code=*, gtfs:stop_name=*, gtfs:name=*, gtfs:route_short_name=*, gtfs:route_long_name=*, gtfs:platform_code=*, etc.. OSM already has tags for this purpose:

  • ref=* for stop_code and route_short_name
  • name=* (or to=*+from=*+via=*) for route_long_name and *_name
  • local_ref=* for platform_code

Do you have an example when it would be valuable to use one of the gtfs:*=* tags above, instead of a normal OSM tag? --Kylenz 08:02, 24 November 2023 (UTC)

Although the GTFS specification seems to be quite precise on what to set as value for a specific field, owners/providers of feeds often have some different understanding.
* I've seen weird usage of route_long_name and route_short_name, where route_short_name is not appropriate for using as OSM's ref
** the GTFS data for regional trains in DE is an example:
*** route_short_name (1st column) is not appropriate for 'ref' in OSM
*** route_long_name (6th column) is the head-sign of the train and is appropriate for 'ref' in OSM, but sometimes empty
--ToniE (talk) 11:25, 25 November 2023 (UTC)


On a separate note, this proposal seems to suggest tagging gtfs:stop_id=* and gtfs:route_id=* even if it is always the same as ref=*. Wouldn't it be better to use ref=* by default, except in cases where the on-the-ground ref doesn't match the GTFS code?

Thanks and apologies if I have misunderstood something --Kylenz 08:02, 24 November 2023 (UTC)

My aim is to make this as easy for a machine to parse as possible.
Having these tags in the gtfs namespace helps in this regard:
1. It clearly delineates where a computer should look for the fields to match with a feed
2. It is not prone to editing to make it more understandable for humans - it can remain an exact match with the value in the feed.
3. It keeps the feed suffix within gtfs namespace, so we don't have to care about clashing with existing suffixes for ref or name
To elaborate on (1):
I suspect that these tags will mostly be set, changed and read by computers
Mechanical edits have a high standard of care - you have to be really certain that it does not change or remove correct data
This is much easier to achieve if it is entirely separate of on-the-ground/human tags
An example of (2): lettered platforms at a bus station could be either lowercase or uppercase.
Then local_ref=* might contain the version as it is on the platform, and gtfs:stop_code=* as it is in the feed.
Also name=* has a fixed format which may not match the format and capitalisation used in route_long_name
Suppose that we used your policy of using gtfs:stop_id=* only if it differs from ref=* on the ground.
Imagine the following scenario:
- Bus stops are imported / mechanically edited to add the stop_id in ref=*
- A mapper walks by a bus stop, and notices that the ref=* does not match the value on the ground
- They edit ref=*, unknowingly removing the ref=* that was referring to the feed.
I hope this gives more elaborate reasoning on why I suggest these tags (besides their current use)
Spaanse (talk) 09:19, 24 November 2023 (UTC)
Sorry I'm still a bit confused. What's the point of importing GTFS data if you want to make it "separate" from the tags that everyone else uses? i.e. How do these duplicate tags improve OSM?
I see the obvious benefit of adding refs/IDs, but why import gtfs:route_long_name=* or gtfs:platform_code=* into separate tags if you know that it will duplicate existing data in OSM and become out of date as soon as anyone else edits the normal tags?
--Kylenz 09:44, 24 November 2023 (UTC)
The aim is to reference GTFS objects; tags in the gtfs namespace act as foreign keys
Ideally we only need gtfs:stop_id=*, gtfs:route_id=*, gtfs:trip_id(:sample)=* and gtfs:shape_id=*, but depending on the feed they may change in each version.
In such cases we can construct a foreign key using stop_code, platform_code, names, ... - any combination that is more stable.
Since the GTFS tags are used as a foreign key, they must match exactly with the GTFS feed
This is too rigid for normal tags which are intended to be displayed and match what is on the ground.
For an import from GTFS, there are two aspects
1. The geometry, names, platform codes, ... → they are still put in the normal tags
2. A way to reference back to the object of the GTFS feed → they are put in the GTFS tags
This proposal standardises how (2) is done, it does not change anything about (1)
An import may do only (1), but I believe it is also useful to do (2).
A particular field can be used for both (1) and (2) - in which case it is duplicated across different tags
But the presence of the tags also indicates that they are needed for some reason
If a stop has local_ref=* and gtfs:stop_name=*, then we know that we only need to match stop_name
If a stop has local_ref=*, gtfs:stop_name=* and gtfs:platform_code=*, then we know we need to match both stop_name and platform_code
> but why import gtfs:route_long_name=* or gtfs:platform_code=* into separate tags if you know that it will duplicate existing data?
Basically: it is useful to have different tags for different purposes (display or lookup).
It unburdens a mapper interested in displaying public transport from knowing how the GTFS lookup works.
It unburdens a mapper interested in GTFS lookup (for timetables) from knowing how things should be displayed
And as mentioned before - these tags are for machines to use (and change).
I think this proposal would get a lot of push-back if it encouraged automated edits of normal/human tags.
> and become out of date as soon as anyone else edits the normal tags?
The point of separate tags is that this is not the case.
If you share tags - then if the ref of a stop changes on the ground but not in the feed, it is always incorrect in some way.
If you do not share - you update ref=* and leave gtfs:stop_id=* and they now are again both up-to-date.
I have a question for you: why do you believe duplicating data - for different purposes - is problematic?
If you want shared tags - how do you ensure that editing for one purpose does not make it wrong for another purpose?
Spaanse (talk) 11:23, 24 November 2023 (UTC)
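A minimal sketch of the foreign-key idea described above, assuming a single feed and no feed suffix on the keys. Every gtfs:* tag present on the OSM object becomes part of the lookup key, and normal tags are ignored. The helper names and the sample data are hypothetical.

```python
GTFS_PREFIX = "gtfs:"


def lookup_key(osm_tags: dict[str, str]) -> dict[str, str]:
    """Collect the gtfs:* tags as {gtfs_field: value}; these are the fields to match."""
    return {
        key[len(GTFS_PREFIX):]: value
        for key, value in osm_tags.items()
        if key.startswith(GTFS_PREFIX)
    }


def find_rows(osm_tags: dict[str, str], rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the GTFS rows whose fields match every gtfs:* tag exactly."""
    key = lookup_key(osm_tags)
    if not key:
        return []  # no gtfs:* tags means no reference to the feed
    return [row for row in rows if all(row.get(f) == v for f, v in key.items())]


# Hypothetical stop: local_ref is for display, the gtfs:* tags are the key.
osm_stop = {
    "local_ref": "a",
    "gtfs:stop_name": "Central Station",
    "gtfs:platform_code": "A",
}
gtfs_rows = [
    {"stop_id": "1", "stop_name": "Central Station", "platform_code": "A"},
    {"stop_id": "2", "stop_name": "Central Station", "platform_code": "B"},
]
matched = find_rows(osm_stop, gtfs_rows)
```

Because local_ref=* never takes part in the match, a mapper can correct it to what is on the ground without breaking the reference.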
A good analogy may be opening_hours:url=*
From website=* and potentially a ref/address/... we could in theory find the page of the website that has the opening hours
Yet it is still useful to have a direct link, so we don't complain about duplicate data
Can we define something like a "default" mapping from OSM tags to GTFS fields, just avoiding double tagging the same value with different keys?
* if gtfs:route_short_name=* is not set for a route_master/route relation, ref=* shall be used instead
* if gtfs:stop_id=* is not set for a gtfs:public_transport=platform or gtfs:public_transport=stop_position, ref=* or ref:IFOPT=* or ref:*=* shall be used instead
* similar for other fields/keys
--ToniE (talk) 11:25, 25 November 2023 (UTC)
"Can we define something like a "default" mapping from OSM tags to GTFS fields, just avoiding double tagging the same value with different keys?"
"if gtfs:route_short_name=* is not set for a route_master/route relation, ref=* shall be used instead"
I think this would be a great solution, thanks ToniE. Anyone consuming GTFS data via OSM should ideally support the standard tags that many cities are already using, if gtfs:*=* tags are not defined. Otherwise it would be strange to say that gtfs:*=* tags are "approved", without mentioning that the rest of the OSM ecosystem uses a different set of tags. The table below might be a good starting point:
GTFS Field | Normal OSM Tag
stop_name | name=*
route_long_name | name=* or to=*+from=*+via=*
route_short_name, trip_short_name, stop_code, stop_id | ref=*
platform_code | local_ref=*
wheelchair_boarding | wheelchair=*
route_color | colour=*
bikes_allowed | bicycle=*
pickup_type, drop_off_type | relation roles stop_exit_only or stop_entry_only or stop
--Kylenz 00:41, 26 November 2023 (UTC)
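A minimal sketch of the "default" mapping suggested above: use the gtfs:* tag when it is set, otherwise fall back to a normal tag. The `FALLBACKS` table covers only a few rows and its order is an assumption; `gtfs_value` is a hypothetical helper.

```python
from typing import Optional

# Assumed fallback order per GTFS field, taken loosely from the suggestions above.
FALLBACKS: dict[str, list[str]] = {
    "route_short_name": ["ref"],
    "stop_id": ["ref", "ref:IFOPT"],
    "stop_name": ["name"],
    "platform_code": ["local_ref"],
}


def gtfs_value(tags: dict[str, str], field: str) -> Optional[str]:
    """Return the explicit gtfs:<field> value, else the first fallback tag that is set."""
    explicit = tags.get(f"gtfs:{field}")
    if explicit is not None:
        return explicit
    for key in FALLBACKS.get(field, []):
        if key in tags:
            return tags[key]
    return None


route = {"type": "route", "ref": "12"}
stop = {"ref": "1234", "gtfs:stop_id": "987654"}
```

Here `gtfs_value(route, "route_short_name")` falls back to ref=*, while `gtfs_value(stop, "stop_id")` prefers the explicit gtfs:stop_id=*.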
@Kylenz, I have explained multiple times why I believe that defining defaults is a bad idea for maintainability - a section that is now included in the proposal as well.
I would like to hear if you have any counterarguments?
"Anyone consuming GTFS data via OSM should ideally support the standard tags that many cities are already using"
This is really complex, that's why I propose the gtfs:*=* tags.
Take the example given in the Proposal
Tags on OSM object
Key | Value
lat | 52.1085606
lon | 5.2565860
name | Prins Alexanderstichting
public_transport | platform
ref:IFOPT | NL:Q:50201120
zone | 5020
Which of the following 5 GTFS objects matches best?
Actual rows in stops.txt of OVApi - only taken those with stop_name = Huis ter Heide, Pr. Alexanderstichting
stop_id | stop_code | stop_name | stop_lat | stop_lon | location_type | parent_station | wheelchair_boarding
1329483 | 50201090 | Huis ter Heide, Pr. Alexanderstichting | 52.108713 | 5.256557 | 0 (platform) |  | 0 (false)
1329998 | 50201120 | Huis ter Heide, Pr. Alexanderstichting | 52.108569 | 5.256645 | 0 (platform) |  | 1 (true)
2338264 | 50201090 | Huis ter Heide, Pr. Alexanderstichting | 52.108596 | 5.256659 | 0 (platform) | stoparea:368830 | 0 (false)
2338266 | 50201120 | Huis ter Heide, Pr. Alexanderstichting | 52.10848 | 5.25666 | 0 (platform) | stoparea:368830 | 1 (true)
stoparea:368830 |  | Huis ter Heide, Pr. Alexanderstichting | 52.108538 | 5.256659 | 1 (station) |  | 0 (false)
Among these 5 options, I would say 2338266 matches best.
Here I have already matched based on name (which requires abbreviating, maybe a change of capitalisation and adding the placename)
This could possibly also be done by looking up nearby GTFS objects (based on lat/lon)
Based on ref:IFOPT=* we can reduce to 1329998 and 2338266; though they do not match exactly (we need to remove the NL:Q: part)
The only reason I would choose the second is that it is more complete. (I have no clue why the first two rows exist in the GTFS)
And there are many scenarios where matching is even more difficult. How would you have determined the right side of the road if ref:IFOPT=* was not present? (70% of public_transport=platform in NL)
I do not want to assume the lat/lon in the GTFS is precise enough to distinguish this.
So do we have to look at the routes the stops belong to - and the direction they travel?
Take a look at the pictures in https://community.openstreetmap.org/t/gtfs-in-nederland/105300/6
There you can see that the GTFS platforms are a chaos at a bus station (OSM has them neatly lined up)
And there are many GTFS stations in the African sea, which is definitely not correct.
I hope that I have demonstrated why I strongly believe that double tagging is the better 'evil'
Spaanse (talk) 09:53, 26 November 2023 (UTC)
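A minimal sketch of the ref:IFOPT=* step from the example above, using the five stops.txt rows quoted there (only the columns needed for the match). Stripping the prefix by taking the last colon-separated part and preferring rows with a parent_station are assumptions made for illustration.

```python
stops = [
    {"stop_id": "1329483", "stop_code": "50201090", "parent_station": ""},
    {"stop_id": "1329998", "stop_code": "50201120", "parent_station": ""},
    {"stop_id": "2338264", "stop_code": "50201090", "parent_station": "stoparea:368830"},
    {"stop_id": "2338266", "stop_code": "50201120", "parent_station": "stoparea:368830"},
    {"stop_id": "stoparea:368830", "stop_code": "", "parent_station": ""},
]


def candidates(ifopt: str, rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Match the last part of ref:IFOPT (after 'NL:Q:') against stop_code."""
    code = ifopt.rsplit(":", 1)[-1]
    return [row for row in rows if row["stop_code"] == code]


matches = candidates("NL:Q:50201120", stops)
# Tie-break: prefer the row that is attached to a stop area (the more complete one).
best = max(matches, key=lambda row: bool(row["parent_station"]))
```

Even in this favourable case the match needs a prefix rule and a tie-break; without ref:IFOPT=* neither step would be available.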

I agree to the section "Why not use common tags like ref, name, ... ?" as of Nov. 25, 2023, 14:19, it sounds reasonable to me. --ToniE (talk) 13:34, 25 November 2023 (UTC)


I have added an explanation of my reasoning not to mix with normal tags at the end of the proposal. There are genuine benefits to this double tagging, while there are barely any downsides. Potential downsides:

- storage: at most 5095359 [objects] * 3 [unnecessary gtfs tags / object] * 512 [bytes per tag] / 10^9 [bytes per GB] = 7.8 GB. The monthly cost for this amount of storage: $ 0.10

- more work to edit two tags: automated edits can safely keep the gtfs tags up to date, so no extra work for a human. furthermore, a human can be alerted to normal tags that are potentially wrong and verify if they need to be changed. this is less work than visiting all stops in a city for example.

Have I missed any potential downsides? Spaanse (talk) 13:42, 25 November 2023 (UTC)
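The storage estimate above can be checked with a few lines. The input numbers are the ones given in the comment; the variable names are only for illustration.

```python
objects = 5_095_359        # upper bound on tagged objects, as stated above
tags_per_object = 3        # unnecessary gtfs tags per object
bytes_per_tag = 512        # assumed upper bound on the size of one tag
bytes_per_gb = 10**9

total_bytes = objects * tags_per_object * bytes_per_tag
total_gb = total_bytes / bytes_per_gb
print(f"{total_gb:.1f} GB")
```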

At which real world object can the GTFS stub relation be verified?

I value the map what's on the ground / verifiability principle of OSM very much. That's why I would like to see in the proposal to which real world object this new relation type is primarily attached. If I am an unsuspecting mapper, what do I see on the road that makes me think "aha, let's map or check this"? As an example: Node Network routes are not in themselves physical objects. They are constructs defined by the two Nodes, e.g. Node A25 and Node B53 are physical objects carrying the Node ref and referring to each other by arrows pointing to the other Node. If I see this object I know there is a route, I know the ref (from-to), so I know which relation to check or add. Is there a comparable object for the GTFS-feed stubs? --Peter Elderson (talk) 10:47, 30 November 2023 (UTC)

Not really. The closest I could give you is a posted timetable. If practically feasible, I would prefer timetables to be included in OSM directly instead of GTFS references. However, that is unmaintainable - too much data that changes too often. Instead I suggest that we should let operators maintain the timetables in the GTFS feed, and we maintain the reference to those. One scenario that can cause someone to check the references; if the posted timetable does not match the one that their OSM app gives (if that gets implemented). But the timetable does not say what the reference should be, for that you really have to look at the feed.
Spaanse (talk) 11:20, 30 November 2023 (UTC)
Timetables -or narrowcasting devices displaying time-table information, or statements/codes referring to timetable or service information- are always present at stops and stations, I think. Would these be a required member of the feed stub relation? Then that would be the geographical real world object link. If that is the case, could you somehow make that clear in the proposal? Because now it seems like a way to dump an external service into OSM, and without the geographical link I just wonder why you would need OSM for this at all.
--Peter Elderson (talk) 13:57, 30 November 2023 (UTC)
Ignoring the cascading membership for a second; all stops/stations/routes, that reference the GTFS object for their feature, should be a member of the feed relation. (Cascading membership is a mechanism to reduce the number of direct members).
Spaanse (talk) 15:17, 30 November 2023 (UTC)

License requirement

As of right now the proposal page says

I think that we should require that feed relations only exist for feeds that are OSM compatible.

The reason for this is that linked feeds are likely going to be used to maintain routes in OSM (likely as a diff tool).

This leaves open if the GTFS feeds should be attributed somewhere if their license is open but requires attribution.

Another option is to present the feed license in a machine-readable form (SPDX license identifier, license URL, attribution string, ...)

Can you confirm if a license requirement for the GTFS feed is part of the proposal, or not? Further, under "OSM compatible" do you mean "licenses that are approved for import to OSM" or "licenses that feel compatible"? See osmf:Licence/Licence Compatibility and Import/ODbL Compatibility.

Requirement to be OSM-import-approved would exclude a _lot_ of feeds. There is a lot of "open data" which is released under fairly permissive licenses but the licenses are not import approved. Off the top of my head, Metrolinx (GO Transit) data is licensed under "Open Government Licence – Ontario – Metrolinx", while TTC data is licensed under "Open Government License - Toronto", neither of which are approved licenses.

Is the goal of this proposal to improve QA tools, to allow autogeneration of OSM data, or to make it easier for end-user applications to look up routes and departures from a stop? I would guess autogeneration of OSM data needs an import-compatible license, but the other uses might not. --Jarek Piórkowski (talk) 14:28, 30 November 2023 (UTC)

It is not part of the proposal, but I want to hear people's thoughts.
The core question here is: should licenses be checked when adding references to OSM or when using those references?
I do not know enough about licenses to know what is considered allowed, good practice or forbidden.
Some relevant questions:
- Is adding a URL to a GTFS feed subject to license terms?
- Is adding ID's of objects in the feed subject to license terms?
- How would data consumers figure out if they are allowed to use data in the GTFS feed?
- When should data consumers attribute feeds directly or does a reference to the contributors page suffice?
- Can we tag license information on the feed relation to help with this?
My main motivation for the proposal is to allow end users to look up timetables.
But I know that existing uses of the GTFS tags are from imports, and used for QA.
I presume that in those cases the licenses were determined to be import-compatible.
Spaanse (talk) 17:52, 30 November 2023 (UTC)
Background/disclaimer: I am not a lawyer; I once worked on a commercial transit app, but not on the legal side. IMHO:
- Is adding a URL to a GTFS feed subject to license terms?
If you are only linking to a URL, then no. There is plenty of precedent for that, beginning with website=*
- Is adding ID's of objects in the feed subject to license terms?
This depends. Generally, facts cannot be copyrighted, so the mere fact that stop at latlon X,Y has ID zzz1 is not subject to license terms. (Some agencies have transit stop IDs printed on the physical stop signs, so we can walk around and survey them, same as you survey house numbers.) However, taking an entire dataset of IDs and copying it into OSM is murkier, since you're much more obviously using that particular dataset - there's a concept called "database rights" recognized in some jurisdictions.
- How would data consumers figure out if they are allowed to use data in the GTFS feed?
Honestly, for the "display timetable" or "display upcoming departures" use cases, in practice no one cares. The approach of basically every transit app is that a GTFS feed that's publicly available on an open-data-ish site is allowed to use. GTFS specification doesn't even have a license field - neither Google nor any other major transit apps found it necessary in practice. Other types of users (particularly OSM imports) would have to think about licenses, but I think it is easier to have as a manual step for each feed/region supported.
- Can we tag license information on the feed relation to help with this?
AFAIK there's no standard for encoding licenses and their compatibility with OSM, and I would really advise against trying to create one here unless really necessary for your purposes. Just think, how would you indicate a dataset's compatibility with OSM license? Even if you make a tag like gtfs:licence=ogl-ca-bc + gtfs:licence:osm_compatible=yes, then the data users still need to manually verify everything, because there's nothing stopping someone from adding a falsehood to OSM and that's a potential legal issue for the data users.
--Jarek Piórkowski (talk) 03:44, 1 December 2023 (UTC)

Section for "Best Practice"?

Resolved: best practices should be discussed on the forum

Should we have a place where we can discuss and describe "best practice" in using GTFS data from external sources (non-OSM sources) in OSM? GTFS data from GTFS providers are so different.

  • For some feeds, updates are available nearly every day, for others on a quarterly basis only
  • Some feeds include multiple data sets for the same line, with overlapping dates
  • Some feeds include a single data set for a line, ignoring deviations caused by constructions
  • Some feeds include a single data set for a line, just the valid one, valid until the next GTFS data update is available
  • Some feeds keep the route_ids, trip_ids stable (or quite stable) between GTFS feed versions, others mix them: trip_id=123 can be a trip of a different route in the next update

--ToniE (talk) 14:19, 2 January 2024 (UTC)

I think discussion of best practice for usage can happen on the forum and is not that relevant for this proposal.
For linking to GTFS feeds from OSM I already included the recommendation to look at which combination of tags is the most stable.
Spaanse (talk) 11:17, 3 January 2024 (UTC)
Agreed, let's discuss there and find a place in OSM wiki to store the outcome --ToniE (talk) 15:20, 3 January 2024 (UTC)

Finding the feed

Resolved: not relevant for tagging

I'm currently working on an API for PTNA allowing to query feed specific data which returns JSON, structured like JSON of Overpass-API

--ToniE (talk) 15:18, 1 February 2024 (UTC)

Cool!
I am not going to include this in the proposal, but feel free to add it to the GTFS wiki page
Spaanse (talk) 10:59, 11 February 2024 (UTC)

Example

Resolved: Updated feed codes

More precise: a comment to List_of_GTFS_feeds

  • As the feed names are mostly derived/based on ISO, I'd suggest to have capital letters at least for the ISO part.
    • DE-BY, NL, FR-PAC, ...
    • also because 'DE' specifies the country whereas 'de' specifies the language

--ToniE (talk) 15:18, 1 February 2024 (UTC)

I have updated the feed codes to align with those used in `gtfs:feed` or PTNA
Spaanse (talk) 10:59, 11 February 2024 (UTC)