GTFS

From OpenStreetMap Wiki

GTFS (General Transit Feed Specification) is a data format created for sharing public transportation information such as bus stops, bus routes, and timetables.

GTFS is useful for consumers of OSM data who want to provide public transport routing, because in many cases timetables change so often that representing them in OSM is de facto impossible. In some cities timetables can be expected to change daily due to road or track closures and renovations. In some areas timetables change massively several times a year, for example around holidays and the start and end of the school year.

In such cases data consumers can take roads, footways, and stop positions from OSM, and take the available public transport trips directly from the transport organization.

GTFS was originally called the Google Transit Feed Specification and was developed by Google. It is now maintained by the MobilityData organization, which also maintains tools for using GTFS data, a database of GTFS data, and the General Bikeshare Feed Specification.

Structure of GTFS

A GTFS feed is a (stable) URL which publishes a ZIP file containing multiple CSV files.

It is often updated regularly, in which case a particular zip file is referred to as a version of the GTFS feed.
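As a minimal sketch of what "a ZIP file containing multiple CSV files" means in practice, the following Python snippet reads one table from a feed using only the standard library. The helper names (read_gtfs_table, build_sample_feed) and the sample stop are made up for illustration; a real feed would be downloaded from the publisher's URL first.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed_zip, filename):
    """Return the rows of one CSV file inside a GTFS ZIP as a list of dicts."""
    with zipfile.ZipFile(feed_zip) as archive:
        with archive.open(filename) as raw:
            # utf-8-sig drops a byte order mark if the producer added one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def build_sample_feed():
    """Build a tiny in-memory feed with made-up data, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Central Station,52.0,4.0\n",
        )
    buffer.seek(0)
    return buffer


stops = read_gtfs_table(build_sample_feed(), "stops.txt")
print(stops[0]["stop_name"])
```

All values are returned as strings, so numeric columns such as stop_lat need explicit conversion.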


The file stops.txt contains information about physical locations.

  • stop_id - the identifier for this location
  • stop_code, stop_name - public-facing identifier and name for the location
  • stop_lat, stop_lon - GPS coordinates; for stops, the position of the pole
  • platform_code - the identifier for a specific platform

While the file is called stops.txt, it also contains information about stations, entrances, and boarding areas. These are distinguished using location_type, and linked to a larger structure using parent_station.

location_type | Description | Parent | GTFS description | PTv2 concept
0 | GTFS stop | station | Place where passengers board/disembark | node/way/area public_transport=platform (preferred); node public_transport=stop_position; node highway=bus_stop; node railway=stop; node railway=platform; node railway=tram_stop; node amenity=ferry_terminal
1 | GTFS station | - | Physical structure or area with one or more platforms | node/area public_transport=station; relation public_transport=stop_area; relation public_transport=stop_area_group; node railway=station; node railway=halt; node/area aerialway=station
2 | entrance/exit | station | A location where passengers enter/exit a station | node railway=subway_entrance; node railway=train_station_entrance
3 | generic node | station | Location in a station, used to define pathways | -
4 | boarding area | platform | Location on a platform | -


The file routes.txt contains descriptions of routes.

A GTFS route corresponds to a relation with type=route_master.

  • route_id - the identifier for this route
  • agency_id - identifier for the agency running the route, which can be looked up in agency.txt
  • route_short_name - short identifier of the route, like a bus number
  • route_long_name - full name of route, often with destinations
  • route_type - what sort of public transport is used; bus/train/metro/...


The file trips.txt contains descriptions of trips - a sequence of stops visited at a particular time.

The actual sequence of stops is found in stop_times.txt, which references trip_id.

A GTFS trip roughly corresponds with a relation type=route, except for the inclusion of timing information.

  • trip_id - the identifier for this trip
  • route_id - route the trip belongs to
  • service_id - days of operation (calendar.txt)
  • trip_headsign - displayed destination of trip
  • trip_short_name - public-facing text to identify the trip
  • direction_id - direction of trip, whether it is inbound or outbound / clockwise or counterclockwise / ...
  • shape_id - if present, the shape of the trip between stops (shapes.txt)


The file stop_times.txt describes the time that a trip stops at a stop.

A GTFS stop time is identified by the combination of a trip_id and a stop_sequence.

It contains the stop_id of the stop visited, along with an arrival_time and a departure_time.

Additionally it can contain information on whether you can board/exit at that stop or along the route to the next stop.
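The relationship between trips.txt and stop_times.txt can be sketched as follows: collect the rows for one trip_id and order them by stop_sequence. The function name and the sample rows are made up for illustration; rows are assumed to be dicts of strings, as produced by a CSV reader.

```python
def stops_of_trip(stop_times, trip_id):
    """Return (stop_id, arrival_time, departure_time) for a trip, in visiting order."""
    rows = [row for row in stop_times if row["trip_id"] == trip_id]
    # stop_sequence is read from CSV as text, so convert it before sorting;
    # otherwise "10" would sort before "2".
    rows.sort(key=lambda row: int(row["stop_sequence"]))
    return [
        (row["stop_id"], row["arrival_time"], row["departure_time"])
        for row in rows
    ]


# Made-up sample rows, deliberately out of order.
stop_times = [
    {"trip_id": "T1", "stop_sequence": "2", "stop_id": "B",
     "arrival_time": "08:10:00", "departure_time": "08:11:00"},
    {"trip_id": "T1", "stop_sequence": "1", "stop_id": "A",
     "arrival_time": "08:00:00", "departure_time": "08:00:00"},
    {"trip_id": "T2", "stop_sequence": "1", "stop_id": "C",
     "arrival_time": "09:00:00", "departure_time": "09:00:00"},
]

for stop in stops_of_trip(stop_times, "T1"):
    print(stop)
```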


The file shapes.txt contains the paths that vehicles travel, like a GPX route.

A GTFS shape roughly corresponds with a relation type=route, except for the exclusion of the sequence of stops.

Shapes are often referenced by multiple trips that follow the same path.
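Because several trips can share one shape, grouping trips by shape_id is one way to find candidate route variants. The sketch below assumes rows from trips.txt as dicts of strings; the function name and sample data are made up for illustration.

```python
from collections import defaultdict


def trips_by_shape(trips):
    """Group trip_ids by the shape_id they reference."""
    groups = defaultdict(list)
    for row in trips:
        shape_id = row.get("shape_id")
        # shape_id is optional in trips.txt, so skip trips without one.
        if shape_id:
            groups[shape_id].append(row["trip_id"])
    return dict(groups)


# Made-up sample rows.
trips = [
    {"trip_id": "T1", "route_id": "R1", "shape_id": "SH1"},
    {"trip_id": "T2", "route_id": "R1", "shape_id": "SH1"},
    {"trip_id": "T3", "route_id": "R1", "shape_id": "SH2"},
    {"trip_id": "T4", "route_id": "R1", "shape_id": ""},
]

print(trips_by_shape(trips))
```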

Tags

General rules

There are currently two namespaces (gtfs_* and gtfs:*) in use for GTFS-related tags.

In Proposal:GTFS Tagging Standard it has been decided to use the gtfs:* namespace.

Therefore, using tags in the gtfs_* namespace is discouraged.

Any existing tags in the gtfs_* namespace are interpreted as the same tag in the gtfs:* namespace.
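A data consumer could apply this interpretation with a small normalisation step, sketched below. The function names are hypothetical, and letting an explicit gtfs:* tag win over its gtfs_* counterpart when both are present is an assumption, not something the proposal text above specifies.

```python
def normalise_gtfs_key(key):
    """Map a key in the gtfs_* namespace to the same key in gtfs:*."""
    if key.startswith("gtfs_"):
        return "gtfs:" + key[len("gtfs_"):]
    return key


def normalise_tags(tags):
    """Return a copy of an OSM tag dict with gtfs_* keys rewritten to gtfs:*."""
    result = {}
    for key, value in tags.items():
        new_key = normalise_gtfs_key(key)
        # Assumption: an explicit gtfs:* tag takes precedence over gtfs_*.
        if key != new_key and new_key in tags:
            continue
        result[new_key] = value
    return result


print(normalise_tags({"gtfs_stop_id": "1001", "name": "Example"}))
```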


When a tag references a column of a GTFS file, it should use the full name of that column.

For example, use gtfs:route_long_name=* instead of gtfs:name=*, and gtfs:stop_id=* instead of gtfs:id=*.

This makes it easier for a data consumer to find the right file and column.


Tags for linking to a GTFS object should use the gtfs:* namespace instead of standard tags like name=* and ref=*/ref:IFOPT=*.

To find the matching GTFS object we need an exact match with the value in the feed.

This differs from the requirements for standard tags, which are aimed at humans.

Using standard tags for matching means the link breaks whenever capitalisation changes or abbreviations are added or removed.

Instead use both standard tags and GTFS tags, even if they are exactly the same.

Linking to a GTFS object

In order to look up timetables for a particular stop or route, we want a way to find the corresponding GTFS object.

To establish this link, follow these steps:

Step 1: Analyse the feed

Different versions of the feed can have different IDs for the same object.

To ensure that the link to the object does not break for a new feed version, look through historic versions.

Determine which columns have a stable value, and which do not.

If a value is also used outside the GTFS feed, that is a further indication that it is unlikely to change.
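One rough way to compare two feed versions is to measure, per column, how many of the old values still occur in the new version. The sketch below is a heuristic under that assumption, not an established method; the function name and sample rows are made up. A score near 1.0 suggests a stable column, a score near 0.0 suggests the values were regenerated.

```python
def column_stability(old_rows, new_rows, columns):
    """For each column, return the fraction of old values still present in the new rows."""
    stability = {}
    for column in columns:
        old_values = {row[column] for row in old_rows if row.get(column)}
        new_values = {row[column] for row in new_rows if row.get(column)}
        if not old_values:
            # The column was empty or absent in the old version: no verdict.
            stability[column] = None
            continue
        stability[column] = len(old_values & new_values) / len(old_values)
    return stability


# Made-up rows from stops.txt in two feed versions.
old_stops = [
    {"stop_id": "1001", "stop_code": "nm"},
    {"stop_id": "1002", "stop_code": "ut"},
]
new_stops = [
    {"stop_id": "2001", "stop_code": "nm"},
    {"stop_id": "2002", "stop_code": "ut"},
]

print(column_stability(old_stops, new_stops, ["stop_id", "stop_code"]))
```

In this made-up example stop_id is regenerated between versions while stop_code survives, so stop_code would be the better column to tag.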

Step 2: Document the feed

Look at List of GTFS feeds.

If the feed the object belongs to is not listed there, add a new entry using the GTFS feed template.

The feed code can be anything, but you are encouraged to adhere to the following two rules:

  • Start the feed code with the ISO 3166-2 region code for the region the service operates in.
  • Do not include colons (:) in the feed code, as they are used as a separator in keys.

Remember the feed code for the feed.
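The two rules can be checked mechanically, as in the sketch below. The function name is hypothetical, and the prefix check is only a loose heuristic (two uppercase letters followed by a hyphen or the end of the code); it does not verify that the prefix is an assigned ISO 3166-2 code, which would require a list of codes.

```python
import re


def check_feed_code(feed_code):
    """Return a list of warnings for a proposed feed code (empty if none)."""
    warnings = []
    if ":" in feed_code:
        # Colons separate the parts of a key, so they cannot appear in the code.
        warnings.append("contains a colon")
    # Heuristic only: looks for a country-style prefix such as "NL" or "NL-".
    if not re.match(r"[A-Z]{2}(-|$)", feed_code):
        warnings.append("does not start with a region-style prefix")
    return warnings


print(check_feed_code("NL-OVApi"))
print(check_feed_code("my:feed"))
```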

Step 3: Reference the object

Find a combination of the following tags to reference a GTFS object:

Type | Tags
stop | gtfs:stop_id=*, gtfs:stop_code=*, gtfs:stop_name=*, gtfs:location_type=0, gtfs:platform_code=*
station | gtfs:stop_id=*, gtfs:stop_code=*, gtfs:stop_name=*, gtfs:location_type=1
entrance | gtfs:stop_id=*, gtfs:stop_code=*, gtfs:stop_name=*, gtfs:location_type=2
route | gtfs:trip_id=*, gtfs:trip_id:sample=*, gtfs:shape_id=*
route master | gtfs:route_id=*, gtfs:route_long_name=*, gtfs:route_short_name=*

The combination of tags should match exactly one object in the feed.

Additionally, try to avoid tagging the columns that you found to be unstable.
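Whether a chosen combination matches exactly one object can be checked against the feed before tagging. The sketch below assumes rows from stops.txt as dicts of strings; the function name and the sample data are made up for illustration.

```python
def matching_objects(rows, criteria):
    """Return the rows whose columns equal every value in criteria exactly."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Made-up rows: two platforms that share a stop_code.
stops = [
    {"stop_id": "1001", "stop_code": "nm", "platform_code": "1"},
    {"stop_id": "1002", "stop_code": "nm", "platform_code": "2"},
]

# stop_code alone is ambiguous here; adding platform_code makes it unique.
print(len(matching_objects(stops, {"stop_code": "nm"})))
print(len(matching_objects(stops, {"stop_code": "nm", "platform_code": "1"})))
```

The comparison is deliberately exact (case-sensitive, no trimming), in line with the matching rule described under "General rules".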


Note: other columns can be tagged as well, but they are not considered for matching.

Consider putting these in standard tags so that they can be read by applications that don't process the GTFS tags.

Example: colour=#008080 instead of (or in addition to) gtfs:route_color=#008080.

Default value for location_type

We sometimes need location_type to distinguish platforms and stations.

Instead of tagging it directly, it can be inferred from the type of PTv2 object using the table under "Structure of GTFS".

Because of these default values, location_type is always used when matching locations.

In the rare case that no value can be inferred (no or multiple matches), gtfs:location_type=* should be added.
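The inference could look roughly like the sketch below, which encodes the tag-to-location_type rows of the table under "Structure of GTFS". It is a simplification: the OSM element type (node, way, area, relation) is ignored, only an unsuffixed gtfs:location_type tag is treated as an explicit override, and the function name is hypothetical.

```python
# (key, value) -> location_type, following the table under "Structure of GTFS".
PTV2_TO_LOCATION_TYPE = {
    ("public_transport", "platform"): 0,
    ("public_transport", "stop_position"): 0,
    ("highway", "bus_stop"): 0,
    ("railway", "stop"): 0,
    ("railway", "platform"): 0,
    ("railway", "tram_stop"): 0,
    ("amenity", "ferry_terminal"): 0,
    ("public_transport", "station"): 1,
    ("public_transport", "stop_area"): 1,
    ("public_transport", "stop_area_group"): 1,
    ("railway", "station"): 1,
    ("railway", "halt"): 1,
    ("aerialway", "station"): 1,
    ("railway", "subway_entrance"): 2,
    ("railway", "train_station_entrance"): 2,
}


def infer_location_type(tags):
    """Return the location_type for an OSM tag dict, or None if it cannot be inferred."""
    explicit = tags.get("gtfs:location_type")
    if explicit is not None:
        return int(explicit)
    candidates = {
        PTV2_TO_LOCATION_TYPE[item]
        for item in tags.items()
        if item in PTV2_TO_LOCATION_TYPE
    }
    # No match or conflicting matches: the mapper needs to tag it explicitly.
    if len(candidates) == 1:
        return candidates.pop()
    return None


print(infer_location_type({"highway": "bus_stop", "public_transport": "platform"}))
```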

Interpretation of deprecated gtfs:id=*

Use of gtfs:id=* is discouraged because it is unclear which table of the GTFS feed it refers to.

Instead use gtfs:stop_id=*, gtfs:trip_id=*, gtfs:trip_id:sample=*, gtfs:shape_id=* or gtfs:route_id=*.

However, if the tag is present it should be interpreted as follows:

Interpretation of deprecated gtfs:name=*, gtfs:short_name=*, gtfs:long_name=*

Use of gtfs:name=* is discouraged because it is unclear which table of the GTFS feed it refers to.

Instead use gtfs:stop_name=*, gtfs:route_short_name=*, gtfs:route_long_name=*, gtfs:trip_short_name=*. However, if the tag is present it should be interpreted as follows:

Step 4: Group the tags with the feed code

Add the feed code as a suffix to the keys.

This ensures that a data consumer can find the feed and knows which columns to look at to find the right object.

Using the code as a suffix also means that we can reference objects in multiple feeds, for example for stations near borders.
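As a sketch of how suffixed keys are built, the helpers below turn column values into tags for a given feed code. The function names, the second feed code DE-Example, and the stop values are made up for illustration; NL-OVApi is used only because it appears as an example on this page.

```python
def suffixed_key(column, feed_code):
    """Build a key of the form gtfs:<column>:<feed code>."""
    if ":" in feed_code:
        # Colons are the separator in keys, so they cannot be part of the code.
        raise ValueError("feed codes must not contain colons")
    return f"gtfs:{column}:{feed_code}"


def reference_tags(columns, feed_code):
    """Turn {column: value} into OSM tags referencing an object in one feed."""
    return {
        suffixed_key(column, feed_code): value
        for column, value in columns.items()
    }


# A station near a border, referenced in two feeds at once.
tags = {}
tags.update(reference_tags({"stop_code": "nm"}, "NL-OVApi"))
tags.update(reference_tags({"stop_id": "12345"}, "DE-Example"))
print(tags)
```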

Interpretation of deprecated gtfs:feed=*

Use of gtfs:feed=* is discouraged, instead use the feed code suffixes.

However, if the tag is present it should be interpreted as if its value has been added as a feed code suffix to all GTFS keys that do not already have such a suffix.

As such, the combination gtfs:feed=NL-OVApi + gtfs:stop_code=nm is interpreted as gtfs:stop_code:NL-OVApi=nm.
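A data consumer could apply this interpretation as sketched below. To decide whether a key already carries a suffix, the sketch compares against a fixed set of column names taken from the tables on this page; that set, and the function name, are assumptions of this example rather than part of the tagging scheme.

```python
# Column names mentioned on this page; anything after them counts as a suffix.
BASE_COLUMNS = {
    "stop_id", "stop_code", "stop_name", "location_type", "platform_code",
    "trip_id", "trip_id:sample", "shape_id",
    "route_id", "route_long_name", "route_short_name",
}


def expand_feed_tag(tags):
    """Rewrite gtfs:feed=X plus unsuffixed gtfs:* keys into suffixed keys."""
    feed_code = tags.get("gtfs:feed")
    if feed_code is None:
        return dict(tags)
    expanded = {}
    for key, value in tags.items():
        if key == "gtfs:feed":
            continue
        if key.startswith("gtfs:") and key[len("gtfs:"):] in BASE_COLUMNS:
            # No suffix yet: append the feed code from gtfs:feed.
            expanded[f"{key}:{feed_code}"] = value
        else:
            # Non-GTFS keys and already suffixed keys are kept unchanged.
            expanded[key] = value
    return expanded


print(expand_feed_tag({"gtfs:feed": "NL-OVApi", "gtfs:stop_code": "nm"}))
```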


Overview of used tags

Keys with wiki pages

  • gtfs:feed=* - deprecated - describes which feed the object belongs to
  • gtfs:name=* - deprecated - the name of the object according to the GTFS feed
  • gtfs:release_date=* - the version of the feed used for the
  • gtfs:route_id=* - identifier to associate type=route_master relations with routes
  • gtfs:shape_id=* - preferred identifier for a route variant -- but not always present, does not provide information about stop positions
  • gtfs:stop_id=* - identifier to associate stops with their GTFS counterpart
  • gtfs:trip_id=* - alternative for route variants with only one trip
  • gtfs:trip_id:sample=* - fallback for identifying a route variant -- but more likely to change, provides information on stop positions and their sequence only
  • gtfs_id=* - deprecated - the id of the object in a GTFS feed

Keys by use (over 100 uses as of when this was updated)

Currently unused tags previously mentioned on this page

Alternative for stops

In Europe, the European standard IFOPT defines identifiers for public transport stops, and in some GTFS data the stop_code is identical to the IFOPT reference. In these situations, tagging both gtfs:stop_code=* and ref:IFOPT=* is encouraged.

Data sources

Visualizing GTFS

  • PTNA - nice online visualization of aggregated and correctly licensed GTFS data with tag recommendations for route relations and map overlay for shapes.

Conversion between OpenStreetMap and GTFS

OSM → GTFS

  • osm2gtfs - An extendable Python script that queries OpenStreetMap data about public transport, combines it with time information provided from a different source, and converts it into the GTFS format.

GTFS → OSM

  • GO-Sync (aka gtfs-osm-sync) - a desktop tool to synchronize GTFS feeds with OSM
  • GTFS-OSM-Validator - console tool that will read GTFS and output exact problems it finds in OSM
  • gtfs-sql-importer - This tool can convert GTFS to SQL postgis schema where GTFS can be further manipulated. More examples of this tool can be found in GTFS SQL examples.
  • GTFS-OSM-Import - Open-source tool to automate and simplify as much as possible imports of GTFS data into OSM.

Editor support

Software using tags

Discussions

External links