Proposed Import of STL GTFS
This page describes the proposed import process of GTFS data for the STL bus service provider in Laval, Quebec, Canada.
I (arthur_d) and other mappers from the area have been discussing for a couple of months how to carry out the respective imports for operators in the Greater Montreal Area. The respective wiki pages can be viewed here.
The current state of the network (before the import) is shown in the image to the right. There are very few stops and a total of 4 routes mapped, 3 of which are incomplete. Nevertheless, I manually corrected the tag information to meet PTv2 as well as GTFS requirements, which makes handling the existing data during the import easier. This was only possible because very little was already mapped. In future bulk-import updates of the network, there should be no need for manual corrections, because everything should already be identical or very close to the PTv2 and GTFS standards.
Proposed Import Process
I propose to do the import using a Python script that generates JOSM XML files and automatically handles conflation both within the GTFS dataset and with existing OSM data. It does not delete any existing data; instead, it adds an action=modify attribute to existing data where needed and preserves all existing ids. It also makes sure that changes to existing data cascade to relation members. The script takes an STL GTFS zip file as input and executes the steps summarized below (latest version from here). The script is a work in progress and is meant as a tool to help, not to fully automate everything. It can be accessed here.
- Unzip the GTFS archive and load the data into memory.
- Using calendar.txt, find the latest service_id to ensure only the most recent data is imported.
- Filter the whole GTFS dataset, keeping only data that corresponds to the latest service_id.
- Write the GTFS stops to GeoJSON for visualization (to spot apparent and potential problems).
- Download existing OSM data using overpy: existing bus stops (nodes), route relations, and route_master relations, along with their member nodes and ways. Convert the data into an internal format the script can work with.
- Handle conflation within the GTFS dataset stops (explained in detail below).
- Create new OSM nodes for the stops, making sure not to duplicate or delete existing stops (explained in detail below).
- For each route (route_id), find the longest trip (the trip_id with the largest number of stops).
- Create OSM route and route_master relations from the identified longest trip, respecting PTv2 requirements, and handle conflation with existing relations (explained in detail below).
- Write the JOSM XML files (nodes and relations).
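The service-filtering steps above can be sketched as follows. This is a minimal illustration, not the actual script: the field names come from the GTFS specification, while the sample service_ids and trips are invented for the example.

```python
import csv
import io

# Sample calendar.txt and trips.txt contents (invented data).
# GTFS dates are YYYYMMDD strings, so lexicographic order is chronological.
calendar_txt = """service_id,start_date,end_date
JUIN2023,20230601,20230820
AOUT2023,20230821,20231105
"""

trips_txt = """route_id,service_id,trip_id
12,JUIN2023,12-A
12,AOUT2023,12-B
20,AOUT2023,20-A
"""

def latest_service_id(calendar_csv):
    # The service with the latest end_date is considered the current one.
    rows = csv.DictReader(io.StringIO(calendar_csv))
    return max(rows, key=lambda r: r["end_date"])["service_id"]

def filter_trips(trips_csv, service_id):
    # Keep only trips belonging to the latest service.
    rows = csv.DictReader(io.StringIO(trips_csv))
    return [r for r in rows if r["service_id"] == service_id]

sid = latest_service_id(calendar_txt)
trips = filter_trips(trips_txt, sid)
print(sid, [t["trip_id"] for t in trips])  # AOUT2023 ['12-B', '20-A']
```

The same filtering would then be applied to stop_times.txt and the other files so that only the current schedule is imported.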
Conflation handling within GTFS dataset
In the GTFS dataset, there are stops less than 2 metres from each other that have unique stop_ids and unique stop_codes but the same stop_name. This is clearly an internal management convention of the operator, but it is not very useful for OSM. All stops within x metres of each other (the distance used was 5 metres) are merged together in the following way:
A new stop is created with the attributes (and corresponding OSM tags) of the first stop; the stop codes are joined together, separated by a semicolon (;).
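A rough sketch of this merge step, under the assumption that stops are simple dicts; the structure and names are illustrative, not the script's actual code.

```python
import math

MERGE_DIST = 5.0  # metres

def distance_m(a, b):
    # Equirectangular approximation; accurate enough for a few metres.
    lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(lat) * 6371000
    dy = math.radians(b["lat"] - a["lat"]) * 6371000
    return math.hypot(dx, dy)

def merge_close_stops(stops):
    # Keep the first stop's attributes; append later stop codes with ";".
    merged = []
    for stop in stops:
        for kept in merged:
            if distance_m(kept, stop) < MERGE_DIST:
                kept["stop_code"] += ";" + stop["stop_code"]
                break
        else:
            merged.append(dict(stop))
    return merged

# Invented sample data: two stops ~1.4 m apart plus one distant stop.
stops = [
    {"stop_id": "1", "stop_code": "41001", "stop_name": "Terminus Cartier",
     "lat": 45.56000, "lon": -73.72100},
    {"stop_id": "2", "stop_code": "41002", "stop_name": "Terminus Cartier",
     "lat": 45.56001, "lon": -73.72101},
    {"stop_id": "3", "stop_code": "41200", "stop_name": "Concorde",
     "lat": 45.56100, "lon": -73.71000},
]
result = merge_close_stops(stops)
print(len(result), result[0]["stop_code"])  # 2 41001;41002
```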
Conflation handling with existing stops in OSM
In a similar fashion, conflation between the already-merged GTFS stops and existing OSM stops is handled by checking whether any existing OSM stop lies within x metres (the distance used was 5 metres) of each GTFS stop. If one does, the tags and coordinates of the existing OSM stop are replaced with the corresponding information from the GTFS stop, keeping the existing OSM id and other information intact. In a single case there was more than one existing OSM stop in the vicinity of a GTFS stop; I handled this case manually and reran the script. This proximity check is only done for existing stops within the boundaries of the island of Laval. Existing stops on the island of Montreal are not checked, so as not to override GTFS-specific information on stops already imported from other operators (for example the STM). This can potentially create a duplicate stop in Montreal (if STL and STM stops share the same coordinates), but considering that the number of STL stops on the island of Montreal is very low, I consider a manual check after the import good enough. To avoid this problem in future imports of GTFS datasets from operators with overlapping geographic coverage, perhaps the stop's ref tag should also be connected to an operator tag.
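The matching logic described above could look roughly like this. It is a hypothetical sketch: the dict layout, the negative-id convention for new nodes, and the omitted Laval boundary check are all assumptions, not the script's real code.

```python
import math

MATCH_DIST = 5.0  # metres

def distance_m(a, b):
    # Equirectangular approximation; fine for distances of a few metres.
    lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(lat) * 6371000
    dy = math.radians(b["lat"] - a["lat"]) * 6371000
    return math.hypot(dx, dy)

def conflate_with_osm(gtfs_stop, osm_nodes):
    """Reuse a nearby existing OSM node if one is found (the Laval
    boundary filter is assumed to have been applied to osm_nodes already);
    otherwise create a new node. Returns (node, was_existing)."""
    nearby = [n for n in osm_nodes if distance_m(n, gtfs_stop) < MATCH_DIST]
    if len(nearby) > 1:
        # More than one candidate: resolve manually, as described above.
        raise ValueError("multiple OSM stops near GTFS stop; resolve manually")
    if nearby:
        node = nearby[0]
        # Overwrite position and tags, keep the existing OSM id.
        node["lat"], node["lon"] = gtfs_stop["lat"], gtfs_stop["lon"]
        node["tags"].update(gtfs_stop["tags"])
        node["action"] = "modify"  # flags the change in the JOSM XML output
        return node, True
    # No match: new node with a placeholder negative id.
    return {"id": -1, "lat": gtfs_stop["lat"], "lon": gtfs_stop["lon"],
            "tags": dict(gtfs_stop["tags"])}, False

# Invented example: one existing OSM stop ~1.4 m from the GTFS stop.
osm = [{"id": 123456, "lat": 45.56001, "lon": -73.72101,
        "tags": {"highway": "bus_stop", "name": "old name"}}]
gtfs = {"lat": 45.56000, "lon": -73.72100,
        "tags": {"highway": "bus_stop", "name": "Terminus Cartier",
                 "ref": "41001;41002"}}
node, was_existing = conflate_with_osm(gtfs, osm)
print(node["id"], was_existing, node["tags"]["name"])
# 123456 True Terminus Cartier
```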
Conflation handling with existing relations in OSM
Once the route relations are created by the script, it compares them with existing OSM route relations that match on the name, operator, and ref tags, when applicable. Any matching existing route relation is assigned new tags (from the GTFS dataset), and its member nodes are all replaced with the new nodes produced by the stop creation and conflation handling of the previous steps. The member ways remain intact, as they are better handled manually in JOSM (if they need to be changed or updated).
Route_master relations are handled in a similar fashion: existing route_master relations all get new tags, and in this case the member relations are completely replaced by the new ones (already created and conflation-handled in the previous steps).
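For the route relations, the matching-and-replacement rule could be sketched as below. The member layout and function names are illustrative assumptions.

```python
def conflate_route(new_rel, existing_rels):
    """Match an existing route relation on name/operator/ref; if found,
    take the new tags, replace node members, keep way members."""
    match_keys = ("name", "operator", "ref")
    for rel in existing_rels:
        if all(rel["tags"].get(k) == new_rel["tags"].get(k) for k in match_keys):
            rel["tags"].update(new_rel["tags"])
            ways = [m for m in rel["members"] if m["type"] == "way"]
            nodes = [m for m in new_rel["members"] if m["type"] == "node"]
            rel["members"] = nodes + ways  # new stop nodes, original ways
            rel["action"] = "modify"       # flags the change in the JOSM XML
            return rel
    return new_rel  # no match: the new relation is added as-is

# Invented example: an existing relation for route 20 gets updated.
existing = [{"tags": {"name": "Bus 20", "operator": "STL", "ref": "20"},
             "members": [{"type": "node", "ref": 1}, {"type": "way", "ref": 10}]}]
new_rel = {"tags": {"name": "Bus 20", "operator": "STL", "ref": "20",
                    "gtfs:route_id": "20"},
           "members": [{"type": "node", "ref": 2}, {"type": "node", "ref": 3}]}
rel = conflate_route(new_rel, existing)
print([m["ref"] for m in rel["members"]])  # [2, 3, 10]
```

For route_master relations the same matching would apply, except that the member relations are replaced wholesale rather than filtered by type.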
During any conflation handling with existing data (nodes or relations), the existing node, way, or relation gets an action=modify attribute written in the output XML.
The GIF below shows the route_master relations after merging the script-generated XML file with the existing OSM data.
Update on the import
I've already done the initial import, and there were some problems that arose due to faulty conflict resolution on my part in JOSM. I tried setting the merge to accept all new values automatically in JOSM, but many features ended up with the same key-value tags. I'm not sure what happened or whether I did something incorrectly, but in any case the problem was detected early, and I corrected it by rerunning the script and inspecting the conflicts one by one. In fact, this problem was a good testament to the reusability of the script for future bulk updates.