FOSSGIS/Server/Projects/Public Transport Data Integration and QC

From OpenStreetMap Wiki

Public Transport Data Integration and QC

Public Transport Data Integration and Quality Control

Main Contact

User:Polyglot

I can read German, but I am more used to writing proposals in English.

Other people involved

None at the moment, although I sometimes fall back on other Belgian contributors when I get stuck.

Short description of your project

Over the past few years I've been developing some scripts. On the one hand, there are scripts to assist with creating and updating route relations and with comparing upstream data with what is already present in OSM. These scripts run in the scripting plugin environment of JOSM.

On the other hand, there are scripts to build a PostGIS DB with data from transport companies (initially the two Belgian operators De Lijn and TEC, but there is no reason why this couldn't be expanded to other operators who release their data). The service could also help to report problems with route relations of PT companies which don't release their data, but in that case it can't check whether the stop sequences are still up to date or whether lines are missing or modified, which largely defeats the purpose. All it can do for those is check whether the sequence of ways is continuous and whether all stops are served in the correct order. This could be useful, but I'd like to start with a smaller scope (and geographical region) as a proof of concept.

The scripts are all Python 3. All the code is released as free software since the beginning. I'm a believer, I guess.

I'm looking for a platform to involve more contributors in adding and fixing the route relations for public transport. It's a community project after all. To make that possible, contributors need to know where their attention is needed.

So on the one hand I want to show which lines are missing, which route relations are broken and which route relations don't correspond (anymore) to the services the PT operators offer.

The same goes for the stops, but that is conceptually a lot 'simpler'.

The other aspect is that I'd like to provide a web service (SOAP?) which can be accessed from the JOSM script I'm developing, so that it becomes possible to request data for all variations of a given line. That script can then transfer this 'new' situation to the existing route relation and calculate a new sequence of ways to fix that route relation. The final step is for the contributor to check the computed route relation. The load on the server for this will be very small.


So, what I want to do on the server is fetch new data from the PT companies and load it into PostGIS as a batch job, once a day. De Lijn publishes new information at most 2-3 times a week (more commonly once every 2 weeks), TEC once every 3 months.

Then perform an Overpass API query, which for Belgium currently produces 20 MB of data, and load that data into the PostGIS DB. This step might not be necessary if the dev server has up-to-date production data to compare with.

  • It may make sense to perform smaller Overpass queries in between those batch jobs, fetching what people are actually working on
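As a rough illustration of the Overpass step, the sketch below only builds the query text and the POST body; it does not contact any server. The endpoint URL, the choice of `bus` and `tram` routes, the timeout and the function names are assumptions made for this example, not what the existing scripts use.

```python
from urllib.parse import urlencode

# Assumption: the public Overpass instance; another instance would work the same way.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_route_query(country_code="BE", timeout=180):
    """Return Overpass QL selecting PT route relations inside one country."""
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'area["ISO3166-1"="{country_code}"][admin_level=2]->.a;\n'
        # Assumption: only bus and tram routes are of interest here.
        'relation["type"="route"]["route"~"^(bus|tram)$"](area.a);\n'
        "out body;\n"
        # Recurse down to the member ways and nodes (stops included).
        ">;\n"
        "out skel qt;\n"
    )


def build_request_body(query):
    """Encode the query as the form body for a POST request to Overpass."""
    return urlencode({"data": query}).encode("utf-8")
```

The body returned by `build_request_body(build_route_query())` could then be posted to `OVERPASS_URL`, for example with `urllib.request.urlopen`, from the daily batch job.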

Alternatively, I could continue doing the above on my laptop, dump the resulting DB, send it over in zipped form and reload it on the server.

It would make more sense to keep it all together though, such that it's not dependent on 'external factors'.

Compare the stops and produce reports, which are presented to contributors through a web interface.
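A minimal sketch of what such a stop comparison could look like, assuming both sides can be reduced to a mapping from the operator's stop reference to the stop name. The function name, the report keys and the sample data are made up for illustration.

```python
def compare_stops(operator_stops, osm_stops):
    """Compare two {stop_ref: name} mappings and report the differences."""
    operator_refs = set(operator_stops)
    osm_refs = set(osm_stops)
    return {
        # Stops the operator publishes but which are not mapped yet.
        "missing_in_osm": sorted(operator_refs - osm_refs),
        # Stops mapped in OSM which the operator no longer lists.
        "unknown_to_operator": sorted(osm_refs - operator_refs),
        # Stops present on both sides but with a different name.
        "name_differs": sorted(
            ref
            for ref in operator_refs & osm_refs
            if operator_stops[ref] != osm_stops[ref]
        ),
    }


# Made-up sample data, not real operator references.
report = compare_stops(
    {"101": "Station", "102": "Markt", "103": "Kerk"},
    {"101": "Station", "102": "Grote Markt", "999": "Oud Depot"},
)
```

Each list in the report could become one table on the web page, so contributors see at a glance which stops need attention.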

Also process all route relations regularly and check whether the stop sequences for all line variations are still correct. At the same time, check whether the ways form an uninterrupted sequence.
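The continuity check can be sketched as follows, assuming each member way is available as a list of node ids in the order it appears in the relation. This is a simplified illustration, not the logic of the existing scripts: a way may be traversed in either direction, but ways joined part-way along (roundabouts, for instance) are not handled.

```python
def find_gaps(ways):
    """Return the indexes i where ways[i] does not connect to ways[i - 1].

    `ways` is a list of node-id lists, in relation order.
    """
    gaps = []
    reachable = set()  # end nodes where the next way may attach
    for index, nodes in enumerate(ways):
        first, last = nodes[0], nodes[-1]
        if index == 0:
            # The first way may be travelled in either direction.
            reachable = {first, last}
            continue
        next_reachable = set()
        if first in reachable:
            next_reachable.add(last)
        if last in reachable:
            next_reachable.add(first)
        if not next_reachable:
            gaps.append(index)
            next_reachable = {first, last}  # restart after the gap
        reachable = next_reachable
    return gaps


# Made-up node ids: the third way is reversed, the fourth is disconnected.
example_gaps = find_gaps([[1, 2, 3], [3, 4], [6, 5, 4], [7, 8]])
```

An empty result means the ways form an uninterrupted sequence; otherwise the indexes point at the places a contributor should look at.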

Preview

https://www.youtube.com/playlist?list=PLO3wjvbFUESEZ2P-jKzYqtr7HzIs0KXJu

Video 1:

  • 2:36 Resulting relation with stops in the correct order as generated by the script
  • 3:42 Script is run in JOSM to add ways to the route relation

Video 3:

  • 0:00 The Python script is run, the route relations generated and the result sent to JOSM

Why is this project interesting to the OSM community?

Broken route relations for PT are quite useless. They also need to be kept up to date, which is an ongoing process.


What special software will you need?

PostGIS and Python 3.x (preferably) or 2.7

What resources will you need?

5-8 GB disk space. Less if there is a minutely updated DB with OSM data which can be queried. The code is written to be as efficient as possible. For instance, I prepare the data as CSV, then use COPY to load it into dedicated tables: one for each PT company and one containing data about the stops present in OSM.

This slows down the querying a little, but it's a lot more efficient than using UPDATE statements on a single table.
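The CSV-then-COPY approach could look roughly like this. It is a sketch under assumptions: the table name, the column list and the function names are hypothetical, and `cursor` is assumed to be a psycopg2 cursor, whose `copy_expert(sql, file)` method streams a file-like object to the server.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Write rows to an in-memory CSV file and rewind it for reading."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    buffer.seek(0)
    return buffer


def copy_stops(cursor, table, rows):
    """Bulk-load stop rows into `table` with a single COPY statement.

    Assumptions: `cursor` is a psycopg2 cursor, `table` is a trusted
    name from our own configuration (it is interpolated into the SQL),
    and the columns ref, name, lat, lon are hypothetical.
    """
    sql = f"COPY {table} (ref, name, lat, lon) FROM STDIN WITH (FORMAT csv)"
    cursor.copy_expert(sql, rows_to_csv_buffer(rows))
```

With one table per operator, a daily refresh could truncate the table and call `copy_stops` once, instead of issuing an UPDATE per changed row.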

Where is your source code?

Initially I was documenting it here:

WikiProject_Belgium/De_Lijndata

I recently started using github:

https://github.com/PolyglotOpenstreetmap/OSM_PublicTransportRoutes

I'll convert this to code which runs on Linux. I have ample experience with Linux. The Python code is cross-platform, except for some file names. At the moment it sends the resulting data directly to JOSM RC. That will need to be changed, of course, to a page on the web server containing a URL which does the same.

This is the code which runs inside JOSM:

https://github.com/PolyglotOpenstreetmap/Python-scripts-to-automate-JOSM

  • FindWaysBelongingToRoutesStartingFromStops.jy
  • compareRoutes.jy

If it becomes possible to create a web service, I'll change that code. The whole process can then be improved from pushing all route relations at once to pulling the data as needed.

The code for the web service and for the reports showing which route relations need attention still has to be developed. That should be relatively straightforward once a platform is available.

What data do you need?

OSM data for the area where the operators are active: at the moment Belgium and a little beyond its borders (Breda, Aachen, the Dutch province of Zeeland, Luxembourg and a few villages in the north of France).

The data should preferably be as fresh as possible, but it can also be fetched with an Overpass query. It's OK if it's a day old.

Related Projects

Maybe the relation checker

State

new