TfL Cycling Infrastructure Database

From OpenStreetMap Wiki

Transport for London (TfL) have created a database of cycling infrastructure, containing 240,000 assets and covering all of Greater London. It was released as open data on 1st August 2019.

This groundbreaking database contains key cycle infrastructure assets within Greater London, both on and off the carriageway.

A map of the data has been made available (see below).

TfL wish to conflate this database with OSM to make this data more accessible for the benefit of cyclists.

About the CID database

TfL’s official press release stated:

“The world’s first Cycling Infrastructure Database will be the most comprehensive database of cycling infrastructure ever collected in London. [...] TfL has amassed data on every street in London, cataloguing almost 146,000 cycle parking spaces, 2,000 km of cycle lanes and more than 58,000 cycle signs and street markings. This information will be released as open data alongside a new digital map of cycle routes, will make journey planning and cycle parking much easier, as well as offering valuable information to TfL and the boroughs for planning future investment in cycling.”

Each asset is accompanied by two photos illustrating it, which will considerably enhance the ability of OSM mappers to merge data in remotely.

The data is a snapshot in time, collected between January 2017 and May 2018 by a team of professional surveyors.

TfL is keen to make this available to the OpenStreetMap community under a compatible open license, to ensure maximum use of the CID. TfL is also potentially willing to consider tool development to help facilitate sensitive merging in of this data.

Goals

The goal is to conflate the TfL CID database with OSM, adding further cycling-related assets and giving richer attributes to existing cycling assets within the Greater London area. The conflation process will make the detailed CID dataset more readily available to the OSM and cycling communities, in the hope that it will benefit new and future cycle-based users of OpenStreetMap, for example through improvements to cycle routing.

CID schema

TfL has published the CID Schema.

TfL have a version of the database with a further field that associates each asset with the relevant nearby OSM way, derived using GIS analysis.

Two images accompany each asset. These have been processed to meet data and privacy regulation.

Licensing

The OSMF Licensing Working Group have confirmed that they "believe that it is unproblematic to use this data in or as a source for OpenStreetMap", as noted in their minutes and as posted on talk-gb.

Earlier detail:

TfL has made the data available under its open data license. This is the Transport Data Service, which is "based on version 2.0 of the Open Government Licence with specific amendments for Transport for London".

The OSMF Licensing Working Group has been contacted for their view on the compatibility of this license with the OSM Contributor Terms. We note the LWG's comments about Open Government Licence (OGL) based licences.

Discussions during the summer have established the following, as noted on talk-gb:

  • The license is indeed TfL's open data license, which is based on Open Government Licence v2 with some changes.
  • As of 17th July 2019, the license mentions that the data contains Geomni UKMap data.
  • The data was collected by the surveyors using UKMap as a background map, and then checking was later performed using aerial imagery from the same supplier.
  • Geomni have confirmed they do not regard themselves as having residual data rights in the released data, because TfL "haven't simply copied features from our data".
  • There is no use of Ordnance Survey data at all.
  • TfL are happy with commercial / non-commercial use of the released data.

Work to Date

CycleStreets were commissioned by TfL to create a report aimed at facilitating re-use of this data within OSM. This was delivered to TfL on 18th November 2019 and may be available from them.

The deliverables of this report were:

  1. Establish a mapping between the CID schema and geography types and the OpenStreetMap tagging system and geography types.
  2. Review the TfL open data licence and provide recommendations on licence compatibility with OpenStreetMap in regards to adding the CID data into OpenStreetMap.
  3. Identify options (e.g. tools, other arrangements) whereby TfL can utilise crowdsourcing to keep the CID up to date without introducing licence restrictions which are incompatible with their own open data licence.
  4. Undertake a comprehensive review of existing specialist OpenStreetMap data import (conflation) and data collection/data update tools. Provide recommendations as to which tool(s) are most suitable and whether any further tool development is required.
  5. If further tool development is required, outline the scope and engage with the relevant parties to provide TfL an estimate of the range of potential cost.
  6. Commence community engagement and report initial findings to TfL. In particular, is the OpenStreetMap community supportive of the data being added, and are they likely to engage with the process of adding and maintaining the data? This will give TfL a better view of how to proceed (e.g. is it worth proceeding to tool development, and will the tools be used by the community or should TfL plan for the tools to be used by paid mappers?)

CycleStreets undertook a full analysis of the CID data and of how each asset and field might be converted to OSM, and invited comments from the community on the proposed mapping of CID fields to OSM tags.

A demonstrator map was created by CycleStreets, and comments on the quality and usefulness of the data were sought from the OSM community. CycleStreets' analysis was that the data is of excellent quality and very suitable for conflation into OSM, increasing both comprehensiveness and metadata quality.

Demonstrator Map Link

Usage notes: The controls on the right of the map allow the different feature types to be selected. The OSM layer (available at zoom level 19+) also provides a live feed from the OSM API, to enable quick comparisons. The two photos of each asset are shown, which will be particularly useful for OSM mappers verifying the data; all of the roughly half a million photos have been cleared for GDPR purposes.

Current Work

In February 2022, Sweco and GHD were commissioned by TfL to undertake a programme completing the migration of the CID to OSM.

A suite of scripts developed by CycleStreets compares the CID with existing OSM data. The outstanding assets are then conflated through manual validation and uploaded using JOSM.

Optimisations to this process have been identified for certain types of assets using the OpenStreetMap API. This has been assessed as low risk and determined to meet the 'acceptable usage' threshold, as it only applies to amending asset tags without any geometry change. Manual validation is carried out whenever an inconsistent tag value is spotted.

The programme is coordinated closely with TfL who are also supporting through additional quality checks.

As of July 2022, Sweco have commissioned CycleStreets for a small piece of work to resolve the remaining conversion definition issues in the GitHub repo. On behalf of CycleStreets, User:Richard will implement any agreed changes in the conversion script.

Process

Import Data

Background

Data source site

Data license

Type of license: Based on version 2.0 of the Open Government License with specific amendments for Transport for London

ODbL compliance verified: yes (see above)

OSM Data Files

  • OSM files are generated directly from CID JSON data, then conflated with OSM either manually or semi-automatically (see below)

Import Type

  • One-time import
  • Majority of assets conflated using a manual process involving JOSM
  • Some simple assets to be conflated using a semi-automated script

Data Preparation

Data Reduction & Simplification

Only CID data that can be readily conflated with OSM will be imported. This means that the following CID asset types will not be conflated as part of this project:

  • Advance stop lines
  • Restricted points
  • Signage
  • Signals
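The exclusion above can be sketched as a simple filter. This is only an illustration: the `asset_type` key and the sample records are hypothetical, and the real CID files may identify asset types differently.

```python
# Asset types excluded from conflation, as listed above.
EXCLUDED_ASSET_TYPES = {
    "advance stop lines",
    "restricted points",
    "signage",
    "signals",
}


def select_conflatable(assets):
    """Return only the assets whose type is not excluded.

    Each asset is assumed to be a dict with a hypothetical "asset_type" key;
    this is an illustrative structure, not the CID schema.
    """
    return [
        asset
        for asset in assets
        if asset["asset_type"].strip().lower() not in EXCLUDED_ASSET_TYPES
    ]


# Made-up sample records, for illustration only.
sample = [
    {"id": "a1", "asset_type": "Cycle parking"},
    {"id": "a2", "asset_type": "Signage"},
    {"id": "a3", "asset_type": "Traffic calming"},
]
kept = select_conflatable(sample)
print([asset["id"] for asset in kept])
```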

Tagging Plans

CID attributes will be converted to OSM compatible tags as described on the project attribute conversion page

Changeset Tags

Changesets will contain conflated data related to a specific asset type and London Borough. The Comment tag will describe the feature type being conflated, e.g. “add traffic calming”. Changesets will be published from individual OSM accounts, one for each individual involved in the conflation process. These accounts have been created solely for, and will only be used for, the conflation of TfL CID data. These are:

Data Transformation

To transform the CID data to OSM, a Ruby script has been produced. This script is run on a daily basis and generates candidate OSM entities to be manually inspected and conflated in JOSM. The basic process is as below:

  • Download current TfL CID data
  • Download current OSM data
  • Upload to a local PostgreSQL/PostGIS database
  • Apply a range of tests to compare CID assets with OSM data. Each asset type is then classified for manual conflation:
    • New – Not identified in OSM
    • To Check – Requires further manual checking against OSM
    • Full – Asset is matched to an existing OSM feature for the OSM feature's entire length
    • Partial – Asset is matched to an existing OSM feature for part of the OSM feature's length
    • Unmatched – Asset is not matched to an existing OSM feature
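As a rough illustration of the Full, Partial and Unmatched categories above, the sketch below classifies a match by how much of an OSM feature's length a CID asset covers. It is not the Ruby script's actual logic, which is not described here: the function name, the length inputs and the 5% tolerance are all assumptions, and the New and To Check categories are not modelled.

```python
def classify_match(osm_length_m, matched_length_m, tolerance=0.05):
    """Classify a CID asset's match against one OSM feature.

    osm_length_m: length of the OSM feature in metres.
    matched_length_m: length of that feature covered by the CID asset.
    tolerance: assumed slack (as a fraction) for treating coverage as complete.
    """
    if osm_length_m <= 0:
        raise ValueError("OSM feature length must be positive")
    if matched_length_m <= 0:
        return "Unmatched"
    coverage = matched_length_m / osm_length_m
    if coverage >= 1.0 - tolerance:
        return "Full"
    return "Partial"


print(classify_match(100.0, 98.0))
print(classify_match(100.0, 40.0))
print(classify_match(100.0, 0.0))
```

In practice the lengths would come from a spatial comparison (for example in PostGIS), and any tolerance would need tuning against real data.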

Data Transformation Results

The latest OSM output files from the script can be found here.

Data Merge Workflow

Team Approach

Initial conflation work will be conducted by an individual supported by GHD/Sweco/TfL. Once the process is suitably advanced, multiple resources will be brought in to assist conflation. All team members will take part in weekly reviews to consider the progress made and discuss any issues discovered during the previous week's conflation work.

Process Workflow

Conflation to be conducted asset by asset

  • Obtain the latest osm.xml files from here
  • Import into JOSM
  • Obtain latest OSM data as a JOSM layer
  • Manually parse through CID features and determine if CID feature is either new, existing, or not relevant:
    • New Features - Copy feature from CID layer to OSM layer and inspect tagging for OSM compliance
    • Existing Feature – Check existing OSM tagging against CID and add additional or more detailed tagging where required and where OSM compliant
    • Not Relevant – Mark CID feature so that it will not appear in later CID exports

Process Optimisation

Optimisation of this conflation process is practicable for a subset of assets where the extent of the transformation is very limited: cases where the tags of existing OSM features are incomplete or incorrect, so that the update involves no geometry changes. These asset sets are: CLT Footways Full, CLT Footways Partial, CLT Roads Full, CLT Roads Partial, CLT Roads Separate, CLT Roads Contra, Sideroads at Junction, Crossings Tags Changed, Crossings to Check, Sideroads Existing, Crossings Junctions, Tables, Bumps road and Bumps cycleway. The script used for this has been tested and validated in the sandbox. This process will also be run on an asset-by-asset basis.

  • Obtain the latest osm.xml files
  • For the specific feature types identified above, a Python script compares each CID asset with the OSM feature that has the matching osm_id, to identify instances where the OSM tags are incomplete or incorrect.
  • The script transposes and commits this detail to OSM, with full manual inspection and validation in each instance
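The tag comparison step can be pictured with a minimal sketch like the one below. It is not the project's script: the function name and the sample tag values are made up for illustration, and it only shows the idea of separating tags that are missing in OSM from tags whose values conflict and so need manual validation.

```python
def diff_tags(osm_tags, cid_tags):
    """Compare tags on an OSM feature with tags derived from the matching CID asset.

    Returns (missing, conflicting):
      missing     - tags present in the CID-derived set but absent from OSM
      conflicting - tags present in both with different values, as
                    {key: (osm_value, cid_value)}
    """
    missing = {k: v for k, v in cid_tags.items() if k not in osm_tags}
    conflicting = {
        k: (osm_tags[k], v)
        for k, v in cid_tags.items()
        if k in osm_tags and osm_tags[k] != v
    }
    return missing, conflicting


# Illustrative tag sets for one feature, already matched by osm_id.
osm = {"highway": "cycleway", "surface": "asphalt"}
cid = {"highway": "cycleway", "surface": "paving_stones", "lit": "yes"}
missing, conflicting = diff_tags(osm, cid)
print(missing)
print(conflicting)
```

Keeping conflicting values separate from missing ones means nothing already in OSM is overwritten without a person looking at it first.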

Changeset Size Policy

  • Changesets will be geographically local to a London Borough
  • Changesets will be limited to a maximum size of 100 conflated assets per commit
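A minimal sketch of how assets could be batched to respect this policy is shown below. The `borough` key and the sample data are hypothetical; only the limit of 100 assets and the per-borough grouping come from the policy above.

```python
from collections import defaultdict

MAX_ASSETS_PER_CHANGESET = 100  # limit stated in the policy above


def plan_changesets(assets, max_size=MAX_ASSETS_PER_CHANGESET):
    """Group assets by borough, then split each group into batches of at most max_size.

    Each asset is assumed to be a dict with a hypothetical "borough" key.
    Returns a list of (borough, batch) tuples.
    """
    by_borough = defaultdict(list)
    for asset in assets:
        by_borough[asset["borough"]].append(asset)

    batches = []
    for borough, group in by_borough.items():
        for start in range(0, len(group), max_size):
            batches.append((borough, group[start:start + max_size]))
    return batches


# Made-up data: 250 assets in one borough and 30 in another.
assets = [{"id": i, "borough": "Camden"} for i in range(250)]
assets += [{"id": i, "borough": "Hackney"} for i in range(250, 280)]
plan = plan_changesets(assets)
print([(borough, len(batch)) for borough, batch in plan])
```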

Revert Plans

Existing checks on the quality of CID data, together with ensuring that the conflation task is conducted by well-trained and supervised staff, should reduce the risk of commits needing to be reverted. However, the need to revert commits may arise from:

  • Comments made by OSM users against commits (we will regularly check for comments)
  • Issues encountered during the conflation process
  • Issues identified during QA checks

Should the reversion of commits be required, the Reverter plugin within JOSM will be used on the selected changesets.

QA

Quality Assurance will be conducted by both GHD/Sweco and TfL during the lifetime of the conflation process:

  • A random sample of 5% of CID conflated features will be independently checked against OSM
  • Checking will confirm asset location and tagging matches original CID data
  • Checking will confirm that existing relevant OSM tagging has not been lost or overly simplified during conflation
  • Checking will confirm that conflation follows OSM guidelines for feature tagging
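Drawing the 5% sample could look something like the sketch below. How the sample is actually selected is not described here, so the function, the rounding up to at least one asset and the optional seed are all assumptions.

```python
import random


def qa_sample(asset_ids, percent=5, seed=None):
    """Pick a random sample of asset IDs for independent checking.

    The sample size is rounded up (an assumption), so small batches still
    get at least one check. Passing a seed makes the selection reproducible.
    """
    ids = list(asset_ids)
    if not ids:
        return []
    # Integer ceiling of len(ids) * percent / 100, avoiding float rounding.
    size = min(len(ids), (len(ids) * percent + 99) // 100)
    return random.Random(seed).sample(ids, size)


picked = qa_sample(range(1, 1001), seed=42)
print(len(picked))
```

Sampling without replacement, as `random.Random.sample` does, ensures no asset is checked twice within one sample.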

Feedback

Feedback is very strongly encouraged, as early as possible, in the repo. We are seeking to resolve feedback on, and questions about, the approach and process as quickly as possible.

Please do discuss the data and related aspects noted above on the talk-gb mailing list.

We are happy to provide any clarifications, which will be added to this page, as a central repository of information about the project.