Transport for London Aims to Cut Congestion by 10% and $750 Million a Year With a Digital Twin Powered by Neo4j

How a digital twin of the world’s most intricate transport network, built on Neo4j’s graph solution, speeds up incident response, improves journeys for millions and lays the groundwork for the metropolis of the future.


  • Estimated congestion reduction: 10%
  • Estimated annual economic savings: $750 M
  • Time savings value for every driver: $1,500 per year
  • Platform: Neo4j Graph Database on Azure

Transport for London (TfL) is in charge of running and maintaining London’s transport network of road, rail, and underground, one of the largest and most complex in the world. Its mission: to ensure that nine million residents and almost twenty million annual visitors can travel safely and easily, moving London forward in a healthy, inclusive and sustainable way. 

That’s no small feat when you consider that around 80% of journeys in London take place on roads, equating to over 3.7 billion trips per year. 

London is one of the most visited capital cities on the planet, and its transport lines are a vital lifeline for the city and the wider country. Managing this network is a deeply complex, intricate challenge; monitoring it is a technical feat in itself when you consider there are 65,000 roads.

Reacting to incidents on those roads is even harder, but essential. London experiences 20,000 unplanned transport incidents a year, and with every minute an incident goes unaddressed, the resulting traffic jams build exponentially. Congestion costs London $7.5 billion per year in lost labor alone, on top of the stress and inconvenience for road users.

What if you could bring together real-time data on all those roads and spot an incident before someone picks it up on CCTV? Breaking a traffic jam within seconds rather than minutes could save the city countless hours and cut the pollution created by stationary vehicles.

That was what one pioneer at TfL set out to do – with the help of Neo4j.

Using Graph to Power a Digital Transport Twin

“For a long time, TfL took a totally reactive approach to data,” says Andy Emmonds, Chief Transport Analyst at TfL. 

One of the main challenges to using a digital solution to solve London’s congestion problem was the prevalence of low-quality and disparate travel data.

Many journeys are private and multi-modal (you might drive or cycle, then catch a train, then walk), making them hard to track. Meanwhile, TfL’s historical approach was to collect distinct data sets, which meant they could only answer a fraction of the questions the team wanted to ask. 

TfL collects terabytes of data every week, but because that data is stored and analyzed separately, no meaningful conclusions can be drawn from the relationships between datasets. A shortage of sensors for gathering fresh data, such as cameras and telematics, also means that TfL often gets insight into traffic incidents only once they’ve been visually spotted.

“We were effectively using this disparate data through Excel sheets,” explains Andy. “None of this data was aligned or real-time, and what we needed was to be a real-time operator – to do that, we needed a digital twin.”

A digital twin is a virtual replica of a physical system (in this case, London’s transport network) in which what-if scenarios can be tested before any change is made in the real world. And it’s exactly what Andy identified as the tool TfL needed to tackle its congestion challenge.

He quickly understood that using a graph would be the most efficient, cost-effective, and performant way to power such a model. TfL needed a way to uncover hidden relationships and patterns across billions of data connections to make the decisions needed to predict and handle traffic incidents. Graphs enable people to store and examine the connections between data points as data itself, much in the same way commuters think about the routes and connections in their daily travel. 

“We found that real-time data can only be solved by a graph database because a graph database is an agile and adaptive way to interpret granular data at scale,” says Andy. 

In this model, a road link is a node: a route from A to B that carries many properties. That structure makes the road network intrinsically suited to a graph, unlike the cumbersome spreadsheets TfL previously relied on.
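To make the idea concrete, here is a minimal sketch in plain Python of road links as nodes with properties and relationships. It does not use Neo4j or reflect TfL’s actual schema; the `RoadLink` class, the `flows_into` relationship, the property names and the three-link network are all hypothetical, chosen only to illustrate the modeling approach.

```python
from dataclasses import dataclass, field


@dataclass
class RoadLink:
    """One road link (a route from A to B) modeled as a graph node."""
    link_id: str
    properties: dict = field(default_factory=dict)  # hypothetical attributes
    flows_into: list = field(default_factory=list)  # ids of downstream links


# Hypothetical three-link network: a -> b -> c
network = {
    "a": RoadLink("a", {"length_m": 400, "speed_limit_mph": 30}, ["b"]),
    "b": RoadLink("b", {"length_m": 250, "speed_limit_mph": 20}, ["c"]),
    "c": RoadLink("c", {"length_m": 600, "speed_limit_mph": 30}, []),
}


def upstream_of(network, target_id):
    """Return the ids of every link whose traffic can reach target_id."""
    # Invert the flows_into relationships so we can walk against the flow.
    feeders = {}
    for link in network.values():
        for downstream in link.flows_into:
            feeders.setdefault(downstream, []).append(link.link_id)

    seen, stack = set(), [target_id]
    while stack:
        for feeder in feeders.get(stack.pop(), []):
            if feeder not in seen:
                seen.add(feeder)
                stack.append(feeder)
    return sorted(seen)


# Links that would back up if link "c" were blocked.
affected = upstream_of(network, "c")
```

Because the connections are stored as data, a question such as "which links feed into this one?" becomes a traversal rather than a lookup across separate spreadsheets.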

“Trips and routing can only be efficiently managed through such a database,” adds Andy. For his team, Neo4j’s graph solution was the way forward.

When Every Minute is Worth $14,000

TfL’s goal is to dramatically improve its ability to detect and address incidents on London’s road network as close to real time as possible. The financial implications are massive, because every minute of delay adds to the cost of an incident.

Currently, it takes TfL between 14 and 17 minutes to detect an incident. By the time it’s spotted and interventions are put in place, an average of 27 minutes have been lost to traffic buildup. Every minute of delay after an incident occurs costs an estimated $14,000.
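The article does not say how the $14,000 figure was derived. One back-of-the-envelope calculation that lands close to it, assuming (our assumption, not a stated method) that the full $7.5 billion annual congestion cost is spread evenly across the 20,000 unplanned incidents and the 27 minutes lost per incident:

```python
# Figures quoted in the article.
annual_congestion_cost_usd = 7.5e9
incidents_per_year = 20_000
minutes_lost_per_incident = 27

# Assumption: all congestion cost is attributed evenly to unplanned incidents.
cost_per_incident = annual_congestion_cost_usd / incidents_per_year
cost_per_minute = cost_per_incident / minutes_lost_per_incident

print(f"{cost_per_incident:,.0f} per incident, {cost_per_minute:,.0f} per minute")
# prints "375,000 per incident, 13,889 per minute"
```

Under that assumption each incident costs about $375,000, or roughly $13,900 per minute, which rounds to the quoted $14,000.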

“What we’re trying to do here is reduce that intervention curve. If we can bring that intervention window back to a minute or two minutes, then that delay curve for the whole incident is much, much reduced,” says Andy.

“Congestion costs London £6 billion ($7.5bn) a year, we can make a big dent in that through managing this operation in real-time.”

Making Real-World Decisions in a Virtual Environment

So what does TfL’s transport digital twin look like, and how exactly does graph power it? The twin consists of five layers:

  1. Digital twin data: The first level of the model, where input data is aligned with the business challenge
  2. Framework: The data is organized to solve the challenge
  3. Graph database: The data is set up so it mirrors the physical network it is modeling
  4. Visual layer: The data is sent to TfL’s control room for interpretation
  5. Plug and play layer: The data is used to solve different road problems

With Neo4j’s graph solution, TfL could connect those data sets and feed them into the digital twin. To try out its new solution and see what real-time insights it would provide, TfL set up a staged rehearsal, which yielded results almost immediately.

Says Andy: “We set up a test product which was fed data powered by graph that could tell us in near real-time if there was a problem on the road. On the day of the test, the system detected five incidents that the control room didn’t pick up. That was the proof in the pudding for us.”
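The article does not describe how the test product decided that a road had a problem. As a purely illustrative sketch of one simple approach, the Python below flags road links whose latest observed speed falls well below their typical speed. The function name, the 50% threshold and all speed values are hypothetical and are not taken from TfL’s system.

```python
def flag_slow_links(typical_mph, observed_mph, threshold=0.5):
    """Return ids of links whose observed speed is below threshold * typical."""
    flagged = []
    for link_id, observed in observed_mph.items():
        typical = typical_mph.get(link_id)
        # Skip links with no baseline; flag those running far below normal.
        if typical and observed < threshold * typical:
            flagged.append(link_id)
    return sorted(flagged)


# Hypothetical baseline and latest readings for three road links.
typical = {"a": 24.0, "b": 18.0, "c": 26.0}
latest = {"a": 22.5, "b": 6.0, "c": 25.0}

suspects = flag_slow_links(typical, latest)  # only link "b" is flagged
```

A check like this only becomes useful once the readings are aligned to the same road links in near real time, which is the problem the digital twin is meant to solve.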

What was working with Neo4j like for Andy and his team? “We worked really closely with the Neo4j team, and they became close collaborators. Ultimately, what we’ve created is a holistic product. It has enabled us to go back to the drawing board, establish new networks and new ways of thinking, and unlock efficiencies.”

How TfL Will Cut Congestion Costs by $750 Million

TfL hopes its digital twin will also play a crucial role in its vision to cut congestion by 10% – a result worth $750 million per year to the capital and over $1,500 in time back per driver per year according to its own estimates.
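The headline figures are consistent with one another, as a quick check in Python shows. The driver count at the end is not a figure from the article; it is simply what the two quoted numbers imply if the savings were shared equally, and since the per-driver value is given as "over $1,500" it should be read as an upper bound.

```python
# Figures quoted in the article, in whole US dollars.
annual_congestion_cost_usd = 7_500_000_000
reduction_percent = 10
saving_per_driver_usd = 1_500

# 10% of the annual congestion cost.
annual_saving_usd = annual_congestion_cost_usd * reduction_percent // 100

# Implied, not stated: number of drivers if the saving were split evenly.
implied_drivers = annual_saving_usd // saving_per_driver_usd

print(f"${annual_saving_usd:,} saved per year, about {implied_drivers:,} drivers")
# prints "$750,000,000 saved per year, about 500,000 drivers"
```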

Andy and his team are looking to the future too. Using the new solution, Andy hopes to build an optimizer for peak traffic days, such as when a stadium event is taking place, that uses data from the digital twin to plan and control routes across the network.

Further down the line, Andy and his team expect to use their solution to build emission reduction strategies for London and even lay the foundation for an autonomous vehicle network. 

“The great thing about a solution like this is that the architecture is open and agile,” explains Andy. “There’s nothing stopping us from using it to build and understand the metropolis of the future, and to me, that next step is making London’s roads autonomous and green.”
