Building a Digital Twin of the Largest Railroad in the Eastern U.S. with Neo4j
When you look at a visualization of information stored in a graph database, you’ll see a network of nodes and edges and all the ways they’re connected. Look at a map of a railway network, and you’ll immediately notice how similar the two appear.
This similarity is not lost on the IT team charged with tracking, tracing, monitoring, and reacting to what’s happening along the largest railroad on the U.S. eastern seaboard. CSX uses Neo4j to graph a railway system that services about two-thirds of the U.S. population, with over 20,000 route miles, about 21,000 employees, and $12.5B of revenue.
As you can imagine, CSX’s operations are extremely asset heavy, with cargo getting hauled in and out of terminals day in and day out. The relationships between those highly fluid assets are vital to CSX’s core business. Neo4j helps CSX make sense of its network.
By the numbers: CSX
- Assets and equipment in Neo4j:
  - 24,000 locomotives
  - 6,700 train crew members
  - 1,600 trains per day
  - 263,000 rail cars
  - 1,100 CSX mileposts
- Platform: Neo4j Enterprise
“If you think about the graph database itself, how it’s nodes and edges, the rail network is almost symbolic of that itself. If you think about all the rail lines, those are the edges, and each of our yards and terminals are nodes as far as how you transpire through the network, and how you send your goods and services,” said Dave Rich, head of enterprise architecture in IT services at CSX. “It’s extremely complex.”
Creating a Digital Twin of the Physical Network
As a nearly 200-year-old company, CSX has its share of challenges related to legacy systems and manual reporting.
Rich and his team are tasked with modernizing the technology that keeps the railroad running and evolving over time. To achieve this goal, the team focuses on reducing the amount of human bias that goes into its systems and developing a single source of truth for day-to-day operations.
For storing this data as a graph, Neo4j has been “a great technology to help us improve the business of how we look at things,” Rich said.
The CSX team uses Neo4j to run a centralized repository of shipment data represented through a near-real-time graph in order to provide more accurate and granular asset tracking. To do this, the group effectively created a digital twin of its physical network, including locomotives, rail cars, customers, shipping orders, mileposts on the train track, and all the relationships between those elements that exist over time.
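To make the idea of a digital twin concrete, here is a minimal in-memory sketch in Python of assets, mileposts, and the time-stamped relationships between them. It is not CSX's schema: the node labels, relationship types, identifiers, and timestamps are all hypothetical, chosen only to illustrate how "where was this rail car last seen?" becomes a walk over relationships.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # hypothetical labels, e.g. "RailCar", "Milepost", "ShippingOrder"
    key: str    # hypothetical business identifier


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    # Each relationship is stored as (start, type, end, timestamp).
    rels: list = field(default_factory=list)

    def relate(self, start, rel_type, end, at):
        """Connect two nodes with a typed, time-stamped relationship."""
        self.nodes.update({start, end})
        self.rels.append((start, rel_type, end, at))

    def neighbors(self, node, rel_type):
        """Return (end node, timestamp) pairs reachable from node via rel_type."""
        return [(end, at) for s, t, end, at in self.rels
                if s == node and t == rel_type]


twin = Graph()
car = Node("RailCar", "CAR-001")
order = Node("ShippingOrder", "ORD-42")
mp_a = Node("Milepost", "MP-100")
mp_b = Node("Milepost", "MP-101")

twin.relate(car, "CARRIES", order, at="2022-06-06T08:00")
twin.relate(car, "PASSED", mp_a, at="2022-06-06T09:00")
twin.relate(car, "PASSED", mp_b, at="2022-06-06T09:30")

# The most recent PASSED relationship gives the car's last known milepost.
# ISO 8601 timestamps in the same format sort correctly as plain strings.
last_seen = max(twin.neighbors(car, "PASSED"), key=lambda pair: pair[1])
print(last_seen[0].key)
```

Keeping the timestamp on the relationship, rather than overwriting a "current location" property, is one way to preserve how the relationships between elements change over time.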
The Far-Reaching Impact of Good Data
Having more visibility and an accurate, single source of truth is obviously helpful for CSX. But the impact of good data extends beyond internal operations. When a customer wants to know the whereabouts of a shipment, or a team needs to get a granular view of what is really packed in all those railcars, CSX can leverage its graph database to get the correct information.
“It’s increasing the frequency of information and leveraging the graph to extend APIs that can be interconnected into other business systems and to other transportation management systems that before would’ve been very complicated,” Rich said.
“This helps us unlock, from a commercial standpoint, how our customers can really integrate and become more a part of our supply chain and how we can frankly become a big part of their supply chain in the future.”
Growing with the Graph
As CSX comes to better understand its data, it has had to rethink how relationships are modeled in the graph. One challenge the team faced along the way was writing the business rules that fed into Neo4j. Because of the inconsistent flow of certain event information (and some manual reporting), the CSX graph was getting populated with duplicate nodes. The software engineers working on the project realized their approach was causing duplicates, and they were able to solve the problem by rethinking how certain events were defined in their system.
“We didn’t fully understand the data when we initially created it,” said Dean Schaefer, a software engineer with CSX, who presented at Neo4j GraphConnect 2022 in Austin. “And then we go back and you see these edge cases coming in and we recognized we needed to pivot our implementation.”
The depth and flexibility of Cypher, the graph query language used with Neo4j, provided a solution through MERGE statements: essentially a put operation that matches a specified pattern if it already exists and creates it if it does not.
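The difference between always creating a node and merging on a pattern can be sketched in a few lines of Python. This is a toy model of match-or-create semantics, not Neo4j's implementation, and the `TinyGraph` class, the `Equipment` label, and the `equipment_id` property are hypothetical names used only for illustration.

```python
class TinyGraph:
    """Toy node store used to contrast create with match-or-create."""

    def __init__(self):
        self.nodes = []

    def create(self, label, **props):
        """Always add a new node, even if an identical one exists."""
        node = {"label": label, **props}
        self.nodes.append(node)
        return node

    def merge(self, label, **pattern):
        """Return the node matching the pattern, creating it only if absent."""
        for node in self.nodes:
            if node["label"] == label and all(
                node.get(k) == v for k, v in pattern.items()
            ):
                return node
        return self.create(label, **pattern)


# The same equipment report arrives twice, e.g. once from an automated
# feed and once from manual reporting.
reports = ["CAR-001", "CAR-001"]

with_create = TinyGraph()
for equipment_id in reports:
    with_create.create("Equipment", equipment_id=equipment_id)

with_merge = TinyGraph()
for equipment_id in reports:
    with_merge.merge("Equipment", equipment_id=equipment_id)

print(len(with_create.nodes))  # 2: a duplicate node was created
print(len(with_merge.nodes))   # 1: the second report matched the first
```

The pattern passed to `merge` acts as the identity of the node, so how much of an event goes into that pattern decides whether two reports count as the same thing.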
“We essentially moved away from our state-dependent insertions to an immutable insertion strategy,” Schaefer said. “So instead of querying for your active trip looking up your equipment nodes and your milepost nodes, we pivoted our graph schema to create a new trip node for every event. Instead of trying to cumulate the events onto a specific, loaded or empty event to create that logical cohesion of a trip, we just create a new trip node that represents a point in time, an action in time for that piece of equipment.”
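The immutable insertion strategy Schaefer describes can be sketched as an append-only log of trip nodes, where the current state of a piece of equipment is derived at read time instead of being looked up and updated at write time. The Python below is an assumption-laden illustration: the `TripNode` fields, the function names, and the sample events are hypothetical and do not come from CSX's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a trip node is never modified after creation
class TripNode:
    equipment_id: str
    milepost: str
    status: str      # hypothetical values: "loaded" or "empty"
    event_time: str  # ISO 8601 timestamp of the event


trip_log = []


def record_event(equipment_id, milepost, status, event_time):
    """Insert a new trip node for every event, with no lookup of prior state."""
    node = TripNode(equipment_id, milepost, status, event_time)
    trip_log.append(node)
    return node


def latest_trip(equipment_id):
    """Derive current state at read time: the trip node with the newest event."""
    candidates = [t for t in trip_log if t.equipment_id == equipment_id]
    return max(candidates, key=lambda t: t.event_time, default=None)


# Events may arrive out of order; insertion does not depend on arrival order.
record_event("CAR-001", "MP-101", "loaded", "2022-06-06T09:30")
record_event("CAR-001", "MP-100", "loaded", "2022-06-06T09:00")  # late arrival
record_event("CAR-001", "MP-102", "empty", "2022-06-06T11:00")

current = latest_trip("CAR-001")
print(current.milepost, current.status)
```

Because each write is independent of what is already stored, a late or inconsistent event adds one more point in time instead of corrupting an accumulated trip record.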
A Logistics Game Changer
The graph is well-suited for CSX. “Our rail network is almost a mirror image of our graph. If you think about the technology of the graph itself, again, that node and edge,” Rich said. “So it really helps us efficiently track and report and visualize hundreds of thousands of assets and the interrelationships over time and how they transpire.”
To put that into perspective, CSX’s Neo4j instance contains 24,000 locomotives that have at some point passed through its database, 3,500 of which belong to CSX. The rest are locomotives from partner railroads. The system captures 1,600 trains per day, over a quarter million rail cars, and more than 1,100 stations throughout the CSX network.
“So if you think about how complicated the relationships are between all of those different assets and all of the different events that are happening over time, the graph is really well-suited for us to kind of help understand the relationships and the events that are occurring day in and day out from a logistics standpoint on the railroad,” Rich said.