The Australian Government’s Department of Infrastructure helps the country make informed transportation policy decisions for roads and vehicles by developing transportation and communications legislation and planning new infrastructure development.
A key part of this work for David Mitchell, Director of Transport Research and Modelling, is using modern data analytics to forecast trends in road trade activity. His transportation team has historically relied on transportation surveys, but recently they’ve decided to harness other sources of data, such as telemetry, to improve their answers to key Australian transportation questions. Richard Green, Senior Economist and Assistant Director of Transport Research and Modelling, took the lead on this project, and his data explorations led the Department of Infrastructure to Neo4j.
The Department of Infrastructure is joining two datasets to improve their analytics of the Australian transportation system. The first dataset is GPS pings from IoT devices in tracked vehicles, providing timestamped information about a vehicle’s identity, location, and speed. The second dataset is the national map of roads, rest stops, and refueling locations.
Combining these two datasets creates network effects that make each one richer. A vehicle’s series of GPS pings becomes a journey along a particular path of roads, with slow-downs for traffic, stoppages for fuel and rest, and, in unfortunate cases, accidents. Multiply these journeys across all tracked vehicles in Australia and a full picture emerges of how Australian commuters and road freight move around the country.
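To make the idea of pings becoming journeys concrete, here is a minimal Python sketch. It is not the Department’s actual pipeline: the `Ping` fields, the assumption that each ping has already been matched to a road segment, and the ten-minute stop threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Ping:
    # Hypothetical field names; the source only says pings carry
    # identity, location, speed, and a timestamp.
    vehicle_id: str
    timestamp: int     # seconds since some epoch
    segment_id: str    # road segment the ping was matched to (assumed done upstream)
    speed_kmh: float


def build_journeys(pings):
    """Group pings by vehicle and order each group by time."""
    ordered = sorted(pings, key=attrgetter("vehicle_id", "timestamp"))
    return {
        vehicle: list(group)
        for vehicle, group in groupby(ordered, key=attrgetter("vehicle_id"))
    }


def find_stops(journey, min_stop_seconds=600):
    """Return (segment_id, duration) for each run of zero-speed pings
    that lasts at least min_stop_seconds."""
    stops = []
    run = []
    for ping in [*journey, None]:  # None is a sentinel that closes the last run
        if ping is not None and ping.speed_kmh == 0:
            run.append(ping)
            continue
        if run:
            duration = run[-1].timestamp - run[0].timestamp
            if duration >= min_stop_seconds:
                stops.append((run[0].segment_id, duration))
            run = []
    return stops


sample = [
    Ping("truck-1", 0, "A", 80.0),
    Ping("truck-1", 900, "B", 0.0),
    Ping("truck-1", 300, "A", 75.0),
    Ping("truck-1", 1800, "B", 0.0),
    Ping("truck-1", 2100, "C", 60.0),
    Ping("car-7", 100, "A", 50.0),
]
journeys = build_journeys(sample)
truck_stops = find_stops(journeys["truck-1"])
```

In a graph database the same structure would more naturally be stored as ping nodes linked to each other and to road-segment nodes, but the grouping and ordering logic is the same.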
This allows the Department of Infrastructure to ask and answer key questions: Which portions of the road network create the greatest safety risks for drivers and would benefit from infrastructure investment? Can drivers be put on more efficient journey paths? Which rest stops and refueling stations are used the most?
Observed changes in this transportation data proved to be an early warning sign of the economic downturn caused by the COVID pandemic. Early in 2020, the team noticed that commuter traffic congestion disappeared from the dataset, especially on urban freight routes during what were previously peak commuting hours. During Australia’s second lockdown, the team saw another decrease in urban commuter traffic, and this time the reduction benefited freight traffic – fewer commuters on the road meant faster travel times for freight drivers.
Unsurprisingly, tracking the entire Australian transportation system creates a lot of data. Right now, the data takes up 3.7 terabytes on disk, and that’s before adding 350 million new GPS pings per month! The total volume of data currently stored is 4.9 billion nodes and 14.6 billion relationships. Storing this much data in a single database while keeping queries performant is all but impossible.
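A quick back-of-the-envelope calculation puts these figures in proportion. The sketch below uses only the numbers quoted above; it assumes decimal terabytes and, for the growth estimate, that each new ping becomes exactly one node, neither of which the source confirms.

```python
NODES = 4_900_000_000
RELATIONSHIPS = 14_600_000_000
DISK_BYTES = 3.7e12               # assumption: 1 TB = 10**12 bytes
NEW_PINGS_PER_MONTH = 350_000_000

# Average number of relationships attached per node.
rels_per_node = RELATIONSHIPS / NODES

# Average disk footprint per node, with relationship and other
# storage overhead folded in.
bytes_per_node = DISK_BYTES / NODES

# Months of ingest needed to add as many nodes as are stored today,
# assuming one node per ping.
months_to_double = NODES / NEW_PINGS_PER_MONTH

print(f"{rels_per_node:.2f} relationships per node")
print(f"{bytes_per_node:.0f} bytes on disk per node")
print(f"{months_to_double:.0f} months of pings to double the node count")
```

Under those assumptions, the graph averages roughly three relationships per node, and at the current ingest rate the node count would double in a little over a year.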
The Department of Infrastructure chose Neo4j as the database for this project for two reasons: massive scalability and ease of development. Neo4j’s ability to shard data across distinct databases and machines – then seamlessly join these distinct databases together in unified queries through its Fabric feature – created an easy solution for their huge data challenge.
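The shard-then-join idea can be illustrated with a small scatter-gather sketch in plain Python. This does not use Neo4j or the Fabric API; it only shows the general pattern of running the same query against each shard and merging the partial results. The per-month partitioning and the record layout are assumptions for illustration, since the source does not say how the data is split.

```python
from collections import Counter

# Hypothetical shards: one per month, each holding
# (vehicle_id, segment_id) ping records.
shards = {
    "2020-03": [("truck-1", "A"), ("truck-2", "A"), ("truck-1", "B")],
    "2020-04": [("truck-1", "A"), ("truck-3", "C")],
}


def query_shard(records):
    """Per-shard part of the query: count pings per road segment."""
    return Counter(segment for _, segment in records)


def unified_query(all_shards):
    """Run the per-shard query on every shard and merge the partial counts."""
    total = Counter()
    for records in all_shards.values():
        total.update(query_shard(records))
    return total


pings_per_segment = unified_query(shards)
```

The caller sees a single result, as if the data lived in one place, which is the property that makes a sharded store practical for analysts.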
Despite having no in-house Neo4j expertise at the beginning of this project, Richard Green found it easy to get started developing with Neo4j. Part of this is due to the intuitive Cypher query language, which is designed to mimic the patterns of nodes and relationships users search for in their data. The other part is Neo4j’s rich documentation.
Richard described getting started: “Generally, it was quite easy… I just needed to read documentation. And, where I’ve needed to have other users come in, it was relatively easy to explain what needed to be done and to explain the logic of the query language.”
Analytical opportunities abound in this rich and gigantic dataset, and the Department of Infrastructure is certainly making the most of it. The dataset will contribute to Australia’s National Freight Data Hub initiative, a combined government and industry project to share information about the domestic freight industry. One of the key features of the Freight Data Hub is providing easily digestible visualizations that make sense of this huge and dynamic dataset.
This dataset is especially helpful for answering questions about potential future government policies, including the possibility of charging vehicles for their use of the road network. If such a policy is implemented, the tracked GPS pings would allow the Australian government to see a vehicle’s actual use of the road network, creating an audit trail when it is compared to a vehicle owner’s self-reported use of the network.
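A minimal sketch of such an audit check follows. It is purely illustrative: the function names, the 10% tolerance, and the use of straight-line (haversine) distance between consecutive pings are all assumptions, not a description of any planned policy or system. Because straight-line distance can only understate the distance actually driven on roads, the observed figure is treated as a lower bound.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def observed_km(track):
    """Sum the distances between consecutive (lat, lon) pings."""
    return sum(haversine_km(*a, *b) for a, b in zip(track, track[1:]))


def audit(track, reported_km, tolerance=0.10):
    """Flag a report that falls short of the GPS-derived lower bound
    by more than the given tolerance (hypothetical threshold)."""
    observed = observed_km(track)
    return {
        "observed_km": observed,
        "reported_km": reported_km,
        "flagged": reported_km < observed * (1 - tolerance),
    }


# One degree of longitude along the equator, roughly 111 km.
track = [(0.0, 0.0), (0.0, 1.0)]
under_reported = audit(track, reported_km=50)
plausible = audit(track, reported_km=110)
```

A production version would measure distance along the matched road segments rather than between raw coordinates, which is exactly what joining the pings to the road network makes possible.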
Graph technology has opened up new analytical possibilities for Australia’s Department of Infrastructure, Transport, Regional Development and Communications. Not only does graph technology provide a way to seamlessly meld disjoint datasets into something intuitive and useful, but it is also simple to get started with and can scale to cover the transportation network of an entire continent.