COVID-19 Contact Tracing with Neo4j
As COVID-19 surged through Belgium, data analysts in the densely populated Brussels-Capital region found themselves in a predicament.
There was a critical need to understand where the clusters of COVID infections were occurring, but they had so much data coming in from so many places that it was increasingly difficult to see how it all connected.
By the Numbers: COCOM COVID-19 Contact Tracing
- Reduction in triage time for COVID-19 response: 90%
- COVID-19 infection clusters identified: 5,000+
- Graph scale: 618,332 nodes, 587,475 relationships, 6,231,160 properties
- Platform: Neo4j Enterprise Edition
The Commission Communautaire Commune (COCOM) is responsible for preventive health in the Brussels area, covering all citizens and community facilities in the territory.
So although the Flemish and French-speaking communities are responsible for several of these facilities, they must report potential clusters to COCOM's Preventive Medicine team. This also means that data on all Brussels citizens is sent to the Brussels MP (Preventive Medicine) team for analysis. Because uniform data was collected across all the subregions, COCOM analysts had enough information to build a single, region-wide COVID dataset for contact tracing.
The Pandemic Brings a Flood of Data
In early 2020, a small analytics team at COCOM used Excel macros to track a huge volume of data. Like others around the world, the team quickly became overwhelmed as more data poured in during peaks of infection. Every day the Excel file would get updated with new data from the labs and the call center, said COCOM analyst Ilona Hendrix.
“But as you can imagine, after a while, the file gets really big and it's too much for Excel to handle,” she said. “We needed to get a new solution.”
It’s All About the Clusters
The data team had insight into individual COVID-19 cases with its data in Excel, but to get outbreaks under control they needed a way to detect clusters of infection and track the spread of the disease.
A tabular format like an Excel spreadsheet is essentially a list, and the connections between cases were buried in individual cells. The analysts needed a fast, scalable way to track infections, identify clusters, and stop the spread.
They needed to visualize the connections and trace positive cases to identify whether, for example, a person who was on a flight might have brought the virus back to work or school.
The COCOM team had never faced anything like COVID-19. They had tracked infections and built contact tracing tools before, but never at this scale or with this urgency.
Contact Tracing Is a Graph Problem
Hendrix, the COCOM analyst, said a colleague advised her to use graph technology for tracing connections across the large dataset she needed to analyze.
“We had a developer working for us who really recommended [Neo4j]. He's a brilliant guy,” Hendrix said. “When he showed us their website, I remember we were quite impressed, so we said, well, let's give it a go.”
Hendrix does not describe herself as a deeply technical person, but she knows basic coding and said that was all she needed to learn Neo4j’s Cypher query language.
Neo4j also made it easy to share information with the medical teams: she could query the database in Cypher and get the results back as a list, ready to pass on. “If they need some more personal information, or the ages of a certain cluster, you can make your lists like that as well,” she said. “It's a very flexible tool.”
Catching COVID Clusters
The COCOM team rapidly created a COVID-19 contact tracing system using Neo4j.
By identifying positive COVID-19 cases in overlapping time frames and locations, and then connecting exposure points, the team could plot out a visualization of COVID activity in a given area.
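As a rough illustration of the idea in the paragraph above, the sketch below links two positive cases when they were at the same location during overlapping time windows, then treats each connected group of cases as a cluster. It is a minimal pure-Python sketch under stated assumptions, not COCOM's implementation: the `Exposure` record, its fields, the day-number time windows, and the union-find grouping are all hypothetical choices made for illustration. In the real system, these connections were stored and explored in Neo4j.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Exposure:
    """Hypothetical record: a positive case present at a location in [start, end)."""
    person: str
    location: str
    start: int  # assumed to be a day number
    end: int


def overlaps(a: Exposure, b: Exposure) -> bool:
    # Same place and intersecting half-open time windows.
    return a.location == b.location and a.start < b.end and b.start < a.end


def find_clusters(exposures: list[Exposure]) -> list[set[str]]:
    """Group cases into clusters of people connected by shared place and time."""
    parent: dict[str, str] = {e.person: e.person for e in exposures}

    def find(x: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        parent[find(x)] = find(y)

    # Connect every pair of cases that shared a place and time.
    for a, b in combinations(exposures, 2):
        if a.person != b.person and overlaps(a, b):
            union(a.person, b.person)

    groups: dict[str, set[str]] = {}
    for person in list(parent):
        groups.setdefault(find(person), set()).add(person)
    # Only groups of two or more cases count as a cluster here.
    return [g for g in groups.values() if len(g) > 1]


exposures = [
    Exposure("P1", "school-A", 1, 5),
    Exposure("P2", "school-A", 4, 8),
    Exposure("P2", "office-B", 10, 12),
    Exposure("P3", "office-B", 11, 13),
    Exposure("P4", "office-B", 20, 22),
]
print([sorted(c) for c in find_clusters(exposures)])  # [['P1', 'P2', 'P3']]
```

P2 links the school and office exposures into one cluster, while P4 was at the office too late to overlap with anyone. The pairwise loop is quadratic, which is acceptable for a sketch; at real scale, one would at least group exposures by location before comparing them.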
It’s all fueled by data. The first data source was the lab result for each positive test. Some results also included the disease variant, such as Delta or Omicron, and a cycle threshold (CT) value, which indicates how much virus an infected person harbors: the lower the CT value, the higher the viral load, so CT serves as a rough measure of infectiousness. Adding CT values to the clusters helped medical teams prioritize their response.
Call center agents then collected data from each positive case, including details like the flight a person had taken. Passenger Locator Forms (PLF) could then be requested, and all of this data was connected at scale in the Neo4j graph database.
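Exactly how CT values fed into prioritization isn't described here. As one plausible rule, assumed purely for illustration, the sketch below ranks clusters by the lowest CT value among their members, since a lower CT value means more virus in the sample. The cluster IDs, CT numbers, and the ranking rule itself are hypothetical.

```python
from typing import Optional


def priority_key(ct_values: list[Optional[float]]) -> float:
    """Lowest known CT value in a cluster; clusters without CT data sort last."""
    known = [ct for ct in ct_values if ct is not None]
    return min(known) if known else float("inf")


def rank_clusters(clusters: dict[str, list[Optional[float]]]) -> list[str]:
    # Lower CT means higher viral load, so ascending order puts the
    # (assumed) most urgent clusters first.
    return sorted(clusters, key=lambda cid: priority_key(clusters[cid]))


# Hypothetical clusters mapped to their members' CT values (None = not reported).
clusters = {
    "C1": [28.5, 31.0],
    "C2": [17.2, None, 24.0],
    "C3": [None],
    "C4": [22.9],
}
print(rank_clusters(clusters))  # ['C2', 'C4', 'C1', 'C3']
```

Using the minimum makes a single highly infectious member enough to raise a cluster's priority; a real triage rule would likely weigh other factors too, such as cluster size or variant.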
Here the Neo4j Browser shows a flight cluster that is linked to a household cluster. The pink bubbles show all the other passengers who were on the same flight as the people with COVID-19 in the cluster, shown in blue.
A flight cluster (pink) linked by a COVID-positive passenger to a household cluster (blue)
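The pink bubbles in the flight cluster correspond to a simple rule: for every flight that carried at least one positive case, everyone else on board is a potential contact. A minimal pure-Python sketch of that rule follows; the manifest structure, flight IDs, and person IDs are hypothetical stand-ins for PLF data, which COCOM connected in Neo4j rather than in Python dictionaries.

```python
def co_passengers(
    manifests: dict[str, set[str]], positives: set[str]
) -> dict[str, set[str]]:
    """For each flight carrying a positive case, return the other passengers."""
    exposed: dict[str, set[str]] = {}
    for flight, passengers in manifests.items():
        if passengers & positives:  # at least one positive case on board
            exposed[flight] = passengers - positives
    return exposed


# Hypothetical passenger lists keyed by flight.
manifests = {
    "FL100": {"P1", "P5", "P6"},
    "FL200": {"P7", "P8"},
}
positives = {"P1", "P2"}
exposed = co_passengers(manifests, positives)
print({flight: sorted(p) for flight, p in exposed.items()})  # {'FL100': ['P5', 'P6']}
```

FL200 is left out because none of its passengers tested positive.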
The next graph shows two data clusters: one COVID-19 cluster and one work location cluster. Note that Person 8 (P8) is both in the COVID-19 cluster and in the work location cluster.
Person 8 links the COVID-19 cluster and the work location cluster
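In a graph model of this data, a person like P8 who belongs to two clusters would be a single node connected to both, which is why the link is easy to spot in the visualization. The same check can be sketched in plain Python by looking for people who appear in more than one cluster. The cluster names and person IDs below are hypothetical, and this is not how COCOM queried its data.

```python
from collections import defaultdict


def bridges(clusters: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each person who belongs to two or more clusters to those clusters."""
    membership: defaultdict[str, list[str]] = defaultdict(list)
    # Iterate clusters in name order so each person's list is deterministic.
    for name in sorted(clusters):
        for person in clusters[name]:
            membership[person].append(name)
    return {p: names for p, names in membership.items() if len(names) > 1}


clusters = {
    "covid_cluster": {"P1", "P2", "P8"},
    "work_location": {"P8", "P9", "P10"},
}
print(bridges(clusters))  # {'P8': ['covid_cluster', 'work_location']}
```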
10x Faster Pandemic Response with Connected Data
The resulting contact tracing system proved effective.
Having so much data in one place, and being able to analyze it quickly to identify clusters of infection and respond by investigating or dispatching medical personnel, cut overall triage time for the COVID response by a factor of 10, Hendrix said.
“Because all the information is linked already and we have it readily available, it reduces the work of data analysts significantly,” she said. “Before I would have to spend 10 minutes or so per cluster, and now it's about one minute.”
The Future of Contact Tracing
Because COVID-19 created a real crisis-response situation, it gave the data team a thorough proof of concept for Neo4j.
“When we got to use the interface of Neo4j, it was really nice to see, ‘OK, this person was linked with this person through this cluster, for example, an organization, a work location, or a flight,’” Hendrix said. “The graph made it much easier to understand the links between different people, and how it becomes a cluster in the end.”
And the benefits go beyond COVID-19. “Neo4j also made it very easy for us to investigate all of the data sources that we get,” said Hendrix.
As the COVID-19 pandemic ebbs and flows, the data team is assessing its strategies and processes and making refinements. The team is also looking into ways to use Neo4j to track other diseases.