Driving Preventive Care and Patient Journey Visibility

The Challenge

A large state health insurance company needed visibility into vast amounts of patient data to improve members’ health. Using Neo4j along with Databricks, data scientists rapidly pull together touch points in a graph and analyze patient journeys to determine when to engage with their members.


By the numbers: Patient Journey Insights Project
  • Graph scale: 1.4 billion nodes, 2.8 billion relationships
  • Data scientist adoption: 150% increase
  • Platform: Databricks and Neo4j Enterprise Edition with Neo4j Graph Data Science on Microsoft Azure

In the U.S. alone, healthcare for chronic conditions costs $1.5 trillion annually. But the real-world impact of these conditions on people, their quality of life, their longevity, and their loved ones cannot be quantified.

Preventive care can offer a better quality of life – and a longer one, too.

A large state health insurer serves 3.5 million members. With such a vast population of patients, the company sees opportunities to garner insights. They decided to look at people who manage chronic conditions well for the purpose of sharing those insights with other members who could benefit from them.

The Solution

The business prioritizes data science initiatives, specifying which chronic health conditions they want data scientists to target, such as congestive heart failure (CHF). Then the data science teams get to work.

“The VP of our division came to us with a high-level problem,” said the data science team lead. “Our VP wanted us to figure out the next best action for our members based on where they are in their clinical or non-clinical journey. Essentially, if the member did A, B, and then C, what should D be?”

The team thought about the journeys hidden in insurance claims. What if they could connect all the events associated with a patient?

Journeys Are a Graph Problem


The data science team wanted to see all the claims connected to a single person. “We started hashing it out and thought about how we would do it, at a conceptual level,” said the data science team lead. “We said, ‘This is really a graph problem. We want to connect all the elements of a member’s journey and figure out what the next best step is.’”

Studying individual patient journeys requires the granularity of a graph. “We don’t want to look at columns and aggregate them using GROUP BYs,” said the team lead. “We need to get down to the member level and follow all these members based on the path they're taking and identify patterns in member paths. That's how we came to the conclusion that we have a graph problem.”
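The “A, B, and then C, what should D be?” framing can be illustrated with a small sketch. This is not the insurer's model: the event names, the sample journeys, and the `next_step_counts` helper are all hypothetical, and plain Python lists stand in for paths that would be followed through the graph.

```python
from collections import Counter, defaultdict

# Hypothetical journeys: one time-ordered list of event types per member.
journeys = {
    "m1": ["pcp_visit", "chf_diagnosis", "er_visit", "cardiology_referral"],
    "m2": ["pcp_visit", "chf_diagnosis", "er_visit", "hospital_admission"],
    "m3": ["pcp_visit", "chf_diagnosis", "er_visit", "cardiology_referral"],
    "m4": ["pcp_visit", "lab_test"],
}


def next_step_counts(journeys, prefix_len=3):
    """Count which event follows each run of `prefix_len` consecutive events."""
    counts = defaultdict(Counter)
    for events in journeys.values():
        # Slide a window over the journey; the event after the window is "D".
        for i in range(len(events) - prefix_len):
            prefix = tuple(events[i:i + prefix_len])
            counts[prefix][events[i + prefix_len]] += 1
    return counts


counts = next_step_counts(journeys)
observed = counts[("pcp_visit", "chf_diagnosis", "er_visit")]
most_common_next = observed.most_common(1)[0][0]
print(most_common_next)
```

The sketch only reports the most frequently observed next step. Deciding which step is actually best for a member would also require outcome data, which the sample does not model.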

The Need for an Enterprise Solution


The data science team then started looking at technology. “Once we identified that the nature of the problem was graph-based, we started looking at which data technologies could best support that scenario. We looked at various graph database solutions out there, and it seemed to us that Neo4j is the name of the game,” said the team lead. “There are other products out there, but Neo4j is at the forefront and is the one blazing the trail. Neo4j also has a much stronger enterprise story than a lot of the others – which, obviously, in a highly regulated industry like ours, is important.”

One example is Neo4j’s role-based access control. The team lead was impressed “by the fact that we can control even down to the node and label level, who can see what.” Sensitive information can be locked down in a granular way.

Sensitive data around members – which includes the insurer’s employees – can be obscured. For example, no one needs to know a member’s name or location to study their patient journey. Neo4j can block read and write access to sensitive graph data while still empowering data scientists to traverse and analyze the full graph using graph algorithms.
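As a rough sketch of what such a lockdown might look like, the Python helper below assembles Cypher privilege commands of the kind Neo4j Enterprise Edition provides for role-based access control. The role name, the `Member` label, and the property names are assumptions made for illustration, not details from the insurer's deployment, and the exact privilege syntax should be checked against the Neo4j version in use.

```python
def masking_statements(role, graph, label, hidden_props):
    """Build Cypher commands: allow traversal and reads, then hide chosen properties."""
    props = ", ".join(hidden_props)
    return [
        f"CREATE ROLE {role} IF NOT EXISTS",
        f"GRANT ACCESS ON DATABASE {graph} TO {role}",
        # MATCH combines TRAVERSE and READ on every node and relationship.
        f"GRANT MATCH {{*}} ON GRAPH {graph} ELEMENTS * TO {role}",
        # DENY takes precedence over GRANT, so these properties stay hidden.
        f"DENY READ {{{props}}} ON GRAPH {graph} NODES {label} TO {role}",
    ]


# Hypothetical role, label, and property names.
statements = masking_statements("journey_analyst", "neo4j", "Member", ["name", "address"])
for statement in statements:
    print(statement)
```

An administrator would run the generated commands against the system database. A role set up this way can still follow a member's path through the graph without seeing who the member is.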

Driving Productivity with Databricks


The data science team was primed for Neo4j; they had already adopted Databricks, a cloud-based platform that accelerates machine learning (ML) workflows. “Databricks has enabled us to do a lot more and move a lot faster than previously,” said the team lead.

Using Databricks has increased data scientist productivity.

“We have the same number of data scientists, but they're able to do a lot more, faster,” he said. Data scientist adoption rose 150 percent as the whole team jumped on board, a marked contrast from the lackluster uptake for their on-premises platform. “Now with Databricks, all the data scientists are really excited. We have the whole group on there. We went from nobody wanting to use it to people even outside our team wanting to take advantage,” said the team lead.

Figure: US Health Insurer Architecture

Connecting a Vast Amount of Health Data


The insurer has a wealth of data with potential to help members, including claims and explanations of diagnosis and procedure codes.

Neo4j is the perfect complement to Databricks; the data science team loads their graph using the Neo4j Connector for Apache Spark from within Databricks. “We use the Neo4j Spark Connector inside Databricks to help manage the ETL, because it's a ton of data,” said the team lead. “We're getting data for all of our members, all of their claims, and all the data that goes along with those claims.”

“We're trying to load in as many of the different touch points around our members as possible, clinical and non-clinical, and quickly connect them in a graph so that we can better identify when we need to engage with our members or intervene and how best to do that,” he said.

Using Spark queries via Databricks, the data scientists pull in large amounts of data from the enterprise data warehouse, landing it in the Databricks Delta Lake where they clean and reshape it. Next, also from inside Databricks, they use the Neo4j Connector for Apache Spark to load the data into Neo4j.
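The load step described above might look roughly like the following sketch. It assumes the Neo4j Connector for Apache Spark is installed on the cluster. The connection URL, the `:Member` label, the `member_id` key column, and the table name are placeholders rather than details from the case study, and credentials (normally passed through the connector's `authentication.basic.*` options) are left out.

```python
# Placeholder connection URL; not taken from the case study.
NEO4J_URL = "neo4j://localhost:7687"


def node_write_options(url, labels, key_column):
    """Connector options for writing DataFrame rows as nodes keyed on one column."""
    return {
        "url": url,
        "labels": labels,
        "node.keys": key_column,
    }


def write_nodes(df, options):
    """Write a Spark DataFrame to Neo4j as nodes.

    With mode "Overwrite" plus "node.keys", the connector merges on the key
    columns instead of creating a duplicate node on every run.
    """
    writer = df.write.format("org.neo4j.spark.DataSource").mode("Overwrite")
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.save()


# Usage inside a Databricks notebook (hypothetical table name):
# members = spark.table("clean.members")
# write_nodes(members, node_write_options(NEO4J_URL, ":Member", "member_id"))
```

Keeping the options in a plain dictionary makes it easy to reuse the same writer for claims, diagnoses, and other node types by changing only the label and key column.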

Using Neo4j Graph Data Science, the team runs queries and graph algorithms that identify patterns in patient journeys.

Wisdom from the Journeys


The health insurance company is just beginning to identify and explore patient journeys in the graph and to apply the insights gained from them. And although their graph scale is already impressive, with 1.4 billion nodes and nearly 3 billion relationships, there is plenty of data yet to be added.

For example, the data science team uses natural language processing (NLP) to harvest health information from providers’ notes, test results, and more. They use Named Entity Recognition to get data about all the relevant entities. The model also brings back relationships between the entities they defined. The team could store both the entities and their relationships in Neo4j.
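To make the idea concrete, here is a minimal sketch of how extracted entities and relationships could be reshaped into node and relationship rows before loading. The tuple layout, the entity types, and the sample extractions are hypothetical. The case study does not describe the team's NLP model or its output format.

```python
# Hypothetical extraction output:
# (head text, head type, relation, tail text, tail type)
extracted = [
    ("metoprolol", "Medication", "TREATS", "congestive heart failure", "Condition"),
    ("echocardiogram", "Procedure", "EVALUATES", "congestive heart failure", "Condition"),
    # The same fact mentioned again in another note.
    ("metoprolol", "Medication", "TREATS", "congestive heart failure", "Condition"),
]


def to_graph_rows(triples):
    """Deduplicate entities into node rows and relations into relationship rows."""
    nodes = {}
    rels = set()
    for head, head_type, relation, tail, tail_type in triples:
        # An entity is identified by its type and text, so repeats collapse.
        nodes[(head_type, head)] = {"label": head_type, "name": head}
        nodes[(tail_type, tail)] = {"label": tail_type, "name": tail}
        rels.add(((head_type, head), relation, (tail_type, tail)))
    return list(nodes.values()), sorted(rels)


nodes, rels = to_graph_rows(extracted)
print(len(nodes), "nodes,", len(rels), "relationships")
```

Deduplicating before the load mirrors what a merge on key columns would do in the database, so a fact repeated across many notes ends up as a single node or relationship.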

The data science team is at the forefront of innovation, using state-of-the-art tools and techniques to recommend the next best step for members. The best part is that the insurer is learning from the vast data it already has and making sense of its inherent connections and patterns to support members with chronic conditions.