Care-for-Rare Identifies Rare Childhood Diseases with Neo4j
Dr. von Hauner Children's Hospital uses a multiomic Neo4j graph database and machine learning models to make connections between pediatric patients and 8,000+ rare diseases
- 3.2 billion nucleotides of DNA in the human genome
- 8,000 rare diseases tracked and identified with a clinical knowledge graph (CKG)
- 2,500 children in the multiomic database so far, with efforts underway to expand worldwide
More than 2,000 children lose their lives to rare diseases every year in Germany, a number that rises dramatically worldwide. The journey to diagnosis is a race against time: it takes four to eight years on average to uncover the root of a rare disease, and 30% of affected children do not reach the age of five.
The Care-for-Rare Foundation was established in 2009 at the Dr. von Hauner Children’s Hospital in Munich to change this trajectory. The foundation’s global alliance identifies genetic causes of rare diseases and develops targeted treatments. Its work inspires hope for improved access and outcomes for children around the world.
Doctors at the hospital harness precision medicine to customize treatments that effectively target these elusive conditions. Researchers use a method called deep phenotyping to document rare genetic mutations in pediatric patients, linking detailed disease manifestations to tiny genetic variants known as single nucleotide polymorphisms (SNPs). Each SNP – a minute variation among 3.2 billion DNA nucleotides – can connect a child to one of thousands of rare diseases. Despite these odds, the hospital successfully identifies genetic variants in 30% of its patients, enabling accurate diagnoses and tailored treatment plans. Every month gained for a young patient can be the difference between timely treatment and irreversible decline.
The foundation’s genomic analysis solution, the Clinical Knowledge Graph (CKG), is built on Neo4j’s enterprise graph database, and the likelihood of identifying a rare disease improves as its data grows. The CKG applies multiomics, an emerging field that combines multiple biological datasets – such as genomes, proteomes, and transcriptomes – to pave the way for new, potentially life-saving treatments. The Care-for-Rare Foundation’s ecosystem approach also addresses equity gaps, so that more children in more countries can access vital diagnostic services.
Prior to using knowledge graphs, doctors and scientists struggled to integrate and analyze these diverse datasets, including:
- Transcriptomic data from RNA
- Proteomic data from cellular proteins
- Medical histories
- Blood and urine test values
- Doctors’ notes
- Drug targets, proteins, and other attributes
Each new data source had the potential to unlock breakthroughs, but joining these large and disparate data points in a relational database required complicated SQL queries that were unsustainable to maintain. “As a children’s hospital, we don’t develop software,” says Daniel Weiss, Head of Bio IT for the Dr. von Hauner Children’s Hospital at Ludwig Maximilian University (LMU). “We need technology that minimizes time-consuming manual intervention.”
Relational databases struggled to manage the dense, interconnected structures in medical data. These databases require numerous joins between tables, leading to poor performance as data complexity grows. These performance challenges were also at odds with the clinic’s need to expand its datasets.
Dual Graph Approach Keeps Patient Data Safe for Collaboration
Inspired by the Max Planck Institute of Biochemistry, which successfully uses a Neo4j-powered CKG for proteomics, Weiss and his team sought a graph database tailored to their needs. “It was easy for us to get started with Neo4j and connect our biomedical data sources, giving us a short path to creating a knowledge graph,” says Weiss.
Each child in this knowledge graph is represented as a node, linked to nodes for symptoms, proteins, phenotypes, and other relevant data. Researchers use Cypher, Neo4j’s graph query language, to uncover relationships and patterns that can lead to accurate diagnoses. The knowledge graph integrates data from 2,500 pediatric patients, with plans to expand to 5,000 within three years.
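As a rough illustration of the data model described above, the sketch below builds a tiny in-memory graph in plain Python and ranks other patients by the phenotypes they share with a given child. It is not the hospital's implementation: the real CKG lives in Neo4j and is queried with Cypher, and every node name, relationship type, and data point here is hypothetical.

```python
from collections import defaultdict

# Hypothetical relationships: (patient, relationship type, linked node).
# These names are illustrative only, not the hospital's actual schema.
EDGES = [
    ("patient_1", "HAS_PHENOTYPE", "recurrent_infections"),
    ("patient_1", "HAS_PHENOTYPE", "growth_delay"),
    ("patient_2", "HAS_PHENOTYPE", "recurrent_infections"),
    ("patient_2", "HAS_PHENOTYPE", "growth_delay"),
    ("patient_3", "HAS_PHENOTYPE", "growth_delay"),
    ("patient_3", "HAS_SYMPTOM", "fever"),
]


def build_index(edges, rel_type):
    """Map each patient to the set of nodes reached via rel_type."""
    index = defaultdict(set)
    for source, rel, target in edges:
        if rel == rel_type:
            index[source].add(target)
    return index


def similar_patients(edges, patient, rel_type="HAS_PHENOTYPE"):
    """Rank other patients by how many linked nodes they share with `patient`."""
    index = build_index(edges, rel_type)
    mine = index.get(patient, set())
    scores = []
    for other, theirs in index.items():
        if other == patient:
            continue
        shared = mine & theirs
        if shared:
            scores.append((other, len(shared)))
    # Most shared nodes first; ties broken by patient id for stable output.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


ranking = similar_patients(EDGES, "patient_1")
print(ranking)
```

In a graph database the same question is a single pattern match over patient and phenotype nodes, which is the kind of traversal that would need several joins in a relational schema.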
Weiss’ team also uses advanced algorithms to establish causal links between data points — a task that exceeds the capabilities of a single hospital. “We want to use our clinical knowledge graph to predict outcomes for every child in our intensive care unit with an unclear diagnosis,” says Weiss. “But often, we don’t know why a child is ill. That’s incredibly challenging. We have the data, but we can’t find the needle in the haystack.”
LMU uses a dual graph approach to overcome this challenge while safeguarding patient privacy. An internal on-premises graph database houses sensitive patient information, while a second, cloud-based graph hosts synthetic data for developing AI apps. These apps are packaged in Docker containers and sent via FeatureCloud to the clinics, where they run against the real-world patient data. This federated machine learning service lets multiple partners connect and run algorithms that learn from all the partners’ data without sharing live patient data.
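The idea of learning from partner data without moving it can be sketched with federated averaging, one common federated learning scheme. The source does not say which algorithm the hospital or FeatureCloud uses, so the snippet below is an assumption for illustration only: it uses no FeatureCloud API, the model is a one-parameter linear fit, and the site data is synthetic.

```python
def local_update(weight, data, lr=0.01, epochs=50):
    """Fit y ~ weight * x on one site's data with plain gradient descent."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight, len(data)


def federated_round(global_weight, sites):
    """One round: each site trains locally; only weights and counts are shared."""
    updates = [local_update(global_weight, data) for data in sites]
    total = sum(count for _, count in updates)
    # Average the local weights, weighted by each site's sample count.
    return sum(w * count for w, count in updates) / total


# Synthetic data for three hypothetical sites, all following y = 3x.
SITES = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
    [(3.0, 9.0)],
]

weight = 0.0
for _ in range(20):
    weight = federated_round(weight, SITES)
print(round(weight, 3))
```

The coordinator only ever sees each site's fitted weight and sample count, never the underlying records, and the shared weight still converges toward the slope of 3 that all three synthetic datasets follow.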
“Our strategy is to open the dataset so that more partners can contribute,” Weiss adds. “The classic rule for AI is that you need 10 times more examples than parameters. Graph Data Science and Graph Machine Learning use context knowledge to overcome this paradigm. These techniques aren’t common, but more access for more teams of AI developers will also make for more breakthroughs.”
Joining Forces to Cure Rare Pediatric Diseases and Save Lives
Weiss led the hospital’s recent participation in a hackathon for AI challenges at the Technical University of Munich. Students developed Python models using machine learning (ML) to make predictions from synthetic patient data. The most promising models were tested against the hospital’s Neo4j clinical knowledge graph to validate those predictions using F1 scores, an ML evaluation metric that combines a model’s precision and recall into a single number. The event proved the promise of engaging AI developers to accelerate innovation while protecting private clinical data.
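For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes it in plain Python for a binary label; the labels and predictions are made up for illustration and are not hackathon results.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example: 3 positive cases among 8 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

score = f1_score(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(score, 3), round(accuracy, 3))
```

Here plain accuracy is 0.75 while F1 is about 0.667, because F1 ignores the easy true negatives. That distinction matters when the positive class is rare.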
LMU’s graph also incorporates data from the DrugBank database, providing insights on drugs approved by the US Food and Drug Administration (FDA), plus detailed data on drug targets, proteins, and other attributes. Researchers can more easily explore whether a drug already approved for one purpose might target some of the same proteins involved in another disease.
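The repurposing question above amounts to an overlap between a drug's target proteins and the proteins implicated in a disease. A minimal sketch in plain Python follows, assuming hypothetical drug, disease, and protein identifiers; it does not use real DrugBank records or the hospital's graph schema.

```python
# Hypothetical drug -> target proteins (in practice sourced from DrugBank).
DRUG_TARGETS = {
    "drug_a": {"P1", "P2"},
    "drug_b": {"P3"},
    "drug_c": {"P2", "P4", "P5"},
}

# Hypothetical disease -> proteins implicated in it.
DISEASE_PROTEINS = {
    "disease_x": {"P2", "P5"},
}


def repurposing_candidates(disease, drug_targets, disease_proteins):
    """List drugs whose targets overlap the disease's proteins, best overlap first."""
    proteins = disease_proteins.get(disease, set())
    hits = []
    for drug, targets in drug_targets.items():
        shared = targets & proteins
        if shared:
            hits.append((drug, sorted(shared)))
    # Larger overlaps first; ties broken alphabetically by drug name.
    return sorted(hits, key=lambda item: (-len(item[1]), item[0]))


candidates = repurposing_candidates("disease_x", DRUG_TARGETS, DISEASE_PROTEINS)
print(candidates)
```

A shared target is only a starting hypothesis for researchers to investigate, not evidence that a drug works for the second disease.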
Weiss envisions that startups and companies in the pharmaceutical and life sciences sector will leverage this growing database of clinical data to develop new medicines. The hospital is also part of the European Children’s Hospitals Organisation (ECHO) to facilitate participation from hospitals in other countries. Meanwhile, LMU is migrating its external graph to AWS for collaboration by public and private partners around the world.
Through global collaboration, the Care-for-Rare Foundation is speeding up time-to-diagnosis and fostering greater hope for families facing rare pediatric diseases. “We’re on the cusp of many medical breakthroughs,” says Weiss, “as we use AI and our Neo4j knowledge graph to create a complete personalized medicine ecosystem for rare diseases, enabling hospitals worldwide to better serve children in need.”