Knowledge Graph-Powered Diabetes Research
Challenge
About seven million people in Germany have diabetes, making it one of the country's most widespread diseases. To better understand its causes, scientists at the German Center for Diabetes Research (DZD) examine the disease from different angles.
“Our goal was to enable access to the data across locations, disciplines, species, and data types,” said Prof. Dr. Martin Hrabě de Angelis, DZD board member. “At the same time, the more than 450 scientists in the DZD should also be able to access external expertise.”
Solution
Launched in 2017, DZDconnect, the DZD’s knowledge graph built on Neo4j, serves affiliated healthcare and medical professionals. Layered on top of the DZD’s relational databases, DZDconnect links the systems and data silos of the health centers.
Knowledge graphs offer a rich platform for incorporating and connecting ever more data at scale. DZDconnect is continuously updated with the latest medical research: natural language processing (NLP) ingests and automatically annotates more than 30 million publications from PubMed. Algorithms perform a semantic analysis of the texts, classify relevant entities, and link them to internal information in the database.
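To make the annotate-and-link step concrete, here is a minimal sketch in Python. It is not the DZD's pipeline, which the text does not describe in detail. It assumes a small hand-written vocabulary that maps surface terms to internal node identifiers; the vocabulary, the identifier scheme, the MENTIONS relationship name, and the placeholder abstract ID are all hypothetical. A production system would use a trained NLP model rather than simple term matching.

```python
import re

# Hypothetical controlled vocabulary: surface term -> internal graph node ID.
VOCABULARY = {
    "type 2 diabetes": "Disease:T2D",
    "insulin": "Compound:Insulin",
    "metformin": "Compound:Metformin",
    "tcf7l2": "Gene:TCF7L2",
}

def annotate(abstract_id, text):
    """Return (abstract_id, 'MENTIONS', entity_id) edges for terms found in text."""
    lowered = text.lower()
    edges = []
    for term, entity_id in VOCABULARY.items():
        # Word boundaries prevent matches inside longer words.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            edges.append((abstract_id, "MENTIONS", entity_id))
    return sorted(edges)

# "PMID:example" is a placeholder, not a real PubMed identifier.
edges = annotate(
    "PMID:example",
    "Metformin lowers glucose in type 2 diabetes carriers of TCF7L2 variants.",
)
```

Each resulting tuple corresponds to one relationship that could be written to the graph, connecting a publication node to an entity node that already exists in the internal data.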
“Reading and absorbing information from the latest publications is simply not feasible without assistance from technologies such as NLP,” explains Dr. Alexander Jarasch, Head of Data and Knowledge Management, DZD. “Currently, it still takes about 1.5 seconds to analyze an abstract on a decent machine. While that sounds fast, it would actually take about a year and a half to summarize all 30 million publications. Our approach of using NLP and graph technology runs in parallel and is automated in the background.”
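The arithmetic behind that estimate can be checked with a few lines of Python. The two input figures come from the quote above; the worker count is an arbitrary example, and the calculation assumes ideal linear scaling across workers, which real systems only approximate.

```python
SECONDS_PER_ABSTRACT = 1.5       # figure quoted above
PUBLICATIONS = 30_000_000        # figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60

def processing_days(workers=1):
    """Wall-clock days to analyze every abstract with `workers` parallel processes."""
    total_seconds = SECONDS_PER_ABSTRACT * PUBLICATIONS
    return total_seconds / workers / SECONDS_PER_DAY

sequential_days = processing_days()           # about 521 days, roughly 1.4 years
parallel_days = processing_days(workers=32)   # 32 workers is an assumed example
```

Run sequentially, the job takes about 521 days, which matches the "year and a half" in the quote. Under the idealized assumption of 32 equally fast workers, the same job would finish in a little over 16 days.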
The Neo4j Graph Data Science Library plays an important role in this work. One goal is to identify distinct subtypes of type 2 diabetes so that therapy can be tailored to each of them (precision medicine). Using the library's built-in algorithms, scientists can subdivide the dataset: based on predefined parameters, a community detection algorithm groups patients into clusters that researchers can then investigate more closely. Algorithms then find the attributes of each diabetes subtype and identify shared characteristics (e.g., height, weight, medication, or genetic defect).
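The clustering idea can be illustrated with a small, self-contained Python sketch. The text does not say which community detection algorithm the DZD uses, so this example swaps in weighted label propagation, one common member of that family, written in plain Python rather than with the Neo4j library. The patient IDs and similarity weights are invented for illustration.

```python
from collections import defaultdict

# Hypothetical patient-similarity edges: (patient, patient, similarity weight).
EDGES = [
    ("p1", "p2", 1.0), ("p1", "p3", 1.0), ("p2", "p3", 1.0),
    ("p4", "p5", 1.0), ("p4", "p6", 1.0), ("p5", "p6", 1.0),
    ("p3", "p4", 0.2),  # weak link between the two groups
]

def label_propagation(edges, max_rounds=20):
    """Weighted label propagation with deterministic tie-breaking."""
    neighbors = defaultdict(dict)
    for a, b, weight in edges:
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    labels = {node: node for node in neighbors}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            # Sum edge weights per label carried by the node's neighbors.
            scores = defaultdict(float)
            for other, weight in neighbors[node].items():
                scores[labels[other]] += weight
            # Highest score wins; ties go to the alphabetically smallest label.
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # labels are stable, so the clustering has converged

    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return sorted(sorted(members) for members in clusters.values())

clusters = label_propagation(EDGES)
```

On this toy graph the two tightly connected triangles end up as separate clusters, because the single weak edge between p3 and p4 cannot outweigh the stronger links inside each group. Once clusters exist, comparing attribute values within each one is how shared characteristics such as medication or weight would surface.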