Machine Learning and Graph Technology Accelerate Medical Research
Challenge
For most types of cancer, early diagnosis improves patient survival rates. The problem is that
many tests used to diagnose cancer are invasive and require special equipment.
For example, stomach cancer is still diagnosed with an endoscopy, technology from 1965.
Further, endoscopies are not an effective screening tool: only 2% of patients screened have
stomach cancer. A simpler diagnostic method could eliminate unnecessary procedures and
increase detection.
Miroculus saw the promise of microRNAs for cancer detection, but microRNAs were thought
to be locked in cells. In 2008, one of the company’s advisors discovered circulating microRNA.
Whenever there’s a problem in the cells at the tissue level, they break apart and release
microRNAs into the bloodstream.
Today, detecting microRNAs still requires highly skilled scientists, expensive reagents and
machinery and very complex protocols. Miroculus saw the potential to change the face of
microRNA detection.
Conducting a study to find a microRNA biomarker for stomach cancer meant keeping up with an
explosion in related medical research. Typical research methods involved searching for articles,
selecting a relevant article, attempting to retrieve it and assimilating it.
With the increase in microRNA research, absorbing all the relevant information would take
several lifetimes. Miroculus needed a way to accelerate that process and connect scientists
directly to pertinent research.
Solution
Miroculus wanted to find a microRNA biomarker for stomach cancer. It is a compelling research
area – of 1 million people diagnosed with stomach cancer, 80% survive less than 18 months.
Their work required keeping up with the latest publications connecting genes, diseases and
microRNAs. “In order to make sense of all the newly available microRNA information, we stored
this high volume of data in a searchable graph database,” said Antonio Molins, VP of Data
Science at Miroculus.
The Miroculus team gathered more than a billion articles in Hadoop. Next they used natural
language processing (NLP) to extract specific sentences with keywords for gene, disease and
microRNA. Inferring the relationship between keywords required yet another step. The team
developed an unsupervised machine learning model to classify relationships, which are then
stored in Neo4j.
“We think it’s good to use the right tool for the right problem,” said Molins. “Graph databases are
the right tool if you are focusing on relationships.”
The team created an interactive visualization searchable by microRNA, gene and disease. The
user’s search criteria becomes the central node of the visualization, with surrounding nodes
connecting specific microRNAs. Selecting a particular node pulls up papers that relate them, with
the specific sentence cited and a link to the publication.
With the latest medical research in hand, Miroculus designed a study following FDA guidelines.
The study – conducted in collaboration with the NIH, the National Cancer Institute and experts in
Chile – included 650 people eligible for an endoscopy to diagnose stomach cancer.