Machine Learning and Graph Technology Accelerate Medical Research

The Challenge

For most types of cancer, early diagnosis improves patient survival rates. The problem is that many tests used to diagnose cancer are invasive and require special equipment.

For example, stomach cancer is still diagnosed with an endoscopy, technology from 1965. Further, endoscopies are not an effective screening tool: only 2% of patients screened have stomach cancer. A simpler diagnostic method could eliminate unnecessary procedures and increase detection.

Miroculus saw the promise of microRNAs for cancer detection, but microRNAs were thought to be locked in cells. In 2008, one of the company’s advisors discovered circulating microRNA. Whenever there’s a problem in the cells at the tissue level, they break apart and release microRNAs into the bloodstream.

Today, detecting microRNAs still requires highly skilled scientists, expensive reagents and machinery and very complex protocols. Miroculus saw the potential to change the face of microRNA detection.

Conducting a study to find a microRNA biomarker for stomach cancer meant keeping up with an explosion in related medical research. Typical research methods involved searching for articles, selecting a relevant article, attempting to retrieve it and assimilating it.

With the increase in microRNA research, absorbing all the relevant information would take several lifetimes. Miroculus needed a way to accelerate that process and connect scientists directly to pertinent research.

The Solution

Miroculus wanted to find a microRNA biomarker for stomach cancer. It is a compelling research area – of 1 million people diagnosed with stomach cancer, 80% survive less than 18 months.

Their work required keeping up with the latest publications connecting genes, diseases and microRNAs. “In order to make sense of all the newly available microRNA information, we stored this high volume of data in a searchable graph database,” said Antonio Molins, VP of Data Science at Miroculus.

The Miroculus team gathered more than a billion articles in Hadoop. Next they used natural language processing (NLP) to extract specific sentences with keywords for gene, disease and microRNA. Inferring the relationship between keywords required yet another step. The team developed an unsupervised machine learning model to classify relationships, which are then stored in Neo4j.

“We think it’s good to use the right tool for the right problem,” said Molins. “Graph databases are the right tool if you are focusing on relationships.”

The team created an interactive visualization searchable by microRNA, gene and disease. The user’s search criteria becomes the central node of the visualization, with surrounding nodes connecting specific microRNAs. Selecting a particular node pulls up papers that relate them, with the specific sentence cited and a link to the publication.

With the latest medical research in hand, Miroculus designed a study following FDA guidelines. The study – conducted in collaboration with the NIH, the National Cancer Institute and experts in Chile – included 650 people eligible for an endoscopy to diagnose stomach cancer.

Download Case Study