Elsevier is a Dutch publishing and analytics company specializing in scientific, technical and medical content. The company had started work on a Core Citation Graph (C-Graph) project, based on Neo4j graph technology, to query its huge network of scientific content. Its dataset includes articles, authors, organizations, journals, books and the links between them, forming a network of billions of nodes.
The project was progressing well when the COVID-19 crisis hit.
Elsevier wanted to do everything it could to help researchers working on drugs and vaccines. The company responded swiftly, starting by making all its proprietary research papers on COVID-19 in the SCOPUS and ScienceDirect platforms freely available to researchers.
Then, Finlay MacLean, an Elsevier data scientist and Neo4j champion, came up with the idea of applying the graph technology being used for the C-Graph project to COVID-19 research. It rapidly became clear that Neo4j technology could help create a knowledge graph based on published and open source data around COVID-19.
Accelerating Time to Market
Getting drugs to patients quickly is key, so researchers are focusing on drugs that are already FDA approved. It takes over 10 years to get a drug tested and approved by the FDA, whereas the trials to repurpose an already approved drug are much shorter.
With graph technology, Elsevier was able to build a polypharmacology model and predict which combinations of existing approved drugs COVID-19 patients would be more likely to benefit from.
At the same time as investigating underlying molecular mechanisms that could be a target for treatment, the Elsevier team turned its attention to determining the interactions of drugs with key proteins related to other coronaviruses, such as SARS-CoV-1 and MERS-CoV.
As there was little data available on COVID-19, Elsevier data scientists ran a query that looked for human proteins that scientists have shown to be increased or decreased by the MERS and SARS coronaviruses.
Elsevier data scientists identified drugs that could downregulate or inactivate the proteins that SARS and MERS activated. The data is stored in a graph database that displays the links between studied drugs, proteins and MERS and SARS diseases. The Neo4j graph platform made it extremely easy to collect the data needed in order to build these machine learning models.
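The two-step lookup described above (proteins activated by SARS and MERS, then drugs that downregulate those proteins) can be sketched in plain Python over a toy edge list. This is only an illustration of the traversal, not Elsevier's actual query or schema: the relation names and every drug, protein and function name below are hypothetical placeholders.

```python
from collections import defaultdict

# Toy edge list of (source, relation, target).
# Every name here is a hypothetical placeholder, not real data.
EDGES = [
    ("SARS", "UPREGULATES", "PROTEIN_A"),
    ("MERS", "UPREGULATES", "PROTEIN_A"),
    ("MERS", "UPREGULATES", "PROTEIN_B"),
    ("SARS", "DOWNREGULATES", "PROTEIN_C"),
    ("DRUG_X", "DOWNREGULATES", "PROTEIN_A"),
    ("DRUG_Y", "DOWNREGULATES", "PROTEIN_B"),
    ("DRUG_Y", "UPREGULATES", "PROTEIN_C"),
    ("DRUG_Z", "UPREGULATES", "PROTEIN_A"),
]
VIRUSES = {"SARS", "MERS"}


def build_index(edges):
    """Index edges as (source, relation) -> set of targets."""
    index = defaultdict(set)
    for source, relation, target in edges:
        index[(source, relation)].add(target)
    return index


def candidate_drugs(edges, viruses):
    """Return drugs that downregulate at least one virus-activated protein."""
    index = build_index(edges)

    # Step 1: proteins activated (upregulated) by any of the viruses.
    activated = set()
    for virus in viruses:
        activated |= index[(virus, "UPREGULATES")]

    # Step 2: non-virus sources that downregulate one of those proteins.
    result = {}
    for (source, relation), targets in index.items():
        if source in viruses or relation != "DOWNREGULATES":
            continue
        hits = targets & activated
        if hits:
            result[source] = sorted(hits)
    return result


if __name__ == "__main__":
    for drug, proteins in sorted(candidate_drugs(EDGES, VIRUSES).items()):
        print(drug, "->", ", ".join(proteins))
```

In a graph database the same question is a short path pattern rather than a pair of loops, which is the convenience the case study attributes to the graph platform.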
In a matter of weeks, Elsevier scientists were able to produce a simplified graph to help researchers understand the mechanistic pathways of COVID-19. Without Neo4j this endeavour would have taken months.
In some cases, the side effects of multiple therapies in combination are so serious that they outweigh the benefits that the patient receives from the treatment. The ideal combination of drugs would counteract the effects that the virus has on COVID-19 patients without causing serious side effects.
A regular relational database could not handle the complex data interdependencies involved in identifying effective and safe drug combinations in real time. Graph technology makes light work of uncovering these dependencies.
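As a rough illustration of the trade-off described above, the sketch below ranks drug pairs by how many virus-activated proteins they jointly target and discards pairs flagged with a serious interaction. It is a simple filter, not the polypharmacology model Elsevier built; the data, the coverage score and all names are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical placeholder data: which proteins each drug downregulates.
TARGETS = {
    "DRUG_W": {"PROTEIN_A"},
    "DRUG_X": {"PROTEIN_A"},
    "DRUG_Y": {"PROTEIN_B"},
}
# Hypothetical pairs assumed to cause serious side effects together.
SERIOUS_INTERACTIONS = {frozenset({"DRUG_X", "DRUG_Y"})}
# Hypothetical proteins activated by the virus.
VIRUS_ACTIVATED = {"PROTEIN_A", "PROTEIN_B"}


def rank_pairs(targets, interactions, activated):
    """Rank safe drug pairs by the fraction of activated proteins they cover."""
    ranked = []
    for a, b in combinations(sorted(targets), 2):
        # Skip combinations flagged as having serious side effects.
        if frozenset({a, b}) in interactions:
            continue
        covered = (targets[a] | targets[b]) & activated
        ranked.append(((a, b), len(covered) / len(activated)))
    # Highest coverage first; ties broken alphabetically by pair.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    for pair, coverage in rank_pairs(TARGETS, SERIOUS_INTERACTIONS, VIRUS_ACTIVATED):
        print(pair, coverage)
```

The number of candidate combinations grows quickly with the number of drugs, and each candidate depends on several kinds of links at once (drug to protein, drug to drug, virus to protein), which is the kind of interdependency the text refers to.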
Elsevier linked the protein identifiers it used to external identifiers in reviewed databases, such as HGNC and UniProt. It preserved the provenance of the extracted information by providing a link to the original PubMed identifiers. The dataset is now freely accessible for download and exploration on Mendeley. Anybody who wants to merge Elsevier’s dataset with other publicly available databases can do so because the identifiers are shared.
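Shared identifiers make such a merge a plain key lookup. The sketch below joins two small record sets on a common protein identifier while carrying the provenance field along. The field names, identifier values and the `merge_on_identifier` helper are hypothetical and do not reflect the layout of the published dataset.

```python
# Hypothetical rows in the style of an extracted dataset; identifiers are placeholders.
dataset_rows = [
    {"protein_id": "ID_001", "effect": "upregulated", "pubmed_id": "PMID_PLACEHOLDER_1"},
    {"protein_id": "ID_002", "effect": "downregulated", "pubmed_id": "PMID_PLACEHOLDER_2"},
    {"protein_id": "ID_003", "effect": "upregulated", "pubmed_id": "PMID_PLACEHOLDER_3"},
]

# Hypothetical external database keyed by the same identifier scheme.
external_db = {
    "ID_001": {"symbol": "GENE_A"},
    "ID_002": {"symbol": "GENE_B"},
}


def merge_on_identifier(rows, external, key="protein_id"):
    """Inner-join rows with an external lookup table on a shared identifier."""
    merged = []
    for row in rows:
        extra = external.get(row[key])
        if extra is not None:
            # Keep the original fields (including provenance) and add the external ones.
            merged.append({**row, **extra})
    return merged


if __name__ == "__main__":
    for record in merge_on_identifier(dataset_rows, external_db):
        print(record)
```

Rows without a match in the external table are dropped here; a left join that keeps them would be an equally reasonable choice.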
Advancing Scientific Research Beyond COVID-19
COVID-19 is not just a viral disease – it causes all sorts of immunological effects that are not specific to COVID-19. In the future, Elsevier expects to use graph technology to support research into other types of immunological diseases. The Neo4j-based platform replaced a system that was expensive, slow and difficult to maintain. Previously, Elsevier used 90 nodes for processing queries. The Neo4j platform needs only nine to handle 300 million data points and one million updates per day.
Graph technology makes linking easier and quicker, driving continuous updates and generating far more relevant notifications of articles of interest to subscribers. Using Neo4j, it is straightforward to change query metrics, or run a different query. There is no need to change the model or the way the data is stored. Users have quickly latched on to the simplicity of the platform and are finding new use cases every day, discovering new links and connections.
Elsevier plans to develop the C-Graph further, expanding the citation network into a research network. It has also begun work on providing more useful graph-based rankings for search results. In this way, it plans to bring more targeted insight to many more researchers, with the ultimate goal of accelerating and enhancing scientific discovery.