Neo4j Life Sciences and Healthcare Network

Neo4j Use Cases in Life Sciences and Healthcare

If you work in biology, biochemistry, pharmaceuticals, healthcare or another of the life sciences, you know that you work with highly connected information. Unfortunately, many scientists still use relational databases and spreadsheets as their daily tools.

Here we want to present you with an alternative. Managing, storing and querying connected information comes naturally to a graph database like Neo4j. Learn how your research and practitioner colleagues have used Neo4j to draw new insights or simply to be more efficient in their daily work.

It started in 2012 with a workshop at the University of Ghent that brought together people from the field with graph database experts.

Following that fruitful exchange we started the Neo4j-Biotech Google Group to encourage sharing and collaboration on that topic. If you are not yet a member, please join today.

Now we want to take it to the next level by providing you with a platform to present your projects and papers both here and on our blog, and giving you the opportunity to connect with other Neo4j users in your field.

If you are taking your first steps towards using a graph database, we offer support to jumpstart your efforts.

Why use a Graph Database

Graph Databases in Life and Health Sciences Workshop: Berlin, 21 June 2017

We are very pleased to announce our second workshop for researchers interested in sharing and learning about Graph Databases in Life and Health Sciences.

We are inviting researchers, practitioners and developers to present and attend.

More details, as well as registration information, can be found here.

In our past workshop in Ghent, we had topics covering

  • Neo4j in metaproteomics

  • Graph databases in cancer research

  • Project collaboration networks and recommendations

  • Detailed studies of citation graphs

  • Connecting protein databases in a large graph model

  • "Reactome" database of human protein interaction pathways

Life Sciences and Healthcare Accelerator Program

The Neo4j Life Sciences and Healthcare Accelerator Program is designed to help researchers and practitioners in life sciences and healthcare-related sciences make sense of their data using Neo4j. Whether you are analyzing genome data, combining protein databases, investigating drug interactions or supporting practitioners with research or clinical information processing, we want to help you find insights in connected (meta-)data.

If you are accepted into the program, you will receive 1-on-1 assistance from Neo4j engineers to help you with data modeling, data import, writing Cypher queries or anything else that we can do to make you successful with Neo4j.

To get started just tell us about your project and how you think we might be able to help you.

The Hetnet Awakens - Understanding Disease Through Data Integration and Open Science

Daniel Himmelstein

Daniel Himmelstein’s Thesis Seminar for his PhD in Biological & Medical Informatics at UCSF.

Here are the slides and an online adaptation of the PhD Exhibit. Daniel was also interviewed on our Graphistania Podcast and created a fun Graph Gist as live documentation.

Proteomics and Graph Databases, the symbiosis of associations

Alejandro Brenes Murillo

The proteome is the entire set of proteins that are produced or modified by an organism. It is an element that varies with time, stress, environmental conditions or distinct requirements that a cell might have. Join this talk by Alejandro Brenes Murillo to see how graph databases can be useful for proteome analysis.

At the Lamond Lab in the University of Dundee, scientists are interested in modelling and understanding protein behaviour under different conditions and dimensions of analysis.

In order to achieve this goal, they use graph databases to integrate and model the proteomics data, and study its effect on a specific proteome. The dimensions of analysis are multiple, yet be it turnover, localisation, cell cycle, protein complexes or biological response to stimuli, discovering the behaviour of proteins is key to understanding how organisms function, and how disease affects them.

Big Data in Genomics: How Neo4j helps to develop new drugs

Martin Preusse

Biomedical research generates vast amounts of data. New experimental technologies like DNA sequencing, metabolomics and proteomics drive the fast growth of available information and lead to a better understanding of the molecular organization of life.

But with big data comes a big question: How do we transform unstructured data into actionable knowledge? In the case of biomedical research, the key problem is to integrate the large pile of highly heterogeneous data and use it for personalized therapies and drug development. Graph databases are an ideal way to represent biomedical knowledge and offer the necessary flexibility to keep up with scientific progress. A well-designed data model and Cypher queries can deliver in seconds what previously took days of manual analysis.
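As a rough illustration of what a connected data model and a path-style query look like, here is a minimal sketch in plain Python. It does not use Neo4j or Cypher; the node names, relationship types and helper functions are hypothetical and chosen only for this example. In Cypher, a comparable path could be written as a pattern along the lines of (d:Drug)-[:TARGETS]->(:Protein)-[:ASSOCIATED_WITH]->(dis:Disease).

```python
from collections import defaultdict

# Hypothetical toy data: (source, relationship type, target) triples.
EDGES = [
    ("DrugA", "TARGETS", "ProteinX"),
    ("DrugB", "TARGETS", "ProteinY"),
    ("ProteinX", "ASSOCIATED_WITH", "Disease1"),
    ("ProteinY", "ASSOCIATED_WITH", "Disease1"),
    ("ProteinY", "ASSOCIATED_WITH", "Disease2"),
]


def build_graph(edges):
    """Index outgoing edges by (source node, relationship type)."""
    graph = defaultdict(set)
    for source, rel_type, target in edges:
        graph[(source, rel_type)].add(target)
    return graph


def diseases_for_drug(graph, drug):
    """Follow Drug -TARGETS-> Protein -ASSOCIATED_WITH-> Disease."""
    diseases = set()
    for protein in graph.get((drug, "TARGETS"), set()):
        diseases |= graph.get((protein, "ASSOCIATED_WITH"), set())
    return sorted(diseases)


graph = build_graph(EDGES)
print(diseases_for_drug(graph, "DrugB"))  # ['Disease1', 'Disease2']
```

The point of the sketch is that the question "which diseases are reachable from this drug?" is answered by following relationships directly, rather than by joining several tables.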

Building a Repository of Biomedical Ontologies with Neo4j

Simon Jupp

In this lightning talk from GraphConnect Europe 2016, Simon Jupp of the European Bioinformatics Institute discusses the application they built to track ontologies. He also discusses why they chose Neo4j over various RDF and semantic web technologies, and provides some example queries.

Data Management in Systems Biology & Medicine

Irina Balaur, EISBM

An Integrative Framework for Data Management in Systems Biology and Medicine: Strategies for personalised medicine involve integrating large amounts of biomedical data specific to multiple spatial and temporal scales, including molecular data and patient clinical data. We have been developing a graph database approach, implemented in Neo4j, to facilitate the management (integration, exploration, visualisation, interpretation) of diverse types of biological and biomedical data.

Graphs Are Feeding The World

Tim Williamson, Data Scientist, Monsanto

Presentation at GraphConnect SF 2015.

Graph Databases in Life Sciences: Bringing Biology Back to Its Nature

Thilo Muth

Today’s life science research is about genes, proteins, metabolites, relationships, interactions and biological networks. Storing and mining data holds huge potential for biologists. However, classical storage formats such as SQL and Excel bring various issues, such as scalability and performance problems as data grows, complexity and accessibility. Finally, most storage models are far from biological reality. Graph databases and Neo4j meet the need in life sciences for an appropriate data and database model.

Open Tree Of Life

The tree of life links all biodiversity through a shared evolutionary history. This project will produce the first online, comprehensive first-draft tree of all 1.8 million named species, accessible to both the public and scientific communities.

Assembly of the tree will incorporate previously-published results, with strong collaborations between computational and empirical biologists to develop, test and improve methods of data synthesis.

This initial tree of life will not be static; instead, we will develop tools for scientists to update and revise the tree as new data come in. Early release of the tree and tools will motivate data sharing and facilitate ongoing synthesis of knowledge.

Biological research of all kinds, including studies of ecological health, environmental change, and human disease, increasingly depends on knowing how species are related to each other.

Yet there is no single resource that unites knowledge of the tree of life. Instead, only small parts of the tree are individually available, generally as printed figures in journal articles.

This project will provide the global community of scientists who study the tree of life with a means to share and combine their results, and will enable large-scale studies of Earth’s biodiversity. It will also create a resource where students, educators and citizens can go to explore and learn about life’s evolutionary history.

Read more on the OpenTreeOfLife Blog


Title Year Authors Affiliation

The Proteins API: accessing key integrated protein and genome information


A. Nightingale, R. Antunes, E. Alpi, B. Bursteinas, L. Gonzales, W. Liu, J. Luo, G. Qi, E. Turner, and M. Martin

EMBL-EBI, Wellcome Genome Campus, UK

Knowledge.Bio: A Web application for exploring, building and sharing webs of biomedical relationships mined from PubMed


R. Bruskiewich, K. Huellas-Bruskiewicz, F. Ahmed, R. Kaliyaperumal, M. Thompson, E. Schultes, K. M. Hettne, A. I. Su, and B. M. Good

Department of Human Genetics, Leiden University Medical Center, The Netherlands

Recon2Neo4j: Applying graph database technologies for managing comprehensive genome-scale networks


I. Balaur, A. Mazein, M. Saqi, A. Lysenko, C. J. Rawlings, and C. Auffray

European Institute for Systems Biology and Medicine (EISBM), France

STON: exploring biological pathways using the SBGN standard and graph databases


V. Touré, A. Mazein, D. Waltemath, I. Balaur, M. Saqi, R. Henkel, J. Pellet, and C. Auffray

European Institute for Systems Biology and Medicine (EISBM), France

miTALOS v2: Analyzing Tissue Specific microRNA Function


M. Preusse, F. J. Theis, and N. S. Mueller

Institute of Computational Biology, Helmholtz Zentrum München, Germany

An Integrated Data Driven Approach to Drug Repositioning Using Gene-Disease Associations


J. Mullen, S. J. Cockell, P. Woollard, and A. Wipat

Newcastle University, United Kingdom

HitWalker2: visual analytics for precision medicine and beyond


D. Bottomly, S. K. McWeeney, and B. Wilmot

Knight Cancer Institute, Oregon Health and Science University, USA

HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks


X. Dai, J. Li, T. Liu, and P. X. Zhao

Plant Biology Division, The Samuel Roberts Noble Foundation, USA

Representing and querying disease networks using graph databases


A. Lysenko, I. A. Roznovăţ, M. Saqi, A. Mazein, C. J. Rawlings, and C. Auffray

European Institute for Systems Biology and Medicine (EISBM), France

PanTools: representation, storage and exploration of pan-genomic data


S. Sheikhizadeh, M. E. Schranz, M. Akdel, D. de Ridder, and S. Smit

Bioinformatics Group, Wageningen University, The Netherlands

EpiGeNet: A Graph Database of Interdependencies Between Genetic and Epigenetic Events in Colorectal Cancer


I. Balaur, M. Saqi, A. Barat, A. Lysenko, A. Mazein, C. J. Rawlings, H. J. Ruskin, and C. Auffray

European Institute for Systems Biology and Medicine (EISBM), France

cyNeo4j: connecting Neo4j and Cytoscape


G. Summer, T. Kelder, K. Ono, M. Radonjic, S. Heymans, and B. Demchak

Center for Heart Failure Research (CARIM), University Hospital Maastricht, The Netherlands

Towards Implementing Semantic Literature-Based Discovery with a Graph Database


D. Hristovski, A. Kastrin, D. Dinevski, and T. C. Rindflesch

Faculty of Medicine, University of Ljubljana, Slovenia

Using Neo4j for Mining Protein Graphs: A Case Study


D. Hoksza and J. Jelinek

Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic

The MetaProteomeAnalyzer: A Powerful Open-Source Software Suite for Metaproteomics Data Analysis and Interpretation


T. Muth, A. Behne, R. Heyer, F. Kohrs, D. Benndorf, M. Hoffmann, M. Lehtevä, U. Reichl, L. Martens, and E. Rapp

Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany

SimiRa: A tool to identify coregulation between microRNAs and RNA-binding proteins


M. Preusse, C. Marr, S. Saunders, D. Maticzka, H. Lickert, R. Backofen, and F. Theis

Helmholtz Zentrum München, Institute of Computational Biology, Germany

Constructing a Graph Database for Semantic Literature-Based Discovery


D. Hristovski, A. Kastrin, D. Dinevski, and T. C. Rindflesch

Faculty of Medicine, University of Ljubljana, Slovenia

A systems biology approach toward understanding seed composition in soybean


L. Li, M. Hur, J. Y. Lee, W. Zhou, Z. Song, N. Ransom, C. Y. Demirkale, D. Nettleton, M. Westgate, Z. Arendsee, V. Iyer, J. Shanks, B. Nikolau, and E. S. Wurtele

Department of Genetics, Development and Cell Biology, Iowa State University, USA

Combining computational models, semantic annotations and simulation experiments in a graph database


R. Henkel, O. Wolkenhauer, and D. Waltemath

Department of Computer Science, University of Rostock, Germany

An alternative database approach for management of SNOMED CT and improved patient data queries


W. S. Campbell, J. Pedersen, J. C. McClay, P. Rao, D. Bastola, and J. R. Campbell

University of Nebraska Medical Center, Department of Pathology and Microbiology, US

Semantically linking in silico cancer models


D. Johnson, A. J. Connor, S. McKeever, Z. Wang, T. S. Deisboeck, T. Quaiser, and E. Shochat

Department of Computing, Imperial College London, London, UK

Global biotic interactions: An open infrastructure to share and analyze species-interaction datasets


J. H. Poelen, J. D. Simons, and C. J. Mungall

Center for Coastal Studies Natural Resource Center, USA

Are graph databases ready for bioinformatics?


Christian Theil Have and Lars Juhl Jensen

Department of Metabolic Genetics, University of Copenhagen, Denmark