Graph Databases in Life Sciences Workshop
As biotechnology is one of the hot topics of the century and graph databases are on the rise this decade, we thought it would be a good idea to bring researchers and bioinformatics developers together for a workshop on the applicability of graph databases to biological research and applications.
Fortunately, Prof. Lennart Martens, a group leader in the Department of Medical Protein Research at VIB and Ghent University, offered to host the workshop. Neo Technology’s Rik Van Bruggen and Lennart Martens organized the workshop together and invited attendees from a variety of backgrounds.
Image: a poster of the metabolic interaction pathways in humans.
After the introduction by Lennart and Rik, I gave a quick introduction to NoSQL, and to graph databases in particular, covering their applicability across a wide range of fields and referring to some existing biotech applications.
Pablo Pareja of Oh no sequences! presented Bio4j, an open-source research database (and platform) that integrates many different sources of protein, genome and taxonomy information. Bio4j also runs on Neo4j and currently holds almost 1 billion relationships. (Slides 1, 2, 3)
Until lunch, I answered questions about Neo4j, especially about its roadmap and scaling, and we highlighted some visualization approaches such as Gephi, Cytoscape and HivePlots.
During the breaks and over lunch we had many interesting discussions about the life sciences in general, working with scientists, and particular data management problems.
After lunch, Anthony Liekens presented biograph.be, a knowledge discovery system for finding relevant information in the life sciences, e.g. proteins in reactions ranked by their publication relevance. The system uses a PageRank algorithm implemented as matrix multiplication on a parallel processing system.
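biograph.be's exact formulation is not described in this post, so the following is only a minimal sketch of the general technique: PageRank computed by power iteration, i.e. repeatedly multiplying a rank vector by a column-stochastic transition matrix. The toy graph, the damping factor of 0.85 and the function name pagerank are assumptions for illustration; a parallel system would distribute these matrix-vector products instead of running them in a single Python loop.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on a small dense graph (illustrative sketch).

    adjacency[i][j] == 1 means there is a link from node i to node j.
    Returns a list of scores that sums to 1.
    """
    n = len(adjacency)
    # Column-stochastic transition matrix: m[j][i] is the probability of
    # moving from node i to node j. Dangling nodes (no outgoing links)
    # spread their rank uniformly over all nodes.
    m = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(adjacency):
        out_degree = sum(row)
        for j in range(n):
            m[j][i] = row[j] / out_degree if out_degree else 1.0 / n

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # One matrix-vector multiplication per step: r' = d*M*r + (1-d)/n
        new_rank = [
            damping * sum(m[j][i] * rank[i] for i in range(n)) + (1.0 - damping) / n
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: node 2 is linked to by both other nodes.
links = [
    [0, 1, 1],  # node 0 -> nodes 1 and 2
    [0, 0, 1],  # node 1 -> node 2
    [1, 0, 0],  # node 2 -> node 0
]
scores = pagerank(links)
print([round(s, 3) for s in scores])
```

Because each column of the transition matrix sums to 1, every iteration preserves a total rank of 1, which makes the scores directly comparable as relevance values.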
Davy Suvee of Janssen Pharmaceutica and datablend.be presented several graph database use cases from his experience at a large pharmaceutical company. He closed his presentation with an introduction to FluxGraph, a time-traveling graph implementation on top of Datomic.
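FluxGraph's own API is not shown here, so this is only a conceptual sketch of what "time-traveling" means, loosely modeled on Datomic's idea of an append-only log of timestamped facts: edges are never overwritten, only asserted or retracted at a point in time, so the graph can be reconstructed as of any earlier moment. The class TimeTravelGraph, its methods and the node names are hypothetical and are not FluxGraph's or Datomic's API.

```python
class TimeTravelGraph:
    """Hypothetical append-only graph where every change is a timestamped fact."""

    def __init__(self):
        # Each entry is (time, source, target, added); entries are never mutated.
        self._log = []

    def add_edge(self, time, source, target):
        self._log.append((time, source, target, True))

    def remove_edge(self, time, source, target):
        self._log.append((time, source, target, False))

    def edges_as_of(self, time):
        """Replay all facts up to and including `time` to rebuild that state."""
        edges = set()
        # sorted() is stable, so facts with equal timestamps keep insertion order.
        for t, source, target, added in sorted(self._log, key=lambda fact: fact[0]):
            if t > time:
                break
            if added:
                edges.add((source, target))
            else:
                edges.discard((source, target))
        return edges


# Usage with hypothetical node names.
g = TimeTravelGraph()
g.add_edge(1, "geneA", "proteinX")
g.add_edge(2, "proteinX", "pathwayP")
g.remove_edge(3, "geneA", "proteinX")
print(sorted(g.edges_as_of(2)))  # both edges still present at time 2
print(sorted(g.edges_as_of(3)))  # the geneA edge was retracted at time 3
```

Replaying the whole log is the simplest way to show the idea; a real system would index facts by time so that historical queries stay fast.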
Thilo then introduced the topic of the workshop, “Graph Databases in Life Science”, and the “Reactome” database of human protein interaction pathways. He discussed some Neo4j APIs and showed how they can be used to import the data from flat CSV files into a graph database. The attendees set up their development environments with the Neo4reactome project that we had prepared in advance and ran the import successfully.
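The Neo4reactome import code is not reproduced in this post, so here is only a sketch of the general approach, written in plain Python rather than against the Neo4j APIs: read each CSV row, create a node per unique identifier (tracked in a lookup map that plays the role of an index), and create one relationship per row. The column names protein and pathway, the relationship type PARTICIPATES_IN and the sample rows are hypothetical; the real Reactome export files have their own layout.

```python
import csv
import io


def import_edges(csv_text):
    """Turn rows of (protein, pathway) into node and relationship lists.

    Assumes a hypothetical CSV layout with the header line: protein,pathway
    """
    node_ids = {}       # (kind, identifier) -> node id, acts like a lookup index
    relationships = []  # (start node id, relationship type, end node id)

    def get_or_create(kind, identifier):
        # Reuse an existing node so each identifier is imported only once.
        key = (kind, identifier)
        if key not in node_ids:
            node_ids[key] = len(node_ids)
        return node_ids[key]

    for row in csv.DictReader(io.StringIO(csv_text)):
        start = get_or_create("protein", row["protein"])
        end = get_or_create("pathway", row["pathway"])
        relationships.append((start, "PARTICIPATES_IN", end))
    return node_ids, relationships


# Hypothetical sample data, not taken from Reactome.
sample = """protein,pathway
protein_A,pathway_1
protein_B,pathway_1
protein_B,pathway_2
"""
nodes, rels = import_edges(sample)
print(len(nodes), "nodes,", len(rels), "relationships")
```

Keeping the identifier-to-node map in memory avoids a lookup round trip per row, which is the usual reason batch importers maintain their own index during the load.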
We then went through several use cases, first visualizing pathways in the Neo4j Web UI and then running several queries in Neo4j’s query language, Cypher, to find specific proteins (HBA and HBB) and their interaction pathways.
Find the common pathways of HBA and HBB
Each protein is involved in a number of pathways, which are easy to find with a query. Now we want to retrieve only the pathways that both proteins have in common.
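A query along these lines looks up both proteins by their UniProt accession in the proteins index and then matches nodes connected to both. The variable-length pattern in the MATCH clause is an assumption standing in for the actual Neo4reactome relationship types, which are needed to restrict pathway to pathway nodes:
START proteinA=node:proteins(accession = "P69905"),
      proteinB=node:proteins(accession = "P68871")
MATCH (proteinA)-[*1..3]-(pathway)-[*1..3]-(proteinB)
RETURN DISTINCT pathway
In the workshop, the query returned these pathways shared by HBA and HBB: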
- O2/CO2 exchange in erythrocytes
- Uptake of Carbon Dioxide and Release of Oxygen by Erythrocytes
- Uptake of Oxygen and Release of Carbon Dioxide by Erythrocytes
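Outside Cypher, the same idea reduces to two steps: collect the pathways reachable from each protein, then intersect the two sets. The sketch below assumes a hypothetical in-memory graph in which proteins link to reactions and reactions link to pathways, with a made-up "pathway:" name prefix marking pathway nodes; the node names are placeholders, not Reactome identifiers, and the real Neo4reactome model may use different or additional hops.

```python
from collections import deque


def reachable_pathways(graph, start, max_depth=3):
    """Breadth-first search from `start`, collecting pathway nodes.

    Pathway nodes are recognized by a hypothetical 'pathway:' prefix.
    """
    seen = {start}
    queue = deque([(start, 0)])
    pathways = set()
    while queue:
        node, depth = queue.popleft()
        if node.startswith("pathway:"):
            pathways.add(node)
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return pathways


# Hypothetical graph: protein -> reaction -> pathway.
graph = {
    "protein:A": ["reaction:1", "reaction:2"],
    "protein:B": ["reaction:2", "reaction:3"],
    "reaction:1": ["pathway:X"],
    "reaction:2": ["pathway:Y"],
    "reaction:3": ["pathway:Z"],
}

# Pathways shared by both proteins are the intersection of the two sets.
common = reachable_pathways(graph, "protein:A") & reachable_pathways(graph, "protein:B")
print(sorted(common))
```

The depth limit plays the same role as the bounded variable-length pattern in a graph query: it keeps the traversal from wandering across the whole network.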
After the workshop the discussions continued over a broad range of topics.
I want to thank Lennart Martens, Thilo Muth and Rik Van Bruggen again for organizing such a great workshop, and of course Pablo Pareja, Davy Suvee and Anthony Liekens for their presentations.
A few weeks ago we started a “neo4j-biotech” Google group, and we would like to invite everyone to join this discussion forum and talk about the biotech domain with colleagues who share the same background and vocabulary.
Michael Hunger, Neo4j Community Team