Before using Neo4j, it took as many as 1,000 lines of code to write the main query for Himmelstein’s graph algorithm used in a bioinformatics application. But with Neo4j’s Cypher graph query language, the query took only 20 lines.
In this week’s 5-Minute Interview (conducted at GraphConnect San Francisco), we discuss how Neo4j is being used for biological and medical research at UPenn. Himmelstein also shares where he believes the field of bioinformatics research is headed in 2017.
Tell us about how you use Neo4j at UPenn.
Daniel Himmelstein: I use Neo4j to encode biological and medical knowledge into a network. Neo4j was the best way to encode this type of knowledge, which has been produced by millions of studies over the past 50 years, because it lets us represent the rich types of nodes and relationships found in real-world biological data.
What made you choose to work with Neo4j?
Daniel: The Neo4j community is the reason I chose it. First, the features are fantastic and were exactly what we needed, mainly because Neo4j dealt with different types of networks extremely well. But the community was really great: there were so many things on GitHub where I could report any issues with code and have them fixed quickly, and I could ask a question on Stack Overflow.
The developers have been extremely helpful, and I went to some meetups in San Francisco where I met some of the team. The company provides great support, even though we were never a paying customer as open source users of the product. The community has been great to be a part of.
What are some of the most interesting or surprising results you’ve seen while using Neo4j?
Daniel: Before using Neo4j, I had written a Python package called Hetio, which dealt with a number of different types of networks. It took as many as 1,000 lines of code to do the main query for our algorithm. But when I switched to Neo4j and was able to port the algorithm into Cypher, the code was only 20 lines. I thought, “Wow. This is a really advanced graph algorithm and Cypher nailed it.”
Cypher had exactly the right constructs to be able to express exactly what we wanted. And it was cool to have people finally think about how to query a graph; previously people hadn’t put much effort into developing a good query language for networks.
If you could start over with Neo4j, taking everything you know now, what would you do differently?
Daniel: If I could go back in time, maybe I would have used Neo4j a little bit earlier. When I first considered Neo4j, I don’t think Cypher was out yet. And because I program primarily in Python, and a little bit in R, there originally wasn’t an intuitive way to interact with Neo4j. But with the new Bolt drivers and the Cypher query language, it has become quite easy to work with Neo4j from Python.
Anything else you want to add or say?
Daniel: I’m really excited. There have been several talks here at GraphConnect San Francisco from people in the bioinformatics field. I know when Emil did the keynote he didn’t include biology or medicine as one of his six fields, but this will likely be a field in 2017 because it’s really blowing up. We have a lot of data, it has types, and we need to understand those connections, so I expect big growth in the biology field in the next year.
Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at email@example.com