Graphs provide significant advantages over relational databases when working with interconnected data. With simple data visualizations, Neo4j provides the tools that uncover previously hidden – but incredibly important – relationships, which is why more people than ever are making the switch.
In this week’s 5-Minute Interview (conducted at GraphConnect San Francisco), Dippy Aggarwal discusses the central role of graph databases in her research on the impact assessment of schema evolution in data warehouses.
Talk to us about how you use Neo4j in your research.
Dippy Aggarwal: I’m a Ph.D. candidate in computer science, and my dissertation is largely about the use of graph databases — and specifically Neo4j — to study impact assessment of schema evolution in a data warehouse context.
The first question we asked when we got started was whether to use a graph or a relational database, and we ultimately chose graph because our data warehouse work centers around an interconnected domain. You have queries, you have ETL, you have schemas – and all these components are tightly coupled. We needed a database that could capture relationships, and there is no debate that Neo4j really excels at that.
What made you choose Neo4j?
Aggarwal: Today there are a lot of graph databases on the market, but Neo4j is very mature. It has a strong and active developer community, and there are so many new features related to security, clustering, scaling and enterprise use.
Can you talk to me about some of your favorite Neo4j features?
Aggarwal: My favorite feature of Neo4j is its graph visualization; it’s not just a database that allows you to write queries. Visualization is very important when you are talking about paths and relationships, because without that framework, the entire purpose is defeated.
The other important feature is the ability to use drivers that can be programmed using Java. Neo4j also has a REST API, which I think is really cool and which I am using heavily in my Ph.D. work.
What other technologies do you use alongside Neo4j in your research?
Aggarwal: The main component is Neo4j, but the other pieces are more on the relational side. We have all of our input artifacts as relational schemas, and then we use Pentaho, a graphical, XML-based business intelligence tool, to model ETL queries and workflows. Once we have the ETL workflows and the SQL queries written against all those relational schemas, how do we find an economical common model for these artifacts? Neo4j provides a really nice general representation in terms of nodes, edges and relationships that allows us to really flatten all of our heterogeneous artifacts.
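Editor's note: to make the idea of a common graph model more concrete, here is a minimal sketch in plain Python. It is not Aggarwal's implementation and does not use Neo4j; every artifact name in it (`customer.email`, `load_dim_customer`, `dim_customer`, `monthly_report`, and so on) is hypothetical, as is the `ArtifactGraph` class. It only illustrates how schemas, ETL jobs and queries can share one node-and-relationship representation, and how a traversal can then answer the question "what is affected if this column changes?"

```python
from collections import defaultdict, deque


class ArtifactGraph:
    """Toy in-memory graph of data warehouse artifacts (illustrative only)."""

    def __init__(self):
        self.labels = {}                    # node name -> kind of artifact
        self.dependents = defaultdict(set)  # node name -> nodes that depend on it

    def add_node(self, name, label):
        self.labels[name] = label

    def add_dependency(self, dependent, dependency):
        """Record that `dependent` reads from or is built on `dependency`."""
        self.dependents[dependency].add(dependent)

    def impacted_by(self, changed):
        """Breadth-first search for every artifact downstream of `changed`."""
        seen = set()
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            for nxt in self.dependents.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


def build_example():
    """Build a small, entirely hypothetical warehouse dependency graph."""
    g = ArtifactGraph()
    g.add_node("customer.email", "Column")
    g.add_node("load_dim_customer", "EtlJob")
    g.add_node("dim_customer", "Table")
    g.add_node("monthly_report", "Query")
    g.add_node("dim_product", "Table")
    g.add_node("inventory_report", "Query")

    g.add_dependency("load_dim_customer", "customer.email")  # ETL reads the column
    g.add_dependency("dim_customer", "load_dim_customer")    # table is loaded by the ETL job
    g.add_dependency("monthly_report", "dim_customer")       # query reads the table
    g.add_dependency("inventory_report", "dim_product")      # unrelated branch
    return g


if __name__ == "__main__":
    graph = build_example()
    for name in sorted(graph.impacted_by("customer.email")):
        print(graph.labels[name], name)
```

In a graph database such as Neo4j, the same question would typically be expressed as a variable-length path query in Cypher rather than a hand-written traversal; the sketch above only shows the shape of the model.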
Anything else you’d like to add or say?
Aggarwal: I gave a talk here at GraphConnect San Francisco about dockerizing Neo4j and using container orchestration; it was work I performed at Cincinnati Children’s Hospital during my internship. I’m really happy to see the way Neo4j has embraced Docker, which is getting really popular with its containerization capabilities. And it’s clear that Neo4j officially supports this evolution by incorporating Docker images.