Graph Databases and Software Metrics & Analysis

This is the first in a series of blog posts that discuss the use of a graph database like Neo4j to store, compute and visualize a variety of software metrics and other types of software analytics (method call hierarchies, transitive closure, critical path analysis, volatility and code quality). Follow-up posts by different contributors will be linked from this one.

Everyone who works in software development comes across software metrics at some point.
The reasons vary: simple curiosity about the quality or complexity of the code we have written, or a real interest in improving quality and reducing technical debt.
In general there are many ways of approaching this topic, from just gathering and rendering statistics in diagrams to visualizing the structure of programs and systems.

There are a number of commercial and free tools available that compute software metrics and help expose the current trend in your project's development.
Software metrics can cover different areas. Computing cyclomatic complexity or analysing dependencies and call traces is probably easy; using static analysis to find smaller or larger issues is more involved; and detecting code smells can be an interesting challenge in AST parsing.
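To make the "probably easy" end of that spectrum concrete, here is a minimal sketch that approximates cyclomatic complexity by counting decision points in a syntax tree. It uses Python's standard `ast` module on Python source purely for illustration; the function names and the choice of which node types count as branches are assumptions, not how any particular metrics tool works.

```python
import ast

# Node types that each add one decision point (an approximation of McCabe's metric).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension)


def cyclomatic_complexity(func):
    """Return 1 plus the number of decision points found inside `func`."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has three operands and adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def complexity_by_function(source):
    """Map each function name in `source` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


# Hypothetical sample input.
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

result = complexity_by_function(SAMPLE)
print(result)
```

Note that in this sketch the branches of a nested function are also counted towards its enclosing function, which a real tool would likely handle differently.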

Interestingly, many visualizations in and around software development are graph visualizations, from class and other (UML) diagrams, via dependency tracing between and within projects, to architectural analysis. One reason for this might be that source code in general can easily be represented as a graph. On the one hand we have trees, especially (abstract) syntax or parse trees (per file, class or structural element); on the other hand, the actual dependencies from the project and package level down to classes and methods form a huge directed (possibly cyclic) graph. Related topics such as application wiring (DI), system orchestration, and hardware and software networks are effectively graph structures as well.
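Because the dependency graph is directed and may contain cycles, any traversal has to remember what it has already visited. The following is a small sketch of computing transitive dependencies (one row of the transitive closure) with a breadth-first search; the module names and the `transitive_dependencies` helper are hypothetical.

```python
from collections import deque


def transitive_dependencies(direct, start):
    """Return every element reachable from `start`.

    `direct` maps an element to the set of elements it depends on directly.
    `start` itself is only part of the result if it sits on a cycle.
    """
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep in direct.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical module graph: "service" and "util" depend on each other.
DIRECT = {
    "app": {"service"},
    "service": {"model", "util"},
    "util": {"service"},
    "model": set(),
}

app_deps = transitive_dependencies(DIRECT, "app")
util_on_cycle = "util" in transitive_dependencies(DIRECT, "util")
print(sorted(app_deps), util_on_cycle)
```

An element that shows up in its own result is part of a dependency cycle, which is often the first thing worth looking at.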

So, having a graph database like Neo4j at hand, what would be more obvious than parsing software systems and projects at a certain level and importing the information into the graph database? The graph structure that accommodates this information best is a direct representation of the concepts in the software projects: nodes for projects, packages, classes, interfaces, types, methods and fields, connected by relationships such as dependencies, usage, creation, containment, calls, coverage, etc.

Simple Graph Model for Dependency Analysis
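As a rough in-memory illustration of such a model (not the Neo4j API), the sketch below stores typed nodes and typed relationships as triples. The `Node` and `CodeGraph` classes, the relationship type names and the `com.example` classes are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str  # e.g. "Project", "Package", "Class", "Method"
    name: str  # fully qualified name


@dataclass
class CodeGraph:
    # Relationships are stored as (source, relationship type, target) triples.
    edges: set = field(default_factory=set)

    def relate(self, source, rel_type, target):
        self.edges.add((source, rel_type, target))

    def targets(self, source, rel_type):
        """All nodes that `source` points to via `rel_type`."""
        return {t for (s, r, t) in self.edges if s == source and r == rel_type}

    def sources(self, target, rel_type):
        """All nodes that point to `target` via `rel_type`."""
        return {s for (s, r, t) in self.edges if t == target and r == rel_type}


# Hypothetical project content.
package = Node("Package", "com.example")
order = Node("Class", "com.example.Order")
customer = Node("Class", "com.example.Customer")
obj = Node("Class", "java.lang.Object")

graph = CodeGraph()
graph.relate(package, "CONTAINS", order)
graph.relate(package, "CONTAINS", customer)
graph.relate(order, "EXTENDS", obj)
graph.relate(customer, "EXTENDS", obj)
graph.relate(order, "DEPENDS_ON", customer)

print(graph.targets(order, "DEPENDS_ON"))
print(graph.sources(obj, "EXTENDS"))
```

In a graph database the same shape maps directly onto nodes and relationships, so the import step is mostly a matter of emitting one node per element and one relationship per triple.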

Having achieved this, what you do with the data is entirely up to your interests and needs. There are many possibilities: computing metrics, visualizing and tracing dependencies, finding violations of architectural rules, finding co-usage of classes, or detecting interesting patterns and code smells.
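One of these possibilities, checking architectural rules, can be sketched in a few lines once the dependencies are available as pairs. The example below assumes a rule format of forbidden (from package, to package) pairs; the `shop.*` class names and both helper functions are hypothetical.

```python
def package_of(class_name):
    """Return the package part of a fully qualified class name."""
    return class_name.rpartition(".")[0]


def find_violations(dependencies, forbidden):
    """Return all (from_class, to_class) pairs that cross a forbidden package boundary.

    `dependencies` is an iterable of (from_class, to_class) pairs and
    `forbidden` a set of (from_package, to_package) pairs.
    """
    return sorted(
        (src, dst)
        for src, dst in dependencies
        if (package_of(src), package_of(dst)) in forbidden
    )


# Hypothetical class-level dependencies of a small layered application.
DEPENDENCIES = [
    ("shop.ui.CartView", "shop.service.CartService"),
    ("shop.service.CartService", "shop.persistence.CartRepository"),
    ("shop.ui.CartView", "shop.persistence.CartRepository"),
]

# Assumed rule: the UI layer must not talk to persistence directly.
FORBIDDEN = {("shop.ui", "shop.persistence")}

violations = find_violations(DEPENDENCIES, FORBIDDEN)
print(violations)
```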

Just to give one example, here is a Cypher query that finds the 10 longest inheritance paths below java.lang.Object:

START root=node:types(class="java.lang.Object")
MATCH chain = (root)<-[:EXTENDS*]-(leaf)
// assumes each class node carries a "name" property
RETURN extract(class IN nodes(chain) : class.name) AS classes,
       length(chain) AS depth
ORDER BY depth DESC
LIMIT 10
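For readers who want to see what this query computes without a database at hand, here is a plain Python sketch of the same idea: enumerate every inheritance chain starting at the root and keep the deepest ones. The function name, its parameters and the tie-breaking by chain content are assumptions for illustration.

```python
from collections import defaultdict


def longest_inheritance_chains(extends, root="java.lang.Object", limit=10):
    """Return up to `limit` (chain, depth) pairs, deepest first.

    `extends` is an iterable of (subclass, superclass) pairs and is assumed
    to be acyclic, as class inheritance is.
    """
    subclasses = defaultdict(list)
    for sub, sup in extends:
        subclasses[sup].append(sub)

    chains = []
    stack = [[root]]
    while stack:
        chain = stack.pop()
        for sub in subclasses[chain[-1]]:
            extended = chain + [sub]
            # Depth is the number of EXTENDS steps, like length(chain) in Cypher.
            chains.append((extended, len(extended) - 1))
            stack.append(extended)

    # Deepest chains first; ties are ordered by chain content for stable output.
    chains.sort(key=lambda item: (-item[1], item[0]))
    return chains[:limit]


EXTENDS = [
    ("java.lang.Number", "java.lang.Object"),
    ("java.lang.Integer", "java.lang.Number"),
    ("java.lang.String", "java.lang.Object"),
]

top = longest_inheritance_chains(EXTENDS, limit=2)
for chain, depth in top:
    print(depth, " -> ".join(chain))
```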

Other tools besides Cypher to help you with this endeavour are:

  • ASM, Antlr or similar parsers for parsing byte- or source code.
  • Neo4j-Shell for exploration
  • Visualisation with GraphViz, D3, VivaGraphJS, Linkurious or others

Another option is to take a time dimension into account to see how structure, elements and relationships change over time.
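A very simple way to approach the time dimension is to compare snapshots of the relationships taken at two points in time. The sketch below assumes each snapshot is a collection of (from, to) dependency pairs; the release contents and the `diff_snapshots` helper are made up.

```python
def diff_snapshots(old_edges, new_edges):
    """Compare two sets of (from, to) dependency pairs."""
    old, new = set(old_edges), set(new_edges)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "kept": sorted(old & new),
    }


# Hypothetical dependency snapshots of two releases.
RELEASE_1 = [("ui", "service"), ("service", "persistence"), ("ui", "persistence")]
RELEASE_2 = [("ui", "service"), ("service", "persistence"), ("service", "cache")]

changes = diff_snapshots(RELEASE_1, RELEASE_2)
print(changes["added"])
print(changes["removed"])
```

Stored in the graph, the same information could be kept as version properties on relationships, so that queries can be restricted to a single point in time.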

So it is not surprising that quite a number of people found this topic interesting enough to invest time and energy in creating intriguing and insightful examples of using graph databases in this field. We asked all the participants listed below to write a blog post detailing their idea and to make their code or approach accessible. For now we link to existing resources, and we will update the links as soon as the blog posts are online.

  • Raoul-Gabriel Urma: Expressive and Scalable Source Code Queries with Graph Databases (Paper)
  • Rickard Öberg: NeoMVN, tracing Maven dependencies (GitHub)
  • Pavlo Baron: Graphlr, an ANTLR storage in Neo4j (GitHub)
  • Dirk Mahler: jQAssistant, enforcing architectural constraints as part of your build process with Neo4j, Cypher and Labels
  • Michael Hunger: Class-Graph, leverages Cypher to collect structural insights about your Java projects (GitHub), (Slideshare), (




Very interesting. We worked in a very similar way to perform change impact analysis on a combination of XML schemas and WSDLs at

Obviously, also backed by the power of graph.

Byron Mccray says:

Graph is a best method to present the data as community can easily get the knowledge which they are looking for. Every company is doing measurement of the software so that they could estimate the cost, schedule, complexity, requirements of software development in better way

Very interesting article. Many Thanks for the write up. Specially the NeoMVN looks very interesting to me. I will check that out.

