Graph Databases and Software Metrics & Analysis

This is the first in a series of blog posts that discuss the usage of a graph database like Neo4j to store, compute and visualize a variety of software metrics and other types of software analytics (method call hierarchies, transitive clojure, critical path analysis, volatility & code quality). Follow up posts by different contributors will be linked from this one.

Everyone who works in software development comes across software metrics at some point.
Just because of curiosity about the quality or complexity of the code we’ve written, or a real interest to improve quality and reduce technical debt, there are many reasons.
In general there are many ways of approaching this topic, from just gathering and rendering statistics in diagrams to visualizing the structure of programs and systems.

There are a number of commercial and free tools available that compute software metrics and help expose the current trend in your projects development.
Software metrics can cover different areas. Computing cyclomatic complexity, analysing dependencies or call traces is probably easy, using statical analysis to find smaller or larger issues is more involved and detecting code smells can be an interesting challenge in AST parsing.

Interestingly, many visualizations in and around software development are graph visualizations, from class- and other (UML) diagrams via dependency tracing between and within projects to architectural analysis. One of the reasons of this might be that source code in general can be easily represented as graphs. On the one hand we have trees, especially (abstract) syntax or parse trees (per file, class or structural element) on the other the actual dependencies from project, package, class to method level form a huge directed (cyclic) graph. Also related topics like application (DI) or system orchestration or hard- and software networks are effectively graph structures.

So, having a graph database like Neo4j at hand, what would be more obvious than parsing software systems and project at a certain level and importing the information into the graph database. The graph structure that would accomodate the information quite well would be a direct representation of the concepts in the software projects, consisting of projects, packages, classes, interfaces, types, methods, fields and containing relationships like dependencies, usage, creation, containment, calls, coverage, etc.

Simple Graph Model for Dependency Analysis

Having achieved this, it is completely up to your interests and needs, what you can do with this data. Computing metrics, visualizing and tracing dependencies, finding violations of architectural rules, finding co-usage of classes, detecting interesting patterns or code smells, there are many possibilities.

Just to give one example, a cypher query that calculates the top 10 classes with the longest inheritance paths:

START root=node:types(class="java.lang.Object")
MATCH chain = (root)<-[:EXTENDS*]-(leaf)
RETURN extract(class IN nodes(chain) : class.name) AS classes,
       length(chain) AS depth
ORDER BY depth DESC
LIMIT 10

Other tools besides Cypher to help you with this endeavour are:

ASM, Antlr or similar parsers for parsing byte- or source code.
Neo4j-Shell for exploration
Visualisation with GraphViz, D3, VivaGraphJS, Linkurious or others

Another options is to take a time dimension into account to see how structure, elements and relationships change over time.

So it is not suprising, that quite a number of people found this topic interesting enough to invest time and energy to create an intriguing and insightful example of using graph databases in this field. We asked all the participants listed below to write a blog post detailing their idea and make their code/approach accessible. We start to link to existing resources but will update them as soon as the blog posts are online.

Raoul-Gabriel Urma: Expressive and Scalable Source Code Queries with Graph Databases (Paper)
Rickard Öberg: NeoMVN is tracing maven dependencies (GitHub)
Pavlo Baron: Graphlr, a ANTLR storage in Neo4j (GitHub)
Dirk Mahler: jQAssistant Enforcing architectural constraints as part of your build process with Neo4j, Cypher and Labels
Michael Hunger: Class-Graph, leverages Cypher to collect structural insights about your Java projects (GitHub), (Slideshare), (class-graph.zip)

Want to learn more about graph databases? Click below to get your free copy of O’Reilly’s Graph Databases ebook and discover how to use graph technologies for your application today.

Download My Ebook