Going Meta: Exploring the Neo4j Graph Database…as a Graph


The graph data model is inherently visual. Try explaining a graph to someone new and you’ll inevitably draw a picture, or wave your hands around to convey what you mean by ‘nodes… links… and more nodes’. People think in graphs, and they interpret graph intelligence visually. That’s what makes graph visualization such a powerful tool.

Graph visualization is deeply intuitive and harnesses the brain’s unrivalled ability to spot patterns. It’s also flexible enough to apply to virtually any dataset. If there’s an interesting relationship in your data somewhere, you’ll find value in graph visualization.

To prove it, I thought we’d go a bit meta.

In this post, I’ll use KeyLines’ graph visualization power to explore the Neo4j GitHub community. Along the way, I’ll show how KeyLines makes your graph data more accessible, insightful and valuable.

Our Stack


First we’ll look at the technologies we’ll use for this GitHub exploration tool:
    • KeyLines – for the visualization front-end
    • GraphQL – to fetch data from the GitHub API
    • Neo4j – as a graph datastore ‘cache’ for our GitHub data
    • Angular – to neatly tie the project together
The Neo4j KeyLines graph visualization architecture


KeyLines and Neo4j integrate seamlessly. With a Neo4j backend, we can cache a copy of our data locally for faster response times. It also gives us access to powerful graph query and analysis functionality.

Neo4j and KeyLines play nicely with the new kid on the block too – GraphQL. The Facebook-backed query language is a specification for pulling data efficiently and in a more ‘type aware’ way than REST. Particularly exciting is its capacity to support nested queries, reducing the number of calls our app needs to make.
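To make the nested-query point concrete, here is a minimal TypeScript sketch of how one GraphQL request to GitHub could be assembled. The endpoint and the bearer-token header are the ones GitHub documents for its GraphQL API; the `buildGitHubRequest` helper, the example query and the token placeholder are assumptions made for illustration, not code from this project.

```typescript
// Sketch: package one nested GraphQL query as a single HTTP request.
// `buildGitHubRequest` is a hypothetical helper, not part of any library.

interface GraphQLRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// One round trip asks for an organization, its repos and each repo's owner.
// With REST this would typically need a separate call per level.
const ORG_REPOS_QUERY = `
  query ($login: String!, $count: Int!) {
    organization(login: $login) {
      repositories(first: $count, orderBy: {field: UPDATED_AT, direction: DESC}) {
        nodes {
          id
          name
          owner { id login avatarUrl }
        }
      }
    }
  }`;

function buildGitHubRequest(
  query: string,
  variables: Record<string, unknown>,
  token: string
): GraphQLRequest {
  return {
    url: "https://api.github.com/graphql",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `bearer ${token}`,
    },
    // The query text and its variables travel together in one JSON body.
    body: JSON.stringify({ query, variables }),
  };
}

// Example: the 20 most recently updated repos of the neo4j organization.
const orgReposRequest = buildGitHubRequest(
  ORG_REPOS_QUERY,
  { login: "neo4j", count: 20 },
  "<YOUR_TOKEN>" // placeholder, supply a real personal access token
);
```

The resulting object can be handed to `fetch` or to Angular’s HTTP client; the sketch stops short of sending it, so it runs without network access.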

Loading a GitHub Account


Let’s kick off our visual exploration by searching GitHub for the world’s most popular graph database:

The initial load of Neo4j GitHub repos

Loading Neo4j’s 20 most recently updated repos

With each KeyLines interaction (a search, a click, a double-click, etc.) we trigger a sequence of actions:

  1. Send an event to the service provider.
  2. The service auto-generates some Cypher to query the Neo4j database:
    MATCH (User {login:"christian-cam"})-[PullRequest:PULL_REQUEST]->(Repository)
    RETURN User, PullRequest, Repository

  3. If the response is blank, it sends a GraphQL query to the GitHub API:
    query ($login: String!) {
      user(login: $login) {
        id
        name
        avatarUrl
        login
        company
        pullRequests(first: 50, states: [MERGED], orderBy: {field: CREATED_AT, direction: DESC}) {
          nodes {
            id
            title
            number
            commits {
              totalCount
            }
            repository {
              id
              name
              owner {
                id
                login
                avatarUrl
              }
            }
          }
        }
      }
    }

    # Variables sent with the query:
    {"login": "christian-cam"}

  4. The GitHub API returns some data, which is cached in our Neo4j instance. It’s then loaded into KeyLines and styled according to your customization code.
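The four steps above amount to a cache-first lookup. Here is a minimal TypeScript sketch of that flow. The real service isn’t shown in this post, so every name here (`GraphSource`, `GraphCache`, `loadUserGraph`, `MemoryCache`) is hypothetical, and the in-memory cache is only a stand-in for the Neo4j instance.

```typescript
// Sketch of the cache-first lookup described in steps 1 to 4.
// All names are hypothetical; this is not the project's actual service code.

type GraphItem = { id: string; [key: string]: unknown };

interface GraphSource {
  // Returns the items found for a GitHub login (empty array if none).
  fetch(login: string): Promise<GraphItem[]>;
}

interface GraphCache extends GraphSource {
  // Persists items so the next lookup can skip the remote API.
  store(login: string, items: GraphItem[]): Promise<void>;
}

async function loadUserGraph(
  login: string,
  cache: GraphCache,  // e.g. backed by Cypher queries against Neo4j
  remote: GraphSource // e.g. backed by GraphQL queries against GitHub
): Promise<GraphItem[]> {
  const cached = await cache.fetch(login);
  if (cached.length > 0) {
    return cached; // cache hit: no call to the remote API
  }
  const fresh = await remote.fetch(login);
  if (fresh.length > 0) {
    await cache.store(login, fresh); // remember the answer for next time
  }
  return fresh;
}

// In-memory stand-in for the Neo4j cache, handy for trying the flow out.
class MemoryCache implements GraphCache {
  private byLogin = new Map<string, GraphItem[]>();

  async fetch(login: string): Promise<GraphItem[]> {
    return this.byLogin.get(login) || [];
  }

  async store(login: string, items: GraphItem[]): Promise<void> {
    this.byLogin.set(login, items);
  }
}
```

Keeping the cache and the remote source behind the same small interface means the chart-facing code never needs to know which one answered.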

The Neo4j GitHub org contains 26 repos, hundreds of users, and millions of pull requests and diffs. It’s a vast data set that we couldn’t hope to understand in its raw format. Thankfully, graph visualization will help us distill thousands of lines of data into an interactive chart.

graph data model visualization

My graph data model. It looks complex, but KeyLines will help us explore it manageably

Understanding the Structure with Automated Layouts


The beauty of graph visualization is that it conveys complex graph structures in a form you can understand right away. Automated layouts are critical to this: each of KeyLines’ seven graph layouts reveals different features of the network.

The structural layout helps to reveal distinct communities. Here’s the Neo4j GitHub community:

meta exploration of Neo4j graph visualization


We can see that each repo has a distinct community of contributors – some large, some small. At the heart of the GitHub community, we have a core of contributors acting as bridges:

Neo4j network bridges

Some of Neo4j’s cross-repo community heroes
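The layout makes these bridges visible at a glance, but the same idea can be expressed in a few lines of code. Here is a rough TypeScript sketch that finds contributors linked to more than one repo. The `Contribution` shape and the `findBridges` function are assumptions made for this example, not part of KeyLines or of my app.

```typescript
// Sketch: find "bridge" contributors, i.e. users linked to more than one repo.
// The Contribution shape is an assumption made for this example.

interface Contribution {
  user: string; // GitHub login
  repo: string; // repository name
}

function findBridges(
  contributions: Contribution[],
  minRepos = 2
): Map<string, Set<string>> {
  // Collect the distinct repos each user has touched.
  const reposByUser = new Map<string, Set<string>>();
  for (const { user, repo } of contributions) {
    let repos = reposByUser.get(user);
    if (!repos) {
      repos = new Set<string>();
      reposByUser.set(user, repos);
    }
    repos.add(repo);
  }

  // Keep only users who span at least `minRepos` repos.
  const bridges = new Map<string, Set<string>>();
  reposByUser.forEach((repos, user) => {
    if (repos.size >= minRepos) {
      bridges.set(user, repos);
    }
  });
  return bridges;
}
```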

I’ve used a simple color code to indicate the type of connections between people and repos:
    • Grey = pull requests
    • Red = issues raised
    • Blue = pull request reviews
Pull requests have been bundled into weighted links to avoid chart clutter. With this view, one account that stands out is lutovich, especially in some of the driver repos.
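For the curious, here is a minimal TypeScript sketch of how individual events could be bundled into weighted, color-coded links. The link fields (`id1`, `id2`, `c`, `w`, `d`) follow the KeyLines item format as I understand it, so treat them as an assumption; `bundleEvents`, the width cap and the event shape are made up for this example.

```typescript
// Sketch: bundle individual GitHub events into one weighted, colored link
// per (user, kind, repo). Field names mirror KeyLines link items as an
// assumption; check the KeyLines API reference before relying on them.

type EventKind = "pullRequest" | "issue" | "review";

interface GitHubEvent {
  user: string;
  repo: string;
  kind: EventKind;
}

interface WeightedLink {
  type: "link";
  id: string;
  id1: string; // source node id (the user)
  id2: string; // target node id (the repo)
  c: string;   // link color
  w: number;   // link width
  d: { kind: EventKind; count: number }; // custom data
}

// The color code used in the chart above.
const COLORS: Record<EventKind, string> = {
  pullRequest: "grey",
  issue: "red",
  review: "blue",
};

const MAX_WIDTH = 10; // arbitrary cap so busy links don't swamp the chart

function bundleEvents(events: GitHubEvent[]): WeightedLink[] {
  const links = new Map<string, WeightedLink>();
  for (const { user, repo, kind } of events) {
    const id = `${user}-${kind}-${repo}`;
    const existing = links.get(id);
    if (existing) {
      // Same user, repo and kind: thicken the existing link.
      existing.d.count += 1;
      existing.w = Math.min(existing.d.count, MAX_WIDTH);
    } else {
      links.set(id, {
        type: "link",
        id,
        id1: user,
        id2: repo,
        c: COLORS[kind],
        w: 1,
        d: { kind, count: 1 },
      });
    }
  }
  return Array.from(links.values());
}
```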

Double-clicking on lutovich reveals their most recent commits:

Explore the Neo4j Graph Database community on GitHub using the power of Neo4j graph visualization


This ‘expand and layout’ approach is a powerful way to explore large graphs. It puts the user in the driving seat, so they can explore details at their own pace.
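In code, ‘expand and layout’ boils down to three steps: fetch a node’s neighbours, add whatever the chart doesn’t already hold, and re-run a layout. Here is a minimal TypeScript sketch of that. `ChartLike` is a hypothetical stand-in interface rather than the KeyLines API itself, and `expandNode` and `fetchNeighbours` are names invented for this example.

```typescript
// Sketch of "expand and layout" for a double-click handler.
// ChartLike is a hypothetical minimal interface, not the KeyLines API.

interface Item {
  id: string;
}

interface ChartLike {
  has(id: string): boolean;            // is this item already on the chart?
  add(items: Item[]): Promise<void>;   // add new items to the chart
  layout(name: string): Promise<void>; // re-run an automated layout
}

async function expandNode(
  chart: ChartLike,
  nodeId: string,
  fetchNeighbours: (id: string) => Promise<Item[]>,
  layoutName = "standard"
): Promise<number> {
  const neighbours = await fetchNeighbours(nodeId);

  // Skip anything already drawn so repeated double-clicks are harmless.
  const fresh = neighbours.filter((item) => !chart.has(item.id));
  if (fresh.length === 0) {
    return 0; // nothing new, so leave the current layout untouched
  }

  await chart.add(fresh);
  await chart.layout(layoutName);
  return fresh.length; // number of items added
}
```

Skipping the layout when nothing new arrives keeps the chart stable, so nodes the user has already found don’t jump around.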

Filtering the Graph by Time


A prominent feature of my GitHub app is the KeyLines time bar – a neat component for exploring temporal networks.

On first load, the spike of pull requests around September stands out. That’s no surprise, as this was around the time of Neo4j 3.2.5 and the 3.3 beta.

KeyLines time bar visualization

Pull requests (grey), issues (red) and reviews (blue) all see a spike in September 2017

Let’s zoom in to that time period on the chart:

Neo4j GitHub activity in September

A GitHub pull request spike in September

We can see a great deal of development effort going into the core product, plus the Neo4j Browser, docs, OGM and a couple of drivers – insight that we couldn’t see in our initial graph view.
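Under the hood, a time bar does two simple things: it counts events per time bucket to draw the histogram, and it filters the chart to the selected range. Here is a rough TypeScript sketch of both. The `TimedEvent` shape and the function names are assumptions for this example; the timestamps are ISO 8601 strings, which is the format GitHub’s API uses for fields like `createdAt`.

```typescript
// Sketch: the two operations behind a time bar, a per-month histogram
// and a range filter. Names and shapes are assumptions for this example.

interface TimedEvent {
  id: string;
  createdAt: string; // ISO 8601 timestamp in UTC, e.g. "2017-09-05T09:30:00Z"
}

// Count events per calendar month (keyed "YYYY-MM", in UTC).
function countByMonth(events: TimedEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const month = e.createdAt.slice(0, 7);
    counts.set(month, (counts.get(month) || 0) + 1);
  }
  return counts;
}

// Keep events with start <= createdAt < end.
function inRange(events: TimedEvent[], start: Date, end: Date): TimedEvent[] {
  const from = start.getTime();
  const to = end.getTime();
  return events.filter((e) => {
    const t = Date.parse(e.createdAt);
    return t >= from && t < to;
  });
}
```

Using a half-open range (start included, end excluded) means adjacent windows never count the same event twice.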

Social Network Analysis


So we’ve zoomed into our graph’s details – now let’s try exploring outwards. Here I’ve added a few Neo4j partners with GitHub accounts:

The Neo4j partner community graph


Our starting point is familiar. KeyLines’ standard layout spaces nodes around the chart, revealing three distinct clusters with some collaboration between them. But the chart is still fairly cluttered.

KeyLines’ new and improved combos functionality is a powerful way to declutter a graph, highlighting the most important nodes and links. Here’s what happens when we combine our repos and run the standard layout:

Combos and layout in KeyLines graph visualization


In two clicks we’ve transformed a cluttered chart into a clear graph visualization. We can instantly see Neo4j’s GitHub community contributors, and the bridges between the different projects. This approach can be applied to any kind of graph dataset, revealing trends and patterns that would otherwise be hidden.

Neo4j partner community overview
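Before anything can be combined, something has to decide which nodes belong together. Here is a minimal TypeScript sketch that groups repo nodes by their owning account. The `{ ids, label }` shape is my assumption about what a combine call would need, and `combosByOwner` and `RepoNode` are hypothetical names; consult the KeyLines documentation for the actual combos API.

```typescript
// Sketch: decide which repo nodes to combine, grouping them by owner.
// The { ids, label } output shape is an assumption, not the KeyLines API.

interface RepoNode {
  id: string;    // node id on the chart
  owner: string; // login of the owning GitHub account
}

interface ComboDefinition {
  ids: string[]; // nodes to fold into one combo
  label: string; // text shown on the combo
}

function combosByOwner(repos: RepoNode[], minSize = 2): ComboDefinition[] {
  // Bucket node ids by owner, preserving first-seen order.
  const groups = new Map<string, string[]>();
  for (const repo of repos) {
    const ids = groups.get(repo.owner) || [];
    ids.push(repo.id);
    groups.set(repo.owner, ids);
  }

  // A combo of one node adds clutter rather than removing it, so skip those.
  const combos: ComboDefinition[] = [];
  groups.forEach((ids, owner) => {
    if (ids.length >= minSize) {
      combos.push({ ids, label: owner });
    }
  });
  return combos;
}
```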


Try It Yourself


Inspired to try some graph visualization? We’re happy to help! You can see the power of KeyLines’ new combos functionality and find working examples of KeyLines with Neo4j on our SDK site, or follow the tutorials on our blog.


Cambridge Intelligence was a Silver sponsor of GraphConnect New York.

