Fun with Beer – and Graphs

I make no excuses: My name is Rik van Bruggen, and I am a salesperson.

I think it is one of the finest and nicest professions in the world, and I love what I do. I love it specifically because I get to sell great, awesome, fantastic products, and I get to work with fantastic people along the way. But the point is that I am not a technical person at all.

However, I do have a passion for technology, and I feel the urge to understand and taste the products that I sell. And that’s exactly what happened when I joined Neo Technology, the makers and maintainers of the popular Neo4j open source graph database.

So I decided to get my hands dirty and dive in head first and also have some fun along the way.

The fun part comes from something that I thoroughly enjoy: Belgian beer. Some of you may know that Stella Artois, Hoegaarden, Leffe and the like come from Belgium, but few of you know that this tiny little country in the lowlands around Brussels actually produces several thousand beers.

You can read about it on the Wikipedia page: Belgian beers are good and numerous. So how would I go about putting Belgian beers into Neo4j? Interesting challenge.

Part 1: Getting Beer into Neo4j

First, I started with the data source.

The Wikipedia page actually has a full listing of all Belgian beers, and it also includes the brewery, beer type and alcohol percentage – perfect! I can already see a graph emerging.

But how to get it into Neo4j, without doing any programming?

Well, turns out that it was not that difficult…

Read on, or watch this video:

The next step was to “clean” the Wikipedia data and structure it. I used a Google spreadsheet for this – I was actually amazed at its power and ease of use.

Then, after some more manipulation and spreadsheet wizardry, I managed to come up with two very simple files: one for the nodes (the BeerBrands, the AlcoholPercentages, the BeerTypes and the Breweries) and one for the relationships (a BeerBrand “has a” AlcoholPercentage and “is a” specific BeerType, and a Brewery “brews” a specific BeerBrand).
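
For readers who would rather script this step than do it in a spreadsheet, here is a minimal Python sketch of the same idea: one flat table goes in, a nodes file and a relationships file come out. Everything specific in it is an assumption for illustration: the column names, the sample rows, the `hasa` relationship name, and the `Id`/`Label` and `Source`/`Target` headers (which I believe Gephi's CSV import expects, but check against your version). It is not the spreadsheet I actually used.

```python
import csv
import io

# Illustrative rows only; column names and values are assumptions,
# not the actual spreadsheet behind this post.
BEERS = [
    {"brand": "Orval", "brewery": "Brasserie d'Orval", "type": "Trappist", "abv": "6.2"},
    {"brand": "Duvel", "brewery": "Duvel Moortgat", "type": "Strong blond", "abv": "8.5"},
]


def split_into_graph(rows):
    """Return (nodes, relationships) built from flat beer rows."""
    nodes = {}          # (kind, name) -> node id, so a shared value becomes one node
    relationships = []  # (source id, relationship type, target id)

    def node_id(kind, name):
        key = (kind, name)
        if key not in nodes:
            nodes[key] = len(nodes) + 1
        return nodes[key]

    for row in rows:
        beer = node_id("BeerBrand", row["brand"])
        relationships.append((node_id("Brewery", row["brewery"]), "Brews", beer))
        relationships.append((beer, "isa", node_id("BeerType", row["type"])))
        # "hasa" is a hypothetical name for the "has a" relationship
        relationships.append((beer, "hasa", node_id("AlcoholPercentage", row["abv"])))
    return nodes, relationships


def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes, relationships = split_into_graph(BEERS)
nodes_csv = to_csv(["Id", "Label", "Kind"],
                   [(i, name, kind) for (kind, name), i in nodes.items()])
edges_csv = to_csv(["Source", "Target", "Type"],
                   [(source, target, rel) for source, rel, target in relationships])
```

The one design choice worth noting is the `nodes` dictionary: two beers with the same type or the same alcohol percentage end up pointing at a single shared node, which is exactly what makes the data a graph rather than a table.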

So then, I had to get these CSV files into a graph and into a graph database (that is, Neo4j). I tried a number of things but ended up going for a very simple tool called Gephi.

The tool has a visualisation component and an analysis/processing component, but I was most interested in the Data Laboratory. This allowed me to import the CSV files above with two clicks and to create a wonderful visualisation immediately.

(Editor's note: You can also use the Neo4j-Batch-Importer to import CSV files directly into the graph, including indexing; see the ETL article by Max de Marzi.)

So now I had my Gephi project, but how do I get it into Neo4j?

Well, it turns out there is a Gephi Neo4j plugin available that does exactly that. Just install the plugin, export the Gephi project, and it will generate the Neo4j store files that you can copy over to your graph.db directory.

And now my Neo4j database is up and running. And remember: NO PROGRAMMING INVOLVED. Love that.

To be honest, there was one tiny little hiccup at this point: to use the graph in a meaningful way, you really need to have indexes.

Neo4j ships with Lucene for indexing of nodes, relationships and properties, and there is an auto-indexing capability in the product – but that only kicks in AFTER you start adding data to the database.

So the initial import into the database is not indexed. Crap.

Luckily, I have some very bright colleagues at Neo Technology, who have written some nifty utilities that do something about this. And that’s what I did: used the utility, repopulated the autoindex, and there we were. A bit hairy, but NO PROGRAMMING 🙂

Part 2: Getting Beer out of Neo4j

In the second part, I explore the beer graph visually via the Neo4j Web-interface and some nifty Cypher queries. So, enjoy the video and see the details below:

For example: let’s try to find all Belgian Trappist beers, based on one Trappist beer that I know and love, Orval.

To do that, we need to create a query, using the Cypher query language. Here’s how this works:

First, I get my starting point in the graph, through an index lookup:

START orval=node:node_auto_index(name="Orval")

Then, I try to find a pattern, with a MATCH clause:

MATCH
orval<-[:Brews]-brewery,

Next, I want another beer, with the same beertype as Orval:

orval-[:isa]->beertype,
anotherbeer-[:isa]->beertype

Now, I want to return the other beers:

RETURN
anotherbeer.name AS name,
COLLECT(beertype.name) AS beertype
ORDER BY anotherbeer.name;

This would give me a very straightforward result set, very similar to what you would expect in traditional database systems.
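
To picture what the pattern in that query is doing, here is a small Python sketch over toy data. It assumes the “isa” relationships are available as plain (beer, beer type) pairs; the beer names other than Orval are placeholders, and the function name is made up for this example. The sketch leaves Orval itself out of the result, which is a choice made here rather than a statement about how Cypher treats the pattern.

```python
# Toy data: (beer, beer type) pairs standing in for the "isa" relationships.
# Names other than Orval are placeholders, not rows from the real dataset.
ISA = [
    ("Orval", "Trappist"),
    ("BeerA", "Trappist"),
    ("BeerB", "Trappist"),
    ("BeerC", "Pils"),
]


def beers_with_same_type(isa_pairs, known_beer):
    """Return sorted (other beer, [shared types]) rows for beers sharing a type with known_beer."""
    # Step 1: the types of the beer we start from (orval-[:isa]->beertype)
    known_types = {beer_type for beer, beer_type in isa_pairs if beer == known_beer}

    # Step 2: every other beer pointing at one of those types
    # (anotherbeer-[:isa]->beertype), collecting the types per beer
    shared = {}
    for beer, beer_type in isa_pairs:
        if beer != known_beer and beer_type in known_types:
            shared.setdefault(beer, []).append(beer_type)

    # Step 3: order by beer name, as the ORDER BY clause does
    return sorted(shared.items())


rows = beers_with_same_type(ISA, "Orval")
```

Here `rows` holds one entry per matching beer, each with the list of types it shares with Orval, which mirrors the two columns returned by the query.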

Another type of query would be to try and find *paths* between two beers:

START
     duvel=node:node_auto_index(name="Duvel"),
     orval=node:node_auto_index(name="Orval")
MATCH p = allShortestPaths( duvel-[*]-orval )
RETURN p;

In the example above, I tried to see what connects two of my favorite beers: Orval and Duvel.

I found that there are two beers that share either the AlcoholPercentage or the Beertype – very interesting! That is a great recommendation to receive and will require some tasting to be done!
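
If you are curious what “all shortest paths” means in practice, the idea can be sketched in a few lines of Python using a breadth-first search. This is a toy illustration under my own assumptions, not how Neo4j implements it: the graph is a plain list of undirected edges, and `BeerX`, `BeerY` and the type and percentage labels are placeholders rather than real entries from the beer graph.

```python
from collections import deque


def all_shortest_paths(edges, start, goal):
    """Return every shortest path between start and goal, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search, remembering each node's distance from start
    # and every predecessor that lies on a shortest route to it.
    distance = {start: 0}
    parents = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif distance[nxt] == distance[node] + 1:
                parents[nxt].append(node)

    if goal not in distance:
        return []

    def walk_back(node):
        # Rebuild paths by following predecessors back to the start
        if node == start:
            return [[start]]
        return [path + [node] for parent in parents[node] for path in walk_back(parent)]

    return walk_back(goal)


# Placeholder graph: two equally short routes between Duvel and Orval.
TOY_EDGES = [
    ("Duvel", "abv_A"), ("abv_A", "BeerX"), ("BeerX", "type_T"), ("type_T", "Orval"),
    ("Duvel", "type_S"), ("type_S", "BeerY"), ("BeerY", "abv_B"), ("abv_B", "Orval"),
]
paths = all_shortest_paths(TOY_EDGES, "Duvel", "Orval")
```

In this toy graph both routes have the same length, so both come back, each passing through a different in-between beer. That is the same shape as the result I got from the real query.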

Last but not least, I also experimented a bit with updating the graph, using a Cypher statement like this one:

begin

START orval=node:node_auto_index(name="Orval")
CREATE (rik{name:"Rik"})-[:loves]->orval;
START duvel=node:node_auto_index(name="Duvel"), 
      rik=node:node_auto_index(name="Rik")
CREATE rik-[:loves]->duvel;

commit

I was able to include a real, ACID-compliant transaction on the graph: adding a “Rik” node to the graph and adding two “loves” relationships, one for Duvel and one for Orval.

It was really interesting to see how Neo4j does the commit/rollback, and how it isolates these updates from the rest of the users. You can test that really easily by having two “clients” talk to the database, and executing the transaction in one client – but querying it from the other.

All in all, this was a fantastic and very educational experience for me. Neo4j is a great tool for many database problems, beer-related or otherwise, and it was really fun to learn how to piece things together and end up with a functioning database, all WITHOUT PROGRAMMING.

I hope you found this helpful. If you want to try it yourself, here is the zipped database directory and here are the initial CSV files and the Cypher queries.

Enjoy your beer,

Rik van Bruggen
