I make no excuses: my name is Rik van Bruggen and I am a salesperson. I think it is one of the finest and nicest professions in the world, and I love what I do. I love it specifically because I get to sell great, awesome, fantastic products, and I get to work with fantastic people along the way. But the point is that I am not a technical person at all. I do, however, have a passion for technology, and I feel the urge to understand and taste the products that I sell. And that’s exactly what happened a couple of months ago when I joined Neo Technology, the makers and maintainers of the popular Neo4j open source graph database.

So I decided to get my hands dirty and dive in head first. But also, to have some fun along the way.


The fun part would come from something that I thoroughly enjoy: Belgian beer. Some of you may know that Stella Artois, Hoegaarden, Leffe and the like come from Belgium, but few of you know that this tiny little country in the lowlands around Brussels actually produces several thousand beers.



You can read about it on the Wikipedia page: Belgian beers are good, and numerous. So how would I go about putting Belgian beers into Neo4j? Interesting challenge.

Part 1: Getting Beer into Neo4j

First, I started with the data source. The Wikipedia page actually has a full listing of all Belgian beers, and it also states the brewery, beer type, and alcohol percentage – perfect! I can see a graph already emerging. But how to get it into Neo, without doing any programming? Well, turns out that it was not that difficult…

Read on, or watch the video:


The next step was to “clean” the Wikipedia data and structure it. I used a Google spreadsheet for this, and I was actually amazed by its power and ease of use.



Then, after some more manipulation and spreadsheet wizardry, I managed to come up with two very simple files: one for the nodes (the BeerBrands, the AlcoholPercentages, the BeerTypes and the Breweries) and one for the relationships (a BeerBrand “has a” AlcoholPercentage and “isa” specific BeerType, and a Brewery “brews” a specific BeerBrand).
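For readers who do not mind a little scripting, the shape of those two files can be sketched in Python. This is only an illustration, since the post itself used a spreadsheet and no programming. The sample rows, the numeric ids, the relationship labels and the helper names build_graph_rows and to_csv are all assumptions made up for this example. The column headers id,label,type and source,target,label are the ones a reader reports Gephi expecting in the comments below.

```python
import csv
import io

# Illustrative sample rows (beer, brewery, beer type, alcohol percentage).
# These values are placeholders, not the post's Wikipedia-derived dataset.
BEERS = [
    ("Orval", "Brasserie d'Orval", "Trappist", "6.2"),
    ("Westmalle Tripel", "Brouwerij Westmalle", "Trappist", "9.5"),
    ("Duvel", "Duvel Moortgat", "Strong blond", "8.5"),
]


def build_graph_rows(beers):
    """Turn flat beer rows into node rows and relationship rows."""
    ids = {}    # (type, label) -> numeric id, so shared values reuse one node
    nodes = []  # rows of (id, label, type)
    edges = []  # rows of (source, target, label)

    def node_id(kind, label):
        key = (kind, label)
        if key not in ids:
            ids[key] = len(ids) + 1
            nodes.append((ids[key], label, kind))
        return ids[key]

    for beer, brewery, beertype, abv in beers:
        b = node_id("BeerBrand", beer)
        # Relationship labels below are hypothetical spellings.
        edges.append((node_id("Brewery", brewery), b, "brews"))
        edges.append((b, node_id("BeerType", beertype), "isa"))
        edges.append((b, node_id("AlcoholPercentage", abv), "hasalcoholpercentage"))
    return nodes, edges


def to_csv(header, rows):
    """Render rows as CSV text with the given header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


nodes, edges = build_graph_rows(BEERS)
print(to_csv(["id", "label", "type"], nodes))
print(to_csv(["source", "target", "label"], edges))
```

The point of the lookup table is that a value shared by several beers, such as the “Trappist” beer type, becomes a single node that all of those beers point to. That sharing is what turns a flat table into a graph.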



So then I had to get these CSV files into a graph, and into a graph database, namely Neo4j. I tried a number of things, but ended up going for a very simple tool called Gephi. The tool has a visualisation component and an analysis/processing component, but I was most interested in the Data Laboratory. This allowed me to import the CSV files above with two clicks, and create a wonderful visualisation immediately.



(Editor’s note: you can also use the Neo4j-Batch-Importer to import CSV files directly into the graph, including indexing; see the ETL article by Max de Marzi.)

So now I had my Gephi project, but how to get it into Neo4j? Well, it turns out there is a Gephi Neo4j plugin available that does exactly that. Just install the plugin, export the Gephi project, and it will generate the Neo4j store files that you can copy over your graph.db directory.

And now: my Neo4j database was up and running. And remember: NO PROGRAMMING INVOLVED. Love that.



To be honest: there was one tiny little hiccup at this point. To use the graph in a meaningful way, you really need to have indexes. Neo4j ships with Lucene for indexing of nodes, relationships and properties, and there is an auto-indexing capability in the product, but that only kicks in AFTER you start adding data to the database. So the initial import into the database is not indexed. Crap. Luckily, I have some very bright colleagues at Neo Technology, who have written some nifty utilities that do something about this. And that’s what I did: used the utility, repopulated the autoindex, and off we were. A bit hairy, but NO PROGRAMMING :)


Part 2: Getting Beer out of Neo4j

In the second part, I explore the beer graph visually via the Neo4j Web-interface and some nifty Cypher queries. So, enjoy the video and see the details below:



For example: let’s try to find all Belgian Trappist beers, based on one trappist beer that I know and love, Orval. To do that, we need to do a query, using the Cypher query language. Here’s how this works:

Getting my starting point in the graph, through an index lookup

START orval=node:node_auto_index(name="Orval")

Then trying to find a pattern, with a Match clause

MATCH
orval<-[:Brews]-brewery,

I want another beer, with the same beertype as Orval

orval-[:isa]->beertype,

anotherbeer-[:isa]->beertype

I want to return the other beers
RETURN
anotherbeer.name AS name,
COLLECT(beertype.name) AS beertype
ORDER BY anotherbeer.name;


This would give me a very straightforward result set, very similar to what you would expect in traditional database systems:



Another type of query would be to try and find *paths* between two beers:

   START
       duvel=node:node_auto_index(name="Duvel"),
       orval=node:node_auto_index(name="Orval")
   MATCH p = allShortestPaths( duvel-[*]-orval )
   RETURN p;

In the example above, I am trying to see what connects two of my favorite beers: Orval and Duvel. I found that there are two beers that share either the AlcoholPercentage or the Beertype – very interesting! That is a great recommendation to receive and will require some tasting to be done!
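To make clearer what an “all shortest paths” query computes, here is a minimal Python sketch of the idea on a tiny made-up graph. It is not how Neo4j implements it, and nothing like it was needed for the post. The node names, the links between them and the function name all_shortest_paths are assumptions for illustration only. The sketch uses a breadth-first search that remembers every predecessor reaching a node at its minimum distance.

```python
from collections import deque

# Tiny illustrative graph as undirected edges; names and links are
# placeholders, not the post's dataset.
EDGES = [
    ("Orval", "Trappist"),
    ("Orval", "6.2"),
    ("Duvel", "8.5"),
    ("Duvel", "Strong blond"),
    ("Beer X", "Trappist"),
    ("Beer X", "8.5"),
    ("Beer Y", "6.2"),
    ("Beer Y", "Strong blond"),
]


def all_shortest_paths(edges, start, goal):
    """Return every minimum-length path from start to goal, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    # Breadth-first search recording, for each node, its distance from start
    # and every predecessor that reaches it at that distance.
    dist = {start: 0}
    preds = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)

    if goal not in dist:
        return []

    # Walk the predecessor links back from goal to start to rebuild the paths.
    def build(node):
        if node == start:
            return [[start]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return build(goal)


for path in all_shortest_paths(EDGES, "Duvel", "Orval"):
    print(" - ".join(path))
```

In this toy graph the two beers are connected through intermediate beers that share a beer type with one and an alcohol percentage with the other, which is the same kind of connection the query above surfaced.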



Last but not least, I also experimented a bit with updating the graph. Using a Cypher statement like this one:

begin
START orval=node:node_auto_index(name="Orval")
CREATE (rik {name:"Rik"})-[:loves]->orval;
START duvel=node:node_auto_index(name="Duvel"), rik=node:node_auto_index(name="Rik")
CREATE rik-[:loves]->duvel;
commit


I was able to run a real, ACID-compliant transaction on the graph, adding a “Rik” node to the graph and adding two “loves” relationships, one for Duvel and one for Orval. It was really interesting to see how Neo4j does the commit/rollback, and how it isolates these updates from the rest of the users. You can test that really easily by having two “clients” talk to the database, executing the transaction in one client but querying it from the other.

All in all, this was a fantastic and very educational experience for me. I feel that Neo4j is a great tool for many database problems, beer-related or otherwise, and it was really fun to learn how to piece things together and get to a functioning database WITHOUT PROGRAMMING.

I hope you found this helpful. If you want to try it yourself, here is the zipped database directory, and here are the initial CSV files and the Cypher queries.

Enjoy your beer

Rik van Bruggen  


10 Comments

Marco Guado says:

Hello, I followed the instructions in the tutorial and I found errors:
1. In the BeerRelationships.csv file, the headers start, end should be changed to source, target so that Gephi can import it.
2. When you import the file, in the Edges tab, I have the following:
Source
1-null
2-null
…
Target
100000-null
10000-null

There is no relationship between the

Anonymous says:

Hello,

I agree with Marco Guado. When I copied the complete graph.db folder, no problem occurred. But when I worked with the .csv files, I ran into some problems. First of all, the BeerRelationships.csv file didn't work with the "start" and "stop" headers. Although I renamed them to "source" and "target", it still didn't work. Also when

Haha, I like the in-line editor's note! (Sorry, Rik)

Marco Guado says:

Hello,
The right way to import the CSV files into Gephi is to change the headers:
BeerNodes.csv = id,label,type
BeerRelationsShips.csv = source,target,label

For the export from Gephi to Neo4j, make the relationship by label and write the first relation: HasAlcoholPercentaje. Then open the web console (7474) and visualize the nodes. OK.

Olga Musayev says:

Could you actually explain what the indexing utility is? How can one export from Gephi to Neo4j while auto_index-ing the labels or ids?

Alireza RM says:

I exported my dataset from Gephi to Neo4j format (using the menu in Gephi), but the database does not have the same number of nodes and relationships! Any idea what might have gone wrong?
