Peter Neubauer | Magnus Mårtensson |
Announcing Neo4j on Windows Azure
Neo4j has a ‘j’ appended to the name. And now it is available on Windows Azure? This proves that in the most unlikely of circumstances sometimes beautiful things can emerge. Microsoft has promised Java to be a valued “first class citizen” on Windows Azure. In this blog post we will show that it is no problem at all to host a sophisticated and complex server product such as the Neo4j graph database server on Windows Azure. Since Neo4j has a REST API over HTTP you can speak to this server from your regular .NET (or Java) applications, inside or outside of the cloud just as easily as you speak to Windows Azure Storage.
Intro
This first version (1.0 “JFokus”) of our deployment is a bit simplified in some areas. Still, it is a complete and fully functioning deployment of Neo4j to Windows Azure. We are already working on the next major release (2.0), which will be much more turn-key: just upload the application to Windows Azure and launch.
Furthermore we have serious plans to use this approach, Neo4j in Windows Azure, on a live project where we are backing a server application with complex graph calculations. We will layer spatial and social graphs in combined searches on the server side and serve condensed search results to the client applications outside of the Cloud.
This project is not a toy; it’s the real deal, and it runs very smoothly: Java runs with little or no hassle on Windows Azure!
If you are a .NET developer reading this post
What we have enabled for You, dear .NET developer, is to leverage a really powerful graph database and make it available in Your Windows Azure applications!
You can think of Neo4j as a high-performance schema-free graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.
The data model consists of Nodes, typed Relationships between Nodes and Key-Value pairs on both Nodes and Relationships, called Properties. This is how the Matrix characters and their relationships could look in a Neo4j data model:
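As a rough illustration of that model, here is a minimal sketch in plain Ruby. It is not a Neo4j API: the `Node` and `Relationship` structs, the `neighbours` helper, the "KNOWS" relationship type and the sample property values are all assumptions made up for this example.

```ruby
# Illustrative only: plain Ruby structures that mirror the data model.
# None of these names come from Neo4j itself.
Node = Struct.new(:id, :properties)
Relationship = Struct.new(:type, :start_node, :end_node, :properties)

neo      = Node.new(1, { "name" => "Neo" })
morpheus = Node.new(2, { "name" => "Morpheus" })
trinity  = Node.new(3, { "name" => "Trinity" })

# Relationships are typed and directed, and can carry properties too.
relationships = [
  Relationship.new("KNOWS", neo, morpheus, {}),
  Relationship.new("KNOWS", neo, trinity, { "note" => "example property" })
]

# Names of every node that `node` points at through relationships of `type`.
def neighbours(node, relationships, type)
  relationships
    .select { |r| r.type == type && r.start_node.equal?(node) }
    .map { |r| r.end_node.properties["name"] }
end

puts neighbours(neo, relationships, "KNOWS").join(", ")
```

The point of the sketch is only that both nodes and relationships are first-class items with their own key-value properties.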
How to communicate with it? It is very straightforward: Neo4j communicates using a REST-based API over HTTP. This means that you can communicate with it just as easily as you can with standard Windows Azure Storage.
What we have done
The fact of the matter is that Neo4j has been running on Windows for a long time. What we have done in this project is to host it on Windows Azure. We have taken into account such things as dynamic port allocation and the subsequent version will also automatically handle storage backups. The following steps are involved in the deploy of version 1.0:
- Upload a Java Runtime Environment (JRE) to Windows Azure Blob Storage.
- Upload Neo4j to Windows Azure Blob Storage.
- Upload the deployment of the Neo4j Windows Azure hosting project to Windows Azure – which will launch the install automatically.
The install will:
- Download from Windows Azure Blob Storage to our Windows Azure server instance, and deploy, both the JRE and the Neo4j Server.
- Configure diagnostics on the Windows Azure server instance to also include the Neo4j logs in the diagnostics collections.
- Modify the configuration of Neo4j to listen to a run time assigned port, to point to the database storage location and to know the location of the JRE etc.
That completes the install. Next Windows Azure will launch Neo4j – and we receive MAGIC!
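The configuration step in the install can be pictured with a small sketch. This is not the code from our hosting project (which is a .NET Worker Role); it is a Ruby illustration of rewriting a Java-style properties file. The `apply_settings` helper is hypothetical, and the property keys, the port 5100 and the database path are assumptions chosen for the example.

```ruby
# Hypothetical helper: rewrite selected keys in Java-style properties text.
# Keys that are not present in the text are appended at the end.
def apply_settings(properties_text, settings)
  remaining = settings.transform_keys(&:to_s)
  lines = properties_text.each_line(chomp: true).map do |line|
    key = line.split("=", 2).first.to_s.strip
    if !line.lstrip.start_with?("#") && remaining.key?(key)
      "#{key}=#{remaining.delete(key)}"
    else
      line # comments and unrelated settings pass through unchanged
    end
  end
  remaining.each { |key, value| lines << "#{key}=#{value}" }
  lines.join("\n") + "\n"
end

# Assumed key names and values, for illustration only.
original = <<~PROPS
  # server settings
  org.neo4j.server.webserver.port=7474
  org.neo4j.server.database.location=data/graph.db
PROPS

updated = apply_settings(original, {
  "org.neo4j.server.webserver.port" => 5100,                          # run-time assigned port
  "org.neo4j.server.database.location" => "C:/neo4j/data/graph.db"    # hypothetical path
})
puts updated
```

Keeping the rewrite as a pure text-in, text-out function makes it easy to test without a running Windows Azure instance.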
Brief comments
This version has a few manual deployment steps too many, which we will mitigate in the subsequent versions of this project.
Diagnostics in Windows Azure could not be simpler; Neo4j logs its activity, as most servers do, to a configurable directory. Windows Azure is enabled to include custom directories in the standard diagnostics collections, which is easily configurable on the machine at startup. This means you can reach the Neo4j diagnostics output for debugging and monitoring.
We will also store the data files of the graph database in a blob in Windows Azure Storage. This will make the database automatically triple-redundantly backed up with automatic failover. This is built into Windows Azure with no extra effort on our part.
Let’s go into a bit more technical detail below. If this is not your cup of tea, scroll to the end for the summary!
How we have done it
Solution
There is much less code in this solution than you might think. All we need is a hosting project which will host Neo4j in Windows Azure. It also takes care of downloading, installing and configuring Neo4j.
Apart from the tests in our solution we have (in alphabetical order from the screen shot):
- CollectDiagnosticsData: A small project to trigger diagnostics transfer from our Cloud instance to Cloud storage. This is only used for debug purposes and is not a part of the deployed solution. The trigger is fired from a console window on your local machine when and if you want to view the logs of the application.
- Diversify.WindowsAzure.ServiceRuntime: A general library that enhances testability in the Windows Azure SDK.
- Neo4j.Azure.Server: The Windows Azure deployment definition project. This is the thing that is packed up and deployed to Windows Azure. It acts as a bag with configuration for the projects that make up the application.
- Neo4jServerHost: A Windows Azure Worker Role project that hosts Neo4j.
Configuration
Having the application configuration settings separate from your code in Windows Azure is key. The way we have coded our solution is to extract all external links and configuration settings from the code and put them in the Service Definition file (ServiceDefinition.csdef) of our Windows Azure solution. When we have done that we can specify the associated configuration values in the Service Configuration file (ServiceConfiguration.cscfg).
This gives us the ability to, for instance, upgrade the version of Neo4j simply by replacing the zip file in blob storage and modifying a few configuration values. No code change required.
As a general rule of thumb you want to make your Windows Azure deployments as configurable as possible to enable easy in-place upgrading of your service in the future.
Installation
This is the bit that is more complex in version 1.0 than we’d like. ;~)
The installation of Neo4j involves manually uploading the artifacts of Neo4j and the JRE to Windows Azure Blob Storage before deploy. Sure, it’s a fairly normal approach for this type of deployment, but it can be made more accessible for a demo application such as this. Again, this project is a complete and fully functioning version of Neo4j in Windows Azure, but there exists no application that cannot be improved. We want the next version (2.0) to be turn-key in the sense that you should only need to download the solution and launch it to get full function!
Please note that you can also use another approach for installation in Windows Azure which is to use a so called startup task.
Running the server
When the solution is installed we are ready to launch Neo4j. A batch file is executed in order to launch through a standard Process.Start() operation.
There should perhaps be more to say here at launch but there really isn’t. It is this simple.
The hosting application kicks off the Neo4j server instance in Windows Azure. All of the configuration of the server is done in the installation steps prior to starting the server.
The Web administration
When the server is running, head over to https://localhost:7474/ to see the web administration:
It gives you access to the main performance measures, a data browser, a scripting console using the Gremlin graph scripting language to test out ideas, and monitoring details regarding the server.
The port on which an application is run on your local Development Emulator is dynamically set. 7474 is the default Neo4j port in the configuration files for the server. The Windows Azure hosting project will dynamically read the allocated port and set it in the config before it launches our server. In my case (Magnus) on my local dev machine the dynamic port was 5100. So for me the link https://localhost:5100/ was correct. Try that or read from the console output when you are running the demo which port your instance launches on. Fortunately the dynamic port selected by the Compute Emulator on the local machine seems to be the same over time.
How do I connect – The Neo4j REST API
The REST API to the Neo4j server is built to be self-explanatory and easy to consume, normally mounted at https://localhost:7474/db/data. You can find the docs here. A basic request to the data root URI of your new Neo4j server using curl looks like
curl -H Accept:application/json https://localhost:7474/db/data/
and gives the response
{
"node" : "https://localhost:7474/db/data/node",
"node_index" : "https://localhost:7474/db/data/index/node",
"relationship_index" : "https://localhost:7474/db/data/index/relationship",
"reference_node" : "https://localhost:7474/db/data/node/0",
"extensions_info" : "https://localhost:7474/db/data/ext",
"extensions" : {
}
}
This describes the whole database and gives you further URLs to discover indexes, the reference data node, extensions and other good information. A REST representation of the first node (without any properties) looks like:
curl https://localhost:7474/db/data/node/0
{
"outgoing_relationships" : "https://localhost:7474/db/data/node/0/relationships/out",
"data" : {
},
"traverse" : "https://localhost:7474/db/data/node/0/traverse/{returnType}",
"all_typed_relationships" : "https://localhost:7474/db/data/node/0/relationships/all/{-list|&|types}",
"property" : "https://localhost:7474/db/data/node/0/properties/{key}",
"self" : "https://localhost:7474/db/data/node/0",
"properties" : "https://localhost:7474/db/data/node/0/properties",
"outgoing_typed_relationships" : "https://localhost:7474/db/data/node/0/relationships/out/{-list|&|types}",
"incoming_relationships" : "https://localhost:7474/db/data/node/0/relationships/in",
"extensions" : {
},
"create_relationship" : "https://localhost:7474/db/data/node/0/relationships",
"all_relationships" : "https://localhost:7474/db/data/node/0/relationships/all",
"incoming_typed_relationships" : "https://localhost:7474/db/data/node/0/relationships/in/{-list|&|types}"
}
In order to get started, please go over to the main Neo4j Wiki page. For the server, there is a good getting started guide, or look at some of the projects using Neo4j.
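Several URLs in the node representation above are templates with placeholders such as {key} and {returnType}. A minimal Ruby sketch of how a client might fill them in follows. The `expand` helper is hypothetical and not part of any Neo4j library; it only handles simple {name} placeholders (not the {-list|&|types} form) and does not URL-encode the values.

```ruby
require "json"

# Trimmed copy of the node representation shown above.
node_json = <<~JSON
  {
    "self" : "https://localhost:7474/db/data/node/0",
    "property" : "https://localhost:7474/db/data/node/0/properties/{key}",
    "traverse" : "https://localhost:7474/db/data/node/0/traverse/{returnType}",
    "data" : { }
  }
JSON

# Hypothetical helper: replace each {name} placeholder with values[name].
# Raises KeyError when a placeholder has no matching value.
def expand(template, values)
  template.gsub(/\{(\w+)\}/) { values.fetch(Regexp.last_match(1)).to_s }
end

node = JSON.parse(node_json)
puts expand(node["property"], { "key" => "name" })
puts expand(node["traverse"], { "returnType" => "node" })
```

Reading the URLs out of the response instead of building them by hand keeps the client working when the server is mounted on a different port, which matters here because Windows Azure assigns the port at run time.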
What can I do with it?
Building applications with the Neo4j Server is really easy. Either you can just use the raw REST API to insert and update your data, or use one of the bindings to Ruby, .NET, PHP and other languages to start interacting with Neo4j.
Neo4j really shines when it comes to deep traversals of your data and analysis of different aspects of your domain. The flexibility of a graph really helps in a lot of scenarios, not only social networking as in the following example.
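To give a feel for the raw REST route, here is a minimal Ruby sketch using only the standard library. It builds the HTTP requests without sending them, so it runs without a server. The helper names are our own for this example; the "node" URL comes from the service root response shown earlier, and the request shapes (a JSON body of properties to create a node, and "to" plus "type" to create a relationship) should be checked against the REST docs for your server version.

```ruby
require "json"
require "net/http"
require "uri"

# Subset of the service root response shown earlier.
service_root = JSON.parse('{ "node" : "https://localhost:7474/db/data/node" }')

# Hypothetical helper: a JSON POST request, built but not sent.
def json_post(url, payload)
  request = Net::HTTP::Post.new(URI(url))
  request["Accept"] = "application/json"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(payload)
  request
end

# Request that would create a node with the given properties.
def create_node_request(node_url, properties)
  json_post(node_url, properties)
end

# Request that would relate a start node to another node.
# create_relationship_url is the "create_relationship" value taken from
# the start node's representation.
def create_relationship_request(create_relationship_url, to_url, type)
  json_post(create_relationship_url, { "to" => to_url, "type" => type })
end

request = create_node_request(service_root["node"], { "name" => "Mark" })
puts "#{request.method} #{request.path}"
puts request.body
```

To actually send a request you would pass it to `Net::HTTP.start(host, port) { |http| http.request(request) }`, with the host and port of your own instance.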
As a small example, this is what you do to build a sample LinkedIn-like social network, execute a Shortest Path query against it, and make a recommendation engine based on that (taken from Max de Marzi’s Neography Ruby bindings for the Neo4j Server). Install them with
gem install neography
A small Ruby example (let’s say in a file called linkedin.rb):
require 'rubygems'
require 'neography'

@neo = Neography::Rest.new

def create_person(name)
  @neo.create_node("name" => name)
end

# Friendship is mutual, so create one relationship in each direction.
def make_mutual_friends(node1, node2)
  @neo.create_relationship("friends", node1, node2)
  @neo.create_relationship("friends", node2, node1)
end

# Suggest friends of friends: nodes exactly two "friends" hops away.
def suggestions_for(node)
  @neo.traverse(node, "nodes", {"order" => "breadth first",
                                "uniqueness" => "node global",
                                "relationships" => {"type" => "friends", "direction" => "in"},
                                "return filter" => {
                                  "language" => "javascript",
                                  "body" => "position.length() == 2;"},
                                "depth" => 2})
end

johnathan = create_person('Johnathan')
mark = create_person('Mark')
phill = create_person('Phill')
mary = create_person('Mary')
luke = create_person('Luke')

make_mutual_friends(johnathan, mark)
make_mutual_friends(mark, mary)
make_mutual_friends(mark, phill)
make_mutual_friends(phill, mary)
make_mutual_friends(phill, luke)

puts "Johnathan should become friends with #{suggestions_for(johnathan).map{|n| n["data"]["name"]}.join(', ')}"
After executing this code with Ruby:
ruby linkedin.rb
You should get the resulting recommendation
Johnathan should become friends with Mary, Phill
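To see why the answer is Mary and Phill, it can help to run the same idea without a server. The sketch below is a plain-Ruby stand-in, not how Neo4j evaluates the traversal: it keeps the friendships in a hash and uses a breadth-first search to find the people exactly two hops away. The `FRIENDS`, `befriend` and `people_at_depth` names are made up for this example.

```ruby
# In-memory stand-in for the graph built by linkedin.rb (no server involved).
FRIENDS = Hash.new { |hash, name| hash[name] = [] }

# Record a friendship in both directions.
def befriend(a, b)
  FRIENDS[a] << b
  FRIENDS[b] << a
end

befriend("Johnathan", "Mark")
befriend("Mark", "Mary")
befriend("Mark", "Phill")
befriend("Phill", "Mary")
befriend("Phill", "Luke")

# Breadth-first search returning the people exactly `depth` hops from `start`.
# Each person is visited once, at the shortest distance from `start`.
def people_at_depth(start, depth)
  distance = { start => 0 }
  queue = [start]
  until queue.empty?
    person = queue.shift
    next if distance[person] == depth # do not expand beyond the wanted depth
    FRIENDS[person].each do |friend|
      next if distance.key?(friend)
      distance[friend] = distance[person] + 1
      queue << friend
    end
  end
  distance.select { |_, hops| hops == depth }.keys
end

puts "Johnathan should become friends with #{people_at_depth('Johnathan', 2).join(', ')}"
```

Mark is one hop away and therefore already a friend, while Luke is three hops away, so neither is suggested.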
You can of course see the increase of data in the Web dashboard at https://localhost:7474, too.
There are a number of other cool examples, for instance an IMDB simulation with recommendations against a Neo4j server instance. Enjoy!
.NET Client library
If you want to talk to a Neo4j instance from your .NET code you will of course need a client library that knows how to communicate with the REST API. There is a blog post here, Neo4j .NET Client over HTTP using REST and json, that discusses this concept and what would be required to create such a client library. There is also a library which is certainly a very good place to start if you want to communicate this way: Neo4RestNet.
Note: It would be nice to teach Neo4j to use another form of communication more easily consumed by .NET code, where perhaps the library pieces are more evolved. We are currently looking into this and will keep you posted.
I want to play with it. Where can I get it?
Glad you like it and happy that you want to give it a spin!
If you want to look at our Windows Azure solution you only need to
- Download the Visual Studio 2010 Neo4j Windows Azure hosting project.
If you are aiming to test run our solution either locally on your machine or in the cloud you need a few more pieces of the puzzle. (Again this is version 1.0 and it involves a few more manual steps than we’d like.)
- Download Neo4j.
- Download a Java Runtime Environment.
- Upload Neo4j and the JRE to Windows Azure Blob Storage (or just use your local Development Storage Emulator to test this on your local machine).
- Launch the hosting project in Visual Studio.
- Configure the solution with your own Windows Azure Storage credentials.
- Deploy Neo4j to your Windows Azure account (or hit F5 to run it in your local Development Fabric Emulator).
Summary
During the coding and testing of this project a few experiences are inescapable:
- Java runs very well on Windows Azure. In fact, if you are able to run your Java application on a regular Windows Server it will run on a Windows Azure instance, with a little tweaking and fiddling to make this happen, of course.
- Fiddling with folders and paths in your Windows Azure applications to let everything find where everything else is takes some getting used to. Extracting configuration settings is an absolute must! You have to handle this well in order to do run-time configuration changes down the road.
- It is advised to pack the JRE alongside the Java application you are deploying to reduce the number of steps required to install the server application on startup.
In version 2.0 of this project we hope to make the Visual Studio solution much more turn-key. All you should need to do to test drive this application is to download the solution and launch it. Instantly you should have a running Neo4j server! We intend to do this by downloading the JRE and Neo4j server directly from https://neo4j.org. We will also look into securing the database files and adding multiple server instances that collaborate. This last bit, in Cloud-lingo, is called “scaling out”.
Another thing on our list is to make this Java server bark in a different tongue. ;~) But more about this is to come down the line.
If you do look at this project and have comments or feedback feel free to contact us @noopman and @peterneubauer. Hope you will enjoy this new and shiny toy as much as we do!
Cheers,
Magnus Mårtensson – Business Responsible Cloud @ Diversify
Peter Neubauer – VP Product Management @ Neo Technology
Magnus: As a .NET Architect and Cloud specialist I am continuously searching for new tools for my toolbox. There are enormous amounts of great tools out there – and Neo4j is one that outshines the bulk of them. Having the power of a graph database at your fingertips is a fantastic power to harness. With this easy deploy to Windows Azure graph data is no longer a stranger in the .NET field.
Peter: The Neo4j community has seen a lot of interest from the .NET developer community lately. Working with Azure as a Platform-as-a-Service hosting environment for Neo4j finally gives .NET developers the possibility to use all the great features and performance gains of Neo4j on a Microsoft-supported infrastructure. The prospect of a solid NoSQL offering in the space of graph databases is very exciting for the project.
It has been a pleasure to work in collaboration between Diversify and Neo4j and with Microsoft on this project and we are very thankful for this opportunity to have fun with a great and unexpected technology combination.