Containerized Neo4j: Automating Deployments with Docker on Azure

Editor’s Note: Last October at GraphConnect San Francisco, David Makogon – Senior Azure Architect at Microsoft – and Patrick Chanezon – Technical Staffer at Docker – delivered this presentation on how to automate cloud deployments of Neo4j using Docker and Azure. For more videos from GraphConnect SF and to register for GraphConnect Europe, check out
Patrick Chanezon: Docker is a tool that allows you to develop applications more quickly and productively. Its mission is “to build tools of mass innovation,” and it allows for the development of creative internet programming beyond silos.
The Docker Mission: Build, Ship, Run

Cloud Services as Movies

There are three big players in the public cloud market that have been adopted by developers: Amazon, Google and Microsoft. Also available are private cloud services such as VMware, which is entering the public market with vCloud Air, and Microsoft, which has a strong hybrid strategy with Azure and Azure Pack that is installed behind a firewall.
Learn How to Use Neo4j Containers to Automate Deployments with Docker on Azure
To understand each of these cloud players, movie comparisons serve a helpful purpose. VMware can be compared to the movie “300”; it is courageous and relies on great technology, but we all know how the movie ends. Amazon’s public cloud service can be compared to “Pacific Rim,” in which extraterrestrial monsters enter Earth through a fold at the bottom of the ocean. In this case, they are “invading” the enterprise market. Google is like the movie “Back to the Future,” because trying to get people to adopt its service, which didn’t use a firewall, mirrored Marty’s experience playing rock and roll to a non-responsive crowd in the 1950s. Google was too far ahead of its time. Microsoft’s cloud service mirrors the movie “Field of Dreams”: “build it [a public cloud service] and they will come.”

A Changing Cloud Landscape

When Docker arrived two years ago, it changed the whole cloud landscape by providing a portable approach to cloud technology, along with a way to perform DevOps that avoids lock-in. With the introduction of this new technology, the whole industry reorganized itself around Docker.
The Linux Cloud Container Ecosystem
At the bottom of this new stack is the equivalent of hardware: cloud providers such as Amazon, Google and Microsoft. On top of that are operating systems, all of which shrank, starting with CoreOS, which built a small distribution of Linux that included only Docker and a few small tools for managing clusters. Red Hat quickly followed suit with Project Atomic, Ubuntu with Ubuntu Core and VMware with Photon. Rancher even has system services running in a privileged Docker daemon, allowing it to run both System Docker and Userland Docker.
In the next layer, there’s Docker itself, along with a whole ecosystem of plugins like Weaveworks for networking or ClusterHQ for volumes. Next, there are three main tools for orchestration: Docker Swarm, Apache Mesos and Google’s Kubernetes.
Deis is an interesting entrant; it’s like a Heroku that you can install behind the firewall, and it’s open source. Cloud Foundry is reinventing itself with Project Lattice as a Docker orchestration engine; IBM has the Bluemix platform, which includes Cloud Foundry and some Docker services; and Tutum is a software-as-a-service platform for orchestrating containers (Tutum was recently acquired by Docker).

Delving Into Docker

Docker is based on isolation using Linux kernel features such as namespaces and cgroups. It also includes an image layer system that caches the layers of each image you create, so unchanged layers can be reused.
Image Layers in Docker, from the Writable Container to the Kernel
When a user develops an application in Docker, they create a Dockerfile, a simple declarative format that can inherit from an existing image. In this example, a Java application that is being built in Docker inherits from a Java 7 image, copies in the code and runs javac to compile it. You can then do a “docker build” to produce an image and a “docker run” to run it on the daemon.
Docker for Developers
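As a rough illustration of the Dockerfile described above, the Python sketch below renders one for a small Java application. The base image tag “java:7”, the class name “Main” and the path “/usr/src/app” are assumptions made for the example, not details taken from the talk.

```python
# Sketch: render a minimal Dockerfile for a Java application.
# Assumptions (not from the talk): base image tag "java:7",
# main class "Main", working directory "/usr/src/app".

def render_dockerfile(base_image: str, workdir: str, main_class: str) -> str:
    """Return Dockerfile text that inherits from base_image, copies the
    source in, compiles it with javac and runs the compiled class."""
    lines = [
        f"FROM {base_image}",             # inherit from an existing image
        f"COPY . {workdir}",              # copy the application code
        f"WORKDIR {workdir}",
        f"RUN javac {main_class}.java",   # compile at image build time
        f'CMD ["java", "{main_class}"]',  # run when the container starts
    ]
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile("java:7", "/usr/src/app", "Main")
print(dockerfile)
```

Saved as a file named Dockerfile next to the source, this text could then be built and run with “docker build” and “docker run”.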
Applications based on microservices often consist of a number of different services, such as a Java front-end combined with a Neo4j database. For these you can use Docker Compose. Its declarative YAML format allows you to specify the containers to run; you then go into the directory and run “docker-compose up” to spin up the containers and see their log output.
Docker Compose: Running Multiple Containers
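To make the Compose description more concrete, here is a small Python sketch that emits a Compose file for a front end plus Neo4j. The service names, the front-end port 8080 and the use of the “neo4j” image from Docker Hub are assumptions; the layout follows the original (version 1) Compose format, in which each top-level key is a service.

```python
# Sketch: emit a docker-compose.yml for a front end plus Neo4j.
# Assumptions: service names "web" and "neo4j", a front end built from
# the local Dockerfile and listening on 8080, and the "neo4j" image.

services = {
    "web": {"build": ".", "ports": ["8080:8080"], "links": ["neo4j"]},
    "neo4j": {"image": "neo4j", "ports": ["7474:7474"]},
}


def to_yaml(services: dict) -> str:
    """Render the two-level service mapping as YAML text."""
    out = []
    for name, config in services.items():
        out.append(f"{name}:")
        for key, value in config.items():
            if isinstance(value, list):
                out.append(f"  {key}:")
                out.extend(f'    - "{item}"' for item in value)
            else:
                out.append(f"  {key}: {value}")
    return "\n".join(out) + "\n"


compose_yaml = to_yaml(services)
print(compose_yaml)
```

Current Compose releases expect services nested under a top-level “services:” key, so this output would need that extra level of nesting to be used with them.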
Docker Machine allows you to provision VMs with the Docker daemon installed, in any cloud or on any virtualization platform. We also use Kitematic, a user interface that works on both Mac and Windows and allows you to create and manage containers.
On the ship side, there is Docker Hub, which houses images and is where ops and devs work together, and the Docker Trusted Registry, which integrates with LDAP, adds enterprise features and can be installed behind the firewall to manage projects.
On the run side, there are a number of plugins for orchestration, including Docker Swarm. Swarm puts a daemon in front of all the Docker engines within a cluster; clients talk to Swarm using the same API they would use with a single engine, and Swarm places each workload wherever there is space, based on constraints that are passed through environment variables.
A Docker Swarm Deployment
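The placement idea described above can be modeled in a few lines of Python. This is an illustrative toy, not Swarm’s actual scheduler: the node names, labels and slot counts are invented, and the “constraint:key==value” form is modeled on the syntax that standalone Swarm accepted through environment variables.

```python
# Sketch: constraint-based placement in the spirit of Swarm scheduling.
# Illustrative model only; nodes, labels and slot counts are invented.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict
    free_slots: int
    containers: list = field(default_factory=list)


def parse_constraints(env: list) -> dict:
    """Extract key/value pairs from entries such as
    "constraint:storage==ssd" passed as environment variables."""
    constraints = {}
    for entry in env:
        if entry.startswith("constraint:"):
            key, _, value = entry[len("constraint:"):].partition("==")
            constraints[key] = value
    return constraints


def place(container: str, env: list, nodes: list) -> Node:
    """Put the container on the matching node with the most free slots."""
    wanted = parse_constraints(env)
    candidates = [
        n for n in nodes
        if n.free_slots > 0
        and all(n.labels.get(k) == v for k, v in wanted.items())
    ]
    if not candidates:
        raise RuntimeError(f"no node satisfies {wanted}")
    target = max(candidates, key=lambda n: n.free_slots)
    target.containers.append(container)
    target.free_slots -= 1
    return target


nodes = [
    Node("node-1", {"storage": "disk"}, free_slots=4),
    Node("node-2", {"storage": "ssd"}, free_slots=2),
    Node("node-3", {"storage": "ssd"}, free_slots=3),
]
chosen = place("neo4j", ["constraint:storage==ssd"], nodes)
print(chosen.name)
```

Here node-1 has the most room but lacks the requested label, so the container goes to the SSD node with the most free slots.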
Docker recently performed some tests with Swarm that scaled to 1,000 nodes. The EC2 allowance maxed out, and we are now testing with 10,000 nodes to see how far it can go. Tutum lets you bring your own nodes from behind the firewall and do your own build, ship and run there. Docker also recently announced a new tool, Project Orca, which is currently in private beta. It’s a solution that runs behind a firewall, and we are going to sell it to enterprises for running and operating their containers.
In terms of standards, last summer we announced the Open Container Initiative (OCI). Thirty-five companies joined this effort to standardize the runtime and bundle format for containers. The reference implementation of the spec is called runC, which can be used in place of Docker, for example if you need complete control over the creation of namespaces.
In the Docker 1.9 RC, we now have Docker networks and volumes, which are powerful features for orchestrating containers. This is particularly helpful when running Neo4j Enterprise in a cluster.

How to Dockerize Neo4j

To launch Neo4j in daemon mode, perform a “docker run -d” and expose a port with “-p”. You can also map a directory on your local machine to the data directory in the container (where the data resides) and then connect to the running Neo4j instance.
For an example of how to dockerize Neo4j, please watch the video clip below:
For an example of how to use Docker Compose, please watch the video clip below:
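The Docker run invocation described in this section can be sketched as a Python helper that assembles the command line. It assumes the “neo4j” image from Docker Hub, which keeps its data under /data, and a hypothetical host directory ~/neo4j/data.

```python
# Sketch: assemble the `docker run` command for a Neo4j container.
# Assumptions: the "neo4j" image, which stores data under /data,
# and a hypothetical host directory ~/neo4j/data.
import shlex
from pathlib import Path


def neo4j_run_command(host_data_dir: Path, http_port: int = 7474) -> list:
    """Return the argv for running Neo4j detached, with the browser
    port published and a host directory mounted as the data directory."""
    return [
        "docker", "run",
        "-d",                            # detached (daemon) mode
        "-p", f"{http_port}:7474",       # publish the Neo4j browser port
        "-v", f"{host_data_dir}:/data",  # map local directory to /data
        "neo4j",
    ]


command = neo4j_run_command(Path.home() / "neo4j" / "data")
print(shlex.join(command))
```

Passing the list to subprocess.run would start the container on a machine with Docker installed; the sketch only prints the command.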

Azure Resource Manager

David Makogon: We’re focusing on building all of our infrastructure (typically compute, storage and networking) in the cloud. The traditional approach is to spin up some VMs and a storage account and build out the network from there. Potentially this includes a virtual network, with all or only some ports open to the outside world, that connects to the database and app resources inside it. This represents a lot of work, scripting and time.
To speed up the process, we introduced the Azure Resource Manager. It allows you to create a single template that describes your infrastructure, including the storage account, network interface and public IP. The Azure Resource Manager also allows you to define virtual machines, chain them together and build dependencies. Whether you have one virtual machine or 1,000, Azure will spin them up as an atomic operation.
On our virtual machines in Microsoft Azure, we have virtual machine extensions, including a Docker extension, that are used for monitoring and injecting code into running VMs. You can spin up a VM, activate the Docker extension inside your deployment script, inject a specific Docker file and pull down the Neo4j image, which lets you view all of your data.
In Azure, you tie everything up with a single resource group name. In the example below there is a VM, a NIC, a network security group (which allows you to specify which ports data comes in and out of), a public IP address, a VNET and a storage account. All of them are chained together in a single resource group with dependencies between each.
The Microsoft Azure Resource Manager
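To show how chained dependencies imply a deployment order, the Python sketch below sorts a set of resources by their “dependsOn” entries. The resource names are hypothetical, and real templates reference dependencies by resource ID rather than by bare name; only the resource types are standard Azure provider types.

```python
# Sketch: how dependsOn entries in a Resource Manager template imply a
# deployment order. Names are hypothetical, and dependencies are given
# as bare names here instead of full resource IDs.
from graphlib import TopologicalSorter  # Python 3.9+

resources = [
    {"type": "Microsoft.Storage/storageAccounts", "name": "neostorage",
     "dependsOn": []},
    {"type": "Microsoft.Network/publicIPAddresses", "name": "neoPublicIP",
     "dependsOn": []},
    {"type": "Microsoft.Network/networkSecurityGroups", "name": "neoNSG",
     "dependsOn": []},
    {"type": "Microsoft.Network/virtualNetworks", "name": "neoVNet",
     "dependsOn": ["neoNSG"]},
    {"type": "Microsoft.Network/networkInterfaces", "name": "neoNic",
     "dependsOn": ["neoPublicIP", "neoVNet"]},
    {"type": "Microsoft.Compute/virtualMachines", "name": "neoVM",
     "dependsOn": ["neostorage", "neoNic"]},
]

# Map each resource to the resources it depends on, then sort so that
# every dependency appears before the resource that needs it.
graph = {r["name"]: set(r["dependsOn"]) for r in resources}
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Resources with no dependency on each other, such as the storage account and the public IP in this sketch, have no required order between them.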
There are a number of ways this can be launched: a REST API call, a language wrapper (such as Java or .NET), command line tools (available on Windows and Mac) or through the web. Watch the Resource Manager launch demo video here or read the step-by-step outline below.
In the following example Azure Resource Manager script, there are parameters that allow users to dynamically choose a particular world region, enter a VM admin username and password, specify the machine type, etc.
The Azure Resource Manager Script for Defining Regions
The Username and Password Script for Azure Resource Manager
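A parameters section along the lines described above might look like the following, built here as a Python dictionary and serialized to JSON. The parameter names, allowed regions and default VM size are illustrative assumptions rather than the values used in the talk.

```python
# Sketch: the "parameters" section of a Resource Manager template,
# built as a Python dict and serialized to JSON. Parameter names,
# allowed values and defaults are illustrative assumptions.
import json

parameters = {
    "location": {
        "type": "string",
        "defaultValue": "West US",
        "allowedValues": ["West US", "East US", "North Europe"],
    },
    "adminUsername": {"type": "string"},
    "adminPassword": {"type": "securestring"},  # secret parameter type
    "vmSize": {"type": "string", "defaultValue": "Standard_D1"},
}

template = {
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": parameters,
    "resources": [],  # resource definitions omitted in this sketch
}

template_json = json.dumps(template, indent=2)
print(template_json)
```

Restricting the region through “allowedValues” is one way to get the drop-down style choice of world region that the text mentions.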
Once these parameters are set, you can start defining dependencies. This example virtual network is dependent on a network security group and specific subnets.
Defining Dependencies in the Azure Resource Manager
Once the VMs are up and running, we can install the Docker extension and then hand off a Docker image to launch. In this case, it’s the Neo4j image.
Install the Docker Extension in the Azure Resource Manager
The Neo4j Docker Image Script in the Azure Resource Manager
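The extension step might be expressed as a template resource like the one sketched below. The publisher, type and version values and the “compose” settings key are written from memory of Azure’s Docker extension and should be treated as assumptions to check against current documentation; the VM name is taken from the example deployment shown later in the post.

```python
# Sketch: a VM extension resource that installs Docker and starts Neo4j.
# The publisher/type/version values and the "compose" settings key are
# assumptions recalled from Azure's Docker extension, not verified here.
import json

vm_name = "neodockerVM"

docker_extension = {
    "type": "Microsoft.Compute/virtualMachines/extensions",
    "name": f"{vm_name}/DockerExtension",
    # The extension can only be installed once the VM itself exists.
    "dependsOn": [f"Microsoft.Compute/virtualMachines/{vm_name}"],
    "properties": {
        "publisher": "Microsoft.Azure.Extensions",
        "type": "DockerExtension",
        "typeHandlerVersion": "1.0",
        "autoUpgradeMinorVersion": True,
        "settings": {
            # Compose-style description of the container to launch.
            "compose": {
                "neo4j": {"image": "neo4j", "ports": ["7474:7474"]},
            }
        },
    },
}

print(json.dumps(docker_extension, indent=2))
```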
This can then be deployed to Azure, which will ask for the parameters defined by the template author, including those mentioned above (VM name, size, etc.).
Defining Parameters in Microsoft Azure
Once everything has been specified, you can create a resource group, which is the named container that houses all the different resources.
Defining a Resource Group in Microsoft Azure
You then kick off the process of spinning up an entire cluster, which will run for a few minutes.
Spinning Up a Cluster in Microsoft Azure
This can also be done using the Azure command line tool. In this case, you create a new umbrella resource group which, in this example, is located in the western U.S.
The Command Line Tool in Microsoft Azure
Next, create a deployment by providing a resource group and deployment name. The case below shows passing a template URI that points to a GitHub repo.
The Deployment Command in Microsoft Azure
Then specify the storage account, location, admin username, password and DNS name. Now you can spin up the deployment via the command line.
Spinning Up a Deployment Username and Password in Microsoft Azure
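The two command line steps above can be sketched as argument lists in Python. This uses the current “az” CLI, which postdates the talk, so the exact commands differ from the ones shown in the screenshots; the group name, deployment name, template URI and parameter names are all placeholders.

```python
# Sketch: the two CLI steps as argv lists, using the current `az` CLI
# (the talk predates it and used an earlier command line tool).
# Group name, deployment name, template URI and parameter names are
# placeholders and must match your own template.
import shlex

group = "neo4j-demo"
template_uri = "https://example.com/azuredeploy.json"  # placeholder URI

# Step 1: create the umbrella resource group in a region.
create_group = [
    "az", "group", "create",
    "--name", group,
    "--location", "westus",
]

# Step 2: create a deployment in that group from a template URI.
create_deployment = [
    "az", "deployment", "group", "create",
    "--resource-group", group,
    "--name", "neo4j-deployment",
    "--template-uri", template_uri,
    "--parameters", "adminUsername=neo", "dnsName=neo4j-demo",
]

for argv in (create_group, create_deployment):
    print(shlex.join(argv))
```

The password is deliberately left off the command line in this sketch so that it does not end up in shell history.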
Below is an example of a deployment that has already been launched from the portal, the Hoverboard Resource Group. In Azure, navigate to a resource group and then enter the name of your deployment.
The Resource Group Navigation in Microsoft Azure
This shows the exact same grouped resources from above (neodockerVM, neodockervmmyVMNic, etc.).
Grouped Resources in Microsoft Azure
It also creates a public IP address URL.
Using Microsoft Azure to Get a Public IP Address URL
When you paste the URL into your browser and specify the port (in this case, 7474), it pulls up Neo4j.
The Neo4j Browser Accessible Via Port 7474 in a Microsoft Azure Public URL
Next, you can SSH into the VM you just created, which allows you to view the Docker logs.
SSH into a Virtual Machine in order to Read the Docker Logs
Now you are SSH’d into a VM and connected into the Docker container viewing the exact image that is also appearing in the browser.
The Neo4j Docker Image in Your Browser
To recap, the cloud manages the infrastructure. The Azure Resource Manager strings together the pieces of the infrastructure, sets up the dependencies with an atomic operation and deploys the entire resource group together. Docker is then installed on top of those virtual machines, and we automatically inject Neo4j as part of the Docker extension. When that has been completed and spun up, we’re left with Neo4j running.

Resources for Additional Learning

To learn more about how to use Neo4j with Docker, please explore the following resources:
Inspired by David and Patrick’s talk? Register for GraphConnect Europe on April 26, 2016 at for more industry-leading presentations and workshops on the evolving world of graph database technology.