By Benjamin Nussbaum, President & CTO of AtomRain | June 15, 2016
Editor’s Note: This presentation was given by Benjamin Nussbaum at GraphConnect Europe in April 2016. Here’s a quick TL;DR of what he covered:
- The importance of cloud security
- The language of security
- NAT Routing
- GraphGrid services
Today we’re going to talk about securely deploying Neo4j into Amazon Web Services (AWS).
To start, we need to ask the question: Why is cloud security important? Over the last several years, there has been an increase in security incidents in which millions of records have been stolen. It has been calculated that each leak costs a company $154 per record, which adds up to a huge loss for businesses.
And as the ones developing and building these solutions, it’s our responsibility to provide a high level of security to our customers. This isn’t just a technical aspect — security begins with personnel. A culture of security is the first place to start.
Each cloud technology provides a set of frameworks, tools and APIs that you can combine with different security components. Certain cloud providers, such as AWS, have a very robust security infrastructure that enables you to work with default security components, which saves time.
Whether you’re using a virtual private cloud (VPC) or a private network, you want all traffic to run over SSL. In Neo4j, run all interactions between your graph and your application over SSL, which can be configured with HTTPS on port 7473.
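One way to hold an application to the SSL-only rule is to reject any Neo4j URL that is not HTTPS before a connection is attempted. The following is a minimal sketch, not part of the talk: the helper names and the host `neo4j.internal.example` are hypothetical, and only port 7473 comes from the text above.

```python
from urllib.parse import urlsplit

NEO4J_HTTPS_PORT = 7473  # HTTPS port named in the talk


def require_https(url: str) -> str:
    """Return the URL unchanged if it uses HTTPS, otherwise raise ValueError."""
    if urlsplit(url).scheme != "https":
        raise ValueError(f"refusing non-SSL Neo4j URL: {url}")
    return url


def neo4j_url(host: str, port: int = NEO4J_HTTPS_PORT) -> str:
    """Build the base URL for a Neo4j server, always over HTTPS."""
    return require_https(f"https://{host}:{port}/")


if __name__ == "__main__":
    # Hypothetical internal host name, used only for illustration.
    print(neo4j_url("neo4j.internal.example"))
```

Routing every configured URL through a check like `require_https` means a plain-HTTP setting fails at startup instead of quietly sending graph traffic unencrypted.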
AWS and Neo4j Deployment
Now we’re going to explore a few different ways to deploy on AWS if you’d like to roll out your own cloud deployment.
The Language of Security: Part 1
The goal is for the components of your environment to work together so that Neo4j is secure and can still reach the external world, while nobody outside can see that you’re running Neo4j on port 7474 on your server. Before figuring out how that works, you need to learn a few acronyms:
- Identity and Access Management (IAM). Provides user- and group-level permissions for authentication and authorization controls to AWS resources. This is where you manage the users and groups on your operations team and decide who within the organization has access to Neo4j once authenticated.
- Multi-factor Authentication (MFA). This is an added layer of security that requires a token for access in addition to a username and password. This prevents those who have obtained only Neo4j login information from accessing privileged accounts.
- Virtual Private Cloud (VPC). This allows AWS resources to be launched into a private network without being publicly accessible. Reaching those resources requires a VPN client, which restricts access to authorized personnel with the correct VPN access.
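To show how IAM and MFA fit together, here is a hedged sketch of an IAM policy document that denies every action when the request was not authenticated with MFA. It uses the standard `aws:MultiFactorAuthPresent` condition key. The function name and `Sid` are hypothetical, and this is not the policy used in the talk.

```python
import json


def deny_without_mfa_policy() -> dict:
    """Build an IAM policy document that denies all actions without MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllWithoutMFA",  # hypothetical statement id
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # BoolIfExists also matches requests where the key is absent,
                # such as calls made with long-term access keys.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_without_mfa_policy(), indent=2))
```

This blanket deny is a simplification. A real policy would usually exempt the actions a user needs to enroll an MFA device in the first place.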
How to Access Secure Information
Once all your information is secure, you need to ensure that the appropriate people can access the secured information. There are a few options:
- OpenVPN can be used to authenticate a user for VPC access. It has a very low cost of entry, $9.60 per connection per year, which makes it affordable even for startups.
- Direct Connect establishes a dedicated network connection from your premises — such as an office or data center — to your VPC in AWS. This is a great option for an enterprise with existing infrastructure to migrate data to the cloud because it allows the company to use AWS as an extension of the existing network.
The Language of Security: Part 2
The next set of terms relates to security groups, which control inbound and outbound traffic, operate at the instance level and support allow rules only. Alongside security groups, you need to know:
- Network Access Control Lists (ACLs). These control inbound and outbound traffic for one or more subnets, and they are where the broad, sweeping port decisions for public vs. private are made. Something to keep in mind: network ACLs are stateless, so if you have outbound traffic that expects a response from the server, you need to configure your ACL so that the response can get back in.
- S3 ACLs. These define the accounts and groups with access and the type of access to a bucket or an object. This provides more granular control and the option to segment groups or individuals.
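The point about responses getting back in can be made concrete with a small model of a stateless network ACL. This is an illustrative sketch only: the `AclRule` type and the helper functions are hypothetical, and the ephemeral range 1024-65535 is an assumption chosen to cover typical client reply ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    direction: str  # "inbound" or "outbound"
    port_from: int
    port_to: int
    allow: bool


def is_allowed(rules, direction, port):
    """Evaluate rules in order; the first match wins, no match means deny."""
    for rule in rules:
        if rule.direction == direction and rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False


def round_trip_ok(rules, dest_port, reply_port):
    """A stateless ACL must allow both the request and its reply explicitly."""
    return (is_allowed(rules, "outbound", dest_port)
            and is_allowed(rules, "inbound", reply_port))


# Outbound HTTPS is allowed, but nothing lets the reply back in.
outbound_only = [AclRule("outbound", 443, 443, True)]

# Adding an inbound rule for the ephemeral port range fixes the round trip.
with_return = outbound_only + [AclRule("inbound", 1024, 65535, True)]

if __name__ == "__main__":
    print(round_trip_ok(outbound_only, 443, 50000))
    print(round_trip_ok(with_return, 443, 50000))
```

Security groups, by contrast, are stateful, so a reply to an allowed request is let through without a matching rule of its own.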
A Neo4j Example
Below is an inbound security group for Neo4j on the elastic load balancer. You use the two defaults, HTTP and HTTPS:
All the source IP addresses begin with 172.128, which is an internal range of IP addresses in the networking scheme and makes up the first 16 bits of the CIDR block.
Port 7473 is HTTPS and port 7474 is HTTP. Because the rules accept traffic only from the IP range of servers within your network, your internal servers can reach Neo4j but external sources cannot.
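The effect of such an inbound security group can be sketched as a simple check: a connection is accepted only if its port is listed and its source address falls inside the allowed CIDR block. The rule structure and function name below are hypothetical. The block `172.128.0.0/16` is taken from the addresses quoted above, so substitute your own VPC range.

```python
from ipaddress import ip_address, ip_network

# CIDR block as quoted in the talk; replace with your VPC's actual range.
INTERNAL_CIDR = ip_network("172.128.0.0/16")

# Allow-only rules, mirroring how security groups work.
INBOUND_RULES = [
    {"port": 7473, "source": INTERNAL_CIDR},  # Neo4j HTTPS
    {"port": 7474, "source": INTERNAL_CIDR},  # Neo4j HTTP
]


def inbound_allowed(source_ip: str, port: int, rules=INBOUND_RULES) -> bool:
    """Return True if any rule allows this source address on this port."""
    addr = ip_address(source_ip)
    return any(rule["port"] == port and addr in rule["source"] for rule in rules)


if __name__ == "__main__":
    print(inbound_allowed("172.128.4.10", 7473))  # internal server
    print(inbound_allowed("203.0.113.7", 7474))   # external address
```

Because there are no deny rules to write, anything not matched by an allow rule is dropped. That is how the group keeps Neo4j hidden from external sources.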
In this type of infrastructure configuration, a network address translation (NAT) instance controls all inbound and outbound traffic through an Internet gateway. This allows you to restrict inbound traffic to the expected protocols, generally ports 80 and 443, and pass it through an API layer that then proxies to Neo4j and any other application servers you may have behind your VPC. This provides granular control while still letting your full internal infrastructure communicate freely, without exposing it to the outside world.
Security at GraphGrid
We’ve set up all the infrastructure so that everything we deploy is inside a VPC, even if it’s across regions and availability zones. There is some fairly complex networking involved with tunneling regions, keeping the resources internal and ensuring that all the pieces are simultaneously isolated but can also communicate.
Consider the following example in which we have three different Neo4j instances in different availability zones that need to communicate with one another:
We have to set up the private DNS and the EBS data volume, which can be encrypted if necessary, along with the S3 storage and the elastic load balancer endpoints for master, slave and available. Security groups then manage subnet access so that the instances can communicate and traffic is routed correctly.
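As a rough illustration of the master, slave and available endpoints, the sketch below builds health-check targets that a load balancer could poll to decide which instances receive traffic. It rests on two assumptions that are not from the talk and should be verified against your Neo4j and AWS versions: that Neo4j's high-availability mode serves status endpoints under `/db/manage/server/ha/`, and that the load balancer accepts targets in the `PROTOCOL:port/path` form. The function name is hypothetical.

```python
# Assumed Neo4j HA status paths; verify against your Neo4j version.
HA_STATUS_PATHS = {
    "master": "/db/manage/server/ha/master",
    "slave": "/db/manage/server/ha/slave",
    "available": "/db/manage/server/ha/available",
}


def health_check_target(role: str, port: int = 7473, protocol: str = "HTTPS") -> str:
    """Build a PROTOCOL:port/path health-check target for the given role."""
    try:
        path = HA_STATUS_PATHS[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
    return f"{protocol}:{port}{path}"


if __name__ == "__main__":
    for role in HA_STATUS_PATHS:
        print(role, "->", health_check_target(role))
```

With one load balancer per role, writes can be directed to whichever instance currently passes the master check, and reads to the instances that pass the slave or available checks.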
GraphGrid provides the basic security architecture that I just reviewed so that you don’t have to build it from scratch.
About the Author
Benjamin Nussbaum, President & CTO of AtomRain
Benjamin Nussbaum is the President and CTO of AtomRain, the makers of GraphGrid. Benjamin brings to the table nearly 20 years of software architecture and engineering, server infrastructure, database design and technology innovation experience with implementation expertise in enterprise financial, media, medical and automotive software on web, mobile and desktop devices.