Learning Goals
- The basics of graph databases – history, structure, use cases, and more.
- Introduction to Neo4j and its products.
Prerequisites
- Basic understanding of a database (relational, NoSQL, graph, or other).
Very simply, a graph database is designed to focus on the relationships between two or more pieces of data. Relationships are treated as the primary elements in the data model, making traversals of massive amounts of data lightning fast and efficient.
The video below presents some history of database technology and explains the specific business and data needs that graph databases fill.
Why Graph Databases?
We live in a connected world! There are no isolated pieces of information, but rich, connected domains all around us. Only a database that natively embraces relationships is able to store, process, and query connections efficiently. While other databases compute relationships at query time through expensive JOIN operations, a graph database stores connections between data from the start. After all, without relationships to connect nodes, a graph is simply data points. It is the relationships between the data that create our graph and produce value.
Accessing nodes and relationships in a native graph database is an efficient, real-time operation and allows you to quickly traverse millions of connections per second per core. Independent of the total size of your dataset, graph databases excel at managing highly connected data and complex queries. Given only a pattern and one or more starting points, a graph database explores the neighborhood around the starting points, collecting and aggregating the needed information from millions of nodes and relationships and leaving untouched the billions that could reside outside the search perimeter.
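To make the "search perimeter" idea concrete, here is a minimal sketch in plain Python. It is not Neo4j code: the `adjacency` map, the `neighborhood` helper, and the sample names are all assumptions made for illustration. It only shows that a traversal bounded by a number of hops visits what is reachable from the starting points and never looks at the rest of the graph.

```python
from collections import deque


def neighborhood(adjacency, starts, max_hops):
    """Return every node within max_hops of any start node (breadth-first)."""
    seen = set(starts)
    frontier = deque((node, 0) for node in starts)
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # perimeter reached, do not expand this node further
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return seen


# Hypothetical data: a small connected cluster...
adjacency = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}
# ...plus a large unrelated region that the traversal below never touches.
for i in range(100_000):
    adjacency[f"other-{i}"] = [f"other-{i + 1}"]

reached = neighborhood(adjacency, ["alice"], max_hops=2)
print(sorted(reached))  # ['alice', 'bob', 'carol', 'dave']
```

The work done depends on the size of the neighborhood around `alice`, not on the 100,000 unrelated entries, which is the property the paragraph above describes.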
The video below highlights the goals and discusses a few use cases of graph databases.
The Property Graph Model
Even within graph databases, there are a few different approaches to the components, rules, and models. Neo4j uses the labeled property graph model. Graphs exist in our everyday world, so the structure of a graph database is easily relatable. For those of you familiar with an object model or an entity relationship diagram (ERD), the concepts of the labeled property graph model will be familiar.
The labeled property graph contains two major components, nodes and relationships, along with a few minor components that help us add detailed data to the main two. Nodes are the entities or objects in the graph. They are key pieces of data and can often be determined by finding the nouns in your dataset (a bike, pets, Jennifer, Neo4j, etc.).
Relationships provide directed, named, semantically relevant connections between two node entities. These are often the verbs or actions in your dataset (e.g. in Employee WORKS_FOR Company, WORKS_FOR is the relationship between the employee and the company). A relationship always has a direction, a type, a start node, and an end node. Although initially stored in a specific direction (Employee → Company), relationships can always be queried with equal speed in either direction (Employee → Company or Company → Employee). We refer to this as a bidirectional relationship. Because relationships are stored rather than computed at query time, two nodes can share any number or type of relationships without sacrificing performance.
Properties can be assigned to both nodes and relationships. Nodes can hold any number of attributes as key-value pairs, while relationships in most cases carry quantitative properties, such as weights, costs, distances, ratings, time intervals, or strengths.
Labels can be assigned to nodes to help represent the different roles of those nodes in your domain. A single node can carry multiple labels (e.g. a node could be labeled as a Person and as an Employee, allowing queries on either employees or people to return that node). In addition to giving context to node and relationship properties, labels may also serve to attach metadata to certain nodes for indexing or constraints.
The simple graph below shows each of these components in action.
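The same components can also be sketched in code. The following is a minimal in-memory model in Python, offered only as an illustration of the concepts and not as Neo4j's implementation or API. The `Node` and `Relationship` classes, the `with_label` helper, and the `years` property value are all hypothetical; the names Jennifer, Neo4j, Person, Employee, and WORKS_FOR come from the examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                     # roles such as "Person"
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    type: str    # name of the connection, e.g. "WORKS_FOR"
    start: Node  # the relationship is directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


# Two nodes; the first carries two labels at once.
jennifer = Node({"Person", "Employee"}, {"name": "Jennifer"})
neo4j = Node({"Company"}, {"name": "Neo4j"})

# One directed, typed relationship with a quantitative property
# (the value 3 is a made-up example).
works_for = Relationship("WORKS_FOR", jennifer, neo4j, {"years": 3})

nodes = [jennifer, neo4j]


def with_label(nodes, label):
    """Return the nodes that carry the given label."""
    return [node for node in nodes if label in node.labels]


# The same node is found through either of its labels.
print([n.properties["name"] for n in with_label(nodes, "Person")])    # ['Jennifer']
print([n.properties["name"] for n in with_label(nodes, "Employee")])  # ['Jennifer']
```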
There is one core consistent rule in a graph database: “No broken links”. Since a relationship always has a start and end node, you can’t delete a node without also deleting its associated relationships. You can also always assume that an existing relationship will never point to a non-existing endpoint. This is similar to the parent/child constraint in relational databases, where you cannot delete the parent record without first removing the child record.
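The "No broken links" rule can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Neo4j's behavior in detail: the `Graph` class and its methods are hypothetical, and the `detach` flag is loosely modeled on the idea behind Cypher's `DETACH DELETE`, which removes a node together with its relationships.

```python
class Graph:
    def __init__(self):
        self.nodes = set()
        self.relationships = set()  # (start, type, end) triples

    def add_node(self, name):
        self.nodes.add(name)

    def relate(self, start, rel_type, end):
        # A relationship may only connect nodes that exist.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("both endpoints must exist")
        self.relationships.add((start, rel_type, end))

    def attached(self, name):
        """Relationships that start or end at the given node."""
        return {r for r in self.relationships if name in (r[0], r[2])}

    def delete_node(self, name, detach=False):
        attached = self.attached(name)
        if attached and not detach:
            # Refuse, otherwise the relationships would dangle.
            raise ValueError(f"{name} still has {len(attached)} relationship(s)")
        self.relationships -= attached
        self.nodes.discard(name)


graph = Graph()
graph.add_node("Jennifer")
graph.add_node("Neo4j")
graph.relate("Jennifer", "WORKS_FOR", "Neo4j")

try:
    graph.delete_node("Neo4j")
except ValueError as error:
    print("refused:", error)

graph.delete_node("Neo4j", detach=True)  # removes the relationship as well
print(graph.relationships)               # set()
```

Whichever path is taken, no relationship is ever left pointing at a node that no longer exists.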
What is Neo4j?
Neo4j is an open-source, NoSQL, native graph database that provides an ACID-compliant, transactional backend for your applications. Development started in 2003, and it has been publicly available since 2007. The source code, written in Java and Scala, is available on GitHub, with a thriving community on the Neo4j Slack channel and Stack Overflow.
Neo4j is used today by thousands of companies and organizations in almost all industries, including financial services, government, energy, technology, retail, and manufacturing. Over 1,000 developers and architects across these industries are Neo4j Certified Professionals, and you can be too!
Neo4j is referred to as a native graph database because it implements the property graph model down to the storage level and uses native graph processing (index-free adjacency). Neo4j does not just rely on graph processing and in-memory libraries, but also provides enterprise-required database capabilities including ACID transaction compliance, cluster support, and runtime failover, making it suitable for using graph data in production scenarios.
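As a rough intuition for index-free adjacency, the Python sketch below contrasts two ways of finding a node's neighbors. It is a conceptual illustration only and makes no claim about Neo4j's actual storage format; `NodeRecord`, `connect`, and `neighbors_via_table` are hypothetical names. In the first approach every lookup scans a shared table of edges; in the second each node holds direct references to its own relationships, so a hop follows a reference instead of searching.

```python
def neighbors_via_table(edge_table, name):
    """Lookup-style approach: scan the edge table on every query."""
    return [end for start, _type, end in edge_table if start == name]


class NodeRecord:
    """Adjacency-style approach: each node references its relationships."""

    def __init__(self, name):
        self.name = name
        self.outgoing = []  # relationships that start at this node
        self.incoming = []  # relationships that end at this node


def connect(start, rel_type, end):
    # The relationship is stored once, with one direction, but is
    # reachable from both endpoints.
    relationship = (start, rel_type, end)
    start.outgoing.append(relationship)
    end.incoming.append(relationship)
    return relationship


edge_table = [("Jennifer", "WORKS_FOR", "Neo4j")]
print(neighbors_via_table(edge_table, "Jennifer"))  # ['Neo4j']

employee = NodeRecord("Jennifer")
company = NodeRecord("Neo4j")
connect(employee, "WORKS_FOR", company)

# Employee -> Company: follow the outgoing references.
print([end.name for _start, _type, end in employee.outgoing])    # ['Neo4j']
# Company -> Employee: follow the incoming references of the same relationship.
print([start.name for start, _type, _end in company.incoming])   # ['Jennifer']
```

The second half also shows why a relationship stored in one direction can still be followed from either end: both endpoints keep a reference to it.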
Neo4j’s free and open-source Community edition is a high-performance, fully ACID-transactional database. The Community edition includes (but is not limited to) all the functionality described in this section. Neo4j’s Enterprise edition provides all of the functionality of the Community edition, plus scalable clustering, failover, high availability, live backups, and comprehensive monitoring. Learn more about the Community and Enterprise editions.
Neo4j Desktop is a mission control center for developers, making it easy to create, query, and administer your databases. It’s free with registration and includes a development license for Enterprise Edition, as well as an installer for the APOC and Graph Algorithms libraries. It can also be used to maintain instances of Neo4j server hosted in other locations. Neo4j Desktop is the recommended way to get started with Neo4j on your own machine.
Neo4j for Servers
If you want to download Neo4j for a server, the recommended path varies by operating system. There is an official Debian package, Yum package, Docker image, Windows zip with PowerShell module, and a tar for other Linux/UNIX platforms. There is also an unofficial Homebrew formula.
By default, the Neo4j Server is bundled with an interactive, web-based database interface bound to
Don’t want to install anything on your machine? Check out the Neo4j Sandbox, which includes datasets and guides for a variety of use cases including Recommendation Engines, Network and IT Operations, Twitter network analysis, Panama Papers, and many more!