Goals
This article is an introduction to graph databases in general and Neo4j in particular.
Prerequisites
You should have a basic understanding of what a database is.
Level: Beginner
Overview
Very simply, a graph database is a database designed to treat the relationships between data as a first-class citizen in the data model.
Why Graph Databases?
We live in a connected world! There are no isolated pieces of information, only rich, connected domains all around us. Only a database that natively embraces relationships can store, process, and query connections efficiently. While other databases compute relationships at query time through expensive JOIN operations, a graph database stores connections as first-class citizens.
Accessing nodes and relationships in a native graph database is an efficient, constant-time operation and allows you to quickly traverse millions of connections per second per core.
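To make that contrast concrete, here is a toy Python sketch. It is not how Neo4j or any relational engine is actually implemented, and the `people`, `friendships`, and `adjacency` structures and both functions are hypothetical. The join-style lookup scans an unindexed join table, so its cost grows with the table (real relational databases use indexes, which make this much cheaper than a full scan). The adjacency-style lookup follows references stored with the node itself, so its cost depends only on that node's number of connections.

```python
# Toy contrast only: neither function reflects a real engine's internals.

# Relational-style: a table of people plus a join table of (person_id, friend_id) rows.
people = {1: "Alice", 2: "Bob", 3: "Carol"}
friendships = [(1, 2), (1, 3), (2, 3)]

def friends_via_join(person_id):
    # Scans every row of the (unindexed) join table at query time.
    return [people[f] for (p, f) in friendships if p == person_id]

# Graph-style: each node keeps direct references to its neighbours.
adjacency = {1: [2, 3], 2: [3], 3: []}

def friends_via_adjacency(person_id):
    # Work depends only on this node's degree, not on the total graph size.
    return [people[f] for f in adjacency[person_id]]

print(friends_via_join(1))       # ['Bob', 'Carol']
print(friends_via_adjacency(1))  # ['Bob', 'Carol']
```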
Regardless of the total size of your dataset, graph databases excel at managing highly connected data and complex queries. Armed only with a pattern and a set of starting points, a graph database explores the neighborhood around those starting points, collecting and aggregating information from millions of nodes and relationships while leaving the billions outside the search perimeter untouched.
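The "search perimeter" idea can be sketched as a depth-limited breadth-first traversal from a set of starting nodes. This minimal Python version is an illustration under simplifying assumptions, not Neo4j's query engine: the graph is a plain dict of adjacency lists, the function name `explore` is hypothetical, and pattern matching is left out entirely. Only nodes within `max_depth` hops of a start node are ever visited.

```python
from collections import deque

def explore(adjacency, start_nodes, max_depth):
    """Breadth-first traversal limited to max_depth hops from the start nodes.

    Only nodes reachable within the limit are visited; the rest of the
    graph is never looked at, whatever its total size.
    """
    visited = set(start_nodes)
    frontier = deque((node, 0) for node in start_nodes)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the search perimeter
        for neighbour in adjacency.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return visited

graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": ["f"], "x": ["y"]}
print(sorted(explore(graph, ["a"], max_depth=2)))  # ['a', 'b', 'c', 'd']
```

Nodes `e`, `f`, `x`, and `y` are never touched by the call above, which is the property the paragraph describes.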
The Property Graph Model
If you’ve ever worked with an object model or an entity relationship diagram, the labeled property graph model will seem familiar.
Nodes are the entities in the graph. They can hold any number of attributes (key-value pairs) called properties. Nodes can be tagged with labels representing their different roles in your domain. In addition to contextualizing node and relationship properties, labels may also serve to attach metadata (index or constraint information) to certain nodes.
Relationships provide directed, named, semantically relevant connections between two node entities (e.g. Employee WORKS_FOR Company). A relationship always has a direction, a type, a start node, and an end node. Like nodes, relationships can also have properties. Relationship properties are often quantitative, such as weights, costs, distances, ratings, time intervals, or strengths. Because relationships are stored efficiently, two nodes can share any number and type of relationships without sacrificing performance. Note that although relationships are directed, they can always be navigated efficiently in either direction.
[Figure: The building blocks of the Property Graph]
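A minimal Python sketch of these building blocks follows. It assumes nothing about Neo4j's actual storage format: `Node`, `Relationship`, `outgoing`, and `incoming` are hypothetical names, and the Alice/ACME data is made up for illustration. Each node carries a set of labels and a dictionary of properties; each relationship has a type, a start node, an end node, and its own properties. The two helpers scan a plain list for brevity, so they only show that one directed relationship can be read from either end, not how Neo4j makes that efficient.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    labels: set = field(default_factory=set)        # roles, e.g. {"Employee"}
    properties: dict = field(default_factory=dict)  # key-value pairs

@dataclass
class Relationship:
    type: str                                       # e.g. "WORKS_FOR"
    start: Node                                     # every relationship has a start node
    end: Node                                       # and an end node, hence a direction
    properties: dict = field(default_factory=dict)  # e.g. weights, costs, dates

alice = Node(1, {"Person", "Employee"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "ACME"})
relationships = [Relationship("WORKS_FOR", alice, acme, {"since": 2019})]

def outgoing(node):
    """Relationships that start at `node`."""
    return [r for r in relationships if r.start is node]

def incoming(node):
    """Relationships that end at `node`: the same data, navigated backwards."""
    return [r for r in relationships if r.end is node]

print([r.end.properties["name"] for r in outgoing(alice)])   # ['ACME']
print([r.start.properties["name"] for r in incoming(acme)])  # ['Alice']
```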
There is one core consistency rule in a graph database: “No broken links”. Because a relationship always has a start and an end node, you cannot delete a node without also deleting its associated relationships. You can also always assume that an existing relationship never points to a non-existent endpoint.
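The rule can be sketched with a toy in-memory store. The `Graph` class, its methods, and the `detach` flag are hypothetical; they loosely mirror Cypher's behaviour, where `DELETE` on a node that still has relationships fails and `DETACH DELETE` removes the node together with its relationships. Relationships may only be created between existing nodes, and a node with relationships can only be deleted if they are removed with it.

```python
class Graph:
    """Toy store enforcing the 'no broken links' rule (illustrative only)."""

    def __init__(self):
        self.nodes = set()
        self.relationships = []  # (start, type, end) tuples

    def add_node(self, node_id):
        self.nodes.add(node_id)

    def add_relationship(self, start, rel_type, end):
        # A relationship may only connect nodes that exist.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("both endpoints must exist")
        self.relationships.append((start, rel_type, end))

    def delete_node(self, node_id, detach=False):
        attached = [r for r in self.relationships if node_id in (r[0], r[2])]
        if attached and not detach:
            # Refuse: deleting the node would leave dangling relationships.
            raise ValueError(f"node {node_id} still has {len(attached)} relationship(s)")
        # Detach: drop the node's relationships together with the node.
        self.relationships = [r for r in self.relationships if r not in attached]
        self.nodes.discard(node_id)

g = Graph()
for node_id in ("alice", "acme"):
    g.add_node(node_id)
g.add_relationship("alice", "WORKS_FOR", "acme")

try:
    g.delete_node("alice")           # refused: would leave a broken link
except ValueError as err:
    print(err)                       # node alice still has 1 relationship(s)

g.delete_node("alice", detach=True)  # removes the relationship, then the node
print(g.relationships)               # []
```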
What is Neo4j?
Neo4j is an open-source NoSQL native graph database which provides an ACID-compliant transactional backend for your applications. With development starting in 2003, it has been publicly available since 2007. The source code, written in Java and Scala, is available on GitHub, with a thriving community on the Neo4j Slack and StackOverflow.
Neo4j is used today by thousands of companies and organizations in almost all industries, including financial services, government, energy, technology, retail and manufacturing. Hundreds of developers and architects in those industries are Neo4j Certified Professionals.
Neo4j is called a native graph database because it implements the Property Graph Model efficiently all the way down to the storage level. Unlike graph processing or in-memory libraries, Neo4j provides full database characteristics, including ACID transaction compliance, cluster support, and runtime failover, which make it suitable for using graph data in production.
Some particular features make Neo4j very popular among developers, architects and DBAs:
- Cypher, a declarative query language similar to SQL but optimized for graphs. Through the openCypher project, it is now also used by other databases such as SAP HANA Graph and RedisGraph.
- Constant-time traversals in big graphs, in both depth and breadth, thanks to the efficient representation of nodes and relationships. This enables scaling up to billions of nodes on moderate hardware.
- A flexible property graph schema that can adapt over time, making it possible to materialize new relationships later on that “shortcut” and speed up access to the domain data when business needs change.
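As a rough illustration of the "shortcut" idea, the Python sketch below derives a direct relationship from an existing two-hop path, so later lookups need one hop instead of two. The relationship types `WORKS_IN`, `PART_OF`, and `EMPLOYED_BY` and the function `materialize_shortcut` are hypothetical; in Neo4j itself such relationships would be created with a Cypher query rather than Python code.

```python
# Existing two-hop paths: person -WORKS_IN-> department -PART_OF-> company
relationships = [
    ("alice", "WORKS_IN", "sales"),
    ("bob", "WORKS_IN", "r&d"),
    ("sales", "PART_OF", "acme"),
    ("r&d", "PART_OF", "acme"),
]

def materialize_shortcut(rels, first_type, second_type, shortcut_type):
    """Return rels plus a direct shortcut_type edge for every first_type -> second_type path."""
    # Index the second hop by its start node.
    second_hops = {}
    for start, rel_type, end in rels:
        if rel_type == second_type:
            second_hops.setdefault(start, []).append(end)
    # Combine each first hop with the second hops leaving its end node.
    shortcuts = [
        (start, shortcut_type, target)
        for start, rel_type, middle in rels
        if rel_type == first_type
        for target in second_hops.get(middle, [])
    ]
    return rels + shortcuts

rels = materialize_shortcut(relationships, "WORKS_IN", "PART_OF", "EMPLOYED_BY")
print([r for r in rels if r[1] == "EMPLOYED_BY"])
# [('alice', 'EMPLOYED_BY', 'acme'), ('bob', 'EMPLOYED_BY', 'acme')]
```

The trade-off is redundancy: a materialized shortcut has to be kept in sync when the underlying path changes.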
Neo4j’s free and open-source Community edition is a high-performance, fully ACID-transactional database. The Community edition includes (but is not limited to) all the functionality described previously in this section. Neo4j’s Enterprise editions provide all of the functionality of the Community edition in addition to scalable clustering, fail-over, high-availability, live backups, and comprehensive monitoring. Learn more about the Community and Enterprise editions.
Neo4j Desktop is a mission control center for developers – making it easy to create, query and administer your databases. It’s free with registration and includes a development license for Enterprise Edition as well as an installer for the APOC and Graph Algorithms libraries. This is the recommended way to get started with Neo4j on your own machine.
If you want to install Neo4j on a server, the recommended path varies by operating system. There are official Debian and Yum packages, a Docker image, a Windows zip with a PowerShell module, and a tar archive for other Linux/UNIX platforms. There is also an unofficial Homebrew formula.
By default, the Neo4j Server is bundled with an interactive, web-based database interface bound to
Don’t want to install anything on your machine? Check out the Neo4j Sandbox, which includes datasets and guides for a variety of use cases, including recommendation engines, network and IT operations, Twitter network analysis, and the Panama Papers.