This article is an introduction to graph databases in general and Neo4j in particular.
You should have a basic understanding of what a database is.
We live in a connected world. There are no isolated pieces of information, only rich, connected domains all around us. Only a database that embraces relationships as a core aspect of its data model is able to store, process, and query connections efficiently. While other databases compute relationships expensively at query time, a graph database stores connections as first-class citizens, readily available for any “join-like” navigation operation. Accessing those already persisted connections is an efficient, constant-time operation and allows you to traverse millions of connections per second per core.
Independent of the total size of your dataset, graph databases excel at managing highly connected data and complex queries. Armed only with a pattern and a set of starting points, graph databases explore the larger neighborhood around the initial starting points — collecting and aggregating information from millions of nodes and relationships — leaving the billions outside the search perimeter untouched.
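The idea of exploring only the neighborhood around a set of starting points can be sketched in a few lines of Python. This is a minimal illustration using a plain adjacency dictionary, not Neo4j's actual traversal engine; the `neighborhood` function, the `adjacency` data, and the node names are all hypothetical.

```python
from collections import deque

def neighborhood(adjacency, start_nodes, max_depth):
    """Return {node: depth} for every node within max_depth hops of a start node."""
    seen = {node: 0 for node in start_nodes}
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # search perimeter reached; do not expand further
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    return seen

# Hypothetical sample data: "zoe" and "yan" are not connected to "alice".
adjacency = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": [],
    "dave": ["erin"],
    "erin": [],
    "zoe": ["yan"],
    "yan": [],
}

reached = neighborhood(adjacency, ["alice"], max_depth=2)
```

The work done depends on the size of the explored neighborhood, not on the total number of entries in `adjacency`: nodes outside the perimeter are never looked at.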
The Property Graph Model
If you’ve ever worked with an object model or an entity relationship diagram, the labeled property graph model will seem familiar. The property graph contains connected entities (the nodes) which can hold any number of attributes (key-value-pairs). Nodes can be tagged with labels representing their different roles in your domain. In addition to contextualizing node and relationship properties, labels may also serve to attach metadata—index or constraint information—to certain nodes.
Relationships provide directed, named, semantically relevant connections between two node entities. A relationship always has a direction, a type, a start node, and an end node. Like nodes, relationships can have any number of properties. In most cases, relationships have quantitative properties, such as weights, costs, distances, ratings, time intervals, or strengths. As relationships are stored efficiently, two nodes can share any number of relationships, of any type, without sacrificing performance. Note that although they are directed, relationships can always be navigated regardless of direction.
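To make these building blocks concrete, here is a small Python sketch of the model described above. It is only an illustration of the concepts (labels, properties, typed and directed relationships); the `Node` and `Relationship` classes and the sample data are hypothetical and say nothing about how Neo4j stores data internally.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

@dataclass(eq=False)
class Relationship:
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)

    def other(self, node):
        """Return the opposite endpoint, so the relationship can be
        navigated regardless of its direction."""
        if node is self.start:
            return self.end
        if node is self.end:
            return self.start
        raise ValueError("node is not an endpoint of this relationship")

alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})

# Two nodes can share several relationships of different types.
knows = Relationship("KNOWS", alice, bob, {"since": 2011})
rated = Relationship("RATED", alice, bob, {"rating": 4})
```

Calling `knows.other(bob)` yields `alice` even though the relationship points from Alice to Bob, which mirrors the direction-independent navigation mentioned above.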
[Figure: The building blocks of the Property Graph]
There is one core consistent rule in a graph database: “No broken links”. Since a relationship always has a start and end node, you can’t delete a node without also deleting its associated relationships. You can also always assume that an existing relationship will never point to a non-existing endpoint.
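The “no broken links” rule can be illustrated with a toy in-memory graph. The sketch below is an assumption-laden example, not Neo4j code: the `Graph` class and its `detach` flag are hypothetical, loosely modeled on the distinction Cypher makes between `DELETE` and `DETACH DELETE`.

```python
class Graph:
    def __init__(self):
        self.nodes = set()
        self.relationships = []  # (start, type, end) tuples

    def add_node(self, name):
        self.nodes.add(name)

    def relate(self, start, rel_type, end):
        # A relationship may only point at nodes that exist.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("both endpoints must exist")
        self.relationships.append((start, rel_type, end))

    def delete_node(self, name, detach=False):
        attached = [r for r in self.relationships if name in (r[0], r[2])]
        if attached and not detach:
            raise ValueError(f"{name} still has {len(attached)} relationship(s)")
        # Remove the node's relationships together with the node itself.
        self.relationships = [r for r in self.relationships if r not in attached]
        self.nodes.discard(name)

graph = Graph()
graph.add_node("a")
graph.add_node("b")
graph.relate("a", "KNOWS", "b")

try:
    graph.delete_node("a")        # refused: would leave a dangling relationship
    refused = False
except ValueError:
    refused = True

graph.delete_node("a", detach=True)  # removes the node and its relationships
```

After the second call, neither the node nor any relationship referring to it remains, so no relationship can ever point at a missing endpoint.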
What is Neo4j?
Sponsored by Neo Technology, Neo4j is an open-source NoSQL graph database implemented in Java and Scala. With development starting in 2003, it has been publicly available since 2007. The source code and issue tracking are available on GitHub, with support readily available on Stack Overflow and the Neo4j Google group. Neo4j is used today by hundreds of thousands of companies and organizations in almost all industries. Use cases include matchmaking, network management, software analytics, scientific research, routing, organizational and project management, recommendations, social networks, and more.
Neo4j implements the Property Graph Model efficiently down to the storage level. Unlike graph-processing or in-memory libraries, Neo4j provides full database characteristics, including ACID transaction compliance, cluster support, and runtime failover, making it suitable for working with graph data in production scenarios.
Some particular features make Neo4j very popular among users, developers, and DBAs:
- Materializing of relationships at creation time, resulting in no penalties for complex runtime queries
- Constant-time traversals of relationships, both in depth and in breadth, due to the efficient representation of nodes and relationships
- All relationships in Neo4j are equally important and fast, making it possible to materialize new relationships later on and use them to “shortcut” and speed up queries over the domain data when new needs arise
- Compact storage and memory caching for graphs, resulting in efficient scale-up and billions of nodes in one database on moderate hardware
- Written on top of the JVM
Neo4j’s free and open-source Community edition is a high-performance, fully ACID-transactional database. The Community edition includes (but is not limited to) all the functionality described previously in this section. Neo4j’s Enterprise editions provide all of the functionality of the Community edition, plus scalable clustering, failover, high availability, live backups, and comprehensive monitoring.
You can download Neo4j from http://neo4j.com/download and install it as a server on all operating systems. By default, the Neo4j Server is bundled with an interactive, web-based database interface bound to http://localhost:7474. The simplest way of getting started is to use Neo4j’s database browser to execute your graph queries (written in Cypher) in a workbench-like fashion. Results are presented as either intuitive graph visualizations or as easy-to-read, exportable tables.
A remote Neo4j server can be accessed via its Cypher HTTP API, either directly or through one of the many available language drivers. For high-performance special cases, you can write Neo4j server extensions in any JVM language to access Neo4j’s internal database engine directly, without network overhead.
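As a rough sketch of what direct HTTP access might look like, the Python snippet below builds (but does not send) a request carrying a Cypher statement. Several things here are assumptions rather than facts from this article: the endpoint path `/db/data/transaction/commit` is the transactional endpoint of Neo4j 2.x/3.x servers and differs in other versions, the `$name` parameter syntax is the 3.x-style form, authentication is omitted, and `build_cypher_request` is a hypothetical helper.

```python
import json
from urllib.request import Request

# Assumption: transactional endpoint path of Neo4j 2.x/3.x servers.
# Check your server's documentation; the path differs between versions.
TX_COMMIT_PATH = "/db/data/transaction/commit"

def build_cypher_request(base_url, statement, parameters=None):
    """Build a POST request that submits one Cypher statement."""
    payload = {
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    }
    return Request(
        base_url.rstrip("/") + TX_COMMIT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

request = build_cypher_request(
    "http://localhost:7474",
    "MATCH (p:Person {name: $name}) RETURN p",
    {"name": "Alice"},
)
```

Passing `request` to `urllib.request.urlopen` would submit it to a running server. In practice, a language driver is usually the more convenient choice, since it takes care of connection handling and result parsing.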