Chapter 1. Introduction

This chapter introduces Neo4j.

Neo4j is the world’s leading graph database. It is built from the ground up as a graph database, meaning that its architecture is optimized for fast management, storage, and traversal of nodes and relationships. Relationships are therefore described as first-class citizens in Neo4j.

In a relational database, the cost of a join operation can grow steeply as the number of relationships being followed increases. In Neo4j, the corresponding action is performed as navigation from one node to another, an operation whose cost grows linearly with the number of relationships traversed.
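The contrast between a join and a node-to-node navigation can be illustrated with a toy model. The following is a minimal sketch, not a description of Neo4j internals or of any real relational engine: the data, the function names, and the unindexed scan are all assumptions made for illustration (a real relational database would normally use an index rather than scan every row).

```python
from collections import defaultdict

# Hypothetical toy data: (person, friend) pairs, as a join table would store them.
FRIENDSHIPS = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")]


def friends_via_scan(person, rows):
    """Join-style lookup (unindexed): inspect every row of the relationship table."""
    return [dst for src, dst in rows if src == person]


def build_adjacency(rows):
    """Store each node's outgoing relationships directly with the node."""
    adjacency = defaultdict(list)
    for src, dst in rows:
        adjacency[src].append(dst)
    return adjacency


def friends_via_adjacency(person, adjacency):
    """Traversal-style lookup: follow only the relationships of one node."""
    return list(adjacency.get(person, []))


ADJACENCY = build_adjacency(FRIENDSHIPS)
print(friends_via_scan("ann", FRIENDSHIPS))     # ['bob', 'eve']
print(friends_via_adjacency("ann", ADJACENCY))  # ['bob', 'eve']
```

Both functions return the same answer. The difference is in the work done: the scan touches every stored pair, while the adjacency lookup touches only the relationships attached to the starting node.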

This different approach to storing and querying connections between entities provides traversal performance of up to four million hops per second per core. Because most graph searches stay within the neighborhood of a node, their runtime depends mainly on the size of that neighborhood rather than on the total amount of data stored in the database. Dedicated memory management and highly scalable, memory-efficient operations contribute to these benefits.
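The locality argument can be checked on a small model. The sketch below is an assumption-laden illustration rather than Neo4j code: it uses a plain dictionary as the graph, a hypothetical `chain_graph` helper to build test data, and a breadth-first search limited to a fixed number of hops. It counts how many relationships are followed when the same local search runs on a small graph and on a much larger one.

```python
from collections import deque


def neighborhood(adjacency, start, max_hops):
    """Return the nodes within max_hops of start and the number of hops followed."""
    seen = {start}
    queue = deque([(start, 0)])
    hops = 0
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the requested neighborhood
        for neighbor in adjacency.get(node, ()):
            hops += 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen, hops


def chain_graph(size):
    """Hypothetical test graph: node i has one relationship to node i + 1."""
    return {i: [i + 1] for i in range(size - 1)}


small = chain_graph(10)
large = chain_graph(100_000)
print(neighborhood(small, 0, 3)[1])  # 3
print(neighborhood(large, 0, 3)[1])  # 3
```

The search follows the same number of relationships in both graphs, because it never leaves the three-hop neighborhood of the start node.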

The property graph approach is also whiteboard friendly. By this we mean that the schema-optional model of Neo4j provides for a consistent use of the same model throughout conception, design, implementation, storage, and visualization. A major benefit of this is that it allows all business stakeholders to participate throughout the development cycle. Additionally, the domain model can be evolved continuously as requirements change, without the penalty of expensive schema changes and migrations.
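As a rough illustration of what schema-optional means in practice, the sketch below models nodes and relationships as plain Python objects. The class names, labels, and properties are hypothetical and chosen only for this example; they are not Neo4j APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# The same boxes and arrows one would draw on a whiteboard.
ann = Node({"Person"}, {"name": "Ann"})
acme = Node({"Company"}, {"name": "Acme"})
works_at = Relationship(ann, "WORKS_AT", acme, {"since": 2019})

# A new requirement adds a property to one node; no other node has to change.
ann.properties["email"] = "ann@example.com"
print(sorted(ann.properties))   # ['email', 'name']
print(sorted(acme.properties))  # ['name']
```

Adding the `email` property required no change to a shared schema and no migration of the other node, which is the kind of incremental evolution the paragraph above describes.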

Cypher, the declarative graph query language, is designed to visually represent graph patterns of nodes and relationships. This highly capable, yet easily readable, query language is centered around the patterns that express concepts and questions from a specific domain. Cypher can also be extended for narrow optimizations for specific use cases.
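To give an impression of how a Cypher pattern mirrors a drawing of nodes and relationships, here is a minimal sketch of a query wrapped in Python. The `Person` label, the `KNOWS` relationship type, and the `name` property are hypothetical. The `session` argument is assumed to behave like a session from the official Neo4j Python driver, whose `run` method accepts a query string and keyword parameters; a small stand-in class is used so that the sketch runs without a server.

```python
# Cypher pattern: (node)-[:RELATIONSHIP]->(node), written much as it would be drawn.
FRIENDS_QUERY = (
    "MATCH (p:Person {name: $name})-[:KNOWS]->(friend:Person) "
    "RETURN friend.name AS friend"
)


def find_friends(session, name):
    """Run the pattern query and collect the matched friend names."""
    result = session.run(FRIENDS_QUERY, name=name)
    return [record["friend"] for record in result]


class FakeSession:
    """Stand-in for a driver session; it ignores the query text and returns canned rows."""

    def __init__(self, knows):
        self.knows = knows

    def run(self, query, **params):
        return [{"friend": f} for f in self.knows.get(params["name"], [])]


session = FakeSession({"Ann": ["Bob", "Eve"]})
print(find_friends(session, "Ann"))  # ['Bob', 'Eve']
```

With a real database, `session` would come from a driver connection instead of `FakeSession`; the query text and the `find_friends` function would stay the same.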

Neo4j can store trillions of entities for the largest datasets imaginable while keeping storage compact. For production environments it can be deployed as a scalable, fault-tolerant cluster of machines. Because of its high scalability, a Neo4j cluster requires only tens of machines, not hundreds or thousands, which saves on cost and operational complexity. Other features for production applications include hot backups and extensive monitoring.

1.1. Editions

There are two editions of Neo4j to choose from: Community Edition and Enterprise Edition.

1.1.1. Community Edition

The Community Edition is a fully functional edition of Neo4j, suitable for single instance deployments. It has full support for key Neo4j features, such as ACID compliance, Cypher, and programming APIs. It is ideal for learning Neo4j, for do-it-yourself projects, and for applications in small workgroups.

1.1.2. Enterprise Edition

The Enterprise Edition extends the functionality of Community Edition to include key features for performance and scalability, such as a clustering architecture for high availability and online backup functionality. Additional security features include role-based access control and LDAP support (for example, Active Directory). It is the choice for production systems with requirements for scale and availability, such as commercial solutions and critical internal solutions.

1.1.3. Feature details

Table 1.1. Features

Edition                                               Community  Enterprise
Labeled property graph model                          Yes        Yes
Native graph processing & storage                     Yes        Yes
ACID transactions                                     Yes        Yes
Cypher graph query language                           Yes        Yes
Neo4j Browser with syntax highlighting                Yes        Yes
Bolt binary protocol                                  Yes        Yes
Language drivers for C#, Java, JavaScript & Python    Yes        Yes
High-performance native API                           Yes        Yes
High-performance caching                              Yes        Yes
Cost-based query optimizer                            Yes        Yes
Graph algorithms library to support AI initiatives    Yes        Yes
Fast writes via native label indexes                  Yes        Yes
Composite indexes                                     Yes        Yes
Slotted and Compiled Cypher runtimes                  -          Yes
Property-existence constraints                        -          Yes
Node Key schema constraints                           -          Yes
Listing and terminating running queries               -          Yes
Auto-reuse of space                                   -          Yes
Role-based access control                             -          Yes
Subgraph access control                               -          Yes
Property-level security                               -          Yes
LDAP and Active Directory integration                 -          Yes
Kerberos security option                              -          Yes

Table 1.2. Performance & Scalability

Edition                                               Community                                     Enterprise
Causal Clustering for global scale applications       -                                             Yes
Multi-clustering                                      -                                             Yes
Enterprise lock manager accesses all cores on server  -                                             Yes
Intra-cluster encryption                              -                                             Yes
Offline backups                                       Yes                                           Yes
Online backups                                        -                                             Yes
Encrypted backups                                     -                                             Yes
Rolling upgrades                                      -                                             Yes
Automatic cache warming                               -                                             Yes
Routing and load balancing with Neo4j Drivers         -                                             Yes
Advanced monitoring                                   -                                             Yes
Graph size limitations                                34B nodes, 34B relationships, 68B properties  No limit
Bulk import tool                                      Yes                                           Yes
Bulk import tool, resumable                           -                                             Yes