The Internet-Scale, Native Graph Database
Neo4j equally exploits both data relationships and data elements, empowering the next generation of breakthrough applications.
Yesterday's breakthrough applications were driven by big data – tomorrow's breakthrough applications will be driven by connected data. No longer powered merely by data transactions, these applications draw together every system across the entire enterprise.
These networks of related data are known as graphs.
As a native graph database, Neo4j is specifically optimized to store and traverse these graphs of connected data. By intuitively mapping data points and the connections between them, Neo4j powers intelligent, real-time applications that tackle today's toughest enterprise challenges.
The Impact of Native Graph Technology
In order to truly harness the power of connected data, a graph database must be engineered from top to bottom to handle data relationships. Only native graph technology handles the scalability, reliability and performance required by an always-on, mission-critical application.
From its inception, Neo4j defined the benchmark for native graph databases, and has continued to do so as the technology evolves.
What Makes a Graph Database Native? 7 Essentials
Property graph data model
The labeled property graph model – pioneered by the Neo4j team – intuitively maps the data model between whiteboard and keyboard.
Graph-specific visualization and tooling
The Neo4j Browser allows you to visualize your connected data, simplifies your Cypher commands and offers query development tools beyond the command line.
Graph-specific processing engine
For enterprise-grade performance, a native graph database must offer compiled graph queries, graph query planning, graph-specific APIs, native application drivers, graph-specific cost-based optimizers and high performance caching.
Graph-specific scalability features
Neo4j includes off-heap memory management, Causal Clustering that optimizes for both read-only access and read/write access, high availability (HA), disaster recovery and multi-data center support.
Nodes
- Nodes are the main data elements
- Nodes are connected to other nodes via relationships
- Nodes can have one or more properties (i.e., attributes stored as key/value pairs)
- Nodes have one or more labels that describe their role in the graph
- Example: Person nodes vs. Car nodes
Relationships
- Relationships connect two nodes
- Relationships are directional
- Nodes can have multiple, even recursive, relationships
- Relationships can have one or more properties (i.e., attributes stored as key/value pairs)
Properties
- Properties are named values where the name (or key) is a string
- Properties can be indexed and constrained
- Composite indexes can be created from multiple properties
Labels
- Labels are used to group nodes into sets
- A node may have multiple labels
- Labels are indexed to accelerate finding nodes in the graph
- Native label indexes are optimized for speed
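The building blocks listed above can be sketched as a small in-memory data structure. This is a minimal illustration in Python, not Neo4j's storage format or API; the class names `Node` and `Relationship`, the helpers `connect` and `index_by_label`, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


# eq=False keeps identity-based comparison, which avoids recursive
# equality checks on graphs that contain cycles.
@dataclass(eq=False)
class Node:
    labels: Set[str]                                   # e.g. {"Person"}
    properties: Dict[str, Any] = field(default_factory=dict)
    outgoing: List["Relationship"] = field(default_factory=list)
    incoming: List["Relationship"] = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    type: str                                          # e.g. "DRIVES"
    start: Node                                        # relationships are directional:
    end: Node                                          # start -> end
    properties: Dict[str, Any] = field(default_factory=dict)


def connect(start: Node, rel_type: str, end: Node, **properties: Any) -> Relationship:
    """Create a directed relationship and register it on both nodes."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def index_by_label(nodes: List[Node]) -> Dict[str, List[Node]]:
    """Group nodes into sets by label (a toy stand-in for a label index)."""
    index: Dict[str, List[Node]] = {}
    for node in nodes:
        for label in node.labels:
            index.setdefault(label, []).append(node)
    return index


# Hypothetical sample data: a Person node that drives a Car node.
alice = Node({"Person", "Driver"}, {"name": "Alice"})
car = Node({"Car"}, {"model": "Volvo"})
drives = connect(alice, "DRIVES", car, since=2015)
by_label = index_by_label([alice, car])
```

In this sketch a node carries its labels and properties directly, and each relationship is a first-class object with its own type, direction and properties, mirroring the four element kinds described in the lists.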
The Cypher Graph Query Language
- Cypher is a declarative graph query language that is intuitive and human-readable
- It is inspired by SQL with pattern matching from SPARQL
- Cypher describes nodes, relationships and properties as ASCII art directly in the language, making queries easy to both read and recognize as part of the graph
- Since it is highly legible, Cypher is easy to maintain, simplifying application maintenance as a result
- Through the openCypher project, Cypher is rapidly becoming the standard and vendor-neutral language for graph technology
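To make the "ASCII art" point concrete, the sketch below places a short Cypher pattern next to a hand-written Python loop that returns the same rows from a toy in-memory graph. The query is held in a string for reference only and is not executed; the node and relationship data, and the function name `match_person_drives_car`, are hypothetical. The loop is only meant to show what the pattern expresses, not how Neo4j evaluates Cypher.

```python
# Cypher pattern, kept as text for comparison (not executed in this sketch).
# Parentheses denote nodes, the arrow with square brackets denotes a relationship.
QUERY = """
MATCH (p:Person)-[:DRIVES]->(c:Car)
RETURN p.name AS person, c.model AS car
"""

# Toy graph: node id -> labels and properties; relationships as (start, type, end).
nodes = {
    1: {"labels": {"Person"}, "name": "Alice"},
    2: {"labels": {"Person"}, "name": "Bob"},
    3: {"labels": {"Car"}, "model": "Volvo"},
}
relationships = [(1, "DRIVES", 3), (1, "KNOWS", 2)]


def match_person_drives_car(nodes, relationships):
    """Hand-written equivalent of the pattern in QUERY for the toy graph."""
    rows = []
    for start, rel_type, end in relationships:
        p, c = nodes[start], nodes[end]
        if rel_type == "DRIVES" and "Person" in p["labels"] and "Car" in c["labels"]:
            rows.append({"person": p["name"], "car": c["model"]})
    return rows


rows = match_person_drives_car(nodes, relationships)
```

The two-line declarative pattern states what to find, while the imperative version has to spell out how to walk the relationships and check each label.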
There's no denying it: other data stores have their appropriate use cases. But whenever your enterprise wants to leverage the connections between data points, you need to tap into the power of Neo4j.
Here's how the world's leading graph database stacks up against traditional relational databases (RDBMS) and other competing NoSQL data stores:
|Category||Relational Database||Neo4j, Native Graph Database|
|Data Storage||Storage in fixed, pre-defined tables with rows and columns with connected data often disjointed between tables, crippling query efficiency.||Graph storage structure with index-free adjacency results in faster transactions and processing for data relationships.|
|Data Modeling||Database model must be developed with modelers and translated from a logical model to a physical one. Since data types and sources must be known ahead of time, any changes require weeks of downtime for implementation.||Flexible, "whiteboard-friendly" data model with no mismatch between logical and physical model. Data types and sources can be added or changed at any time, leading to dramatically shorter development times and true agile iteration.|
|Query Performance||Data processing performance suffers with the number and depth of JOINs (or relationships queried).||Graph processing keeps latency low and performance real-time, largely independent of the number or depth of relationships.|
|Query Language||SQL: A query language that increases in complexity with the number of JOINs needed for connected data queries.||Cypher: A native graph query language that provides the most efficient and expressive way to describe relationship queries.|
|Transaction Support||ACID transaction support required by enterprise applications for consistent and reliable data.||Retains ACID transactions for fully consistent and reliable data around the clock – perfect for always-on global enterprise applications.|
|Processing at Scale||Scales out through replication and scale up architecture is possible but costly. Complex data relationships are not harvested at scale.||Graph model inherently scales for pattern-based queries. Scale out architecture maintains data integrity via replication. Massive scale up possibilities with IBM POWER8 and CAPI Flash systems.|
|Data Center Efficiency||Server consolidation is possible but costly for scale up architecture. Scale out architecture is expensive in terms of purchase, energy use and management time.||Data and relationships are stored natively together with performance improving as complexity and scale grow. This leads to server consolidation and incredibly efficient use of hardware.|
|Category||Other NoSQL Databases||Neo4j, Native Graph Database|
|Data Storage||No support for connected data at the database level. Performance and data trustability degrade with scale and complexity of connections.||Native graph storage structure with index-free adjacency results in faster transactions and processing for data relationships.|
|Data Modeling||Data model not suitable for enterprise architectures as wide columns and document stores do not offer control at the design level. Puts undue pressure on the application level to catch and solve problems.||Flexible, "whiteboard-friendly" data model allows for fine-grained control of data architecture. Intuitive data model eases communication between developers, architects and DBAs.|
|Query Performance||No graph processing capability for data relationships, thus all relationships have to be created at the application level.||Native graph processing keeps latency low and performance real-time, largely independent of the number or depth of relationships.|
|Query Language||Query language varies, but no query constructs exist to express data relationships.||Cypher: A native graph query language that provides the most efficient and expressive way to describe relationship queries.|
|Transaction Support||BASE transactions lead to data corruption because basic availability and eventual consistency are unreliable for data relationships.||ACID transactions ensure data is fully consistent and reliable around the clock – perfect for always-on global enterprise applications.|
|Processing at Scale||Optimized for ingesting data but not reading data at scale. Scalability depends on scale out architecture that does not protect the integrity of graph-like data, so data is not trustworthy.||Native graph model inherently scales for pattern-based queries. Scale out architecture maintains data integrity via replication. Massive scale up possibilities with IBM POWER8 and CAPI Flash systems.|
|Data Center Efficiency||Scale out architecture assumes ongoing access to more and more commodity hardware without accounting for energy costs, network vulnerabilities and other risks.||Data and relationships are stored natively together with performance improving as complexity and scale grow. This leads to server consolidation and incredibly efficient use of hardware.|
Flexible Schema
The labeled property graph model captures data as it naturally occurs, eliminating the need to translate a whiteboard model into tables, columns, documents or triples – and eradicating future schema migrations. Instead, developers enjoy the flexibility to add or remove properties as business requirements change, with optional schema constraints for enterprise governance or rules enforcement.
High-Performance Query Execution
Connected data makes it possible to query relationship information in real-time applications. As a native graph database, Neo4j offers index-free adjacency: each node references its neighbors directly, so a traversal follows those references instead of performing index lookups and can cover millions of data connections per second (per core). As a result, query time depends on the part of the graph a query touches rather than on the total volume of your dataset.
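The idea behind index-free adjacency can be illustrated with a small Python comparison. This is a simplified sketch under stated assumptions, not a model of Neo4j internals or of any real relational engine: the "scan" version is a deliberately naive stand-in for a join that rereads the whole edge table on every hop, and the edge data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge table: (source, destination) pairs.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("y", "z")]


def hop_by_scan(edges, frontier):
    """Join-style hop: every edge record is examined on each step."""
    touched, nxt = 0, set()
    for src, dst in edges:
        touched += 1
        if src in frontier:
            nxt.add(dst)
    return nxt, touched


def build_adjacency(edges):
    """Store each node's outgoing neighbors next to the node itself."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def hop_by_adjacency(adj, frontier):
    """Adjacency-style hop: only the frontier's own relationships are examined."""
    touched, nxt = 0, set()
    for node in frontier:
        for dst in adj.get(node, ()):
            touched += 1
            nxt.add(dst)
    return nxt, touched


def reach(hop, start, depth):
    """Apply `hop` `depth` times from `start`; return final frontier and records touched."""
    frontier, total = {start}, 0
    for _ in range(depth):
        frontier, touched = hop(frontier)
        total += touched
    return frontier, total


adj = build_adjacency(edges)
scan_result, scan_cost = reach(lambda f: hop_by_scan(edges, f), "a", 3)
adj_result, adj_cost = reach(lambda f: hop_by_adjacency(adj, f), "a", 3)
```

Both approaches reach the same node after three hops, but the scan examines 15 edge records while the adjacency walk examines 3, because the unrelated `x`, `y`, `z` edges are never visited. Real relational databases use indexes rather than full scans, so the gap in practice is smaller than in this toy comparison.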
Cypher Query Language
Cypher is a declarative graph query language that naturally describes graph patterns. It is intuitive to both read and learn, and requires 10-100x less code than SQL. Its natural pattern-matching ability means you no longer need to debug nested JOINs. Through the openCypher project, Cypher will become the de facto language for graph technology across the industry.
Scale and Performance
Neo4j lets you scale across every key dimension: volume, reads, writes and locations – all while providing blazing-fast queries, consistent response times and rock-solid data integrity. Neo4j also offers support for replication with master re-election and failover to keep your data safe and reliable.
Advanced Causal Clustering*
Neo4j supports scalability across global data centers through its proprietary Causal Clustering architecture. This Raft-based architecture lets you scale read/write Core Servers independently of Read Replica servers, so your internet-scale application can serve a global audience.
Built-in Tooling & Visualization
The Neo4j Browser allows developers to query and visualize your connected data. Visualization is key to discovering patterns in your graph data that can then be easily translated into perpetual Cypher queries – all within the Browser experience. In addition, query profile and planning tools allow you to fine-tune queries before deploying to production.
Cloud-Ready Deployment
Neo4j has always been available for on-premises deployment, but many now use Neo4j in cloud environments like Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform. No matter your preferred platform, fully hosted offerings are available through the Neo4j partner ecosystem. In addition, our official Docker image simplifies automation and deployment, making it easy to get up and running with a single instance or a full HA cluster.
Elastic Scalability*
Neo4j clustering provides scale-out capabilities for reads, letting you spread out your graph in memory, while ensuring each instance is able to get to any node or relationship using its own local copy. This allows for blazing speed even as your graph dataset grows, all while providing high availability via a replication protocol. Massive scale-up architecture is also possible with Neo4j on IBM POWER8 with CAPI Flash.
In-Memory Page Cache*
Neo4j Enterprise Edition includes an in-memory page cache that is separate from traditional JVM-based caching strategies. Caching can also be location or data center specific.
Hot Backups*
Neo4j Enterprise Edition allows you to take hot, point-in-time backups while your graph database is still running. Your application can keep running 24/7 without compromising the availability or quality of your backups.