Neo4j Graph Data Science Library

Enterprise Analytics Workspace and Graph-Native Machine Learning

Harness the Predictive Power of Relationships

Neo4j created the first enterprise graph framework for data scientists to improve predictions that drive better decisions and innovation. Neo4j for Graph Data Science incorporates the predictive power of relationships and network structures in existing data to answer previously intractable questions and increase prediction accuracy.

The Neo4j Graph Data Science Library is the analytics engine of this framework, making it possible to address complex questions about system dynamics and group behavior. Data scientists benefit from a customized, flexible data structure for global computations and a repository of powerful, robust algorithms to quickly compute results over tens of billions of nodes.

Graph algorithms provide unsupervised machine learning methods and heuristics that learn and describe the topology of your graph. The GDS Library includes hardened graph algorithms with enterprise features, such as deterministic seeding for consistent results. With graph embeddings and trained models inside the analytics workspace, you can also make predictions about your graph from within Neo4j.

Graph Data Science Algorithms

Graph algorithms, a subset of data science algorithms that come from network science, enable reasoning about network structure.

Community Detection

Detects clusters of nodes and ways to partition the graph

Centrality/Importance

Determines the importance of distinct nodes in the network

Similarity

Evaluates how alike nodes are

Heuristic Link Prediction

Estimates the likelihood of nodes forming a relationship

Pathfinding & Search

Finds optimal paths and evaluates route availability and quality

The categories of algorithms provided with the Neo4j Graph Data Science Library:

  • Community Detection algorithms cluster your graph based on relationships to find communities whose members interact more with each other than with the rest of the graph. This category includes popular algorithms such as Connected Components and Louvain Modularity. Detecting communities helps predict similar behavior, find duplicate entities, or simply prepare data for other analyses.
  • Centrality algorithms reveal which nodes are important based on graph topology. They identify influential nodes based on their position in the network and include the famous PageRank algorithm. These algorithms are used to infer group dynamics such as credibility, rippling vulnerability and bridges between groups.
  • Node Embedding algorithms transform the topology and features of your graph into fixed-length vectors that represent each node. Graph embeddings are powerful because they preserve key features while reducing dimensionality in a way that can be decoded. Embeddings capture the complexity and structure of a graph and transform it for use in various machine learning tasks.
  • Similarity algorithms use set comparisons to score how alike individual nodes are, based on their neighbors or on their properties and attributes. This approach is used in applications such as personalized recommendations and building categorical hierarchies.
  • Link Prediction algorithms consider the proximity of nodes in a graph, as well as structural elements such as possible triangles between nodes, to estimate the likelihood that a new relationship will form in the future or that an undocumented connection already exists. This category includes Preferential Attachment. Link prediction has many applications, from drug repurposing and estimating collaboration to criminal investigations.
  • Pathfinding algorithms are foundational to graph analytics and find the most efficient or shortest paths between nodes. This category includes the A* and Dijkstra’s algorithms, which are used to understand complex dependencies and to evaluate routes for uses such as physical logistics and least-cost call or IP routing.
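
To make the categories above more concrete, here is a small, dependency-free Python sketch with one textbook algorithm per category: connected components (community detection), PageRank (centrality), Jaccard over neighbor sets (similarity), Preferential Attachment (heuristic link prediction) and Dijkstra (pathfinding), run on a hypothetical six-node graph. Node embeddings are left out. This is not the GDS Library or its API. The function names and toy data are assumptions made for illustration, and the GDS implementations are optimized, parallelized and far more configurable.

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: undirected weighted edges (source, target, weight).
EDGES = [
    ("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 4.0), ("c", "d", 1.0),
    ("e", "f", 1.0),
]

def build_graph(edges):
    """Adjacency map node -> {neighbor: weight}, stored in both directions."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)

def connected_components(graph):
    """Community detection: label each node with the smallest node id in its component."""
    labels = {}
    for start in sorted(graph):
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in graph[node]:
                if nbr not in labels:
                    labels[nbr] = start
                    stack.append(nbr)
    return labels

def pagerank(graph, damping=0.85, iterations=50):
    """Centrality: textbook power-iteration PageRank (every node needs a neighbor)."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node, nbrs in graph.items():
            share = damping * rank[node] / len(nbrs)
            for nbr in nbrs:
                new_rank[nbr] += share
        rank = new_rank
    return rank

def jaccard(graph, u, v):
    """Similarity: shared neighbors divided by all neighbors of either node."""
    a, b = set(graph[u]), set(graph[v])
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def preferential_attachment(graph, u, v):
    """Heuristic link prediction: product of degrees; higher suggests a likelier link."""
    return len(graph[u]) * len(graph[v])

def dijkstra(graph, source, target):
    """Pathfinding: cheapest path by total weight; returns (cost, path)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in visited:
        return float("inf"), []  # target unreachable from source
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

if __name__ == "__main__":
    g = build_graph(EDGES)
    print("components:", connected_components(g))
    print("most central:", max(pagerank(g).items(), key=lambda kv: kv[1]))
    print("jaccard(a, b):", jaccard(g, "a", "b"))
    print("preferential attachment(c, e):", preferential_attachment(g, "c", "e"))
    print("cheapest a -> d:", dijkstra(g, "a", "d"))
```

The PageRank loop uses a fixed iteration count rather than a convergence tolerance to keep the sketch short, and the Jaccard score compares neighbor sets only; scoring on node properties would compare property values instead.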

Get your free eBook copy of the new O'Reilly book on Graph Algorithms

Whether you are trying to build dynamic network models or forecast real-world behavior, this book illustrates how graph algorithms deliver value – from finding vulnerabilities and bottlenecks to detecting communities and improving machine learning predictions.

Flexible, Scalable Analytics Workspace

[Diagram: Computational Graph and Native Graph Storage]

For efficiency, the graph algorithms run in a customized analytics workspace created by the graph catalog. The computational graphs are loaded in parallel from the Neo4j Graph Database and materialized in memory.

The GDS Library automates the data transformations so you can easily benefit from maximum compute performance for analytics as well as native graph storage for compact persistence.

Mutable In-memory Graph

The in-memory computational graph is mutable, which means you can reshape it on the fly and layer analytics steps – all without altering the original graph until you’re ready to save results.

This means data scientists can build workflows to streamline processes, like automatically loading a named graph, chaining algorithms together and ultimately writing to their database or exporting new graphs.
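
The workflow described above can be sketched in plain Python. A hypothetical `InMemoryGraph` is projected as a deep copy of the stored data, each analytics step adds a node property in memory, a later step reads the properties that earlier steps produced, and only `write()` touches the stored side. The class, its `project`/`mutate`/`write` methods and the degree and hub steps are illustrative assumptions, not GDS procedures or their configuration.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class InMemoryGraph:
    """Hypothetical projected graph: an independent, mutable copy of stored data."""
    adjacency: dict  # node -> list of neighbors
    node_properties: dict = field(default_factory=dict)  # name -> {node: value}

    @classmethod
    def project(cls, stored_adjacency):
        # Deep-copy so later reshaping never touches the stored graph.
        return cls(copy.deepcopy(stored_adjacency))

    def mutate(self, name, values):
        # Add or overwrite a node property in memory only.
        self.node_properties[name] = dict(values)

    def write(self, stored_properties, name):
        # Persist one computed property once the analysis is final.
        stored_properties[name] = dict(self.node_properties[name])

def degree(graph):
    return {node: len(nbrs) for node, nbrs in graph.adjacency.items()}

def component_labels(graph):
    # Label each node with the smallest node id in its connected component.
    labels = {}
    for start in sorted(graph.adjacency):
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:
            for nbr in graph.adjacency[stack.pop()]:
                if nbr not in labels:
                    labels[nbr] = start
                    stack.append(nbr)
    return labels

def hubs(graph):
    # Chained step: combines two properties produced by earlier steps.
    deg = graph.node_properties["degree"]
    comp = graph.node_properties["component"]
    best = {}
    for node in sorted(deg):
        c = comp[node]
        if c not in best or deg[node] > deg[best[c]]:
            best[c] = node
    return {node: best[comp[node]] == node for node in deg}

def run_workflow():
    stored_adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
                        "d": ["c"], "e": ["f"], "f": ["e"]}
    stored_properties = {}
    g = InMemoryGraph.project(stored_adjacency)
    g.mutate("degree", degree(g))
    g.mutate("component", component_labels(g))
    g.mutate("is_hub", hubs(g))
    persisted_before_write = dict(stored_properties)  # still empty here
    g.write(stored_properties, "is_hub")
    return stored_adjacency, stored_properties, persisted_before_write, g

if __name__ == "__main__":
    _, saved, _, _ = run_workflow()
    print(saved)
```

Keeping every intermediate result as an in-memory property is what lets the steps be chained and rerun freely; the stored data changes only at the single, explicit write.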

Generate Better Predictions

Alicia Frame, Lead Product Manager and Data Scientist at Neo4j, explained why Neo4j for Graph Data Science is the most expeditious way to generate better predictions.

“A common misconception in data science is that more data increases accuracy and reduces false positives,” explained Frame. “In reality, many data science models overlook the most predictive elements within data – the connections and structures that lie within. Neo4j for Graph Data Science was conceived for this purpose – to improve the predictive accuracy of machine learning, or answer previously unanswerable analytics questions, using the relationships inherent within existing data.”

The Graph Data Science Library is part of the Neo4j Graph Data Science framework built for data scientists. It offers a friendly data science experience with guardrails like logical memory management, an intuitive API and extensive documentation.

Data scientists can also visually explore algorithm results with Neo4j Bloom and share visual perspectives across data science, development and business teams for better collaboration.

Get Started with Neo4j Graph Data Science

The world is driven by connections – it’s time you leveraged the value hidden in your connected data.

To not just react but predict and prescribe the best course of action, you need powerful data science created for connected systems.

The First Enterprise Framework for Graph Data Science

  • Answer previously intractable questions and use the predictive power of relationships for analytics and machine learning
  • Scale to tens of billions of nodes with optimized, parallelized algorithms and a compact footprint
  • Performance of a graph-specific analytics workspace for computation integrated with a native graph database

Scalable in-memory graph model that loads in parallel, flexibly aggregates and reshapes underlying data models

Friendly interface with flexible graph reshaping in-memory, logical guardrails and a graph visualization tool

Production features from the graph leader with dedicated graph data science support

Improving Analytics, ML & AI for Enterprises

Caterpillar’s AI Supply Chain & Maintenance

  • 27 million warranty & service documents parsed from text into a knowledge graph
  • The graph provides context for AI to learn “prime examples” and anticipate maintenance
  • Improves satisfaction and equipment lifespan

German Center for Diabetes Research (DZD)

  • Connecting 50 research databases, hundreds of thousands of Excel workbooks and 30 bio-sample databases
  • Bytes 4 Diabetes Award for use of a knowledge graph, graph analytics, and AI
  • Customized views for research angles

Financial Fraud Detection & Recovery

  • Almost 70% of credit card fraud was missed
  • About 1 billion nodes and 1 billion relationships to analyze
  • Graph analytics with queries & algorithms helped find millions of dollars of fraud in the first year

Download the White Paper

White Paper: Financial Fraud Detection with Neo4j Graph Data Science

Explore how to use the Graph Data Science Library and Neo4j Bloom in the white paper Financial Fraud Detection with Graph Data Science: How Graph Algorithms & Visualization Better Predict Emerging Fraud Patterns, and learn how to tap into the power of graph technology for higher-quality predictions.