Introduction to Graph Data Science library

Neo4j for Graph Data Science

Neo4j for Graph Data Science is a toolbox that combines a native graph analytics workspace and graph database with scalable graph algorithms and graph visualization for a reliable, easy-to-use experience. This framework enables data scientists to confidently operationalize better analytics and machine learning models that infer behavior based on connected data and network structures.

Neo4j for graph data science

Steps for using Graph Data Science

This toolbox aims to provide support and ease of use for all the steps in Graph Data Science analysis.

Graph data science steps

Neo4j Database

Neo4j Database
  • Neo4j is a native graph database, built from the ground up to leverage not only data but also data relationships.

  • Neo4j connects data as it is stored, enabling queries never before imagined, at speeds never thought possible.

Neo4j Bloom

Bloom

Neo4j Bloom is an easy-to-use graph exploration application for visually interacting with Neo4j graphs.

Bloom gives graph novices and experts the ability to visually investigate and explore their graph data from different business perspectives. Its illustrative, codeless search-to-visualization design makes it the ideal interface for fostering communication between peers, managers and executives, where they can share the innovative work of their graph development and analytics teams.

Graph Data Science Library

The Neo4j Graph Data Science Library (GDSL) contains many graph algorithms. The algorithms are divided into categories which represent different problem classes:

  • Path Finding - these algorithms help find the shortest path or evaluate the availability and quality of routes.

  • Centrality - these algorithms determine the importance of distinct nodes in a network.

  • Community Detection - these algorithms evaluate how a group is clustered or partitioned, as well as its tendency to strengthen or break apart.

  • Similarity - these algorithms help calculate the similarity of nodes.

  • Link Prediction - these algorithms determine the closeness of pairs of nodes.

  • Node Embeddings - these algorithms compute vector representations of nodes in a graph.

These graph algorithms help to effectively answer questions about your data’s connectivity or topology, or generate new and highly predictive features for your existing ML pipelines. At this writing, the the GDSL contains more than 50 graph algorithms.

Graph data science algorithms

Three tiers of graph algorithm maturity

Algorithms exist in one of three tiers of maturity:

Production-quality

Indicates that the algorithm has been tested with regards to stability and scalability. Algorithms in this tier are prefixed with gds.<algorithm> and are supported by Neo4j.

Beta

Indicates that the algorithm is a candidate for the production-quality tier. Algorithms in this tier are prefixed with gds.beta.<algorithm>.

Alpha

Indicates that the algorithm is experimental and might be changed or removed at any time. Algorithms in this tier are prefixed with gds.alpha.<algorithm>.

Check your understanding

Question 1

What tool is available to you for visually exploring a graph?

Select the correct answer.

  • Neo4j Viewer

  • Neo4j Bloom

  • Neo4j Explorer

  • Neo4j Graph Explorer

Question 2

How many graph algorithms are available to you in the Graph Data Science Library?

Select the correct answer.

  • ~10

  • ~20

  • ~50

  • ~450

Question 3

What are the tiers of maturity of algorithms in the Graph Data Science Library?

Select the correct answers.

  • Production-quality

  • Beta

  • Alpha

  • Omega

Summary

In this chapter you have familiarized yourself with the Neo4j for Graph Data Science toolkit.