Happy One-Year Anniversary to Neo4j Graph Data Science!

Amy Hodler, Neo4j Partner Marketing Manager

April 13, 2021


Celebrating the one-year anniversary of Neo4j's graph data science framework.

Happy anniversary, Neo4j Graph Data Science! Has it really been a year already?! We announced Neo4j’s Graph Data Science (GDS) framework back in April 2020, and we’ve come a long way since!

To Catch You Up

Neo4j GDS is the first enterprise graph framework for data scientists and includes:

The Neo4j Graph Data Science Library for efficiently executing graph algorithms.
The Neo4j Graph Database for persistence.
The Neo4j Bloom data visualization tool to easily explore and investigate results.

The GDS Library started with 42 graph algorithms in five categories: pathfinding, community detection, centrality, similarity, and heuristic link prediction. It also included an analytics workspace that projects your Neo4j database into an in-memory format designed specifically for analytics workloads.

We were pretty proud of the broad graph algorithm support, as well as being able to run these notoriously resource-hungry algorithms over tens of billions of nodes in production. And although we had been working closely with the community, you surprised us right away with tons of well-thought-out requests, covering everything from specific algorithms to more enterprise features, and all things machine learning (ML).

Just a side note: curiously, we seem to make big strides around April. Is it because Euler’s birthday is April 15th? Hmm. The 15th also marks the two-year anniversary of the O’Reilly Graph Algorithms book, which now has over 170,000 downloads. And, of course, it’s the date of the Global Graph Celebration Day!

So, What’s New?

We added algorithms (nearly 60 now) based on your feedback, like Hyperlink-Induced Topic Search (HITS) and Speaker Listener Label Propagation, as well as a Pregel API so you could write your own algorithms. We’re really excited to be adding some community contributions soon, as well! However, most of our time this last year has been focused on three areas:

1) New graph-native ML capabilities based on state-of-the-art ML techniques to improve your predictions, even when you don’t know exactly what you’re looking for.

We added three graph embedding algorithms that transform the topology and features of your graph into vectors suited specifically to machine learning. Unlike other types of algorithms, these enable feature engineering even when you don’t know what’s predictive in your data.
To make it easier to gain insights from graph embeddings, we also added general ML algorithms such as the k-nearest neighbors algorithm (k-NN), commonly used for pattern-based classification.
Supervised machine learning workflows are now available inside Neo4j. You can start with a graph, train node classification or link prediction models, make predictions, and then update your graph, all without ever leaving Neo4j. We also automated some of the tricky steps, like test/train data splits and model scoring, so you don’t have to worry about easily overlooked issues like data leakage.
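To illustrate how k-NN works on top of embeddings, here is a minimal, library-agnostic sketch in plain Python. It is not the GDS API: the node names, the 2-dimensional vectors (stand-ins for the output of an embedding algorithm), and the labels are all made up for illustration. A node is labeled by majority vote among its k most similar labeled neighbors, using cosine similarity.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_classify(query, labeled, k=3):
    """Predict a label for `query` by majority vote of its k most similar embeddings.

    `labeled` maps a (hypothetical) node id to an (embedding, label) pair.
    """
    # Rank every labeled node by similarity to the query embedding, most similar first.
    ranked = sorted(
        labeled.values(),
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-d embeddings; real graph embeddings usually have many more dimensions.
labeled_nodes = {
    "alice": ([0.9, 0.1], "fraud"),
    "bob": ([0.8, 0.3], "fraud"),
    "carol": ([0.1, 0.9], "legit"),
    "dave": ([0.2, 0.8], "legit"),
    "erin": ([0.3, 0.9], "legit"),
}

print(knn_classify([0.85, 0.2], labeled_nodes, k=3))  # prints "fraud"
```

The three nearest neighbors of the query are alice, bob, and erin, so the vote is two "fraud" to one "legit". The point of the embeddings is that nodes with similar graph neighborhoods end up close together, which is what makes this simple vote useful.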

2) Optimizing the infrastructure for even larger workloads and mature enterprise processes.

We added transformation features to the in-memory graph so you can project, subset, and transform it based on node labels and relationship types, aggregate relationships for deduplication or weighting, or simply reverse them!
A new enterprise memory format was created to reduce the in-memory footprint by 75% so you can project even larger graphs in less RAM.
You can now persist ML models in Neo4j – they’re stored in the database and survive restarts – and share models among teams.
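As a rough illustration of what aggregating and reversing relationships means for an in-memory graph, here is a small plain-Python sketch over an edge list. The orientation names NATURAL, REVERSE, and UNDIRECTED mirror the terminology GDS uses for projections, but the `project` function and its parameters are hypothetical and are not the GDS API.

```python
from collections import Counter


def project(edges, orientation="NATURAL", aggregate=False):
    """Return a projected, weighted edge list from raw (source, target) relationships.

    Hypothetical helper. orientation: "NATURAL" keeps direction, "REVERSE" flips it,
    "UNDIRECTED" stores each relationship in both directions.
    aggregate: if True, collapse parallel relationships into one edge whose
    weight is the number of duplicates; otherwise every edge gets weight 1.
    """
    if orientation == "NATURAL":
        oriented = list(edges)
    elif orientation == "REVERSE":
        oriented = [(t, s) for s, t in edges]
    elif orientation == "UNDIRECTED":
        oriented = [e for s, t in edges for e in ((s, t), (t, s))]
    else:
        raise ValueError(f"unknown orientation: {orientation}")

    if aggregate:
        # Deduplicate parallel edges, keeping their multiplicity as a weight.
        return sorted(Counter(oriented).items())
    return [(edge, 1) for edge in oriented]


# Two parallel a->b relationships and one b->c relationship.
calls = [("a", "b"), ("a", "b"), ("b", "c")]
print(project(calls, orientation="REVERSE", aggregate=True))
# prints [(('b', 'a'), 2), (('c', 'b'), 1)]
```

Counting duplicates is one choice of aggregation; summing or taking the maximum of an existing weight property are other common choices when relationships already carry weights.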

3) Tighter integration with the Neo4j database and platform to improve workflows.

GDS respects the role-based access control (RBAC) that was added to Neo4j 4.x so you can leverage finely tuned security privileges.
In support of Neo4j Fabric (sharding), graph projections and algorithms can be executed on each shard individually, and the results can be combined via the Fabric proxy.
You can now export an in-memory graph to a new database using the multidatabase capabilities of Neo4j 4.x. (We didn’t realize how popular this was going to be until data scientists told us they loved it for graph-based versioning.)
Neo4j Bloom added categorical coloring that can be based on algorithm results, and hierarchical layouts that help users “see” dependencies in their data.

Whew! And our development velocity is increasing in 2021. We’ll continue to enhance our ML features (more awesomeness coming!), add warm backups for your GDS instances, and bring GDS to Neo4j AuraDB, our fully managed cloud service.

To get your hands on the latest from the GDS Library, visit our download center or go straight to our GitHub repo.

There are a lot of exciting new features in store, but (team:GDS)-[:LOVES]->(r:feedback {type: "Detailed"}), so let us know what you’d like to see before our next anniversary!

Ready to get started with graph data science?

Download your free copy of Graph Data Science For Dummies.
