

Using a Machine Learning Workflow for Link Prediction

About this course

This course introduces you to using Neo4j as part of your Data Science and Machine Learning workflows. You will work through a citation dataset that contains papers, authors, and citations from DBLP, a computer science bibliography website.

This course is intended for data scientists and data analysts.

This self-paced training should take you four hours to complete if you perform all of the hands-on exercises in the course.

We have set up a discussion area on the Neo4j Community Site in case you run into problems in the course and need assistance. You should register on the Neo4j Community Site, where you can view questions and answers from other students taking our online training courses. The site is also an excellent resource for answers to many types of questions posed by other users of Neo4j.

There are four modules in this course. Most modules have hands-on exercises you should complete and all modules have a set of review questions at the end. If you answer all review questions correctly, you will receive a Certificate of Completion for this course. The hands-on exercises in this course can be completed using Neo4j Desktop and Jupyter Notebook. We have provided an estimate of how long each module should take you to complete if you perform the hands-on exercises.

Setting Up Your Development Environment

  • Set up your Neo4j development environment for performing the hands-on exercises of this course.
  • Set up your Python environment for running Jupyter notebooks in this course.

Estimated time: 30 minutes

Exploratory Data Analysis

  • Query a database for its schema.
  • Return and chart the number of node labels and relationship types using matplotlib.
  • Build and plot a histogram of papers and their citations using pandas and matplotlib.
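
As a rough illustration of the last objective above, the sketch below counts how many citations each paper receives and buckets those counts into a histogram. It is only a sketch under stated assumptions: the paper ids and citation pairs are made-up toy data, not rows from the course dataset, and it uses the Python standard library in place of the pandas and matplotlib calls the course itself relies on.

```python
from collections import Counter

# Hypothetical toy data: (citing_paper, cited_paper) pairs.
# In the course these would come from a query against Neo4j.
citations = [
    ("p1", "p3"), ("p2", "p3"), ("p4", "p3"),
    ("p1", "p2"), ("p4", "p2"),
    ("p3", "p5"),
]
papers = ["p1", "p2", "p3", "p4", "p5"]


def citation_counts(papers, citations):
    """Return {paper: number of citations received}, including zeros."""
    received = Counter(cited for _, cited in citations)
    return {paper: received.get(paper, 0) for paper in papers}


def histogram(counts):
    """Return {citation count: number of papers with that count}."""
    return dict(sorted(Counter(counts.values()).items()))


counts = citation_counts(papers, citations)
hist = histogram(counts)

# Text rendering of the histogram, one bar per citation count.
for n_citations, n_papers in hist.items():
    print(f"{n_citations} citations: {'#' * n_papers} ({n_papers})")
```

With real data, the same two dictionaries could be loaded into a pandas DataFrame and plotted with matplotlib.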

Estimated time: 30 minutes


Recommendations

  • Find potential collaborators for an author.
  • Find relevant papers about a topic for an author.
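
One common way to find potential collaborators is to look at co-authors of an author's co-authors. The sketch below illustrates that idea on an in-memory graph. It is an assumption about the general approach, not the course's own queries, and the author names and paper ids are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: paper id -> list of author names.
papers = {
    "paper-1": ["Alice", "Bob"],
    "paper-2": ["Bob", "Carol"],
    "paper-3": ["Bob", "Dave"],
    "paper-4": ["Alice", "Erin"],
    "paper-5": ["Erin", "Carol"],
}


def build_coauthor_graph(papers):
    """Link every pair of authors who share at least one paper."""
    graph = defaultdict(set)
    for authors in papers.values():
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def recommend_collaborators(graph, author, limit=5):
    """Rank authors two hops away by the number of shared co-authors."""
    scores = Counter()
    for coauthor in graph[author]:
        for candidate in graph[coauthor]:
            # Skip the author themselves and existing collaborators.
            if candidate != author and candidate not in graph[author]:
                scores[candidate] += 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:limit]


graph = build_coauthor_graph(papers)
recommendations = recommend_collaborators(graph, "Alice")
print(recommendations)
```

Here Carol ranks above Dave for Alice because Carol shares two co-authors with her (Bob and Erin) while Dave shares only one.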

Estimated time: 60 minutes


Predictions

  • Describe what link prediction is.
  • Use the link prediction graph algorithms in Neo4j.
  • Understand the challenges when building Machine Learning models on graph data.
  • Build a link prediction classifier using scikit-learn with features derived from the Neo4j Data Science Library.
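
Link prediction estimates how likely it is that two nodes that are not yet connected will become connected. The sketch below computes three standard link prediction measures (common neighbors, preferential attachment, and Adamic-Adar) for a pair of authors. It is a plain-Python illustration under stated assumptions: the co-author graph is hypothetical toy data, and the functions are written by hand rather than calling Neo4j.

```python
import math

# Hypothetical co-author graph: author -> set of co-authors.
graph = {
    "Alice": {"Bob", "Erin"},
    "Bob": {"Alice", "Carol", "Dave"},
    "Carol": {"Bob", "Erin"},
    "Dave": {"Bob"},
    "Erin": {"Alice", "Carol"},
}


def common_neighbors(graph, a, b):
    """Number of nodes adjacent to both a and b."""
    return len(graph[a] & graph[b])


def preferential_attachment(graph, a, b):
    """Product of the two node degrees."""
    return len(graph[a]) * len(graph[b])


def adamic_adar(graph, a, b):
    """Sum of 1 / log(degree) over shared neighbors.

    Shared neighbors with few connections contribute more to the score.
    """
    return sum(
        1.0 / math.log(len(graph[n]))
        for n in graph[a] & graph[b]
        if len(graph[n]) > 1  # guard against log(1) == 0
    )


def link_features(graph, a, b):
    """Feature dictionary for one candidate pair."""
    return {
        "common_neighbors": common_neighbors(graph, a, b),
        "preferential_attachment": preferential_attachment(graph, a, b),
        "adamic_adar": adamic_adar(graph, a, b),
    }


features = link_features(graph, "Alice", "Carol")
print(features)
```

Feature dictionaries like this one, computed for both connected and unconnected pairs, are the kind of input a scikit-learn classifier could be trained on.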

Estimated time: 90 minutes


Summary

  • Download Certificate of Completion.
  • Resources to learn more.
  • Course feedback.
