Using a Machine Learning Workflow for Link Prediction
Predictions
Kleinberg and Liben-Nowell approached the problem of Link Prediction from the perspective of social networks, asking this question:
“Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future?
We formalize this question as the Link Prediction problem, and develop approaches to Link Prediction based on measures for analyzing the ‘proximity’ of nodes in a network.”
For example, we could predict future associations between:
People in a terrorist network.
Molecules in a biology network.
Authors in a citation network, as potential co-authors.
People and the artists or artworks they might become interested in.
In each of these examples, predicting a link means predicting some future behaviour.
For example, in a citation network, we’re actually predicting that two people will collaborate on a paper.
Link Prediction Algorithms
Kleinberg and Liben-Nowell describe a set of methods that can be used for Link Prediction.
These methods compute a score for a pair of nodes, where the score could be considered a measure of proximity or “similarity” between those nodes based on the graph topology.
The closer two nodes are, the more likely it is that a relationship will form between them.
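As a rough illustration of how such scores are derived purely from topology, here is a minimal pure-Python sketch of five proximity measures on an undirected graph stored as an adjacency map. The measure names broadly follow the link prediction functions in the Neo4j Graph Data Science library, but this is an independent sketch for intuition, not the library's implementation. The `build_graph` helper and the small co-author graph are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

# Undirected graph as an adjacency map: node -> set of neighboring nodes
Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Hypothetical helper: build an undirected adjacency map from node pairs."""
    graph: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def common_neighbors(graph: Graph, x: Hashable, y: Hashable) -> int:
    # |N(x) ∩ N(y)|: nodes adjacent to both x and y
    return len(graph.get(x, set()) & graph.get(y, set()))


def total_neighbors(graph: Graph, x: Hashable, y: Hashable) -> int:
    # |N(x) ∪ N(y)|: distinct nodes adjacent to x or y
    return len(graph.get(x, set()) | graph.get(y, set()))


def preferential_attachment(graph: Graph, x: Hashable, y: Hashable) -> int:
    # |N(x)| * |N(y)|: the product of the two nodes' degrees
    return len(graph.get(x, set())) * len(graph.get(y, set()))


def adamic_adar(graph: Graph, x: Hashable, y: Hashable) -> float:
    # Sum of 1 / log|N(u)| over common neighbors u, so rarely connected
    # shared neighbors count for more. Degree-1 nodes are skipped (log 1 = 0).
    shared = graph.get(x, set()) & graph.get(y, set())
    return sum(1 / math.log(len(graph[u])) for u in shared if len(graph[u]) > 1)


def resource_allocation(graph: Graph, x: Hashable, y: Hashable) -> float:
    # Sum of 1 / |N(u)| over common neighbors u
    shared = graph.get(x, set()) & graph.get(y, set())
    return sum(1 / len(graph[u]) for u in shared)


# Small hypothetical co-author graph
coauthors = build_graph([
    ("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"),
    ("Bob", "Dave"), ("Carol", "Dave"), ("Dave", "Eve"),
])
print(common_neighbors(coauthors, "Alice", "Dave"))         # 2
print(total_neighbors(coauthors, "Alice", "Dave"))          # 3
print(preferential_attachment(coauthors, "Alice", "Dave"))  # 6
```

For Alice and Dave, who share two co-authors, every measure is positive. For Alice and Eve, who share none, common neighbors, Adamic-Adar, and resource allocation are all zero, while preferential attachment (2 × 1 = 2) and total neighbors (3) still give nonzero scores, because they depend only on each node's own neighborhood.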
Exercise 1: Running Link Prediction algorithms
You will gain some experience running the Link Prediction algorithms.
In the query edit pane of Neo4j Browser, execute the browser command: :play gds-data-science-exercises and follow the instructions for the Link Prediction exercise.
Applying Link Prediction Algorithms
Now that you have learned how to execute the link prediction algorithms, you will learn what to do with the results.
There are two approaches:
Using the measures directly
Training a supervised binary classifier
Using the measures directly
You can use the scores from the link prediction algorithms directly.
With this approach, you choose a threshold value: any pair of nodes whose score is above it is predicted to have a link.
For example, you might say that every pair of nodes that has a preferential attachment score above 3 would have a link, and any with 3 or less would not.
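A minimal sketch of this threshold approach, assuming a small hypothetical co-author graph: it scores every pair of nodes that is not yet linked with preferential attachment and predicts a link when the score is strictly above 3. The `build_graph` and `predict_links` helpers are illustrative names, not part of any library.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def preferential_attachment(graph, x, y):
    # Product of the two nodes' degrees
    return len(graph[x]) * len(graph[y])


def predict_links(graph, threshold):
    """Return unlinked node pairs whose score is strictly above threshold."""
    predicted = []
    for x, y in combinations(sorted(graph), 2):
        if y in graph[x]:
            continue  # already linked, so there is nothing to predict
        if preferential_attachment(graph, x, y) > threshold:
            predicted.append((x, y))
    return predicted


coauthors = build_graph([
    ("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"),
    ("Bob", "Dave"), ("Carol", "Dave"), ("Dave", "Eve"),
])
print(predict_links(coauthors, threshold=3))  # [('Alice', 'Dave')]
```

Bob and Eve score exactly 3 (3 × 1), so under this rule they are not predicted to link. Picking a good threshold by hand can be difficult, which is part of the appeal of letting a classifier learn the decision boundary instead.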
Training a supervised binary classifier
Alternatively, you can take a supervised learning approach, using the scores as features to train a binary classifier.
The binary classifier then predicts whether a pair of nodes will have a link.
In the next part of this module you will use the supervised learning approach.
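To show the shape of the supervised approach without a full notebook, here is a dependency-free sketch. It assumes two hypothetical snapshots of a co-author graph: an earlier one used to compute features, and a later set of edges used as labels. Each pair that is unlinked in the earlier graph becomes one training example with two features (common neighbors and preferential attachment) and a label of 1 if the pair is linked later. In place of the scikit-learn model used in the exercise, it swaps in a one-feature decision stump, the simplest possible binary classifier, purely to keep the example self-contained. All helper names are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def features(graph, x, y):
    """Feature row for a pair: [common neighbors, preferential attachment]."""
    nbrs_x, nbrs_y = graph[x], graph[y]
    return [len(nbrs_x & nbrs_y), len(nbrs_x) * len(nbrs_y)]


def make_examples(train_graph, future_edges):
    """Label each pair unlinked in train_graph: 1 if it appears in future_edges.

    Pairs involving nodes absent from train_graph are ignored.
    """
    future = {frozenset(edge) for edge in future_edges}
    X, y = [], []
    for a, b in combinations(sorted(train_graph), 2):
        if b in train_graph[a]:
            continue  # already linked in the training snapshot
        X.append(features(train_graph, a, b))
        y.append(1 if frozenset((a, b)) in future else 0)
    return X, y


def fit_stump(X, y):
    """Choose the (feature index, threshold) with the best training accuracy."""
    best_feature, best_threshold, best_accuracy = 0, float("-inf"), -1.0
    for f in range(len(X[0])):
        for t in [float("-inf")] + sorted({row[f] for row in X}):
            correct = sum((row[f] > t) == bool(label) for row, label in zip(X, y))
            accuracy = correct / len(y)
            if accuracy > best_accuracy:
                best_feature, best_threshold, best_accuracy = f, t, accuracy
    return best_feature, best_threshold


def predict(model, row):
    """Predict a link (1) when the chosen feature is above the threshold."""
    feature, threshold = model
    return int(row[feature] > threshold)


earlier = build_graph([
    ("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"),
    ("Bob", "Dave"), ("Carol", "Dave"), ("Dave", "Eve"),
])
later_edges = [("Alice", "Dave")]  # hypothetical new co-authorship

X, y = make_examples(earlier, later_edges)
model = fit_stump(X, y)
print(X, y)   # [[2, 6], [0, 2], [1, 3], [1, 3]] [1, 0, 0, 0]
print(model)  # (0, 1): predict a link when common neighbors > 1
```

Because `X` is a list of feature rows and `y` a list of 0/1 labels, the same data could be passed to a scikit-learn classifier's `fit(X, y)`. Even in this toy graph, pairs that stay unlinked outnumber the new link three to one. The sketch also scores the stump on the pairs it was trained on; a real evaluation would hold out a separate test set.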
Exercise 2: Building a binary classifier
In this exercise, you will use a Jupyter notebook to build a binary classifier that predicts co-authorships.
Launch the 04_Predictions.ipynb notebook and follow the steps in this exercise.
Check your understanding
Which Link Prediction algorithm “captures the notion that two strangers who have a common friend may be introduced by that friend”?
Which challenges do we need to address when building a binary classifier for Link Prediction?
Which feature is the most important in our final model?
You should now be able to:
Describe what Link Prediction is.
Use the Link Prediction algorithms in Neo4j.
Understand the challenges of building Machine Learning models on graph data.
Build a Link Prediction classifier using scikit-learn with features derived from the Neo4j Data Science library.