
# Predictions

### About this module

In this module you will learn how to build a machine learning classifier to predict co-authorships in the citation graph.

At the end of this module, you should be able to:

- Describe what link prediction is
- Use the link prediction functions in Neo4j
- Understand the challenges when building machine learning models on graph data
- Build a link prediction classifier using scikit-learn with features derived from the Neo4j Graph Algorithms library

### The Link Prediction problem

Link Prediction has been studied for a long time, but it was popularised by a 2004 paper by Jon Kleinberg and David Liben-Nowell titled *The Link Prediction Problem for Social Networks*.

Kleinberg and Liben-Nowell approach this problem from the perspective of social networks, asking this question:

> Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future?
>
> We formalize this question as the Link Prediction problem, and develop approaches to Link Prediction based on measures for analyzing the “proximity” of nodes in a network.

For example, we could predict:

- Future associations between people in a terrorist network
- Associations between molecules in a biology network
- Potential co-authorships in a citation network
- Interest in an artist or artwork

In each of these examples, predicting a link means that we are **predicting some future behaviour**.
For example, in a citation network, we’re actually predicting the action of two people collaborating on a paper.

### Link Prediction Algorithms

Kleinberg and Liben-Nowell describe a set of methods that can be used for Link Prediction.
These methods compute a score for a pair of nodes, where the score could be considered a **measure of proximity** or “similarity” between those nodes based on the graph topology.
The closer two nodes are, the more likely they are to form a relationship.
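To make these scores concrete, here is a minimal pure-Python sketch of four widely used proximity measures (Common Neighbors, Adamic Adar, Preferential Attachment and Total Neighbors) computed on a small, hypothetical co-authorship graph. It follows the usual textbook definitions and is only for illustration; it is not the Neo4j implementation, and in Neo4j you would call the library's link prediction functions from Cypher instead.

```python
from math import log

# Hypothetical toy co-authorship graph (undirected): author -> set of co-authors.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "E"},
    "D": {"A"},
    "E": {"C"},
}


def common_neighbors(g, x, y):
    """Number of neighbours shared by x and y."""
    return len(g[x] & g[y])


def adamic_adar(g, x, y):
    """Sum of 1/log(degree) over shared neighbours, so rarely connected
    shared neighbours count for more than highly connected ones."""
    return sum(1 / log(len(g[z])) for z in g[x] & g[y])


def preferential_attachment(g, x, y):
    """Product of the two nodes' degrees."""
    return len(g[x]) * len(g[y])


def total_neighbors(g, x, y):
    """Number of distinct nodes in the union of both neighbourhoods."""
    return len(g[x] | g[y])


for x, y in [("B", "D"), ("A", "E"), ("D", "E")]:
    print(
        x, y,
        common_neighbors(graph, x, y),
        round(adamic_adar(graph, x, y), 3),
        preferential_attachment(graph, x, y),
        total_neighbors(graph, x, y),
    )
```

Because a shared neighbour is connected to both nodes, its degree is at least 2, so the logarithm in Adamic Adar is never zero.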

### Exercise 1: Running Link Prediction algorithms

You will gain some experience running the Link Prediction algorithms.
In the query edit pane of Neo4j Browser, execute the browser command: `:play data-science-exercises` and follow the instructions for the Link Prediction exercise.

### Applying Link Prediction Algorithms

Now that you have learned how to execute the link prediction algorithms, you will learn what to do with the results. There are two approaches: use the scores directly, or use them as features to train a machine learning classifier.

### Using the measures directly

You can use the scores from the link prediction algorithms directly. With this approach, you choose a threshold value and predict that any pair of nodes whose score is above it will have a link.

For example, you might say that every pair of nodes that has a preferential attachment score above 3 would have a link, and any with 3 or less would not.
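A short sketch of this threshold rule, using the preferential attachment score and the threshold of 3 from the example. The author names and degrees are hypothetical, and `pa_score` and `predict_link` are illustrative helpers, not Neo4j functions.

```python
# Hypothetical degrees (number of co-authors) for a few authors.
degree = {"A": 3, "B": 2, "C": 3, "D": 1, "E": 1}

THRESHOLD = 3  # predict a link only when the score is strictly above this value


def pa_score(x, y):
    """Preferential attachment: product of the two authors' degrees."""
    return degree[x] * degree[y]


def predict_link(x, y, threshold=THRESHOLD):
    """True if the score exceeds the threshold; a score of 3 or less means no link."""
    return pa_score(x, y) > threshold


for x, y in [("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")]:
    print(f"{x}-{y}: score={pa_score(x, y)}, predicted link={predict_link(x, y)}")
```

The hard part of this approach is choosing the threshold. One option is to try several values and keep the one that best separates pairs that did go on to collaborate from pairs that did not.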

### Exercise 2: Building a binary classifier

In this exercise, you will build a binary classifier to predict co-authorships using a notebook.
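A key challenge with graph data is building a training set that does not leak the answer into the features. The sketch below shows one way to set this up, under stated assumptions: each co-authorship edge carries a hypothetical year, features are computed only from the graph before `SPLIT_YEAR`, and a candidate pair is labelled positive if the two authors first collaborate on or after that year. It is written in plain Python rather than taken from the exercise notebook, and the three features (common neighbours, preferential attachment, total neighbours) are an illustrative choice.

```python
import random
from itertools import combinations

# Hypothetical co-authorship edges: (author, author, year of first joint paper).
edges = [
    ("A", "B", 2010), ("A", "C", 2011), ("B", "C", 2012),
    ("C", "D", 2013), ("D", "E", 2014), ("A", "D", 2016),
    ("B", "D", 2017), ("C", "E", 2018),
]
SPLIT_YEAR = 2015  # assumption: features from earlier links, labels from later ones


def build_graph(edge_list):
    """Undirected adjacency sets from (author, author, year) tuples."""
    g = {}
    for x, y, _ in edge_list:
        g.setdefault(x, set()).add(y)
        g.setdefault(y, set()).add(x)
    return g


def features(g, x, y):
    """[common neighbours, preferential attachment, total neighbours] for a pair."""
    n_x, n_y = g.get(x, set()), g.get(y, set())
    return [len(n_x & n_y), len(n_x) * len(n_y), len(n_x | n_y)]


def training_rows(edge_list, split_year):
    """Feature rows for unlinked pairs in the early graph, labelled 1 if they link later."""
    early = [e for e in edge_list if e[2] < split_year]
    later = {frozenset((x, y)) for x, y, year in edge_list if year >= split_year}
    g = build_graph(early)
    X, y = [], []
    for a, b in combinations(sorted(g), 2):
        if b in g[a]:
            continue  # already co-authors before the split: nothing to predict
        X.append(features(g, a, b))
        y.append(1 if frozenset((a, b)) in later else 0)
    return X, y


def downsample_negatives(X, y, seed=42):
    """Keep every positive row and at most as many randomly chosen negatives."""
    rng = random.Random(seed)
    pos = [(f, label) for f, label in zip(X, y) if label == 1]
    neg = [(f, label) for f, label in zip(X, y) if label == 0]
    rows = pos + rng.sample(neg, min(len(neg), len(pos)))
    return [f for f, _ in rows], [label for _, label in rows]


X, y = training_rows(edges, SPLIT_YEAR)
X_bal, y_bal = downsample_negatives(X, y)
print(X_bal, y_bal)
```

In a real citation graph, unlinked pairs vastly outnumber future links, which is why the sketch downsamples negatives. The resulting `X` and `y` lists have the shape scikit-learn estimators expect, so they could be passed to, for example, `RandomForestClassifier().fit(X, y)` from `sklearn.ensemble`.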

### Check your understanding

### Question 1

Which link prediction algorithm “captures the notion that two strangers who have a common friend may be introduced by that friend”?

Select the correct answer.

- Adamic Adar
- Common Neighbors
- PageRank
- Preferential Attachment

### Summary

You should now be able to:

- Describe what link prediction is
- Use the link prediction functions in Neo4j
- Understand the challenges when building machine learning models on graph data
- Build a link prediction classifier using scikit-learn with features derived from the Neo4j Graph Algorithms library