Popularity and PageRank at Westerburg High


Learn more about the PageRank algorithm.


Greetings and salutations!

If you’re like me, high school is something you look back on with mixed emotions. There were good times, and good friends, but being an adult in 2019 is so much better. When I was in high school there weren’t even any graph databases! At least I have the classic teen films of the ’80s and ’90s to provide me with myriad idealised (or, not so idealised) memories of my teenaged world.

In fact, I sometimes use examples from high school to illustrate how graph databases and graph algorithms work. I know what you’re thinking… “What’s your damage, Joe?” Sounds strange, I know, but let me show you. Trust me, it’ll be really very!

One of the most famous graph algorithms is PageRank, which was created by Google and named after one of its founders (Larry Page). When answering search requests, Google uses PageRank to help measure the importance of a web page so it can return the most important results first. According to Google:

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
Web pages get a higher PageRank score when they are linked to many times from other pages, especially when those are high quality links. This is calculated to several levels deep – so when a webpage with a high PageRank score links to another web page, that is considered a higher quality link than a link from a webpage with a low PageRank score.

Makes perfect sense, right? Excited about graph algorithms yet? Colour you stoked?

“Get crucial!” I can hear you saying. Where’s the PageRank Cliff’s Notes? This is why I think it’s easier (or at least, more fun) to explain PageRank as a High School popularity contest. I’ll do this using the characters and plot from the classic 1988 dark comedy teen film “Heathers.” Ready? Now check this out…

The most powerful clique at Westerburg High are the Heathers: Heather Chandler, Heather Duke, Heather McNamara and (confusingly) Veronica Sawyer. They’re a bunch of Swatch dogs and Diet Coke heads. Heather Chandler is their leader – the most popular girl at school, she’s worshipped at Westerburg and she’s only a Junior! She can get into parties at Remington University and everything.

MATCH path = (h1:Heather)-[:LIKES|ENVIES]->(h2:Heather) RETURN path



As you can see from this Bloom screenshot, the Heathers all like each other… but, actually, not that much. They’re more frenemies than friends!

On each relationship in our graph we’ve got a “weight” property to indicate how strong that relationship is, on a scale from 1 to 10. The Heathers all “LIKE” each other with a weight of “4.” The other Heathers envy Heather Chandler, but Veronica doesn’t – she thinks Heather Chandler is such a Megabitch. In our graph a strong envy would be a weight of “5” and a slight envy would be a weight of “2.”

It’s critical to note that in our graph the direction of each relationship is very important. Since Heather Duke and Heather Chandler “LIKE” each other, we have a relationship going in each direction. But just because Heather Duke envies Heather Chandler doesn’t mean Heather Chandler is jealous of Heather Duke!

Using Bloom to drill down into the relationships around Heather Chandler in our graph, we can start to see why she’s so popular:



We can see that there are 27 “FANCIES” relationships pointing to Heather Chandler, 16 “ENVIES” relationships, and 15 “LIKES” relationships. Furthermore, they’re strong relationships – the “ENVIES” relationships generally have a weight of “5” (because all the girls really want to be her), and the “FANCIES” relationships generally have a weight of “10” (because all the boys really want to date her).

To really understand how PageRank works, though, we have to step back and look at the whole graph.

MATCH path = (x:Student)-[]->(y:Student) RETURN path



Wow, that’s complex! Never let anyone tell you that navigating the social world of high school is straightforward.

The Social Hierarchy of Westerburg High: An Overview


You can see that we have several different groups of students at Westerburg High:

    • The Loners
    • The Metal Heads
    • The Stoners
    • The Geek Squad
    • The Dweebs
    • The Year Book Club
    • The Posh Kids
    • The Jocks
    • The Cheerleaders
    • The Heathers
In our graph we have two “Loners.” The tragic Martha “Dumptruck” Dunnstock is in the lower left of the graph all by herself, without any relationships at all. No one at Westerburg lets her play their reindeer games! The other “Loner” is Jason “J.D.” Dean, who only has a reciprocal “FANCIES” relationship with Veronica Sawyer.

The Metal Heads at Westerburg like each other a lot, and the Stoners also like each other a lot. Both groups “LIKE” each other but with a lower weight. No one else “LIKES,” “ENVIES” or “FANCIES” either the Metal Heads or the Stoners.

The Geek Squad “LIKE” each other a lot, and they also “LIKE” the Dweebs but with a lower weight.

The Dweebs “LIKE” each other very much. They also “LIKE” the Geek Squad and the Year Book Club with a lower weight.

The Year Book Club “LIKE” each other a lot. They also “LIKE” the Dweebs and the Posh Kids with a lower weight.

The Posh Kids – or Preppies, when I was in High School – “LIKE” each other very much. They also “LIKE” the Year Book Club, the Jocks and the Cheerleaders with a lesser weight. All Stoners, Metal Heads, Geek Squad, Dweebs and Year Book Club members slightly “ENVY” the Posh Kids.

The Jocks “LIKE” both each other and the Cheerleaders a lot. They also “LIKE” the Heathers and the Posh Kids with a lesser weight. All boys from the Stoners, Metal Heads, Geek Squad, Dweebs, Year Book Club and Posh Kids sort of “ENVY” the Jocks. All the Stoner, Metal Head, Geek Squad, Dweeb, Year Book Club and Posh Kid girls “FANCY” the Jocks quite a bit. Most of the Jocks have a Cheerleader girlfriend, so each Jock has a “FANCIES” relationship with one Cheerleader. All except poor Kurt Kelly, that is – he has an unrequited “FANCIES” relationship to Veronica Sawyer.

The Cheerleaders “LIKE” both each other and the Jocks very much. They also “LIKE” the Heathers and the Posh Kids but with a lower weight. All boys from the Stoners, Metal Heads, Geek Squad, Dweebs, Year Book Club and Posh Kids really “FANCY” the Cheerleaders. All the Stoner, Metal Head, Geek Squad, Dweeb, Year Book Club and Posh Kid girls sort of “ENVY” the Cheerleaders. Each Cheerleader has a Jock boyfriend, so each Cheerleader has a “FANCIES” relationship with one Jock.

In general, the Heathers don’t “LIKE” anyone but themselves. The exception is Veronica, who “LIKES” Betty Finn the Dweeb. Heather McNamara “FANCIES” her Jock boyfriend Ram Sweeney, and Veronica “FANCIES” the Loner, Jason “J. D.” Dean.

The PageRank Popularity Contest


Phew. Makes total sense, right? You’d be such a pillow case if you tried to write a Cypher query to do PageRank with that amount of complexity, though. Graph algorithms to the rescue!

We can calculate the PageRank score of each Student in our graph with a simpler query, which calls the PageRank graph algorithms function:

CALL algo.pageRank.stream('Student', '', {
  iterations:3, dampingFactor:0.85, weightProperty: "weight"
})
YIELD nodeId, score
WITH nodeId, score, collect(labels(algo.asNode(nodeId))) as labels
RETURN algo.asNode(nodeId).name AS name, labels, score
ORDER BY score DESC

This query runs the PageRank algorithm for all of our Students, using all of the relationship types (“LIKES,” “FANCIES,” “ENVIES”) in our graph and taking into account the “weight” property on each relationship. A higher “weight” results in a greater boost to the PageRank score for the student at the end of that relationship.

Crucially, this query calculates PageRank using relationships three levels deep. This is what makes PageRank like a “popularity contest” – it’s not just how many people “LIKE, FANCY, or ENVY” you that matters. What also matters is how popular they are. In other words, how many people “LIKE, FANCY or ENVY” the people who “LIKE, FANCY or ENVY” you… and how many people “LIKE, FANCY or ENVY” the people who “LIKE, FANCY or ENVY” the people who “LIKE, FANCY or ENVY” you.

Put another way, if you’re an unpopular geek and you have 10 unpopular geeks that “LIKE” you, you won’t be as popular as a Posh Kid who is “LIKED” by a smaller number of popular Jocks.

The query above returns these top PageRank results:



Unsurprisingly, Heather Chandler has the highest PageRank score. Next is Veronica, whose score is boosted a bit by the fact that both Kurt Kelly (a popular Jock) and J.D. “FANCY” her, and that she has a strong reciprocal “LIKES” relationship with Betty Finn.

Ram Sweeney is the Jock with the highest PageRank score – the fact that the very popular Heather McNamara “FANCIES” him boosts his score significantly.

In general, the Cheerleaders are slightly more popular than the Jocks – because there are more boys in our graph than there are girls – and the boys all “FANCY” the Cheerleaders.

Perhaps most surprising is how high up the rankings both Betty Finn (a Dweeb) and J.D. (the Loner, quelle surprise) are. This is because Veronica Sawyer “LIKES” Betty and “FANCIES” J.D. Some of her popularity rubs off on them, and improves their PageRank score significantly!



Looking at the next set of query results, we can see that the Posh Kids are less popular than the Jocks, but more popular than the Dweebs and the Year Book Club.

Normally, we would think the Year Book Club would be significantly more popular than the Dweebs – as they’re higher up the social hierarchy and are “LIKED” by the Posh Kids. But, since Betty Finn “LIKES” all the Dweebs, and she’s more popular because of her strong “LIKES” relationship with Veronica Sawyer, all the Dweebs get a nice PageRank score boost from her!



Finally, we can see that the Geek Squad are more popular than the Metal Heads and the Stoners. While the Metal Heads and Stoners all “LIKE” each other, and there are more of them, the Geek Squad all have “LIKES” relationships from the Dweebs – and the cascading popularity boost from Veronica Sawyer down through Betty Finn to the other Dweebs also boosts the Geek Squad PageRank score.

It’s no surprise that poor Martha “Dumptruck” Dunnstock has the lowest PageRank score in the graph, since she’s a true Loner with no relationships to boost her popularity score.

Summary


So there we have it – proof that it’s not just the quantity of your High School relationships but also the quality of them which makes you popular. Hang out with other popular people and it’s Remington parties all the way. Hang out with losers and it’s keggers with kids all year!

That was a great review of the PageRank algorithm, but I have to motor if I’m going to take Armstrong for a walk. I hope you now have a new way to understand PageRank, and can find ways to apply this powerful and useful graph algorithm using your own graphs. Later!


Ready to take your graph analytics to the next level? Click below to get your free copy of the O’Reilly Graph Algorithms book and discover how to develop more intelligent solutions.

Download My Free Copy