Why Jay-Z Shouldn’t Drive Your Recommendations: The Intuition Behind the Jaccard Coefficient
Sr. Manager, Technical Product Marketing, Neo4j
4 min read

Do Jay-Z and I have a lot of friends in common? Instagram might tell you that we do. After all, many of the people who follow me on Instagram also follow Jay-Z. So logically, someone who follows Jay-Z might also be interested in following me.
Swiss botanist Paul Jaccard and his Jaccard Coefficient would beg to differ. Not an avid hip-hop listener, Jaccard was instead into studying different plant species in the Alps and Jura Mountains in the early 1900s. He wanted to answer a very basic question: “How similar are these places in terms of the plants that grow there?”
But simply counting plants wasn’t enough. One mountain might have more species overall because it’s larger, lower, or easier to reach—not because it’s meaningfully different. Jaccard realized that raw totals blurred the comparison he actually cared about. What mattered wasn’t how many species lived in each place, but how much their ecosystems overlapped.
And from that rather mundane question, he derived an incredibly impactful algorithm. Jaccard realized that he could get a good sense of how similar the ecologies of these two mountains were by counting the number of species they had in common (the intersection) and dividing it by the total number of species in both locations (the union). That led to this now-famous equation:
And with that simple insight, he discovered a stable way to compare sets—and earned himself a permanent spot in statistics textbooks.
So what do we mean by “quality” relationships? Let’s return to our Jay-Z example. On Instagram, if you follow Jay-Z, what’s the likelihood you’re friends with one of his followers?
Get the Free eBook!
Learn the math and intuition behind five of our most popular graph algorithms with “A Practical Introduction to Graph Algorithms”.
Almost zero. He’s just too popular.
If Instagram recommended people to you simply because you both followed Jay-Z, your feed would be flooded with millions of strangers. We need a way to modulate the influence of someone like Jay-Z in this network—a way to keep his popularity from overpowering the signal. Enter our now-familiar friend, the Jaccard Coefficient. Consider the graph below, where each node represents a person, and each edge represents a connection. We would like to know: who am I actually similar to, based on who I’m connected to?
Think back to that algorithm:
- The numerator is the number of neighbors A and B have in common (the intersection).
- The denominator is the total number of neighbors that both A and B have (the union), whether they overlap or not.
So, for our example, Ted and I have an intersection of one: Jay-Z. Next comes the union: every one of us is either connected to. That includes Ted, Jay-Z, me, and Ted’s extra connection, for a total of four. One shared neighbor divided by four total neighbors gives a Jaccard similarity of 1/4, or 0.25.
Now for Jay-Z. We have an intersection of one again (Ted). But the union is much larger this time: me, Ted, and Jay-Z’s five other connections, for a total of eight. One divided by eight gives a similarity of 1/8, or about 0.125.
So, when Instagram decides to recommend a new friend to me, naturally, they will recommend that mystery person Ted is friends with rather than one of Jay-Z’s many followers.
This is why people reach for Jaccard in the first place. It keeps popularity from overpowering everything else. You don’t want recommendations built on big outliers like Jay-Z. In the same vein, an online retailer wouldn’t want to recommend a random product to a teacher just because a ‘super-shopper’ who buys everything under the sun also happened to buy pen and paper. That overlap doesn’t mean much. Instead, the retailer wants to find other shoppers with similar profiles and recommend products that are more relevant, such as whiteboard markers.
Jaccard helps businesses recommend products to customers with similar profiles. As much as it saddens me to say, I am much more similar to Ted than Jay-Z.
Wherever your data lives, Neo4j Graph Analytics makes it easy to put these ideas into practice. In fact, we even have a follow-along blog for using the Jaccard Coefficient to power better recommendations! With Graph Analytics for Snowflake and Graph Intelligence for Microsoft Fabric, you can deploy algorithms like the Jaccard Coefficient directly on your existing data to power better recommendations, segmentation, and decision-making—without ETL or infrastructure overhead.




