SumoDB in Neo4j: Graph Analytics of Grand Sumo
Senior Developer Advocate at Neo4j

On a beautiful spring day in March, during a recent trip to Japan, I found myself wandering the streets of Osaka looking for something to do in the afternoon. As luck would have it, the Grand Sumo Tournament was in town for the Haru Basho (Spring Tournament). Like any graph enthusiast, I wondered while watching what kind of graph model could be built from the tournament's unique structure.

Modeling Grand Sumo as a Graph
The tournament structure is quite interesting. There are six tournaments (basho) a year, held in various cities across Japan. Each tournament lasts 15 days and covers six divisions, with the ranks in each division split into East and West sides. The top two divisions are Makuuchi (the top division) and Juryo (the second highest). Juryo has 28 wrestlers, or rikishi, at any given time, and Makuuchi consists of the best 42 rikishi in sumo.
Each division has its own set of ranks. Juryo has 14 ranks, each with an East and a West slot. Makuuchi has 8–12 rikishi in the san'yaku, the top four ranks of Yokozuna, Ozeki, Sekiwake and Komusubi, each again with an East and West distinction. The remaining 30–34 rikishi of Makuuchi are the rank-and-file Maegashira 1–17, also divided into East and West.
Every bout between two rikishi is decided by a single winning move or technique, called a kimarite. Unlike most combat sports, sumo has no weight classes, so rikishi weighing anywhere from 90 kg to 200 kg or more can meet in the same bout. Weight can therefore be a dominant factor in the outcome, although speed and agility also play important roles. After 15 days, the rikishi with the best record in each division wins the tournament, with playoff bouts used only to break ties. Based on this tournament structure, one possible way to model the system can be seen below:
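To make the ranking order concrete, here is a minimal Python sketch of one way to encode a rank so that rikishi sort in banzuke (ranking sheet) order. The `Rank` class, its fields and the `juryo_slots` helper are hypothetical names for illustration, not part of SumoDB. Beyond the rules above, the only convention it encodes is that East ranks slightly above West at the same slot.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical ordering tables for the two divisions covered in this post.
DIVISIONS = ["Makuuchi", "Juryo"]                          # top division first
SANYAKU = ["Yokozuna", "Ozeki", "Sekiwake", "Komusubi"]    # highest title first


@dataclass(frozen=True)
class Rank:
    division: str  # "Makuuchi" or "Juryo"
    title: str     # a san'yaku title, "Maegashira", or "Juryo"
    number: int    # 1 for most san'yaku slots, 1-17 for Maegashira, 1-14 for Juryo
    side: str      # "East" or "West"

    def sort_key(self) -> Tuple[int, int, int, int]:
        """Smaller key means a higher position on the banzuke."""
        division = DIVISIONS.index(self.division)
        # Maegashira and Juryo slots all sort after the san'yaku titles.
        tier = SANYAKU.index(self.title) if self.title in SANYAKU else len(SANYAKU)
        side = 0 if self.side == "East" else 1  # East outranks West at the same slot
        return (division, tier, self.number, side)


def juryo_slots() -> List[Rank]:
    """All 28 Juryo slots: ranks 1-14, each with an East and a West side."""
    return [Rank("Juryo", "Juryo", n, side)
            for n in range(1, 15) for side in ("East", "West")]


ranks = [
    Rank("Juryo", "Juryo", 1, "East"),
    Rank("Makuuchi", "Maegashira", 1, "West"),
    Rank("Makuuchi", "Ozeki", 1, "East"),
    Rank("Makuuchi", "Yokozuna", 1, "West"),
    Rank("Makuuchi", "Yokozuna", 1, "East"),
]
for rank in sorted(ranks, key=Rank.sort_key):
    print(rank.title, rank.number, rank.side)
print(len(juryo_slots()))  # 28
```

Sorting by a tuple key keeps the comparison logic in one place, so the same key can order rikishi in a query result or a visualization legend.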

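The championship (yusho) rule described above can be sketched in a few lines of Python. This is a hypothetical helper, not the original project's code: it assumes each bout is recorded as a `(winner, loser)` pair, uses placeholder names A, B and C, and ignores absences and the playoff bouts themselves.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def yusho_contenders(bouts: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the rikishi tied for the most wins in one division's basho.

    `bouts` holds one (winner, loser) pair per regular bout. A single name
    means an outright champion; several names mean a playoff decides it.
    Simplification: absences and the playoff bouts themselves are ignored.
    """
    wins = Counter(winner for winner, _ in bouts)
    if not wins:
        return []
    best = max(wins.values())
    return sorted(name for name, count in wins.items() if count == best)


# Hypothetical results for three placeholder rikishi.
print(yusho_contenders([("A", "B"), ("A", "C"), ("B", "C")]))  # ['A']
print(yusho_contenders([("A", "B"), ("B", "C"), ("C", "A")]))  # ['A', 'B', 'C']
```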
The Data of Grand Sumo
As with much of our world these days, a vast amount of sumo data can be found online. All of the data for this blog was sourced from https://sumodb.sumogames.de/Default.aspx. We used Python to parse the HTML and transform the results into Cypher statements for loading into Neo4j. Here we focus on the top three rikishi and all of their historical bouts in Juryo and Makuuchi. A sample of the data as a graph can be seen below:

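The original extraction code is not shown in this post, so the following is only a minimal sketch of the same idea using Python's standard-library `html.parser`. It assumes a simplified, hypothetical table layout with the columns basho, day, winner, kimarite and loser; the real SumoDB pages are structured differently. The `Rikishi` label, the `DEFEATED` relationship type and its properties are likewise an assumed model, not necessarily the one in the diagram earlier in this post.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Hypothetical Cypher template: one DEFEATED relationship per bout, with the
# basho, day and kimarite stored as relationship properties.
MERGE_BOUT = """
MERGE (w:Rikishi {name: $winner})
MERGE (l:Rikishi {name: $loser})
MERGE (w)-[:DEFEATED {basho: $basho, day: $day, kimarite: $kimarite}]->(l)
"""


class BoutTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells stay empty and are skipped
                self.rows.append(self._row)
            self._row = None


def to_statements(rows: List[List[str]]) -> List[Tuple[str, Dict[str, object]]]:
    """Turn rows of (basho, day, winner, kimarite, loser) into Cypher + parameters."""
    statements = []
    for row in rows:
        if len(row) != 5:
            continue  # ignore rows that do not match the assumed column layout
        basho, day, winner, kimarite, loser = row
        params = {"basho": basho, "day": int(day), "winner": winner,
                  "kimarite": kimarite, "loser": loser}
        statements.append((MERGE_BOUT, params))
    return statements


sample_html = """
<table>
  <tr><th>Basho</th><th>Day</th><th>Winner</th><th>Kimarite</th><th>Loser</th></tr>
  <tr><td>2025.03</td><td>1</td><td>Rikishi A</td><td>yorikiri</td><td>Rikishi B</td></tr>
</table>
"""
parser = BoutTableParser()
parser.feed(sample_html)
for query, params in to_statements(parser.rows):
    print(params)
```

Each `(query, params)` pair could then be sent to Neo4j with the official Python driver, for example via `session.run(query, params)`. Passing values as query parameters rather than formatting them into the string keeps names with apostrophes or other special characters from breaking the Cypher.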
Graph Algorithms and Bloom
The latest Neo4j Desktop app combines the power of a graph database with graph visualization in Bloom, including algorithms from the Graph Data Science (GDS) library. Together, these can surface insights from a 1,000-foot view that might otherwise be buried in a series of tables. Below is a look at every rikishi who has ever fought Hoshoryu (Yokozuna East), Onosato (Yokozuna West) or Aonishiki (Ozeki East). The graph shows which rikishi have defeated which, with thicker lines indicating more bouts between the pair during their time in Juryo and Makuuchi.

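The line thickness in Bloom reflects how often two rikishi met. A minimal sketch of that aggregation in Python, assuming each bout is a `(winner, loser)` pair and using placeholder names and results, computes both the directed "who beat whom" count and the undirected total of shared bouts:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Bout = Tuple[str, str]  # (winner, loser) for a single bout


def defeat_counts(bouts: Iterable[Bout]) -> Dict[Bout, int]:
    """How many times each winner beat each loser (a directed edge weight)."""
    return Counter(bouts)


def shared_bouts(counts: Dict[Bout, int], a: str, b: str) -> int:
    """Total bouts between a and b regardless of who won (an undirected weight)."""
    return counts.get((a, b), 0) + counts.get((b, a), 0)


# Placeholder names and results, for illustration only.
bouts = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A")]
counts = defeat_counts(bouts)
print(counts[("A", "B")])              # 2
print(shared_bouts(counts, "A", "B"))  # 3
```

If the bouts were stored as hypothetical `DEFEATED` relationships between `Rikishi` nodes, the directed counts would correspond to a Cypher query such as `MATCH (w:Rikishi)-[d:DEFEATED]->(l:Rikishi) RETURN w.name, l.name, count(d) AS bouts`.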
Suppose we want to find the most influential rikishi, the ones who connect the graph the most. In this case, those are the rikishi who have fought Hoshoryu, Onosato and Aonishiki most often. Using the GDS plugin with Bloom, we can overlay the scores of various algorithms onto the visualization to answer this question. One such algorithm is Eigenvector Centrality, which scores a node highly when it is connected to other highly connected nodes, shown below:

Focusing on the top 10 scores, the 10 most influential rikishi in our graph can be seen below:

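For intuition about what the Eigenvector Centrality scores measure, here is a minimal power-iteration sketch in plain Python, followed by a top-k selection like the top-10 view above. It is not the GDS implementation: treating the graph as undirected, using bout counts as edge weights and rescaling to unit L2 norm each step are all assumptions of this sketch. In GDS the algorithm runs as a procedure such as `gds.eigenvector.stream`, whose orientation, weighting and normalization options may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def eigenvector_centrality(edges: Dict[Edge, float], iterations: int = 100,
                           tol: float = 1e-9) -> Dict[str, float]:
    """Power-iteration eigenvector centrality on an undirected, weighted graph.

    Directions are ignored: weights for (u, v) and (v, u) are added together.
    Scores are rescaled to unit L2 norm after every iteration.
    """
    adj: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (u, v), w in edges.items():
        if u != v:  # skip self-loops
            adj[u][v] += w
            adj[v][u] += w
    nodes = list(adj)
    if not nodes:
        return {}
    x = {n: 1.0 / math.sqrt(len(nodes)) for n in nodes}
    for _ in range(iterations):
        # Multiply by (A + I): same leading eigenvector as A, but the +I shift
        # stops the iteration from oscillating on bipartite graphs.
        new = {n: x[n] + sum(w * x[m] for m, w in adj[n].items()) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    return x


def top_k(scores: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Highest scores first; ties broken alphabetically."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Placeholder star graph: "hub" fought everyone, the others only fought "hub".
star = {("hub", "a"): 1.0, ("hub", "b"): 1.0, ("c", "hub"): 1.0}
print(top_k(eigenvector_centrality(star), k=2))
```

In the star example, `hub` scores highest because every other node connects only to it, and the three leaves tie. The same effect shows up in the sumo graph: a rikishi scores higher the more bouts they share with other well-connected rikishi.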
One insight that jumps out is that rank does not always translate into connectedness, even among the top-ranked rikishi. Although Aonishiki is the top-ranked Ozeki (Ozeki East), his centrality score is quite a bit lower than that of many other rikishi in the dataset. A likely explanation is that Aonishiki is one of the youngest and fastest-rising rikishi in history, so he has fought fewer bouts in Juryo and Makuuchi and therefore has fewer connections in the graph. In fact, had Aonishiki won the Haru Basho in Osaka this year, he likely would have been the fastest rikishi ever to reach the rank of Yokozuna, at the age of 21, having debuted in the July Tournament of 2023. Sadly, his race to Yokozuna was derailed by his first professional make-koshi (a losing tournament record) of 7 wins and 8 losses. This means he will need to win, or finish runner-up in, two consecutive tournaments to be considered for promotion to Yokozuna, the highest rank.
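The make-koshi arithmetic is easy to check: Juryo and Makuuchi rikishi fight 15 bouts per basho, so a winning record (kachi-koshi) needs at least 8 wins, and a 7-win, 8-loss record falls one short. The helper below is a hypothetical, simplified sketch; it counts absences as non-wins and assumes every scheduled bout is accounted for.

```python
def basho_result(wins: int, losses: int, absences: int = 0, bouts: int = 15) -> str:
    """Classify one tournament record as kachi-koshi (winning) or make-koshi (losing).

    Assumes every scheduled bout is a win, a loss, or an absence, and counts
    absences as non-wins. With an odd number of bouts a tie is impossible.
    """
    if min(wins, losses, absences) < 0 or wins + losses + absences != bouts:
        raise ValueError("wins + losses + absences must equal the number of bouts")
    return "kachi-koshi" if wins > bouts // 2 else "make-koshi"


print(basho_result(7, 8))  # make-koshi: 8 wins is the smallest majority of 15
print(basho_result(8, 7))  # kachi-koshi
```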
Conclusion
In summary, we explored some of the intricacies of modeling sumo wrestling with a graph database like Neo4j. We showed how tools such as Bloom and Graph Data Science algorithms can be used to analyze the bigger picture without getting into the weeds of tables, pesky joins and counts. And we found that a high sumo rank doesn't always make you the most central node in the sea of rikishi.
If you enjoyed this blog, feel free to like, follow, and share with others. If you would like to generate SumoDB in your own Neo4j instance, you can find the Cypher code here. Next time, we will explore how to analyze this type of model at scale in Snowflake using the Neo4j Graph Data Science Native Snowflake App.
SumoDB in Neo4j: Graph Analytics of Grand Sumo was originally published in Neo4j Developer Blog on Medium.