GraphGists

Introduction

Every year the postseason tournament of NBA attracts billions of audiences across the world and dollars as well. Gambling companies have their own secret algorithms to calculate the betting rates of each game. A team with a good track of winning would certainly have a better chance to win. But how about teams with mixed histories? I am not a gambler, but I am interested in the algorithms behind.

Problem

Typically to determine the winning rate between two teams, you could look at their historical matches against each other: total wins and total losses; but what if the two teams had never played together before or, more likely, had never played in the recent a few years, especially during the NBA finals when one is from the east and the other is from the west? One way to compare is to calculate their winning percentages: (total wins) / (total games played). However, the winning percentages may not be very comparable considering each team mostly (or sometimes exclusively) played against teams from their own conference.

Solution

With the Path concept in graph databases, not only the traditional statistics could be easily calculated but also new methods could be introduced to link any two teams together through the chain of playoffs. If we consider two teams that had played against each other as linked and this link relationship can be relayed, then all teams that had entered a certain year’s postseason tournament were linked, either directly or indirectly; or all teams that in NBA playoffs are linked if we combine multiple year’s playoffs together. Based on the links, shortest paths can be leveraged to find the closest relationship between any two teams. And if we assign numeric values to those links, any two teams could be compared using measurable mechanism.

Data Model

  • Each NBA team is a node [30 nodes in total]

  • Each playoff round (not a single match but the entire round) is a node [15 rounds per year]

  • The results of each playoff round will be stored in edges; the direction of each edge will be from a Team to a Round, and the number of wins will be stored in that edge [0,1,2,3,4]

Setup

The setup code will load 2013, 2014 and 2015 NBA playoffs results to the graph database as sample data.

NBA playoffs of a single year

Display the entire 2015 playoff result.

MATCH (t)-[]->(p:Playoff)
WHERE p.Year = "2015"
RETURN t,p

Historical playoffs of a team

Display all historical playoffs of a single team.

MATCH (t:Team {Name: "Golden State"})-[w:WIN]->(:Playoff)<-[l:WIN]-()
RETURN t,w,l

Calculate winning percentage

List all historical wins and losses of each team in the past 3 years.

MATCH (t:Team)-[w:WIN]->(:Playoff)<-[l:WIN]-()
RETURN t.Name as TEAM, SUM(w.Win) AS TOTAL_WIN, SUM(l.Win) as TOTAL_LOSS,
(toFloat(SUM(w.Win)) / (toFloat(SUM(w.Win))+ toFloat(SUM(l.Win)))) as WIN_PERCENTAGE
ORDER BY SUM(w.Win) DESC

If 2 teams have met before: Display playoff history between 2 teams

If two teams had played together before, the following query will list out all their matches.

MATCH (t1:Team {Name: "Miami"})-[r1:WIN]->(p:Playoff)<-[r2:WIN]-(t2:Team {Name:"San Antonio"})
RETURN t1,r1,p,r2,t2

If 2 teams have met before: Calculate wins and losses

For two teams that had played together before, their winning percentage could be used to predict the stronger team.

MATCH (t1:Team {Name: "Miami"})-[r1:WIN]->(p:Playoff)<-[r2:WIN]-(t2:Team {Name:"San Antonio"})
RETURN p.Year as Year,r1.Win as Miami,r2.Win as San_Antonio
ORDER BY p.Year DESC

In the above case, San Antonio and Miami had played 12 games in the past 3 years during playoffs, and San Antonio won 7 & lost 5. So San Antonio may have a slighly higher chance to win if they meet again in 2016 NBA final.

If 2 teams have never met before: Simple Example

Find all shortest paths beween 2 teams

MATCH (t1:Team {Name: "Miami"}),(t2:Team {Name:"Portland"}),
	p = AllshortestPaths((t1)-[*..14]-(t2))
RETURN p
MATCH p= AllShortestPaths((t1:Team {Name: "Miami"})-[:WIN*0..14]-(t2:Team {Name:"Portland"}))
WITH extract(r IN relationships(p)| r.Win) AS RArray
RETURN RArray

In the above case, one path [1,4,4,1] indicates the two teams may have an equal chance to win and but Miami shows a little bit advantage in the other [4,3,4,,1]. So we may say that Miami has a better chance to win Portland if they met in the finals.

If 2 teams have never met before: Complex Example

Find all shortest paths between 2 teams

MATCH (t1:Team {Name: "Golden State"}),(t2:Team {Name:"Toronto"}),
	p = AllshortestPaths((t1)-[*..14]-(t2))
RETURN p
MATCH (t1:Team {Name: "Golden State"}),(t2:Team {Name:"Toronto"}),
	p = AllshortestPaths((t1)-[r:WIN*..14]-(t2))
WITH r,p,extract(r IN relationships(p)| r.Win ) AS paths
RETURN paths

The above case is more complicated than the simple example: there are 6 shortest paths with 8 relations in each respectively. So I introduced a concept called average net win, which is the average net win of each path, where [net win] = [total number of wins] - [total number of losses] in one shortest path. When the average net win from team A to team B is positive, team A has a winning advantage over team B.

MATCH p= AllShortestPaths((t1:Team {Name: "Golden State"})-[:WIN*0..14]-(t2:Team {Name:"Toronto"}))
WITH extract(r IN relationships(p)| r.Win) AS RArray, LENGTH(p)-1  as s
RETURN AVG(REDUCE(x = 0, a IN [i IN range(0,s) WHERE i % 2 = 0 | RArray[i] ] | x + a)) //total win
- AVG(REDUCE(x = 0, a IN [i IN range(0,s) WHERE i % 2 <> 0 | RArray[i] ] | x + a)) //total loss
AS NET_WIN

Extension To Current Model

The model depicted above simply uses raw numbers of wins and losses to suggest a Boolean result during comparison. If we want to rank more teams together or establish a more accurate quantifiable rating system, more elements could be introduced to the model - stadium, for instance, could help to incorporate the home court advantage into consideration during the analysis; or assign weights to different rounds in the playoffs, so number of net wins in the NBA Finals could have more significant influence than the ones in First Round during the calculation of average net wins along shortest paths.