# Super (Data) Model: Graphing “RuPaul’s Drag Race”

It’s that time of year, y’all… “RuPaul’s Drag Race” is back! At least in the UK… and, at least, that’s what I’ve been told.

I’m a newcomer to the “Drag Race” party, you see. I’ve always liked RuPaul but, in general, I avoid reality television programming and never managed to get into past seasons of “Drag Race.” This year is different though – I have some new folks in my social circle and they seem to talk about “Drag Race” all the time. My rapid descent into “Drag Race” fandom was unavoidable.

Rather than “dragging” my heels and moaning about it, I’ve decided to use this as a teachable moment. These new friends of mine have no idea what a graph database is, and so it’s very hard to convince them that I spend my weekdays actually working instead of watching old teen films and walking my dog.

What better way to explain Neo4j than to graph “Drag Race” for them! I’m up to the task – I have the Charisma, Uniqueness, Nerve and Talent to make the best “Drag Race Graph” you done ever seen.

Ready? ‘Cause if you want to understand graph databases, you better work!

To get started, I searched around the internet for data about the show and found a recent data science effort called RuPaul-Predict-a-Looza, which tries to build analytical models to predict the winners and losers of upcoming “Drag Race” episodes before they air.

I was relieved to find that I wasn’t the only data geek to be looking at the show in this way! Their dataset was exactly what I was looking for, so I loaded it into Neo4j.

Here’s the data model I came up with:

```cypher
// On Neo4j 4.0 and later, use CALL db.schema.visualization() instead
CALL db.schema()
```

The graph is built around two primary node types: `Contestants` and `Episodes`.

We have some information about where `Contestants` come from – their `Home Towns` and their `Home States`.

We can see which `Season` each `Contestant` appeared in, as well as their season `Ranking` (with a ranking of ‘1’ being the season’s winner). Each `Season` has a number of `Episodes`, and each `Episode` has a `Type` (`Casting`, `Competition`, `Finale`, `Recap` or `Reunion`).

Each `Competition Episode` also has a `Maxi-Challenge Type` (`Comedy`, `Personal Branding`, `Acting`, etc.), which tracks what kind of main challenge the `Contestants` had to face in that `Episode`.

Between `Contestants` and `Episodes`, we have a number of relationship types that describe the outcome for each `Contestant` in the `Episodes` in which they appeared. We can see who `Won` an `Episode`, who was in the bottom two but not eliminated, who was eliminated, etc.
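For anyone new to graph databases, the key idea here is that the outcome is stored *in the relationship type itself*, not as a property. Here is a minimal Python sketch of that idea (plain Python, not Neo4j code), storing each result as a `(contestant, relationship type, episode)` triple. The relationship type names (`WON`, `SAFE`, `BOTTOM_TWO`, `ELIMINATED`) and the contestant and episode identifiers are hypothetical placeholders; the dataset's actual names may differ.

```python
from collections import defaultdict

# Hypothetical relationship type names and identifiers; the real dataset may differ.
# Each triple mirrors the Cypher pattern (:Contestant)-[:TYPE]->(:Episode).
edges = [
    ("Queen A", "WON", "S1E1"),
    ("Queen B", "SAFE", "S1E1"),
    ("Queen C", "BOTTOM_TWO", "S1E1"),
    ("Queen D", "ELIMINATED", "S1E1"),
]


def outcomes_for_episode(edges, episode):
    """Group the contestants in one episode by the type of their result edge."""
    outcomes = defaultdict(list)
    for contestant, rel_type, ep in edges:
        if ep == episode:
            outcomes[rel_type].append(contestant)
    return dict(outcomes)


print(outcomes_for_episode(edges, "S1E1"))
```

Because the outcome is the edge's type, "who won this episode?" becomes a filter on relationship type, which is exactly what a Cypher pattern like `-[:WON]->` expresses.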

Now that we understand how our graph is structured, we can have a look at the data and explore the graph in more detail. For instance, I have it on good authority that Season 4 is really “Drag Race” at its best.

An overview graph for this season looks like this:

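```cypher
// Season 4 contestants and every episode in the season
MATCH (c:Contestant)-[ais:APPEARS_IN_SEASON]->(s:Season {number: 4})-[he:HAS_EPISODE]->(e:Episode)
// Each contestant's final season ranking
MATCH (c)-[hsr:HAS_SEASON_RANKING]->(r:Ranking)
// Each episode's type (Competition, Recap, Finale, ...)
MATCH (e)-[ht:HAS_TYPE]->(et:EpisodeType)
// Only Competition episodes have a maxi-challenge type, so this match is optional
OPTIONAL MATCH (e)-[hmct:HAS_MAXI_CHALLENGE_TYPE]->(mct:MaxiChallengeType)
RETURN *
```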

We can see that there were 11 `Competition Episodes` in this `Season`, along with a `Recap Episode`, a `Finale Episode` and a `Reunion Episode`.

There were three `Episodes` with an `Acting` challenge and two with a `Sewing` challenge, while the rest of the `Episodes` in this season had distinct types of challenges: `Singing`, `Personal Branding`, `Makeover`, etc.

There were 13 contestants in this season, and since Alisa Summers was eliminated first, she has a `Ranking` of 13. Season 4 had two runners-up, Phi Phi O’Hara and Chad Michaels, each with a `Ranking` of 2. The winner of Season 4 was Sharon Needles, and if we look at her graph in more detail, we see the following structure:

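```cypher
// Sharon's hometown, every episode she has a result in, and each episode's season
MATCH (hc:City)<-[ht:HOMETOWN]-(c:Contestant {name: 'Sharon Needles'})-[r]->(e:Episode)<-[he:HAS_EPISODE]-(s:Season)
// Her home state, which must also contain her hometown
MATCH (c)-[hs:HOME_STATE]->(st:State)<-[i:IN]-(hc)
// Challenge types, where the episode has one
OPTIONAL MATCH (e)-[hmct:HAS_MAXI_CHALLENGE_TYPE]->(mct:MaxiChallengeType)
RETURN *
```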

Sharon hails from Pittsburgh, Pennsylvania, and by her own account, she looks spooky but is really nice. Before being announced as the winner in Season 4, Episode 14, she won four challenges: `Sewing`, `Commercial`, `Acting` and `Ball`. Her strength seems to be `Acting` challenges, since she won one and was in the `High Group` (up for consideration as the winner) for another. She seems to have had a more difficult time with her `Makeover` challenge (she was in the `Low Group`, in contention for the bottom two) and her `Singing` challenge (she was in the bottom two and had to lip sync for her life).
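The "strength" reading above is really a tally of results per challenge type. Here is a minimal Python sketch of that tally, using only the results described in the paragraph above (not her full season) and treating `Won` and `High Group` as positive outcomes. The result labels (`WON`, `HIGH`, `LOW`, `BOTTOM_TWO`) are hypothetical stand-ins for the dataset's relationship types.

```python
from collections import Counter

# Sharon Needles' results as described above, as (challenge type, outcome) pairs.
# Only the episodes mentioned in the text are included; outcome labels are hypothetical.
results = [
    ("Sewing", "WON"), ("Commercial", "WON"), ("Acting", "WON"), ("Ball", "WON"),
    ("Acting", "HIGH"), ("Makeover", "LOW"), ("Singing", "BOTTOM_TWO"),
]

# Assumption: winning or landing in the High Group counts as a strong showing
POSITIVE = {"WON", "HIGH"}

strengths = Counter(ch for ch, outcome in results if outcome in POSITIVE)
struggles = Counter(ch for ch, outcome in results if outcome not in POSITIVE)

print(strengths.most_common(1))  # [('Acting', 2)]
print(sorted(struggles))         # ['Makeover', 'Singing']
```

In the graph, the same tally would come from grouping her result relationships by the `MaxiChallengeType` of each episode.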

I’m curious how contestants from my `Home State` have fared.

I was born in Connecticut, but I lived in New York for so long that I consider it home. I’m proud to say that New York has produced the most “Drag Race” `Contestants` of any state (28 in total, 25 from New York City) including three winners! You go, New York!

```cypher
// Contestants whose hometown is in New York State, with their season rankings
MATCH (r:Ranking)<-[hsr:HAS_SEASON_RANKING]-(c:Contestant)-[h:HOMETOWN]->(ht:City)-[:IN]->(:State {name: 'New York'})
RETURN *
```

In the spirit of the RuPaul-Predict-a-Looza, I’m also curious to see what kinds of insights we can get from our graph.

One common graph use case in retail, telecom and other B2C and B2B businesses is Customer Journey Analytics: looking at a series of events or actions by a customer and checking whether there’s a pattern we can use to predict outcomes.

For example, it might be that there are patterns of events in a customer’s history – purchases, complaints, account changes, social media posts, etc. – which indicate that they are about to “churn” or take their business elsewhere.

I wonder if there’s a pattern of contestant results that’s common to winners of each “Drag Race” season.

Let’s look at the results from the first three challenges each season winner faced, and see if there are any commonalities:

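```cypher
// Season winners have a Ranking with position 1
MATCH (:Ranking {position: 1})<-[:HAS_SEASON_RANKING]-(c:Contestant)-[result]->(e:Episode)
// Sort each winner's results by episode number before collecting them
WITH c, result, e ORDER BY e.number
WITH c, collect(type(result)) AS resultTypes
// Keep only the results from the first three episodes
WITH c, resultTypes[0..3] AS firstThree
// Group winners who share the same three-result pattern
RETURN firstThree, count(c) AS frequency, collect(c.name) AS contestants
ORDER BY frequency DESC
```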

Of the 11 season winners, seven each had a unique pattern of results in the first three episodes of their seasons. Two of them, BeBe Zahara Benet from Season 1 and Bob the Drag Queen from Season 8, were `Safe` in Episodes 1 and 2 and `Won` Episode 3 of their respective seasons.

Another two winners, Bianca Del Rio from Season 6 and Violet Chachki from Season 7, both `Won` Episode 1, were in the `High Group` in Episode 2, and were `Safe` in Episode 3 of their respective seasons. With only 11 winners, these shared patterns aren’t statistically significant, but they’re certainly interesting.
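The Cypher query above sorts each winner's results by episode, keeps the first three, and groups winners by that three-result sequence. The same steps can be sketched in plain Python. This sketch includes only the four winners whose patterns are described above (the other seven winners' results aren't listed in this post), and the result labels (`SAFE`, `WON`, `HIGH`) are hypothetical stand-ins for the dataset's relationship types.

```python
from collections import defaultdict


def first_three_patterns(histories):
    """Group contestants by the outcomes of their first three episodes.

    histories maps a contestant name to (episode_number, result_type) pairs
    in any order; sorting them mirrors ORDER BY e.number before collect().
    """
    groups = defaultdict(list)
    for name, results in histories.items():
        ordered = [rtype for _, rtype in sorted(results)]
        # Equivalent to resultTypes[0..3] in the Cypher query
        groups[tuple(ordered[:3])].append(name)
    # Most common patterns first, like ORDER BY frequency DESC
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


winners = {
    "BeBe Zahara Benet": [(1, "SAFE"), (2, "SAFE"), (3, "WON")],
    "Bob the Drag Queen": [(3, "WON"), (1, "SAFE"), (2, "SAFE")],
    "Bianca Del Rio": [(1, "WON"), (2, "HIGH"), (3, "SAFE")],
    "Violet Chachki": [(1, "WON"), (2, "HIGH"), (3, "SAFE")],
}

for pattern, names in first_three_patterns(winners):
    print(pattern, len(names), names)
```

Bob the Drag Queen's results are deliberately listed out of order to show that the sort, not the input order, determines the pattern.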

If we look for these patterns among the `Contestants` of “Drag Race” UK Series 1, hoping to predict the winner, we might put our money on Divina De Campo. She fits the first pattern above: she was `Safe` in Episode 1, `Safe` in Episode 2 and `Won` Episode 3 (with a fierce Bowie-esque look made from plaid plastic carrier bags).

You heard it here first!
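Under the hood, that prediction is just an exact-match lookup of a contestant's first three results against the winners' patterns. Here is a minimal Python sketch, assuming the two shared patterns quoted above are the only ones worth matching; the result labels are hypothetical stand-ins for the dataset's relationship types.

```python
# Shared winner patterns quoted in this post (results from the first three episodes).
# Labels are hypothetical stand-ins for the dataset's relationship types.
WINNER_PATTERNS = {
    ("SAFE", "SAFE", "WON"): ["BeBe Zahara Benet", "Bob the Drag Queen"],
    ("WON", "HIGH", "SAFE"): ["Bianca Del Rio", "Violet Chachki"],
}


def matching_winners(first_three):
    """Return past winners whose first three results exactly match these."""
    return WINNER_PATTERNS.get(tuple(first_three), [])


divina = ["SAFE", "SAFE", "WON"]  # "Drag Race" UK Series 1, Episodes 1 to 3
print(matching_winners(divina))  # ['BeBe Zahara Benet', 'Bob the Drag Queen']
```

An exact match only says that a contestant's start resembles some past winners' starts; it assigns no probability to the outcome.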

Now it’s time for me to sashay away, though I’ll be back to graph another day.

If you followed along with today’s blog post and have even more ideas about how to use our Drag Race Graph, then condragulations – you’re officially a Graphista! If not, then I’m sorry, dear, but you’re up for elimination (just kidding, sort of).

Either way, remember: If you can’t love yourself, how the hell are you going to love somebody else? Can I get an amen?

[Graphs are everywhere – even on the runway! This is another example of a knowledge graph, and there are so many ways we could further expand and enrich it – with social media data, for example, or information about how the contestants got along with each other (or didn’t) during the season. Knowledge graphs can be used to represent any information domain, from tea to “Drag Race” to engineering data from NASA. The sky’s the limit!]

Want to learn more about graph databases and Neo4j? Register for our online training class, Introduction to Graph Databases, and master the world of graph technology in no time.