GraphGists

Introduction

After attending plays around Seattle, I became curious about the relationships between theater companies, plays, actors, directors, and crew. This GraphGist starts to explore some of that data, using the production history of the Seattle Shakespeare Company.

Model

  • Person: represents a person! Unique by name in this simplified model; may have relationships such as WROTE and DIRECTED.

  • Play: represents a written play - a Person WROTE the Play, which may be STAGED in a Production. Properties might include year written, year published, and genre.

  • Production: represents a performance or group of performances of a given play. The Production will generally be DIRECTED by a Person, and may include properties indicating the start and end year and month, and whether it was performed as part of a tour or in parks.
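As a rough sketch of how the model above might look in Cypher, the statements below create one play with a single production. The property names name, genre, title, onTour, and inParks come from the queries in this GraphGist; startYear and endYear, the director's name, the production values, and the direction of the DIRECTED relationship are assumptions made only for illustration.

```cypher
// Illustrative sample only: the director and the Production values are placeholders,
// and startYear/endYear are assumed property names.
CREATE (w:Person {name: "William Shakespeare"})
CREATE (d:Person {name: "Example Director"})
CREATE (play:Play {name: "Romeo and Juliet", genre: "Tragedy"})
CREATE (prod:Production {title: "Romeo and Juliet (example)",
                         startYear: 2000, endYear: 2000,
                         onTour: true, inParks: false})
// A Person WROTE the Play, which is STAGED in a Production
CREATE (w)-[:WROTE]->(play)
CREATE (prod)-[:STAGED]->(play)
// Assumed direction: the Person DIRECTED the Production
CREATE (d)-[:DIRECTED]->(prod)
```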

Graph Visualization

Queries

Care to guess which play has been performed the most?

MATCH (prod:Production)-->(play:Play)<-[:WROTE]-(w:Person)
WHERE w.name = "William Shakespeare"
RETURN play.name, count(prod.title) AS n_productions
ORDER BY n_productions DESC

There have been 11 productions of Romeo and Juliet! Probably not a big surprise that this one tops the list - let’s take a look at those productions:
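One way to list those productions, sketched here on the assumption that each Production carries the title, onTour, and inParks properties used elsewhere in this GraphGist:

```cypher
// List every production of Romeo and Juliet with its touring and parks flags
MATCH (prod:Production)-[:STAGED]->(play:Play)
WHERE play.name = "Romeo and Juliet"
RETURN prod.title, prod.onTour, prod.inParks
ORDER BY prod.title
```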

It looks like Romeo and Juliet is a reliable touring production. When I remove touring productions & park performances, how does the list of top plays change?

MATCH (prod:Production)-->(play:Play)<-[:WROTE]-(w:Person)
WHERE w.name = "William Shakespeare"
AND prod.onTour = false AND prod.inParks = false
RETURN play.name, count(prod.title) AS n_productions
ORDER BY n_productions DESC

For the Shakespeare plays, I’m also curious about genre - how often are comedies performed compared to histories?

MATCH (prod:Production)-->(play:Play)<-[:WROTE]-(w:Person)
WHERE w.name = "William Shakespeare"
RETURN play.genre, count(prod.title) AS n_productions
ORDER BY n_productions DESC

Which Shakespeare plays haven’t yet been performed by the Seattle Shakespeare Company?

MATCH (writer:Person)-[:WROTE]->(play:Play)
WHERE writer.name = "William Shakespeare"
AND NOT (play)<-[:STAGED]-()
RETURN play.name, play.genre

Only eight!

Taking advantage of Neo4j’s flexible data model

There are several cases where a production was "adapted", "conceived", or "compiled" rather than written: Henry IV parts I and II were combined into a single performance, scenes from several Shakespeare plays were performed in a single production, and so on. The flexible schema of Neo4j allowed for distinct relationship types to preserve those edge cases:

MATCH (person:Person)-[r]-(play:Play)
WHERE NOT type(r) IN ["WROTE", "DIRECTED"]
RETURN person.name, type(r), play.name;
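For illustration, a relationship of this kind could be recorded roughly as follows. The ADAPTED type name, the person's name, and the play's name are all placeholders rather than values taken from the actual dataset.

```cypher
// Hypothetical example: ADAPTED is an assumed relationship type,
// and both node names are placeholders.
MERGE (person:Person {name: "Example Adapter"})
MERGE (play:Play {name: "Example Combined Play"})
MERGE (person)-[:ADAPTED]->(play)
```

Because relationship types need no up-front declaration, a new one like this can sit alongside WROTE without any schema change.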