Graph of a musical group's albums, songs and lyrics


The Idea

Being the dad of a teenage daughter means I listen to a lot of current music: Lady Gaga, Taylor Swift. Recently it is all about One Direction. As “” recently said, “One Direction owns the internet in 2015.” Sometimes I hear “this is a sad song” or “this is a happy one”. What could I learn about their music using Neo4j? Could one derive any sort of sentiment from the lyrics? Could I get my daughter interested in this? Only one way to find out…

How to start

The first step was to learn more about the group. There are currently four members, but for most of their albums there were five: Harry Styles, Niall, Liam, Zayn and Louis. They have released five albums: Four, Take Me Home, Up All Night, Midnight Memories and Made in the A.M. With the help of my daughter we found a site that had the lyrics to all of the songs. What I found was that while some of the song files contained information about who was singing which section, many did not. I was hoping that the sentiment analysis could be aided by knowing the singer. Maybe Harry always sings sad break-up songs (he did date Taylor Swift). Since this information isn’t consistent, I couldn’t count on it.

Song sentiment?

I felt it was important to be able to track lyrics by location in the song: row and column. This way one could query “what words appear most often at the start (0,0) of a song?” or “how often do certain word combinations (“I” and “you”) appear on the same line?” This last question could be useful in better understanding sentiment.


Tools: Python, py2neo, R and RNeo4j.

The Model

The first step was to organize the songs into files by album. Once this was done it was simple to get Python to read in a list of albums, song titles, and lyrics (words). The graph…

I decided that a Group node would refer to a band or singer. A group would be made up of members, and members were artists. For bands this is fine. I made the choice to treat solo acts the same way, so Lady Gaga or Taylor Swift would each be considered a group, member and artist.


  • Group

  • Member

  • Artist

  • Album

  • Song

  • Lyric


  • Album BY Group

  • Lyric IN Song

  • Song ON Album

  • Member ISA_ARTIST Artist

  • Group HAS_MEMBER Member
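The location-aware part of this model can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for the gist: the function name `tokenize_lyrics`, the punctuation stripping, the skipping of blank lines, and the zero-based row and column numbering (suggested by the “(0,0)” start position mentioned earlier) are all assumptions.

```python
def tokenize_lyrics(text):
    """Return a list of (word, row, column) tuples for a song's lyrics.

    Rows are lines and columns are word positions within a line,
    both counted from zero (an assumption for this sketch).
    """
    records = []
    row = 0
    for line in text.splitlines():
        # Strip surrounding punctuation but keep apostrophes inside words.
        words = [w.strip('.,!?;:"()') for w in line.split()]
        words = [w for w in words if w]
        if not words:
            continue  # skip blank lines so they do not consume a row number
        for column, word in enumerate(words):
            records.append((word, row, column))
        row += 1
    return records


sample = "I love you\nI used to love you"
for record in tokenize_lyrics(sample):
    print(record)
```

Each tuple could then become a Lyric node with `name`, `row` and `column` properties, linked to its Song with an IN relationship.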


For the gist I restricted the data to one song per album and reduced the lyrics by two thirds. Even with this there are still 581 lyric nodes but only 232 unique words. The difference is due to words being repeated in different locations. The word “you” is found 28 times in the five songs.
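The gap between the node count and the unique-word count follows from keying each lyric node on its word plus its location. A rough Python sketch of that bookkeeping, assuming a node is identified by the (word, row, column) triple and using made-up records rather than the gist data:

```python
from collections import Counter

# Hypothetical (word, row, column) records for two short songs.
song_a = [("I", 0, 0), ("love", 0, 1), ("you", 0, 2)]
song_b = [("I", 0, 0), ("used", 0, 1), ("to", 0, 2), ("love", 0, 3), ("you", 0, 4)]

all_records = song_a + song_b

# A lyric node is assumed to be unique per (word, row, column), so the
# shared ("I", 0, 0) collapses into a single node used by both songs.
nodes = set(all_records)
unique_words = {word for word, _, _ in nodes}

# Count how many times each word occurs across the songs.
occurrences = Counter(word for word, _, _ in all_records)

print(len(nodes), "lyric nodes")
print(len(unique_words), "unique words")
print(occurrences.most_common(3))
```

The same word at a different row or column produces a new node, which is why the node count grows faster than the vocabulary.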

Find all songs where the word "my" appears

MATCH (l:Lyric{name:"my"})-[r0:IN]-(s:Song) RETURN s.name, l.row, l.column

Show distinct lyrics in the song "If I Could Fly"

MATCH (n:Lyric)-[r0:IN]-(s:Song{name:"If I Could Fly"}) RETURN DISTINCT (n.name)

Find lyrics that match "said", ignoring case

MATCH (l:Lyric)-[r0:IN]-(n:Song) WHERE l.name =~ "(?i)said" RETURN n,l

Show all lyrics in Act My Age.


Show all artists and members for the group


Show all songs on all of the albums. For the gist there is only one song per album.


Show all albums and members for the group


Show all of the lyrics for the song "Kiss You". There are some connections from lyrics to other songs. This is because those lyrics are used in the same location. The lyric "Baby" is used in "Kiss You" and "What Makes You Beautiful" in the same row and column.


A query to find songs where the words “I” and “you” are on the same line. The query works well in Python since I can filter out return values of 0. This type of search will be helpful when looking for phrases or words on the same line.

MATCH (l1:Lyric {name: 'I'})--(s:Song)
MATCH (l2:Lyric {name: 'you'})--(s:Song)
RETURN CASE WHEN l1.row = l2.row THEN [l1, l2, s] ELSE 0 END
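The same-line check can also be sketched entirely on the Python side. The example below does not call Neo4j; it assumes the lyric rows have already been fetched as (song, word, row, column) tuples, and the helper name `same_line_pairs` and the sample rows are hypothetical.

```python
from collections import defaultdict


def same_line_pairs(records, first, second):
    """Return sorted (song, row) pairs where both words occur on one line.

    `records` is an iterable of (song, word, row, column) tuples.
    """
    words_by_line = defaultdict(set)
    for song, word, row, _column in records:
        words_by_line[(song, row)].add(word)
    return sorted(
        line for line, words in words_by_line.items()
        if first in words and second in words
    )


# Illustrative rows only, not the gist data.
rows = [
    ("Song A", "I", 0, 0), ("Song A", "love", 0, 1), ("Song A", "you", 0, 2),
    ("Song A", "I", 1, 0), ("Song A", "know", 1, 1),
    ("Song B", "you", 0, 0), ("Song B", "said", 0, 1),
]
print(same_line_pairs(rows, "I", "you"))
```

Grouping by (song, row) first means only matching lines are returned, so there are no placeholder values to filter out afterwards.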


Song: Act My Age

Actual line, row 3: "I can count on you after all that we’ve been through"

Song: If I Could Fly

Actual line, row 5: "I hope that you listen 'cause I let my guard down"

Sentiment and R

While not an R expert, I found examples to help make a start.

Below is a bar chart of the top ten most common lyrics. “I” and “you” are popular.


Sentiment

The last thing to consider is sentiment. Using the simple process of counting positive and negative words, I’d like to see if one can make a determination of sentiment. There isn’t a song word list that I could find, so I elected to use the AFINN list. Following examples from Jeffrey Breen and Andy Bromberg I was able to get some results. I didn’t divide the songs up into training and test sets; instead I picked two songs and processed them. My daughter suggested that “Best Song Ever” would be happy and “If I Could Fly” would be sad.

The process starts with a query:

library(RNeo4j)

graph = startGraph("https://localhost:7474/db/data/")
query = "MATCH (l:Lyric)-[r0:IN]-(n:Song{name:'best song ever'}) RETURN l.name"
ta = cypher(graph, query)

This returned a list of lyrics. Next I counted the number of lyrics that matched a positive or negative word in the AFINN list. I classified the words into “reg” (scale 1-3) and “very” (scale 4-5) for both positive and negative.
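The counting step can be sketched as follows. The analysis here was done in R; the Python below is only an illustration, with a tiny made-up lexicon standing in for the AFINN list (whose integer scores run from -5 to +5) and a hypothetical function name, `bucket_counts`.

```python
def bucket_counts(words, lexicon):
    """Count matched words by polarity and strength.

    Scores with magnitude 1-3 count as "reg" and 4-5 as "very",
    mirroring the classification described above.
    """
    counts = {
        "positive": {"reg": 0, "very": 0},
        "negative": {"reg": 0, "very": 0},
    }
    for word in words:
        score = lexicon.get(word.lower())
        if not score:
            continue  # word is not in the lexicon (or is scored 0)
        polarity = "positive" if score > 0 else "negative"
        strength = "reg" if abs(score) <= 3 else "very"
        counts[polarity][strength] += 1
    return counts


# Placeholder scores for illustration; these are not the real AFINN values.
toy_lexicon = {"love": 3, "amazing": 4, "sad": -2, "awful": -4}

lyrics = ["I", "love", "you", "so", "sad", "but", "amazing"]
print(bucket_counts(lyrics, toy_lexicon))
```

The resulting two-by-two counts per song are the kind of features that could then be handed to a classifier.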

I then used the R functions naiveBayes() and predict(). The method is very simple, but the results do suggest that “Best Song Ever” is “happier” than “If I Could Fly”. It would be good to get One Direction’s opinion on this.

“Best Song Ever”

          reg  very
positive   10     3
negative    3     0

“If I Could Fly”

          reg  very
positive    1     0
negative    4     0

One thing I noticed is that simple word matching isn’t sufficient. For movie reviews or emails this may work. Songs are more complex.

For example, a happy song might have the line “I love you” while a sad song might have the line “I used to love you”. Both contain the positive word “love”, but the second line could be viewed as sad: love lost. This is where querying lyrics on the same line could help. It’s more complex than matching positive and negative words.
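A few lines of Python make the limitation concrete. This is a sketch with a one-word placeholder lexicon, not the real AFINN list, and `line_score` is a hypothetical helper.

```python
# Placeholder lexicon with a single positive word; not the real AFINN list.
toy_lexicon = {"love": 3}


def line_score(line, lexicon):
    """Sum the lexicon scores of the words on one line; unknown words score 0."""
    return sum(lexicon.get(word.lower(), 0) for word in line.split())


happy = line_score("I love you", toy_lexicon)
sad = line_score("I used to love you", toy_lexicon)
print(happy, sad)  # both lines get the same score
```

Because only “love” matches, word-level scoring cannot tell the two lines apart; the surrounding words on the line carry the difference.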

Conclusion

This was fun and I got a little father-daughter time in as well. I’d like to pursue this to see what can be done by considering phrases and connected words.