Unearthing Historical Networks Using Neo4j and Archival Documents

Editor’s Note: This presentation was given by Corey Clawson at NODES 2019 in October 2019.

Project Goals

This project grew out of my graduate studies, which examined archipelagic studies and archive studies and what the two fields have in common.

Initial goals for this project included the following:
    • Offer a means of considering/modeling artistic influence.
    • Consider the cultural flows of concepts of sexuality and gender as well as artistic strategies.
    • Map artistic communities’ formation via processes of migration and urbanization.
Eventually, I found that the connectedness and dispersion at the heart of those fields went hand-in-hand with Neo4j’s flexible structures.

In this way, the project looks back at the historical documents that exist, connects people through letters and translations, and explores what networks emerge: how artists, writers, and composers wrote to and influenced one another. It also traces how, over the course of the 19th and 20th centuries, certain cultural centers emerged as people moved to cities like Paris.

Here’s a mock-up we’ve put together for grant applications:

We hope for this project to eventually be public-facing, and to serve as an opportunity for the public to look at queer histories, folks, and communities.

Throughout this project, Neo4j offered opportunities for us to explore histories in terms of the letters, translations, and various flows through which this information and art traveled the world.

We started by examining similar projects. Six Degrees of Francis Bacon takes a comparable approach, tracing how Elizabethan and 17th-century figures were connected through letters and other documents. Similar projects have examined the French Enlightenment, as well as thinkers of the 19th and 20th centuries, including the Peruvian poet César Vallejo.

Project Stages

The first step of our process was to identify the figures we wanted to study and locate the materials to parse. In this phase, we also entered the data into Neo4j and decided how to structure it through labels and properties.
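As a rough illustration of what entering one letter into Neo4j could look like (a minimal sketch, not the project's actual code), the Python below builds a parameterized Cypher statement. The `Person` label and the `name` and `date` properties are assumptions made for the example; only the `LETTER_TO` relationship type comes from the talk. The people and date are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Letter:
    sender: str
    recipient: str
    date: str  # assumed property; the talk does not list exact property names


def letter_to_cypher(letter: Letter) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher statement for one letter.

    MERGE avoids creating duplicate Person nodes when the same
    correspondent appears in several letters.
    """
    query = (
        "MERGE (s:Person {name: $sender}) "
        "MERGE (r:Person {name: $recipient}) "
        "MERGE (s)-[:LETTER_TO {date: $date}]->(r)"
    )
    params = {
        "sender": letter.sender,
        "recipient": letter.recipient,
        "date": letter.date,
    }
    return query, params


# Placeholder data, not taken from any archive.
query, params = letter_to_cypher(Letter("Author A", "Author B", "1900-01-01"))
print(query)
print(params)
```

Keeping the values in a separate parameter dictionary, rather than formatting them into the query string, means names containing quotes or apostrophes do not need special handling.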

Eventually, we’d like to adapt this into a tool that lets the public explore history at a large scale. We also hope to receive funding as we scale up this project.


A note on our sources: We looked at works of literature, published letters, translations, and adaptations of one another’s work. For example, someone might write a poem that becomes an opera, or someone might adapt the life of a writer into a musical. These types of adaptations within queer communities are ways we can measure the influence of one writer upon an entire historical lineage.

Archives 101: Arbiters/Enforcers of History

Archives are our arbiters and enforcers of history. Many political questions go into what gets archived, because archives take up resources and cultural capital (for example, which poets are major enough to have their papers saved for centuries?).

One way to access these archives without obtaining permission from literary estates or visiting them in person is to use the public information found in a finding aid.

These finding aids are organized in various ways. Some have minimal contents, maybe just an outline of what papers exist, which might only include the date and the person the author wrote to. Sometimes, you don’t even have a full name, so it’s a process that can’t be entirely computerized.

On the other hand, more detailed projects like the online Walt Whitman Archive provide a lot of information, including the date of each letter, where it was written, and annotations about the people mentioned in it.

Above on the right, you’ll see a letter from Bram Stoker, who went on to write Dracula later in life. As shown, Whitman responded to the fan letter Stoker sent him as a younger man.
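Because sparse finding aids sometimes lack a full name or a date, one way to picture the partly manual workflow is a triage step that separates entries ready for loading from those a person needs to check. The sketch below is an assumption-heavy illustration: the `FindingAidEntry` shape, the name heuristic, and the sample entries are all invented for the example and do not describe any real finding aid.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FindingAidEntry:
    correspondent: str          # name exactly as transcribed from the finding aid
    date: Optional[str] = None  # sparse finding aids may omit this


def needs_manual_review(entry: FindingAidEntry) -> bool:
    """Return True when the name looks incomplete or the date is missing."""
    parts = entry.correspondent.replace(".", " ").split()
    # Heuristic: a usable name has at least two parts, none of them a bare initial.
    has_full_name = len(parts) >= 2 and all(len(p) > 1 for p in parts)
    return not has_full_name or entry.date is None


def triage(
    entries: List[FindingAidEntry],
) -> Tuple[List[FindingAidEntry], List[FindingAidEntry]]:
    """Split entries into (ready_to_load, needs_review)."""
    ready: List[FindingAidEntry] = []
    review: List[FindingAidEntry] = []
    for entry in entries:
        (review if needs_manual_review(entry) else ready).append(entry)
    return ready, review


# Invented sample entries for illustration only.
entries = [
    FindingAidEntry("Jane Example", "1900-01-01"),
    FindingAidEntry("J. Example", "1900-01-01"),
    FindingAidEntry("Example"),
]
ready, review = triage(entries)
print([e.correspondent for e in ready])
print([e.correspondent for e in review])
```

A heuristic like this only narrows down the manual work; deciding who "J. Example" actually is still requires a human reading the surrounding materials.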

Initial Thoughts and Revising

Finally, this is how we’ve organized our project:

You’ll see certain properties and types of relationships. We have PARTNER_OF, LETTER_TO, TRANSLATED, and LIT_CRIT_ON (literary criticism of one another). Rather than housing all of this information as properties of a given person or letter, we’ve moved toward adding more nodes and labels.
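To make the shift from properties to nodes more concrete, here is a small sketch of how a flat record might be split into separate nodes and relationships. The input shape, the `Letter` and `Archive` labels, and the `WROTE`, `SENT_TO`, and `HELD_BY` relationship types are hypothetical names chosen for the example; the talk does not specify which nodes and labels were added.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, Dict[str, str]]  # (label, properties)
Edge = Tuple[str, str, str]        # (source key, relationship type, target key)


def split_into_nodes(record: dict) -> Tuple[Dict[str, Node], List[Edge]]:
    """Turn letters stored as a property on a person into their own nodes."""
    person = record["name"]
    nodes: Dict[str, Node] = {person: ("Person", {"name": person})}
    edges: List[Edge] = []
    for i, letter in enumerate(record["letters"]):
        letter_key = f"{person}/letter/{i}"
        nodes[letter_key] = ("Letter", {"date": letter["date"]})
        # setdefault keeps an existing node if the recipient or archive was seen before.
        nodes.setdefault(letter["recipient"], ("Person", {"name": letter["recipient"]}))
        nodes.setdefault(letter["archive"], ("Archive", {"name": letter["archive"]}))
        edges.append((person, "WROTE", letter_key))
        edges.append((letter_key, "SENT_TO", letter["recipient"]))
        edges.append((letter_key, "HELD_BY", letter["archive"]))
    return nodes, edges


# Hypothetical "before" shape: everything lives as properties on the person.
record = {
    "name": "Author A",
    "letters": [
        {"recipient": "Author B", "date": "1900-01-01", "archive": "Example Library"},
    ],
}
nodes, edges = split_into_nodes(record)
for edge in edges:
    print(edge)
```

Once a letter or an archive is a node of its own, it can be queried directly, for example to list everything a given library holds, without scanning the properties of every person.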

Below, you’ll see three of our outputs, including letters we’ve collected to and from Walt Whitman:

We also have documents pertaining to Whitman held by the Beinecke Library at Yale, which is a really significant source for the papers of queer writers:

Finally, we have mentions of Whitman’s “Calamus,” one of the more significant and foundational works that people mention in their letters:

Bigger Questions to Consider

Through this project, we hope to consider bigger questions, such as how race and gender are segregated within these communities, and whether we can quantify the centrality of certain figures and recover lost relationships through, for example, token recognition.
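One simple way to start quantifying centrality is degree centrality: the share of other people in the network that a given figure corresponded with. The sketch below computes it in plain Python over a list of sender and recipient pairs. It is only an illustration with placeholder names, and it is an assumption that degree centrality (rather than some other measure) is the right fit for these questions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def degree_centrality(letters: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality, ignoring letter direction and repeats."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipient in letters:
        if sender == recipient:
            continue  # skip self-addressed entries
        neighbors[sender].add(recipient)
        neighbors[recipient].add(sender)
    n = len(neighbors)
    if n < 2:
        return {person: 0.0 for person in neighbors}
    # Divide by the number of other people each person could be connected to.
    return {person: len(others) / (n - 1) for person, others in neighbors.items()}


# Placeholder correspondence, not taken from any archive.
letters = [
    ("Author A", "Author B"),
    ("Author C", "Author B"),
    ("Author B", "Author A"),
]
scores = degree_centrality(letters)
print(scores)
```

Here "Author B" scores 1.0 because they are connected to both other people, while the other two score 0.5. Any such score reflects only the letters that survived and were archived, so it measures centrality in the archive rather than in the historical community itself.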

A big thanks to my partner at StudioSucreBleu; Andrew Pankratz, who has been a fantastic technical resource; and my professors who’ve encouraged and pushed this project in different directions.
