By Aileen Agricola | August 28, 2012
Matthew Jockers of the University of Nebraska analysed 3,592 digitized novels published in the UK, Ireland and the US between 1780 and 1900 using a combination of Google’s algorithm, machine learning and a series of techniques used in computational text analysis including stylometry, corpus linguistics and network analysis.
Network analysis allowed Jockers to visualize the thematic distances between each novel. “Networks are constructed out of nodes (books) and edges (distances). When plotted, nodes with less similarity (i.e. with larger distances between them) will spread out further in the network,” he explains in a paper detailing his methodology. By generating different visual models of the network it was possible for Jockers to witness the ebb and flow of certain popular themes evolving over the century. The above photo depicts the links between the network’s nodes according to author gender.
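To make the node-and-edge idea concrete, here is a minimal sketch in Python. It is not Jockers's actual pipeline: the book labels, the three-value theme profiles and the choice of Euclidean distance are all assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical theme proportions per book (invented for illustration only;
# these are not values from Jockers's corpus).
books = {
    "book_a": [0.6, 0.3, 0.1],
    "book_b": [0.5, 0.4, 0.1],
    "book_c": [0.1, 0.2, 0.7],
}


def euclidean(u, v):
    """Straight-line distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_network(features):
    """Return {(title_1, title_2): distance} for every pair of books.

    Each book is a node; each pair of books is an edge weighted by distance.
    """
    return {
        (s, t): euclidean(features[s], features[t])
        for s, t in combinations(sorted(features), 2)
    }


edges = build_network(books)

# The shortest edge joins the most similar pair, the longest the least similar.
closest = min(edges, key=edges.get)
farthest = max(edges, key=edges.get)
print("most similar pair:", closest)
print("least similar pair:", farthest)
```

In a layout driven by these weights, the pair joined by the shortest edge would sit close together, while the pair joined by the longest edge would drift apart, which is the spreading effect Jockers describes.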
“This visualisation reveals that works by female authors (coloured light gray) and male authors (black) are more stylistically and thematically homogeneous within their respective gender classes,” writes Jockers. “As a result of this similarity in ‘signals’, female-authored books cluster together on the south side of the main network, while male-authored books are drawn together in the north.”
The visualisation also revealed a few interesting anomalies, such as the fact that Harriet Beecher Stowe’s 1852 novel Uncle Tom’s Cabin shares more similarities with novels written by male authors than with those written by female authors.
Ultimately, Austen and Scott both came out on top, with Jockers referring to them as “the literary equivalent of Homo erectus or, if you prefer, Adam and Eve”. However, since both writers were active towards the beginning of Jockers’s chosen timeframe, it was impossible to get a good view of who influenced them. Widening the timeframe would provide more details as to the source of the two writers’ appeal.