By Aileen Agricola | April 7, 2016
Excerpt from article published by Datanami
After reconstructing the database schema, the ICIJ used the open source extract, transform, and load (ETL) tool from Talend to load the data into Neo4j. The graph database consisted of the names and addresses of the 215,000 business entities that Mossack Fonseca had created, as well as the names of the principals and intermediaries involved. Each entity had at least three individuals attached to it, resulting in a graph with about a million nodes. It’s not huge by graph database standards, but plenty big enough to require big data tech.
Once the database was set up, it was a simple matter to install and configure Linkurious to essentially provide a GUI (graphical user interface) atop the database. Having the visual depiction of the graph of names and addresses was critical in making sense of the data, especially for non-technical reporters.
“That was really key,” Cabra said. “Sometimes when you search documents, you don’t see patterns of who’s connected to who. It’s very difficult. Our brains are not wired to visually see graphs.”
While some of ICIJ’s reporters are tech-savvy, many of them are not. But that didn’t really matter, because the Linkurious interface is so easy to use, Cabra said.
“A lot of reporters actually thought that Linkurious was doing magic,” she said. “They said ‘Oh my God, now I can see it so clearly. This person is connected to this person and I had missed this connection before.’ It was very good to actually see how people connected among themselves and to the companies.”
More sophisticated users could submit queries directly against the graph database using Cypher, Neo4j’s query language. But most opted to use the GUI.
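For readers curious what such a query might look like, here is a minimal sketch in Python. It is not the ICIJ's actual schema or code: the node labels (Officer, Entity), the relationship type (OFFICER_OF), the parameter name, and all of the sample names are assumptions made up for illustration. The Cypher text is shown only as a string, and a small in-memory stand-in answers the same kind of question so the example runs without a database.

```python
from collections import defaultdict

# Hypothetical Cypher text. The labels, relationship type, and parameter
# name are assumptions for illustration, not the ICIJ's real schema.
CYPHER_QUERY = """
MATCH (o:Officer)-[:OFFICER_OF]->(e:Entity {name: $entity_name})
RETURN o.name AS officer
"""

# Made-up (person, company) edges standing in for the graph database.
EDGES = [
    ("Person A", "Company X"),
    ("Person B", "Company X"),
    ("Person B", "Company Y"),
    ("Person C", "Company Y"),
]


def build_index(edges):
    """Index the edges in both directions for quick lookups."""
    people_by_company = defaultdict(set)
    companies_by_person = defaultdict(set)
    for person, company in edges:
        people_by_company[company].add(person)
        companies_by_person[person].add(company)
    return people_by_company, companies_by_person


def officers_of(company, edges=EDGES):
    """In-memory equivalent of the query above: who is attached to a company."""
    people_by_company, _ = build_index(edges)
    return sorted(people_by_company.get(company, ()))


def linked_people(person, edges=EDGES):
    """People who share at least one company with the given person."""
    people_by_company, companies_by_person = build_index(edges)
    linked = set()
    for company in companies_by_person.get(person, ()):
        linked |= people_by_company[company]
    linked.discard(person)  # a person is not their own connection
    return sorted(linked)
```

Calling `linked_people("Person B")` on the sample data returns the two people who share a company with Person B, which is the kind of connection a reporter would otherwise find by clicking through the visual graph.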
“With Linkurious, I have to say, everybody understands how to connect dots basically,” Cabra said. “Without graph visualization, seeing that would have been very labor intensive. … The reality is, when we’ve done that before we’ve missed things. With this, everything is grouped together, and you just have to click on the dots, so to speak, to see who’s behind it.”