Using Neo4j, the ICIJ has built upon its Pulitzer Prize-winning 2016 investigation, the Panama Papers, and has begun adding politicians featured in early Paradise Papers reports to its Offshore Leaks Database.
The new 1.4 TB of data – 13.4 million documents – includes information leaked from trust company Asiaciti and from Appleby, a 100-year-old offshore law firm specializing in tax havens. The files were obtained by German newspaper Süddeutsche Zeitung and shared with the Washington, D.C.-headquartered ICIJ, a network of independent reporting teams around the world.
As in previous investigations, Neo4j plays a key role in revealing the connections between the wealthy, their money and the taxation-friendly countries in which it resides.
The reason: Graph databases excel at managing highly connected data and complex queries.
Instead of using tables the way a relational database does, graph databases define and store data as nodes, relationships and the properties attached to them. Because the connections are stored directly rather than reconstructed through joins, graphs are well suited to analyzing how records relate to one another, which lets journalists “follow the money” more easily than ever.
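To make the idea of nodes, properties and relationships concrete, here is a minimal sketch in plain Python. It is not the ICIJ's data model and does not use Neo4j itself: the node IDs, labels, names and relationship types are all invented for illustration, and the breadth-first search stands in for the kind of path query a graph database would run.

```python
from collections import deque

# Hypothetical sample data. Every ID, label, name and relationship type
# below is invented for illustration and is not taken from the leak.
nodes = {
    "p1": {"label": "Person", "name": "Example Official"},
    "c1": {"label": "Company", "name": "Shell Co A", "jurisdiction": "Bermuda"},
    "c2": {"label": "Company", "name": "Holding Co B", "jurisdiction": "Isle of Man"},
    "a1": {"label": "Address", "city": "Hamilton"},
}

# Each relationship is (start node, relationship type, end node).
relationships = [
    ("p1", "DIRECTOR_OF", "c1"),
    ("c1", "SHAREHOLDER_OF", "c2"),
    ("c2", "REGISTERED_AT", "a1"),
]


def neighbors(node_id):
    """Yield (relationship type, other node id) pairs, ignoring direction."""
    for start, rel_type, end in relationships:
        if start == node_id:
            yield rel_type, end
        elif end == node_id:
            yield rel_type, start


def shortest_path(source, target):
    """Breadth-first search; returns alternating node ids and relationship types."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        for rel_type, other in neighbors(current):
            if other not in visited:
                visited.add(other)
                queue.append(path + [rel_type, other])
    return None  # no connection between the two nodes


# "Follow the money" from a person to an address through two companies.
path = shortest_path("p1", "a1")
print(path)
```

In a relational schema the same question would typically need one join per hop; here each hop is a direct lookup of stored relationships, which is the property the paragraph above describes.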
Unprecedented Volumes of Highly Connected Data
Pierre Romera, chief technology officer of the ICIJ, told Business Insider: “Most of the leaks we get are not structured since they are raw documents.
“With the Paradise Papers, those documents represented 1.4 TB of data and were gathered from different sources. Putting them in a single database was a challenge for us. With Neo4j and [visualisation tool] Linkurious, and after a few weeks of research, we were able to propose to our 382 journalists a way to explore the data and also to share visualisations from stories they were working on. It’s surprising how intuitive a graph database can be for non-tech savvy people. Thanks to this approach, we could both investigate and prepare the future releases.”
According to Mar Cabra, the ICIJ’s Data and Research Unit Editor, Neo4j was the only solution available that met her requirements when the ICIJ broke the Panama Papers investigation last year.
“It’s a revolutionary discovery tool that’s transformed our investigative journalism process,” she said, “because relationships are all important in telling you where the criminality and secrecy lies, who works with whom, and so on. Understanding relationships at huge scale is where graph techniques excel.
“At least 11.5m documents, and far larger than any data leaks we have investigated before, we needed a technology that could handle these unprecedented volumes of highly connected data quickly, easily and efficiently.
“We also needed an easy-to-use and intuitive solution that didn’t require the intervention of any data scientist or developers, so that journalists around the globe would work with the data, regardless of their technical abilities. Linkurious Enterprise was the best platform to explore this data and to share insights in a secure way. Using the Linkurious graph visualization platform with Neo4j is a powerful combination,” she added.
According to Neo4j Co-Founder and CEO Emil Eifrem: “Whatever else we can be sure of as the Paradise Papers’ investigation unfolds, it’s only with world-class tools like Neo4j and Linkurious that world-class investigation of vast and complex datasets like this can happen in our Age of Connections.
“Graph databases are the only option when trying to make sense of the vast terabytes of connected data that we are producing more and more of, and they are an essential tool for international agencies, governments, financial services and security firms trying to uncover the truth.”
Stay Tuned for More Coverage of the Paradise Papers
In the coming days and weeks, the Neo4j team will continue to unveil how graph technology powered the Paradise Papers investigation, including an in-depth look at the ICIJ data model with example queries, graph visualizations and more.
In the meantime, follow the ICIJ’s Paradise Papers coverage, which explores the political and economic dimensions of the investigation as they unfold.