In early October 2021, the International Consortium of Investigative Journalists (ICIJ) published “The Pandora Papers,” an investigation that revealed the financial secrets of 35 current and former world leaders, more than 330 politicians and public officials in 91 countries and territories, and a worldwide roster of criminals.
ICIJ’s investigation into the Pandora Papers – named after the first mortal woman in Greek mythology and the better-known “Pandora’s box” – reveals the financial secrets of many global power players who, instead of working collectively to dismantle the offshore system, have been benefiting from it by hiding their assets in covert companies and trusts.
The Pandora Papers are ICIJ’s latest investigation in a series of three, following the Panama and Paradise Papers. With the help of Neo4j’s graph databases, ICIJ was able to ingest, break down, and analyze the data they uncovered and report on their findings.
If you are unfamiliar with the three investigations, here is a brief synopsis:
The Panama Papers took the world by storm five years ago in one of the “biggest leaks and largest collaborative investigations in journalism history,” ICIJ reported.
The organization centered its 2016 investigation around more than 11.5 million financial and legal records that exposed a system of crime and corruption across a number of countries around the world, hidden by offshore companies.
With the help of Apache Solr and Tika, the journalists uncovered the metadata of offshore tax documents of the world’s richest – and most corrupt – elite. They then connected the information they extracted in Neo4j, created a graph, and made it visually accessible through the Linkurious visualization tool.
The Panama Papers key findings:
- The offshore holdings of 140 politicians and public officials from around the world
- Current and former world leaders exposed include the prime minister of Iceland, the president of Ukraine, and the king of Saudi Arabia
- More than 214,000 offshore entities appear in the leak, connected to people in more than 200 countries and territories
- Major banks have driven the creation of hard-to-trace companies in offshore havens
The Paradise Papers investigation, in turn, includes far more files and documents revealing information about “U.S. citizens, residents, and companies than previous ICIJ investigations – at least 31,000 of them,” ICIJ reported.
The Paradise Papers key findings:
- Reveals offshore interests and activities of more than 120 politicians and world leaders, including Queen Elizabeth II and 13 advisers, major donors, and members of U.S. President Donald J. Trump’s administration
- Exposes the tax engineering of more than 100 multinational corporations, including Apple, Nike, and Botox-maker Allergan
- Reveals tax haven shopping sprees by multinational companies in Africa and Asia that use shell companies in Mauritius and Singapore to reduce taxes
- Shines a light on secretive deals and hidden companies connected to Glencore, the world’s largest commodity trader, and provides detailed accounts of the company’s negotiations in the Democratic Republic of the Congo for valuable mineral resources
- Provides details of how owners of jets and yachts, including royalty and sports stars, used Isle of Man tax-avoidance structures
The most recent investigation consists of a 2.94 terabyte treasure trove of data exposing the financial secrets of wealthy elites, politicians and public officials, current and former country leaders, celebrities, criminals, members of royalty, and leaders of religious groups worldwide.
What makes the Pandora Papers different from the Panama Papers and Paradise Papers is that the leak came from 14 different firms with documents in multiple languages, indicating a complex global system of tax evasion and financial secrecy.
ICIJ enlisted the help of over 600 journalists from 150 media outlets across 117 countries and took more than a year to structure, research, and analyze the data.
The Pandora Papers are the largest investigation in this ICIJ series, and the team used Neo4j technology to better understand and analyze the data.
How Did ICIJ Use the Neo4j Graph Data Platform?
The first step was to procure the data, a process that took journalists months, as there was a deluge of it – a whopping 2.94TB in all, including 11.9 million documents.
ICIJ uses an open-source stack consisting of its Datashare platform, machine learning (ML) toolkits in Python, and Neo4j and Linkurious as its graph visualization and analytics tooling.
An analysis of the procured data showed that only 4 percent was structured and therefore easily searchable. The remaining 96 percent was unstructured and consisted of PDFs, emails, Word documents, images, and other formats. To make the unstructured data searchable, it had to be converted to structured data. Different mechanisms were used depending on the data type:
For PDFs and other document files, Python was used to automate data extraction and structure the data. For other data types, ICIJ used ML and other tools, including the Fonduer and Scikit-learn software, to identify specific forms and separate them from longer documents.
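As a rough illustration of what "automating data extraction" can look like, here is a minimal sketch that pulls labeled fields out of text already extracted from a document. It is not ICIJ's code: the field labels, the `extract_fields` helper, and the sample form are all hypothetical, and real leaked forms vary by provider and language.

```python
import re

# Hypothetical field labels, chosen only for illustration.
FIELD_PATTERN = re.compile(
    r"^(Company Name|Jurisdiction|Beneficial Owner)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_fields(text):
    """Return a dict mapping normalized field names to the values found in text."""
    record = {}
    for label, value in FIELD_PATTERN.findall(text):
        key = label.lower().replace(" ", "_")
        record.setdefault(key, value)  # keep the first occurrence of each field
    return record


# Made-up sample standing in for text extracted from a PDF form.
sample = """\
INCORPORATION FORM
Company Name: Example Holdings Ltd.
Jurisdiction:  British Virgin Islands
Beneficial owner: Jane Doe
"""

record = extract_fields(sample)
print(record)
```

Each resulting record is a small piece of structured data that can be indexed, searched, or loaded into a database.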
Once converted, the data was loaded into Neo4j in the form of entities and relationships, and ICIJ journalists used Linkurious to generate visualizations. The data was now searchable, allowing reporters to explore and investigate the complex direct and indirect connections between companies and people across the 14 different offshore firms.
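To make "direct and indirect connections" concrete, the sketch below finds the shortest chain of relationships between two names. It uses a plain in-memory graph and breadth-first search as a stand-in, not Neo4j or its query language, and every name and relationship type in it is made up.

```python
from collections import defaultdict, deque

# Hypothetical (source, relationship, target) triples.
TRIPLES = [
    ("Jane Doe", "OFFICER_OF", "Alpha Holdings Ltd."),
    ("Alpha Holdings Ltd.", "SHAREHOLDER_OF", "Beta Trust"),
    ("Example Law Firm", "INTERMEDIARY_OF", "Beta Trust"),
    ("John Roe", "OFFICER_OF", "Gamma Corp."),
]


def build_graph(triples):
    """Build an undirected adjacency map; relationship types are ignored here."""
    graph = defaultdict(set)
    for source, _rel, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(TRIPLES)
print(shortest_path(graph, "Jane Doe", "Example Law Firm"))
```

A graph database answers this kind of question natively and at scale; the point here is only the shape of the question being asked.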
For added context, ICIJ journalists created a knowledge graph by bringing in external datasets like public records, sanctions lists, and previous leak information. This helped pinpoint and paint a precise picture of all the businesses and the ultimate beneficial owners (UBOs) implicated in the leak.
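Linking leaked names to an external list typically starts with normalizing both sides so that spelling variants line up. The following is a minimal sketch of that idea under simple assumptions (exact match after stripping accents, case, and punctuation). The helper names and sample names are hypothetical, and this is not a description of ICIJ's actual matching process.

```python
import re
import unicodedata


def normalize_name(name):
    """Strip accents, case, punctuation, and extra whitespace from a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", without_accents.lower())
    return " ".join(cleaned.split())


def match_against_list(leak_names, external_names):
    """Map each leaked name to the external entry with the same normalized form."""
    index = {normalize_name(name): name for name in external_names}
    matches = {}
    for name in leak_names:
        key = normalize_name(name)
        if key in index:
            matches[name] = index[key]
    return matches


# Made-up names for illustration only.
leak_names = ["José Pérez-García", "Jane Doe"]
external_names = ["JOSE PEREZ GARCIA", "Richard Roe"]

matches = match_against_list(leak_names, external_names)
print(matches)
```

Exact matching on normalized names is deliberately conservative. Any candidate match would still need human verification before it is reported.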
The Pandora Papers key findings:
- 11.9 million files
- 14 offshore service providers (law firms, banks, etc.)
- Oldest documents from 1970, but most between 1996 and 2020
- Shell companies in 38 jurisdictions, including the United States for the first time
- 27,000 shell companies and 29,000 ultimate beneficial owners
- 130 billionaires (Forbes list)
- More than 330 politicians and public officials from 91 countries and territories
An earlier Neo4j blog post shows readers how to analyze the data already published for the previous two ICIJ investigations.
The data model the organization used is as follows:
- Entity: shell company or offshore construct
- Intermediary: law firm or bank that helped create and manage the shell company
- Officer: proxy or real owners/shareholders/directors of a shell company
- Address: registered addresses for the entities above
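The four node types above can be sketched as a small typed structure. The example below also renders a relationship as a parameterized Cypher `MERGE` statement. The `name` property, the `OFFICER_OF` relationship type, and the `Node`, `Relationship`, and `to_cypher` names are assumptions made for illustration, not the schema of ICIJ's published database.

```python
from dataclasses import dataclass

# The four node labels from the data model described above.
NODE_LABELS = {"Entity", "Intermediary", "Officer", "Address"}


@dataclass(frozen=True)
class Node:
    label: str  # one of NODE_LABELS
    name: str   # hypothetical identifying property

    def __post_init__(self):
        if self.label not in NODE_LABELS:
            raise ValueError(f"unknown label: {self.label}")


@dataclass(frozen=True)
class Relationship:
    source: Node
    rel_type: str  # hypothetical relationship type, e.g. "OFFICER_OF"
    target: Node


def to_cypher(rel):
    """Return a parameterized Cypher MERGE statement and its parameters."""
    if not rel.rel_type.isidentifier():
        raise ValueError(f"invalid relationship type: {rel.rel_type}")
    query = (
        f"MERGE (a:{rel.source.label} {{name: $source_name}}) "
        f"MERGE (b:{rel.target.label} {{name: $target_name}}) "
        f"MERGE (a)-[:{rel.rel_type}]->(b)"
    )
    params = {"source_name": rel.source.name, "target_name": rel.target.name}
    return query, params


officer = Node("Officer", "Jane Doe")
company = Node("Entity", "Example Holdings Ltd.")
query, params = to_cypher(Relationship(officer, "OFFICER_OF", company))
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` means that loading the same officer or entity twice does not produce duplicate nodes, which matters when the same names recur across many documents.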
As reporters continue to unravel “Pandora’s box,” data from the millions of files will continue to be analyzed and connected through the Neo4j graph database.
I think we can all agree this is fascinating stuff.
If you want to learn more about how graph-based analytics made the investigation possible, don’t miss our webinar on April 12 with ICIJ journalists Emilia Diaz-Struck and Miguel Fiandor Gutiérrez.
Register for the Webinar Now