Digging Into the ICIJ Pandora Papers Dataset with Neo4j




Yesterday, the Pulitzer Prize-winning International Consortium of Investigative Journalists (ICIJ) published the first data release of the recent Pandora Papers investigation. The investigation reveals the financial secrets of many global power players who, instead of working together to dismantle offshore systems, have been benefiting from them by hiding their assets in covert companies and trusts.

This time, the data was not published separately for each investigation. Instead, the release contains the full Offshore Leaks database in one dataset, so you can explore shell companies, law firms, banks, and ultimate owners across all leaks and investigations.

The data model is consistent with previous publications. Officers are connected to Entities (shell companies) in several roles, such as director, shareholder, or beneficiary. Intermediaries (banks, law firms) manage the creation and operation of those shell companies. All of them have addresses, which can also be used in investigations.

Pandora Papers Dataset

Each node has fields for countries and country codes, which tie it to specific geographies, along with many other pieces of information.

The first data release for the Pandora Papers consists of 26k Officers, 18k Entities, and 1,000 Intermediaries.



Installing the Dataset in Neo4j Desktop

The easiest way to get started with the dataset is to download and install Neo4j Desktop.

1. Download the “dump” file from the public GitHub repository.
2. In either the example project or a newly created project, use “Add File” to add the dump file to your project.



3. Then choose “Create new DBMS from Dump.”

4. Provide a password.
5. Wait a few seconds until the database is created, then click “Start.”
6. After it’s started, open “Neo4j Browser” on the running database.

7. Within Neo4j Browser, run :play icij-offshoreleaks to launch the interactive guide. (You can pin it to the top with the pin icon.)



Exploring the Data with Neo4j Bloom

Besides exploring the provided guides, you can also pick any of the published stories and look behind the scenes by searching for the people, organizations, and jurisdictions mentioned in the story.


If you don’t feel comfortable with a query language, you can also use the Neo4j Bloom visualization software to explore the data visually, through a search interface that is closer to natural language.

You can start Neo4j Bloom from the “Graph Apps” sidebar in Neo4j Desktop. It will open for your currently running database.




You can walk through it with me right here.


Follow a Story from “The Landlords” Investigations

As an example of how you can investigate published stories yourself, take “South Africa’s smart city” from “The Landlords” investigations into property ownership by shell companies with unknown owners.

In “South Africa’s smart city,” Ruslan Goryukhin, a key aide to some of Putin’s closest friends, is reported to be connected to companies involved in the development of the first “post apartheid smart city,” Cradle City in South Africa.

Let’s see what we find in our data, first querying for Goryukhin.

MATCH path = (o:Officer)-[r]->(:Entity)
WHERE o.name CONTAINS 'GORYUKHIN' AND o.sourceID STARTS WITH "Pandora Papers"
RETURN path LIMIT 100 
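If you want to reuse this query for other names, it can help to pass the search term as a parameter instead of editing the query text each time. Here is a minimal JavaScript sketch of that idea. The helper name officerPathsQuery and the { text, params } return shape are made up for illustration, and upper-casing the search term is an assumption based on the upper-case names used in the queries in this post.

```javascript
// Illustrative helper (not part of any Neo4j API): builds the
// Officer -> Entity path query with the name passed as a parameter.
function officerPathsQuery(nameFragment, limit = 100) {
  if (typeof nameFragment !== 'string' || nameFragment.trim() === '') {
    throw new TypeError('nameFragment must be a non-empty string');
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }

  const text = [
    'MATCH path = (o:Officer)-[r]->(:Entity)',
    'WHERE o.name CONTAINS $name AND o.sourceID STARTS WITH $source',
    // limit was validated as an integer above, so it is written into the text
    `RETURN path LIMIT ${limit}`,
  ].join('\n');

  // Assumption: names are stored in upper case, as in the queries in this post.
  const params = {
    name: nameFragment.trim().toUpperCase(),
    source: 'Pandora Papers',
  };

  return { text, params };
}

// Example: the same search as above, limited to 25 paths.
const goryukhinQuery = officerPathsQuery('Goryukhin', 25);
console.log(goryukhinQuery.text);
console.log(goryukhinQuery.params);
```

The text and params pair can then be handed to whichever driver you use, so the search term never has to be spliced into the query string by hand.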


Another person mentioned in the story was “Preston Hampton Haskell IV, the son of a Texas construction billionaire.” Let’s see if we can find him too, and his connections to Goryukhin.

MATCH (o:Officer), (o2:Officer)
WHERE o.name CONTAINS 'GORYUKHIN' AND o.sourceID STARTS WITH "Pandora Papers"
  AND o2.name CONTAINS 'PRESTON HAMPTON HASKELL'
MATCH path = allShortestPaths((o)-[*]-(o2))
RETURN path LIMIT 25
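On a larger graph, an unbounded [*] pattern can get expensive, so you may want to cap the path length. The following JavaScript sketch builds the same kind of query with an upper bound on the number of hops. The function name, the default of 10 hops, and the return shape are illustrative assumptions. As far as I know, the bound of a variable-length pattern has to be written into the query text rather than passed as a parameter, so the sketch validates it as an integer first.

```javascript
// Illustrative helper (not part of any Neo4j API): shortest paths between
// two officers, with a cap on the number of relationships per path.
function shortestPathsQuery(nameA, nameB, maxHops = 10, limit = 25) {
  for (const [label, value] of [['maxHops', maxHops], ['limit', limit]]) {
    if (!Number.isInteger(value) || value < 1) {
      throw new RangeError(`${label} must be a positive integer`);
    }
  }

  const text = [
    'MATCH (o:Officer), (o2:Officer)',
    'WHERE o.name CONTAINS $nameA AND o.sourceID STARTS WITH $source',
    '  AND o2.name CONTAINS $nameB',
    // The hop bound is inlined only after the integer check above.
    `MATCH path = allShortestPaths((o)-[*..${maxHops}]-(o2))`,
    `RETURN path LIMIT ${limit}`,
  ].join('\n');

  return { text, params: { nameA, nameB, source: 'Pandora Papers' } };
}

// Example: the two people from the story, at most 6 hops apart.
const linkQuery = shortestPathsQuery('GORYUKHIN', 'PRESTON HAMPTON HASKELL', 6);
console.log(linkQuery.text);
```

If the bounded query returns nothing, raising maxHops step by step is a cheap way to tell "no connection" apart from "connection is just further away."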


The report also mentions the shell company Kelburn One and Amari Land International Ltd (later renamed Forum Properties Africa), neither of which, unfortunately, appears in the published dataset.

Programmatic Access

The example repository also comes with code examples in Python, Java, JavaScript, .NET, and Go. There is also a full GraphQL project for the dataset, with a schema for the neo4j/graphql integration library, that you can use to run and deploy a GraphQL API.

These examples show how to connect to the database and run a query against the data. So if you’re inclined to build a dashboard or app, feel free to use those.

Here is the JavaScript example:
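As a rough sketch of what such an example can look like (this is not the repository's code), the snippet below connects with the official neo4j-driver npm package and lists a few officers together with the entities they are linked to. The connection details, the query, the helper names, and the use of a name property on Entity nodes are all assumptions for illustration.

```javascript
// Assumed query: officers whose name contains a fragment, with their entities.
const OFFICER_ENTITIES = `
  MATCH (o:Officer)-[r]->(e:Entity)
  WHERE o.name CONTAINS $name
  RETURN o.name AS officer, type(r) AS relationship, e.name AS entity
  LIMIT 10`;

// Turns driver records (objects exposing `keys` and `get(key)`)
// into plain objects that are easy to print or serialize.
function recordsToRows(records) {
  return records.map((record) =>
    Object.fromEntries(record.keys.map((key) => [key, record.get(key)]))
  );
}

// Connects, runs the query, and always releases the session and driver.
async function findOfficerEntities({ uri, user, password, database }, name) {
  // Loaded lazily so the helpers above work without the package installed.
  const neo4j = require('neo4j-driver');
  const driver = neo4j.driver(uri, neo4j.auth.basic(user, password));
  const session = driver.session({ database });
  try {
    const result = await session.run(OFFICER_ENTITIES, { name });
    return recordsToRows(result.records);
  } finally {
    await session.close();
    await driver.close();
  }
}

// Usage (placeholder connection details for a local Neo4j Desktop DBMS):
// findOfficerEntities(
//   { uri: 'neo4j://localhost:7687', user: 'neo4j', password: '<your password>' },
//   'GORYUKHIN'
// ).then(console.table);
```

For the read-only demo server, the username, password, and database would all be “offshoreleaks,” with the uri pointing at that server instead of localhost.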



Load the Database into Neo4j AuraDB

If you want to dive deeper into the dataset, you can also use our cloud service, Neo4j AuraDB, and load the dataset into an AuraDB Professional instance.

  1. Register or log in at console.neo4j.io.
  2. Create an AuraDB Professional instance (a size of 2 GB should be enough to upload the file).
  3. Save the password!
  4. Go to “Import Database” and upload the dump file.
  5. “Open” the database and provide your password.
  6. Then run :play icij-offshoreleaks for the interactive browser guide.
  7. From here you can continue to do anything that you want.

Available on the Neo4j Labs Demo Server (read-only)

A read-only database is also available on the neo4j-labs demo server. Use “offshoreleaks” as the username, password, and database name.

With :play icij-offshoreleaks you can run the interactive guides there.


