Graphing our way through the ICIJ offshore jurisdiction data


In 2013 the International Consortium of Investigative Journalists (ICIJ) released a subset of its leaked dataset on offshore jurisdictions to the public. This dataset contains ownership information about companies created in 10 offshore jurisdictions, including the British Virgin Islands, the Cook Islands and Singapore, and covers nearly 30 years up to 2010.

The publicly released dataset is a small part of a cache of 2.5 million leaked offshore files that ICIJ analysed. Even so, it still contains around 250k nodes, 500k edges and 1.2 million properties. Its size, together with the relational nature of the complex international networks between offshore entities, makes it an excellent match for graph databases such as Neo4j.

This GraphGist explores a very small subset of this public dataset and demonstrates how the data can be modelled, queried, and displayed.

Simplified Data Model

The simplified ICIJ data model consists of three node types: Entity nodes, Location nodes and Jurisdiction nodes, which are linked by seven relationship types.

Figure 1. The Simplified Data Model

An Entity can be a natural person, organization, or juridical entity. It is related to other entities in one of three ways: as an OFFICER of the other entity, as a CLIENT of the other entity, or as a RELATED entity (e.g. two subsidiary companies). Both OFFICER and CLIENT relationships have subtypes, indicated by a "type" property on the edge (e.g. "Shareholder", "Beneficial Owner"). They can also have "start" and "end" properties indicating the duration of the relationship.

Entity nodes are related to a geographical Location through a LOCATED relationship, based on a non-normalized address string. A Location can have a bi-directional COLLOCATED relationship with another Location. This indicates that while the address strings differ, the geographical location is the same. For instance, two entities may be in the same building but use different postal addresses.

Each Location is PARTOF a Jurisdiction. A Jurisdiction can be a country, or a somewhat independent subnational territorial unit that is itself PARTOF a larger jurisdiction or country. Making this distinction in the data model is highly relevant, given the structure of offshore networks. The most popular jurisdictions are neither well-established, large jurisdictions such as countries, nor small and possibly unstable independent territories. They are the ones in between: semi-independent territories with a certain degree of juridical and fiscal autonomy. Examples include British Crown dependencies such as Jersey and Guernsey.

Apart from their geographical location, entities are related to a jurisdiction through an INJURD relationship ("in jurisdiction"), indicating that they have a tax obligation to this jurisdiction.

The selected example data contains 13 Entity nodes, 4 Jurisdiction nodes and 5 Location nodes, connected by 36 relationships.
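The model described above can be summarised as a small schema. The Python sketch below is a hypothetical, in-memory illustration rather than part of the GraphGist. The names `SCHEMA`, `Node`, `Rel` and `check_rel` are our own. The allowed start and end node kinds are our reading of the prose; for instance, we assume RELATED always links two entities.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema, read off the prose description of the simplified model:
# relationship type -> allowed (start node kind, end node kind) pairs.
SCHEMA = {
    "OFFICER": {("Entity", "Entity")},
    "CLIENT": {("Entity", "Entity")},
    "RELATED": {("Entity", "Entity")},
    "LOCATED": {("Entity", "Location")},
    "COLLOCATED": {("Location", "Location")},
    "PARTOF": {("Location", "Jurisdiction"), ("Jurisdiction", "Jurisdiction")},
    "INJURD": {("Entity", "Jurisdiction")},
}


@dataclass
class Node:
    label: str
    kind: str  # "Entity", "Location" or "Jurisdiction"


@dataclass
class Rel:
    type: str
    start_node: Node
    end_node: Node
    subtype: Optional[str] = None  # e.g. "Shareholder" on OFFICER/CLIENT edges
    start: Optional[str] = None    # optional start of the relationship
    end: Optional[str] = None      # optional end of the relationship


def check_rel(rel: Rel) -> bool:
    """Return True if the relationship connects node kinds the schema allows."""
    return (rel.start_node.kind, rel.end_node.kind) in SCHEMA.get(rel.type, set())


labuan = Node("Labuan", "Jurisdiction")
malaysia = Node("Malaysia", "Jurisdiction")
company = Node("Example Ltd", "Entity")  # hypothetical entity
print(check_rel(Rel("PARTOF", labuan, malaysia)))   # True
print(check_rel(Rel("LOCATED", company, malaysia)))  # False: LOCATED must end at a Location
```

Keeping the schema as data puts the seven relationship types and their endpoints in one place, where they are easy to inspect.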

Initial Data Setup

We load the example data using a set of Cypher CREATE statements. Identifier properties are included on all nodes, which allows for easy look-up in the online version of the ICIJ database (just change the trailing number in the link).

The graph below gives a first overview: Location nodes are shown in green, Jurisdiction nodes in orange, and the remaining nodes are Entity nodes.

Basic descriptive queries

Listing node characteristics

MATCH (e:Entity)
OPTIONAL MATCH (e)-[:LOCATED]->(location)-[:PARTOF]->(jurisdiction)
OPTIONAL MATCH (jurisdiction)-[:PARTOF]->(main_jurisdiction)
RETURN e.label AS Entity, e.type AS Type, e.status AS Status, e.incorporated AS Incorporated,
       jurisdiction.label AS Jurisdiction, main_jurisdiction.label AS `Main Jurisdiction`

A first descriptive query provides an overview of the included entities: juridical type, activity status, incorporation date, and the jurisdiction they are located in. The query distinguishes between the direct and the main jurisdiction. If the jurisdiction has a PARTOF relationship with another jurisdiction, the latter is also displayed as "Main Jurisdiction".

The relevance of this distinction is immediately visible in the results table. Offshore entities such as the Sefren Trust are registered directly in countries such as Singapore. Others, such as CorpShare Ltd, are registered in Labuan, a federal territory of Malaysia that is aggressively marketed as an offshore financial centre.
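The two OPTIONAL MATCH clauses in the first query amount to following at most two PARTOF hops up from an entity's location. The Python sketch below mimics this with two dictionaries, `located` and `part_of` (hypothetical names). A missing hop yields None, just as OPTIONAL MATCH yields null. Entity and location labels are placeholders; only the Labuan/Malaysia and Singapore examples come from the text. For simplicity each entity has a single location, whereas the Cypher query returns one row per matching location.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy data: entity -> location (LOCATED), and node -> parent (PARTOF).
located: Dict[str, str] = {
    "Entity A": "Location 1",
    "Entity B": "Location 2",
    "Entity C": "Location 3",  # hypothetical location with no PARTOF edge
}
part_of: Dict[str, str] = {
    "Location 1": "Labuan",
    "Labuan": "Malaysia",       # Labuan is a federal territory of Malaysia
    "Location 2": "Singapore",  # Singapore has no parent jurisdiction
}


def jurisdictions(entity: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (direct jurisdiction, main jurisdiction), mirroring the two OPTIONAL MATCHes."""
    location = located.get(entity)
    jurisdiction = part_of.get(location) if location else None
    main = part_of.get(jurisdiction) if jurisdiction else None
    return jurisdiction, main


for name in located:
    print(name, jurisdictions(name))
# Entity A ('Labuan', 'Malaysia')
# Entity B ('Singapore', None)
# Entity C (None, None)
```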

Listing edge characteristics

MATCH (e1:Entity)-[r:CLIENT|OFFICER]->(e2:Entity)
RETURN e2.label AS `Entity 1`, r.type AS `is a ... of`, e1.label AS `Entity 2`, r.start AS Since

A second descriptive query shows us the types of CLIENT and OFFICER relationships present in the example dataset. Note the presence of Crédit Agricole, the largest retail banking group in France, which is a client of the Singapore-based Sefren Trust managed by Antwerp-based entrepreneur Luscha Baumwald.

Exploring hidden relationships

The power of graph databases and graph query languages becomes more readily visible when we are interested in complex relations between entities, which would require expensive JOINs in traditional relational databases.

Should we check for companies on the 7½th floor?

When looking for links that may not be apparent at first sight, we might look at entities that formally share the same Location. However, this can miss links, because locations are matched on a non-normalized address string. A different PO box, for example, means there is no formal relation.

An example of a more inclusive query is presented below. We start from a selected entity, the offshore entity Gurker Sdn Bhd, and select its registered location using the first MATCH and the WITH clause. In the second MATCH we query for:

(1) all entities registered at the same location (identical address), and
(2) all entities registered at locations collocated with the address of our starting entity.

Sherper Sdn Bhd, CorpDirect Ltd, CorpSec Ltd, and CorpShare Ltd share the address of Gurker Sdn Bhd. We also find one additional, collocated entity: Portcullis TrustNet (Labuan) Limited. The first group of entities is registered on the 6th floor, while the latter is registered on the 7th floor of the same building.

MATCH (gurker:Entity { label : 'Gurker Sdn Bhd' })-[:LOCATED]->(location)
WITH location
MATCH (l_entity:Entity)-[:LOCATED]->(location)<-[:COLLOCATED]-(colocation)<-[:LOCATED]-(colo_entity:Entity)
RETURN l_entity.label AS `Same location`, location.label AS `Gurker Address`, colo_entity.label AS `Collocated`, colocation.label AS `Collocated Address`
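The query above combines two lookups: entities at the same address, and entities at addresses linked by COLLOCATED. The Python sketch below shows the same logic on hypothetical toy data with placeholder labels. It treats COLLOCATED as symmetric, in line with the bi-directional description in the data model, and leaves the starting entity out of its own neighbours. It returns two sets rather than the row-per-combination output of the Cypher query.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data with placeholder labels.
located: Dict[str, str] = {
    "Start Co": "Floor 6 address",
    "Neighbour A": "Floor 6 address",
    "Neighbour B": "Floor 6 address",
    "Upstairs Co": "Floor 7 address",
    "Elsewhere Co": "Other building",
}
# COLLOCATED pairs: each is stored once but treated as symmetric below.
collocated: Set[Tuple[str, str]] = {("Floor 7 address", "Floor 6 address")}


def same_and_collocated(entity: str) -> Tuple[Set[str], Set[str]]:
    """Return (entities at the same address, entities at collocated addresses)."""
    home = located[entity]
    same = {e for e, loc in located.items() if loc == home and e != entity}
    # Addresses collocated with `home`, whichever direction the pair was stored in.
    near = {b for a, b in collocated if a == home} | {a for a, b in collocated if b == home}
    colo = {e for e, loc in located.items() if loc in near}
    return same, colo


same, colo = same_and_collocated("Start Co")
print(sorted(same))  # ['Neighbour A', 'Neighbour B']
print(sorted(colo))  # ['Upstairs Co']
```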

Two Belgians walk into an offshore jurisdiction…

The following query returns all entities located in Belgium:

MATCH (e:Entity)-[:LOCATED]->(location)-[:PARTOF]->(:Jurisdiction { label : 'Belgium' })
RETURN e.label AS Label, location.label AS Location

The two returned entities are persons living in Antwerp, Belgium. A more interesting follow-up question is whether these two persons are connected anywhere in the graph of offshore entities.

To answer this, we use the built-in shortestPath function. We specify the two nodes between which we want to establish a path, and the relationship types the shortest-path algorithm may follow. We are explicitly interested in client/officer links, shared or collocated addresses, and related-entity relations. Listing these types also excludes paths that go through jurisdictions (PARTOF and INJURD). Otherwise the shared jurisdiction of Belgium would, of course, be the shortest path.

MATCH (baumwald:Entity { label: "Luscha Baumwald" }),
      (bossaerts:Entity { label: "Christiaan W Bossaerts" }),
      p = shortestPath((baumwald)-[:LOCATED|CLIENT|OFFICER|RELATED|COLLOCATED*]-(bossaerts))
RETURN p AS `Shortest Path Baumwald-Bossaerts`

The query returns a single result, establishing that there is a link between the two Belgian entities. The figure below, generated by running the same query in the Neo4j 2.0 local web interface, gives a more readily interpretable view.

The path is completed by the RELATED relationship between Portcullis TrustNet (Labuan) Limited and Portcullis Trust (Singapore) Limited. These are regional branches of Portcullis TrustNet, one of dozens of offshore service providers and the source of a large part of the leaked ICIJ data. The main service that companies such as Portcullis TrustNet (one of the largest in the industry) provide is ensuring that names, finances, business interests, and political links remain hidden.

Figure 2. The Results
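Conceptually, the shortestPath query used above behaves like a breadth-first search. The search is restricted to the listed relationship types and may traverse them in either direction, because the pattern is undirected. The Python sketch below illustrates this on a hypothetical toy graph with placeholder names (not the ICIJ data). The graph includes a RELATED link between two branches of a service provider. The sketch also shows how allowing PARTOF lets the search take a shortcut through a shared country.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical toy graph: (start, rel_type, end) triples with placeholder labels.
EDGES: List[Tuple[str, str, str]] = [
    ("Person 1", "LOCATED", "Address 1"),
    ("Person 2", "LOCATED", "Address 2"),
    ("Address 1", "PARTOF", "Country"),
    ("Address 2", "PARTOF", "Country"),
    ("Person 1", "OFFICER", "Trust"),
    ("Trust", "CLIENT", "Provider Branch 1"),
    ("Provider Branch 1", "RELATED", "Provider Branch 2"),
    ("Company", "CLIENT", "Provider Branch 2"),
    ("Person 2", "OFFICER", "Company"),
]


def shortest_path(src: str, dst: str, allowed: FrozenSet[str]) -> Optional[List[str]]:
    """Breadth-first search over allowed relationship types, ignoring direction."""
    adjacency: Dict[str, List[str]] = {}
    for a, rel, b in EDGES:
        if rel in allowed:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    previous: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back through the predecessors to rebuild the path.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


with_partof = frozenset({"LOCATED", "PARTOF", "CLIENT", "OFFICER", "RELATED", "COLLOCATED"})
without_partof = with_partof - {"PARTOF"}
print(shortest_path("Person 1", "Person 2", with_partof))
# ['Person 1', 'Address 1', 'Country', 'Address 2', 'Person 2']
print(shortest_path("Person 1", "Person 2", without_partof))
# ['Person 1', 'Trust', 'Provider Branch 1', 'Provider Branch 2', 'Company', 'Person 2']
```

Like shortestPath, the sketch returns a single shortest path and does not enumerate ties; Cypher's allShortestPaths covers that case.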


The real value of such applications lies, of course, not in clever queries. It lies in the degree to which they help investigative and data journalists trawl through massive datasets like this one. This GraphGist focuses on the technical aspects of modelling and querying the public ICIJ dataset, not on the results as such. However, even the example data (selected at random from the Belgian subset) shows the potential of such applications for data journalism. Two comments:

A public search identifies Christiaan Bossaerts as the Belgian honorary Consul-General for Indonesia. Honorary consulships are generally given to individuals with good connections in the country they represent, especially business links. An honorary consul involved with entities in an infamous offshore jurisdiction such as Labuan might be an interesting starting point for an article.

Similarly, Luscha Baumwald showed up in the news in 2012, when he was convicted of fraud for his involvement in the Radisson case. This luxury hotel in Antwerp was used for years as a front to launder money from tax evasion and offshore constructions. As far as I can tell, this case has not yet been linked in the media to the ICIJ dataset.