GraphGist: Preview

by Maarten Hermans


This GraphGist has not yet been submitted and approved for publication. If you're the developer, please submit for publication using the GraphGist Portal.


In 2013 the International Consortium of Investigative Journalism (ICIJ) released a subset of the leaked dataset on offshore jurisdictions to the public. This dataset contains ownership information about companies created in 10 offshore jurisdictions including the British Virgin Islands, the Cook Islands and Singapore. It covers nearly 30 years until 2010.

The publically released dataset is a small part of a cache of 2.5 million leaked offshore files that ICIJ analysed. Even so, it still contains around 250k nodes, 500k edges and 1.2 million properties. This size and relational nature of the complex, international networks between offshore entities make it an excellent match for graph databases such as Neo4j.

For this GraphGist-walkthrough we select a very small subset of this public dataset, and demonstrate how this data can be modelled, queried and displayed.

Simplified Data Model

We restructure the ICIJ-data into a simplified version. Our datamodel consists of three node types, Entity nodes, Location nodes and Jurisdiction nodes, related with seven edge types. A limited subset of properties is included, e.g. whether the entity is still active or not ("status"), the date of incorporation, etc.

An Entity can be a natural person, organisation or juridical entity and is related to other entities, either as a OFFICER of the other entity, a CLIENT of the other entity or as a related entity (e.g. two daughter companies). Both the OFFICER and the CLIENT relationships have subtypes indicated with a "type"-property on the edge, e.g. "Shareholder", "Beneficial Owner", etc. and can have a "start" and "end"-property indicating the duration of the relationship.

Entities are related to a geographical Location through a LOCATED-relationship, based on a non-normalized address string. This location can have a bi-directional COLLOCATED-relationship with another location, indicating that while the address-string is not the same, the geographical location de facto is. For instance two entities sharing an address, but with a different postbox. This is frequently the case with offshore companies, as they can be registered with thousands at the same address.

Each location is PARTOF a Jurisdiction, which can be a country, or a somewhat independent subnational territorial unit that is PARTOF a larger jurisdiction/country. Making this distinction in the datamodel is highly relevant, given the structure of offshore networks. The most popular jurisdictions are neither well-established, large jurisdictions such as countries, nor small and possibly unstable independent territories, but those in between. I.e. semi-independent territories that have a certain degree of juridical and fiscal autonomy to facilitate offshore constructions, but are part of a larger territorial entity to insure stability. For instance British Crown dependencies such as Jersey or Guernsey.

Apart from their geographical location, entities are related to a jurisdiction through a INJURD-relationship ("in jurisdiction"), indicating that they have a tax obligation to this jurisdiction.

The selected example data contains 13 Entities, 4 Jurisdictions and 5 Locations, related with 36 relationships.

Initial Data Setup

We load the example data using a set of Cypher CREATE-statements. Identifier-properties are included on all the nodes, which allows for easy look-up in the online version of the ICIJ-database (just change the trailing number in the link)

The graph below gives a first overview, with purple nodes indicating the Entities, green for the Locations and orange for the Jurisdictions.

CREATE (sefren:Entity { identifier: "164507", label: "Sefren Trust", type: "Singapore Trust"})
CREATE (baumwald:Entity { label: "Luscha Baumwald", identifier : "52458"})
CREATE (portcullis_SP:Entity { label: "Portcullis Trust (Singapore) Limited", type: "Singapore Trust"})
CREATE (creagr_SP:Entity { label: "Credit Agricole (Suisse) S.A. - Singapore Branch"})
CREATE (temasek:Location {label : "6 Temasek Boulevard #09-06 Suntec Tower 4 Singapore 038986" })
CREATE (rubenslei:Location {label : "Rubenslei 16 Bus 5 2018 Antwerpen Belgium" })

CREATE (sefren)-[:CLIENT {type : "Master Client"}]->(creagr_SP)
CREATE (sefren)-[:OFFICER {type : "Trust Settlor"}]->(baumwald)
CREATE (portcullis_SP)-[:OFFICER {type: "Trustee"}]->(sefren)
CREATE (portcullis_SP)-[:LOCATED]->(temasek)
CREATE (sefren)-[:LOCATED]->(temasek)
CREATE (baumwald)-[:LOCATED]->(rubenslei)

CREATE (singapore:Jurisdiction { label: "Singapore" })
CREATE (belgium:Jurisdiction { label: "Belgium" })
CREATE (labuan:Jurisdiction { label: "Labuan" })
CREATE (malaysia:Jurisdiction { label: "Malaysia" })
CREATE (labuan)-[:PARTOF]->(malaysia)

CREATE (temasek)-[:PARTOF]->(singapore)
CREATE (rubenslei)-[:PARTOF]->(belgium)
CREATE (portcullis_SP)-[:JURISDICTION]->(singapore)

CREATE (bossaerts:Entity { label : "Christiaan W Bossaerts", identifier : "83570"})
CREATE (portcullis_ML:Entity { label : "Portcullis TrustNet (Labuan) Limited", identifier : "59994"})
CREATE (taxanderlei:Location {label : "Taxanderlei 2, 2900 Schoten, Belgium", identifier: "287007" })
CREATE (fefc:Entity { label: "Far East Furniture Corporation", identifier: "161595", type: "Labuan Offshore Company", status : "Defunct", incorporated : "2003-05-21"})
CREATE (fplabuan_L7:Location {label : "Level 7(F2), Main Office Tower, Financial Park Labuan Complex, Jalan Merdeka, 87000 Labuan F.T., Malaysia" })

CREATE (fefc)-[:JURISDICTION]->(labuan)
CREATE (bossaerts)-[:LOCATED]->(taxanderlei)
CREATE (taxanderlei)-[:PARTOF]->(belgium)
CREATE (portcullis_ML)-[:LOCATED]->(fplabuan_L7)
CREATE (fplabuan_L7)-[:PARTOF]->(labuan)

//Christiaan W Bossaerts Master Client of Far East Furniture Corporation
CREATE (fefc)-[:CLIENT { type : "Master Client"}]->(bossaerts)

//Christiaan W Bossaerts Beneficial Owner of Far East Furniture Corporation
CREATE (fefc)-[:OFFICER { type : "Beneficial Owner", start : "2003-05-21"}]->(bossaerts)

//Shareholder 	Portcullis TrustNet (Labuan) Limited
CREATE (fefc)-[:OFFICER { type : "Shareholder", start : "2003-05-21"}]->(portcullis_ML)

//Far East Furniture Corporation Resident Director Gurker Sdn Bhd
CREATE (gurker:Entity { label : "Gurker Sdn Bhd", identifier : "71435"})
CREATE (fplabuan_L6:Location {label : "Level 6(D) Main Office Tower, Financial Park Labuan Complex, Jalan Merdeka, P.O. Box 80887, 87018 Labuan F.T., Malaysia", identifier: "236846" })
CREATE (gurker)-[:LOCATED]->(fplabuan_L6)
CREATE (fplabuan_L6)-[:PARTOF]->(labuan)
CREATE (fefc)-[:OFFICER {type : "Resident Director", start : "2003-05-21"} ]->(gurker)

//Far East Furniture Corporation Secretary Sherper Sdn Bhd
CREATE (sherper:Entity { label : "Sherper Sdn Bhd", identifier : "48691"})
CREATE (sherper)-[:LOCATED]->(fplabuan_L6)
CREATE (fefc)-[:OFFICER {type : "Resident Director", start : "2003-05-21"} ]->(sherper)

//Resident Director 	CorpDirect Ltd
CREATE (corpdirect:Entity { label : "CorpDirect Ltd", identifier : "72244"})
CREATE (corpdirect)-[:LOCATED]->(fplabuan_L6)
CREATE (fefc)-[:OFFICER {type : "Resident Director", start : "2006-03-03"} ]->(corpdirect)

// Secretary 	CorpSec Ltd
CREATE (corpsec:Entity { label : "CorpSec Ltd", identifier : "97296"})
CREATE (corpsec)-[:LOCATED]->(fplabuan_L6)
CREATE (fefc)-[:OFFICER {type : "Secretary", start : "2006-03-03"} ]->(corpsec)

// Shareholder 	CorpShare Ltd
CREATE (corpshare:Entity { label : "CorpShare Ltd", identifier : "112732"})
CREATE (corpshare)-[:LOCATED]->(fplabuan_L6)
CREATE (fefc)-[:OFFICER {type : "Shareholder", start : "2006-03-03"} ]->(corpshare)

// Christiaan W. Boassaerts Master Client of Q.C.G.C. Limited
CREATE (qcgc:Entity { label : "Q.C.G.C. Limited", identifier : "167232", type: "Standard International Company", status: "Active", incorporated : "2006-11-15" })
CREATE (qcgc)-[:JURISDICTION]->(labuan)
CREATE (qcgc)-[:CLIENT {type : "Master Client",  start : "2006-10-16"}]->(bossaerts)
CREATE (qcgc)-[:OFFICER {type : "Shareholder", start: "2006-11-15"} ]->(corpshare)
CREATE (qcgc)-[:OFFICER {type : "Resident Director", start: "2006-11-15"}]->(corpdirect)
CREATE (qcgc)-[:OFFICER {type : "Secretary", start:"2008-05-15" }]->(corpsec)

CREATE (fplabuan_L6)-[:COLLOCATED]->(fplabuan_L7)
CREATE (fplabuan_L7)-[:COLLOCATED]->(fplabuan_L6)
CREATE (portcullis_ML)-[:RELATED {type : "Related entity"}]->(portcullis_SP)
CREATE (portcullis_SP)-[:RELATED {type : "Related entity"}]->(portcullis_ML)
Loading graph...

Basic descriptive queries

Listing node characteristics

MATCH (e:Entity)
OPTIONAL MATCH (e)-[:LOCATED]->(location)-[:PARTOF]->(jurisdiction)
OPTIONAL MATCH (jurisdiction)-[:PARTOF]->(main_jurdisdiction)
RETURN e.label AS Entity, e.type AS Type, e.status AS Status, e.incorporated AS Incorporated, jurisdiction.label AS Jurisdiction, main_jurdisdiction.label AS `Main Jurisdiction`
Loading table...

A first descriptive query provides an overview of the included entities: juridical type, activity-status, incorporation date, and the jurisdiction they are located in. In the query we make the distinction between the direct and the main jurisdiction: if the jurisdiction has a PARTOF relation with another jurisdiction, the later is also displayed as "Main Jurisdiction".

Why this is relevant is immediately visible in the results table: while offshore entities such as the Sefren Trust are directly registered in countries such as Singapore, entities such as CorpShare Ltd are registered in Labuan, a federal territory of Malaysia that is aggressively marketed as an offshore financial centre.

Listing edge characteristics

MATCH (e1:Entity)-[r:CLIENT|:OFFICER]->(e2:Entity)
RETURN e2.label AS `Entity 1`, r.type AS `is a ... of`, e1.label AS `Entity 2`, r.start AS Since
Loading table...

A second descriptive query shows us the types of CLIENT and OFFICER relationships present in the example dataset. Note the presence of Crédit Agricole, the largest retail banking group in France, which is a client of the Singapore-based Sefren Trust managed by Antwerp-based entrepreneur Luscha Baumwald.

Exploring hidden relationships

The power of graph databases and query languages becomes more readily visible when we are interested in complex relations between entities, which would required demanding JOINS, etc. in traditional databases.

Should we check for companies on the 7½th floor?

For instance, when looking for link that are possibly not apparent on first sight, we might look at entities that formally share the same Location. However, this might overlook links, as locations are matched on an non-normalized address-string, e.g. a different postbox would mean there is no formal relation.

An example of a more inclusive query is presented below. We start from a selected entity, the offshore entity Gurker Sdn Bhd, and select its registered location using the first MATCH and the WITH statement. In the second MATCH statement we query for all entities that are (1) registered on the same location (identical address) and the entities that are registered on the locations that are collocated with the address of our starting entity.

While Sherper Sdn Bhd, CorpDirect Ltd, CorpSec Ltd, and CorpShare Ltd share the address of Gurker Sdn Bhd, we find an additional, collocated entity: Portcullis TrustNet (Labuan) Limited. The first group of entities is registered on the 6th floor, while the later is registered on the 7th floor of the same building.

MATCH (gurker:Entity { label : 'Gurker Sdn Bhd' })-[:LOCATED]->(location)
WITH location
MATCH (l_entity:Entity)-[:LOCATED]->(location)<-[:COLLOCATED]-(colocation)<-[:LOCATED]-(colo_entity:Entity)
RETURN l_entity.label AS `Same location`, location.label AS `Gurker Address`, colo_entity.label AS `Collocated`, colocation.label AS `Collocated Address`
Loading table...

Two Belgians walk into an offshore jurisdiction…​

While making the required distinctions in the datamodel between locations, jurisdictions, etc., finding entities in a given jurisdiction is still straightforward. This query will return all entities located in Belgium:

MATCH (e:Entity)-[:LOCATED]->(location)-[:PARTOF]->(:Jurisdiction { label : 'Belgium' })
RETURN e.label AS Label, location.label AS Location
Loading table...

The two returned entities are persons living in Antwerp, Belgium. A more interesting follow-up query would be establishing whether there is a relationship between these two persons throughout the graph of offshore entities.

To answer this, we use the build-in shortestPath-function. We specify the two nodes we are establishing a path between, and specify the types of relationships the shortest path algorithm may follow. We are explicitly interested in client/officer links, shared/collocated addresses, related entity-relations, etc. By specifying this, we also exclude paths (PARTOF) that go over jurisdictions—​otherwise the shared jurisdiction of Belgium would of course be the shortest path.

MATCH (baumwald:Entity { label:"Luscha Baumwald" }),(bossaerts:Entity { label:"Christiaan W Bossaerts" }), p = shortestPath((baumwald)-[:LOCATED|:CLIENT|:OFFICER|:RELATED|:COLLOCATED*]-(bossaerts))
RETURN p AS `Shortest Path Baumwald-Bossaerts`
Loading table...

The query returs a single result, establishing that there is a link between the two Belgian entities. The figure below, generated by running the same query in the Neo4j 2.0 local webinterface, gives a more readily interpretable view.

The path is completed by the RELATED-path between Portcullis TrustNet (Labuan) Limited and Portcullis Trust (Singapore) Limited. These are regional branches of Portcullis TrustNet, one of dozens of offshore service providers, and the source of a large part of the leaked ICIJ-data. The main service companies such as Portcullis TrustNet—​one of the largest in the industry—​provide is ensuring that names, finances, business interests and political links remain hidden.


The real value of these kind of applications lies of course not in clever queries, but in the degree that it would help investigative and data journalists in trawling through such massive datasets. This GraphGist is focussed on the technical aspects of modelling and querying the public ICIJ-dataset, and not the results as such. However, even the example data (that was selected at random from the Belgian subset), show the potential these kind of applications have for data journalism. Two comments:

A public search identifies Christiaan Bossaerts as the Belgian honorary Consul-General for Indonesia. Honorary Consulships are generally given to individuals with good connections in the representing country, especially w.r.t. business-links. A honorary Consul with involvement in entities in an infamous offshore jurisdiction such as Labuan might be an interesting start for an article.

Similarly, Luscha Baumwald shows up in the news in 2012, when he was convicted for fraud due to his involvement in the Radisson-case. This luxury hotel in Antwerp was used for years as a front to launder money from tax evasion and offshore constructions. As far as I can tell, this case has not yet been linked in the media to the ICIJ-dataset.