Analyzing the FinCEN Files with Neo4j

ICIJ used Neo4j to analyze the FinCEN Files data, uncovering a vast network of money laundering.

The FinCEN Files Investigation

The International Consortium of Investigative Journalists (ICIJ) has recently exposed a vast network of industrial-scale money laundering running through Western banks and generally ignored by U.S. regulators – and they used Neo4j to help crack the case wide open.

The global investigation, dubbed the FinCEN Files, reveals how money launderers move their dirty money. Whether the money comes from drug cartels, corrupt regimes, arms trafficking or other international crimes, global banks have turned a blind eye (or flatly refused to stop) while earning huge profits from each transaction.


Together with BuzzFeed News and other media partners, the ICIJ spent 16 months organizing and analyzing the FinCEN Files. Using the Neo4j graph database and Linkurious graph visualization, along with many other tools, journalists built a knowledge graph to explore more than 400 spreadsheets containing data on 100,000 transactions and pieced together a nuanced picture of a broken system.

The results draw from more than 2,100 suspicious activity reports (SARs) filed between 1997 and 2017, which accounted for transactions of more than $2 trillion in dirty money. These reports were filed by banks and financial firms with the U.S. Department of the Treasury’s Financial Crimes Enforcement Network (FinCEN), but were largely ignored or overlooked.

The FinCEN Files follow other breakthrough reporting and Pulitzer Prize-winning investigations from the ICIJ, such as the Panama Papers, Paradise Papers, Swiss Leaks, West Africa Leaks and Luanda Leaks.

In this post, we take a closer look at the graph data model used in the FinCEN Files and walk you through a demo of querying and visualizing the connected data. We also take a deeper dive into the data using graph data science for more nuanced insights.

Some Context on Neo4j and Data Journalism

In the spring of 2016, a big piece of investigative reporting hit the streets: The ICIJ published the Panama Papers. An unprecedented set of publications, events, political revolutions and corporate boardroom changes followed – the ICIJ had hit upon a very dark nerve of the financial establishment.

The offshore constructions used by the rich, the famous and criminals alike scandalized many everyday citizens, and in the aftermath, a number of governmental and regulatory institutions initiated changes to end common tax evasion tactics. During the Panama Papers investigation, the ICIJ relied not only on a set of invaluable documents obtained from an anonymous source, but also on an impressive set of technological building blocks that made an impossible task come together. One of these building blocks was the Neo4j graph data platform, and both as a company and as a community, we have been proud contributors to this work of data-driven investigative journalism.

This article is about a new, and perhaps more important, piece of reporting just released by the ICIJ – using a very similar methodology and technology architecture. In the FinCEN Files, they don’t uncover offshore tax dodging constructions but instead reveal banking schemes that would and should rock the financial services world.

These banking schemes enable crime, oppression and authoritarianism across the globe. Now they’ve been brought to light thanks to the combined efforts of a global team of journalists and the power of graph technology.

Let’s dive in.

The Raw FinCEN Files Data

The ICIJ published a small subset of the suspicious activity report (SAR) data that we can use to visualize and query some of the SAR filings.

Much detail has been removed from the published data. Each transaction only includes:

    • The involved banks (name, country and geolocation)
    • The filer, originator and beneficiary
    • The begin-date and end-date
    • The monetary volume
    • The number of filings
FinCEN files data.

We can turn this tabular information into a graph dataset that represents the participants of the filing via relationships.

The Data Import Process

For each Filing, we create a node to store the attribute data. For each entity (bank) mentioned in the source files for that filing, we create an Entity node, store its name and geolocation, and connect it to its Country. Then we connect the Filing to the Entity with the appropriate relationship:

    • originator => ORIGINATOR
    • beneficiary => BENEFITS
    • filer => FILED
    • entity_b => CONCERNS

Our FinCEN Files graph data model.
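
The import steps described above can be sketched in Cypher roughly as follows. This is not the provided import script: the file name, every CSV column name, the `id` properties and the `COUNTRY` relationship type are assumptions made for illustration. Only the `Filing`, `Entity` and `Country` labels, the `ORIGINATOR` and `BENEFITS` relationships, and the `amount`, `name` and `code` properties come from this post.

```cypher
// Sketch only: the file name and all row.* column names are hypothetical.
LOAD CSV WITH HEADERS FROM 'file:///fincen_transactions.csv' AS row

// One Filing node per row, holding the attribute data.
MERGE (f:Filing {id: row.id})
SET f.amount = toFloat(row.amount),
    f.begin_date = row.begin_date,
    f.end_date = row.end_date

// Originator bank and its country (COUNTRY is an assumed relationship type).
MERGE (o:Entity {id: row.originator_bank_id})
SET o.name = row.originator_bank
MERGE (oc:Country {code: row.originator_iso})
MERGE (o)-[:COUNTRY]->(oc)

// Beneficiary bank and its country.
MERGE (b:Entity {id: row.beneficiary_bank_id})
SET b.name = row.beneficiary_bank
MERGE (bc:Country {code: row.beneficiary_iso})
MERGE (b)-[:COUNTRY]->(bc)

// Relationships point from the Filing to each participating Entity.
MERGE (f)-[:ORIGINATOR]->(o)
MERGE (f)-[:BENEFITS]->(b);
```

Using `MERGE` rather than `CREATE` keeps a bank that appears in many filings as a single Entity node, which is what makes the later aggregations per bank possible. Note that `MERGE` fails on null key values, so rows with missing identifiers would need to be filtered out first.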

We are providing an import script for the data, and we’re also creating a demo database with the imported data and a Neo4j database dump that you can import into your own local or cloud instance.

To log in to the demo server, use the following credentials:

    • username: fincen
    • password: fincen
    • database: fincen
The Neo4j database dump and import script can be found in this GitHub repository:

FinCEN Files Data Visualization and Exploration

To explore the data after the import, one option is Neo4j Bloom™ – an interactive graph data visualization and exploration tool. With the provided “perspective,” each entity is rendered with a specific icon and caption, and you can investigate the data further by simply entering the relevant search phrases in the search box.

Visual results for searching Neo4j Bloom with “Entity Deutsche Bank Filing”:

Neo4j Bloom

Bloom visualization results for “Filing Benefits Entity Russia”:

Data visualization

Querying the FinCEN Files Data

In the Neo4j Browser, a number of queries can give us some deeper insight into the data.

Here is the query and the resulting data visualization for the top 10 Filings (ranked here by amount) and their participating Entities:

MATCH (f:Filing)
WITH f ORDER BY f.amount DESC LIMIT 10
MATCH p=(f)--(e:Entity)
RETURN p

Cypher search results

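The next query lists the 10 Entities with the highest total filing amount, together with their countries:

MATCH (e:Entity)--(f:Filing)
WITH e, round(sum(f.amount)) AS total
WITH e, total ORDER BY total DESC LIMIT 10
OPTIONAL MATCH (e)--(c:Country)
RETURN e.name AS entity, collect(c.code) AS countries, total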

Cypher query results

Other Data Visualizations

The countries of the beneficiaries can be highlighted based on the aggregate transaction volumes, as you can see below.

bank geolocation
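
The per-country totals behind such a map could be computed with a query along these lines. It is a sketch that uses only names already seen in this post (`BENEFITS`, `Entity`, `Country`, `c.code`, `f.amount`); the link between an Entity and its Country is left untyped because the relationship type is not stated here.

```cypher
// Total filing amount received per beneficiary country.
MATCH (f:Filing)-[:BENEFITS]->(e:Entity)--(c:Country)
RETURN c.code AS country, round(sum(f.amount)) AS total
ORDER BY total DESC
LIMIT 20;
```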

We can also visualize the bank data by geolocation as a heat map, using the Neomap application for Neo4j Desktop.

Neomap application.

Using Graph Data Science to Analyze the FinCEN Files Data

For the banks, we use the ORIGINATOR and BENEFITS relationships to create a virtual TRANSFERRED relationship from one bank to another that holds the total amount.

Then, on top of that projected graph, we run a clustering algorithm to identify clusters of banks exchanging money.

MATCH (from:Entity)<-[:ORIGINATOR]-(f:Filing)-[:BENEFITS]->(to:Entity)
WITH from, to, sum(f.amount) AS total
MERGE (from)-[t:TRANSFERRED]->(to) SET t.amount = total

Then we can run the Louvain clustering algorithm.
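
A minimal sketch of that step, assuming the Graph Data Science library in its 2.x syntax is installed (older 1.x releases used `gds.graph.create` instead of `gds.graph.project`). The graph name `transfers` is a hypothetical name chosen for this example.

```cypher
// Project the banks and their TRANSFERRED relationships into memory,
// keeping the amount as a relationship weight.
CALL gds.graph.project(
  'transfers',
  'Entity',
  {TRANSFERRED: {properties: 'amount'}}
);

// Stream Louvain communities and list the largest ones.
CALL gds.louvain.stream('transfers', {relationshipWeightProperty: 'amount'})
YIELD nodeId, communityId
WITH communityId, collect(gds.util.asNode(nodeId).name) AS banks
RETURN communityId, size(banks) AS members, banks[..10] AS sample
ORDER BY members DESC
LIMIT 10;
```

Weighting by `amount` means that banks exchanging large sums are more likely to end up in the same community than banks linked only by small transfers.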

Louvain graph algorithm

Or we can find the banks that received the most money transitively by using the PageRank algorithm.
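
To reproduce such a ranking, a PageRank call might look like the following sketch. It again assumes the Graph Data Science library in its 2.x syntax; the graph name `transfersForPageRank` is hypothetical, and the projection is created and dropped inside the example so that it runs on its own.

```cypher
// Project banks and weighted TRANSFERRED relationships into memory.
CALL gds.graph.project(
  'transfersForPageRank',
  'Entity',
  {TRANSFERRED: {properties: 'amount'}}
);

// Relationships point from sender to receiver, so score accumulates
// at banks that receive money, directly or through intermediaries.
CALL gds.pageRank.stream('transfersForPageRank', {relationshipWeightProperty: 'amount'})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS bank, score
ORDER BY score DESC
LIMIT 10;

// Free the in-memory projection when done.
CALL gds.graph.drop('transfersForPageRank');
```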

The banks can also be visualized by node size using Neo4j Bloom.


The FinCEN Files investigation is not the first project to reveal international criminal activities moving through the financial system, and it certainly won’t be the last.

Global investigations at this scale – whether conducted by journalism organizations, government bodies or self-policing enterprises – have shown time and again that they require the power of the entire graph technology stack: graph database, graph data visualization and graph data science.

When investigators can store, query, explore and analyze the connections in their data, no dark secrets are safe.

Ready to dig deeper into graphs and the FinCEN Files?
Check out the Neo4j FinCEN Files Sandbox.

