Original article written by Emil Eifrem, published in Minutehack

The Panama Papers affair shows just how important data is in finding hidden truths.

Ask editor Mar Cabra of the International Consortium of Investigative Journalists (ICIJ), the group behind the Panama Papers investigation, what she thinks her team does, and the answer is pretty much what Woodward and Bernstein would have said 40 years ago: “We use technology to tell great stories.”

In the heyday of The Washington Post’s takedown of a corrupt President, Woodstein relied on a phone, perhaps a fax, and a library of clippings and information sources.

Now, reporters depend on data: huge amounts of it that have to be probed, sifted and worked with. That’s why Cabra and the rest of the global team of investigators have embraced data-based techniques as core to what they do – and as important as old-school tenacity and a nose for a story.

Bigger than Snowden and more relevant to business

We now call this data-driven journalism, and it has just pulled off its biggest coup – The Panama Papers. Not only does the exposure of the activities of clients of a Panamanian law firm qualify as the world’s largest financial scandal, it’s also, at 2.6 terabytes and 11.5 million documents, far larger than anything Snowden or WikiLeaks managed. Let’s review how this happened.

When an anonymous source tipped off the ICIJ about a huge amount of classified internal company information from Panamanian law firm Mossack Fonseca, Cabra knew a major international scoop was possible. The problem was that the data was too complex to be analysed by traditional means.

Cabra knew her team would need a sophisticated tool to analyse this data set, one that could process a large volume of highly connected data quickly, easily and efficiently.

It’s worth noting that such analysis had to be accessible to investigative journalists around the globe, regardless of their technical abilities (the vast majority were not technical). The tool also had to be able to reveal patterns in a vast pool of unstructured information, mainly in scanned bank statements and so not easily searchable by conventional means.
