Hey @neo4j please do this once again!!! #panamapapers #neo4j
— sanchezcastro (@SCHZCAS) January 15, 2017
https://t.co/B8pjxNKCyA
As part of the Buzzfeed article opens in new tabHelp Us Map TrumpWorld, the four investigative journalists, John Templon, Alex Campbell, Anthony Cormier, and Jeremy Singer-Vine asked the public to help them map and analyze the data that they investigated, confirmed and published surrounding the business connections of Donald J. Trump:
Now we are asking the public to use our data to find connections we may have missed, and to give us context we don’t currently understand. We hope you will help us — and the public — learn more about TrumpWorld and how this unprecedented array of businesses might affect public policy.As you probably know, opens in new tabNeo4j has been used in investigative journalism before. Most notably in the collaborative work of the International Consortium of Investigative Journalists (ICIJ) on the opens in new tab Panama Papers investigations, but also by other investigative organizations and journalists. To support this kind of work, we launched the opens in new tabData Journalism Accelerator Program last year to help investigative data journalists better leverage opens in new tabgraph database technology.
So, this below is our small contribution toward mapping and analyzing the first part of the TrumpWorld dataset. Much more connected data analysis is possible, which I’ll outline at the end of the article.
Digging into the TrumpWorld Dataset
The opens in new tabBuzzfeed article points to a Google Spreadsheet with collected, verified and researched data of the connections of organizations and people related to Trump with each group listed on a separate tab.
The data is fortunately available as a publicly readable document, so we can use our old trick of downloading it as CSV/TSV and using the download URL with
LOAD CSV
to import the dataset into Neo4j.If you want to, opens in new tabdownload and start Neo4j (it only takes a minute). Then you can run these queries in your local Neo4j instance, or have a look at the opens in new tabinteractive variant of this post as a opens in new tabGraph Gist (our data science notebooks for graphs). All example queries in this blog post use the opens in new tabCypher graph query language.
If you just want to run Neo4j with this dataset using a Docker container, you can do so. My colleague Mark Needham prepared one for you to use
opens in new tab
docker run --publish=7474:7474 --publish=7687:7687 markhneedham/neo4j-3.1-trumpworld
then you can open Neo4j as usual on https://localhost:7474 to access the data.
Let’s have a look at the data first. Here is a sample:
WITH 'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=634968401' AS url LOAD CSV WITH HEADERS FROM url AS row RETURN row.`Organization A`,row.`Organization B`,row.Connection, row.`Source(s)` LIMIT 5
row.`Organization A` | row.`Organization B` | row.Connection | row.`Source(s)` |
---|---|---|---|
4 SHADOW TREE LANE MEMBER CORP. | 4 SHADOW TREE LANE LLC | Ownership | opens in new tabhttps://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html |
40 WALL DEVELOPMENT ASSOCIATES LLC | 40 WALL STREET LLC | Ownership | opens in new tabhttps://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html |
40 WALL STREET LLC | 40 WALL STREET COMMERCIAL LLC | Ownership | opens in new tabhttps://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html |
40 WALL STREET MEMBER CORP. | 40 WALL STREET LLC | Ownership | opens in new tabhttps://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html |
401 MEZZ VENTURE LLC | 401 NORTH WABASH VENTURE LLC | Ownership | opens in new tabhttps://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html |
You can see that it’s quite straightforward: two columns for the node-entities (i.e., organizations), one each for the type of connection, along with the source of the information which represent the relationships.
As the connection information is quite verbose and not regular enough to serve as relationship-type, we either have to simplify it, as is demonstrated in opens in new tabthis more detailed TrumpWorld graph.
Or we can use one generic relationship-type per mapping, which we will do here for the sake of simplicity:
(:Organization)-[:CONNECTED_TO]->(:Organization)
(:Person)-[:INVOLVED_WITH]->(:Organization)
(:Person)-[:RELATED_TO]->(:Person)
With this simple model at hand, we can now go ahead and import the data into our graph, following the same basic pattern:
- We get the data from the CSV file
- Create or find (
MERGE
) the two nodes - Connect them with a relationship
Data Import
Data Constraints
As we don’t want duplicate entries for organizations and people, we set up two constraints:
CREATE CONSTRAINT ON (o:Organization) ASSERT o.name IS UNIQUE; CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;
Connect Organizations with Other organizations from the 1st Tab
WITH 'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=634968401' AS url LOAD CSV WITH HEADERS FROM url AS row MERGE (o1:Organization {name:row.`Organization A`}) MERGE (o2:Organization {name:row.`Organization B`}) CREATE (o1)-[con:CONNECTED_TO]->(o2) SET con.connection=row.Connection, con.source=row.`Source(s)`
Connect People with Organizations from the 2nd Tab
WITH 'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1368567920' AS url LOAD CSV WITH HEADERS FROM url AS row MERGE (p:Person {name:row.Person}) MERGE (o:Organization {name:row.Organization}) CREATE (p)-[con:INVOLVED_WITH]->(o) SET con.connection=row.Connection, con.source=row.`Source(s)`
Connect People with other People from the 3rd Tab
WITH 'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=905294723' AS url LOAD CSV WITH HEADERS FROM url AS row MERGE (p1:Person {name:row.`Person A`}) MERGE (p2:Person {name:row.`Person B`}) CREATE (p2)-[con:RELATED_TO]->(p1) SET con.connection=row.Connection, con.source=row.`Source(s)`
Import Results
So, in total we imported 1514 nodes and 1857 relationships.
And this is the full TrumpWorld graph in all its “beauty,” after querying for all data (raise the view limits in the sidebar config), go to fullscreen and then zoom out.
Running Some Simple Graph Database Queries
Most-Connected Organizations
What are the organizations that have the most connections within the TrumpWorld dataset, and which are those mainly?
MATCH (o:Organization)-[r]-() RETURN o.name, count(*), collect(distinct type(r)) AS types ORDER BY count(*) DESC LIMIT 5 ╒═════════════════════════════════════╤═══════════╤═══════════════════════════════════╕ │"o.name" │"count(*)"│"types" │ ╞═════════════════════════════════════╪═══════════╪═══════════════════════════════════╡ │"THRIVE CAPITAL" │86 │["CONNECTED_TO","INVOLVED_WITH"]│ ├─────────────────────────────────────┼───────────┼───────────────────────────────────┤ │"40 WALL STREET LLC" │41 │["INVOLVED_WITH","CONNECTED_TO"]│ ├─────────────────────────────────────┼───────────┼───────────────────────────────────┤ │"DJT HOLDINGS LLC" │36 │["INVOLVED_WITH","CONNECTED_TO"]│ ├─────────────────────────────────────┼───────────┼───────────────────────────────────┤ │"KUSHNER COMPANIES" │32 │["INVOLVED_WITH","CONNECTED_TO"]│ ├─────────────────────────────────────┼───────────┼───────────────────────────────────┤ │"TRUMP HOTELS & CASINO RESORTS" │28 │["INVOLVED_WITH","CONNECTED_TO"]│ └─────────────────────────────────────┴───────────┴───────────────────────────────────┘
The Second-Degree Kushner Network
MATCH network = (:Person {name:"JARED KUSHNER"})-[*..2]-() RETURN network
Which Banks Are Involved with These Organizations and People?
MATCH (bank:Organization)--(other) WHERE bank.name contains "BANK" RETURN *
Donald Trump’s Connections to Vladimir Putin
And finally let’s look at the shortest paths between Vladimir Putin and Donald Trump.
MATCH (vp:Person {name:"VLADIMIR PUTIN"}),(dt:Person {name:"DONALD J. TRUMP"}) MATCH path = allShortestPaths( (vp)-[*]-(dt) ) RETURN path
As expected, the shortest path between Putin and Trump officially goes through Rex Tillerson. But as you can also see, there is a missing direct connection between Trump and Tillerson, the current nominee for U.S. Secretary of State.
As we’ll see next, the other nominees are there, so this is one bit of missing information that I reported back to Buzzfeed.
Other Nominees of Trump’s Cabinet
MATCH (p:Person)-[con:RELATED_TO]->() WHERE con.connection CONTAINS "Nominee" RETURN p.name, con.connection ORDER BY split(p.name," ")[-1] DESC LIMIT 5
p.name | con.connection |
---|---|
RYAN ZINKE | Nominee for Secretary of Interior |
JEFF SESSIONS | Nominee for Attorney General |
WILBUR ROSS | Nominee for Secretary of Commerce |
ANDY PUZDER | Nominee for Secretary of Labor |
SCOTT PRUITT | Nominee for Administrator of the Environmental Protection Administration (EPA) |
Nominee Influence
And finally we can look at the combined influence that these nominees exert within the TrumpWorld graph – at least from what is in the data we have here:
MATCH (o:Organization)<-[:INVOLVED_WITH]-(p:Person)-[con:RELATED_TO]->() WHERE con.connection CONTAINS "Nominee" RETURN o,p
We see that these 18 individuals are involved with 299 organizations that they either work with, own or control, are indebted to or are otherwise involved with Donald Trump.
Fun Graph Query
To conclude, I want to demonstrate what graph databases are quite good at – quickly finding transitive relationships within data. First, I looked for the longest shortest paths in our graph, checking a cross product of roughly 17,000 pairs of people. This finished in little over three seconds on my laptop.
match (p1:Person), (p2:Person) where id(p1) < id(p2) match p = shortestPath((p1)-[*]-(p2)) return p1,p2,length(p) order by length(p) desc limit 20 ╒══════════════════════════════╤═══════════════════════════╤═════════════╕ │"p1" │"p2" │"length(p)" │ ╞══════════════════════════════╪═══════════════════════════╪═════════════╡ │{"name":"VICTOR HEGGELMAN"} │{"name":"ZHUO RAN"} │10 │ ├──────────────────────────────┼───────────────────────────┼─────────────┤ │{"name":"VICTOR HEGGELMAN"} │{"name":"CHRIS CHRISTIE"}│10 │ ├──────────────────────────────┼───────────────────────────┼─────────────┤ │{"name":"TIM MOL"} │{"name":"ZHUO RAN"} │10 │ ├──────────────────────────────┼───────────────────────────┼─────────────┤ │{"name":"ROMA DOWNEY"} │{"name":"ZHUO RAN"} │10 │ ├──────────────────────────────┼───────────────────────────┼─────────────┤ │{"name":"CHRIS CHRISTIE"} │{"name":"ROMA DOWNEY"} │10 │ ├──────────────────────────────┼───────────────────────────┼─────────────┤ │{"name":"LEV LEVIEV"} │{"name":"JOSE GARCIA"} │9 │ └──────────────────────────────┴───────────────────────────┴─────────────┘
If we pick two of those – Lev Leviev and Roma Downey – who have probably little to do with each other in real life except her wearing diamonds, we can see who connects them in our graph.
MATCH (p1:Person {name:"LEV LEVIEV"}), (p2:Person {name:"ROMA DOWNEY"}) MATCH path = allShortestPaths((p1)-[*]-(p2)) RETURN path
Extending the TrumpWorld Graph
There are many ways this graph can be extended, and I'll work on some of them in the coming weeks. Here are some ideas:
- Combine with related data from opens in new tablittlesis.org, "the involuntary registry of the 1%"
- Add FEC campaign financing information
- Add foundation detail data from opens in new tab citizenaudit.org
- Add people involved with the election campaign
- Add second-degree organizations related to the other people in the graph, not just Donald J. Trump
- Add more detailed information on the organizations and their setup (location, leadership, deals), e.g. by querying APIs like opens in new tabOpenCorporates or by integrating with the data from opens in new tabAleph the OCCRP data portal
Our partner Linkurious was inspired by the first part of my work to opens in new tabdemonstrate how you'd visualize this data with their software.
Click below to get your free copy the O’Reilly Graph Databases book and discover how to harness the power of graph databases for your next project.
Get My Free Copy