GraphGists

Importing the Buzzfeed TrumpWorld Dataset into Neo4j

I came across this tweet by Sanchez Castro (@SCHZCAS) which I’m more than happy to support.

As part of the Buzzfeed article Help Us Map TrumpWorld, the four investigative journalists, John Templon, Alex Campbell, Anthony Cormier, and Jeremy Singer-Vine asked the public to help them map and analyze the data that they investigated, confirmed and published:

Now we are asking the public to use our data to find connections we may have missed, and to give us context we don’t currently understand. We hope you will help us — and the public — learn more about TrumpWorld and how this unprecedented array of businesses might affect public policy.

sub buzz 31493 1484333437 1

Setup with Local Database

If you want to work with this data locally, please download and install Neo4j 3.1 or use a blank Neo4j Sandbox (of course you can also explore the already imported Trumpworld sandbox).

Looking at the Data

The Buzzfeed article linked int he tweet points to a Google Spreadsheet with collected and verified/researched data of relationships of Trump organizations to other organizations, in total we have 770 organizations (as of today) with 611 relationships. The same data is available as a CSV and GraphML file in their GitHub Repository.

The data is fortunately available as a public google doc, so we can use our old trick of loading the CSV-Download URL with LOAD CSV into Neo4j.

Let’s have a look at the data first, here as a sample:

WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1996904412' AS url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row.`Entity A`,row.`Entity A Type`, row.`Entity B`,row.`Entity B Type`,row.Connection, row.`Source(s)`
LIMIT 5
╒════════════════════════════════════╤═════════════════════╤═══════════════════════════════╤═════════════════════╤════════════════╤══════════════════════════════════════════════════════════════════════════════════════╕
│"row.`Entity A`"                    │"row.`Entity A Type`"│"row.`Entity B`"               │"row.`Entity B Type`"│"row.Connection"│"row.`Source(s)`"                                                                     │
╞════════════════════════════════════╪═════════════════════╪═══════════════════════════════╪═════════════════════╪════════════════╪══════════════════════════════════════════════════════════════════════════════════════╡
│"4 SHADOW TREE LANE MEMBER CORP."   │"Organization"       │"4 SHADOW TREE LANE LLC"       │"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
├────────────────────────────────────┼─────────────────────┼───────────────────────────────┼─────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│"40 WALL DEVELOPMENT ASSOCIATES LLC"│"Organization"       │"40 WALL STREET LLC"           │"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
├────────────────────────────────────┼─────────────────────┼───────────────────────────────┼─────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│"40 WALL STREET LLC"                │"Organization"       │"40 WALL STREET COMMERCIAL LLC"│"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
├────────────────────────────────────┼─────────────────────┼───────────────────────────────┼─────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│"40 WALL STREET MEMBER CORP."       │"Organization"       │"40 WALL STREET LLC"           │"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
├────────────────────────────────────┼─────────────────────┼───────────────────────────────┼─────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│"401 MEZZ VENTURE LLC"              │"Organization"       │"401 NORTH WABASH VENTURE LLC" │"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
└────────────────────────────────────┴─────────────────────┴───────────────────────────────┴─────────────────────┴────────────────┴──────────────────────────────────────────────────────────────────────────────────────┘

The most frequently mentioned organizations found by this query:

WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1996904412' AS url
LOAD CSV WITH HEADERS FROM url AS row
WITH row
WHERE row.`Entity A Type` = 'Organization' AND row.`Entity B Type` = 'Organization'
UNWIND  [row.`Entity A`, row.`Entity B`] AS org
RETURN org, count(*)
ORDER BY count(*) DESC LIMIT 10

We see the usual suspect on top.

╒══════════════════════════════╤══════════╕
│"org"                         │"count(*)"│
╞══════════════════════════════╪══════════╡
│"THRIVE CAPITAL"              │"84"      │
├──────────────────────────────┼──────────┤
│"MERCER FAMILY FOUNDATION"    │"41"      │
├──────────────────────────────┼──────────┤
│"40 WALL STREET LLC"          │"40"      │
├──────────────────────────────┼──────────┤
│"DJT HOLDINGS LLC"            │"35"      │
├──────────────────────────────┼──────────┤
│"KUSHNER COMPANIES"           │"30"      │
├──────────────────────────────┼──────────┤
│"TRUMP HOTELS & CASINO RESORTS│"28"      │
│, INC."                       │          │
├──────────────────────────────┼──────────┤
│"TRUMP TOWER COMMERCIAL LLC"  │"21"      │
├──────────────────────────────┼──────────┤
│"TRUMP ORGANIZATION LLC"      │"18"      │
├──────────────────────────────┼──────────┤
│"THE TRUMP ORGANIZATION, INC."│"13"      │
├──────────────────────────────┼──────────┤
│"TTTT VENTURE LLC"            │"8"       │
└──────────────────────────────┴──────────┘

What kind of relationships exist in the data:

WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1996904412' AS url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row.Connection AS type, count(*)
ORDER BY count(*) DESC
╒═══════════════════════════════════════════════════════...═╤══════════╕
│"type"                                                 ... │"count(*)"│
╞═══════════════════════════════════════════════════════...═╪══════════╡
│"President (as of 2016 FEC filing)"                    ... │"475"     │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Ownership"                                            ... │"323"     │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Director"                                             ... │"126"     │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Investor"                                             ... │"120"     │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Reported member"                                      ... │"103"     │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Former director"                                      ... │"66"      │
....                                                    ...
│"\"Kushner and Cui have had an extensive ongoing dialog...i│"1"       │
│te House official,\" per Bloomberg"                    ... │          │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Marino served as Barry's law clerk"                   ... │"1"       │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Former director of African American outreach"         ... │"1"       │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Former director of advance"                           ... │"1"       │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Executive vice president and special counsel to Donald... │"1"       │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Ziya \"awarded a series of multimillion-dollar contrac... │"1"       │
├───────────────────────────────────────────────────────...─┼──────────┤
│"Former deputy director member relations"              ... │"1"       │
└───────────────────────────────────────────────────────...─┴──────────┘

While some of them like Ownership, or Investor are straightforward others (Ziya \"awarded a series of multimillion-dollar contracts to Azarpassillo,\" per The New Yorker) are very specific, and probably not a good choice for relationship-type to query on.

So we have two options, one could be to use a generic relationship and put all the Connection information into a property, or alternatively we do some cleanup/unification and have a richer set of relationships.

Simple, Direct Data Import

We look at the simpler variant here, to quickly get results. THe unification approach is demonstrated here, for those of you that want to have a bit more interesting graph model.