Follow The Data – FEC Campaign Finance Data Challenge

In politics, people are often advised to “follow the money” to understand the forces influencing decisions. As engineers, we know we can do that and more by following the data.

Inspired by some innovative work by Dave Fauth, a Washington, DC data analyst, we arranged a workshop built around FEC campaign finance data that had been imported into Neo4j.

FEC Campaign Finance Data

Every Sunday, the FEC updates its campaign finance data sets for the current two-year election period plus the five most recent two-year election periods. The data sets include:
    • all individuals registered as candidates for President, House, or Senate
    • all registered committees engaged in political fundraising
    • all individual contributions greater than $200
In addition, there are files covering transactions between committees, plus files that link records to one another (ooh look, relationships!).

After exploring some evolutionary import strategies (starting with the most direct, then iterating), we settled on an approach which structured the data to look like this:

[Figure: Campaign Finance Data in a Graph]

Query Challenge

With the data imported, and a basic understanding of the domain model, we then challenged people to write Cypher queries to answer the following questions:
    • All presidential candidates for 2012
    • Most mythical presidential candidate
    • Top 10 presidential candidates by number of campaign committees
    • Find President Barack Obama
    • Look up Obama by his candidate ID
    • Find presidential candidate Mitt Romney
    • Look up Mitt Romney by his candidate ID
    • Find the shortest path of funding between Obama and Romney
    • List the 10 top individual contributions to Obama
    • List the 10 top individual contributions to Romney
Care to give the challenge a try? OK, then follow the steps on the GitHub project site to clone the importers. You’ll want to run the related importer like so:

./bin/fec2graph --force --importer=RELATED

Then just start up Neo4j and open a browser to http://localhost:7474 to query away. If you’re new to Cypher, read through this introduction to learn the basics of querying a graph.

Submit the queries to me by next Thursday and we’ll pick a winner from the correct entries. Prize? A free pass to GraphConnect of course! Coming this November 5 & 6 in San Francisco, GraphConnect is a fantastic conference devoted to graph databases.

Want a hint?

Alrighty. Let’s take a look at #2. After successfully listing all candidates for the first query, you could page through the listing to look for names that seem…off. Use limit and skip in the return clause to page through the long listing:

start candidate=node:candidates('CAND_ID:*') 
where candidate.CAND_OFFICE='{fill this in}' AND candidate.CAND_ELECTION_YR='{this too}'
return candidate.CAND_NAME skip 100 limit 100;
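To move on to the next page, bump the skip value by the page size. Here is a rough sketch of fetching the third page, assuming the same candidates index and properties as in the hint above. The order by clause is our own addition, not part of the original hint: without an explicit ordering, there is no guarantee that the same rows land on the same page from one run to the next. The placeholders are still yours to fill in.

```cypher
// Sketch only: third page of 100 names (rows 201-300).
// The "order by" is an assumption added here to keep paging stable.
start candidate=node:candidates('CAND_ID:*')
where candidate.CAND_OFFICE='{fill this in}' AND candidate.CAND_ELECTION_YR='{this too}'
return candidate.CAND_NAME
order by candidate.CAND_NAME
skip 200 limit 100;
```

Keep raising skip in steps of 100 until the query comes back empty.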

Once you spot one of the many candidate names that isn’t real, you can query for it directly:
start candidate=node:candidates('CAND_NAME:"CLAUS, SANTA"')
return candidate;

Cypher Masters

From our recent workshop, the winners are:
    • Matt Tyndal
    • Lou Kosak
    • Pengchao Wang
Congratulations, and thanks to everyone who joined us for the event. We will publish solutions to the challenge when we announce next week’s winner. Good luck!