This interactive Neo4j graph tutorial covers a common credit card fraud detection scenario.

Introduction to Problem

Banks, merchants and credit card processing companies lose billions of dollars every year to credit card fraud. Criminals steal credit card data in a variety of ways. Bluetooth-enabled skimming devices can be placed on the card reader of the pump that dispenses your petrol. Hackers might steal the data in a mass breach of a large retailer, as was the case with Target and Home Depot in recent years. Sometimes the criminal is simply the clerk at the checkout line at the grocery or in a restaurant, who swipes the victim's card through a small device or surreptitiously jots down the card number.

Typical Scenario

In December 2013, police in Abington, Pennsylvania arrested two Post Office employees for stealing credit card information and using it to buy more than $50,000 worth of merchandise. Their scheme was quite typical; here is how they operated:

  • the Post Office clerks copied the credit card information of some of their customers while processing transactions;

  • they then located these customers' home addresses;

  • using the credit card numbers, they would place orders online for goods or gift cards, to be delivered to their victims' home addresses;

  • once the goods were ordered, an accomplice would wait at the address to intercept the deliveries.

The pair were apprehended not long after a Post Office patron reported a man attempting to intercept one of the packages at his home, but not before they had bought Christmas gifts and gone on vacations with the fraudulently obtained information.

Explanation of Solution

Graph databases can help find credit card thieves faster. By representing transactions as a graph, we can look for the common denominator in the fraud cases and find the point of origin of the scam.

Credit Card Fraud Graph Data Model

A series of credit card transactions can be represented as a graph. Each transaction involves two nodes: a person (the customer) and a merchant. The nodes are linked by the transaction itself. A transaction has a date and a status.

Legitimate transactions have the status "Undisputed". Fraudulent transactions are "Disputed".

The graph data model below represents how the data looks as a graph.

Figure 1. Credit Card Fraud
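
To make the model concrete, here is a minimal sketch of how a few transactions could be created by hand. Everything in it is hypothetical: the people, the "Example Online Shop" merchant, the amounts and the dates are made up for illustration, and the `Merchant` label and the `name` property are assumptions that the text above does not confirm. Only the `Person` label, the `HAS_BOUGHT_AT` relationship and its `amount`, `time` and `status` properties come from the queries in this tutorial.

```cypher
// Hypothetical sample data, not taken from the real dataset.
// Assumes Neo4j 3.4 or later for the date() function.
CREATE (paul:Person {name: "Paul"}),
       (jean:Person {name: "Jean"}),
       (walmart:Merchant {name: "Walmart"}),
       (shop:Merchant {name: "Example Online Shop"}),
       // Legitimate purchases, made before the fraud
       (paul)-[:HAS_BOUGHT_AT {amount: 42, time: date("2014-03-20"), status: "Undisputed"}]->(walmart),
       (jean)-[:HAS_BOUGHT_AT {amount: 17, time: date("2014-03-21"), status: "Undisputed"}]->(walmart),
       // Fraudulent purchases, later disputed by the card holders
       (paul)-[:HAS_BOUGHT_AT {amount: 986, time: date("2014-04-01"), status: "Disputed"}]->(shop),
       (jean)-[:HAS_BOUGHT_AT {amount: 1414, time: date("2014-04-02"), status: "Disputed"}]->(shop);
```

Storing `time` as a temporal value (or as an ISO-formatted string) keeps comparisons such as `t.time < r.time` chronologically meaningful. The downloadable dataset may store times differently.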

Sample Data Set

You can download the complete dataset here:

Identify the Fraudulent Transactions

First, we collect all the fraudulent transactions, that is, those with the status "Disputed".

MATCH (victim:Person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
RETURN victim.name AS `Customer Name`, merchant.name AS `Store Name`, r.amount AS Amount, r.time AS `Transaction Time`
ORDER BY `Transaction Time` DESC

Identify the Point of Origin of the Fraud

Now we know which customers and which merchants are involved in our fraud case. But where is the criminal we are looking for? What's going to help us here is the transaction date on each fraudulent transaction.

The criminal we are looking for is involved in a legitimate transaction during which he captures his victims' credit card numbers. After that, he can execute his illegitimate transactions. That means we want not only the illegitimate transactions but also the undisputed transactions each victim made before them.

MATCH (victim:Person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < r.time
WITH victim, othermerchants, t ORDER BY t.time DESC
RETURN victim.name AS `Customer Name`, othermerchants.name AS `Store Name`, t.amount AS Amount, t.time AS `Transaction Time`
ORDER BY `Transaction Time` DESC
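
The query above returns an undisputed transaction once for every later disputed transaction of the same victim, so rows can repeat. A possible variant, sketched here under the assumption that nodes carry a `name` property, first computes each victim's earliest disputed transaction and then keeps only the purchases made before it. Note that this is a stricter filter than the original query: it looks only at what happened before the first fraud, on the reasoning that the card must have been copied by then.

```cypher
// Earliest disputed transaction per victim
MATCH (victim:Person)-[r:HAS_BOUGHT_AT]->()
WHERE r.status = "Disputed"
WITH victim, min(r.time) AS firstDispute
// Undisputed purchases made before that first dispute
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < firstDispute
RETURN victim.name AS `Customer Name`, othermerchants.name AS `Store Name`, t.amount AS Amount, t.time AS `Transaction Time`
ORDER BY `Transaction Time` DESC
```

Because `min(r.time)` collapses the disputed transactions to one row per victim, each undisputed transaction appears at most once in the result.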

Zero In on the Criminal

Now we want to find the common denominator. Is there a common merchant in all of these seemingly innocuous transactions? We just have to tweak the Cypher query to group the previous results by merchant and sort them by the number of times each merchant appears.

MATCH (victim:Person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < r.time
WITH victim, othermerchants, t ORDER BY t.time DESC
RETURN DISTINCT othermerchants.name AS `Suspicious Store`, count(DISTINCT t) AS Count, collect(DISTINCT victim.name) AS Victims
ORDER BY Count DESC

Figure 2. Where is the thief?

In each instance of a fraudulent transaction, the credit card holder had visited Walmart in the days just prior. We now know the location and the dates on which the customers' credit card numbers were stolen. With a graph visualization solution like Linkurious, we could inspect the data to confirm our intuition. Now we can alert the authorities and the merchant about the situation. They should have enough information to take it from there!
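
The claim that every victim visited the same merchant beforehand can also be checked with a query rather than by eye. The following is a sketch, not part of the original tutorial: it counts the distinct victims, then keeps only the merchants that appear in the prior undisputed transactions of all of them. The `name` property and the aliases `totalVictims` and `victimsSeen` are assumptions introduced for illustration.

```cypher
// Total number of distinct customers with at least one disputed transaction
MATCH (v:Person)-[d:HAS_BOUGHT_AT]->()
WHERE d.status = "Disputed"
WITH count(DISTINCT v) AS totalVictims
// Same pattern as before: undisputed purchases preceding a disputed one
MATCH (victim:Person)-[r:HAS_BOUGHT_AT]->()
WHERE r.status = "Disputed"
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < r.time
// Keep only merchants visited by every victim
WITH othermerchants, totalVictims, count(DISTINCT victim) AS victimsSeen
WHERE victimsSeen = totalVictims
RETURN othermerchants.name AS `Suspicious Store`, victimsSeen AS `Victims Seen`
```

Requiring a match for every victim is a strict test. In a larger dataset with several independent fraud schemes, relaxing the condition to a high share of victims may be more useful.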

For more graph-related use cases, make sure to check the blog of Linkurious: