How to use phone calls to identify criminals?
The idea that a mobile phone can be a dangerous thing for a professional criminal to carry entered popular culture a while ago. In The Wire, for example, drug dealers use “burners”, cheap phones they dispose of regularly. Why? Because your phone operator is authorized to collect information about whom you call, for how long and from where. In certain circumstances, that data can be used by law enforcement. But do you know the techniques police officers use to turn phone data into arrests and convictions? We are going to see how graph technologies make it possible to analyse phone calls to find criminals.
To illustrate our use case, let’s use a common scenario. In a residential neighborhood, a store robbery is committed during the day by a group of 4 criminals. The criminals are masked, use a stolen vehicle and leave no fingerprints. In that kind of case, finding an answer may take a lot of legwork. A witness noticed that one of the criminals used his phone to make a call minutes before the crime. Equipped with a search warrant, a police officer can contact mobile phone operators to collect information about the calls made and received near the robbery when it happened.
Data model to analyse the network in the phone calls
The data phone operators provide to law enforcement is highly tabular. Identifying unique phone numbers and their relationships in tabular data is very hard. We are thus going to use the phone call data to build a graph. That graph will show how the phone numbers are connected by phone calls: from a list of calls, we infer a network. For this article, we have prepared a small dataset using Mockaroo. That data is in a spreadsheet format. Here are the columns:
- FULL_NAME : full name of phone subscriber ;
- FIRST_NAME : first name of phone subscriber ;
- LAST_NAME : last name of phone subscriber ;
- CALLING_NBR : phone number of the caller ;
- CALLED_NBR : phone number of the person called ;
- START_DATE : start of phone call ;
- END_DATE : end of phone call ;
- DURATION : duration of phone call ;
- CELL_SITE : ID of cell site used to route phone call ;
- CITY : city of cell site used to route phone call ;
- STATE : state of cell site used to route phone call ;
- ADDRESS : address of cell site used to route phone call ;
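To make the idea of inferring a network from a list of calls more concrete, here is a minimal Python sketch that is independent of Neo4j. Only the CALLING_NBR and CALLED_NBR column names come from the list above; the sample rows and the `build_call_network` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the article.
SAMPLE_CALLS = [
    {"CALLING_NBR": "555-0001", "CALLED_NBR": "555-0002"},
    {"CALLING_NBR": "555-0001", "CALLED_NBR": "555-0003"},
    {"CALLING_NBR": "555-0002", "CALLED_NBR": "555-0001"},
    {"CALLING_NBR": "555-0001", "CALLED_NBR": "555-0002"},
]


def build_call_network(rows):
    """Return {caller: {called: number_of_calls}} from tabular call rows."""
    network = defaultdict(lambda: defaultdict(int))
    for row in rows:
        network[row["CALLING_NBR"]][row["CALLED_NBR"]] += 1
    # Convert to plain dicts so the result is easy to print and compare.
    return {caller: dict(called) for caller, called in network.items()}


network = build_call_network(SAMPLE_CALLS)
for caller, contacts in sorted(network.items()):
    print(caller, "->", contacts)
```

Each row becomes a directed edge from caller to called, and repeated calls between the same pair simply increase the edge weight. That is the structure the tabular form hides.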
Importing the call records
Now that we have defined a model, we are going to populate it with the data stored in the spreadsheet. To store our graph, we will use Neo4j, a popular graph database. Neo4j has a query language called Cypher that makes it easy to import CSV files. Here is a script that can turn our data into a Neo4j graph:
//Setup initial constraints
CREATE CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
CREATE CONSTRAINT ON (b:CALL) ASSERT b.id IS UNIQUE;
CREATE CONSTRAINT ON (c:LOCATION) ASSERT c.cell_tower IS UNIQUE;
CREATE CONSTRAINT ON (d:STATE) ASSERT d.name IS UNIQUE;
CREATE CONSTRAINT ON (e:CITY) ASSERT e.name IS UNIQUE;
//Create the appropriate nodes
//The script reads line.ID and line.CELL_TOWER; adjust these names if the headers of your file differ (the column list above calls the cell site column CELL_SITE)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:c:/Users/Jean/Downloads/call_records_dummy.csv" AS line
MERGE (a:PERSON {number: line.CALLING_NBR})
ON CREATE SET a.first_name = line.FIRST_NAME, a.last_name = line.LAST_NAME, a.full_name = line.FULL_NAME
ON MATCH SET a.first_name = line.FIRST_NAME, a.last_name = line.LAST_NAME, a.full_name = line.FULL_NAME
MERGE (b:PERSON {number: line.CALLED_NBR})
MERGE (c:CALL {id: line.ID})
ON CREATE SET c.start = toInt(line.START_DATE), c.end = toInt(line.END_DATE), c.duration = line.DURATION
MERGE (d:LOCATION {cell_tower: line.CELL_TOWER})
ON CREATE SET d.address = line.ADDRESS, d.state = line.STATE, d.city = line.CITY
MERGE (e:CITY {name: line.CITY})
MERGE (f:STATE {name: line.STATE});
//Setup proper indexing
DROP CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
DROP CONSTRAINT ON (a:CALL) ASSERT a.id IS UNIQUE;
DROP CONSTRAINT ON (a:LOCATION) ASSERT a.cell_tower IS UNIQUE;
CREATE INDEX ON :PERSON(number);
CREATE INDEX ON :CALL(id);
CREATE INDEX ON :LOCATION(cell_tower);
//Create relationships between people and calls
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:c:/Users/Jean/Downloads/call_records_dummy.csv" AS line
MATCH (a:PERSON {number: line.CALLING_NBR}),(b:PERSON {number: line.CALLED_NBR}),(c:CALL {id: line.ID})
CREATE (a)-[:MADE_CALL]->(c)-[:RECEIVED_CALL]->(b);
//Create relationships between calls and locations
//Each statement needs its own LOAD CSV so that "line" is defined
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:c:/Users/Jean/Downloads/call_records_dummy.csv" AS line
MATCH (a:CALL {id: line.ID}), (b:LOCATION {cell_tower: line.CELL_TOWER})
CREATE (a)-[:LOCATED_IN]->(b);
//Create relationships between locations, cities and states
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:c:/Users/Jean/Downloads/call_records_dummy.csv" AS line
MATCH (a:LOCATION {cell_tower: line.CELL_TOWER}), (b:STATE {name: line.STATE}), (c:CITY {name: line.CITY})
CREATE (b)<-[:HAS_STATE]-(a)-[:HAS_CITY]->(c);
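The import relies on MERGE, which matches a node on its key property and creates it only when no match exists. The Python sketch below imitates that behaviour with a dictionary keyed by phone number. It illustrates the semantics under stated assumptions and is not how Neo4j implements MERGE; the `merge_person` helper and the sample rows are hypothetical.

```python
def merge_person(people, number, **properties):
    """Get-or-create a person keyed by phone number, then set properties.

    Mirrors MERGE with identical ON CREATE SET / ON MATCH SET clauses:
    the node is created once, and the properties are rewritten each time.
    """
    person = people.setdefault(number, {"number": number})
    person.update(properties)
    return person


# Hypothetical rows using column names from the article.
rows = [
    {"FULL_NAME": "Alice Example", "CALLING_NBR": "555-0001", "CALLED_NBR": "555-0002"},
    {"FULL_NAME": "Alice Example", "CALLING_NBR": "555-0001", "CALLED_NBR": "555-0003"},
    {"FULL_NAME": "Bob Example", "CALLING_NBR": "555-0002", "CALLED_NBR": "555-0001"},
]

people = {}
for row in rows:
    # The caller receives the subscriber details found on the row.
    merge_person(people, row["CALLING_NBR"], full_name=row["FULL_NAME"])
    # The called party is merged on the number alone, as in the script.
    merge_person(people, row["CALLED_NBR"])

for number, person in sorted(people.items()):
    print(number, person.get("full_name"))
```

In this sketch, 555-0003 never places a call, so it ends up without a name. The same would be expected in the imported graph for numbers that only ever appear as the called party.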
Exploring the phone records
What we need first is to identify the criminal who made the phone call. For the sake of this story, we are going to assume that the robbery was perpetrated at 2524 Thelma Avenue in Sacramento on the 25th of November, 2014 around 10:40am.
Find the potential suspect
In that case, the police officers would ask the phone operators for the phone calls made 10 minutes before and after 10:40am near 2524 Thelma Avenue. Here is how a phone operator could quickly answer that question using Cypher, the query language for Neo4j:
//Parentheses make the time filter apply to both cell towers
MATCH (a:CALL)-[:LOCATED_IN]->(b:LOCATION)
WHERE (b.cell_tower = '0101' OR b.cell_tower = '0102') AND 1416904730 < toInt(a.start) AND toInt(a.start) < 1416911930
WITH a, b
MATCH (c:PERSON)-[:MADE_CALL]->(a)-[:RECEIVED_CALL]->(d:PERSON)
RETURN c.full_name AS caller, d.full_name AS called, a.start AS time, a.duration AS duration, b.address AS address
caller | called | time | duration | address |
DavidMccoy | RachelCarpenter | 1417746372 | 12 | 2524 Thelma Avenue |
TimothyStevens | SharonAllen | 1417015626 | 9 | 2524 Thelma Avenue |
IreneGreen | ElizabethRamirez | 1414917918 | 4396 | 2524 Thelma Avenue |
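The time filter in the query compares Unix timestamps. As an illustration of how such bounds could be derived, here is a Python sketch that computes a window of 10 minutes on either side of the robbery. It assumes that the stored timestamps are seconds since the Unix epoch and that local time is UTC-8 (Pacific Standard Time); both are assumptions, and the bounds it prints are not the literal values used in the query above. The `time_window` and `calls_in_window` helpers are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Sacramento local time in late November is UTC-8 (PST).
PST = timezone(timedelta(hours=-8))


def time_window(moment, minutes):
    """Return (start, end) Unix timestamps around a timezone-aware datetime."""
    delta = timedelta(minutes=minutes)
    return int((moment - delta).timestamp()), int((moment + delta).timestamp())


def calls_in_window(calls, start, end):
    """Keep the calls whose start timestamp lies strictly inside the window."""
    return [call for call in calls if start < call["start"] < end]


robbery = datetime(2014, 11, 25, 10, 40, tzinfo=PST)
start, end = time_window(robbery, minutes=10)
print(start, end)

# Hypothetical call records for illustration only.
calls = [
    {"id": "a", "start": start + 60},
    {"id": "b", "start": end + 3600},
]
print([call["id"] for call in calls_in_window(calls, start, end)])
```

Making the time zone explicit matters here: the same wall-clock time maps to a different timestamp in each zone, so a naive conversion can shift the search window by several hours.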
What is the network of our suspects?
Let’s say that, as a police investigator, the names in the list of suspects do not ring any bells. We need to dig further to identify our perpetrator. We could interview the different suspects and check their backgrounds, but we are going to use data to speed up our investigation:
//Parentheses make the time filter apply to both cell towers
MATCH (a:CALL)-[:LOCATED_IN]->(b:LOCATION)
WHERE (b.cell_tower = '0101' OR b.cell_tower = '0102') AND 1416904730 < toInt(a.start) AND toInt(a.start) < 1416911930
WITH a, b
MATCH (c:PERSON)-[:MADE_CALL]->(a)-[:RECEIVED_CALL]->(d:PERSON)
WITH c, d
OPTIONAL MATCH (c:PERSON)-[*2]-(e:PERSON)-[*2]-(d:PERSON)
RETURN e, c, d
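//Replace each PERSON-CALL-PERSON path with a direct KNOWS relationship
MATCH (c:PERSON)-[:MADE_CALL]->(a)-[:RECEIVED_CALL]->(d:PERSON)
CREATE (c)-[:KNOWS]->(d);
//Delete every node that is not a PERSON, along with its relationships
MATCH (a)-[r]-()
WHERE NOT a:PERSON
DELETE a, r;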
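The OPTIONAL MATCH pattern in the network query, `(c)-[*2]-(e)-[*2]-(d)`, looks for a person `e` who shares a call with both the caller and the called party, since two hops lead from a PERSON through a CALL to another PERSON. A rough Python equivalent, ignoring call direction as the pattern does, might look like this. The sample calls and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (caller, called) pairs for illustration only.
CALLS = [
    ("suspect", "contact"),
    ("suspect", "middleman"),
    ("middleman", "contact"),
    ("stranger", "contact"),
]


def build_contacts(calls):
    """Map each person to everyone they share a call with, in either direction."""
    contacts = defaultdict(set)
    for caller, called in calls:
        contacts[caller].add(called)
        contacts[called].add(caller)
    return contacts


def common_contacts(contacts, first, second):
    """Return the people who share a call with both `first` and `second`."""
    shared = contacts.get(first, set()) & contacts.get(second, set())
    return shared - {first, second}


contacts = build_contacts(CALLS)
print(sorted(common_contacts(contacts, "suspect", "contact")))
```

Anyone returned by `common_contacts` is connected to both ends of the suspicious call, which makes them a natural next person to look at.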