Using Neo4j for Call Detail Records (CDR) Analytics [Community Post]


[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

Introduction


Analysis of call detail records (CDR) provides greater insights into the telecommunications activity like inbound calls, outbound calls, dropped calls, abandoned calls and unanswered calls. This helps organizations track details on extensions, toll fraud and multiple locations, and it also helps them prevent productivity loss and control expenses.

CDR data can reach big data proportions for some enterprise businesses. Updating very large tables in Microsoft SQL Server can be a time-consuming task and sometimes it might take hours to finish. In addition to this, it might also cause blocking issues.

Neo4j has been used in the telecom industry before for both churn reduction and event log data, but both of these applications are at the telecom provider level.

Customer churn reduction in a telecom network


Here, I use Neo4j at the business level that consumes the services provided by the telecom provider.

Business-level call detail records (CDR) analysis


Solution


Call detail records data files come with about 120 columns. For this gist, I selected call time, calling number, called number, call forwarded number, call duration, call type (incoming/outgoing/voicemail), call answered flag, call forwarded flag and company name.

Incoming calls: Answered or went to voicemail.

Incoming calls: Forwarded, answered or went to voicemail.

Outgoing calls (Extension to Extension): Answered or went to voicemail.

Outgoing calls (External): Called.

Direct to voicemail: Voicemail.

This graph data model stores call detail records (CDR) in nodes, relationships and properties. Call date is modelled as an integer following the style of this article.

Data Model


Graph data model for call detail records analysis


Metadata


Metadata for call detail records (CDR)


Setup


Here is a sample of data in the CSV file:

TME: Epoch time, CLNG: Calling number, CLED: Called number,
     FCLED: Call forwarded number (80000 is voicemail), 
     TYP: Call type, ANS: Answered or Voicemail, FWD: Forwarded number, 
     CPNY:Company name

TME,CLNG,CLED,FCLED,DUR,TYP,ANS,FWD,CPNY
1449158574,4045551212,4041212555,80000,70,Incoming,VM,FWD,AmeyaSoft
1449159132,9195551212,8191212555,8191212555,458,Outgoing,Ans,Orig,AmeyaSoft
1449159135,8045551212,5401212555,80000,304,Incoming,VM,FWD,AmeyaSoft
1449159146,2705551212,8151212555,8151212555,451,Incoming,Ans,FWD,AmeyaSoft

A Cypher query to setup the database (I used only 50 call records):

CREATE CONSTRAINT ON ( connect:Connect ) ASSERT connect.id IS UNIQUE;

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/.../CDR_Gist3.csv" As line
WITH line, apoc.date.format((ToInteger(line.TME)),('s'), ('yyyy/MM/dd'), ('PST')) as dates
WITH line, dates, REPLACE(dates, "/", "") as dt

MERGE (company:Company {name: line.CPNY})
MERGE (calldate:CallDate {id: toInt(dt)})
MERGE (company)-[:CALL_DATE]->(calldate)
MERGE (connect:Connect {id: TOINT(line.TME)})
MERGE (calldate)-[:CONNECT_TIME {date: toInt(dt)}]->(connect)


FOREACH(ignoreMe IN CASE WHEN trim(line.TYP)="Incoming" THEN [1] ELSE [] END | 

  FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="Ans" THEN [1] ELSE [] END | 

    CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED})
    CREATE (calling:IncomingFrom {id: line.CLNG, duration: TOINT(line.DUR)})    
    CREATE (connect)-[r:INCOMING]->(calling)
    CREATE (calling)-[t:CALL_TO]->(called)  
    SET r.ans = CASE line.ANS WHEN 'Ans' THEN true ELSE false END,
        r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END 

    SET t.fwd = CASE line.FWD WHEN 'Orig' THEN false ELSE true END

  )

  FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="VM" THEN [1] ELSE [] END | 

    CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED})
    CREATE (callingvm:IncomingVM {id: line.CLNG, duration: TOINT(line.DUR)})    
    CREATE (connect)-[r:VOICEMAIL {call: "Incoming"}]->(callingvm)
    CREATE (callingvm)-[:CALL_TO]->(called) 
    
    SET r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END
  )


)

FOREACH(ignoreMe IN CASE WHEN trim(line.TYP)="Outgoing" THEN [1] ELSE [] END | 

  FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="Ans" THEN [1] ELSE [] END | 

    CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED})
    CREATE (calling:CallFrom {id: line.CLNG, duration: TOINT(line.DUR)})    
    CREATE (connect)-[r:OUTGOING]->(calling)
    CREATE (calling)-[t:CALL_TO]->(called)  
    SET r.ans = CASE line.ANS WHEN 'Ans' THEN true ELSE false END,
        r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END 

    SET t.fwd = CASE line.FWD WHEN 'Orig' THEN false ELSE true END

  )

  FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="VM" THEN [1] ELSE [] END | 

    CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED})
    CREATE (callingvm:VoiceCall {id: line.CLNG, duration: TOINT(line.DUR)})    
    CREATE (connect)-[r:VOICEMAIL {call: "Outgoing"}]->(callingvm)
    CREATE (callingvm)-[:CALL_TO]->(called) 
    
    SET r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END
  )

);

// Added 152 labels, created 152 nodes, set 422 properties, created 151 relationships

// graph

Graph analytics of call detail records (CDR)


Call detail records provide detailed logs about every phone activity and this allows a business to make better management and personnel decisions by analyzing patterns and trends. A report can also be generated on money spent on calling minutes by a specific department or a business unit at a different location.

Incoming Calls


Employee utilization is an extremely important part of a business. A report on all incoming calls received by each employee can show call volume and minutes for each employee. Higher call volume indicates the need to add more resources for better customer service.

MATCH path=(c:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]->
      (n:Connect)-[r:INCOMING]->(n1)-[]->(n2)
RETURN path;

//graph_result

Call detail records analysis for incoming calls


Voicemail


A busy network or not enough personnel may result in a high volume of calls going to voicemail. This may lead to lost calls and ultimately this will lead to losing good customers. Analysis of all unanswered calls allows a business to make better management and personnel decisions.

MATCH path=(c:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]->
      (n:Connect)-[r:VOICEMAIL]->(n1)-[]->(n2)
RETURN path;

//graph_result

Learn how to use Neo4j to analyze call detail records (CDR) data as a graph


Call Distributions


An organization can use CDR data to verify billing disputes and keep track of the usage of phone systems for the entire organization. A report on total calls grouped by date, call type (incoming/outgoing/unanswered/toll free calls/conference calls) gives an aggregate view of all the calling activities.

Total incoming, unanswered, and outgoing calls:

MATCH (n:Connect)
WITH n,
size((n)-[:INCOMING]->())  as Incoming, 
size((n)-[:OUTGOING]->())  as Outgoing, 
size((n)-[:VOICEMAIL]->()) as Voicemail
RETURN count(*) as Total, sum(Incoming) as Incoming, 
       sum(Outgoing) as Outgoing, sum(Voicemail) as Voicemail;

//table

Total call data for an organization

Total incoming, voicemail, and outgoing calls by each employee. This way employees performance can be evaluated by managers.

//Total calls by phone number
MATCH (n:Connect)-[]->()-[:CALL_TO]->(n2:CallTo)
RETURN "Total" as Type,n2.orig as PhoneNbr, COUNT(*) as Count ORDER BY Count DESC
UNION
MATCH (n:Connect)-[r]->(n1)-[:CALL_TO]->(n2:CallTo)
RETURN type(r) as Type, n2.orig as PhoneNbr, COUNT(*) as Count ORDER BY Count DESC;

//table
Phone call count per user

Total call count per employee

Incoming call count per employee

Incoming calls per employee

Outgoing call count per employee

Outgoing calls per employee

Voicemail count per employee

Voicemail count per employee

10 out of 11 calls to 4041212555 went to voicemail: Perhaps a signal for poor performance by the employee?

More Details


In a multi-site organization, there will be a central location which receives the call detail records from every site. The data from each site will have a field to store a code that identifies the site.

For a single-site organization, CDRs contain one record for each call time. For multi-site organizations, there can be multiple records (one from each site) for one call time. The following query displays this multi-site scenario (with a bigger dataset).

MATCH (n:Connect)-[r:VOICEMAIL {call: 'Incoming'}]-(n1:VoiceCall)
WITH DISTINCT n
MATCH (n1:Connect)-[r1:VOICEMAIL {call: 'Outgoing'}]-(n2:VoiceCall)
WHERE n1.id = n.id 
WITH DISTINCT n1
MATCH path = (x:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]->
             (n2:Connect)-[r2:INCOMING|VOICEMAIL|OUTGOING]->
             (n11)-[:CALL_TO]->(n21)
WHERE n2.id = n1.id
RETURN path;

Call detail record clusters for a multi-site organization


Conclusion


Call detail records provide a wealth of information on telecommunications activity for an entire organization. Modeling the data with Neo4j provided orders of magnitude improvement in generating statistics of all call types compared to Microsoft SQL Server.

In this gist, I covered only a few metrics. There are many other metrics to consider that I did not cover here, including:
    • Conference calls
    • Toll-free calls
    • Returned voicemail messages

Resources


Dataset: https://github.com/kaisesha/cdrgraph/blob/master/CDR_Gist3.csv


Want to learn more about what you can do with graph databases? Click below to get your free copy the O’Reilly Graph Databases book and discover how to harness the power of graph technology.

Get My Free Copy