[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]
Introduction
Analysis of call detail records (CDR) provides greater insights into the telecommunications activity like inbound calls, outbound calls, dropped calls, abandoned calls and unanswered calls. This helps organizations track details on extensions, toll fraud and multiple locations, and it also helps them prevent productivity loss and control expenses.
CDR data can reach big data proportions for some enterprise businesses. Updating very large tables in Microsoft SQL Server can be a time-consuming task and sometimes it might take hours to finish. In addition to this, it might also cause blocking issues.
Neo4j has been used in the telecom industry before for both churn reduction and event log data, but both of these applications are at the telecom provider level.
Here, I use Neo4j at the business level that consumes the services provided by the telecom provider.
Solution
Call detail records data files come with about 120 columns. For this gist, I selected call time, calling number, called number, call forwarded number, call duration, call type (incoming/outgoing/voicemail), call answered flag, call forwarded flag and company name.
Incoming calls: Answered or went to voicemail.
Incoming calls: Forwarded, answered or went to voicemail.
Outgoing calls (Extension to Extension): Answered or went to voicemail.
Outgoing calls (External): Called.
Direct to voicemail: Voicemail.
This graph data model stores call detail records (CDR) in nodes, relationships and properties. Call date is modelled as an integer following the style of this article.
Data Model
Metadata
Setup
Here is a sample of data in the CSV file:
TME: Epoch time, CLNG: Calling number, CLED: Called number, FCLED: Call forwarded number (80000 is voicemail), TYP: Call type, ANS: Answered or Voicemail, FWD: Forwarded number, CPNY:Company name TME,CLNG,CLED,FCLED,DUR,TYP,ANS,FWD,CPNY 1449158574,4045551212,4041212555,80000,70,Incoming,VM,FWD,AmeyaSoft 1449159132,9195551212,8191212555,8191212555,458,Outgoing,Ans,Orig,AmeyaSoft 1449159135,8045551212,5401212555,80000,304,Incoming,VM,FWD,AmeyaSoft 1449159146,2705551212,8151212555,8151212555,451,Incoming,Ans,FWD,AmeyaSoft
A Cypher query to setup the database (I used only 50 call records):
CREATE CONSTRAINT ON ( connect:Connect ) ASSERT connect.id IS UNIQUE; USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/.../CDR_Gist3.csv" As line WITH line, apoc.date.format((ToInteger(line.TME)),('s'), ('yyyy/MM/dd'), ('PST')) as dates WITH line, dates, REPLACE(dates, "/", "") as dt MERGE (company:Company {name: line.CPNY}) MERGE (calldate:CallDate {id: toInt(dt)}) MERGE (company)-[:CALL_DATE]->(calldate) MERGE (connect:Connect {id: TOINT(line.TME)}) MERGE (calldate)-[:CONNECT_TIME {date: toInt(dt)}]->(connect) FOREACH(ignoreMe IN CASE WHEN trim(line.TYP)="Incoming" THEN [1] ELSE [] END | FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="Ans" THEN [1] ELSE [] END | CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED}) CREATE (calling:IncomingFrom {id: line.CLNG, duration: TOINT(line.DUR)}) CREATE (connect)-[r:INCOMING]->(calling) CREATE (calling)-[t:CALL_TO]->(called) SET r.ans = CASE line.ANS WHEN 'Ans' THEN true ELSE false END, r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END SET t.fwd = CASE line.FWD WHEN 'Orig' THEN false ELSE true END ) FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="VM" THEN [1] ELSE [] END | CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED}) CREATE (callingvm:IncomingVM {id: line.CLNG, duration: TOINT(line.DUR)}) CREATE (connect)-[r:VOICEMAIL {call: "Incoming"}]->(callingvm) CREATE (callingvm)-[:CALL_TO]->(called) SET r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END ) ) FOREACH(ignoreMe IN CASE WHEN trim(line.TYP)="Outgoing" THEN [1] ELSE [] END | FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="Ans" THEN [1] ELSE [] END | CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED}) CREATE (calling:CallFrom {id: line.CLNG, duration: TOINT(line.DUR)}) CREATE (connect)-[r:OUTGOING]->(calling) CREATE (calling)-[t:CALL_TO]->(called) SET r.ans = CASE line.ANS WHEN 'Ans' THEN true ELSE false END, r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END SET t.fwd = CASE line.FWD WHEN 'Orig' THEN false ELSE true END ) FOREACH(ignoreMe IN CASE WHEN trim(line.ANS)="VM" THEN [1] ELSE [] END | CREATE (called:CallTo {orig: line.CLED, fwd: line.FCLED}) CREATE (callingvm:VoiceCall {id: line.CLNG, duration: TOINT(line.DUR)}) CREATE (connect)-[r:VOICEMAIL {call: "Outgoing"}]->(callingvm) CREATE (callingvm)-[:CALL_TO]->(called) SET r.Abandon = CASE line.DUR WHEN 0 THEN true ELSE false END ) );
// Added 152 labels, created 152 nodes, set 422 properties, created 151 relationships
// graph
Call detail records provide detailed logs about every phone activity and this allows a business to make better management and personnel decisions by analyzing patterns and trends. A report can also be generated on money spent on calling minutes by a specific department or a business unit at a different location.
Incoming Calls
Employee utilization is an extremely important part of a business. A report on all incoming calls received by each employee can show call volume and minutes for each employee. Higher call volume indicates the need to add more resources for better customer service.
MATCH path=(c:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]-> (n:Connect)-[r:INCOMING]->(n1)-[]->(n2) RETURN path;
//graph_result
Voicemail
A busy network or not enough personnel may result in a high volume of calls going to voicemail. This may lead to lost calls and ultimately this will lead to losing good customers. Analysis of all unanswered calls allows a business to make better management and personnel decisions.
MATCH path=(c:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]-> (n:Connect)-[r:VOICEMAIL]->(n1)-[]->(n2) RETURN path;
//graph_result
Call Distributions
An organization can use CDR data to verify billing disputes and keep track of the usage of phone systems for the entire organization. A report on total calls grouped by date, call type (incoming/outgoing/unanswered/toll free calls/conference calls) gives an aggregate view of all the calling activities.
Total incoming, unanswered, and outgoing calls:
MATCH (n:Connect) WITH n, size((n)-[:INCOMING]->()) as Incoming, size((n)-[:OUTGOING]->()) as Outgoing, size((n)-[:VOICEMAIL]->()) as Voicemail RETURN count(*) as Total, sum(Incoming) as Incoming, sum(Outgoing) as Outgoing, sum(Voicemail) as Voicemail;
//table
Total incoming, voicemail, and outgoing calls by each employee. This way employees performance can be evaluated by managers.
//Total calls by phone number MATCH (n:Connect)-[]->()-[:CALL_TO]->(n2:CallTo) RETURN "Total" as Type,n2.orig as PhoneNbr, COUNT(*) as Count ORDER BY Count DESC UNION MATCH (n:Connect)-[r]->(n1)-[:CALL_TO]->(n2:CallTo) RETURN type(r) as Type, n2.orig as PhoneNbr, COUNT(*) as Count ORDER BY Count DESC;
//table
Total call count per employee
Incoming calls per employee
Outgoing calls per employee
Voicemail count per employee
10 out of 11 calls to 4041212555 went to voicemail: Perhaps a signal for poor performance by the employee?More Details
In a multi-site organization, there will be a central location which receives the call detail records from every site. The data from each site will have a field to store a code that identifies the site.
For a single-site organization, CDRs contain one record for each call time. For multi-site organizations, there can be multiple records (one from each site) for one call time. The following query displays this multi-site scenario (with a bigger dataset).
MATCH (n:Connect)-[r:VOICEMAIL {call: 'Incoming'}]-(n1:VoiceCall) WITH DISTINCT n MATCH (n1:Connect)-[r1:VOICEMAIL {call: 'Outgoing'}]-(n2:VoiceCall) WHERE n1.id = n.id WITH DISTINCT n1 MATCH path = (x:Company)-[:CALL_DATE]->(n0:CallDate)-[:CONNECT_TIME]-> (n2:Connect)-[r2:INCOMING|VOICEMAIL|OUTGOING]-> (n11)-[:CALL_TO]->(n21) WHERE n2.id = n1.id RETURN path;
Conclusion
Call detail records provide a wealth of information on telecommunications activity for an entire organization. Modeling the data with Neo4j provided orders of magnitude improvement in generating statistics of all call types compared to Microsoft SQL Server.
In this gist, I covered only a few metrics. There are many other metrics to consider that I did not cover here, including:
- Conference calls
- Toll-free calls
- Returned voicemail messages
Resources
Dataset: https://github.com/kaisesha/cdrgraph/blob/master/CDR_Gist3.csv
Get My Free Copy