Welcome to Understanding Link Analysis. The purpose of my site is to discuss the methods behind leveraging visual analytics to discover answers and patterns buried within data sets.

Visual analytics provides a proactive response to threats and risks by examining information holistically. Unlike traditional data mining, visualizing information allows patterns of activity that run contrary to the norm to surface after very few occurrences.

We can dive into thousands of insurance fraud claims to discover clusters of interrelated parties involved in a staged accident ring.

We can examine months of burglary reports to find a pattern leading back to a suspect.

With the new generation of visualization software our team is developing, we can dive into massive data sets and visually find new trends, patterns and threats that would take hours or days using conventional data mining.

The eye processes information much more rapidly when it is presented as images; this is true from the time children first learn to read. As our instincts develop over time, so does our ability to process complex concepts through visual identification. This is the power of visual analysis that I focus on in my site.

All information and data used in articles on this site is randomly generated with no relation to actual individuals or companies.

Using Visual Analysis to Combat Call Center Fraud

Your company's call center processes thousands of transactions each week. Its agents are the face of your company and, in most cases, are empowered to access customer data and financial data and to grant concessions to dissatisfied customers.

Preventing and detecting internal fraud and data leaks in your call center can be a daunting task given the sheer number of customers and agents involved. Adding to the complexity of proper oversight, a great number of these call centers are outsourced, often overseas, making access to complete audit trails of activity more difficult.

From the perspective of executives and managers of call center companies, protecting your clients' data and property is one of your top operational priorities. Leaks of client data or fraud involving your clients' merchandise could damage your reputation, put your current contracts at risk and leave you open to potential litigation.

Well-thought-out and enforced privacy and operational policies go a long way toward protecting your call center and your clients' data; however, it is unrealistic to think that all the employees in your call center are going to play by the rules. Just as in every business, there are going to be a few who are looking to profit from their position.

This is where a solid fraud analysis and investigation program provides a further layer of protection that can proactively identify potential fraud and data loss at the earliest stages. Because of the volume of transactions, leveraging visual analysis is the best way to look for associations between call center agents and the customers they interact with.

The Threat Levels That Exist In Call Centers

There are three distinct threat levels that exist in most call centers. The depth of analysis and investigation should be based on the potential threat level that exists in the center.

  • Low - Agents who mistakenly leak information or grant concessions to customers by failing to follow proper procedures
  • Medium - Outsourced or temporary agents who have little or no company loyalty and no stake in the company's success, but have access to customer information or the ability to send concessions (free merchandise, repair replacements, etc.)
  • High - Fraudulent agent groups within a call center: agents placed by criminal organizations, or friends of corrupt agents, working for the sole purpose of stealing customer information or converting concessions for personal use.
Visual analytics can effectively address the medium and high threat risks within the call center organization by identifying clusters of interrelated activity, customers and customer attributes, and activity logs between the call center agents themselves and the individuals they are having contact with or shipping merchandise to. Always remember, fraud follows the path of least resistance. By shoring up your fraud prevention defenses through visual analysis, organizations looking to penetrate and corrupt your call center will search elsewhere.

Identifying Call Center Concession Fraud

Probably one of the most difficult analytical tasks will be the identification of call center agents who are converting merchandise or concessions for personal use. This type of activity costs companies thousands of dollars every week in misappropriated goods.

This activity is difficult to discover through data mining because it is three-dimensional, involving the call center agents, the customers and the relationships between them. To accurately analyze the activity, we have to look not only at the activity of the agent but also at the relationships of the customers to their attributes, and even at the relationships between the call center agents and the customers.

This often involves extracting and mining data from multiple sources and layering that data in a visual analysis. This is the scenario we are going to employ in this example, because if you are able to leverage visual analysis to proactively identify this activity, simpler internal issues such as data or financial theft, which require only single-source visualization, will be much easier to tackle.

So let's start with a scenario for this analysis. I am a fraud investigator for a call center, and I conduct a monthly analysis of the concessions sent out by my agents to detect any fraud or theft that may be occurring.

Like all analysis, the first step is the planning, extraction and cleaning of the data for import into our visual analysis tool. Since I am analyzing concession fraud, I am going to need to extract data from my agent activity database, which will give me the service requests, type of concession and date of concession. Next, I will need to extract data from my customer and shipping database to find relationships between the customers and where the concessions were shipped.

In all fraud analysis, we are looking to leverage the weakest link of the scheme. In concession fraud, the weakest link is the shipping address. If you are a call center agent either converting your company's merchandise for personal use or sending concessions to friends, the one piece of information that has to be accurate is the shipping address. This is the main entity in my visualization that I am going to focus on.

In my first step, I download all transactions from the call center that are coded as concession transactions, including the service request number, the service request date, the customer ID, the agent name or number, the concession that was sent out and, if possible, the tracking number of the concession package.

In my next step, I download the customer and shipping information from my database. I want to ensure that I capture all fields in the shipping database that will accurately identify, and make unique, the locations the concessions were shipped to. If the agent is involved in fraud, they are going to alter the names and phone numbers; however, the address will have to be correct for the scheme to work. Agents who are very good at committing this type of fraud will alter the addresses enough to avoid detection but still ensure delivery. We can counteract this tactic in visual analysis by conducting semantic matches between entities to detect patterns of inverted numbers, names, slight misspellings or the addition of small pieces of information in the shipping address.
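As a rough sketch of the kind of semantic matching described above, here is one way to flag near-duplicate shipping addresses in Python using stdlib string similarity. The addresses, abbreviation table and threshold are purely illustrative; a production tool like i2's smart matching is far more sophisticated.

```python
from difflib import SequenceMatcher

def normalize(address):
    """Lowercase, strip punctuation, and expand a few common abbreviations."""
    subs = {"st": "street", "ave": "avenue", "rd": "road", "apt": "apartment"}
    words = address.lower().replace(".", "").replace(",", "").split()
    return " ".join(subs.get(w, w) for w in words)

def likely_same_address(a, b, threshold=0.9):
    """Flag address pairs whose normalized similarity exceeds the threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(likely_same_address("123 Main Street", "123 Main St."))    # True
print(likely_same_address("123 Main Street", "877 Oak Avenue"))  # False
```

The normalization step is what defeats the "alter it just enough to still deliver" tactic: '123 Main St.' and '123 Main Street' collapse to the same string before comparison.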

Now that I have downloaded the data I need for my visual analysis, and because it came from two different sources, I am going to need to join the two tables or files to make a flat file or view for import.
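The join described above can be sketched in a few lines of Python. The record layouts and field names here are hypothetical stand-ins for the agent activity and shipping databases; in practice this step would typically be a SQL join or an ETL job.

```python
# Hypothetical minimal records from the two sources.
concessions = [  # from the agent activity database
    {"sr": "SR-1001", "agent": "A17", "customer_id": "C55", "item": "headset"},
    {"sr": "SR-1002", "agent": "A23", "customer_id": "C81", "item": "router"},
]
shipping = {  # from the customer/shipping database, keyed on customer ID
    "C55": {"name": "J. Doe", "address": "123 Main Street", "phone": "555-0101"},
    "C81": {"name": "R. Roe", "address": "9 Elm Road", "phone": "555-0202"},
}

# Join on customer_id to produce one flat row per concession for import.
flat = [
    {**c, **shipping[c["customer_id"]]}
    for c in concessions
    if c["customer_id"] in shipping  # drop rows with no shipping match
]
print(flat[0]["address"])  # 123 Main Street
```

The `if` clause doubles as a crude cleaning pass: rows with no shipping match (the nulls and orphans mentioned below) never make it into the flat file.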

Once I have joined my data and cleaned it of nulls and bad values, I am ready to import it into my visualization tool. The schema I will use for this analysis links the call center agent to the service request, the service request to the customer, and the customer to their shipping address, phone and email.

After completing the initial import of all my data, expect the visualization to appear as in the example below. The reason for such large clusters of data is the call center agent entity. Through normal transactions, multiple agents are going to link together through shared customers; this is not an indicator of fraud.

The good thing is that for my first cluster analysis of this data, I am not going to use the call center agent entity at all. By filtering out the call center agent and temporarily hiding that entity, I can focus on customer clusters linked by address and service request. By looking for groups of customers that are linked together, I can identify possible destinations for fraudulent concessions, friends of agents who are being shipped merchandise or, in the case of external fraud, individuals who are taking advantage of my call center reps to gain free merchandise.
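With the agent hidden, the clustering pass above amounts to grouping customers by shared shipping address. A minimal sketch, using hypothetical flattened rows and assuming addresses have already been normalized:

```python
from collections import defaultdict

# Hypothetical flattened concession rows (agent entity hidden for this pass).
rows = [
    {"sr": "SR-1", "customer": "C1", "address": "123 main street"},
    {"sr": "SR-2", "customer": "C2", "address": "123 main street"},
    {"sr": "SR-3", "customer": "C3", "address": "9 elm road"},
    {"sr": "SR-4", "customer": "C4", "address": "123 main street"},
]

# Group customers by shipping address; any address tied to more than one
# customer is a candidate destination for fraudulent concessions.
clusters = defaultdict(set)
for r in rows:
    clusters[r["address"]].add(r["customer"])

suspicious = {addr: custs for addr, custs in clusters.items() if len(custs) > 1}
print(sorted(suspicious))  # ['123 main street']
```

A visualization tool does this grouping spatially, but the underlying signal is the same: many distinct customer names converging on one physical address.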

Moving on to my next step, I hide my call center agents to look for customer clusters; don't worry, we will bring the agents back shortly. I am going to focus on the largest clusters of interrelated customers, which my visualization tool sorts for me from left to right in my chart.

My largest cluster of interrelated customers is involved in ten different service requests in which the customer was shipped a concession by a call center agent. There are seven different names associated with this cluster; however, they are all linked to the same address, which is extremely suspicious.

My next step is to leverage visual semantic searching across all my customer entities and attributes to detect attempts to create the appearance of two separate addresses by changing small details in the data, such as "123 main street" and "123 main st". A strong visual analysis program, including the tool used in this example, i2, will incorporate smart matching to detect these entities, as in the example below.

Once smart matching is completed and I have ensured that my clusters contain all linked entities and data, I am going to break out my largest cluster and incorporate the call center agent or agents who created the service requests linked to the customer cluster.

Now, before we go further, there are two possible scenarios that may be occurring with large clusters of interrelated customers, both of which will be detectable when we bring back our call center agents. In the first, a group of customers is taking advantage of my company and call center by acquiring concessions or merchandise under false pretenses; if this is the case, the service requests will be linked to different agents. In the second, an agent is fraudulently sending out concessions or merchandise to individuals, in which case there will be two indicators: all of the service requests will be linked to the same agent, and the customer profiles will have been created by the call center agent because no service call ever existed.
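The discriminator between the two scenarios is simply the number of distinct agents behind the cluster's service requests. A toy sketch, with hypothetical service request data:

```python
# Hypothetical service requests within one suspicious address cluster.
cluster_srs = [
    {"sr": "SR-1", "agent": "A17"},
    {"sr": "SR-2", "agent": "A17"},
    {"sr": "SR-3", "agent": "A17"},
]

# One agent behind every request points inward; many agents point outward.
agents = {sr["agent"] for sr in cluster_srs}
if len(agents) == 1:
    verdict = "single agent: likely internal concession fraud"
else:
    verdict = "multiple agents: likely customers exploiting the call center"
print(verdict)
```

In a real analysis this test would be corroborated by the second indicator the text mentions: whether the agent also created the customer profiles.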

Let's bring in the call center agent entity and see which scenario exists. From the visualization below, we see that all of the interrelated customers and service requests are linked to two agents, with multiple concessions sent to the same address.

The same call center agent linked to multiple service requests

Group of customers all linked to the same address

Call center agent linked to multiple service requests to customers linked to same address

To complete my analysis, I am going to examine who created the customer profile for each of the customers in my visualization. I will also incorporate the agents' call logs for the dates and times these service requests were created, to determine whether an actual inbound call to the agent existed when each service request was created.


From the visualization examples shown, we have identified two call center agents who are actively engaged in fraudulently shipping concessions (merchandise) to individuals for the purpose of converting them to their own use.

All of the service requests in this cluster occurred over a two week period and all were linked to different individuals living at the same address.

For a strong proactive deterrent to this type of fraud, visualization of concessions should be performed on a regular schedule based on the velocity of calls, the number of agents and the number of locations, so that the analyst doesn't end up in information overload. For example, if my organization has ten call centers with 100 agents in five different countries, each handling 1,000 transactions a month, I might want to schedule my analysis on a weekly basis to identify the activity without being overwhelmed by the data.
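One way to make that scheduling rule concrete is a small volume-to-cadence helper. The thresholds below are invented for illustration only; every organization would tune them to its own transaction velocity and analyst capacity.

```python
def review_cadence(monthly_transactions, centers=1):
    """Suggest a concession-review interval that keeps each pass manageable.

    Thresholds are illustrative assumptions, not a standard.
    """
    volume = monthly_transactions * centers
    if volume >= 20_000:
        return "daily"
    if volume >= 2_000:
        return "weekly"
    return "monthly"

# Ten centers each handling 1,000 transactions a month -> weekly review.
print(review_cadence(1_000, centers=10))  # weekly
```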

By leveraging visual analysis in my fraud investigation and deterrence program, I can add another level of security for my call center, company and clients, allowing for timely identification of fraudulent internal and external schemes.

Solving Crimes Through Multi-Source Data Visualization

There are very few cases where all the data you need to complete your analysis resides in a single database. In most link analysis examples, including the majority on my site, I demonstrate visual analysis through the import of one data set. While this helps explain the theory behind the analysis being performed, in real-world situations the answers you are looking for rarely reside in one place.

The more complicated the crime or threat, the more disparate the data sources needed to arrive at a solution through visual analysis. For example, in eCommerce fraud, I rely on data from my transaction platform, order platform, account platform and login records to provide a complete analysis of the threat.

For this example, I am going to use a scenario where the analyst is investigating a series of burglaries taking place at a hotel property. I am going to show that importing and layering data from multiple sources provides a complete picture of the activity that is occurring and a solution to the crimes.

Inventorying The Data and Data Needs

Approaching a case from an analyst's standpoint is similar to the way an investigator approaches a new case. The analyst needs to understand the scenario of the threat and then conceptualize the possible sources of information that can be obtained to complete the analysis. The first step is to inventory all the data on hand and the data you will need to begin a visualization of the case. Just like any investigation, there are going to be additional data needs along the way, but getting your arms around what you need to start will save you time and aggravation, especially since some of the data you are going to need requires a subpoena or access by other data administrators.

In this example, I am being asked to perform a visual analysis of hotel burglaries. I know that there have been multiple burglaries from rooms over the past week. I know the hotel has an electronic key system that logs all entries into rooms on the hotel's server: my first source of data.

From reviewing the incident reports, one of the items stolen most frequently is cell phones. There is a good chance that whoever is responsible for the burglaries has also made calls on the stolen phones, which can assist me with my analysis: my second source of data.

Another commonly stolen item in these burglaries is jewelry. Knowing that thieves often pawn stolen jewelry for cash, I can access my department's pawn ticket database and integrate it into my analysis: my third source of data.

I have access to my department's case management system, so I can download all of the incident report data and integrate it into my analysis: my fourth data source.

Starting The Visualization

Now that I have inventoried and obtained the data I require to perform my analysis, my next step is to decide the best way to integrate all the data from the different sources I have into one visualization.

One of the issues that arises when using data from multiple sources is that the formatting and structure of each source is going to be different, requiring cleaning and planning prior to importing it into your visualization. The type of analysis and threat will also dictate the type of visualization you need in order to produce a result.

In this example I have two options. This is a series of hotel burglaries that may have been committed by multiple interrelated individuals, so an entity association chart might be an option. On the other hand, all of the data I have is time-bound (hotel key logs, cell phone records and incident reports), so a time-bound theme chart might be best.

Since all of my data is structured by date and time, I am going to begin with a theme chart layout bound to a timeline for my analysis. The first item I am going to import is my RMS (incident report) data, to establish a base for my timeline of events.

From this point, I can visualize the dates and times the hotel burglaries occurred, which will help me parse the other data I have by date. From the key log entry files I have obtained from the hotel, I can filter my import by limiting the access logs to the time span of the events.
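That filtering step can be sketched as a time-window match between incident times and key-log entries. All timestamps, room numbers and cardholder IDs below are invented, and the two-hour window is an arbitrary choice for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical burglary report times taken from the RMS data.
incidents = [datetime(2023, 5, 2, 14, 30), datetime(2023, 5, 5, 9, 15)]

# Hypothetical key-log entries: (room, entry time, cardholder).
key_log = [
    ("412", datetime(2023, 5, 2, 14, 5), "staff-07"),
    ("412", datetime(2023, 5, 3, 11, 0), "staff-02"),
    ("218", datetime(2023, 5, 5, 8, 50), "staff-07"),
]

# Keep only entries that fall near any incident; everything else is noise.
window = timedelta(hours=2)
relevant = [
    entry for entry in key_log
    if any(abs(entry[1] - t) <= window for t in incidents)
]
print([e[2] for e in relevant])  # ['staff-07', 'staff-07']
```

Cutting the log down to the incident windows first keeps the theme line readable; the full log would bury the pattern in routine housekeeping entries.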

This data contains the room, the date and time of entry and the name of the person assigned to the card at the time. To best visualize this data, I am going to tie the date and time stamp on the entry records to the theme line, but link those event frames to the person gaining entry by creating a link association between the event frame and the entity to whom the card was assigned.

Now that I have integrated the key logs and the RMS data into the theme line, I can examine and focus on those individuals who have the most associations between the data in the hotel key log and the RMS data.

I begin by grouping the burglary events by the type of event and the items that were removed. This will help me when I integrate the data from the pawn ticket database and the cell phone records for the stolen items.

Now I have an overview of the incidents and the individuals who have links to more than three of the incidents. My goal now is to integrate the rest of my data and try to draw links between the incidents and the players involved.

At this point, I am going to import the cell phone records as a directional link chart showing the originating and destination phone numbers, to see if I can link any of the numbers to anyone in my ring.

Once I have visualized the cell phone records of the stolen phones, I find one cell phone number that all three of the stolen phones have called. Using my reverse directory, I am able to link that number to one of the employees at the hotel.
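The common-callee pattern spotted in the directional link chart boils down to a set intersection across the outbound call lists. The phone identifiers and numbers below are invented for illustration:

```python
# Hypothetical outbound call records for each stolen phone.
calls = {
    "phone-A": {"555-0199", "555-0142", "555-0117"},
    "phone-B": {"555-0199", "555-0173"},
    "phone-C": {"555-0199", "555-0160", "555-0142"},
}

# A number dialed by every stolen phone is a strong common link
# worth running through the reverse directory.
common = set.intersection(*calls.values())
print(sorted(common))  # ['555-0199']
```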

The employee I identified as receiving calls from the stolen phones wasn't linked to the rooms where the phones were stolen; however, this employee was linked to another employee who entered each of the rooms where cell phones were stolen.

Next, I import the pawn ticket data that matches the items stolen from the rooms where jewelry was removed. The pawn ticket is linked to an entity that I did not have on my chart from the room access log files, so my next step is to import the NCIC report on that individual to see if I can link the name on the pawn ticket to one of my suspects on the chart. If this were a commercial analysis rather than law enforcement, we could do the same with a LexisNexis report.

After importing the NCIC data into my chart, I am able to link an alias from the pawn ticket to the individual being called on the stolen cell phones: an employee at the hotel.


By importing multi-source data from my records system, the hotel room log server, NCIC, the pawn ticket database and the cell phone records, I am able to visually link the multiple hotel room burglaries to two individuals, both of whom work at the hotel.

Multi-source analysis is almost impossible without a visual analysis tool. Drawing relationships between disparate data sources that cannot be joined on relational fields can only be accomplished through visual analysis.

In this scenario, if I had omitted even one of my data sources, I would not have been able to link the incidents together or narrow the list of potential suspects to two. By carefully inventorying all of the data available for my analysis and then carefully planning how I was going to visualize it, I was able to produce a time-bound theme line and link chart showing the entire investigation, the source data and the suspects for investigation.