Welcome to Understanding Link Analysis. The purpose of my site is to discuss the methods behind leveraging visual analytics to discover answers and patterns buried within data sets.

Visual analytics provides a proactive response to threats and risks by examining information holistically. Unlike traditional data mining, visualizing information makes patterns of activity that run contrary to normal activity surface within very few occurrences.

We can dive into thousands of insurance fraud claims to discover clusters of interrelated parties involved in a staged accident ring.

We can examine months of burglary reports to find a pattern leading back to a suspect.

With the new generation of visualization software our team is developing, we can dive into massive data sets and visually find new trends, patterns and threats that would take hours or days to find using conventional data mining.

The eye processes information much more rapidly when it is presented as images; this has been true since children started learning to read. As our instinct develops over time, so does our ability to process complex concepts through visual identification. This is the power of visual analysis that I focus on in my site.

All information and data used in articles on this site is randomly generated with no relation to actual individuals or companies.

Solving Crimes Through Multi-Source Data Visualization

There are very few cases where all the data you need to complete your analysis resides in a single database. In most link analysis examples, including the majority on my site, I demonstrate visual analysis through the import of one data set. While this helps explain the theory behind the analysis being performed, in real-world situations the answers you are looking for rarely reside in one place.

The more complicated the crime or threat, the more disparate the data sources needed to arrive at a solution through visual analysis. For example, in eCommerce fraud, I rely on data from my transaction platform, order platform, account platform and login records to provide a complete analysis of the threat.

For this example I am going to use a scenario where the analyst is investigating a series of burglaries taking place at a hotel property. I am going to show that importing and layering data from multiple sources provides a complete picture of the activity that is occurring and a solution to the crimes.

Inventorying The Data and Data Needs

Approaching a case from an analyst standpoint is similar to the way an investigator approaches a new case. The analyst needs to understand the scenario of the threat and then conceptualize the possible sources of information that can be obtained to complete the analysis. The first step is to inventory all the data on hand and the data that you will need to begin a visualization of the case. Just like any investigation, there are going to be additional data needs to complete the analysis, but getting your arms around what you need to start will save you time and aggravation, especially if some of the data you are going to need requires a subpoena or access granted by other data administrators.

In this example, I am being asked to perform a visual analysis of hotel burglaries. I know that there have been multiple burglaries from rooms over the past week. I know the hotel has an electronic key system that logs all entries into rooms on the hotel's server. This is my first source of data.

From reviewing the incident reports, I can see that cell phones are among the items stolen most frequently. There is a good chance that whoever is responsible for the burglaries has also made calls on the stolen phones, which can assist me with my analysis. The call records are my second source of data.

Another commonly stolen item from these burglaries is jewelry. Knowing that thieves often pawn stolen jewelry for cash, I can access my department's pawn ticket database and integrate that into my analysis as my third source of data.

I have access to my department's case management system, so I can download all of the incident report data and integrate that into my analysis as my fourth data source.

Starting The Visualization

Now that I have inventoried and obtained the data I require to perform my analysis, my next step is to decide the best way to integrate all the data from the different sources I have into one visualization.

One of the issues that arises when using data from multiple sources is that the formatting and structure of each data source is going to be different, requiring cleaning and planning prior to importing it into your visualization. The type of analysis and threat is also going to dictate the type of visualization you are going to need in order to produce a result.
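As a rough sketch of that cleaning step, the Python below maps rows from two differently formatted sources onto one shared event record before anything is charted. The column names, date formats and sample rows are assumptions made up for illustration; real key log and RMS exports will look different.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    source: str      # which data set the row came from
    when: datetime   # normalized timestamp
    subject: str     # room number in this sketch
    detail: str      # card holder or item description


def from_key_log(row: dict) -> Event:
    # Hypothetical key log export: "03/14/2011 14:05" in a 'stamp' column.
    when = datetime.strptime(row["stamp"], "%m/%d/%Y %H:%M")
    return Event("key_log", when, row["room"].strip(), row["card_holder"].strip().title())


def from_rms(row: dict) -> Event:
    # Hypothetical RMS export: date and time held in separate columns.
    when = datetime.strptime(f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S")
    return Event("rms", when, row["room"].strip(), row["items"].strip().lower())


# Made-up sample rows, one per source.
key_rows = [{"stamp": "03/14/2011 14:05", "room": " 412", "card_holder": "jane DOE"}]
rms_rows = [{"date": "2011-03-14", "time": "14:30:00", "room": "412", "items": "Cell Phone"}]

# One list, one schema, ordered by time and ready to lay on a time line.
events = sorted(
    [from_key_log(r) for r in key_rows] + [from_rms(r) for r in rms_rows],
    key=lambda e: e.when,
)
```

Normalizing every source to the same timestamp type and the same room format up front is what makes the later layering possible; a room stored as " 412" in one file and "412" in another would otherwise never link.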

In this example I have two options. This is a series of hotel burglaries that may have been committed by multiple interrelated individuals, so an entity association chart might be an option. On the other hand, all of the data I have (hotel key logs, cell phone records and incident reports) is time bound, so a time bound theme chart might be best.

Since all of my data is structured by date and time, I am going to begin with a theme chart layout that is time bound to a time line for my analysis. The first item I am going to import in is my RMS data (incident report data) to establish a base for my time line of events.

From this point, I can visualize the dates and times that the hotel burglaries occurred which will help me parse the other data I have by date. From the key log entry files I have obtained from the hotel, I can filter my import by limiting the access logs to the time span of the events.
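The filter described here could be sketched in Python as follows. The 12 hour padding on either side of the series, the dictionary keys and the sample dates are all assumptions for illustration, not values from the case.

```python
from datetime import datetime, timedelta


def within_incident_span(entries, incident_times, padding=timedelta(hours=12)):
    """Keep key log entries between the first and last incident, plus padding."""
    start = min(incident_times) - padding
    end = max(incident_times) + padding
    return [e for e in entries if start <= e["when"] <= end]


# Made-up incident times taken from the RMS time line.
incident_times = [datetime(2011, 3, 14, 14, 30), datetime(2011, 3, 16, 10, 0)]

# Made-up key log entries.
entries = [
    {"room": "412", "when": datetime(2011, 3, 1, 9, 0)},    # well before the series
    {"room": "412", "when": datetime(2011, 3, 14, 14, 5)},  # inside the span
    {"room": "207", "when": datetime(2011, 3, 16, 9, 40)},  # inside the span
]

relevant = within_incident_span(entries, incident_times)
```

Trimming the logs before import keeps the chart readable; a week of burglaries does not need months of door entries behind it.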

This data contains the room, the date and time of entry, and the name of the person assigned to the card at the time. To best visualize this data, I am going to place the date and time stamp of each entry record on the theme line and link those event frames to the person gaining entry by creating a link association between the event frame and the entity to whom the card was assigned.

Now that I have integrated the key logs and the RMS data into the theme line, I can examine and focus on those individuals who have the most associations between the data in the hotel key log and the RMS data.
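What the chart surfaces visually here can also be approximated as a count. The sketch below tallies, for each card holder, how many incidents involved a room they entered shortly before the report time. The four hour window, the field names and the sample records are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rank_by_incident_links(entries, incidents, window=timedelta(hours=4)):
    """Count, per card holder, the incidents whose room they entered
    no more than `window` before the reported time."""
    linked = defaultdict(set)
    for inc in incidents:
        for e in entries:
            same_room = e["room"] == inc["room"]
            close_in_time = timedelta(0) <= inc["when"] - e["when"] <= window
            if same_room and close_in_time:
                linked[e["holder"]].add(inc["case"])
    # Most-linked holders first; ties broken alphabetically.
    return sorted(((h, len(c)) for h, c in linked.items()), key=lambda p: (-p[1], p[0]))


# Made-up incidents and key log entries.
incidents = [
    {"case": "A1", "room": "412", "when": datetime(2011, 3, 14, 14, 30)},
    {"case": "A2", "room": "207", "when": datetime(2011, 3, 16, 10, 0)},
]
entries = [
    {"holder": "Employee X", "room": "412", "when": datetime(2011, 3, 14, 14, 5)},
    {"holder": "Employee X", "room": "207", "when": datetime(2011, 3, 16, 9, 40)},
    {"holder": "Guest Y", "room": "412", "when": datetime(2011, 3, 14, 8, 0)},
]

ranking = rank_by_incident_links(entries, incidents)
```

A count like this is only a starting point. Housekeeping staff will legitimately enter many rooms, so a high number flags someone to look at on the chart, not a suspect.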

I begin grouping together the burglary events by the type of event and the items that were removed. This will help me when I integrate in the data I have from the pawn ticket database and the cell phone records from the stolen items.

Now I have an overview of the incidents and those individuals who have links to more than three of the incidents. My goal now is to integrate the rest of my data and try to draw a link between the incidents and the players involved.

At this point I am going to import the cell phone records as a directional link chart showing the originating and destination phone numbers, to see if I can link any of the numbers to anyone in my ring.

Once I have visualized the cell phone records of the stolen phones, I can see that there is another cell phone number that all three of the stolen phones have called. Using my reverse directory, I am able to link that number to one of the employees at the hotel.
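The same finding, a number that every stolen phone has called, can be sketched as a set intersection over the call records. The phone numbers below are fictional placeholders and the (origin, destination) tuple layout is an assumption about how the records are exported.

```python
def numbers_called_by_all(call_records):
    """Return destination numbers that every originating phone has called."""
    called = {}
    for origin, destination in call_records:
        called.setdefault(origin, set()).add(destination)
    if not called:
        return set()
    return set.intersection(*called.values())


# Made-up directional records: (stolen phone, number it called).
calls = [
    ("555-0101", "555-0199"), ("555-0101", "555-0150"),
    ("555-0102", "555-0199"),
    ("555-0103", "555-0199"), ("555-0103", "555-0177"),
]

common = numbers_called_by_all(calls)
```

On a directional link chart this shows up as a single node with an arrow arriving from each stolen phone, which is why it stands out so quickly to the eye.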

The employee I identified as receiving calls from the stolen phones wasn't linked to the rooms where the phones were stolen. However, that employee was linked to another employee who entered each of the rooms where cell phones were stolen.

Next I import the pawn ticket data that matched the items stolen from the rooms where jewelry was removed. The pawn ticket was linked to an entity that I did not have on my chart from the room access log files, so my next step is to import the NCIC report for that individual to see if I can link the name on the pawn ticket to one of my suspects on the chart. If this were a commercial analysis as opposed to law enforcement, we could do the same with a LexisNexis report.

After importing the NCIC data into my chart, I am able to link an alias from the pawn ticket to the individual being called on the stolen cell phones, an employee at the hotel.
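A minimal sketch of that alias link, assuming the known aliases for each person on the chart have already been pulled into a simple lookup. The names, the lookup structure and the matching rule (case and spacing ignored, nothing fuzzier) are all hypothetical; they do not reflect the actual layout of an NCIC report.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def match_alias(ticket_name, alias_index):
    """Return the chart entity whose known aliases include the ticket name, if any."""
    wanted = normalize(ticket_name)
    for entity, aliases in alias_index.items():
        if wanted in {normalize(a) for a in aliases}:
            return entity
    return None


# Made-up alias lookup keyed by the entity already on the chart.
alias_index = {
    "Employee A": ["J. Example", "Johnny  Example"],
    "Employee B": ["R. Sample"],
}

match = match_alias("johnny example", alias_index)
no_match = match_alias("Someone Else", alias_index)
```

An exact match after normalization is deliberately conservative. Anything looser should be treated as a lead to confirm by hand before a link is drawn on the chart.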


By importing multi-source data from my record system, hotel room log server, NCIC, pawn ticket database and cell phone records, I am able to visualize and link the multiple hotel room burglaries to two individuals, both of whom work at the hotel.

Multi-source analysis is almost impossible without a visual analysis tool. Drawing relationships between disparate data sources that cannot be linked together by relational fields can only be accomplished through visual analysis.

In this scenario, if I had omitted even one of my data sources for my analysis, I would not have been able to link the incidents together or narrow the list of potential suspects to two. By carefully inventorying all of the data I had available for my analysis and then carefully planning how I was going to visualize it, I was able to produce a time bound theme line and link chart showing the entire investigation, the source data and the suspects for investigation.