Welcome to Understanding Link Analysis. The purpose of my site is to discuss the methods behind leveraging visual analytics to discover answers and patterns buried within data sets.

Visual analytics provides a proactive response to threats and risks by examining information holistically. Unlike traditional data mining, visualization lets patterns of activity that run contrary to normal activity surface within very few occurrences.

We can dive into thousands of insurance fraud claims to discover clusters of interrelated parties involved in a staged accident ring.

We can examine months of burglary reports to find a pattern leading back to a suspect.

With the new generation of visualization software our team is developing, we can dive into massive data sets and visually find new trends, patterns and threats that would take hours or days using conventional data mining.

The eye processes information much more rapidly when it is presented as images; this has been true since children started learning to read. As our instinct develops over time, so does our ability to process complex concepts through visual identification. This is the power of visual analysis that I focus on in my site.

All information and data used in articles on this site is randomly generated with no relation to actual individuals or companies.

Creating A Learning Risk Model

Risk modeling is used extensively in all areas of commercial fraud analysis. Financial institutions were among the first to leverage risk modeling for financial transactions, led by credit reporting agencies, which assigned scores as indicators of creditworthiness. Gradually, insurance companies adopted risk modeling in an effort to identify claims with a higher probability of fraud. As eCommerce grew, risk modeling was incorporated into online transactions to apply real-time rules and risk scores to pay-in and pay-out transactions.

Commercial companies have become reliant on risk modeling to be the gatekeeper for transactions flowing in and out of their workspace, driven by the rising rate of fraud and the increasing cost of manual transaction review. The bottom line became: the better the model, the lower the cost per transaction.

The downside to heavy reliance on risk modeling is that the rules and methodology the risk models rely on are derived from known risk transactions, investigation and analysis. Because fraud is constantly evolving, so must the risk model, to the point that the cost of manual review has been transferred to risk model evolution.

Companies must fine-tune the balancing act between the resources allocated to the fraud and risk analysis, investigation and statistical analysis that feed the risk model, and the time dedicated to updating the model itself.

When we talk about a learning risk model, it's important to be clear that a risk model itself never learns on its own. The risk model simply incorporates the knowledge gained from analysis, investigation and lessons learned to prevent the same scenarios from reoccurring, but it cannot stop new scenarios.

What can make the risk model more responsive is the development of a pipeline from your fraud analysis, investigation and review processes into your risk model in real time. This allows companies to navigate the balancing act between analytical and investigation resources and fraud model maintenance more effectively.

The Partnering of Fraud Analysis and Investigation with Risk Modeling

Fraud analysis should be an ongoing process in any commercial vector. Your fraud analysts and investigators are your best defense against new and emerging threats to your company and assets.

Resources are available to augment your fraud analysts' abilities and decrease the time it takes them to detect, deter and investigate new threats. These include visual analysis tools (link analysis, association analysis, time line analysis) and statistical analysis tools such as i2 Analyst Workstation, which can identify anomalies or velocities that exist in attributes used in your business and bring those to a visual analysis tool for identification and verification. SQL Server and SQL Server Analysis Services can be used to build transactional cubes around massive amounts of data to detect irregularities which might be indicators of fraud or threats. The better equipped your fraud analysts are, the more data they can examine in the shortest amount of time.

Time and resources are where risk modeling falls into the risk mitigation equation. Risk and fraud analysis is a resource-intensive endeavor. You want your analysts constantly focusing on new threats and trends, not fighting the trends they have already detected; that is the job of the risk model.

The problem is there has always been a lag time between when a threat or trend is identified by analysis, investigation or manual review, and when that trend or threat is mitigated through new rules within the risk model. This is the conflict in the balancing act because the faster rules are updated in the risk model, the more time your analysts, investigators or manual reviewers can spend on new threats.

Creating the Learning Model

The best way to illustrate the advantages and process of a learning model is to give you a real-life scenario which happens to me all the time. I travel on a regular basis to the Philippines and, like most travelers, I regularly use my credit card when I travel. Without fail, the first day I am in the Philippines and use my card, the authorization fails, usually when I am with a group of friends to maximize the embarrassment, which is a whole separate issue. I must contact my credit card company in the U.S., explain to them that I am traveling and provide validation information, at which time they turn my card back on again.

This is an example of what happens if you are dependent on a non-learning risk model. Somewhere in time, an analyst at my credit card company determined, through geo-spatial analysis, a correlation between potential credit card fraud and the location of the card holder's transactions. This was probably a really good idea, as a person who lives in Washington, uses their credit card on Tuesday in their home state and then makes a purchase in Manila on the same day has a greater chance of their account being compromised.
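A static rule of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real card issuer implements it: the `Purchase` type, the `static_geo_rule` name, and the 5,000 km and 24 hour thresholds are all my own assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Purchase:
    when: datetime  # assumed to be in a single time zone (e.g. UTC)
    lat: float      # merchant latitude in degrees
    lon: float      # merchant longitude in degrees


def km_between(a: Purchase, b: Purchase) -> float:
    """Great-circle distance between two purchases (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km = mean Earth radius


def static_geo_rule(previous: Purchase, current: Purchase,
                    min_km: float = 5000.0,
                    window_hours: float = 24.0) -> bool:
    """Return True when the purchase should be declined pending verification.

    Hypothetical thresholds: far apart (min_km) and close in time
    (window_hours). The rule knows nothing about the card holder's history.
    """
    hours = (current.when - previous.when).total_seconds() / 3600
    return 0 <= hours <= window_hours and km_between(previous, current) >= min_km
```

Because the rule looks only at the two most recent purchases, it treats a first-time traveler and a frequent traveler exactly the same.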

What if, however, a person lives in Washington and uses their credit card in their home state on Tuesday, then travels to Manila and uses their credit card in that location on Wednesday? If that person had never been to the Philippines before then, of course, that transaction would carry enough risk to cause a decline pending verification. But what if that person travels on a regular basis to the Philippines? That person's account would have a history of purchases made in both their home state and the Philippines, something that a learning model could leverage to make a better risk assessment and thus a better customer experience.

The scenario given is an example of poor risk modeling, though with all the right intentions. There are situations where the rate of fraud for an area is so high that an across-the-board risk rule is to validate any transaction coming from that area or, in extreme circumstances, to ban transactions from a high-risk area completely. This is the exception to the rule, however, and it illustrates that global risk rules, made with all good intentions at the time and based on the best analysis, can turn against you if your risk model fails to learn the patterns of its subjects.

In this scenario, if it were my first time ever going to the Philippines, it would be crazy for the risk model not to flag my transaction. I had used my card in Seattle, then used it again less than 24 hours later, halfway around the world. But let's throw in the concept of a learning model, based on the assumption that this is my first time traveling to the Philippines. The risk model determines that I have never been to the Philippines before and declines my transaction, sending it to manual review. I call my credit card company and provide validation information, and the representative approves the transaction. From that point on, every time I use my card in the Philippines until I leave, it's approved.

Two months go by and once again I fly to the Philippines for work and to see what the sun looks like again. I buy a latte at the Seattle airport with my credit card and wave goodbye to the rain as I board my flight. I arrive in Manila and, as a nod to the sun gods, I buy a pair of sunglasses at the Manila airport. This time the risk model keys in on the fact that I am geographically separated from my last purchase and begins whirring. The difference is that this time, the risk model leverages the validation information which was entered by the credit card representative on my last trip. The risk model takes a look at the transaction, which is for $20, takes into consideration that I have a history of previously validated transactions from the Philippines, and decides this time to keep an eye on my account for unusual purchasing behavior, but allows my transaction to go through based on my history. The risk model has just learned from previous human review and analysis that my transaction is not abnormal, and I don't get embarrassed at Ninoy's House of Sunglasses by a decline.
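The two trips can be sketched as a small decision function that consults validated history before declining. Everything here is a simplified assumption on my part: the `Account` type, the decision labels, and the $100 `small_amount` cut-off are hypothetical, and a real model would weigh many more attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    home_country: str
    # Countries where a human reviewer has validated the card holder's travel.
    validated_countries: set = field(default_factory=set)


def record_manual_validation(account: Account, country: str) -> None:
    """Store the outcome of a manual review so later decisions can use it."""
    account.validated_countries.add(country)


def assess(account: Account, country: str, amount: float,
           small_amount: float = 100.0) -> str:
    """Return 'approve', 'approve_and_monitor' or 'review' (hypothetical labels)."""
    if country == account.home_country:
        return "approve"
    if country not in account.validated_countries:
        # First trip: no history to lean on, so send to manual review.
        return "review"
    if amount <= small_amount:
        # Validated history plus a small purchase: let it through, keep watching.
        return "approve_and_monitor"
    return "review"
```

On the first trip `assess(account, "PH", 20.0)` returns `"review"`. Once `record_manual_validation(account, "PH")` has been called, the same $20 purchase on the next trip returns `"approve_and_monitor"`.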

Now let's talk about a learning risk model based on the discovery of a new fraud trend through visual analysis. I am a fraud analyst with Andrew's Credit Card Company and I am performing a visual analysis on the last 24 hours of confirmed fraudulent transactions to see if I can establish a pattern to the most recent activity which has eluded my fraud model.

During the course of my analysis, I begin looking at common purchase points for the fraudulent transactions (CPP analysis). I discover through link analysis that I can connect 150 separate fraudulent transactions to a single CPP by visualizing the credit card holders' transaction histories. In this case, each card holder's history indicates a purchase made at a single CPP just prior to the fraudulent transaction being made. A further attribute I discover is that each of the fraudulent purchases was a Card Not Present transaction, yet the account number and security validation code were entered. All of the connected transactions were made at a wide variety of eCommerce sites for electronic equipment over $500. The common purchase point for all the accounts prior to the fraudulent transaction was Joe Bob's BBQ in Walla Walla, Washington, quite a ways off from Dallas, TX.

Here are the common attributes I have been able to identify surrounding the fraudulent transaction:

1. All the accounts affected by the fraudulent transactions had a CPP history of being used at Joe Bob's BBQ in Walla Walla, Washington.
2. The next transaction after the CPP was for electronic goods over $500 at one of a wide variety of eCommerce sites, within 24 hours of the transaction at Joe Bob's.
3. Each of the fraud transactions was a "card not present" transaction but passed authentication through the security code.
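The CPP step behind this list can be approximated outside a visualization tool with a simple count. The sketch below is a rough stand-in for what link analysis shows visually, not the i2 workflow itself; the `common_purchase_points` function and the shape of the input data are assumptions for illustration.

```python
from collections import Counter


def common_purchase_points(histories, fraud_ids):
    """Rank merchants by how many accounts used them just before a fraud.

    histories: dict mapping account id -> list of (txn_id, merchant) tuples
               in chronological order (hypothetical layout).
    fraud_ids: set of transaction ids confirmed as fraudulent.
    Returns a list of (merchant, account_count), most common first.
    """
    counts = Counter()
    for txns in histories.values():
        merchants = set()
        # Walk consecutive pairs: a legitimate purchase followed by a fraud.
        for previous, current in zip(txns, txns[1:]):
            if current[0] in fraud_ids and previous[0] not in fraud_ids:
                merchants.add(previous[1])
        # Count each merchant once per account, however often it appears.
        counts.update(merchants)
    return counts.most_common()
```

A merchant that sits at the top of this ranking across many otherwise unrelated accounts is a candidate compromised CPP worth pulling into the visualization.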

As the analyst, I have identified a potential fraud breach at Joe Bob's. Either an employee at Joe Bob's is skimming credit card data from customers' magnetic stripes, or Joe Bob's has a data leak that a hacker is using to steal credit card data from Joe Bob's network. I am going to forward my analysis to the investigators to follow up on, but most importantly I have discovered a new fraud trend which I need to quantify and feed into my fraud modeling as soon as possible to mitigate any further loss.

By utilizing the attributes from the fraud I discovered through visual analysis, I write a query against my transaction database which searches out all transactions made at my compromised CPP that are followed by a transaction for more than $500. I could narrow it down even more by restricting the query to electronic goods, but I don't know that to be a positive indicator until I review all the transactions in my visualization.
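A query along these lines might look like the following sketch, using Python's built-in `sqlite3` module so it can be run anywhere. The `transactions` table, its columns, and the 24 hour window are assumptions I have made for illustration; a production transaction database will look different.

```python
import sqlite3

# Hypothetical schema: ts is stored as seconds since the epoch.
SCHEMA = """
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    account_id TEXT,
    merchant TEXT,
    amount REAL,
    ts INTEGER,
    confirmed_fraud INTEGER DEFAULT 0
)
"""

# Pair each purchase at the CPP with later, larger purchases on the same account.
SUSPECT_QUERY = """
SELECT cpp.id, nxt.id, nxt.account_id, nxt.amount
FROM transactions AS cpp
JOIN transactions AS nxt
  ON nxt.account_id = cpp.account_id
 AND nxt.ts > cpp.ts
 AND nxt.ts <= cpp.ts + :window_seconds
WHERE cpp.merchant = :cpp_merchant
  AND nxt.amount > :min_amount
ORDER BY nxt.account_id, nxt.ts
"""


def make_demo_db(rows):
    """Build an in-memory database from (id, account, merchant, amount, ts) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO transactions (id, account_id, merchant, amount, ts) "
        "VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_suspects(conn, cpp_merchant, min_amount=500.0, window_seconds=24 * 3600):
    """Return (cpp_txn_id, follow_on_txn_id, account_id, amount) tuples."""
    params = {"cpp_merchant": cpp_merchant,
              "min_amount": min_amount,
              "window_seconds": window_seconds}
    return conn.execute(SUSPECT_QUERY, params).fetchall()
```

The merchant category is deliberately left out of the `WHERE` clause, matching the point above that electronic goods have not yet been confirmed as a positive indicator.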

Based on my query, I find 255 more transactions which share the same attributes but have not been reported as fraud by the customer yet. I pull this data into my i2 visualization and am able to establish confirmed links between the other 255 suspect transactions and the 150 confirmed fraudulent transactions.

My risk model is established as a learning model which pulls in attributes from transactions discovered as fraud by the analysts or manual review team through an indicator in the database, let's call it a "confirmed fraud" field in the transaction table. As the analyst I compose a query to write a "confirmed fraud" indicator into the field which is picked up by the fraud model and dumped into the "scum pond".

The scum pond is a separate database which captures all the data attributes from confirmed fraud and allows the risk model to leverage the information to learn from past fraud trends. In this case, the analyst has marked 405 transactions discovered as fraudulent with a "confirmed fraud" indicator. The transactions which were marked included the actual fraudulent transactions as well as the transactions from the CPP which was discovered by the analyst. (Just a note: there is actually a lot more involved in creating and marking transactions for the scum pond and the learning model.)

The scum pond picks up these transactions through its own process, which looks for new transactions with the confirmed fraud indicator. The scum pond grabs all the attributes from each transaction and categorizes it based on the nature of the fraud. In this case the scum pond takes into account the relation between the CPP and a following transaction for more than $500. These are attributes which the risk model can reference to make real-time transaction decisions in the future. When the next transaction for electronic equipment over $500 hits the risk model from an account with a past transaction at the compromised CPP, the risk model will elevate that transaction by referencing the attributes in the scum pond.
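As a very reduced sketch of that pickup and lookup, the Python below keeps the scum pond in memory. The `FraudPattern` and `ScumPond` names, the dictionary keys, and passing the amount threshold in at pickup time are all hypothetical simplifications; as noted above, the real process involves a lot more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FraudPattern:
    cpp_merchant: str   # compromised common purchase point
    category: str       # category of the follow-on purchase
    min_amount: float   # follow-on purchases above this amount match


class ScumPond:
    """In-memory stand-in for the separate confirmed-fraud database."""

    def __init__(self):
        self.patterns = set()

    def pick_up(self, transactions, min_amount):
        """Store pattern attributes from transactions marked as confirmed fraud."""
        for txn in transactions:
            if txn.get("confirmed_fraud") and txn.get("prior_merchant"):
                self.patterns.add(FraudPattern(
                    txn["prior_merchant"], txn["category"], min_amount))


def should_elevate(pond, new_txn, account_merchants):
    """True when a new transaction matches a stored pattern and the account
    has a past purchase at that pattern's CPP."""
    return any(
        pattern.cpp_merchant in account_merchants
        and new_txn["category"] == pattern.category
        and new_txn["amount"] > pattern.min_amount
        for pattern in pond.patterns
    )
```

The risk model's own logic stays the same; only the data it references changes, which is why an account with history at the CPP is elevated for a matching purchase but not for its everyday spending.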

There was no need for new rules to be hard-coded into the risk model, and there was very little lag time between the discovery of the new fraud trend by the analyst and the incorporation of the intelligence into the risk model. Most importantly, we are not escalating every transaction from accounts that had been used at the CPP, but only those which match the criteria discovered by the analyst, saving money and customer aggravation.

If the risk model elevates more transactions which have the same attributes contained in the scum pond, emanating from the same CPP, the risk model can alert the analyst of a high percentage of account compromise from the discovered scenario and a decision to re-issue new cards to all the account holders can be made based on the exposure of the compromise.
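One simple way to picture that alert is as a ratio check. The function name and the 25% threshold below are purely illustrative assumptions; the right exposure level for re-issuing cards is a business decision.

```python
def compromise_alert(accounts_at_cpp, elevated_accounts, threshold=0.25):
    """Return (rate, alert) for a compromised common purchase point.

    accounts_at_cpp:   set of account ids with a purchase at the CPP.
    elevated_accounts: set of account ids the risk model has elevated.
    threshold:         hypothetical share of affected accounts that
                       triggers an alert to the analyst.
    """
    if not accounts_at_cpp:
        return 0.0, False
    affected = accounts_at_cpp & elevated_accounts
    rate = len(affected) / len(accounts_at_cpp)
    return rate, rate >= threshold
```

The analyst still makes the call; the alert only surfaces the exposure so the re-issue decision can be made sooner.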