Welcome to Understanding Link Analysis. The purpose of my site is to discuss the methods behind leveraging visual analytics to discover answers and patterns buried within data sets.

Visual analytics provides a proactive response to threats and risks by holistically examining information. Unlike traditional data mining, visualizing information lets patterns of activity that run contrary to normal activity surface within very few occurrences.

We can dive into thousands of insurance fraud claims to discover clusters of interrelated parties involved in a staged accident ring.

We can examine months of burglary reports to find a pattern leading back to a suspect.

With the new generation of visualization software our team is developing, we can dive into massive data sets and visually find new trends, patterns and threats that would take hours or days using conventional data mining.

The eye processes information much more rapidly when it is presented as images. This has been true since children started learning to read. As our instinct develops over time, so does our ability to process complex concepts through visual identification. This is the power of visual analysis that I focus on in this site.

All information and data used in articles on this site is randomly generated with no relation to actual individuals or companies.

Introducing User Behavioral Analysis in the Risk Process

Many years ago, when I was entering the intelligence community, I attended a class in Virginia where the instructor opened the session with a test that I will never forget and that I have applied to almost every analytic task in my career. At the beginning of the class we were shown a ten-minute video of Grand Central Station at rush hour, with tens of thousands of people, and were asked if we could find a single pickpocket in the crowd by the end of the video. At the end of ten minutes, no one in the class was able to identify the individual.

The purpose of the class was to stress the importance of looking at behavior as an indication that something is wrong: looking for a person or thing that is not behaving the same way everything else is. By understanding what is normal in any given place or situation, identifying threats becomes much easier because outliers stand out. If you want to find a pickpocket in a crowd of people, you don’t attempt to look at every person. You understand what the crowd is doing and look for the person who is not doing what the crowd is.

By the end of the class, the video was shown again and every person was able to find the pickpocket within the first three minutes. Everyone understood how this principle could be applied to everything from counter-terrorism to personal protection and, as I realized after entering the field, fraud. The person who is committing fraud does not behave the same way legitimate people do online. Even their attempts to look “normal” make them stand out, because their goal is completely different from that of everyone else on your site.

Recently I conducted an experiment to see what percentage of users who had committed organized fraud transactions could have been identified by their behavior before they ever made a transaction. By looking at these individuals’ interactions with the platform from account inception through the transaction process, I examined whether this group did anything that a legitimate user would not, and whether that action was exclusive to individuals committing fraud.

I used a year’s worth of fraud events, hundreds of thousands of fraud transactions, and began dissecting their activity down to the click. At the end, 93% of individuals who had engaged in organized fraud could have been detected before they ever made a transaction based on their behavior, with the bulk identifiable at account creation. Of that 93%, over half could have been identified by the first eight things they did on the site, based on how they entered, their settings, their network and their flow of interaction from entry to attempted activation.

Even worse, and it was something that I hadn’t planned on, 12% of transactions rejected for organized fraud were false positives that had been misclassified. If behavioral analysis had been utilized, it would have detected that these were legitimate users and applied a different decision.

The impulse of every company engaged in online commerce is to develop rules and models aimed at identifying fraud in the transaction flow. We throw hundreds of rules that look at thousands of data points at a person who is buying something, writing a post or review, or conducting a financial transaction. Over the years, these rules continue to grow and weave their way through the commerce or submission flow in our operation, and they funnel every single person and every single transaction through them. We continue to scale as transaction volume grows, adding resources and bandwidth to the rules platform as the people and the rules themselves continue to multiply.

Ultimately, our fraud platforms struggle with latency and conflict because we are aiming every big gun we have at every individual who walks through our door. Because we keep adding guns, they eventually start pointing at themselves, creating conflicts within the fraud infrastructure and friction for the user, regardless of who that user is, because none of these rules simply look at the risk of the individual. In the end, more transactions are pushed to manual review, which adds cost, or worse, are falsely rejected due to a lack of intelligence about the users themselves.

Fraud rules, fraud filters and fraud models are essential and play an important part in any fraud prevention platform, but they are reactive, not strategic. They do not know the difference between two buyers. The first entered your site three months ago shopping for a TV, returned 20 times in the following weeks to look at the price, reviews, specs and pictures of the TVs you are selling, and then finally made a purchasing decision on the best and most expensive TV you have. The second appeared out of the blue, set up an account, went directly to the most expensive TV you have, threw it in the cart and hit the purchasing flow. The rules are going to treat both of these transactions exactly the same, because they are “new” users buying a high-risk item.

By starting the fraud identification process with an analysis of user behavior, particularly that of new users who enter your site or establish an account, we become strategic and solve any number of issues in our fraud scoring process. Much in the same way we classify transactions in risk, our behavior model looks for attributes and actions that run contrary to what legitimate users do on the site, so it has to begin with an understanding of what that is (I recommend you cozy up with your user experience team; they have piles of information on that very thing).

Once there is a good understanding of normal user activity and behavior, we start training models that look for the opposite, and since you know what good behavior is, finding the abnormal behavior becomes much easier (see paragraph one). This can be visualized by scatter plotting behavioral attributes across millions of activities and users.
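To make the idea of "learn normal, then look for the opposite" concrete, here is a minimal sketch in Python. It is not the model described in this article. It assumes a single behavioral attribute (sign-up duration), a made-up baseline of legitimate users, and a median-based robust z-score with the commonly used 3.5 cutoff. The function names, sample values and threshold are all illustrative assumptions.

```python
from statistics import median

def robust_z(value, baseline):
    """Distance of value from the baseline median, in scaled MAD units."""
    med = median(baseline)
    # Median absolute deviation: a spread measure that outliers barely move.
    mad = median(abs(x - med) for x in baseline)
    if mad == 0:
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad

def is_outlier(value, baseline, threshold=3.5):
    """Flag values far from what the baseline population normally does."""
    return abs(robust_z(value, baseline)) > threshold

# Hypothetical sign-up durations (seconds) for known legitimate users.
legit_signup_seconds = [110, 125, 118, 130, 122, 115, 140, 128, 119, 121]

print(is_outlier(20, legit_signup_seconds))   # a 20-second sign-up
print(is_outlier(124, legit_signup_seconds))  # a typical sign-up
```

A real model would look at many attributes at once, but the principle is the same: the baseline comes from legitimate users, and the interesting users are the ones far away from it.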

Our behavioral model is going to look at attributes and actions and begin scoring them so we can tell the difference between a risky new user and a good new user (new users on first transactions are the hardest to classify and the highest risk in the transaction flow). It will look for things such as the user’s network: did they enter on a hosted service, proxy or botnet, or are they in some way trying to disguise where they are and who they are? Do they have Java, Flash or cookies disabled? Do they enter the site and go directly to the sign-up flow or account creation? What is the delta between the time it takes them to complete the sign-up flow and the time it takes a typical user; is it 20 seconds when every legitimate user takes 2 minutes? What type of operating system, browser and resolution (user agent attributes) are they on, and is it unique? What language localization are they using, and does it match where the other attributes and user-entered data say they are from? What are the click patterns, and are they too efficient in getting from entry to transaction for a normal user?
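The questions above can be sketched as a simple additive score. This is a hedged illustration only, written in Python: the `Session` fields, the point values and the 120-second typical sign-up time are hypothetical, and a production model would learn its weights from data rather than use hand-picked points.

```python
from dataclasses import dataclass

@dataclass
class Session:
    """Pre-transaction attributes for one new user (all fields hypothetical)."""
    uses_proxy_or_hosting: bool
    cookies_disabled: bool
    went_straight_to_signup: bool
    signup_seconds: float
    locale_matches_claimed_country: bool

# Assumed typical sign-up time for legitimate users, in seconds.
TYPICAL_SIGNUP_SECONDS = 120.0

# Illustrative point values; they sum to 100 so the score reads as 0 to 100.
POINTS = {
    "disguised_network": 30,
    "cookies_disabled": 10,
    "straight_to_signup": 15,
    "signup_too_fast": 25,
    "locale_mismatch": 20,
}

def risk_signals(s: Session) -> dict:
    """Answer each behavioral question as a yes/no signal."""
    return {
        "disguised_network": s.uses_proxy_or_hosting,
        "cookies_disabled": s.cookies_disabled,
        "straight_to_signup": s.went_straight_to_signup,
        # "Too fast" here means under a quarter of the typical time.
        "signup_too_fast": s.signup_seconds < 0.25 * TYPICAL_SIGNUP_SECONDS,
        "locale_mismatch": not s.locale_matches_claimed_country,
    }

def behavior_score(s: Session) -> int:
    """Sum the points for every signal that fired."""
    return sum(POINTS[name] for name, fired in risk_signals(s).items() if fired)

patient_shopper = Session(False, False, False, 130.0, True)
rushed_newcomer = Session(True, True, True, 20.0, False)
print(behavior_score(patient_shopper), behavior_score(rushed_newcomer))
```

Keeping each signal as a named yes/no answer makes the score explainable to a manual reviewer, who can see which behaviors fired instead of just a number.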

This could go on for some time, and there are literally thousands of things to be gleaned from user and page logs that have enormous value in fraud detection and are hardly ever used anywhere. The behavioral model should be looking at everything you can feed into it from entry, stopping at the transaction. This model doesn’t care about transactions; its job is to make a decision on what to do with this user before they get there.

A good example of strategic intervention over tactical or reactive intervention is the comparison of airline security in the United States to that in Israel. In the US, the screening process starts when you get to the airport, and everyone is lumped together and fed through the same security process to identify risk. Everyone goes through the same security line, the same x-ray and the same scanner, and even random selection does not take into account anything about you; it’s just a random number. A five-year-old is basically looked at the same way a military-age individual is when it comes to the overall process, which is why you always see the funny and yet disturbing videos of five-year-olds getting patted down because their number was up.

In Israel, airline security knows as much as they can about you before you even show up at the airport. By the time you arrive they know about you and your network, why you’re there, where you’re going, why you’re going and anything else they can possibly dig up about you. They have already risk scored you from the minute you bought the ticket and decided what kind of screening you are going to have at the airport and, if you are still high risk, which security agent is going to be sitting in the seat behind you ready to blow your head off if you jump too quick. Strategic vs tactical risk management or, in our case, proactive vs reactive fraud detection.

Once the behavior model is built and beginning to score and surface risk, it can begin solving fraud and platform issues. The first step would be to begin cohorting users into different risk groups to determine which fraud rules and models apply based on their behavior. A new user who exhibited normal or predicted user behavior before entering the transaction flow would not be subject to the same filters as a user who showed highly suspicious behavior.

By cohorting users, we can begin more accurately “vetting” users in the transaction flow and also redirect traffic to our fraud platform to improve scalability and reduce friction. An established user with a good behavior profile and predictable buying pattern could bypass all but exclusion rules, freeing up bandwidth for users who exhibit high-risk behavior, who would run the gauntlet of our fraud platform.

Likewise, a user who exhibits known fraudulent behavior doesn’t need to be routed through the fraud platform at the time of transaction; they can bypass the models and go straight to rejection. We can tune our cohort groups to optimize manual review of those users in specific risk groups who have a higher likelihood of transaction completion.
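The cohort routing described above could look something like the following Python sketch. It assumes the behavior model emits a score from 0 to 100, and the cohort names, cutoffs and check names are hypothetical placeholders rather than any real platform's configuration.

```python
# Hypothetical cutoffs on an assumed 0 to 100 behavior score.
HIGH_RISK_CUTOFF = 30
KNOWN_BAD_CUTOFF = 70

# Which checks each cohort is sent through at transaction time (names are made up).
ROUTES = {
    "low_risk": ["exclusion_rules"],
    "high_risk": ["exclusion_rules", "velocity_rules", "fraud_model", "manual_review"],
    "known_bad": ["reject"],
}

def cohort(score: int) -> str:
    """Place a user in a risk cohort based on pre-transaction behavior."""
    if score >= KNOWN_BAD_CUTOFF:
        return "known_bad"
    if score >= HIGH_RISK_CUTOFF:
        return "high_risk"
    return "low_risk"

def route(score: int) -> list:
    """Return the list of checks this user's transaction should run through."""
    return ROUTES[cohort(score)]

for score in (5, 45, 90):
    print(score, cohort(score), route(score))
```

The point of the design is that only the middle cohort pays the full cost of the fraud platform. Good users skip most of it, and known bad behavior is rejected without spending model or reviewer time.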

By implementing behavior modeling, and by deep diving into and analyzing the data from the user’s interactions prior to the transaction flow, we get a better understanding of the user’s intent on our site, we gain efficiency and bandwidth in our transactional fraud process, and we achieve greater accuracy when making risk-based decisions.