Welcome to Understanding Link Analysis. The purpose of my site is to discuss the methods behind leveraging visual analytics to discover answers and patterns buried within data sets.

Visual analytics provides a proactive response to threats and risks by holistically examining information. Unlike traditional data mining, visualizing information lets patterns of activity that run contrary to the norm surface within very few occurrences.

We can dive into thousands of insurance fraud claims to discover clusters of interrelated parties involved in a staged accident ring.

We can examine months of burglary reports to find a pattern leading back to a suspect.

With the new generation of visualization software our team is developing, we can dive into massive data sets and visually find new trends, patterns and threats that would take hours or days using conventional data mining.

The eye processes information much more rapidly when it is presented as images; this has been true since children started learning to read. As our instinct develops over time, so does our ability to process complex concepts through visual identification. That is the power of visual analysis I focus on on this site.

All information and data used in articles on this site is randomly generated with no relation to actual individuals or companies.

Introducing User Behavioral Analysis in the Risk Process

Many years ago, when I was entering the intelligence community, I attended a class in Virginia where the instructor opened the session with a test that I will never forget and that I have applied to almost every analytic task in my career. At the beginning of the class we were shown a ten-minute video of Grand Central Station at rush hour, with tens of thousands of people, and were asked if we could find a single pickpocket in the crowd by the end of the video. At the end of ten minutes, no one in the class was able to identify the individual.

The purpose of the class was to stress the importance of looking for behavior as an indication that something is wrong: looking for a person or thing that is not doing or behaving the way everything else is. By understanding what is normal in any given place or situation, identifying threats becomes much easier because outliers stand out. If you want to find a pickpocket in a crowd of people, you don't attempt to look at every person; you understand what the crowd is doing and look for the person who is not doing what the crowd is.

At the end of the class, the video was shown again, and every person was able to find the pickpocket within the first three minutes. Everyone understood how this principle could be applied to everything from counter-terrorism to personal protection and, as I realized after entering the field, fraud. A person committing fraud does not behave the same way legitimate people do online. Even their attempts to look "normal" make them stand out, because their goal is completely different from everyone else's on your site.

Recently I conducted an experiment to see what percentage of users who had committed organized fraud transactions could have been identified by their behavior before they ever made a transaction. By looking at these individuals' interactions with the platform from account inception through the transaction process, I examined whether this group did anything that a legitimate user would not, and whether that action was exclusive to individuals committing fraud.

I used a year's worth of fraud events, hundreds of thousands of fraudulent transactions, and began dissecting their activity down to the click. In the end, 93% of individuals who had engaged in organized fraud could have been detected before they ever made a transaction based on their behavior, with the bulk identifiable at account creation. Of that 93%, over half could have been identified by the first eight things they did on the site: how they entered, their settings, their network and their flow of interaction from entry to attempted activation.

Even worse, and something I hadn't planned on, 12% of transactions rejected for organized fraud were false positives and misclassified. If behavioral analysis had been utilized, it would have detected that these were legitimate users and applied a different decision.

The impulse of every company engaged in online commerce is to develop rules and models aimed at identifying fraud in the transaction flow. We throw hundreds of rules that look at thousands of data points at a person who is buying something, writing a post or review, or conducting a financial transaction. Over the years these rules continue to grow and weave their way through the commerce or submission flow of our operation, and they funnel every single person and every single transaction through them. We continue to scale as transaction volume grows, adding resources and bandwidth to the rules platform as the people and the rules themselves continue to multiply.

Ultimately, our fraud platforms struggle with latency and conflict because we are aiming every big gun we have at every individual who walks through our door. Because we keep adding guns, they eventually start pointing at each other, creating conflicts within the fraud infrastructure and friction for the user, regardless of who that user is, because none of these rules actually look at the risk of the individual. In the end, more transactions are pushed to manual review, which adds cost, or worse, are falsely rejected due to a lack of intelligence about the user themselves.

Fraud rules, fraud filters and fraud models are essential and play an important part in any fraud prevention platform, but they are reactive, not strategic. They do not know the difference between an individual who entered your site three months ago shopping for a TV, returned to your site 20 times over the following weeks looking at the prices, reviews, specs and pictures of the TVs you are selling, and finally made a purchasing decision on the best and most expensive TV you have, and the guy who appeared out of the blue, set up an account, went directly to the most expensive TV, threw it in the cart and hit the purchasing flow. The rules are going to treat both of these transactions exactly the same because they are "new" users buying a high-risk item.

By starting the fraud identification process with an analysis of user behavior, particularly that of new users who enter your site or establish an account, we become strategic and solve any number of issues in our fraud scoring process. Much the same way we classify transactions in risk, our behavior model looks for attributes and actions that run contrary to what legitimate users do on the site, and it has to begin with an understanding of what that is (I recommend you cozy up with your user experience team; they have piles of information on that very thing).

Once there is a good understanding of normal user activity and behavior, we start training models that look for the opposite, and since you know what good behavior is, finding the abnormal behavior becomes much easier (see paragraph one). This can be visualized by scatter plotting behavioral attributes across millions of activities and users.
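Before plotting anything, the same "outliers stand out against the normal crowd" idea can be expressed numerically. Here is a minimal sketch using a simple z-score test; the attribute (sign-up completion time) and the sample values are illustrative, not from real data:

```python
import statistics

def find_outliers(values, z_threshold=3.0):
    """Return indices of values more than z_threshold standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # every value identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > z_threshold]

# Hypothetical sign-up completion times in seconds:
# 99 typical users at ~2 minutes, one bot-like session at 5 seconds.
signup_times = [120.0] * 99 + [5.0]
print(find_outliers(signup_times))  # -> [99], the bot-like session
```

The same logic scales to any behavioral attribute you can reduce to a number: once the legitimate baseline is established, the abnormal sessions surface on their own.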

Our behavioral model is going to look at attributes and actions and begin scoring them so we can tell the difference between a risky new user and a good new user (new users on their first transactions are the hardest to classify and the highest risk in the transaction flow). It will look at things such as the user's network: did they enter on a hosted service, proxy or botnet, or are they in some other way trying to disguise where and who they are? Do they have Java, Flash or cookies disabled? Do they enter the site and go directly to the sign-up flow or account creation? What is the delta in the amount of time it takes them to complete the sign-up flow: is it 20 seconds when every legitimate user takes 2 minutes? What type of operating system, browser and resolution (user agent attributes) are they on, and is it unique? What language localization are they using, and does it match where the other attributes and user-entered data say they are from? What are the click patterns: are they too efficient for a normal user in getting from entry to transaction?
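A very rough sketch of how such signals might be combined into a score. Every attribute name, weight and threshold below is an illustrative assumption, not a recommendation; a production model would be trained on your own data rather than hand-weighted:

```python
# Hypothetical weights for pre-transaction behavioral red flags.
RULE_WEIGHTS = {
    "hosted_or_proxy_network": 30,  # entered via hosting provider, proxy or botnet
    "cookies_or_js_disabled": 10,   # Java/Flash/cookies disabled
    "direct_to_signup": 15,         # no browsing before account creation
    "rare_user_agent": 10,          # unusual OS/browser/resolution combination
    "locale_mismatch": 15,          # language localization contradicts other attributes
}

def behavior_score(session):
    """Score a session dict of boolean red flags plus sign-up timing."""
    score = sum(w for flag, w in RULE_WEIGHTS.items() if session.get(flag))
    # Flag sign-up flows completed far faster than the legitimate baseline,
    # e.g. 20 seconds when typical users take ~2 minutes.
    baseline = session.get("baseline_signup_secs", 120)
    actual = session.get("signup_secs", baseline)
    if actual < baseline * 0.25:
        score += 20
    return score

risky = {"hosted_or_proxy_network": True, "direct_to_signup": True,
         "signup_secs": 20, "baseline_signup_secs": 120}
normal = {"signup_secs": 110, "baseline_signup_secs": 120}
print(behavior_score(risky), behavior_score(normal))  # -> 65 0
```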

This could go on for some time; there are literally thousands of things to be gleaned from user and page logs that have enormous value in fraud detection yet are hardly ever used anywhere. The behavioral model should be looking at everything you can feed into it from entry up to the transaction. This model doesn't care about transactions; its job is to make a decision on what to do with the user before they get there.

A good example of strategic intervention over tactical or reactive intervention is the comparison of airline security in the United States with Israel. In the US, arriving at the airport starts the screening process, and everyone is lumped together and fed through the same security process to identify risk. Everyone goes through the same security line, the same x-ray, the same scanner, and even random selection does not take into account anything about you; it's just a random number. A five-year-old is basically looked at the same way a military-age individual is, which is why you always see the funny yet disturbing videos of five-year-olds getting patted down, because their number was up.

In Israel, airline security knows as much as it can about you before you even show up at the airport. By the time you arrive they know about you and your network, why you're there, where you're going, why you're going and anything else they can possibly dig up about you. They have already risk scored you from the minute you bought the ticket and decided what kind of screening you are going to have at the airport, and, if you are still high risk, which security agent is going to be sitting in the seat behind you ready to blow your head off if you jump too quick. Strategic vs tactical risk management, or in our case, proactive vs reactive fraud detection.

Once the behavior model is built and beginning to score and surface risk, it can begin solving fraud and platform issues. The first step would be to cohort users into different risk groups that determine which fraud rules and models apply based on behavior. A new user who exhibited normal or predicted user behavior before entering the transaction flow would not be subject to the same filters as a user who showed highly suspicious behavior.

By cohorting users, we can more accurately "vet" users in the transaction flow and also redirect traffic within our fraud platform to improve scalability and reduce friction. An established user with a good behavior profile and a predictable buying pattern could bypass all but exclusion rules, freeing up bandwidth for the users exhibiting high-risk behavior who should run the gauntlet of our fraud platform.

Likewise, a user who exhibits known fraudulent behavior doesn't need to be routed through the fraud platform at the time of transaction; they can bypass the models and go straight to rejection. We can tune our cohort groups to optimize manual review toward those users in specific risk groups who have a higher likelihood of transaction completion.
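The cohorting and routing described above could be sketched as a simple dispatch on the behavior score. Cohort names, thresholds and pipeline stages here are all assumptions for illustration; in practice they would be tuned against your own score distribution:

```python
def route_user(score, established_good=False):
    """Map a behavior score to a risk cohort and the fraud checks that cohort runs.

    Thresholds and stage names are illustrative placeholders.
    """
    if established_good and score < 20:
        # Good history, normal behavior: bypass all but exclusion rules.
        return ("trusted", ["exclusion_rules"])
    if score >= 80:
        # Known fraudulent behavior: skip the models, go straight to rejection.
        return ("known_fraud", ["auto_reject"])
    if score >= 50:
        # Suspicious behavior: run the full gauntlet, including manual review.
        return ("high_risk", ["full_rule_set", "models", "manual_review"])
    return ("standard", ["full_rule_set", "models"])

print(route_user(10, established_good=True))  # -> ('trusted', ['exclusion_rules'])
print(route_user(85))                         # -> ('known_fraud', ['auto_reject'])
```

The point is not the specific cutoffs but that each transaction only consumes the rule bandwidth its cohort actually warrants.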

By implementing behavior modeling, deep diving into and analyzing the data from users' interactions prior to the transaction flow, we get a better understanding of the user's intent on our site, we gain efficiency and bandwidth in our transactional fraud process, and we achieve greater accuracy when making risk-based decisions.

Pushing Fraud Upstream is the Goal

The ultimate goal of any fraud program is to push detection of suspect activity as far upstream as possible. In reality, what often happens is that companies become entrenched in reactive analytics at the transaction or loss level without figuring out how the threat made it through the door in the first place. A large percentage of suspect or fraudulent activity can be detected at time of entry or account creation, before a single transaction is made, and often much more easily than at the transaction level.

Transaction level fraud detection is an essential component of any fraud prevention platform, and there is much to be leveraged at the transaction level that lends itself to robust protection, but in order for this layer to start working, a fraudulent transaction has to take place. If transaction level detection is your company's first line of defense, it's the same as leaving your door unlocked for the burglar because you have a camera inside. For the strongest protection from fraud, a layered approach to fortifying your platform, starting with the behavior of users the minute they enter the door, gives a first chance at profiling risk before risk can occur.

Most fraud prevention is pattern detection: interrelationships between activities that should be random increase the likelihood of fraud. The more interrelationships, the higher the risk of fraud, and the best way to target these interrelationships is to aim detection at the activities that generate the most velocity. Know your enemy as early as possible.

Locking the front door

With any luck, the vast majority of users on your site are legitimate. They browse your site, set up an account and transact in very distinct patterns based on the user funnel you have established for them. Since most companies build their site with a specific user behavior in mind, it is likely that your usability engineers already have metrics on how users interact with the site the way it was intended, and the data can be as granular as page-click and movement logging.

This is exactly where we want our fraud platform's first layer of security to start: I want to be able to detect when an account is created by someone with a high probability of committing fraud. Organized fraud has several weaknesses that can be exploited. First, they need a network that hides their identity and location. Second, they need a financial network to execute transactions, pay for goods or extract funds (depending on your business) and move that money to a safe harbor. Third, because they are a business, they need to create many accounts in a short period of time to get a return on investment.

The reason for building a strong fraud prevention system at the account creation layer is to take advantage of that weakness in the organized fraud scheme. Fraudsters may spread attributes and velocities across transactions more effectively, but in order to commit the fraud in the first place they have to create multiple accounts to execute on your site, and out of sheer necessity they must do it in a way that runs contrary to the behavior of legitimate users.

Start by examining velocity signatures on your site using attributes captured at the account creation stage. You should be able to establish a baseline of legitimate account creation based on attributes such as user agent, IP address, tracking cookie and device fingerprint. In my case, I was examining a series of organized fraud activities and started looking at user behavior on entry for indicators where I could tie the activity to a set, or signature, of actions and velocities from attributes. I found that under the most extreme circumstances a normal user would never create more than X accounts from the same user agent and IP in any given session, and any time I found a browser agent with an account creation velocity of more than X, tied to other behavioral red flags, there was a 95% chance that the account would engage in fraud. In most cases the velocity was much higher than X, and those were "low hanging fruit" detections; working my way down to 3 provided a reliable indication when coupled with several signature indicators and behaviors (unfortunately I can't tell everything; you never know who is reading this).
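A velocity check like this is straightforward to sketch: count account creations per (user agent, IP) pair inside a time window and flag anything above the threshold. The data and the threshold value below are illustrative; the article's real X is deliberately undisclosed:

```python
from collections import Counter

def velocity_flags(creations, max_per_pair=3):
    """Flag (user_agent, ip) pairs whose account-creation velocity exceeds max_per_pair.

    `creations` is a list of (user_agent, ip) tuples from one time window;
    the default threshold is a stand-in, not the author's real value.
    """
    counts = Counter(creations)
    return {pair: n for pair, n in counts.items() if n > max_per_pair}

# Hypothetical window: one fingerprint creates 12 accounts, another creates 2.
events = [("UA-1", "10.0.0.5")] * 12 + [("UA-2", "10.0.0.9")] * 2
print(velocity_flags(events))  # -> {('UA-1', '10.0.0.5'): 12}
```

On its own a raised count is only a lead; as the article notes, it becomes reliable when coupled with the other signature and behavioral indicators.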

By looking at the signature of the entry and account establishment behavior, I could link multiple organized fraud instances together to gain an understanding of the scope of activity and the specific methods being used that differed from the way normal users would interact with the site.

By looking at this activity through visual analytics, you can see the multiple layers of interrelated attributes that are created; the visualization here represents the activity of a single organized fraud ring creating accounts in a one-hour period.

Next, look at the sequence of activities when creating the account. Normal users take a certain minimum mean time when establishing a new account. Normal users also don't create an account and disappear; if they are going to take the time to create the account in the first place, they are going to browse and interact with your site for a certain amount of time. When your intention is organized fraud, however, you have to create a hundred accounts in a short amount of time. There are two things to look for: one, do you have users who are creating an account in a fraction of the time your legitimate users take, and two, do you have a number of accounts created within a short time frame where the page click logs show the exact (key word) pattern of account creation over and over. Even for a simple form, ten people will not fill out the form in the same sequence, in the same way, without making a mistake.
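That second check, identical click sequences repeated across accounts, can be sketched by hashing each session's ordered form events and counting exact duplicates. The session data and the repeat threshold below are hypothetical:

```python
from collections import Counter

def repeated_sequences(sessions, min_repeats=3):
    """Find exact account-creation event sequences that recur across sessions.

    `sessions` maps an account id to its ordered tuple of form events. Real
    users rarely fill a form in an identical order without a single correction,
    so exact repeats across many accounts suggest scripted creation.
    """
    counts = Counter(tuple(events) for events in sessions.values())
    return [seq for seq, n in counts.items() if n >= min_repeats]

bot_flow = ("email", "password", "name", "submit")
sessions = {
    "acct-1": bot_flow,
    "acct-2": bot_flow,
    "acct-3": bot_flow,
    "acct-4": ("name", "email", "email_fix", "password", "submit"),  # human: corrects a typo
    "acct-5": ("email", "name", "password", "password_retype", "submit"),
}
print(repeated_sequences(sessions))  # -> [('email', 'password', 'name', 'submit')]
```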

When committing fraud at the transaction level, organized fraud can spread these transactions and attributes over a much wider set of data points but in account creation these indicators are much more condensed both from need and by design of the account creation process on most sites.

Another key indicator to look for at account creation is network attributes and the relationship between multiple accounts on certain domains and IPs. Again, because of the nature of the account creation process, these network indicators become more condensed and dynamic when examined through visual analysis. Do you have abnormal user agent attributes jumping across multiple high-risk domains and IPs within a short time span? For example, a user agent creates 10 accounts on one IP, then jumps to another IP and creates another 10. In examining my own data, I never found one case where a user would create legitimate accounts or transactions while jumping across network attributes. The key is understanding what the clear majority of legitimate users do in order to detect the much smaller percentage that doesn't.
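The IP-hopping pattern described above can be surfaced by grouping account creations by user agent fingerprint and flagging fingerprints that span too many networks in the window. Fingerprint labels, IPs and the threshold are illustrative assumptions:

```python
from collections import defaultdict

def ip_hopping_agents(creations, max_ips=2):
    """Flag user-agent fingerprints that create accounts across many IPs.

    `creations` is a list of (user_agent, ip) pairs from one time window;
    `max_ips` is an illustrative cutoff for legitimate address churn.
    """
    ips_by_agent = defaultdict(set)
    for agent, ip in creations:
        ips_by_agent[agent].add(ip)
    return {agent: sorted(ips)
            for agent, ips in ips_by_agent.items() if len(ips) > max_ips}

# Hypothetical window: one fingerprint hops across three IPs, one stays put.
events = ([("UA-1", "10.0.0.5")] * 4 + [("UA-1", "10.0.0.7")] * 4 +
          [("UA-1", "10.0.0.9")] * 4 + [("UA-2", "10.0.1.1")] * 2)
print(ip_hopping_agents(events))  # -> {'UA-1': ['10.0.0.5', '10.0.0.7', '10.0.0.9']}
```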

If you find a case where the network attributes are creating a pattern like the one below (left), that is a high-risk cluster of interrelationships which shouldn't exist.

It is an impulse to always throw fraud detection at the threat "you can see" at the transactional level. Fraud detection and risk scoring at the transaction level have a number of challenges: the activity is much more dispersed, so surfacing a pattern is harder. Transaction-level fraud behavior is also much closer to normal user behavior; there is a difference, but detecting and measuring it is far more complex, because by the time a fraudster makes it to the transaction flow, they are doing the same things normal users do. Neither of these challenges applies at account creation or site interaction.

By coupling behavior, risk attributes, network forensics and pattern analysis to create a high-risk signature and running it against account creation, or even earlier in the entry phase, your fraud system can begin risk scoring and mitigation before the first transaction is ever attempted. I have worked on fraud prevention with many different types of companies, and regardless of whether it's social media, ecommerce or FinTech fraud, the account velocity rule has held true in targeting organized fraud across all of these business types.

While watching the news as I wrote this article, I heard that Facebook is hiring 3,000 people to moderate video in an effort to remove offensive or dangerous content from the site faster, and I found myself thinking: could the people who create this type of content be cohorted and risk assessed based on the way they create and interact with their accounts?