Get the most from machine learning: What's next for detecting data fraud

Kedar Samant Co-founder and CTO, Simility

The use of machine learning to fight fraud has been on the rise. But as more organizations adopt machine learning techniques, they need to follow best practices and understand the key levers and nuances specific to fighting fraud.

Beyond the basics, there are a few other techniques you can use to get the most from machine learning and artificial intelligence (AI) technologies, and to blend them effectively with your company's DNA. Here's what you need to know. 

Using machine learning in fraud detection

Machine learning systems can process very large data sets to detect patterns that people aren't able to catch. These systems learn from user-provided feedback about the outcomes of fraud actions, and adapt by feeding that information back into the system in a continuous learning loop.

A risk engine for fraud management takes in an event data stream, augments it with third-party data sources, stitches and transforms that data, creates signals, builds features, runs rules and machine learning models on that data, and then makes a risk evaluation decision based on optimized strategies (e.g., approve or reject). Confirmations from experts or end users then provide feedback to improve the risk engine.
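The stages above can be sketched as a small Python pipeline. This is a minimal illustration under loudly stated assumptions, not how any particular product implements its risk engine: the `IP_REPUTATION` table is a hypothetical stand-in for a third-party data source, the hard rule and the weights in `model_score` are made up in place of a trained model, and the decision thresholds are arbitrary.

```python
import math
from dataclasses import dataclass, field

# Hypothetical third-party enrichment: IP address -> reputation (1.0 = known bad).
IP_REPUTATION = {"203.0.113.7": 0.9}

# Confirmed outcomes collected here would later be used to retrain the model.
feedback_log = []


@dataclass
class Event:
    user_id: str
    amount: float
    ip: str
    features: dict = field(default_factory=dict)


def make_features(event: Event) -> Event:
    """Augment the raw event with third-party data and derived features."""
    event.features["log_amount"] = math.log1p(event.amount)
    event.features["ip_risk"] = IP_REPUTATION.get(event.ip, 0.1)  # unknown IPs: low default risk
    return event


def run_rules(event: Event):
    """Hard rules run before the model; return a decision or None."""
    if event.features["ip_risk"] >= 0.8:
        return "reject"
    return None


def model_score(event: Event) -> float:
    """Stand-in for a trained model: logistic function over hand-picked (hypothetical) weights."""
    z = -4.0 + 0.5 * event.features["log_amount"] + 3.0 * event.features["ip_risk"]
    return 1.0 / (1.0 + math.exp(-z))


def decide(event: Event, review_at: float = 0.3, reject_at: float = 0.7):
    """Return (decision, score); score is None when a rule made the decision."""
    event = make_features(event)
    ruled = run_rules(event)
    if ruled is not None:
        return ruled, None
    score = model_score(event)
    if score >= reject_at:
        return "reject", score
    if score >= review_at:
        return "review", score
    return "approve", score


def record_feedback(event: Event, decision: str, confirmed_fraud: bool) -> None:
    """Close the loop: store the features, the decision, and the confirmed outcome."""
    feedback_log.append((dict(event.features), decision, confirmed_fraud))


evt = Event("u3", 5000.0, "198.51.100.1")
decision, score = decide(evt)  # "review" with these made-up weights
record_feedback(evt, decision, confirmed_fraud=True)
```

In a production engine, `feedback_log` would feed a retraining job rather than sit in memory, and rules and models would typically be configured rather than hard-coded.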

Key aspects of machine learning techniques

Model types: Supervised and unsupervised

There are two main types of machine learning techniques: supervised and unsupervised. Supervised models require labeled data; they are useful for classifying data as good or bad based on historical data and labels. Unsupervised models do not need labeled data; they can detect anomalies, clusters, and so on.
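To make the distinction concrete, here is a toy Python sketch on made-up transaction amounts. The supervised function learns a cutoff from analyst labels; the unsupervised one never looks at labels and flags outliers with the modified z-score (median absolute deviation, with the commonly used 0.6745 scale factor and 3.5 cutoff). Both are illustrative only: real fraud models use many features and far richer algorithms, and an unsupervised anomaly is not automatically fraud.

```python
from statistics import median

# Toy transaction amounts with analyst labels (1 = confirmed fraud).
amounts = [20, 25, 30, 22, 27, 400, 35, 500]
labels = [0, 0, 0, 0, 0, 1, 0, 1]


def fit_threshold(xs, ys):
    """Supervised: pick the amount cutoff that best reproduces the historical labels."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def robust_anomalies(xs, cutoff=3.5):
    """Unsupervised: flag outliers via the modified z-score (median/MAD); no labels used."""
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    if mad == 0:
        return [False] * len(xs)
    return [0.6745 * abs(x - med) / mad > cutoff for x in xs]


cutoff = fit_threshold(amounts, labels)  # 400: learned from the labels
flags = robust_anomalies(amounts)        # flags 400 and 500 without any labels
```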

Labels on earlier data help to build a supervised model, which is often far more accurate than an unsupervised model. The key consideration here is picking the right techniques and algorithms for your fraud use case.

Historical data and labeled data

Historical data helps a model learn patterns and validate itself with those patterns to be sure that it works. Richer and more accurate historical data will help you build a more accurate model.
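One common way to let historical data validate a model is a time-based split: learn from older events and test on newer ones, so the model never sees the future during training. The Python sketch below uses made-up records, and the midpoint rule in `learn_cutoff` is a deliberately simple, hypothetical stand-in for real training that assumes the training classes don't overlap.

```python
from datetime import date

# Toy history: (event date, amount, label) with 1 = confirmed fraud.
records = [
    (date(2024, 1, 1), 20.0, 0),
    (date(2024, 1, 2), 300.0, 1),
    (date(2024, 1, 3), 40.0, 0),
    (date(2024, 1, 4), 250.0, 1),
    (date(2024, 2, 1), 35.0, 0),
    (date(2024, 2, 2), 280.0, 1),
    (date(2024, 2, 3), 90.0, 1),
]


def time_split(rows, cutoff_day):
    """Train on events before cutoff_day; validate on events on or after it."""
    train = [r for r in rows if r[0] < cutoff_day]
    test = [r for r in rows if r[0] >= cutoff_day]
    return train, test


def learn_cutoff(train):
    """Midpoint between the largest good amount and the smallest fraud amount."""
    good = [amt for _, amt, y in train if y == 0]
    bad = [amt for _, amt, y in train if y == 1]
    return (max(good) + min(bad)) / 2


def accuracy(rows, threshold):
    """Share of rows where 'amount >= threshold' matches the label."""
    return sum((amt >= threshold) == bool(y) for _, amt, y in rows) / len(rows)


train, test = time_split(records, date(2024, 2, 1))
threshold = learn_cutoff(train)  # 145.0
print(threshold, accuracy(test, threshold))  # the 90.0 fraud slips through
```

The held-out month exposes a lower-amount fraud pattern that the older data never showed, which is why richer and more recent history tends to produce a more accurate model.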

Creating features that result in insights

While machines excel at pattern recognition, domain experts and analysts are good at interpreting and detecting anomalies in stats, charts, and other visuals. Through a process known as feature engineering, domain experts create features that will make the machine learning algorithms work to solve particular problems.

For example, given a user's mouse-movement patterns on a web page, even a non-expert can easily tell that straight-line movements are bot-like behavior, very different from the doodle-like movements that person made in prior sessions. This would lead to the conclusion that the session doesn't represent a real user and certainly isn't the same user as before. Machine learning can help detect this.
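The mouse-movement intuition can be turned into an engineered feature. A minimal Python sketch, assuming each stroke arrives as a list of (x, y) points: `straightness` divides the straight-line distance by the distance actually traveled, so values near 1.0 mean ruler-straight motion. The feature, the `looks_bot_like` helper, and its 0.98 threshold are all hypothetical illustrations, not a production detector.

```python
import math


def straightness(stroke):
    """Straight-line distance divided by path length; 1.0 means a perfectly straight stroke."""
    path = sum(math.dist(a, b) for a, b in zip(stroke, stroke[1:]))
    if path == 0:
        raise ValueError("stroke needs at least two distinct points")
    return math.dist(stroke[0], stroke[-1]) / path


def looks_bot_like(strokes, threshold=0.98):
    """Flag a session whose strokes are, on average, almost perfectly straight."""
    scores = [straightness(s) for s in strokes]
    return sum(scores) / len(scores) >= threshold


bot = [(0, 0), (10, 10), (20, 20), (30, 30)]
human = [(0, 0), (5, 12), (9, 4), (20, 15), (30, 30)]
print(looks_bot_like([bot, bot]), looks_bot_like([human, bot]))  # True False
```

To capture the "not the same user as before" part, the same feature could be compared against the user's average straightness from prior sessions, and fed to a model alongside many other signals rather than used as a single hard rule.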

Beyond basic machine learning

Machine learning can be applied in several ways to make the most of underlying patterns. Here are some examples.

Machines empowering people ... or people empowering machines?

Fraud fighting poses two challenges. First, fraud happens in pockets that are difficult for people to identify but easy for machines to find. Second, fraud patterns are fairly complex and difficult for machines to codify, but much easier for people to grasp. Organizations need to shift their culture from using machine learning as a simple assistant to one where it does the heavy lifting while human experts perform the fine-tuning.

Unravel hidden, complex insights

Fraudsters constantly adapt, so fraud patterns and fraud strategies keep changing. Organizations can use emerging machine learning techniques to discover new patterns in large amounts of data; unsupervised techniques in particular can mine large volumes of data for anomalous patterns. And by storing fraud data in a graph database, you can help analysts uncover hidden links between entities so those patterns can be fed into machine learning models.
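The core idea behind graph-based link analysis can be shown without a graph database. This Python sketch is a toy, not a graph database query: it treats accounts and shared attributes (such as device IDs or card numbers) as nodes and uses a union-find structure to group accounts connected through any chain of shared attributes, one common way to surface candidate fraud rings. The account and attribute names are made up.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def linked_groups(links):
    """links: (account, shared_attribute) edges; returns groups of 2+ connected accounts."""
    uf = UnionFind()
    for account, attribute in links:
        uf.union(("account", account), ("attr", attribute))
    groups = defaultdict(set)
    for kind, name in list(uf.parent):
        if kind == "account":
            groups[uf.find((kind, name))].add(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


links = [("alice", "device-1"), ("bob", "device-1"), ("bob", "card-9"),
         ("carol", "card-9"), ("dave", "device-2")]
print(linked_groups(links))  # [['alice', 'bob', 'carol']]
```

Alice and Carol never share an attribute directly; they are linked only through Bob, which is exactly the kind of indirect connection that is hard to spot in tabular data. Group size or membership could then become a feature for a downstream model.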

Build an effective feedback loop

To get the most out of your machine learning models, generate labels that are as granular as possible. Go beyond the good/bad binary and capture why something was fraudulent. For example, behavior in a scam is very different from behavior in phishing.
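A minimal Python sketch of granular labeling, assuming a hypothetical reason taxonomy (`ACCOUNT_TAKEOVER` is an illustrative addition, not from any standard). Keeping the reason lets you train per-reason or multiclass models, while a binary good/bad label can still be derived whenever a simpler model needs it; the reverse is not possible.

```python
from collections import Counter
from enum import Enum


class FraudLabel(Enum):
    """Hypothetical taxonomy: record why something was fraud, not just good/bad."""
    GOOD = "good"
    SCAM = "scam"
    PHISHING = "phishing"
    ACCOUNT_TAKEOVER = "account_takeover"


def is_fraud(label: FraudLabel) -> bool:
    # The binary view can always be derived from granular labels.
    return label is not FraudLabel.GOOD


reviews = [FraudLabel.GOOD, FraudLabel.SCAM, FraudLabel.PHISHING, FraudLabel.SCAM]
binary = [int(is_fraud(r)) for r in reviews]                  # [0, 1, 1, 1] for a good/bad model
by_reason = Counter(r.value for r in reviews if is_fraud(r))  # per-reason counts
print(binary, dict(by_reason))
```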

Next, label the right entity. Is it the user, the transaction, or the device that's bad, or all three? Also understand that feedback and labels are not the same. Generating feedback requires capturing feature data and recomputing labels, scores, signals, and clusters, both in real time and historically (when decision labels are generated).
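Here is a minimal Python sketch of entity-level labeling, with hypothetical names throughout (`EntityLabel`, `label_entities`, and the ID fields are illustrations, not an existing API). One confirmed outcome is fanned out only to the entities it actually implicates, and each label stores the feature data captured at decision time. Recomputing scores, signals, and clusters is out of scope here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EntityLabel:
    entity_type: str  # "transaction", "user", or "device"
    entity_id: str
    label: str
    labeled_at: datetime
    features_at_decision: dict = field(default_factory=dict)


def label_entities(txn, label, scopes, features_at_decision, now=None):
    """Fan one confirmed outcome out to only the entities it actually implicates."""
    now = now or datetime.now(timezone.utc)
    ids = {"transaction": txn["txn_id"], "user": txn["user_id"], "device": txn["device_id"]}
    # Copy the snapshot so later changes to the caller's dict don't alter stored labels.
    return [EntityLabel(s, ids[s], label, now, dict(features_at_decision)) for s in scopes]


# Hypothetical account takeover: the transaction and the attacker's device are bad,
# but the account holder (user) is a victim, so the user is deliberately not labeled.
txn = {"txn_id": "t-42", "user_id": "u-7", "device_id": "d-3"}
labels = label_entities(txn, "account_takeover", ("transaction", "device"), {"amount": 900.0})
print([(l.entity_type, l.entity_id) for l in labels])
```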

Finally, keep your labels consistent. Labels reflect your workflow, your business, and the fraud that occurs within it.

Leverage artificial intelligence and deep learning

Deep learning and AI are changing the landscape for detecting fraudulent activities. Building AI-centric deep learning models that use device and user behavioral data, user historical and time-series data, and large graph-link data can yield continued improvements in predicting individual behavior and detecting fraud patterns.

Build multi-layer fraud prevention models

There is no such thing as one perfect model. Instead, organizations must create a mix of models that work together. The key is finding the set of algorithms that best suits your needs. You also need an effective machine learning model pipeline and cascading techniques. For example, run model X if condition 1 holds, and run models X and Y if condition 2 holds.
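The cascading example can be sketched in a few lines of Python. Everything here is hypothetical: the two lambda "models" stand in for trained models, the amount-based conditions stand in for real routing criteria, and combining scores with `max` is just one possible strategy (averaging or a meta-model are common alternatives).

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[dict], float]
Condition = Callable[[dict], bool]

# Hypothetical stand-in models returning a fraud score in [0, 1].
MODELS: Dict[str, Model] = {
    "x": lambda e: min(e["amount"] / 5000, 1.0),     # cheap model, runs on everything
    "y": lambda e: 0.9 if e["new_device"] else 0.1,  # extra model for riskier traffic
}

# Ordered stages: the first condition that matches decides which models run.
STAGES: List[Tuple[Condition, List[str]]] = [
    (lambda e: e["amount"] > 1000, ["x", "y"]),   # "condition 2": run X and Y
    (lambda e: e["amount"] <= 1000, ["x"]),       # "condition 1": run X only
]


def run_cascade(event, stages=STAGES, models=MODELS, combine=max):
    """Return (combined score, per-model scores), or (None, {}) if no stage matches."""
    for condition, names in stages:
        if condition(event):
            scores = {name: models[name](event) for name in names}
            return combine(scores.values()), scores
    return None, {}


print(run_cascade({"amount": 200, "new_device": True}))   # only model x runs
print(run_cascade({"amount": 2000, "new_device": True}))  # x and y run; combined score 0.9
```

Keeping the routing in data (`STAGES`) rather than in nested `if` statements makes it easier to add or reorder layers as new models are introduced.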

Next, use automated machine learning (AutoML) tools, such as Driverless AI from H2O and comparable tools in the scikit-learn and TensorFlow ecosystems, that help cast a wide net, generate features automatically, and reveal the best model configurations for your use case.
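At its simplest, "casting a wide net" means searching many configurations and keeping the one that scores best. The Python sketch below is a brute-force grid search over feature subsets and thresholds on made-up rows. It is a toy analogue of what AutoML tools automate at much larger scale (along with feature generation and model-family selection), not how Driverless AI or any other product works internally. For brevity it scores configurations on the same rows it searches, which overfits; real tools rely on held-out validation or cross-validation.

```python
from itertools import combinations, product

FEATURES = ("amount_z", "ip_risk", "new_device")

# Toy labeled rows: (feature values, label) with 1 = fraud.
rows = [
    ({"amount_z": 0.1, "ip_risk": 0.1, "new_device": 0}, 0),
    ({"amount_z": 2.5, "ip_risk": 0.2, "new_device": 1}, 1),
    ({"amount_z": 0.3, "ip_risk": 0.9, "new_device": 1}, 1),
    ({"amount_z": 1.8, "ip_risk": 0.1, "new_device": 0}, 0),
    ({"amount_z": 0.2, "ip_risk": 0.3, "new_device": 0}, 0),
]


def f1(preds, labels):
    """F1 score for boolean predictions against 0/1 labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def search(rows, thresholds=(0.5, 1.0, 1.5, 2.0)):
    """Try every feature subset x threshold; keep the first configuration with the best F1."""
    labels = [y for _, y in rows]
    best = (None, None, -1.0)
    subsets = [c for k in range(1, len(FEATURES) + 1) for c in combinations(FEATURES, k)]
    for subset, t in product(subsets, thresholds):
        preds = [sum(x[f] for f in subset) >= t for x, _ in rows]
        score = f1(preds, labels)
        if score > best[2]:
            best = (subset, t, score)
    return best


print(search(rows))  # (('new_device',), 0.5, 1.0)
```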

Defensible machine learning strategy with strong explainability

Your machine learning models might catch fraud, but it's important to understand why the system made each decision, especially when a decision doesn't look intuitive and would be hard to defend.

As fraud management processes and platforms come under heavy scrutiny from a compliance perspective, organizations should improve their ability to explain the rationale behind decisions made by machine learning models (for example, through Local Interpretable Model-agnostic Explanations, or LIME). It's also crucial to implement strong governance around your machine learning strategy, including hypothesis testing and champion/challenger testing, in which new challenger models are evaluated against the current production champion before replacing it.
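LIME itself fits a weighted linear surrogate model to many perturbed samples around a single prediction. The Python sketch below deliberately uses a simpler perturbation (occlusion) technique instead, which shares LIME's model-agnostic idea: it treats the model as a black box and measures how much the score drops when each feature is reset to a baseline value. The model, feature names, and baseline are hypothetical.

```python
import math


def black_box_score(e):
    """Hypothetical model; the explainer below only calls it and never inspects it."""
    z = -3.0 + 0.2 * e["amount_z"] + 2.0 * e["ip_risk"] + 1.5 * e["new_device"]
    return 1.0 / (1.0 + math.exp(-z))


def explain(score_fn, event, baseline):
    """Attribute the score to each feature by resetting it to a baseline value."""
    full = score_fn(event)
    contributions = {}
    for name in event:
        perturbed = dict(event)
        perturbed[name] = baseline[name]
        contributions[name] = full - score_fn(perturbed)
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return full, ranked


event = {"amount_z": 0.5, "ip_risk": 0.9, "new_device": 1}
baseline = {"amount_z": 0.0, "ip_risk": 0.1, "new_device": 0}  # a "typical" customer
score, reasons = explain(black_box_score, event, baseline)
for name, delta in reasons:
    print(f"{name}: {delta:+.3f}")
```

For this event the ranking puts `ip_risk` first, then `new_device`, with `amount_z` contributing little, which is the kind of reason code an analyst or auditor can check against intuition. Occlusion ignores feature interactions, which is one reason methods such as LIME sample many perturbations jointly.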

Build a fraud-centric data lake and strong visualization

A data lake is a system that holds a wide variety of data: structured and unstructured sources, data with multiple schemas, data from third-party sources, and so on. Strong visualization on top of a fraud-centric data lake can help team members identify, conceptualize, validate, and operationalize their fraud intuitions more quickly.

Go beyond the machine learning checkbox

From artificial intelligence to explainability, visualization, and data lakes, technologies beyond machine learning are changing the way organizations fight fraud. By using an adaptive fraud prevention platform that goes beyond merely checking off machine learning boxes, organizations can stay relevant, future-proof themselves, and keep the fraudsters of today and tomorrow at bay.
