
How to use AI to fight identity fraud

Glenn Larson, VP of Engineering, Acuant

It’s no secret that identity fraud is a growing problem: A record 16.7 million US adults experienced identity fraud in 2017, marking an 8% increase from the year before, according to Javelin's 2018 Identity Fraud study.

The number of fraudulent transactions, massive data breaches, and instances of identity theft continues to rise as hackers and fraudsters become more sophisticated. ID-scanning solutions vary widely in strength: some simply read an ID's barcode, while more robust software performs forensic and biometric tests to ensure that an ID is not forged.

Artificial intelligence and its subsets of machine learning and deep learning make it possible to accurately process, verify, and authenticate identities at scale. Here's how.

How to scale ID authentication with machine learning

Identity documents, such as driver’s licenses and passports, are scanned to test various elements of an ID, either on premises or remotely with mobile devices. Some examples of authentication tests include confirmation of genuine microprint text and security threads, validation of special paper and ink, comparison between OCR and barcodes and magnetic strips, data validity tests, and biometrics or facial recognition to link the individual to the ID credential.
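The battery of tests above can be sketched as a pipeline of independent checks whose results are combined into a single verdict. This is a minimal, hypothetical illustration; the class, function, and field names are assumptions, not a real vendor API, and real checks operate on image and sensor data rather than pre-extracted fields.

```python
# Hypothetical sketch of an ID-authentication pipeline: each check
# inspects one element of a scanned document and returns pass/fail.
from dataclasses import dataclass

@dataclass
class ScannedID:
    ocr_name: str          # name read via OCR from the card face
    barcode_name: str      # name decoded from the barcode
    has_microprint: bool   # genuine microprint text detected
    face_match_score: float  # 0.0-1.0 from a biometric comparison

def check_cross_fields(doc: ScannedID) -> bool:
    # Compare the OCR'd name against the name encoded in the barcode.
    return doc.ocr_name.strip().lower() == doc.barcode_name.strip().lower()

def check_microprint(doc: ScannedID) -> bool:
    return doc.has_microprint

def check_biometric(doc: ScannedID, threshold: float = 0.8) -> bool:
    # Link the individual to the credential via a face-match score.
    return doc.face_match_score >= threshold

def authenticate(doc: ScannedID) -> dict:
    results = {
        "cross_field": check_cross_fields(doc),
        "microprint": check_microprint(doc),
        "biometric": check_biometric(doc),
    }
    results["passed"] = all(results.values())
    return results

doc = ScannedID("Jane Doe", "JANE DOE", True, 0.91)
print(authenticate(doc)["passed"])  # a consistent document passes
```

Keeping each check as a separate function mirrors how additional tests (UV ink, security threads, magnetic-strip comparison) can be added without touching the combining logic.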

Machine learning automates this analysis, creating a more efficient and accurate process than relying on an untrained human to inspect the document. These solutions should include an anonymous internal data-collection mechanism that stores information about the operation and performance of the software and transmits it to the provider on a regular basis. Automating this process saves time and improves the quality of the results.

Why you’re only as good as your data

There are thousands of forms of ID—passports, DHS "Trusted Traveler" cards (Global Entry, NEXUS, SENTRI, FAST), military IDs, permanent resident cards, border-crossing cards—and obtaining significant quantities of each type to understand the variations, wear patterns, etc. is a monumental task.

Effective ID authentication relies on collecting metadata about the document-recognition and authentication process. Rather than describing the document being processed, this metadata describes which processes ran, along with their details and outcomes. By analyzing this information, the software trains itself to detect complex patterns and output a prediction. That analysis optimizes the performance of the software and its document library, improving the reliability of both the document read and the authentication process.
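A process-metadata record of this kind might look like the sketch below. The field names are illustrative assumptions; the point is that the record captures which tests ran and how they performed, never the personal data printed on the document.

```python
# Illustrative sketch of anonymous process metadata: it records which
# tests ran and how they performed, not the contents of the document.
import json
import time

def make_process_record(doc_type: str, tests: dict, duration_ms: float) -> str:
    record = {
        "document_type": doc_type,   # e.g. "passport"
        "tests_run": list(tests),    # which checks executed
        "tests_passed": [name for name, ok in tests.items() if ok],
        "duration_ms": duration_ms,  # operational performance
        "timestamp": time.time(),
    }
    # No name, ID number, or photo from the document appears here,
    # so the record can be transmitted to the provider anonymously.
    return json.dumps(record)

rec = make_process_record("passport", {"ocr": True, "uv_ink": False}, 142.0)
print(rec)
```

Aggregating records like this across many deployments is what lets the provider spot, say, a UV-ink test that fails unusually often for one document type.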

Using machine learning to differentiate between good and bad IDs is extremely efficient. However, without supervision, the logic developed by the algorithm may exclude IDs that are not fraudulent. There are many reasons why an ID may not pass even though it is valid: wear and tear, other physical damage, manufacturing errors or defects, minor design changes, or even variations in production depending on where and how the card was produced (central versus instant issue).

Most states now use central issuance but retain the ability to issue an ID instantly in some situations (for VIPs, for example). Because IDs are often not printed in the same location by the same machine, anomalies in printing quality, misprints, and unclear images are common. It is not unusual for an entire batch of IDs to contain a manufacturing error.

A computer must be taught that worn or damaged IDs are still valid. Collecting operational and performance metrics for the software helps improve the recognition and authentication of the documents supported by the provider's document library.

A robust document library against which to compare captured IDs is vital; access to a comprehensive and regularly updated library cuts down the time that machines must process data on their own and maximizes data extraction and authentication capabilities. Semi-supervised machine learning enables adjustment of the direction of the logic without interfering with the insights that authenticate documents or slowing down data processing.
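One way to picture semi-supervised learning in this setting is self-training: a small set of human-labeled IDs anchors the decision threshold, while the bulk of unlabeled scans is labeled by the model itself. This is a toy sketch under that assumption; the scores, names, and threshold-tuning rule are all illustrative, not the article's actual algorithm.

```python
# Minimal self-training sketch: tune a pass/fail threshold on a few
# human-labeled documents, then let the model label new scans itself.

def tune_threshold(labeled):
    # labeled: list of (authenticity_score, is_genuine) pairs from
    # human reviewers. Pick the candidate threshold with best accuracy.
    best_t, best_acc = 0.5, 0.0
    for t in sorted(score for score, _ in labeled):
        acc = sum((score >= t) == genuine for score, genuine in labeled) / len(labeled)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def pseudo_label(unlabeled_scores, threshold):
    # The model labels new, unreviewed scans on its own.
    return [(score, score >= threshold) for score in unlabeled_scores]

# Worn-but-genuine IDs score 0.70 here; human labels keep them passing.
labeled = [(0.95, True), (0.70, True), (0.40, False), (0.20, False)]
t = tune_threshold(labeled)
print(pseudo_label([0.85, 0.30], t))  # [(0.85, True), (0.30, False)]
```

The human labels steer the threshold (here down to 0.70, so worn genuine IDs pass) without a reviewer having to inspect every scan, which is the "adjust the direction of the logic without slowing down processing" idea.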

Data mining, semi-supervised learning, and regression analysis in the feedback loop

There are multiple models of ID authentication to choose from:

  • Data mining: Examining large databases to turn raw data into useful information. Extracting clean data up front makes this step faster and more reliable.
  • Semi-supervised learning: Combining a small set of human-labeled documents with a large volume of unlabeled scans. This guards against fully automated learning, which can wrongly "fail" documents that have wear and tear or manufacturing errors.
  • Regression analysis: Continually testing and analyzing outcomes to improve the algorithm.

New data is fed into the algorithm to test outcomes, a process often called the feedback loop. The feedback loop checks that outcomes remain consistent and keep improving, and those outcomes are then fed back into the algorithm so that the software continues to learn and adjust.
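The feedback loop can be sketched as a running update: each batch of outcomes nudges the model's state, and the updated state governs the next batch. The "model" below is a deliberately toy running estimate, an assumption for illustration only.

```python
# Toy feedback loop: each batch of verification outcomes (1 = correct
# verdict, 0 = incorrect) is blended into a running quality estimate,
# which then informs the next round of processing.

def feedback_step(state: float, batch_outcomes: list, lr: float = 0.1) -> float:
    # Blend the observed accuracy of this batch into the current state.
    observed = sum(batch_outcomes) / len(batch_outcomes)
    return state + lr * (observed - state)

state = 0.5  # initial estimate before any feedback
for batch in [[1, 1, 0, 1], [1, 1, 1, 1]]:
    state = feedback_step(state, batch)
print(round(state, 3))
```

A real system would update model weights rather than a scalar, but the loop structure (outcomes in, adjusted model out, repeat) is the same.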

When to use biometrics for identity management

Biometric identity verification methods use biometrics, such as facial or voice recognition, to strengthen the identity verification process. Not only is this more passive for consumers, since it reduces the need to remember a password or enter personally identifiable information (PII), but it is also a more stringent security protocol.

Biometric security solutions utilize deep learning to mimic the way human neurons process extremely difficult information, such as faces and language. With deep-learning technology, the software provider can model large amounts of complex data, such as many images and faces.

Facial-recognition technology uses deep learning to match the image on the ID to a person's face. The algorithm looks for patterns ranging from basic shapes (eyes, mouth, nose) to complex ones (complete faces and distinctive features), and finally returns an output indicating whether the image matches the face on the ID.
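The final matching step is commonly done by comparing embedding vectors, a standard technique though not one the article names: a deep network (not shown here) maps the ID photo and the live face to vectors, and their cosine similarity decides the match. The vectors and threshold below are toy stand-ins.

```python
# Sketch of face matching via embeddings: a deep network would map
# each face image to a vector; here we compare two toy vectors with
# cosine similarity and apply a match threshold.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def faces_match(id_embedding, selfie_embedding, threshold=0.9):
    return cosine_similarity(id_embedding, selfie_embedding) >= threshold

id_vec = [0.2, 0.8, 0.1]       # embedding of the photo on the ID
selfie_vec = [0.21, 0.79, 0.12]  # embedding of the live capture
print(faces_match(id_vec, selfie_vec))  # near-identical vectors match
```

The threshold trades off false accepts against false rejects, which is exactly where the worn-ID and lighting-variation problems discussed above resurface.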

Machines + humans = the highest accuracy

Trained professionals using the software can step in to prevent bad customer experiences when the software flags a legitimate ID because it is damaged or worn. During the rare instances when the computer fails to identify what is wrong with an ID, the professional can apply an expert eye, determine what error occurred, and teach the computer how to spot the issue in the future.

The key is to quickly understand what data points are relevant to identifying a document as fraudulent versus results that are noise due to expected variations (wear, damage, manufacturing, etc.). A human eye can do this much more effectively than a computer. This creates a different method of learning, where new information is being input to the learning model, so the model can improve.
