The state of AI and security: Tools emerge to take on adversarial attacks

Machine learning and artificial intelligence (AI) are related, fast-growing fields, but the systems they produce are complex. Small changes to an input, known as perturbations, can often change a system's output, which is a serious security issue.

To most humans, adversarial attacks are not obvious. On a street, a stop sign could be tagged, for example, with two small white and two small black rectangles. Humans would still see a stop sign, but if the stickers are placed just right, autonomous vehicles would see something completely different. The stop sign would instead register as "Speed limit 45."

The result: self-driving cars that blow through an intersection rather than stopping.

The scenario is part of a research effort, known as adversarial machine learning, that demonstrates the dangers of learning algorithms that are brittle—that break outright rather than gracefully ward off attacks.

New tools aim to help developers and data scientists create robust, secure neural networks. Here's a look at the tools, and the state of AI and security. 

The dangers of small changes

Companies that apply machine learning and deep neural networks (DNNs) to a variety of datasets and applications often stop once they get the desired result. They don't consider whether their algorithms will perform reliably against malicious actors who search for small changes that could entirely alter the output of a neural network.

In a paper (PDF) released at the Conference on Computer Vision and Pattern Recognition in June, for example, a team of researchers showed a technique for calculating subtle, yet simple, patterns that could change what machine-learning systems, including those in autonomous vehicles, see.

Adversaries "can physically modify objects using low-cost techniques to reliably cause classification errors in DNN-based classifiers under widely varying distances and angles," the researchers said. "For example, our attacks cause a classifier to interpret a subtly-modified physical Stop sign as a Speed Limit 45 sign."
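To make the idea of a small but decisive change concrete, here is a minimal Python sketch. It is not the researchers' method: it assumes a toy linear classifier with made-up weights and nudges every input feature by the same small amount in the direction that hurts the score most, a simplified form of a sign-gradient attack. All names and values are hypothetical.

```python
# Toy linear classifier: score = w . x + b, label 1 if score > 0, else 0.
# The weights, bias, and input are made-up values for illustration only.
WEIGHTS = [0.9, -0.6, 0.4, -0.8]
BIAS = -0.05


def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return 1 if score(x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def sign_gradient_perturb(x, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers
    the score. For a linear model, the gradient of the score with respect
    to the input is simply WEIGHTS."""
    return [xi - epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


original = [0.5, 0.4, 0.3, 0.2]
adversarial = sign_gradient_perturb(original, epsilon=0.1)

print(predict(original), predict(adversarial))  # prints: 1 0
```

No feature moves by more than 0.1, yet the predicted label flips. Real attacks on deep networks follow the same logic with gradients computed through the network.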

Adversarial attacks are a fast-growing area of research in machine learning and artificial intelligence. While machine learning relies on algorithms, much of the innovation in applying machine learning and AI comes from the large amounts of data that companies can process.

Attackers are increasingly focusing not on attacking the code, but on attacking the data, said Lance James, chief scientist at business-risk intelligence firm Flashpoint.

"In security, inputs and outputs are everything. If there is no integrity, we have nothing to trust."
Lance James

How to harden your systems

Such attacks are nothing new. Spammers have attempted to sneak past pattern-recognition systems—such as anti-spam systems—by submitting their messages as "good" content from a variety of email addresses. They pose as legitimate users to attempt to affect classification.

A good library of recent research on adversarial machine learning can be found on GitHub. Developers and data scientists can choose from many tools designed to help them harden their systems and check datasets for adversarial examples, including the following.

1. Measure your algorithm's CLEVER-ness

Earlier this year, IBM and the Massachusetts Institute of Technology released a metric for measuring the robustness of machine-learning and AI algorithms. Called Cross Lipschitz Extreme Value for Network Robustness, or CLEVER, the measure can estimate the resistance of a neural network to being manipulated by adversarial attacks.

The CLEVER metric has two significant properties. First, it estimates a lower bound on the size of the perturbation needed for a successful attack, irrespective of the type of attack. Second, it can be calculated efficiently for large neural networks.
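The intuition behind a Lipschitz-based bound can be sketched in a few lines of Python. This is not the CLEVER algorithm, which estimates the Lipschitz constant of a neural network by sampling. It is an illustration on a toy linear model, where that constant is exactly the norm of the weight vector, so the bound is the decision margin divided by that norm. Both models and all values are hypothetical.

```python
import math


def margin(weights, bias, x):
    """Signed score of a linear model; the label changes when it crosses 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


def robustness_lower_bound(weights, bias, x):
    """Smallest L2 change to x that can push the score to zero.
    For a linear model the Lipschitz constant of the score is l2_norm(weights),
    so the bound is |margin| / Lipschitz constant."""
    return abs(margin(weights, bias, x)) / l2_norm(weights)


x = [1.0, 1.0]

# Hypothetical model before a defense: margin 5, weight norm 5.
before = robustness_lower_bound([3.0, 4.0], -2.0, x)

# Hypothetical model after a defense: margin 3, weight norm 1.
after = robustness_lower_bound([0.6, 0.8], 1.6, x)

print(before, after)
```

A larger value means an attacker must change the input more to flip the decision, regardless of which attack is used. Comparing the two numbers is the kind of before-and-after check the score is meant to support.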

"Without invoking any specific adversarial attack, the CLEVER score can be directly used to compare the robustness of different network designs and training procedures to help build more reliable systems," Pin-Yu Chen, a research staff member with IBM Research, stated in a post on the technique.

"One possible use case is the 'before-after' scenario, where one can compare CLEVER scores to assess improvement in model robustness before and after implementing a certain defense strategy."
—Pin-Yu Chen

2. Adopt the Adversarial Robustness Toolbox for neural networks

In April, IBM announced a toolkit to help developers and data scientists understand the impact of adversarial attacks on machine-learning algorithms and neural networks.

Called the Adversarial Robustness Toolbox, the open-source library includes code for estimating a neural-network model's robustness, provides ways of hardening the code, and explains methods for protecting neural networks from manipulation at runtime.

The runtime component is important because attackers only have to repeatedly try different combinations of inputs to try to fool a neural network, said Leigh-Anne Galloway, cybersecurity resilience lead at Positive Technologies, a security- and threat-assessment firm.

"We don't need to create malicious samples. We can just use some fuzzing techniques to find the real data when our system produces an incorrect decision."
Leigh-Anne Galloway
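A fuzzing-style search of this kind might look like the following Python sketch. It assumes a hypothetical black-box classifier that returns only a label. The model, threshold, and sample values are invented for illustration and do not come from any library.

```python
import random


def classify(x):
    """Hypothetical black-box model: flags an input as spam when a
    weighted sum of two features crosses a threshold."""
    return "spam" if 0.7 * x[0] + 0.3 * x[1] > 0.5 else "ham"


def fuzz_for_flip(model, x, max_change, tries, seed=0):
    """Randomly nudge each feature by at most max_change and return the
    first variant whose label differs from the original, or None."""
    rng = random.Random(seed)  # fixed seed keeps the search repeatable
    original_label = model(x)
    for _ in range(tries):
        candidate = [xi + rng.uniform(-max_change, max_change) for xi in x]
        if model(candidate) != original_label:
            return candidate
    return None


sample = [0.6, 0.4]  # weighted sum 0.54, so the model says "spam"
found = fuzz_for_flip(classify, sample, max_change=0.1, tries=1000)

print(classify(sample), found)
```

The attacker needs no knowledge of the model's internals, only the ability to submit inputs and observe decisions, which is why monitoring and rate-limiting queries at runtime matters.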

The library includes features for calculating the CLEVER metric and other estimates of robustness. It supports the TensorFlow and Keras deep-learning frameworks, and future releases will extend support to other popular frameworks, such as PyTorch and MXNet.

3. Create adversarial attacks with Foolbox

Three researchers working at the University of Tübingen, Germany, created Foolbox, a Python library for generating adversarial attacks against common machine-learning frameworks such as TensorFlow and Keras.

The framework builds attacks by giving users a way to define a goal for the attack, such as misclassification. To date, researchers have implemented more than 20 different types of attacks in three major categories: gradient-based attacks, score-based attacks, and decision-based attacks.
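The three categories differ in what the attacker can observe: gradients, output scores, or only the final decision. As an illustration of the last and most restrictive case, here is a minimal Python sketch. It does not use the Foolbox API. It assumes a hypothetical label-only classifier and a starting point that is already misclassified, then binary-searches the straight line between the clean input and that point for the closest input that still fools the model.

```python
def label(x):
    """Hypothetical black box that returns only a class label."""
    return "stop" if x[0] + 2.0 * x[1] < 3.0 else "speed_limit"


def closest_flip_on_line(model, source, target, steps=30):
    """Binary-search the line from source to target for the point nearest
    to source that the model no longer labels like source. Only final
    decisions are used, never scores or gradients."""
    source_label = model(source)
    if model(target) == source_label:
        raise ValueError("target must already be misclassified")
    lo, hi = 0.0, 1.0  # fraction of the way from source to target
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        point = [s + mid * (t - s) for s, t in zip(source, target)]
        if model(point) == source_label:
            lo = mid  # still classified correctly: move toward target
        else:
            hi = mid  # already fooled: move back toward source
    return [s + hi * (t - s) for s, t in zip(source, target)]


clean = [1.0, 0.5]     # 1.0 + 2 * 0.5 = 2.0, labeled "stop"
far_away = [3.0, 2.5]  # 3.0 + 2 * 2.5 = 8.0, labeled "speed_limit"
adversarial = closest_flip_on_line(label, clean, far_away)

print(label(clean), label(adversarial), adversarial)
```

Gradient-based and score-based attacks are usually more efficient, but a decision-based search like this works even when the model exposes nothing except its answer.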

The researchers hope to raise the bar for attackers who aim to fool neural networks.

"Even today's most advanced machine learning models are easily fooled by almost imperceptible perturbations of their inputs. Foolbox is a new Python package to generate such adversarial perturbations and to quantify and compare the robustness of machine learning models."

The code is available under the MIT license.

Other tools and considerations

Other libraries are available for developers to use, but most are focused on computer-vision systems. The Shapeshifter tool, for example, creates physical objects that can be added to a targeted sign—such as the stop sign mentioned at the beginning of the article—to change the results from a neural network.

At this point, researchers are ahead of attackers, and most of these techniques are not being used in the wild. But get ready for that to change, said Flashpoint's James.

"Over time, tools develop and become available and will likely be employed in the future to attack these systems."
—Lance James
