How machine learning boosts application security testing

One of the biggest headaches with application security testing, especially static code analysis, is the time it takes to triage the results.

Raw scan results can be noisy. Before the results of a static scan can become actionable, they need to be audited. Your security auditors, developers, and testers need to go through the scan results and manually confirm the issues that are exploitable and relevant to your organization and separate them from the ones that are not. 

This is harder than it sounds. Seemingly exploitable issues might really be false positives, and apparently relevant ones might be vulnerabilities that aren't really applicable to your situation.

Machine learning–assisted auditing can significantly reduce application security test times, a key consideration in continuous delivery and continuous integration environments. Here's how.


The triage challenge

Static analysis results can contain a very high rate of false positives, or results that incorrectly suggest a rule violation. Some of those apparent rule violations may really reflect organizational preferences. If you don't weed out those false positives, your developers could spend a lot of time chasing down non-existent or irrelevant code issues.

Other identified vulnerabilities might be real enough but simply not apply in your case. A vulnerability may not be exploitable, for instance, because you have a web application firewall or another mitigation in place. Or a particular finding may not be worth worrying about because the flagged code is not reachable from the external attack surface.
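The reachability point can be illustrated with a small sketch. It assumes a hypothetical call graph, a hypothetical list of externally exposed entry points, and hypothetical findings mapped to the function that contains the flagged code. None of these names come from any real scanner. A breadth-first search from the entry points marks which findings sit on code an outside attacker could actually reach. Treat "unreachable" as a signal for deprioritizing, not proof of safety: a real call graph also has to account for reflection, callbacks, and other entry points this toy model ignores.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
CALL_GRAPH = {
    "http_handler": ["parse_request", "render_page"],
    "parse_request": ["build_query"],
    "build_query": ["run_sql"],
    "render_page": [],
    "run_sql": [],
    "nightly_admin_job": ["legacy_export"],
    "legacy_export": [],
}

# Hypothetical entry points exposed to external users (the attack surface).
ENTRY_POINTS = ["http_handler"]

# Hypothetical findings: finding ID -> function containing the flagged code.
FINDINGS = {"SQLI-1": "build_query", "SQLI-2": "legacy_export"}


def reachable_from(entry_points, call_graph):
    """Breadth-first search returning every function reachable from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def triage_by_reachability(findings, entry_points, call_graph):
    """Label each finding by whether its function is reachable from the attack surface."""
    reachable = reachable_from(entry_points, call_graph)
    return {
        finding_id: "reachable" if fn in reachable else "unreachable"
        for finding_id, fn in findings.items()
    }


print(triage_by_reachability(FINDINGS, ENTRY_POINTS, CALL_GRAPH))
# {'SQLI-1': 'reachable', 'SQLI-2': 'unreachable'}
```

Here SQLI-2 lives in code that only an internal scheduled job calls, so it would drop down the triage queue rather than disappear from it.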

Worse, some scan tools spew out reports that run into thousands of pages with little or no business context. All of this means that triaging results can take a lot of time, and audit time is exactly what developers don't want to budget for, especially in frenetic DevOps and agile environments.

Incremental testing

Some companies have hyped incremental application security testing as an approach to speeding things up. The idea is that, rather than taking your entire code base and running it through various analyzers each time you add something to it, you simply test the portion of the code that was changed.

However, at my company, Micro Focus, we consider incremental testing incomplete: it lacks data flow analysis. If you cannot trace data through the whole application, you will not necessarily see everything. Incremental scanning also does not absolve you of the need to triage static scan results.
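To see why whole-application data flow matters, consider a deliberately naive model of incremental scanning: analyze only the functions in files changed by a commit and do not follow calls into unchanged files. The program model, file names, and function names below are hypothetical. User input enters in a changed file but reaches a dangerous sink two files away, so the whole-program taint trace finds the flow while the changed-files-only trace misses it. Real incremental analyzers may cache summaries of unchanged code, so this only illustrates the concern; it does not describe any particular product.

```python
# Hypothetical program model: function -> file, whether it introduces
# user-controlled data (source), whether it is a dangerous operation (sink),
# and which functions it passes that data to.
PROGRAM = {
    "read_param": {"file": "web.py", "source": True, "sink": False, "calls": ["save_note"]},
    "save_note": {"file": "notes.py", "source": False, "sink": False, "calls": ["exec_sql"]},
    "exec_sql": {"file": "db.py", "source": False, "sink": True, "calls": []},
}


def tainted_sinks(program, scope):
    """Return (source, sink) pairs, following calls only into functions within `scope`."""
    findings = set()
    for name, fn in program.items():
        if not (fn["source"] and name in scope):
            continue
        stack, seen = [name], {name}
        while stack:
            current = stack.pop()
            if program[current]["sink"]:
                findings.add((name, current))
            for callee in program[current]["calls"]:
                if callee in scope and callee not in seen:
                    seen.add(callee)
                    stack.append(callee)
    return findings


# Hypothetical commit that touched only web.py.
changed_files = {"web.py"}
incremental_scope = {n for n, f in PROGRAM.items() if f["file"] in changed_files}

print(tainted_sinks(PROGRAM, set(PROGRAM)))     # {('read_param', 'exec_sql')}
print(tainted_sinks(PROGRAM, incremental_scope))  # set()
```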

Machine learning–assisted auditing of security scan results can help dramatically reduce the time you need for testing. By training machine-learning classifiers using data and knowledge from your previously audited tests, you can automate the processing of new scan results.
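As a rough illustration of training on previously audited results, here is a minimal sketch using a categorical naive Bayes classifier with add-one smoothing. The article does not say which classifier or features the Micro Focus service uses; naive Bayes, the feature names (`category`, `source`, `in_test_code`), the labels, and the tiny audit history are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical audit history: features of past findings plus the auditor's verdict.
AUDITED = [
    ({"category": "sql_injection", "source": "http_param", "in_test_code": "no"}, "exploitable"),
    ({"category": "sql_injection", "source": "http_param", "in_test_code": "no"}, "exploitable"),
    ({"category": "sql_injection", "source": "config_file", "in_test_code": "no"}, "not_an_issue"),
    ({"category": "weak_hash", "source": "none", "in_test_code": "yes"}, "not_an_issue"),
    ({"category": "weak_hash", "source": "none", "in_test_code": "yes"}, "not_an_issue"),
    ({"category": "xss", "source": "http_param", "in_test_code": "no"}, "exploitable"),
]


class NaiveBayesTriage:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.label_counts = Counter(label for _, label in examples)
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.feature_values = defaultdict(set)  # feature -> values seen in training
        for features, label in examples:
            for name, value in features.items():
                self.feature_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def predict_proba(self, features):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in features.items():
                counts = self.feature_counts[(label, name)]
                vocab = len(self.feature_values[name]) + 1  # +1 slot for unseen values
                score += math.log((counts[value] + 1) / (count + vocab))
            log_scores[label] = score
        # Convert log scores to normalized probabilities, subtracting the max for stability.
        peak = max(log_scores.values())
        exp_scores = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}


model = NaiveBayesTriage().fit(AUDITED)
new_finding = {"category": "sql_injection", "source": "http_param", "in_test_code": "no"}
print(model.predict_proba(new_finding))
```

For this toy history the new SQL injection finding from an HTTP parameter comes out at 12/13 (about 0.92) "exploitable". A production system would use far more audited data and richer features, and would need to check that its probabilities are well calibrated before acting on them.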

And you can apply machine learning and embedded analytics to identify the vulnerabilities that are most important based on your organization's risk tolerance and preferences.

Machine learning–assisted auditing

Applying machine learning to your application scan results can significantly reduce the manual labor involved in trying to determine if an issue is exploitable or not based on thresholds that you set for the organization. Of course, human intelligence will always be required to review findings that are indeterminate or where the machine analysis is unable to predict with a high degree of confidence whether an issue is exploitable or not.
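The split between automated decisions and human review can be sketched as a simple routing rule. The threshold values, label names, and finding IDs below are hypothetical, not settings from any product. Findings the model is confident about are handled automatically, and everything in between goes to an auditor. In this sketch the suppression threshold is set higher than the confirmation threshold, on the assumption that wrongly hiding a real vulnerability costs more than wrongly flagging a harmless one.

```python
# Hypothetical thresholds an organization might set for its risk tolerance.
SUPPRESS_THRESHOLD = 0.95  # P(not_an_issue) at or above this: suppress automatically
CONFIRM_THRESHOLD = 0.90  # P(exploitable) at or above this: confirm automatically


def route(proba):
    """Route one finding based on its predicted class probabilities."""
    if proba.get("not_an_issue", 0.0) >= SUPPRESS_THRESHOLD:
        return "auto-suppress"
    if proba.get("exploitable", 0.0) >= CONFIRM_THRESHOLD:
        return "auto-confirm"
    return "human review"  # indeterminate: the model is not confident enough


# Hypothetical model outputs for three new findings.
PREDICTIONS = {
    "F-101": {"exploitable": 0.97, "not_an_issue": 0.03},
    "F-102": {"exploitable": 0.02, "not_an_issue": 0.98},
    "F-103": {"exploitable": 0.55, "not_an_issue": 0.45},
}

for finding_id, proba in PREDICTIONS.items():
    print(finding_id, route(proba))
# F-101 auto-confirm
# F-102 auto-suppress
# F-103 human review
```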

However, the key benefit of machine learning is that it can distill the massive amount of information in scan data down to a much smaller set of actionable, high-confidence results, saving a great deal of time.

With machine learning, my company's managed scan analytics service has been helping organizations enrich their application scan results with audit predictions that are up to 98% accurate. Customers have reported reductions of up to 58% in scan audit times and reductions of between 25% and 90% in findings that are non-issues.

By using machine-learning classifiers trained on anonymized metadata from previously audited scan results across the service's community, we can now help organizations triage static code results. With thousands of scans every month accumulated over several years, customers now benefit from a large knowledge base. On-premises tools let you audit a static code scan using your own knowledge base, so the results are much more contextual.

Let the machines do the labor

Machine learning can significantly reduce the pain of triage, which takes time because of the manual labor involved. A few years ago, the application inventory at a decent-sized organization numbered in the hundreds. These days, companies are dealing with thousands of applications, and some scan them every month. Meanwhile, information security teams are trying to reduce the attack surface and figure out how to keep up in fast-paced continuous development environments.

If you had to do manual triage for everything, you would need a staff of tens, if not hundreds, of people just to go through your scan results. A manual approach simply does not scale to meet security needs. Embedded analytics and machine learning can help improve the value you derive from static code analysis. By automating the audit process, you spend far less time identifying the security issues that matter most to your organization.
