Data science takes on exploits: 4 lessons for security teams

Rob Lemos, writer and analyst

While spearphishing and stolen credentials account for most data breaches, the exploitation of vulnerabilities is still a common threat. But predicting which vulnerabilities attackers are most likely to exploit remains difficult.

Data science and machine learning are seemingly ready to offer some answers. Two research efforts could provide insights into those vulnerabilities you'll most likely need to patch quickly. The information can make companies much more efficient in their patching efforts, closing the security holes that would be the most probable vectors for future attacks.

At the USENIX Security Conference in August, for example, a group of researchers from the University of Maryland at College Park, the University of Michigan, and Harvard University reported that, by modeling the community structures of both attackers and defenders, they could determine whether a vulnerability was being exploited with a 90% true positive rate, using at most 10 days of data.

The research holds lessons for application security professionals and infrastructure operators. Because only 2% of vulnerabilities are exploited in the wild, trying to patch every vulnerability results in wasted effort, according to Kenna Security. Even looking at one basic factor, the existence of an exploit, can narrow the set of vulnerabilities that attackers are likely to exploit.

A vulnerability with published exploit code is seven times more likely to be exploited in the wild, Kenna Security and data-science firm Cyentia found in their report, Prioritization to Prediction: Analyzing Vulnerability Remediation Strategies, published in May.

Here's how to determine which vulnerabilities your team should focus on.

1. Criticality is not a good measure of likely exploitation

The Common Vulnerability Scoring System offers one measure of the severity of a vulnerability; it's intended to indicate the potential of a vulnerability to create havoc in affected software.

The best strategy based on that score, patching all vulnerabilities with a score of 7 or higher, catches about 53% of all vulnerabilities that are eventually exploited in the wild, compared to 38% for a strategy of randomly selecting vulnerabilities. But just 31% of the vulnerabilities that the strategy flags for patching will actually be exploited in the wild, leaving defenders to patch more than 25,000 vulnerabilities, according to Kenna Security's research.
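
The two percentages above measure different things: coverage is the share of eventually exploited vulnerabilities that a strategy patches, while the second figure (called efficiency in this sketch) is the share of patched vulnerabilities that turn out to be exploited. The minimal Python sketch below shows how both could be computed for a "patch everything scored 7 or higher" rule. The records, field names, and function name are invented for illustration and are not Kenna Security's data or code.

```python
from typing import NamedTuple


class Vuln(NamedTuple):
    cve_id: str        # hypothetical identifier
    cvss: float        # CVSS base score, 0.0 to 10.0
    exploited: bool    # whether it was later exploited in the wild


def coverage_and_efficiency(vulns, should_patch):
    """Return (coverage, efficiency) for a patching rule."""
    patched = [v for v in vulns if should_patch(v)]
    exploited = [v for v in vulns if v.exploited]
    hits = [v for v in patched if v.exploited]
    coverage = len(hits) / len(exploited) if exploited else 0.0
    efficiency = len(hits) / len(patched) if patched else 0.0
    return coverage, efficiency


# Toy data, invented for this example.
SAMPLE = [
    Vuln("CVE-A", 9.8, True),
    Vuln("CVE-B", 9.1, False),
    Vuln("CVE-C", 7.5, False),
    Vuln("CVE-D", 7.2, True),
    Vuln("CVE-E", 6.5, True),
    Vuln("CVE-F", 5.0, False),
    Vuln("CVE-G", 4.3, False),
    Vuln("CVE-H", 2.1, False),
]

coverage, efficiency = coverage_and_efficiency(SAMPLE, lambda v: v.cvss >= 7.0)
print(f"coverage={coverage:.2f} efficiency={efficiency:.2f}")
# prints: coverage=0.67 efficiency=0.50
```

In this toy set the rule patches four vulnerabilities, two of which are exploited, and misses a third exploited vulnerability that scored below 7.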

Michael Roytman, chief data scientist for Kenna Security, said that, across all vulnerabilities, the chance of a specific vulnerability actually being exploited in the wild is low.

"So there is a high probability that any yes-no slice of the data, such as 'we remediate all criticals,' is likely to have a lot of wasted effort and is going to be imprecise."
Michael Roytman

2. Vulnerabilities on lists may be more likely to be exploited

You'll find vulnerabilities discussed on a variety of mailing lists and databases, including Bugtraq, BID, Microsoft's list, Sectrack and others. While no single list is a great predictor of whether a vulnerability will eventually be exploited, appearing on many of these lists does correlate with a higher likelihood of exploitation, according to Kenna Security's research.

"What we have found to predict exploitation is whether it has exploit, the number of vulnerabilities in the wild, and whether it is on the Microsoft reference list or other vulnerability lists. Is it in Metasploit? These all make a difference in our model."
—Michael Roytman

3. Look for lessons in the crowd

The academic research team had a different focus: Could it tell if a vulnerability was being exploited, without relying on an actual incident report? The group focused on blacklists and other signs of malicious activity, building a model of both the attacker and defender communities.

Tudor Dumitras, associate professor at the University of Maryland at College Park and one of the authors of the USENIX paper, compares the technique to trying to determine if a particular population has a specific disease, such as Ebola. By dividing the potentially infected population into different groups, researchers can look for signs that one group is infected and another is not.

If you compare the rates of infection and you see that it is significantly higher in one population, then chances are that the pathogen is spreading there, he said.

"With our exploitation problem, this is a pretty good signal—it is the best signal we have discovered so far. … So the analogy here is that the symptoms of infections are bad but observable. We used IP addresses that are blacklisted as a symptom of infection and how those hosts got infected."
Tudor Dumitras
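
The comparison of infection rates that Dumitras describes can be illustrated with a standard two-proportion z-test. This is a generic statistical sketch, not the method used in the USENIX paper. The host counts, group labels, and function names are hypothetical.

```python
import math


def two_proportion_z(infected_a, total_a, infected_b, total_b):
    """z statistic for the difference between two infection rates."""
    rate_a = infected_a / total_a
    rate_b = infected_b / total_b
    # Pooled rate under the assumption that both groups share one rate.
    pooled = (infected_a + infected_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    return (rate_a - rate_b) / std_err


def one_sided_p_value(z):
    """Chance of a z at least this large if the two rates were equal."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical counts of blacklisted IP addresses in two groups of hosts.
z = two_proportion_z(infected_a=60, total_a=1000, infected_b=30, total_b=1000)
p = one_sided_p_value(z)
print(f"z={z:.2f} p={p:.4f}")
```

A large z value with a small p value suggests that the higher rate in group A is unlikely to be chance alone, which is the kind of signal the analogy points to.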

4. Developers need to take a lesson from Spider-Man

For developers, the lessons are a bit more subtle. Because research shows that attackers are more likely to exploit the most popular software, developers of those programs should take more care with security, said Kenna's Roytman, who paraphrased the famous quote from Spider-Man below.

If you are writing an open-source library that you think will be used by only a few people, you don't have as much responsibility. But if you are Adobe or Microsoft, the more prevalent your software is across enterprises, the more responsibility you have to consider security, he said.

"With great prevalence comes great responsibility. Time and time again, attackers have an incentive to write exploits for software that is widely deployed."
—Michael Roytman

How to model good threat modeling

While these factors can give hints of the likelihood that a vulnerability will be exploited, both research efforts found that combining all the factors into a single model was the best predictor.

Using 250 different features of vulnerabilities, from the attributes mentioned above to more basic information such as vendor and product name, Kenna Security found that it could build a model whose predictions of exploitation were correct 79% of the time, although it would catch only 29% of the vulnerabilities actually exploited in the wild. Reducing the precision of the model, so that its predictions were correct only 50% of the time, led to better coverage: the model caught 82% of all exploited vulnerabilities.
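
This kind of trade-off appears whenever a model outputs a score and a cutoff decides which vulnerabilities get flagged. The sketch below uses invented scores and labels, not Kenna Security's model or data, to show precision falling and coverage rising as the cutoff is lowered. All names are hypothetical.

```python
def precision_and_coverage(scored, cutoff):
    """scored: list of (model_score, was_exploited) pairs."""
    flagged = [exploited for score, exploited in scored if score >= cutoff]
    total_exploited = sum(1 for _, exploited in scored if exploited)
    hits = sum(flagged)  # True counts as 1
    precision = hits / len(flagged) if flagged else 0.0
    coverage = hits / total_exploited if total_exploited else 0.0
    return precision, coverage


# Invented (score, exploited) pairs, listed from highest to lowest score.
SCORED = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False),
    (0.60, True), (0.55, False), (0.40, False), (0.35, True),
    (0.30, False), (0.10, False),
]

for cutoff in (0.8, 0.5, 0.2):
    precision, coverage = precision_and_coverage(SCORED, cutoff)
    print(f"cutoff={cutoff:.1f} precision={precision:.2f} coverage={coverage:.2f}")
# prints:
# cutoff=0.8 precision=1.00 coverage=0.60
# cutoff=0.5 precision=0.67 coverage=0.80
# cutoff=0.2 precision=0.56 coverage=1.00
```

Where to set the cutoff depends on how much patching work a team can absorb in exchange for catching more of the vulnerabilities that end up exploited.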

The academic researchers' efforts produced a different result. Using information gleaned from watching the telltale signs of malicious attacks and potentially infected corporate systems, the group was able to create a model that predicted whether a vulnerability was being exploited in the wild with a 90% true positive rate using 10 days of activity.

A strategy of just patching vulnerabilities with published exploits will still lead to too much work. And companies that focus on clearing the easiest-to-patch vulnerabilities are wasting time, said Kenna Security's Roytman.

"Cycles wasted applying patches just because they are easy are wasted cycles because they are not reducing risk."
—Michael Roytman

The University of Maryland's Dumitras said companies surely would like to know what the odds are of getting hacked tomorrow.

"This is useful in doing a cost-benefit analysis. If you know you have a bunch of vulnerabilities in your infrastructure, you can figure out what the cost of patching will be for just the most critical flaws."
—Tudor Dumitras
