Finding the supernovas: How machine learning can lift security operations' game

Stan Wisseman, Security Strategist, Micro Focus

I’ve always been intrigued by cosmology and the evolution of stars. One of the more spectacular, and rare, events in the final evolutionary stages of a massive star's life is an explosion called a supernova. Identifying a distant supernova in our vast starfield can be a challenge, no matter how powerful the explosion is. How can you distinguish a new bright light from all of the “normal” starlight before it winks out? The scientists at Los Alamos National Laboratory were recently in the news for having discovered an exceptionally powerful supernova, ASASSN-15lh, that might otherwise have gone undetected.

"The grand challenge in this work is to select rare transient events from a deluge of imaging data in time to collect detailed follow-up observations with larger, more powerful telescopes," said Przemek Wozniak, principal investigator of the project that created the software system used to spot ASASSN-15lh. "We developed an automated software system based on machine-learning algorithms to reliably separate real transients from bogus detections."

The technology the lab developed will soon enable scientists to find 10 or perhaps even 100 times more supernovas and explore truly rare cases in great detail.

Machine learning back on Earth

While I may have desires to be a cosmologist, my career is in information security. And just as cosmologists have recognized the need to augment their manual stargazing with automated machine learning, we need to take similar approaches to more quickly identify threat actors. Today, security operations centers (SOCs) are flooded with security event traffic, and it’s difficult to keep up with, much less identify, the more advanced threat actors.

In July, Robert Lemos wrote an article for TechBeacon on how cognitive computing is well suited to sift through the massive flow of data needed to detect anomalies that could be indicators of a security incident. Machine learning-based technology helps analysts identify the potential security supernovas (advanced attacks) from all the background radiation of daily event traffic. As a result, highly skilled security team members can then be utilized for more specialized hunt and analytics-focused work.

"When you're looking for security issues, what you're typically looking for is deviations from a norm," said Scott Crawford, research director for information security at 451 Research. "Something doesn't look like the normal, accepted, or authorized use of IT. This does not look like the normal, accepted, or authorized use of data or applications."

He explained that machine learning brings a new technological approach to a discipline that's long been a part of security monitoring management: anomaly detection.

Before machine learning began to be applied to security problems, anomalies were detected by watching network flows for aberrant activity. For example, when a worm becomes active on a system, it creates spikes in network activity as it tries to spread itself. "It's a form of quantitative analysis because you'd see numbers spike with the number of NetFlow interactions from one host to another with a worm outbreak," Crawford said.

"That's a fairly obvious way to detect an anomaly because the numbers are big and it's a different activity compared to normal," he continued. "A lot of attacks are more subtle than that, and when you're trying to tease out the subtlety of anomalous behavior, you really need a much more finely honed approach to analysis. Bringing statistical analysis approaches to bear on understanding what's normal in an environment, that's where the realm of machine learning begins to have a real effect."
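To make the quantitative approach Crawford describes concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any product: the host addresses, the flow counts, and the three-standard-deviation threshold are all invented for the example. Each host is compared against its own history, and only counts far above that norm are flagged.

```python
from statistics import mean, stdev

def flag_spikes(baseline, observed, threshold=3.0):
    """Return hosts whose observed flow count is more than `threshold`
    standard deviations above their own historical mean."""
    flagged = []
    for host, count in observed.items():
        history = baseline.get(host, [])
        if len(history) < 2:
            continue  # not enough history to estimate what is normal
        mu = mean(history)
        # Floor the spread at 1.0 so a perfectly flat history
        # does not cause a division by zero.
        sigma = max(stdev(history), 1.0)
        if (count - mu) / sigma > threshold:
            flagged.append(host)
    return sorted(flagged)

# Hypothetical hourly flow counts per host (assumed data).
baseline = {
    "10.0.0.5": [120, 130, 110, 125, 118],
    "10.0.0.9": [40, 42, 38, 41, 39],
}
observed = {"10.0.0.5": 127, "10.0.0.9": 900}

print(flag_spikes(baseline, observed))  # ['10.0.0.9']
```

A worm-style outbreak shows up easily here because the numbers are big. The subtler attacks Crawford mentions would sit well under a threshold like this, which is why richer statistical models come into play.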

In January, Hewlett Packard Enterprise released its 2016 State of Security Operations report. The findings around use of security analytics show that the most mature security operation teams are layering on capabilities to hunt for unknown attacks and using advanced analytics as an aid to detection. When implemented properly, these teams and tools are helping organizations ferret out complex and stealthy threats—from advanced attackers and insiders—that have bypassed traditional security controls.

One example is a high-privilege account (HPA) user who goes rogue, whether through malicious intent or because an external attacker has compromised the account. Traditional security controls are unlikely to detect the threat because the HPA user operates with the credentials, entitlements, and access permissions the role requires. User behavior analytics (UBA) removes the invisibility cloak that allows most HPA users to operate with a measure of anonymity by detecting anomalous behaviors associated with insider and external attacks. UBA solutions can automatically flag abnormal account activity and rank it by risk, using context-rich intelligence that correlates user, network, system, and physical data with HR tips and clues.
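The flag-and-rank idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any particular UBA product works: the account name, host names, and the two attributes tracked (host and hour of day) are assumptions made for the example. A real solution would correlate far more context.

```python
from collections import defaultdict

def build_profiles(history):
    """Record, per account, the hosts and hours seen in the baseline period."""
    profiles = defaultdict(lambda: {"hosts": set(), "hours": set()})
    for user, host, hour in history:
        profiles[user]["hosts"].add(host)
        profiles[user]["hours"].add(hour)
    return profiles

def risk_rank(profiles, events):
    """Score each event by how many of its attributes are new for the
    account, then return the events from most to least unusual."""
    scored = []
    for user, host, hour in events:
        profile = profiles.get(user)
        if profile is None:
            score = 2  # unknown account: treat both attributes as unseen
        else:
            score = (host not in profile["hosts"]) + (hour not in profile["hours"])
        scored.append((score, user, host, hour))
    return sorted(scored, key=lambda item: item[0], reverse=True)

# Hypothetical baseline and new activity: (account, host, hour of day).
history = [("dba_admin", "db01", 9), ("dba_admin", "db01", 10), ("dba_admin", "db02", 14)]
events = [("dba_admin", "db01", 10), ("dba_admin", "hr-files", 3)]

for score, user, host, hour in risk_rank(build_profiles(history), events):
    print(score, user, host, hour)
```

The 3 a.m. access to an unfamiliar host ranks first even though the credentials are valid, which is the point: the signal comes from behavior, not from a failed permission check.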

All eyes on malware

Another area where machine learning can help is malware detection. Security operations organizations everywhere are stretched thin trying to get a handle on the growing malware problem. On average, enterprises receive 17,000 malware alerts per week and spend an average of $1.27 million annually in time and resources responding to inaccurate and erroneous event data. Because of the volume of data that SecOps analysts must monitor, only approximately 4 percent of all malware alerts are actually investigated, leaving a significant gap in remediation of infected endpoints.

One potential approach to identifying infected hosts with higher fidelity is to detect malware callback messages in DNS traffic. Of course, the issue with using DNS traffic is the significant volume of data that DNS servers produce. HPE Labs took on this big data security use case with HPE’s internal Cyber Defense Center. The result is an algorithm-driven service that automates the analysis and detection of infected platforms (servers, desktops, and mobile devices). Analysts are empowered to detect threats in near real time and to dramatically decrease malware dwell time, often reducing troubleshooting time from days or months to minutes.
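The article does not describe the algorithms HPE Labs used, so the following Python sketch should be read only as one commonly discussed heuristic, chosen here for illustration: malware that "phones home" often queries the same domain at suspiciously regular intervals. The client addresses, domain names, timestamps, and thresholds below are all assumed values.

```python
from collections import defaultdict
from statistics import mean, pstdev

def find_beacons(queries, min_queries=5, max_jitter=0.1):
    """Flag (client, domain) pairs whose DNS queries arrive at nearly
    constant intervals. `queries` holds (timestamp_seconds, client, domain)."""
    times = defaultdict(list)
    for ts, client, domain in queries:
        times[(client, domain)].append(ts)

    suspects = []
    for pair, stamps in times.items():
        if len(stamps) < min_queries:
            continue  # too few lookups to judge regularity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        average_gap = mean(gaps)
        if average_gap == 0:
            continue
        # Coefficient of variation: spread of the gaps relative to their mean.
        jitter = pstdev(gaps) / average_gap
        if jitter <= max_jitter:
            suspects.append(pair)
    return sorted(suspects)

# Hypothetical DNS log: one client checks in every ~300 seconds,
# another browses at irregular, human-looking intervals.
queries = (
    [(t, "10.0.0.9", "update-check.example") for t in (0, 300, 601, 899, 1200, 1500)]
    + [(t, "10.0.0.5", "news.example") for t in (0, 40, 900, 950, 4000, 4100)]
)

print(find_beacons(queries))  # [('10.0.0.9', 'update-check.example')]
```

Grouping by client and domain before doing any math is what keeps a check like this tractable at DNS volumes: each pair needs only its own list of timestamps.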

Meanwhile, Dell just announced the availability of a product that integrates technology from Cylance, which uses artificial intelligence and machine learning to more proactively prevent advanced persistent threats and malware. Cylance's antivirus technology applies machine learning to malware detection by training the software on both legitimate and malicious files and teaching the algorithm which is which. The application can then take files it has never seen before and spot malware.
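The train-on-labeled-files, classify-unseen-files pattern can be shown with a toy example. To be clear, this is not Cylance's method, which the article does not detail. It is a nearest-centroid classifier over byte histograms, picked because it fits in a few lines, and the "files" are synthetic byte strings standing in for real samples.

```python
from collections import Counter
from math import sqrt

def byte_histogram(data):
    """Normalized frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(value, 0) / total for value in range(256)]

def train(samples):
    """samples: list of (bytes, label). Returns one average histogram
    (centroid) per label."""
    by_label = {}
    for data, label in samples:
        by_label.setdefault(label, []).append(byte_histogram(data))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }

def classify(model, data):
    """Return the label whose centroid is closest to this file's histogram."""
    vector = byte_histogram(data)

    def distance(centroid):
        return sqrt(sum((a - b) ** 2 for a, b in zip(vector, centroid)))

    return min(model, key=lambda label: distance(model[label]))

# Synthetic stand-ins (assumed data): plain text plays the legitimate
# files, runs of high byte values play the malicious ones.
training_set = [
    (b"the quick brown fox jumps over the lazy dog", "benign"),
    (b"please read the installation notes before use", "benign"),
    (bytes(range(128, 256)), "malicious"),
    (bytes(range(160, 256)) * 2, "malicious"),
]
model = train(training_set)

print(classify(model, b"notes on the quarterly report for the team"))  # benign
print(classify(model, bytes(range(150, 250))))                          # malicious
```

Neither test input appears in the training set, yet each lands on the expected side. Production systems extract far richer features than byte frequencies, but the workflow is the same: label, train, then score files the model has never seen.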

Take the right approach to analytics

As promising as machine learning is for security solutions, it has its limitations, too. "When you think about true anomaly detection or behavior analysis, the challenge is that security is grasping at straws because it wants the algorithm to figure out if something is normal or not," Alex Pinto, chief data scientist and founder of the MLSec Project, told InformationWeek's Dark Reading in July 2014. MLSec Project is a community working on using machine learning and data science in information security. 

"That works well if you're only measuring one variable," he continued. "But if you increase that and try to analyze, say, the NetFlow of 1,000 different machines talking to each other, today's theoretical mathematical capabilities have no chance."

Nevertheless, a broad range of security solutions using machine learning continues to be developed. Some security analytics solutions require teams of dedicated data scientists, while others operate from proprietary algorithms or threat intelligence sources. Still others are little more than log storage solutions that support after-incident forensics. The State of Security Operations report indicates that the value of security data analytics is most apparent where findings are operationally integrated with security operations capabilities. The bottom line is that as attack speed continues to increase, automated defenses like these are a necessity for detecting these hot spots before damage is done.

"In IT security, we have relied heavily on static rules to detect threats based on known attack patterns," Travis Greene, identity solutions strategist at NetIQ, wrote at SecurityWeek. "But if the steady revelation of new victims is any indication, that approach has long ago reached its limits."

"We don’t lack security information. On the contrary, we are overwhelmed with data that, given time, could produce meaningful threat disruption. It’s the time, particularly of qualified security professionals, that is lacking." —Travis Greene

"So the automated analysis of security data, or analytics, is critical to regaining some semblance of control over the ocean of data that is generated and dumped into SIEM tools daily."

The promise of machine-learning algorithms is that they will learn and predict based on experience and results. It means that what today takes 24 hours will tomorrow take 20 hours, the next day 12 hours, and so on. Machine learning effectively scales the effort to a level that human teams cannot reach or sustain, especially when dealing with automated tasks.

Image credit: Flickr