Beat advanced persistent threats (APT) with machine learning, analytics

Advanced persistent threats (APTs) represent the most critical cybersecurity challenges facing governments, corporations, and app developers. Compared with cybersecurity concerns such as distributed denial-of-service (DDoS) attacks, the stealthy, continuous, and targeted nature of APTs makes them particularly difficult to detect. Case in point: On average, it takes 240 days to detect an APT-related enterprise cybersecurity breach. As a result, most breaches are discovered only after being reported by third parties, when the larger ramifications of the breach begin to play out.

A particularly insidious APT is data exfiltration, which is the unauthorized transfer of sensitive or critical information out of a network or application. The targeted data can include state secrets, intellectual property, personally identifiable data, and financial information. Some of the largest cybersecurity breaches in recent years have involved data exfiltration, including attacks on Target in 2013 and Home Depot in 2014. And essentially any Internet-connected device or application with access to sensitive data is a potential target for these high-profile breaches.

Observing the steady occurrence of data breaches, application developers began taking steps to prevent data exfiltration via standard protocols such as FTP and HTTP, typically by blocking access outright. In response, attackers shifted tactics to extracting sensitive data through seemingly benign inbound and outbound DNS traffic. My team is working to counter this tactic with software that uses machine learning and real-time data analytics to monitor DNS and other network traffic. The idea behind our software is to identify potential data exfiltration using multiple detectors, including Snort for intrusion detection, AVG for malware detection, Splunk for network traffic analysis, and a state-of-the-art exfiltration detector.

Understanding a data exfiltration attack

Given that DNS helps to form the backbone of Internet traffic, the sending of DNS queries and results is rarely restricted or blocked by administrators, leaving most networks (and the data they contain) vulnerable. Furthermore, detecting and protecting against such an attack is extremely difficult. DNS exfiltration attacks are often lost in the high volume of network traffic and can closely resemble normal network activity. As such, it is challenging to distinguish legitimate user activity from malicious attacks.

A DNS-based data exfiltration attack consists of three stages:

Intrusion. The network or application is breached so as to give the attacker remote control of the system. Techniques include social engineering, password cracking, and Trojans.
Discovery. Now, with remote access to the application, the attacker installs malware and scans the network for servers, databases, and other systems containing sensitive information.
Exfiltration. Once the network has been breached and exfiltration malware installed, the compromised application will send numerous DNS queries like the following:

OHDOBHDAGOOESDUGBOOH.attackerserver.com

HBSGGCDAGOOES.attackerserver.com

EHSHHJFHUAAOOGKKSDDAHAUBBJDCCKG.attackerserver.com

The attacker is savvy enough to not transmit the data as plaintext and employs a variety of techniques, including compression, encryption, and chunking, to obfuscate the exfiltration. Thus, the fully qualified domain name in each DNS query contains only a small portion of the encoded data to be exfiltrated, e.g., “OHDOBHDAGOOESDUGBOOH” and “HBSGGCDAGOOES”.
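To make the idea concrete, here is a minimal Python sketch of how a detector might flag query names like the ones above. It is not our detector: the entropy heuristic, the function names (shannon_entropy, payload_label, looks_encoded), and the thresholds are all assumptions chosen for illustration, and treating the last two labels as the registered domain is a simplification.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def payload_label(fqdn: str, registered_labels: int = 2) -> str:
    """Return everything left of the registered domain, joined without dots.

    Assumption: the registered domain is the last `registered_labels` labels.
    That is wrong for suffixes such as .co.uk; a real tool would consult
    the Public Suffix List instead.
    """
    labels = fqdn.rstrip(".").split(".")
    return "".join(labels[:-registered_labels])


def looks_encoded(fqdn: str, min_length: int = 12, min_entropy: float = 2.5) -> bool:
    """Flag names whose subdomain part is both long and high in entropy.

    Both thresholds are illustrative guesses, not tuned values.
    """
    payload = payload_label(fqdn)
    return len(payload) >= min_length and shannon_entropy(payload) >= min_entropy


if __name__ == "__main__":
    for name in ("OHDOBHDAGOOESDUGBOOH.attackerserver.com", "www.example.com"):
        print(name, looks_encoded(name))
```

A heuristic like this will also flag some legitimate traffic, such as content delivery networks that use long generated hostnames, which is one reason a single detector is rarely enough.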

These queries are first directed to the internal DNS server within the network to resolve the *.attackerserver.com domain (conspicuously named for illustrative purposes). The internal DNS server cannot answer the query, as it does not contain an entry for *.attackerserver.com in its database. A recursive lookup is then performed, which forwards the DNS request out of the internal network until eventually it is received by an external authoritative name server operated by the attacker. This name server records all of the data and content from the DNS queries it receives. Over time, the attacker is able to decrypt and reassemble the exfiltrated data. As avoiding detection is a primary goal for the attacker, this process can play out over the course of months or even years.

Approaches exist to detect each of the three stages of a DNS data exfiltration attack. However, they have typically been stovepiped, focusing on just one aspect of the problem: intrusion detection, malware detection, or exfiltration detection. With each detector operating in isolation, critical situational awareness is missing. Building that awareness creates the potential to connect multiple anomalous observations and detect an attack that would otherwise fail to trigger any one detector.
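One simple way to picture how connected observations can surface an attack that no single detector flags is a noisy-OR combination of per-detector suspicion scores. The Python sketch below is an illustration only. It assumes each detector can report a score between 0 and 1 and that the detectors err independently, and the function name and the numbers are hypothetical.

```python
from math import prod
from typing import Mapping


def combined_suspicion(scores: Mapping[str, float]) -> float:
    """Combine per-detector suspicion scores with a noisy-OR rule.

    Returns the probability that at least one detector is right,
    assuming the detectors are independent (a strong assumption).
    """
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1]")
    return 1.0 - prod(1.0 - score for score in scores.values())


if __name__ == "__main__":
    alarm_threshold = 0.5  # hypothetical per-detector alarm level
    scores = {"intrusion": 0.40, "malware": 0.35, "exfiltration": 0.45}
    # No single score reaches the threshold, but the combination does.
    print(all(s < alarm_threshold for s in scores.values()))
    print(combined_suspicion(scores) > alarm_threshold)
```

With these made-up scores, each detector stays below its own alarm level while the combined value is roughly 0.79.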

Machine learning, real-time analytics level the playing field

My team is working to fill that gap with software that utilizes machine learning and real-time data analytics to monitor DNS and other network traffic. The idea behind the software is to identify potential data exfiltration using multiple detectors, including Snort for intrusion detection, AVG for malware detection, Splunk for network traffic analysis, and a state-of-the-art exfiltration detector.

Our software will use artificial intelligence to dynamically configure the sensors to provide better detection with minimal performance overhead. Actions could include dialing up the sensitivity of deployed detectors or blocking a particular protocol or port. For example, raised suspicions could cause additional (and more sensitive) rule sets (e.g., exploit-kit, indicator-compromise) to be activated within Snort. Another possible action could be to reduce the cap imposed by the data exfiltration detector on the maximum amount of information content allowed in a stream of DNS queries (e.g., from 4 KB/day to 1 KB/day).
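A cap on DNS information content could be tracked per destination domain. The Python sketch below is a simplified illustration under several assumptions: the class name DnsVolumeCap is hypothetical, the number of characters in the subdomain stands in as a crude proxy for information content, 4 KB is taken to mean 4,096 bytes, and the last two labels are treated as the registered domain.

```python
from collections import defaultdict
from datetime import date


class DnsVolumeCap:
    """Track subdomain payload volume per (day, domain) against a cap."""

    def __init__(self, cap_bytes_per_day: int = 4096):
        self.cap_bytes_per_day = cap_bytes_per_day
        self._seen = defaultdict(int)  # (day, domain) -> payload characters

    def observe(self, day: date, fqdn: str) -> bool:
        """Record one query; return True if the domain is now over the cap."""
        labels = fqdn.rstrip(".").lower().split(".")
        domain = ".".join(labels[-2:])   # simplification: ignores .co.uk etc.
        payload = "".join(labels[:-2])   # everything left of the domain
        self._seen[(day, domain)] += len(payload)
        return self._seen[(day, domain)] > self.cap_bytes_per_day

    def tighten(self, new_cap_bytes_per_day: int) -> None:
        """Lower (or raise) the cap; earlier observations are kept."""
        self.cap_bytes_per_day = new_cap_bytes_per_day


if __name__ == "__main__":
    cap = DnsVolumeCap()              # 4 KB/day
    today = date(2024, 1, 1)
    query = "OHDOBHDAGOOESDUGBOOH.attackerserver.com"  # 20-character payload
    for _ in range(100):              # 2,000 characters in total
        over = cap.observe(today, query)
    print(over)                       # still under the 4 KB cap
    cap.tighten(1024)                 # raised suspicion: 1 KB/day
    print(cap.observe(today, query))  # the same stream now exceeds the cap
```

Because the running totals survive a change to the cap, a slow stream that stayed under the looser limit is flagged as soon as the limit is tightened.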

The software will automatically evaluate the potential benefits and consequences of each available action. By taking a centralized approach, the algorithm can determine the appropriate level of scrutiny to apply with each detector so as to achieve a desired balance between performance, security, and false alarm rate. Upon determining the best course of action at each time step, the software can be set up to either perform the selected action automatically or submit it to an administrator for approval.
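As a rough illustration of that trade-off, the Python sketch below scores each candidate action with a weighted sum and picks the best one. This one-step scoring is a deliberate simplification, not the algorithm our software uses. The Action fields, the weights, and every number are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Action:
    name: str
    security_gain: float     # estimated benefit, arbitrary units
    performance_cost: float  # estimated overhead
    false_alarm_cost: float  # estimated burden of extra false alarms


def score(action: Action, w_security: float = 1.0,
          w_performance: float = 1.0, w_false_alarm: float = 1.0) -> float:
    """Weighted benefit minus weighted costs for a single action."""
    return (w_security * action.security_gain
            - w_performance * action.performance_cost
            - w_false_alarm * action.false_alarm_cost)


def choose_action(actions: Iterable[Action], **weights: float) -> Action:
    """Return the highest-scoring action under the given weights."""
    return max(actions, key=lambda a: score(a, **weights))


def dispatch(action: Action, auto_apply: bool) -> str:
    """Either apply the action or queue it for an administrator."""
    if auto_apply:
        return f"applied: {action.name}"
    return f"awaiting approval: {action.name}"


ACTIONS = [
    Action("no change", 0.0, 0.0, 0.0),
    Action("enable extra Snort rule sets", 0.6, 0.2, 0.1),
    Action("lower DNS cap", 0.5, 0.05, 0.3),
    Action("block port", 0.9, 0.7, 0.2),
]

if __name__ == "__main__":
    best = choose_action(ACTIONS)
    print(dispatch(best, auto_apply=False))
    # Weighting performance more heavily changes the choice.
    print(choose_action(ACTIONS, w_performance=3.0).name)
```

Changing the weights shifts the balance: with equal weights the extra Snort rule sets win, while a heavier performance weight favors the cheaper option of lowering the DNS cap.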

With an ever-increasing reliance on mobile devices, cloud computing, and the Internet of Things, the threat of data exfiltration, whether by DNS or other channels, will only continue to grow. Proactive tools that address this threat by intelligently analyzing data and leveraging a suite of existing cybersecurity products will be key, as will situational awareness that enables earlier detection of exfiltration attacks, helping to minimize the time and resources needed for mitigation.

What's your approach to dealing with APTs? Do you need better tools? Share your wish list.

Image credit: Flickr
