How to discover and stop security breaches fast by tracking the dark web

Danny Rogers Co-founder and CEO, Terbium Labs

In the face of exponentially increasing threat sophistication, purely defensive measures are no longer sufficient to guarantee the integrity of applications and networks. There are myriad ways data can leak from an application or leave an organization. In recent years we’ve observed everything from inadvertent data spills and database misconfigurations to sophisticated attacks that inject malicious code directly into an IDE, poisoning entire ecosystems' worth of applications with malware. Attack vectors that were once the sole province of sophisticated espionage actors are now used in mundane, economically motivated attacks, costing legitimate businesses around the globe billions of dollars per year.

In light of such numerous and sophisticated threats, it’s imperative that companies take new approaches when thinking about how to protect their data and application integrity. One trend gaining steam right now is the adoption of a risk-management mindset in which one assumes that sensitive data will eventually leak out of an organization and plans accordingly. Actions include the implementation of robust breach remediation plans, the adoption of comprehensive information security programs, and recently, the expansion of data breach insurance coverage. 

The result is that organizations are more resilient in the face of inevitable security breaches. This holds even when fraud causes losses for a company that hasn't itself been breached, as when data stolen from one organization is used to exploit another. A sound risk-managed approach that goes beyond network and application defense can do wonders against such threats. These phenomena are why organizations must look outside their networks and take a more proactive approach to finding weaknesses in their application and network defenses.

Early warning: One key benefit of tapping data intelligence 

The most effective way to monitor the landscape outside of both an application’s and a network’s borders is via a broad category of activities usually captured under the moniker of "threat intelligence." However, not all threat intelligence is created equal, and different approaches have varying utility when it comes to application security. Of course, situational awareness of the latest exploit techniques will help an organization avoid common pitfalls and vulnerabilities. But focusing on where one’s actual data ends up is often the most effective way to identify vulnerabilities in one’s networks and applications. 

One of the more common places for data to appear is the dark web, a loosely defined subset of the Internet that is most often affiliated with less-than-legitimate activities such as the sale of stolen data or illicit goods. Often, these areas are somewhat difficult to access; some live on the Tor network, an anonymous proxy network that overlays the traditional Internet, while others exist behind credential-protected forums and marketplaces or among obscure paste sites.

By monitoring these areas of the Internet, one can often glean early warnings of vulnerabilities in network and application defenses. Samples of data, when correlated with source information, can reveal that a previously unknown vulnerability exists. In recent years, such monitoring has uncovered everything from misconfigured databases to malicious insiders leaking sensitive data to fraud schemes perpetrated with user data compromised elsewhere. Detecting these samples the moment they appear on the dark web gives an organization its earliest possible notice that its applications or network defenses have failed.

The three A's: Automatic, affordable, actionable

A number of challenges arise when attempting to gather intelligence on these parts of the Internet. For starters, they are extensive and expanding rapidly; the number of Tor hidden services has grown by 50 percent just since the beginning of 2016, for example. As this part of the Internet expands, humans, who have powered the bulk of dark web intelligence to date, simply won’t be able to keep up. This is why automation, our first A, is key to dark web intelligence, and fortunately, we live in an era in which technologies for large-scale, automated data collection are widely available. For example, cloud computing platforms and big data frameworks such as Apache Hadoop make it possible to index enormous volumes of data efficiently and affordably.
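To make the scale argument concrete, here is a minimal sketch, not from the article, of the kind of inverted index an automated collection pipeline builds over crawled pages. The document IDs and text are hypothetical, and a production system would shard this work across a cluster such as Hadoop rather than run it in one process:

```python
from collections import defaultdict

def build_index(documents):
    """Build an inverted index mapping each token to the set of
    document IDs (e.g., crawled page URLs) that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

# Hypothetical crawl results from a hidden-service market and a paste site.
crawled = {
    "onion://market-a/listing1": "fullz dump fresh cards",
    "paste://site-b/abc123": "leaked database dump acme corp",
}
index = build_index(crawled)
# index["dump"] now maps to both documents, so a single token lookup
# surfaces every page where that term appeared, regardless of source.
```

Because each document is processed independently, this kind of indexing parallelizes naturally, which is what makes automated coverage of a rapidly growing corpus affordable.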

Automating data collection that was previously done by humans has another advantage: cost. Human-powered intelligence vendors are essentially outsourced analyst shops, and such services are necessarily expensive, since paying people to read these areas of the Internet scales poorly for the customer. Automating these tasks can therefore provide intelligence that is not only broader in its coverage but also more affordable, the second A.

The final challenge is separating the data that matters, the so-called signal, from the data that does not, the so-called noise. The dark web is rife with fake data and false claims, and gathering information from it can be challenging, especially with automated techniques. Human-powered collection and analysis can significantly reduce the noise level, but it doesn’t always help distinguish real data from fake.
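Some noise can be cut automatically with structural validity checks on the claimed data itself. As one hypothetical illustration, not a technique the article names, a Luhn checksum discards card numbers that cannot possibly be real before any analyst or comparison step sees them:

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: a cheap filter that rejects strings that
    cannot be valid payment card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 13:  # real card numbers are 13-19 digits
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9      # equivalent to summing the two digits
        total += d
    return total % 10 == 0
```

A check like this only proves a number is well-formed, not that it was actually stolen, which is why the comparison against one's own data, described next in the article, remains the decisive step.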

The best way to identify real data is to compare the data collected in the intelligence-gathering process against one’s own real data. However, most of the data in question is highly sensitive; otherwise it wouldn’t need protecting. This presents a unique challenge when engaging with data intelligence providers. Fortunately, techniques such as data fingerprinting make it possible to search for information on an organization’s behalf without requiring access to that information. With these techniques, one can tell immediately which data requires action and which doesn’t, achieving the final A in our trio of effective data intelligence elements: actionability.
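The fingerprinting idea can be sketched with salted one-way hashes of normalized records. The salt, sample records, and normalization rules below are illustrative assumptions, not any vendor's actual scheme:

```python
import hashlib

def fingerprint(record: str, salt: bytes = b"org-secret-salt") -> str:
    """One-way fingerprint of a normalized sensitive record.
    The monitoring service stores only these digests, never the raw data."""
    normalized = "".join(record.split()).lower()
    return hashlib.sha256(salt + normalized.encode()).hexdigest()

# The organization fingerprints its sensitive records once, locally;
# only the digests ever leave its control.
our_fingerprints = {
    fingerprint(r) for r in ["4111 1111 1111 1111", "alice@example.com"]
}

def matches(candidate: str) -> bool:
    """Check a record scraped from the dark web against our fingerprints,
    without the monitoring side ever holding our original records."""
    return fingerprint(candidate) in our_fingerprints
```

A hit on `matches` is immediately actionable: the scraped record provably corresponds to the organization's own data, while everything else can be discarded as noise.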

Discovery times are critical

Ultimately, what results does a focus on the three A’s achieve? Data breaches typically take more than 200 days to discover, and those discoveries are more often than not made by parties outside the organization, whether journalists, law enforcement personnel, or even customers. Furthermore, aggregated breach data confirms that the single most important factor in determining the damage from a data breach is the time between when the breach occurs and when it is discovered. Very simply, the sooner one discovers a breach, the less damage occurs.

With fully automated, private data intelligence, we’ve observed detection times in the range of minutes. Organizations can become the first—and in some fortunate cases, the only—ones to know when they have a problem, providing early warning and ultimately reducing the financial and reputational damage incurred.

Automated data intelligence provides a strategic advantage. Identifying the leaked data provides the first clue as to how it got out, allowing organizations to pinpoint the inevitable vulnerabilities in their defenses. This is the core tenet of a risk-managed approach to information and application security: assume you will be breached, and employ private, automated data intelligence to identify those breaches quickly and quietly.
