Weaponized machine-learning tool adds punch to pen testing

Dan Petro Lead Researcher, Bishop Fox
Gavin Stroy Senior Security Analyst, Bishop Fox

As a pen tester, what do you do when you're trying to hack into the external web perimeter of a massive company and the scope entails more than 100,000 domains and machines to inspect? All the low-hanging CVE fruit has long been picked clean by automated scanners, so there are no obvious ways in. Yet ways in do exist, even if you don’t know exactly what you’re looking for.

You know it when you see the kind of page that will still bear fruit: some old-looking custom web app that can be exploited, some administration page with a login that could be brute-forced, something “interesting.” But how do you get to the “interesting” more quickly? There’s too much to test, and not enough time.

The answer lies in screenshots. While large web perimeters return too many screenshots of their pages for a human to review, a neural network can visually identify the web pages most likely to contain a vulnerability. And that’s where Eyeballer comes in: it’s the engine that brings the “interesting” to the testers.

What's in a screenshot?

Introduced at Black Hat, Eyeballer is an AI-powered tool designed to assist penetration testers in assessing the external perimeter of large-scale engagements. Unlike traditional web scanners, Eyeballer can "look at" rendered web pages to identify the ones that are most likely to contain actionable leads.

This can be very useful in cases where fuzzy logic needs to be applied, for example when seeking out sites that look old, or ones that look like a homepage. Currently, penetration testers perform this task in a laborious manual process, looking at each screenshot one by one. But with the latest advancements in machine learning and deep neural networks, the heavy lifting of visually inspecting web pages can now be automated.

While Eyeballer is a hacking tool, it notably doesn't actually "hack into" anything. Its whole job is to identify which screenshots most likely indicate vulnerabilities, and then present those results for a human hacker to review.

Eyeballer doesn't replace traditional web scanners. Instead, it’s designed to be used in conjunction with them to help focus manual review efforts.

How Eyeballer works

Take the following website, for example:

Figure 1. The 2001 version of makemytrip.com, practically daring you to hack it

It looks old, doesn’t it? Super old. The blocky frames, simplistic design, and general “dot-com bubble” aesthetic. You know what else old websites have? Vulnerabilities. Lots of them. Finding old-looking websites is a huge deal when trying to find that initial foothold into a company’s environment.

And this is where traditional web scanners fail. You can’t make a signature for something that “looks old.” By using machine learning, however, Eyeballer can find these bug-ridden fossils of a website automatically.

Another common and useful example is the custom 404 error page. A web scanner may report that a page exists. However, when users manually visit the page, they are presented with a cutesy image captioned with "Page not found." Check out the website below:


$ curl -I http://517japan.com/thispagedoesnotexist
HTTP/1.1 200 OK
Server: nginx/1.8.0

Figure 2. Custom 404 error page responding with 200 OK

As you can see, this is a 404 error page. But interestingly, the page actually returns an HTTP 200 status code, which throws off automated scanning. Furthermore, the digits “404” never actually appear anywhere in the page's text! They show up only in a mostly obscured image that is meant to be readable by humans, not machines.

But your brain had no trouble looking at that image and identifying it as a 404 error page, right? That’s because the information is there; it’s just not a task that heuristic-based signatures are suited for solving. But an AI can identify these, no problem.
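To make the failure mode concrete, here is a minimal sketch of the kind of signature check a scanner might apply. The function name, the marker strings, and the sample HTML are illustrative assumptions, not taken from any real scanner or from the page shown above.

```python
def signature_says_missing(status_code: int, body: str) -> bool:
    """Naive heuristic (hypothetical): call a response 'not found' only if
    the status code or the visible text says so."""
    markers = ("404", "not found")
    text = body.lower()
    return status_code == 404 or any(marker in text for marker in markers)


# A "soft 404": the server answers 200 OK and the error message lives
# only inside an image, so neither the status check nor the text check fires.
soft_404_body = '<html><body><img src="/img/oops.png"></body></html>'

print(signature_says_missing(200, soft_404_body))         # False
print(signature_says_missing(404, "<h1>Not Found</h1>"))  # True
```

The first call returns False, so a scanner built on this check would record the page as a live hit. Adding more marker strings does not help when the message exists only as pixels.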

Under the hood

Eyeballer works by looking at screenshots of web pages that have been automatically taken by a screen capture tool. It then runs each screenshot through a deep convolutional neural network (CNN). From there, Eyeballer returns a confidence score for each type of feature it has recognized in the page.
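As a rough illustration of how per-label confidence values could drive triage, the sketch below tags each screenshot and ranks screenshots for one label. The label names, file names, scores, and the 0.5 threshold are all made up for the example; the tool's actual output format may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: label name -> confidence between 0 and 1.
Scores = Dict[str, float]


def tags_for(scores: Scores, threshold: float = 0.5) -> List[str]:
    """Return the labels whose confidence meets the threshold, sorted by name."""
    return sorted(label for label, conf in scores.items() if conf >= threshold)


def rank_for_review(results: Dict[str, Scores], label: str) -> List[Tuple[str, float]]:
    """Order screenshots so the most confident hits for one label come first."""
    pairs = [(name, scores.get(label, 0.0)) for name, scores in results.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Invented scores for three invented screenshots.
results = {
    "intranet.png": {"login": 0.97, "old_looking": 0.81, "custom_404": 0.02},
    "promo.png": {"login": 0.04, "old_looking": 0.10, "custom_404": 0.01},
    "missing.png": {"login": 0.08, "old_looking": 0.35, "custom_404": 0.93},
}

print(tags_for(results["intranet.png"]))          # ['login', 'old_looking']
print(rank_for_review(results, "custom_404")[0])  # ('missing.png', 0.93)
```

Because a page can carry several tags at once (an old-looking page with a login prompt, say), each label is thresholded independently rather than picking a single winner.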

"Does this page have a login prompt?" "Is this a custom error page?" "Does this web page look like it was made in the early 2000s?" These are all questions that Eyeballer tries to assess.

CNNs work by taking an image and breaking it up into chunks. At each layer of the network, it groups neighboring chunks together and looks at their pixel values. It then takes what it learns from that chunk and passes it as the input to the next layer. This process is repeated for each chunk of the image.
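The "chunk" operation described here is a convolution: a small grid of weights slides across the image and produces one number per position. The following is a generic sketch in plain Python, not Eyeballer's code. The tiny image and the hand-picked kernel are assumptions chosen to keep the arithmetic easy to follow; in a real CNN the kernel weights are learned during training.

```python
from typing import List

Grid = List[List[float]]


def convolve2d(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over every position where it fits fully ("valid" mode)
    and record the weighted sum of the pixels under it.
    Like most deep-learning libraries, this does not flip the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# 2x2 kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is nonzero only in the middle column, where the kernel straddles the edge. A later layer would take this feature map as its input and combine such responses into larger patterns.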

Figure 3. CNN layout. SOURCE: Courtesy of www.mdpi.com (Creative Commons)

In its first few layers, a CNN can only recognize basic features: lines, curves, and angles. In the next few layers, it will start to recognize shapes such as circles and squares. Toward the end, the network will be able to recognize generalized features of a class of images. In Eyeballer terms, this means it can recognize that two input boxes next to a submit button indicate a login form.

Figure 4. Heat map of a login page produced from Eyeballer

The above image shows a “heat map” of where on the image the AI thinks the login prompt exists (the purple sections are “highly login-like”). It’s produced by repeatedly covering up sections of the image to see which parts matter most to the page being classified as a login page. This kind of visualization is useful in debugging the tool, and for getting a better idea of what Eyeballer is thinking.
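The covering-up procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions: the tile size, the fill value, and especially `toy_login_score` (a stand-in that replaces the trained network) are invented for the example and are not how Eyeballer itself is implemented.

```python
from typing import Callable, List

Grid = List[List[float]]


def occlusion_heat_map(
    image: Grid,
    score: Callable[[Grid], float],
    patch: int = 2,
    fill: float = 0.0,
) -> Grid:
    """Blank out one tile at a time and record how far the score drops.
    A large drop means that tile mattered to the prediction."""
    baseline = score(image)
    rows, cols = len(image), len(image[0])
    heat = []
    for top in range(0, rows, patch):
        heat_row = []
        for left in range(0, cols, patch):
            occluded = [row[:] for row in image]  # copy so the original is untouched
            for r in range(top, min(top + patch, rows)):
                for c in range(left, min(left + patch, cols)):
                    occluded[r][c] = fill
            heat_row.append(baseline - score(occluded))
        heat.append(heat_row)
    return heat


def toy_login_score(image: Grid) -> float:
    """Stand-in for a trained model: mean brightness of the lower-right 2x2 block."""
    block = [image[r][c] for r in (2, 3) for c in (2, 3)]
    return sum(block) / len(block)


# 4x4 image whose only bright region is the lower-right corner.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

heat = occlusion_heat_map(image, toy_login_score)
print(heat)  # [[0.0, 0.0], [0.0, 1.0]]
```

Only the lower-right tile changes the score when hidden, so it is the only "hot" cell. With a real model and a full-size screenshot, the same loop yields a coarse map that can be overlaid on the image.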

How dependable is Eyeballer?

Eyeballer can currently recognize several categories of web pages, including custom 404 pages, login pages, homepages, and old-looking websites. On a real-world evaluation dataset that Eyeballer has not been trained on, the latest version reaches about 92% overall accuracy.

As it stands now, Eyeballer is a valuable addition to the pen-testing toolkit, especially when used to augment existing scanning practices. Our research team is still actively working on adding to the existing dataset to include many more important categories. Future enhancements include improved accuracy, more granular identification buckets, and better integration with existing web scanners.

Eyeballer was released as open-source software on the Bishop Fox GitHub page at the company's Black Hat USA Tools Arsenal presentation earlier this month. 
