5 key AIOps capabilities that will boost your IT performance

Ericka Chickowski Freelance writer
 

AIOps—a practical blend of artificial intelligence and machine learning applied to the tools and practices of IT operations—promises to transform the way IT organizations manage their assets.

If your organization is beginning to explore how it can leverage AIOps to improve IT efficiency and system performance, here are the five AIOps capabilities you should consider building out first.

1. Dependency mapping

IT Ops teams can use AIOps to take information about systems, applications, and services and start mapping out the dependencies across numerous domains.

Dennis Drogseth, vice president of research at Enterprise Management Associates (EMA), wrote in a recent report on AIOps:

"One of the features that sets AIOps and advanced IT analytics overall apart from classic big data analytics is an awareness of interdependencies across the application/infrastructure." 

This is a huge boon for change management and configuration management, which have always suffered from a lack of real-time information on existing configurations and on the hidden interdependencies that could have a bearing on decisions about whether, when, or how to make system changes.

Organizations using the automated discovery and dependency mapping capabilities of AIOps can start to augment their configuration management database (CMDB) with enough up-to-date information to make the best-informed change management decisions.

Dependency mapping is crucial not just for change management, but also for a range of other cross-domain service management tasks that require deep visibility into the entire IT topology, including incident management, which stands as one of the most obvious use cases for AIOps today.

Michael Procopio, a product marketing manager for Micro Focus, said that AIOps systems make it possible to rank services by importance to the business, and that dependency mapping can then be used to automatically push system problems affecting those high-priority services to the top of operators' list of open tickets.

"This is your move to a service-centric approach."
Michael Procopio
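
To make the idea concrete, here is a minimal Python sketch of service-centric ticket ranking. It is an illustration only, not how any particular AIOps product works: the dependency map, the service priorities, and the ticket fields are all hypothetical. Each ticket is scored by the most important ranked service that depends, directly or indirectly, on the failing component.

```python
from collections import deque

# Hypothetical dependency map: each service lists the components it relies on.
DEPENDS_ON = {
    "checkout": ["payments-api", "web-frontend"],
    "payments-api": ["orders-db"],
    "web-frontend": ["core-switch"],
    "internal-wiki": ["wiki-db"],
}

# Hypothetical business ranking: a lower number means a more important service.
SERVICE_PRIORITY = {"checkout": 1, "internal-wiki": 5}


def affected_services(component):
    """Return every ranked service that depends on `component`, directly or not."""
    # Invert the map so we can walk from a component up to its dependents.
    dependents = {}
    for parent, children in DEPENDS_ON.items():
        for child in children:
            dependents.setdefault(child, []).append(parent)

    seen, queue = {component}, deque([component])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return {name for name in seen if name in SERVICE_PRIORITY}


def rank_tickets(tickets):
    """Sort tickets so those touching the most important services come first."""
    def score(ticket):
        hits = affected_services(ticket["component"])
        # Tickets that touch no ranked service sink to the bottom.
        return min((SERVICE_PRIORITY[s] for s in hits), default=99)

    return sorted(tickets, key=score)


tickets = [
    {"id": "T-1", "component": "wiki-db"},
    {"id": "T-2", "component": "orders-db"},
    {"id": "T-3", "component": "printer-7"},
]
ranked_ids = [t["id"] for t in rank_tickets(tickets)]
print(ranked_ids)
```

In this toy data, the ticket on `orders-db` rises to the top because the high-priority `checkout` service depends on it through `payments-api`.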

2. Event and incident management

The discipline of IT incident management offers some of the easiest ways to take advantage of AIOps capabilities. A hallmark of AIOps platforms is the ability to ingest monitoring data from a range of different sources and analyze it with a depth and speed-to-insight that human analysis cannot match. This gives IT teams a tremendous opportunity to break through some of the perennial barriers they face when managing IT incidents.

These include siloed data from dozens or even hundreds of different monitoring or logging platforms, crushing alert fatigue, and high numbers of dependencies that hide the ultimate source of an issue amid a myriad of related symptoms.

A survey from nonprofit group AIOps Exchange found that one in four organizations manage 50 or more different IT monitoring tools, while 40% of IT organizations deal with over a million event alerts a day. One of the most obvious uses for AIOps technology is to use it to automatically sift through alerted or ticketed events, de-duplicate tickets, correlate issues, prioritize important systems, and deliver fewer, more actionable alerts.

Rich Lane, senior analyst for Forrester Research, said AIOps can help your team stay on top of what matters most.

"You lose a core switch in the data center and you get 5,000 or 6,000 tickets opened or alerts, and you can't keep up with it. The machine learning bubbles up the important alerts to the surface to get that one alert you're supposed to be starting with."
Rich Lane
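
The core-switch scenario can be sketched in a few lines of Python. This is a simplified, rule-based stand-in for what an AIOps platform would learn from data: the topology map, device names, and alert fields are hypothetical. Identical alerts are de-duplicated, and each remaining alert is attributed to the highest upstream device that is itself alerting.

```python
from collections import Counter

# Hypothetical topology: device -> the upstream device it depends on.
UPSTREAM = {
    "web-01": "core-switch",
    "web-02": "core-switch",
    "db-01": "core-switch",
}


def probable_cause(device, alerting):
    """Walk upstream and return the highest ancestor that is also alerting."""
    cause, seen = device, {device}
    while device in UPSTREAM:
        device = UPSTREAM[device]
        if device in seen:  # guard against cycles in the topology data
            break
        seen.add(device)
        if device in alerting:
            cause = device
    return cause


def summarize(alerts):
    """Collapse raw alerts into (probable cause, alert count) pairs."""
    # De-duplicate: identical (device, message) pairs count once.
    unique = {(a["device"], a["message"]) for a in alerts}
    alerting = {device for device, _ in unique}
    # Correlate: attribute each unique alert to its probable cause.
    counts = Counter(probable_cause(device, alerting) for device, _ in unique)
    return counts.most_common()


alerts = [
    {"device": "web-01", "message": "unreachable"},
    {"device": "web-01", "message": "unreachable"},  # duplicate
    {"device": "web-02", "message": "unreachable"},
    {"device": "db-01", "message": "unreachable"},
    {"device": "core-switch", "message": "link down"},
    {"device": "mail-01", "message": "disk 91% full"},
]
summary = summarize(alerts)
print(summary)
```

Six raw alerts reduce to two starting points for the operator: the core switch, which explains four of the unique alerts, and an unrelated disk warning.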

According to a recent study conducted by Digital Enterprise Journal on behalf of Micro Focus, top-performing organizations using AIOps can automatically discard non-actionable alerts at a rate that's nearly double that of those not using the technology. This helps teams operate much more efficiently as they respond to incidents.

Matt Stratton, head of digital and data in the Americas for Orange Business Services, said that, without AIOps, an organization can have 10 people chasing 10 different symptoms, only to arrive at the same root cause.

"[With AIOps,] you can just have one person go to the root cause and take care of that. That way your team's freed up."
Matt Stratton

Not only is this efficient, with fewer people wasting time while handling incidents, but it also greatly increases the speed of response. The Digital Enterprise Journal study found that organizations using AIOps experience a 63% reduction in time to isolate root causes of cloud performance issues. As a result, top performers using AIOps are resolving incidents more quickly, posting a mean time to resolution that's nearly four times faster than that of all others, the DEJ study found.

3. Predictive maintenance and capacity management

AIOps isn't just about fighting immediate fires through incident management. Some of the biggest ROI results involving AIOps projects are from those that help IT organizations take more preemptive action, before an incident ever occurs.

The DEJ study found that 66% of organizations say their top goal for AIOps is to proactively prevent performance issues. They do this by using AIOps' predictive analytics capabilities to identify patterns in usage, compute cycles, application performance, and so on that could indicate impending component failures, temporary usage spikes, or other issues.

Forrester's Lane explained that many outages still occur because staff didn't see temporary spikes in usage coming.

"Predictive capacity management is one of the more powerful use cases, given that if you look through ITSM system data, we still have a lot of outages driven by running out of compute power."
Rich Lane

In this situation, an organization can leverage AIOps to automatically let an operator know that a service is likely to run out of capacity next Tuesday, while suggesting the option of approving an automatic spin-up of resources on Monday and an automatic spin-down of resources a few days later. "I don't want to have to remember that myself. I don't want to have to write a script to do that. I want the system to be smart enough to recognize it," Lane said.

He added that the capacity adjustment action can be automated, or it can offer a decision point to take action.
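
As a rough illustration of the forecasting step, the Python sketch below fits a straight line to recent daily usage and estimates how many days remain before a service hits its capacity limit. Real AIOps tools use far richer models that account for seasonality and spikes; the linear trend, the function names, and the seven-day warning window here are assumptions made for the example.

```python
import math


def days_until_full(daily_usage, capacity):
    """Estimate days until usage reaches capacity, using a least-squares trend line.

    Returns None when there is too little data or usage is flat or falling.
    """
    n = len(daily_usage)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity - intercept) / slope  # day index where the line hits capacity
    return max(0, math.ceil(day_full - (n - 1)))  # measured from the latest sample


def recommend(days_left, warn_within_days=7):
    """Turn the forecast into a suggestion that an operator can approve."""
    if days_left is not None and days_left <= warn_within_days:
        return f"Capacity likely exhausted in {days_left} day(s): approve scale-up?"
    return "No action needed"


usage = [50, 55, 60, 65, 70]  # hypothetical percent-of-capacity readings, one per day
days_left = days_until_full(usage, capacity=100)
print(days_left, "->", recommend(days_left))
```

Returning a suggestion rather than acting directly mirrors the decision point Lane describes; the same forecast could just as easily trigger the scale-up automatically.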

4. Automated remediation

The DEJ study found that three in four organizations ranked the importance of automation for IT performance as one of the highest drivers for implementing AIOps. Organizations seeking to get maximum value out of AIOps establish projects that use AIOps insights to drive automated remediation of issues caused by both current incidents and predicted problems.

These kinds of self-healing capabilities are a large part of the allure of AIOps. By drawing on the analytics and learning models of AI, they finally make good on the promise of automation that has loomed like a mirage for IT operations teams for years.

Nancy Gohring, senior analyst for 451 Research, said that fear of automation is a theme that people in IT Ops have been talking about for years.

"There's been this perennial concern you're going to accidentally kick off some sort of action that's going to make the problem worse or actually create new problems. But if you've got these machine-learning tools in place that are intelligently and accurately diagnosing a problem, then you can much more smartly initiate an automated response that can solve a problem for you."
Nancy Gohring
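
One common way to address that concern is to gate automation on how confident the diagnosis is. The Python sketch below shows the pattern under stated assumptions: the runbook entries, the issue names, and the 0.9 threshold are hypothetical, and the confidence score stands in for whatever a real machine-learning model would produce.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    issue: str         # label assigned by a (hypothetical) diagnostic model
    confidence: float  # model confidence between 0.0 and 1.0


# Hypothetical allowlist: only issues with a vetted fix may be automated.
RUNBOOK = {
    "disk_full": "rotate_logs",
    "service_hung": "restart_service",
}


def decide(diagnosis, auto_threshold=0.9):
    """Return (decision, action) for a diagnosis.

    - unknown issue           -> escalate to a human, no action proposed
    - known, high confidence  -> run the vetted action automatically
    - known, lower confidence -> propose the action and wait for approval
    """
    action = RUNBOOK.get(diagnosis.issue)
    if action is None:
        return ("escalate", None)
    if diagnosis.confidence >= auto_threshold:
        return ("auto_remediate", action)
    return ("ask_operator", action)


print(decide(Diagnosis("disk_full", 0.97)))
print(decide(Diagnosis("service_hung", 0.60)))
print(decide(Diagnosis("unknown_latency", 0.99)))
```

Keeping the allowlist separate from the confidence check means that even a very confident diagnosis cannot trigger an action nobody has vetted.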

5. IoT gets its own ops

Another use case where experts believe enterprises will increasingly need to apply AIOps is in managing Internet of Things devices in the field. David Linthicum, chief cloud strategy officer for Deloitte Consulting, said the complexity and volume of these devices make the operational management of them nearly impossible without AIOps.

"Whereas traditional data center IT systems number in the thousands or tens of thousands, IoT deployments require operations teams to manage hundreds of thousands of devices at a time. And also they're more problematic than things that sit in the data center. They typically don't operate in controlled environments."
David Linthicum

For example, a farming enterprise may have tens of thousands of sensors and pieces of farm equipment connected across thousands of farms throughout a country or the globe. They all report back to IT systems in the data center about where watering is occurring, where weeds are being picked, and so on. This data is fed into other automated systems and decision-making platforms for farmers and business leaders in the organization. If operations staff can't keep those field devices running optimally in inclement weather, then the farming organization might not water at the right places or right times, sow seeds at the perfect time, and so on.

"Remember that all of these things are interdependent, so if an IoT device stops transmitting information to some sort of centralized database, and if enough of them do that, then the application that you're leveraging those IoT systems for is going to cease to work," Linthicum explained.

With so many variables in play, operators need machine learning and advanced analytics to keep all of the metaphorical plates spinning.
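
Linthicum's point about devices that stop transmitting can be illustrated with a simple heartbeat check. This Python sketch is a deliberately basic, threshold-based stand-in for the analytics an AIOps platform would apply; the device names, the 15-minute timeout, and the 20% risk threshold are all assumptions.

```python
def silent_devices(last_seen, now, timeout_s=900):
    """Return the devices that have not reported within `timeout_s` seconds."""
    return sorted(dev for dev, ts in last_seen.items() if now - ts > timeout_s)


def fleet_at_risk(last_seen, now, timeout_s=900, max_silent_fraction=0.2):
    """Flag the fleet when too large a share of devices has gone quiet."""
    if not last_seen:
        return False
    silent = silent_devices(last_seen, now, timeout_s)
    return len(silent) / len(last_seen) > max_silent_fraction


# Hypothetical last-report timestamps, in seconds.
last_seen = {
    "sensor-a": 1000,
    "sensor-b": 100,
    "sensor-c": 990,
    "sensor-d": 995,
    "tractor-1": 50,
}
now = 1200
quiet = silent_devices(last_seen, now)
at_risk = fleet_at_risk(last_seen, now)
print(quiet, at_risk)
```

Here two of five devices have gone quiet, which exceeds the threshold and flags the dependent application as at risk before it stops working outright.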

"You have to really put your faith in AIOps, so you're able to leverage IoT and so you're able to monitor and manage and abstract yourself away from the complexity. It's a huge issue to solve as we're rolling out these devices—be they drones, be they tractors, or be they thermostats, or any of the thousands and thousands of applications of these connected devices."
David Linthicum

Common patterns emerge

At the end of the day, the earliest AIOps projects are all about consolidating IT Ops information streams and squeezing as much value out of them as possible.

"AIOps tools have some common patterns. No. 1, they're pulling information from multiple devices, and No. 2, they're making decisions on that information."
David Linthicum

Whether those decisions are to trigger an automated action, to correlate one event with another, to make a change in a specific way, or to preemptively change capacity, operators are able to act in a more informed and speedy manner than ever before.
