Cloud management headaches? How AIOps is a force multiplier

David Linthicum, Chief Cloud Strategy Officer, Deloitte Consulting

As multi-cloud and hybrid cloud become the norm, enterprises have come to understand that they need a tool set that sits above the cloud brands to manage all public and private clouds. 

The ideal tool set uses a single abstraction layer and a single pane of glass. This tool set should encompass traditional systems, such as mainframes, and existing LAMP-based servers, as well as the purpose-built systems and databases that currently run the business.

So, what tools work best as you move to complex, federated, and widely distributed cloud deployments with two or three major public cloud brands? How do you manage all these moving parts as if they were in a single cloud or data center?  

Understand that siloed management tools are not the answer. They were created to operate cloud-native services, and they worked fine for that task. But most don't have the ability to see outside of their native cloud, nor should they. 

In fact, cloud-native ops tools are not an option for managing complex cloud deployments. Third-party tools that span clouds are the best fit for the new CloudOps.

Here's what you need to know.

Modernize your thinking about cloud management

To manage all this effectively, here are some key points to keep in mind:

  • Cloud management is now a cross-cloud discipline, no longer a cloud-native one. This means rethinking everything you know about managing single public cloud deployments in order to deal with today's hybrid-cloud and multi-cloud systems. 
  • You need to learn how to plan and implement management and operations for all systems, cloud and non-cloud. Extend your operations capabilities to every system, whether it is a traditional system such as a mainframe, x86 cores in racks in a data center, or one of the public and private clouds, which typically number more than one. 
  • Leverage technology such as AIOps as a force multiplier for cross-cloud management. The only way to win this game is to work smarter, not harder. 

Tossing tools at the problem will only make complex and distributed deployments more complex to operate, especially if you use native tools on traditional or cloud-based platforms. This configuration will never scale. Here's how to approach the problem so it will.

Understand the current limitations

As covered in this recent whitepaper, six things are required for service-aware monitoring:

  1. The ability to monitor other cloud services from other public cloud providers
  2. Centralized event management, aggregation, consolidation, and analysis
  3. Support for traditional systems that are in wide use today such as mainframes and other core systems
  4. The ability to monitor inside virtual machines and application-level events
  5. The ability to monitor user experiences
  6. The ability to remediate across all cloud and traditional systems

It's also reasonable to believe that monitoring will extend to data-aware, platform-aware, network-aware, and other core resources that must be centrally monitored.

Core to the issues we want to solve (and to Nos. 1 and 2 on the list above) is the ability to centrally monitor all cloud services and address issues in any operational domain. This includes incident management, aggregation of operational data, consolidation of that data, and doing the right analysis—sometimes aided by AI systems—to determine the root cause. 

You also need the ability to self-heal using automated processes, such as automatically rebooting a server or automatically routing around a networking problem.
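
As a rough illustration of what centralized consolidation plus automated remediation might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Event` fields, the platform labels, the event kinds, and the `PLAYBOOK` mapping are hypothetical and do not come from any particular AIOps product. It collapses duplicate events gathered from several platforms and looks up an automated action for each one, leaving anything without a known fix to a human operator.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One operational event, normalized to a common shape."""
    source: str    # hypothetical platform label, e.g. "cloud-a" or "mainframe"
    resource: str  # hypothetical resource identifier
    kind: str      # hypothetical event type, e.g. "server_unresponsive"


# Hypothetical remediation playbook: event kind -> automated action name.
PLAYBOOK = {
    "server_unresponsive": "reboot_server",
    "network_path_down": "reroute_traffic",
}


def consolidate(events):
    """Collapse duplicate events into (event, count) pairs, most frequent first."""
    return Counter(events).most_common()


def plan_remediation(events):
    """Return (action, resource, source) for each distinct event with a known fix.

    Events with no playbook entry are skipped and left for a human operator.
    """
    plan = []
    for event, _count in consolidate(events):
        action = PLAYBOOK.get(event.kind)
        if action is not None:
            plan.append((action, event.resource, event.source))
    return plan


if __name__ == "__main__":
    feed = [
        Event("cloud-a", "vm-17", "server_unresponsive"),
        Event("cloud-a", "vm-17", "server_unresponsive"),  # duplicate alert
        Event("cloud-b", "vpc-2", "network_path_down"),
        Event("mainframe", "lpar-1", "disk_latency_high"),  # no automated fix
    ]
    for step in plan_remediation(feed):
        print(step)
```

In a real deployment the analysis step would be far richer (correlation, root-cause inference, possibly an AI model), but the shape is the same: one event format, one place to consolidate, and one place to decide on a fix.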

Why you need cross-cloud tools

The core lessons learned here are about centralization and independence. The native tools that are purpose-built to manage a single cloud brand cannot and should not be leveraged to manage other brands.

The reasons are twofold. First, a tool that can reach out to monitor and manage non-native cloud services must still do so from its own native cloud, so the operational tool is coupled to that cloud. Second, the tool therefore depends on that cloud being operational. 

Even if the tool is deployed from cloud brand A and is also gathering operational data from cloud brands B and C, when cloud brand A suffers an outage, the management and monitoring tool goes down with it, along with any ability to manage recovery from that outage. This is where cloud-native is a disadvantage, either as a single tool that supports a single cloud or as one that supports multiple clouds. All your operational eggs are in one basket.

These "ABC" tools are beginning to appear in the marketplace, and they will likely see limited play beyond enterprises that have a very specific type of multi-cloud deployment.

Features to look for

These are the attributes to look for in a centralized and independent CloudOps tool:

  • Platform options for the CloudOps tool, cloud and not; the tool should run on platforms that are independent of the systems under management. 
  • The tool must centrally gather data, fix issues, and learn by using an onboard AI system. This increases reliability and removes a great deal of risk.
  • Independence from specific cloud and non-cloud platforms. Cloud providers that offer tools to manage systems outside of their native cloud platform will claim a level playing field. Because those tools also manage their competitors' products, the providers have, at a minimum, a conflict of interest. 

Be more inclusive

Here's another issue: Existing systems have populated enterprise data centers for the last 20 years, everything from traditional mainframes to modernized mainframes to many x86 systems. How do we put them under this new operational management umbrella that now includes public clouds?

The advantage here is fewer tools to manage more types of systems. The core idea of emerging AIOps tools is that they abstract the resources. They treat each resource type, including storage, compute, database, and networking, the same way across platforms. They also support cloud, non-cloud, legacy, traditional, and basically all technology that must be operated to support the business. 

The AIOps tool accounts for the differences in the native resources, which makes life easier for the CloudOps team. It presents those resources through a common interface (a single pane of glass) that works the same way across different native systems, which simplifies a very complex operation. 
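
To make the abstraction idea concrete, the following Python sketch shows one way a common interface over different native resources could be structured. It is an assumption-laden illustration, not how any specific AIOps tool is built: the class names, the native state values (`"ACTIVE"`, an online flag), and the normalized statuses are all hypothetical. Each adapter hides how its platform reports state, so the ops team asks every resource the same question.

```python
from abc import ABC, abstractmethod


class ComputeResource(ABC):
    """Common interface the ops team sees, regardless of platform."""

    @abstractmethod
    def status(self) -> str:
        """Return a normalized status: 'running' or 'stopped'."""


class CloudAInstance(ComputeResource):
    """Adapter for a hypothetical cloud that reports state as a string."""

    def __init__(self, native_state: str):
        self.native_state = native_state

    def status(self) -> str:
        return "running" if self.native_state == "ACTIVE" else "stopped"


class MainframePartition(ComputeResource):
    """Adapter for a hypothetical traditional system that reports a flag."""

    def __init__(self, is_online: bool):
        self.is_online = is_online

    def status(self) -> str:
        return "running" if self.is_online else "stopped"


def fleet_status(resources):
    """Summarize every resource through the same call, whatever the platform."""
    return {name: res.status() for name, res in resources.items()}


if __name__ == "__main__":
    fleet = {
        "web-1": CloudAInstance("ACTIVE"),
        "batch-1": MainframePartition(False),
    }
    print(fleet_status(fleet))
```

Adding a new platform in this design means writing one more adapter; the code that consumes `fleet_status` does not change.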

The technology benefits are easy to define. The core business benefits would be:

  • Lower cost of operations: Fewer people are needed to deal with more complex deployments. Most of the cost and risk of operations is a function of its complexity. Remove the impact of complexity from the ops teams, and they can do more with less. 
  • Reduced downtime: Beyond its ability to abstract deployment complexity, the AIOps tool uses a centralized data-gathering mechanism paired with an AI engine and self-healing processes. Downtime drops further over time as the knowledge engine learns from experience. 

How to find the right tools

As covered in the whitepaper, if you understand all you can about the systems you plan to monitor, then selecting the right tool or tools is just a matter of mapping your requirements onto the right set of service-aware monitoring technology.

When selecting ops tooling, the paper recommends support for:

  • Heterogeneous operations across all platforms you've deployed, cloud or not
  • Observability across all systems and clouds
  • Integration with other tools, such as security and governance
  • Automated remediation or self-healing processes
  • Service-level monitoring, both at the service and microservice levels
  • Monitoring of virtual machines and the applications and data residing inside them
  • Integration of deep analytics and AI, so that the data you've gathered yields insights that help your systems improve over time
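
One simple way to apply a checklist like this when comparing candidate tools is to score each tool by how many of the requirements it covers and list the gaps. The Python sketch below is only an illustration under stated assumptions: the requirement labels are shorthand invented here for the seven items above, and the equal weighting is a simplification that a real evaluation would likely replace with priorities of its own.

```python
# Shorthand labels (hypothetical) for the seven requirements listed above.
REQUIREMENTS = [
    "heterogeneous_operations",
    "observability",
    "tool_integration",
    "automated_remediation",
    "service_level_monitoring",
    "vm_and_application_monitoring",
    "analytics_and_ai",
]


def coverage(tool_features):
    """Return (fraction of requirements met, list of unmet requirements).

    Every requirement counts equally; unmet items keep checklist order.
    """
    supported = set(tool_features)
    gaps = [req for req in REQUIREMENTS if req not in supported]
    met = len(REQUIREMENTS) - len(gaps)
    return met / len(REQUIREMENTS), gaps


if __name__ == "__main__":
    # Hypothetical candidate tool that covers only two requirements.
    score, gaps = coverage(["observability", "automated_remediation"])
    print(f"coverage: {score:.0%}")
    print("gaps:", gaps)
```

The list of gaps is usually more useful than the score itself, since a single missing requirement (for example, support for the traditional systems you still run) can rule a tool out.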

Keep future needs in mind

The trick to picking the right AIOps tool, or any ops tools, for that matter, is to consider not just the systems that are now under operations, but also where they are likely to go. While most observers expect the expanded use of cloud computing moving forward, a few customers have experienced an unexpected growth of traditional systems. 

A shift in the operational profile changes which systems are under management, as well as how they should be holistically managed. 

Also keep in mind that this is an evolving science. Management and monitoring have been around since computers and networks became commonplace for businesses. We have 30 to 40 years of experience in keeping systems running. It's time to combine that experience with the emerging trends toward heterogeneity and distribution. 

This new evolution in management and monitoring is best described as the ability to manage any number of systems with any number of resources using tools that simplify operations, even though complexity has reached an inflection point. This is where technology shines. 
