AI and cloud management: Key benefits, tools, and practices

David Linthicum Chief Cloud Strategy Officer, Deloitte Consulting

A movement is under way to integrate machine learning with multi- and hybrid-cloud management technology, especially with IT Ops analytics (ITOA).  So what's in it for you?

If you're in IT operations management, this is a good thing, as these new systems use an evolving learning model that stores experiences for later processing in much the same way humans do. 

In a world that demands continuous delivery, what are the benefits from mixing machine learning with multi-cloud and hybrid-cloud management?

Here are five key wins, and what your team needs to know about the tools and practices.

1. Persistent learning

The chief advantage is the persistent learning that takes place in management and monitoring solutions. Machine learning continuously stimulates the public, private, and traditional systems under management, and then learns from the results of that stimulus.

If you hire experienced people, they can use their experience and knowledge to spot potential problems before they become problems. Similarly, machine learning (when mixed with monitoring and management) can build upon an ongoing learning model that simulates hundreds of experts monitoring the systems at all times. This is the reason to mix machine learning with monitoring and management. 

2. More predictive and proactive systems

You want a monitoring and management system that, like experienced people, can find problems before they become problems. For instance, the automated learning model will understand that end-of-month processing often saturates existing compute servers in the public cloud that are under management, so it will preemptively add servers to the cluster to handle the spike in load, and then return those servers to the pool once they are not needed. 
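The end-of-month example can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring product works: the function names, the 25 percent headroom, and the per-server capacity figure are all assumptions made for the example, and a simple per-day average stands in for a real learning model.

```python
from collections import defaultdict
from math import ceil
from statistics import mean


def learn_daily_profile(history):
    """Average the observed load for each day of the month.

    history: iterable of (day_of_month, load) pairs (hypothetical format).
    """
    by_day = defaultdict(list)
    for day, load in history:
        by_day[day].append(load)
    return {day: mean(loads) for day, loads in by_day.items()}


def servers_needed(expected_load, capacity_per_server, headroom=0.25):
    """Servers required to carry the expected load plus a safety margin."""
    return max(1, ceil(expected_load * (1 + headroom) / capacity_per_server))


def plan_scaling(profile, day, current_servers, capacity_per_server):
    """Return how many servers to add (positive) or release (negative)."""
    expected = profile.get(day, 0.0)
    return servers_needed(expected, capacity_per_server) - current_servers


# Two months of made-up observations: quiet mid-month, busy at month end.
history = [(15, 100), (15, 120), (30, 900), (30, 1100)]
profile = learn_daily_profile(history)

month_end_delta = plan_scaling(profile, day=30, current_servers=4, capacity_per_server=100)
mid_month_delta = plan_scaling(profile, day=15, current_servers=4, capacity_per_server=100)
```

A positive result means servers should be added ahead of the spike; a negative result means servers can go back to the pool.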

The ability to remove or add cloud or non-cloud resources as needed, and have those resources configured, managed, and monitored nearly automatically, abstracts change from those charged with Cloud Ops, meaning that they are protected from volatility. They can focus on high-level activities because automated learning takes over the tactical tasks such as provisioning and configuration. 

3. More consistent operations

Every Cloud Ops team does things differently. This includes ITOA, and teams can even come to different conclusions based upon the analytics presented. For instance, Harry believes that a saturation of the network means restarting the hubs, whereas Sally thinks it means shutting down some compute instances. However, the right answer is to reconfigure the hubs on the fly. 

Because people act on their own individual experience, the result is a pattern of inconsistency. Machine learning, however, leverages a central brain that does things consistently, and hopefully in the right way. This means you can expect the same types of resolutions from a similar set of stimuli and circumstances.

4. Shared learning

Since machine-learning models are built by enterprises through months of experiences and learning, they may be shared with other core systems that need the same types of monitoring and management. In other words, you can copy the learning models over to other monitoring and management instances, and apply all of the experiences learned in the other parts of the business. This is analogous to cloning your monitoring and management expert and moving him or her to another part of the company.
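As a rough sketch of what copying a learning model to another monitoring instance could look like, the Python below trains a very simple baseline model, serializes its learned state to JSON, and rebuilds it elsewhere. The BaselineModel class, its methods, and the "cpu" metric are hypothetical names invented for this example; real tools will have their own model formats.

```python
import json
from statistics import mean, pstdev


class BaselineModel:
    """Learns a mean and standard deviation per metric (illustrative only)."""

    def __init__(self, baselines=None):
        self.baselines = dict(baselines or {})

    def fit(self, samples):
        """samples: mapping of metric name to a list of observed values."""
        for metric, values in samples.items():
            self.baselines[metric] = {"mean": mean(values), "std": pstdev(values)}
        return self

    def is_anomalous(self, metric, value, threshold=3.0):
        """Flag values more than `threshold` standard deviations from the mean."""
        baseline = self.baselines[metric]
        if baseline["std"] == 0:
            return value != baseline["mean"]
        return abs(value - baseline["mean"]) / baseline["std"] > threshold

    def to_json(self):
        """Export the learned state so another instance can reuse it."""
        return json.dumps(self.baselines)

    @classmethod
    def from_json(cls, payload):
        """Rebuild a model from state exported by to_json()."""
        return cls(json.loads(payload))


# Train in one part of the business, then clone the experience elsewhere.
trained = BaselineModel().fit({"cpu": [40, 50, 60]})
clone = BaselineModel.from_json(trained.to_json())
```

The clone starts with everything the original learned, so it can judge new readings without repeating the months of training.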

5. You'll do a much better job with ITOA 

When IT Ops analytics leverages machine learning, these activities can be automated and scaled. Keep in mind that this will mean a shift from logging just a few megabytes of data per day to gigabytes. While the information gathered is meaningful to system monitoring and management tasks, it’s becoming way too much for one or two humans to process and understand in the context of keeping the systems healthy and doing so proactively.

By leveraging machine learning for this process, no matter how many data streams you have coming in from the systems under management, the machine-learning engine should be able to keep track of the complex array of data streams and understand what they mean, in terms of actions required now or in the near future. Also, machine learning will understand trends and produce predictive analytics.
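To make the idea of tracking many data streams and spotting trends concrete, here is a small Python sketch. It keeps a sliding window per stream, fits a least-squares slope to each window, and projects how many more samples it would take to reach a limit. The StreamTracker class, the window size, and the "disk_used_pct" stream are assumptions for illustration; a production engine would use far richer models.

```python
from collections import defaultdict, deque


class StreamTracker:
    """Keeps a sliding window of recent values for each named stream."""

    def __init__(self, window=5):
        self.windows = defaultdict(lambda: deque(maxlen=window))

    def observe(self, stream, value):
        self.windows[stream].append(value)

    def slope(self, stream):
        """Least-squares slope of the window: change per sample."""
        ys = list(self.windows[stream])
        n = len(ys)
        if n < 2:
            return 0.0
        x_mean = (n - 1) / 2
        y_mean = sum(ys) / n
        num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
        den = sum((x - x_mean) ** 2 for x in range(n))
        return num / den

    def steps_until(self, stream, limit):
        """Samples remaining before the trend reaches `limit`, or None."""
        ys = self.windows[stream]
        if not ys:
            return None
        rate = self.slope(stream)
        if rate <= 0:
            return None  # flat or falling: no projected breach
        return max(0.0, (limit - ys[-1]) / rate)


tracker = StreamTracker(window=5)
for reading in (10, 20, 30, 40, 50):
    tracker.observe("disk_used_pct", reading)
for reading in (5, 5, 5):
    tracker.observe("queue_depth", reading)

remaining = tracker.steps_until("disk_used_pct", limit=90)
```

The same tracker handles any number of streams, which is the point: the bookkeeping that overwhelms one or two people is trivial for software.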

Should you adopt it? Challenges and opportunities

Looking forward, there’s a lot of work required to get ready for the use of machine learning with hybrid-cloud and multi-cloud operations. Although the tools are beginning to show up from management and monitoring providers, they need time to mature before they are ready for prime time. Providers need time to integrate the machine-learning technology with traditional tools, and, more importantly, IT Ops needs time to learn how to use these tools.

This will be both a major opportunity and a major disruptor in the management and monitoring space. But as mixed cloud and on-premises deployments get more complex, this technology seems to be arriving just in time to meet enterprise monitoring and management requirements.

There are some downsides to leveraging these tools now or in the future. The largest is just the cost of buying and deploying machine-learning technology, which is about twice that of traditional technology. You need to set up the learning models and the data points, as well as test before deployment. Moreover, machine-learning professionals are costly, and they’ll have to spend a lot of time up front to ensure that you get the initial models right. Once those are established, it’s a pretty automated process from there.

Another downside is the possibility that you might misapply this technology. While machine learning is hot, and this seems like a cool solution, it’s not right for all problem domains. Applying it to management and monitoring within relatively simple and static architectures could be overkill.

Applying a complex ITOA solution to a simple problem means you’ll spend more money for no benefit, since your problem probably didn’t need an AI solution in the first place.

Finally, sometimes there are not enough data points to justify a machine learning–based management and monitoring tool. Machine learning is fed by data. The more data that machine-learning engines accrue, the better they work. If you don’t have the ability to gather data from your existing cloud or non-cloud solutions, and data collection retrofitting is out of the question and budget, then this may not be the right technology for you.

Your machine-learning engine won’t have the right value without real-time data being fed into it, and the ability to learn from the data. A good piece of advice is to put your data collection infrastructure together prior to moving into machine learning–driven management and monitoring.

IT operations is heading toward increased complexity. You're probably already standing up public and private clouds in hybrid-cloud and multi-cloud configurations that are more complex to manage. Yes, the addition of cloud computing does drive more productivity, agility, and operational cost reduction, but management, monitoring, security, and governance all become more difficult.

Machine learning: No magic bullet

While many view any shiny new technology that comes down the line as a magic bullet to solve an impending problem, there are good reasons to use machine-learning technology, and other good reasons not to use it.

Machine learning is most effective and cost-justifiable if you have the needed data gathering and monitoring already in place. Again, machine learning–based management and monitoring are worthless without the data that feeds the brain—the data that allows it to adjust, react, self-heal, and keep learning how to do all this better over time.

One of the biggest mistakes enterprises typically make around the use of this technology, whether or not it’s machine learning in a management and monitoring use case, is data starvation. A learning system needs stimulus, or data gathered from your environment, to learn how to react properly. Machine learning–enabled monitoring and management tools are no different.

Access to data feeds, both real-time and historical, must be on the critical path, as well as the ability to automate the management and monitoring of all non-cloud and cloud-based resources. Also, decide how deep to go. Many enterprises are okay with just monitoring infrastructure such as networks, storage, compute, etc. However, there is a business case to be made for monitoring and management tools that go down to the application and database levels as well. 

Perhaps it makes no sense to build, deploy, and leverage microservices unless you’re willing to monitor them on an ongoing basis. Keep in mind that you could have a machine learning–enabled management and monitoring tool set that keeps track of 10,000-plus data points and historical data from those data points. That’s much too complex for a single human to keep up with.

Make your move when complexity demands it

The question is not whether you need this technology; it’s when. Most of those who push out management and monitoring tools have already begun looking at machine-learning technology to augment capabilities, or they’ve already integrated it within their offerings. It’s not a trend, but a solution to a problem that almost all enterprises face: how to deal with complexity. 
