Cloud performance management: 3 secrets to success

As more applications migrate to the cloud, effective performance management is becoming a top concern.

Take the case of a small Midwestern bank that moved to the cloud. It migrated several business-critical applications to a public cloud in hopes that performance would increase and costs would go down.

Like many businesses that move to the cloud, the bank was in for a surprise. The applications ran slower on the cloud-based platforms, and the bank discovered that its decision to keep the data on premises had been a bad call. Employees and customers were angry that completing their work or making transactions on the bank's website took longer than it had before the move.

So, what happened? The root cause was a failure to understand overall performance, and the resulting lack of cloud performance management was the mistake that locked in the poor performance. The bank's IT team did not know the underlying secrets of cloud performance management (CPM) or what was needed to avoid such issues in the future.

For the bank and everyone else, understanding how to approach CPM first requires understanding the core patterns of management: using agents or not using agents, being predictive versus reactive, and focusing on patterns of response, including self-correcting systems.

Here are three secrets to success with cloud performance management.

Take the right approach to CPM

There are a few ways to approach CPM, as both a logical and a physical concept. You first must develop a conceptual understanding of what CPM needs to be about. This means building an approach and architecture rather than just tossing tools at the problem.

You do this for a few reasons: First, many mistakes are made when enterprise IT moves to tools and technology too quickly. Tools are important, but they won't save you if you make incorrect choices because you do not yet understand the problem domain that CPM will serve. 

Second, a conceptual approach stays durable over time. In other words, while tools and technologies will change as updates occur or companies go out of business, the overall CPM concept should remain consistent.

Bottom line: Define the approach first, then add only the tools and technology necessary to support it. While this seems simple, when you consider that the tool stack will be different from company to company and cloud to cloud, it presents a set of tough choices in terms of the required monitoring, management, and self-healing capabilities.

Consider the tradeoffs

The core tradeoff with CPM is the ability to proactively find and solve performance problems versus the cost of doing so. 

Many people say, "With enough time and money, most problems can be solved." While that's true with CPM, try to tell your CFO that your budget will triple this year.

In the real world, CPM usually means doing the best you can with the resources you have. While CPM can be an issue when both people and tools are underfunded, part of the problem is finding a happy medium between effectiveness and costs. 

As a rule of thumb, you should spend about 15% of the total operational costs of private and public clouds on CPM operations. If you underspend, you probably aren't investing enough to find and solve performance problems proactively. Overspend, and those who create and deploy the CPM plan and tools will need to find more cost efficiencies.
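To make the rule of thumb concrete, here is a minimal Python sketch of the arithmetic. Only the 15% target comes from the text above; the function name, the tolerance band of three percentage points, and the dollar figures are assumptions chosen for illustration.

```python
# Rule-of-thumb CPM budget check.
# The 15% target is from the article; the tolerance band and all
# dollar figures below are illustrative assumptions.
CPM_TARGET_SHARE = 0.15


def cpm_budget_status(total_cloud_opex, cpm_spend, tolerance=0.03):
    """Return (target_spend, actual_share, verdict) for a CPM budget."""
    if total_cloud_opex <= 0:
        raise ValueError("total_cloud_opex must be positive")
    target_spend = total_cloud_opex * CPM_TARGET_SHARE
    actual_share = cpm_spend / total_cloud_opex
    if actual_share < CPM_TARGET_SHARE - tolerance:
        verdict = "underspending"
    elif actual_share > CPM_TARGET_SHARE + tolerance:
        verdict = "overspending"
    else:
        verdict = "on target"
    return target_spend, actual_share, verdict


# Hypothetical company: $2M in yearly cloud operations, $180K spent on CPM.
target, share, verdict = cpm_budget_status(2_000_000, 180_000)
print(f"target ${target:,.0f}, actual {share:.0%}, {verdict}")
```

The tolerance band matters more than the exact target: it keeps the check from flagging a budget that is only a point or two away from 15%.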

Another tradeoff is technology configuration. As with the bank example above, those who choose to leave the data on premises with the processing in the cloud will find that latency over the open Internet is the price of keeping the data on-site. The question is, Are these tradeoffs acceptable, or are you trading performance for peace of mind?
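A rough back-of-the-envelope model shows why the bank's split configuration hurt. The sketch below assumes a transaction's time is its compute time plus one network round trip per data access; the round-trip counts, latencies, and compute times are invented for illustration, not measurements from the bank.

```python
def transaction_time_ms(compute_ms, round_trips, rtt_ms):
    """Estimate transaction time as compute time plus data-access latency.

    Assumes each data access costs one full network round trip and that
    accesses happen one after another (no batching or parallelism).
    """
    return compute_ms + round_trips * rtt_ms


# Hypothetical figures: 40 data accesses per transaction.
# Before the move: app and data share a data center (0.5 ms round trip).
before_move = transaction_time_ms(compute_ms=120, round_trips=40, rtt_ms=0.5)

# After the move: faster cloud compute, but data stays on premises
# and every access crosses the open Internet (30 ms round trip).
after_move = transaction_time_ms(compute_ms=80, round_trips=40, rtt_ms=30)

print(f"before: {before_move:.0f} ms, after: {after_move:.0f} ms")
```

Under these assumptions the faster cloud processors save 40 ms per transaction while the network adds more than a second, which is the shape of the surprise described above.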

CPM's impact on efficiency

CPM itself may cause its own performance issues. Monitoring, management, real-time analytics, and the use of agents can take up enough processor and I/O time to cause poor performance themselves.

This is the case about 30% of the time, when CPM tools and technology are layered into a set of cloud-based systems and all of the logging and analytics options are turned on. Things slow down because the monitoring workload competes with the core applications for CPU and I/O resources. If you think it's ironic that CPM tools and technology slow down the cloud-based systems, you're right.
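One way to keep this in check is to measure the same workload with and without the CPM tooling enabled and compare throughput. The following is a minimal sketch of that comparison; the 5% overhead budget and the requests-per-second figures are assumptions, not recommendations from any vendor.

```python
def monitoring_overhead(baseline_rps, monitored_rps):
    """Fraction of throughput lost when monitoring is switched on."""
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    return (baseline_rps - monitored_rps) / baseline_rps


def within_overhead_budget(baseline_rps, monitored_rps, max_overhead=0.05):
    """True if monitoring costs no more than the allowed share of throughput."""
    return monitoring_overhead(baseline_rps, monitored_rps) <= max_overhead


# Hypothetical load-test results, in requests per second.
full_logging = monitoring_overhead(baseline_rps=1000, monitored_rps=820)
sampled_logging = monitoring_overhead(baseline_rps=1000, monitored_rps=970)

print(f"full logging costs {full_logging:.0%} of throughput")
print(f"sampled logging costs {sampled_logging:.0%} of throughput")
```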

It's the business, stupid

IT's objective is to meet the needs of the business, and the tasks necessary to leverage CPM are no different. You need to understand the business requirements to ensure that what is important is treated as important. 

Set priorities. Tag the applications that are most important to the business, and decide which ones get the highest priority for CPM, considering that more will be lost if they do not perform up to expectations.

It's common for those charged with CPM to treat each application the same. In reality, some applications are more important than others, and they should be put at the top of the priority list when performance issues are diagnosed. Those issues can then be resolved by automated mechanisms or by humans.

A prioritized CPM approach and tooling should directly reflect the needs of the business.
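As a minimal sketch of what a prioritized approach could look like in tooling, the snippet below orders open performance issues by a business-assigned priority tag first and by severity second. The `PerfIssue` type, the priority scale, and the application names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfIssue:
    app: str
    business_priority: int  # 1 = most critical; assigned by the business
    slowdown_pct: float     # observed degradation versus the baseline


def triage(issues):
    """Order issues by business priority first, then by worst slowdown."""
    return sorted(issues, key=lambda i: (i.business_priority, -i.slowdown_pct))


# Hypothetical open issues at a bank.
issues = [
    PerfIssue("internal-wiki", 3, 80.0),
    PerfIssue("online-banking", 1, 25.0),
    PerfIssue("loan-origination", 2, 40.0),
    PerfIssue("payments", 1, 60.0),
]

for issue in triage(issues):
    print(issue.business_priority, issue.app, f"{issue.slowdown_pct:.0f}%")
```

Note that the wiki has the worst slowdown but lands last in the queue, because the business tag outranks raw severity.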

I/O meets CPU

You can boil the hundreds of functional applications and data stores that make up the notion of performance down to two fundamental building blocks of CPM: I/O and CPU. 

When you do CPM, you can monitor I/O (storage) and CPU (processing) separately, which is how it was done back in the on-premises days. Overall cloud performance is limited by whichever of the two is the bottleneck and by how the application workloads use each one.
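At the level of a single task, a crude way to separate the two is to compare CPU time with elapsed wall-clock time. The sketch below uses only Python's standard `time` module; the 0.7 cutoff is an arbitrary assumption, and time spent off the CPU can be I/O, sleeping, or waiting on locks, so treat the result as a hint rather than a diagnosis.

```python
import time


def measure(func):
    """Run func once and return (cpu_seconds, wall_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


def classify(cpu_seconds, wall_seconds, cpu_bound_ratio=0.7):
    """Label a task by the share of wall-clock time it spent on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    if cpu_seconds / wall_seconds >= cpu_bound_ratio:
        return "cpu-bound"
    return "io-or-wait-bound"


# A sleep stands in for a slow storage call: lots of waiting, almost no CPU.
cpu, wall = measure(lambda: time.sleep(0.05))
print(classify(cpu, wall))
```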

Also think about CPM at a less primitive level, looking at the functions of the applications themselves, not just how applications leverage the platform subsystems. This is more complex than with on-premises systems, considering that the resources come from a pool that leverages a multi-tenant approach.

The result is that your application's performance may vary based on the time of day, because applications leverage resources with performance characteristics that constantly change. Therefore, your CPM should be set up and operated with these assumptions, and it should be able to diagnose performance issues at both the primitive platform level and the application level. Expect conclusions and recommended fixes to be difficult to determine.
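One way to build the time-of-day assumption into CPM is to compare each measurement against a baseline for the same hour rather than against a single global threshold. Here is a minimal sketch using only the standard library; the z-score cutoff of 3 and the sample latencies are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def hourly_baseline(samples):
    """Build {hour: (mean_ms, stdev_ms)} from (hour_of_day, latency_ms) pairs.

    Hours with fewer than two samples are skipped, since a standard
    deviation cannot be computed for them.
    """
    by_hour = defaultdict(list)
    for hour, latency_ms in samples:
        by_hour[hour].append(latency_ms)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomalous(baseline, hour, latency_ms, z_threshold=3.0):
    """True if latency deviates strongly from the norm for that hour."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare with
    mu, sigma = baseline[hour]
    if sigma == 0:
        return latency_ms != mu
    return abs(latency_ms - mu) / sigma > z_threshold


# Hypothetical history: quiet mornings, busy shared infrastructure at 14:00.
history = [(9, 200), (9, 210), (9, 190), (14, 400), (14, 420), (14, 380)]
baseline = hourly_baseline(history)

print(is_anomalous(baseline, 9, 400))   # 400 ms at 09:00
print(is_anomalous(baseline, 14, 400))  # 400 ms at 14:00
```

The same 400 ms reading is flagged in the morning and accepted in the afternoon, which is the behavior a multi-tenant platform calls for.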

In many cases, applications are not written for the cloud. The result is an application that can't properly leverage the primitive platform features. The end recommendation to solve a performance problem: Refactor or modify the applications to work around these performance issues.

The three-step plan

To remedy these fundamental problems of cloud performance, here's what to keep in mind:

1. Make sure to start CPM from a logical level, not a physical one  

You risk making huge mistakes when you just toss tools at the problem. Let’s face it: Most IT pros would rather evaluate tools than try to understand core problems and create a conceptual framework to solve them. 

Each cloud domain is unique. The tools other companies pick are unlikely to be the tools you need. Work from the abstract to the physical to define your macro requirements and micro requirements, as well as future patterns.

2. Consider the impact of CPM on performance  

It's a bit disturbing that the CPM system itself could cause cloud computing performance issues, but that’s often the case. Why? Enterprises are more likely to turn on everything, including the use of software agents that exist on the same platform as the applications, or log everything, or perform other invasive activities that take more CPU and I/O time, competing for resources with the applications themselves.

There are tricks to this, of course. For example, you could move the logging and agents to a separate machine instance, or turn off the more invasive, performance-killing features of the CPM tools. But that can cost more money. 

What you gain is real-time performance data that lets you spot and correct performance issues, including cases where the system that gathers and analyzes the performance data is itself the root cause of the problem.

3. Continuously improve  

If you’re doing CPM the same way and using the same tools as you did two years ago, address that first. CPM is about balancing performance with cost and business needs. You'll need to adjust as you go.

For instance, turn off logging on the machine instance that stores the data to increase I/O performance, and log storage behavior from another machine instance. This performance issue was probably discovered through trial and error, and you should discover similar issues on a weekly basis. In other words, you need to continuously improve CPM, and you'll never be done. 

CPM is becoming important because cloud is becoming important. It's not about success with the first cloud solution instance; it's about success in ongoing operations. 
