Top container monitoring challenges and how to overcome them

Docker’s popularity is unsurpassed. However, the complexity of containerized application environments is preventing many companies from reaping the technology's lauded benefits.

More than half of organizations surveyed by Gatepoint Research and CA say their use of Docker container technology is "just getting started." The skills gap is the biggest hurdle in Docker monitoring, according to nearly half of the respondents.

So all you need to do is boost your team's Docker capabilities, right? Not so fast: When deciding what special skills might be required to monitor containers, keep in mind that clearing this hurdle has more to do with approach than abilities.

Monitoring: The same, but different

"Monitoring for containers really isn’t that different from monitoring traditional deployments," said author and Netflix senior software engineer Paul Bakker. "You need metrics, you need logs, you need health checks and service discovery. But we needed all these things in the past, as well."

Bakker said the main difference is that, for many people, introducing containers also seems to imply a different system architecture, typically one with more individual deployments. Although containers can make such architectures easier, faster, and potentially cheaper to deploy, he said, teams should be careful not to conflate deployment tooling with architecture.

"When it comes to monitoring, at Netflix we integrated our container deployments with the same tooling we use for EC2 deployments. From a monitoring perspective, it doesn't matter if a service is deployed in containers or not."
—Paul Bakker


Be prepared for the challenges

That's not to say that a container-based architecture doesn’t bring its share of monitoring challenges. Containers are ephemeral, said Joe Fernandes, Red Hat's vice president of cloud platforms. In other words, the container orchestration scheduler (such as Kubernetes) can restart them at any time, on the same machine or on a different one.

This makes monitoring applications running in containers more complex than monitoring applications running in fixed VMs or on bare-metal servers.

Also, multiple containers typically run on a single Linux host, either a virtual or a physical machine, with each container running in its own sandbox. Monitoring tools must account for that by attributing metrics to individual containers, not just to the host.

"Real enterprise applications typically run across multiple containers, with cloud-native applications potentially spanning many different microservices. So analyzing the performance of these apps means being able to trace requests across all of the containerized services that are part of the app."
—Joe Fernandes
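Tracing a request across containerized services works by passing a shared trace ID from hop to hop. The sketch below is a hand-rolled illustration, not how any particular tracing product is implemented; in practice a library such as OpenTelemetry handles this. It assumes services exchange a header in the W3C Trace Context `traceparent` format, and the `TraceContext` class, `start_span` helper, and the three-service call chain are hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceContext:
    trace_id: str                 # 32 hex chars, shared by every hop of one request
    span_id: str                  # 16 hex chars, unique to this service's unit of work
    parent_id: Optional[str]      # span_id of the caller, None at the entry point


def to_traceparent(ctx: TraceContext) -> str:
    # W3C Trace Context layout: version-traceid-parentid-flags ("01" = sampled)
    return f"00-{ctx.trace_id}-{ctx.span_id}-01"


def parse_traceparent(header: Optional[str]) -> Optional[tuple[str, str]]:
    # Return (trace_id, caller_span_id), or None if the header is absent or malformed.
    if not header:
        return None
    parts = header.split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    return parts[1], parts[2]


def start_span(incoming_headers: dict) -> TraceContext:
    # Continue the caller's trace if there is one; otherwise start a new trace.
    parsed = parse_traceparent(incoming_headers.get("traceparent"))
    if parsed:
        trace_id, parent_id = parsed
    else:
        trace_id, parent_id = secrets.token_hex(16), None
    return TraceContext(trace_id, secrets.token_hex(8), parent_id)


# Hypothetical call chain: frontend -> cart -> payment, each in its own container.
frontend = start_span({})
cart = start_span({"traceparent": to_traceparent(frontend)})
payment = start_span({"traceparent": to_traceparent(cart)})
```

Because every span carries the same `trace_id` and records its caller's span ID, a tracing backend can stitch the hops back into one request, regardless of which container or host served each hop.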

Daniel Barker, chief architect and DevOps leader at the National Association of Insurance Commissioners, agreed that several technical challenges need to be addressed. But he believes that many of the issues that arise are less technical and more an artifact of how most companies think about systems.

Before containers, most companies were very concerned about keeping individual servers up and running well, he said.

"Unless a customer called, many monitoring systems didn’t know there was a problem unless memory, CPU, or disk space were affected. We need to shift the thinking away from static hosts and the related metrics and toward customer-facing metrics."
—Daniel Barker

The other factors are still good inputs to a system, "but they are no longer the criteria by which performance is graded."

Relearning what you already know

Barker said organizations have to relearn the fundamentals and look at their systems more holistically, and DevOps practices can help here.

IT teams need to learn more about the applications that are being developed and which metrics are important for the business, he said. At the same time, developers need to "learn more about the struggles of IT teams so they can code in a way that helps operations function."

A great way to facilitate this knowledge exchange is to pair developers and operators for a couple of weeks, for example through a rotation program that swaps one member of each team every week, he said.

Teams also need to understand that monitoring is no longer reactive; it has to be proactive. The outputs of your monitoring system now need to drive automated actions and predictive analysis, Barker said. If your response times are degrading, your monitoring system should detect that and tell your operating environment to stand up more instances, then do the opposite as load decreases.
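The feedback loop Barker describes can be reduced to a small decision function: measure response times, compare them with a target, and return a new instance count. This is a minimal sketch under assumed values; the `ScalingPolicy` fields, thresholds, and replica limits are hypothetical, and on Kubernetes you would more likely feed such a metric to the Horizontal Pod Autoscaler than write your own loop.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    target_p95_ms: float            # response-time goal (hypothetical value)
    scale_up_ratio: float = 1.2     # scale out when p95 exceeds target by 20%
    scale_down_ratio: float = 0.6   # scale in when p95 is well under target
    min_replicas: int = 2
    max_replicas: int = 20


def p95(samples_ms: list[float]) -> float:
    # Nearest-rank 95th percentile of the observed response times.
    ordered = sorted(samples_ms)
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[rank]


def desired_replicas(current: int, samples_ms: list[float], policy: ScalingPolicy) -> int:
    if not samples_ms:
        return current  # no data: make no change
    latency = p95(samples_ms)
    if latency > policy.target_p95_ms * policy.scale_up_ratio:
        proposed = current + max(1, current // 2)   # degrading: grow by about 50%
    elif latency < policy.target_p95_ms * policy.scale_down_ratio:
        proposed = current - 1                      # load has dropped: shrink gently
    else:
        proposed = current                          # inside the band: hold steady
    return min(policy.max_replicas, max(policy.min_replicas, proposed))


policy = ScalingPolicy(target_p95_ms=200)
```

The gap between `scale_up_ratio` and `scale_down_ratio` is deliberate: it gives the loop a dead band so that small fluctuations in latency don't make the instance count flap up and down.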

"One thing I did many years ago in a public cloud, was to stand up double the number of servers we needed, run a performance test against them, and then kill off the bottom half. This was all automated and driven through the monitoring data."
—Daniel Barker
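The selection step in Barker's example, keeping the better-performing half of an over-provisioned fleet, is simple once the performance-test results are in hand. The sketch below assumes each instance has already been scored by a single latency figure; the instance IDs and numbers are invented, and the actual termination would go through your cloud provider's API, which is not shown.

```python
def split_fastest(latency_ms_by_instance: dict[str, float], needed: int) -> tuple[list[str], list[str]]:
    """Return (instances to keep, instances to terminate).

    Instances are ranked by measured latency (lower is better); ties are
    broken by instance ID so the result is deterministic.
    """
    if not 0 < needed <= len(latency_ms_by_instance):
        raise ValueError("needed must be between 1 and the number of instances")
    ranked = sorted(latency_ms_by_instance, key=lambda i: (latency_ms_by_instance[i], i))
    return ranked[:needed], ranked[needed:]


# Hypothetical fleet: twice the two instances actually required.
results = {"i-01": 120.0, "i-02": 95.0, "i-03": 310.0, "i-04": 101.0}
keep, terminate = split_fastest(results, needed=len(results) // 2)
```

Here `keep` holds `i-02` and `i-04`, the two fastest instances, while `i-01` and `i-03` are marked for termination.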

Don't use legacy monitoring tools

Barker warned, though, that not all legacy tools can handle this modern infrastructure, so it may be best to run your monitoring in an environment separate from your current systems. That way, a struggling analytics workload can't crash the systems that make you money.

In an orchestration system such as Kubernetes, you will "cause yourself great pain if you try to treat it like your legacy data center," Barker said. In a legacy data center, everything is fairly static, but in Kubernetes, almost nothing is static.

If you're trying to monitor individual instances, they may be gone before you have a chance to do anything with them. You have to learn how to "let Kubernetes handle the stuff below the line, which means you need to fully understand it, and then you can effectively manage what's above the line," Barker said.
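One practical consequence of managing "above the line" is aggregating metrics by a stable service label rather than by the name of the instance that produced them, because pod names change every time Kubernetes reschedules. The sketch assumes each sample carries an `app` label (a common Kubernetes labeling convention); the sample structure and pod names are hypothetical. In Prometheus, the equivalent is a `sum by (app)` aggregation.

```python
from collections import defaultdict


def error_rate_by_app(samples: list[dict]) -> dict[str, float]:
    # app label -> [errors, requests], summed across every pod that served it
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for sample in samples:
        bucket = totals[sample["labels"]["app"]]
        bucket[0] += sample["errors"]
        bucket[1] += sample["requests"]
    # Skip apps with no traffic to avoid dividing by zero.
    return {app: errors / requests for app, (errors, requests) in totals.items() if requests}


# Hypothetical samples: the first checkout pod was replaced mid-window.
samples = [
    {"pod": "checkout-7d9f-abc12", "labels": {"app": "checkout"}, "errors": 2, "requests": 100},
    {"pod": "checkout-7d9f-xyz89", "labels": {"app": "checkout"}, "errors": 0, "requests": 100},
    {"pod": "search-5c4b-qwe34", "labels": {"app": "search"}, "errors": 5, "requests": 50},
]
rates = error_rate_by_app(samples)
```

The checkout service reports a single 1% error rate even though two differently named pods served its traffic, so dashboards and alerts survive the replacement of any individual container.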

What you need to get up to speed

Both Barker and Fernandes stressed the importance of understanding Kubernetes as a means of orchestrating and managing your containers. Even learning the basics of how it deploys and runs applications can provide a lot of insight into how modern cloud systems are built, which is necessary for effective monitoring. Then IT teams need to consider new tools and approaches for monitoring.

Most important, Barker said, is that organizations have a metrics aggregation system, a log aggregation system, alerting and visualization systems, and a distributed tracing system. He generally advises companies to use a SaaS tool for metrics, and both he and Fernandes like Amazon Elasticsearch Service (since renamed Amazon OpenSearch Service) for logging.

Learning Istio, a service mesh, will help teams understand even more layers of modern architectures, particularly the networking layer, Barker said. It also shows them how to wrap applications that currently have no monitoring with basic monitoring.

IT Ops teams also need to learn the basics of machine learning, Barker said. Many tools are now being introduced that include machine-learning algorithms, but it’s important to understand how they work so teams will know when they aren't operating properly.

Though it’s preferable to implement all these tools at once, that can be cost-prohibitive for many organizations. In those cases, Barker typically recommends companies roll them out in this order based on their cost and the likely benefit derived: metrics, logs, alerting and visualization, and, finally, distributed tracing.

Don't neglect training

The right tools will only get you part of the way there, though. The tougher step is training the team. Barker often recommends Practical Monitoring author Mike Julian's training course to provide a thorough understanding of the fundamentals of effective monitoring.

Barker also suggested running a hackathon soon after the course to give team members a chance to use their new skills in a low-risk environment, then following up with tool-specific training or self-paced online training.

"The important part is creating a shared understanding within the team of effective monitoring strategies, and then letting them grow from there."
—Daniel Barker


Container-based monitoring: Tool up wisely

Ultimately, container-based monitoring doesn't pose many serious challenges that users of traditional monitoring tools aren't already aware of.

While you will need more modern tools to keep tabs on containers than you may be used to, the strategies for deploying and mastering them, ideally through formal training, remain largely the same.

However, getting up to speed with emerging technologies such as machine learning, which a growing number of container monitoring tools now incorporate, will streamline this process considerably and speed you on your way to containerized success.
