Microservices, containers, and operations: Guess who's responsible now

Bernard Golden, CEO, Navica

One of the benefits of containers that DevOps teams tout is that by sharing the same container throughout the application lifecycle, you simplify the relationship between development and operations groups. This ability to share is quite different from how things work when you develop applications in traditional bare-metal or virtualized environments. It also changes who's ultimately responsible when the code moves over to production.

In a traditional development scenario, many IT organizations can't provide development and QA groups with the same infrastructure as will be used in the production environment, so they test on cut-down versions of it—or even completely dissimilar ones. For example, in a VMware shop, developers might use the Vagrant utility to write and test code. 

The challenge comes when the development team delivers the code to operations for placement into production and it doesn’t work properly. “It worked on my machine” has traditionally been the common refrain in these kinds of situations, but that never did much to solve the problem of how to get properly working code from developers to production more quickly. In the new world of DevOps and containers, developers must assume more responsibility for the end product.

Containers have redefined delivery

Containers have changed the dynamic. For the first time, the application code and the environment in which the developer works can be delivered, unchanged, to operations. Indeed, if you are reengineering your IT organization for the cloud, such agile artifacts are a prerequisite.

The benefit of delivering application code and environment in one neat package is one reason there's such a frenzy of interest in containers. The phrase you hear for this practice is “immutable code,” meaning that no code changes are made once the application leaves the development group. If code changes are required to fix a problem or change a configuration, you need to create a new code release upstream and then forward that to the downstream group for use.

This is a powerful paradigm. Instead of operations groups retrofitting changes into applications to get them to operate properly, operations focuses solely on placing the transferred container into a production execution environment. Some people have praised containers for allowing operations to no longer be responsible for the correct operation of applications, enabling them to focus instead on ensuring that the container execution environment is stable.

But that’s not as simple as it sounds. Actually operating containers with next-generation applications can be challenging. The dynamic nature of the applications themselves, with their constantly shifting execution topologies, along with the challenge of inevitable infrastructure failures, means providing reliable container execution is no small feat. And obviating operational responsibility for application execution poses another problem: Somebody must be responsible for ensuring that the application operates correctly. If that role does not go to operations, then who does hold that responsibility?

Developers take on more operational responsibility

It's the development organization. The logic of immutable infrastructure means that any operational issues that require code changes are sent to development, which must address them and create a new artifact. In some respects, this makes sense. After all, who understands an application's issues better than its creators? Also, the knowledge that the developers will bear responsibility for any issues that surface with the code is likely to make them more careful. Netflix dubbed this transfer of responsibility for application operation “NoOps” to indicate that operations has no role in keeping elements of the video streaming service up and running. That's up to the application groups.

In the era of DevOps and microservices, application groups ideally are the ones that stand behind their code, address issues when problems arise, and issue new artifacts to fix them. But placing responsibility for application operations with whatever group created the container is not a universal panacea for application issues, especially in the context of a microservices application.

Remember the Ginsu knives commercials where the narrator said, “But wait, there’s more” and then went on to present other benefits you’d receive by buying the knives? It's the same way with microservices application operations. There’s more to container-based application operations than forwarding issues to whatever group is responsible for the container in which an issue arose, because many microservices components rely on other services to perform correctly. For example, a microservice that calculates shipping rates may need to rely on another microservice that figures out the distance between the customer’s location and the fulfillment center from which the goods will be dispatched.

Who’s responsible for that service being up, available, and responding correctly? If you guessed another application group, you’d be right.

Every group must take responsibility for its own code

If an issue is raised with one application group and, after identifying the cause of the problem, the developers discover that responsibility for the issue actually resides with a different group, the issue should be passed to that group to fix.
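As a minimal sketch of that handoff, the Python below wraps a failing downstream call in an error that records which group owns the dependency, so the issue can be routed to them. Every name here (`DependencyError`, `shipping_rate`, `route_issue`, the service and team labels) is hypothetical and chosen only to mirror the shipping-rate example; it is an illustration under those assumptions, not a description of any particular tool.

```python
from typing import Callable


class DependencyError(Exception):
    """A downstream service failed; records which group owns that service."""

    def __init__(self, service: str, owner: str, cause: Exception):
        super().__init__(f"{service} (owned by {owner}) failed: {cause}")
        self.service = service
        self.owner = owner
        self.cause = cause


def shipping_rate(get_distance_km: Callable[[str, str], float],
                  customer: str, center: str,
                  rate_per_km: float = 0.05) -> float:
    """Compute a shipping rate using a distance lookup owned by another group."""
    try:
        distance = get_distance_km(customer, center)
    except Exception as exc:
        # Hypothetical labels: the failure is attributed to the dependency's owner.
        raise DependencyError("distance-service", "logistics-team", exc) from exc
    return round(distance * rate_per_km, 2)


def route_issue(error: Exception, default_owner: str = "shipping-team") -> str:
    """Return the group that should receive the issue."""
    if isinstance(error, DependencyError):
        return error.owner
    return default_owner
```

With this shape, a timeout in the distance lookup lands with the group that owns it, while a bug in the rate calculation itself stays with the shipping group.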

This is why the use of microservices forces a serious change in how you operate applications. The spread of operational responsibility poses significant challenges in how application issues are tracked, diagnosed, and fixed.

Most IT organizations need to work on how to shift responsibility for application issues "left," or upstream, to the development group. The foundation for this mode of operation is an automated DevOps capability that can speed code changes into production. That requires use of immutable artifacts, but it also requires that application groups have access to a production-like environment.

The fact that most microservices-based application components call other services means that development teams must have available versions of the services that the application will use when in production. Those services might not need to be as scaled or fully featured, but the team needs access to the basic functions at a minimum.
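One way to picture such a cut-down service is a stub that offers the same basic function as the real dependency but answers from canned data. The sketch below is an assumption-laden illustration: `DistanceService`, `StubDistanceService`, and `quote_shipping` are hypothetical names, and the route data is made up for the example.

```python
from typing import Protocol


class DistanceService(Protocol):
    """The basic function the shipping code needs from the real service."""

    def distance_km(self, origin: str, destination: str) -> float: ...


class StubDistanceService:
    """Cut-down stand-in: canned answers, no scaling, no real geodata."""

    def __init__(self, known_routes: dict[tuple[str, str], float]):
        self._known_routes = known_routes

    def distance_km(self, origin: str, destination: str) -> float:
        try:
            return self._known_routes[(origin, destination)]
        except KeyError:
            raise LookupError(
                f"no stubbed route {origin} -> {destination}") from None


def quote_shipping(service: DistanceService, origin: str, destination: str,
                   rate_per_km: float = 0.5) -> float:
    """Application code under test; works with the stub or the real service."""
    return service.distance_km(origin, destination) * rate_per_km


stub = StubDistanceService({("fulfillment-center", "customer"): 120.0})
quote = quote_shipping(stub, "fulfillment-center", "customer")
```

Because the application code depends only on the basic function, the development team can exercise it against the stub and later point the same code at the production service.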

Moreover, good monitoring and logging functions are critical to enable root-cause analysis of application problems and correlate issues across organizational boundaries. If a problem gets referred to an application group, that group must have supporting capabilities that allow it to identify where the problem occurred in its service. Should the cause of the problem reside in another service, it’s important that the first group be able to provide data to the second that enables it to track the problem within its service and create a code patch to address it.
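A common way to make logs correlatable across groups is to attach a shared request identifier to every log record. The sketch below assumes each service writes JSON lines and that the identifier is passed along with the request; the helper names (`make_logger`, `log_event`, `events_for`) and the field names are hypothetical, and only Python's standard `logging`, `json`, `io`, and `uuid` modules are used.

```python
import io
import json
import logging
import uuid


def make_logger(service: str, stream) -> logging.Logger:
    """Create a per-service logger that writes one message per line."""
    logger = logging.getLogger(f"demo.{service}")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def log_event(logger: logging.Logger, service: str,
              correlation_id: str, event: str) -> None:
    """Emit a JSON record tagged with the shared correlation ID."""
    logger.info(json.dumps({"service": service,
                            "correlation_id": correlation_id,
                            "event": event}))


def events_for(correlation_id: str, *log_texts: str) -> list[dict]:
    """Collect the records for one request from several services' logs."""
    records = []
    for text in log_texts:
        for line in text.splitlines():
            record = json.loads(line)
            if record["correlation_id"] == correlation_id:
                records.append(record)
    return records


shipping_log, distance_log = io.StringIO(), io.StringIO()
shipping = make_logger("shipping", shipping_log)
distance = make_logger("distance", distance_log)

request_id = str(uuid.uuid4())
log_event(shipping, "shipping", request_id, "rate requested")
log_event(distance, "distance", request_id, "lookup timed out")
log_event(distance, "distance", str(uuid.uuid4()), "lookup ok")

# The first group can hand this trail to the second group as evidence.
trail = events_for(request_id, shipping_log.getvalue(), distance_log.getvalue())
```

Here `trail` contains only the two records for the failing request, one from each service, which is the kind of data the first group can pass to the second.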

Erasing the operations-development divide

While it's clear that developers must take on more operational responsibility, it's an oversimplification to assert that containers provide a neat dividing point between application groups and operations, leaving the latter free to focus on providing a great container execution environment.

Moving to immutable code and shared artifacts is a great advance over the traditional methods for placing code into production, but that should not blind you to the fact that operational responsibility for the application must lie with someone. It’s not as if containers make application issues go away. It just means that responsibility can be partitioned, with groups tagged with addressing the issues for which they are best suited. Yes, developers take on more responsibility. But there's plenty of responsibility to go around.

The only tenable solution going forward is divide and conquer: each group must take responsibility for its part of the application lifecycle. However, unlike the old "throw it over the wall and let them deal with it" world of applications, this new way of doing things also requires cross-group collaboration. In the end, we'll see operations responsibility become like that of the flight deck crew of an aircraft carrier: each group has a set of responsibilities for which it takes the lead, but all groups collaborate toward a common aim.

 Image credit: Gareth Bellamy
