Scaling containers: The essential guide to container clusters

Scaling containers may seem fairly straightforward, but it's not simply a matter of using clusters and cluster managers. There are a few processes and procedures around proper use of container clusters that many developers and application architects don't understand. What's more, the technology, which is still maturing, has some significant limitations. If you're not fully aware of these, your container-based applications could either fail to scale entirely, or fail to provide the resiliency required.

This week I'll take the mystery out of clustering and managing containers by examining the latest technology, how it's properly applied, and the emerging best practices that make container clustering work. I am not going to talk about containers in general, however, so if you need to bone up, be sure to read the essential guide to software containers article first. Then you'll be ready to focus on container clusters and container cluster managers.


Clustering basics

So what are compute clusters? As you separate units of work into small batches, you need a way to manage those workloads within a management layer. This layer must let you share resources, schedule tasks, and treat many running processes as one unified, scalable, and well-behaved solution across all workloads. This is how transactional systems of years past have functioned. Examples include CICS, Tuxedo, and more recently, Java containers such as J2EE application servers, and other proprietary clustering technologies.

Compute clusters form a shared computing environment made up of servers (nodes), where resources have been pooled to support the workloads and processes running within the cluster. Each process running within the cluster is called a task, and tasks are combined into a job, which makes up the complete solution.

To pull this off, you need to manage the cluster or clusters using a cluster management framework, which typically includes a resource manager that keeps track of resources (memory, CPU, and storage). When an executing task needs a resource, it must go through the resource manager to obtain it. Because all access to resources is managed this way, you can control each workload's impact on the platform, and that allows the whole system to scale, on either virtual or physical infrastructure.
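The following minimal Python sketch shows the resource manager's role in practice. It assumes a single resource manager that tracks free CPU, memory, and storage per node and refuses any request that would overcommit a node. The names `Resources`, `ResourceManager`, `request`, and `release` are hypothetical and don't come from any real cluster manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """The resource types the text names: CPU, memory, and storage."""
    cpu: float       # CPU cores
    memory_mb: int   # memory in MiB
    storage_gb: int  # storage in GiB

    def fits_within(self, other: "Resources") -> bool:
        return (self.cpu <= other.cpu
                and self.memory_mb <= other.memory_mb
                and self.storage_gb <= other.storage_gb)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu - other.cpu,
                         self.memory_mb - other.memory_mb,
                         self.storage_gb - other.storage_gb)

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu + other.cpu,
                         self.memory_mb + other.memory_mb,
                         self.storage_gb + other.storage_gb)


class ResourceManager:
    """Hypothetical resource manager: every task must ask it for resources."""

    def __init__(self, capacity: dict[str, Resources]):
        self.free = dict(capacity)                          # node -> unallocated resources
        self.grants: dict[str, tuple[str, Resources]] = {}  # task id -> (node, allocation)

    def request(self, task_id: str, node: str, need: Resources) -> bool:
        """Grant `need` on `node` only if it fits; never overcommit the node."""
        if task_id in self.grants or not need.fits_within(self.free[node]):
            return False
        self.free[node] = self.free[node] - need
        self.grants[task_id] = (node, need)
        return True

    def release(self, task_id: str) -> None:
        """Return a finished task's resources to the node it ran on."""
        node, held = self.grants.pop(task_id)
        self.free[node] = self.free[node] + held


rm = ResourceManager({"node-1": Resources(cpu=4, memory_mb=8192, storage_gb=100)})
print(rm.request("web", "node-1", Resources(2, 4096, 10)))    # True
print(rm.request("batch", "node-1", Resources(4, 1024, 10)))  # False: only 2 CPUs left
rm.release("web")
print(rm.request("batch", "node-1", Resources(4, 1024, 10)))  # True
```

For brevity, the sketch refuses an oversized request outright instead of queuing it. A real resource manager would more likely queue the request or hand it back to the scheduler to try another node.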

Moving on, there are other components of the cluster manager you need to understand, such as the task manager, which is responsible for task execution and state management. Cluster managers also contain schedulers that manage dependencies between the tasks that make up jobs, and assign tasks to nodes. The scheduler is a core component of the cluster manager.
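The scheduler has two jobs: respecting the dependencies between a job's tasks and assigning each task to a node. Both can be sketched with Python's standard `graphlib` module (Python 3.9+). This is a hypothetical illustration. The job format, the `schedule_job` function, and the spread-evenly placement policy are assumptions, not how any particular cluster manager works.

```python
from graphlib import TopologicalSorter


def schedule_job(job: dict[str, set[str]], nodes: list[str]) -> list[tuple[str, str]]:
    """Return (task, node) assignments in an order that respects dependencies.

    `job` maps each task to the set of tasks that must run before it.
    """
    assigned = {node: 0 for node in nodes}  # tasks placed on each node so far
    plan = []
    # static_order() yields every task after all of its dependencies
    # (and raises graphlib.CycleError if the dependencies form a cycle).
    for task in TopologicalSorter(job).static_order():
        # Placement policy (an assumption): spread tasks evenly across nodes;
        # on ties, min() picks the earliest node in the list.
        node = min(nodes, key=lambda n: assigned[n])
        assigned[node] += 1
        plan.append((task, node))
    return plan


job = {
    "build": set(),
    "test": {"build"},
    "package": {"build"},
    "deploy": {"test", "package"},
}
for task, node in schedule_job(job, ["node-a", "node-b"]):
    print(f"{task} -> {node}")
```

Letting `TopologicalSorter` detect cycles means an invalid job is rejected before any task is placed, rather than deadlocking partway through.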

Now that you understand what cluster managers are in the abstract, it's time to consider the variety of offerings out there and what core values each will deliver to your containerized applications.

Docker Swarm

The Docker Swarm cluster manager offers clustering, scheduling, and integration capabilities that let developers build and ship multi-container/multi-host distributed applications. It includes all of the necessary scaling and management for container-based systems.

Think of Swarm as a native cluster manager for Docker that provides the common cluster management components identified above. This includes the ability to create and access a pool of containers, provide API access to container and cluster services, and scale across multiple hosts. It also includes the ability to leverage Apache Mesos for large-scale deployments.

Swarm integrates with multiple discovery services, including hosted discovery with Docker Hub, Consul, ZooKeeper, and a static list of IP addresses. Alternatively, you can bring your own discovery tools. In this way you can share and consume containers within your container cluster manager.

Another part of the Docker story, Docker Compose, provides orchestration and scheduling. It lets you define a multi-container application in a single file, as well as run an application in a cluster using a single command. Like Swarm, Compose integrates with the same array of discovery services. Want to take advantage of Compose? Running a Docker cluster is as simple as creating a YAML file and running a one-line command.

Google Kubernetes

When most people think of container cluster managers, they think of Kubernetes. Google built Kubernetes on its long experience running containers at scale in its own data centers, and it set the standard for what container managers should do and how they should be designed.

Kubernetes consists of several architectural components, including pods, labels, replication controllers, and services.

  • Pods are ephemeral units that manage one or more tightly coupled containers. They also enable data sharing and communication among the constituent containers. A pod groups its containers so they are scheduled together onto the same node. Each pod gets its own IP address, and its containers share localhost and volumes. Containers that run inside a pod "share fate," meaning they are created, placed, and deleted together as a single unit.
  • Labels are metadata attached to objects, including pods. Labels let you ask questions such as "What is the load on nodes marked as 'Ashburn Data Center'?" or perform actions, such as rolling out a new version of SSL only for containers marked "client facing."
  • Replication controllers create new pod "replicas" from a pod template to ensure that a configurable number of pods are running. They do this by continuously checking that the specified number of pods with a given set of labels are running within the container cluster, and creating or deleting pods as needed. The use of replication controllers makes Kubernetes a declarative system: you state how many replicas you want, and the controller works to make that true.
  • Services offer a low-overhead way to route requests to a logical set of pod back ends in the cluster, using label-driven selectors. Services also provide a way to integrate legacy components, such as databases, with a cluster. They provide stable endpoints as clusters shrink and grow and are configured and reconfigured across new nodes within the cluster manager. Their job is to remove the pain of keeping track of application components that exist within a cluster instance.
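The interplay of labels, replication controllers, and services can be illustrated with a toy Python model. This is not the Kubernetes API. `Pod`, `matches`, `ReplicationController`, and `Service` are hypothetical stand-ins, selectors support equality matching only, and the reconcile loop runs once per call rather than continuously.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    labels: dict[str, str]  # metadata attached to the pod


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selector: every key/value pair in it must appear in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


class ReplicationController:
    """Keeps `replicas` pods matching `selector` alive, stamping new ones from a template."""

    def __init__(self, selector: dict[str, str], replicas: int,
                 template_labels: dict[str, str]):
        if not matches(selector, template_labels):
            raise ValueError("template labels must satisfy the selector")
        self.selector = selector
        self.replicas = replicas
        self.template_labels = template_labels
        self._ids = itertools.count(1)

    def reconcile(self, pods: list[Pod]) -> list[Pod]:
        """One pass of the control loop: compare desired vs. observed, then close the gap."""
        ours = [p for p in pods if matches(self.selector, p.labels)]
        others = [p for p in pods if not matches(self.selector, p.labels)]
        while len(ours) < self.replicas:  # too few: create replicas from the template
            ours.append(Pod(f"pod-{next(self._ids)}", dict(self.template_labels)))
        return others + ours[: self.replicas]  # too many: drop the extras


class Service:
    """Stable front end that round-robins over whichever pods currently match."""

    def __init__(self, selector: dict[str, str]):
        self.selector = selector
        self._next = itertools.count()

    def route(self, pods: list[Pod]) -> str:
        backends = [p for p in pods if matches(self.selector, p.labels)]
        if not backends:
            raise LookupError("no pods match the service selector")
        return backends[next(self._next) % len(backends)].name


rc = ReplicationController({"app": "web"}, replicas=3,
                           template_labels={"app": "web", "tier": "client-facing"})
pods = rc.reconcile([])                         # creates pod-1, pod-2, pod-3
pods = [p for p in pods if p.name != "pod-2"]   # simulate a pod dying
pods = rc.reconcile(pods)                       # controller replaces it with pod-4
svc = Service({"app": "web"})
print([svc.route(pods) for _ in range(3)])      # ['pod-1', 'pod-3', 'pod-4']
```

The service selects pods by label at request time rather than by name. As a result, the replacement pod `pod-4` starts receiving traffic with no change to the service, which is the stable-endpoint property Kubernetes services are meant to provide.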

CoreOS Tectonic

CoreOS' Tectonic cluster manager is essentially Kubernetes as a service. It is available on Amazon Web Services, or you can obtain an on-premises version. Tectonic is compatible with both Docker and CoreOS Rocket (rkt) containers.

This product is an out-of-the-box Kubernetes cluster with an easy-to-use dashboard. Think of Tectonic as a cluster manager for those who don't want to mess with the details required to build and implement a cluster manager. CoreOS also provides 24x7 support, as well as training.

Apache Mesos

The little-known Apache Mesos cluster manager can take many containerized applications to the next level of scalability. It's known for stability, and you can use it with Docker to provide scheduling and fault tolerance. Mesos provides a web user interface as its cluster management dashboard.

Mesos is more than just a tool. It's a platform you can use to build and deploy distributed applications, with or without Docker. Many organizations that need to scale use Mesos, including Twitter, Airbnb, Netflix, eBay, and HubSpot.

But there is a tradeoff: with Mesos you must drive the configuration yourself, and it can be complex and difficult to set up. To address that, you can turn to Mesosphere.

Mesos and Mesosphere together provide frameworks for other technologies that have nothing to do with containers, such as Hadoop and Cassandra. These tools also include schedulers for other cluster managers, such as Mesos-Kubernetes and Mesos-Swarm. These frameworks are best suited for workloads such as platform-as-a-service, long-running applications, big data systems, batch scheduling systems, and mass data storage.

Amazon EC2 Container Service

The Amazon Web Services EC2 Container Service (ECS) cluster manager uses shared-state scheduling services to execute processes, which run as containerized applications on EC2 instances. An ECS cluster lives within a virtual private cloud (VPC), so the EC2 user is in complete control.

With ECS, container instances talk to the ECS service through an agent, a Docker container that runs on each container instance. ECS uses a shared-state model, including optimistic schedulers for short-running and long-running tasks, backed by a fully ACID-compliant (atomicity, consistency, isolation, durability) distributed data store.
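Optimistic, shared-state scheduling can be illustrated with a generic compare-and-swap sketch. Each scheduler reads a versioned snapshot of cluster state and chooses a placement without holding a lock. It then commits only if no other scheduler has committed in the meantime. This is a hypothetical Python model of the technique, not ECS's actual implementation. `SharedClusterState`, `place_task`, and the instance IDs are assumptions, and a lock stands in for the atomicity that an ACID store would provide.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int
    free_cpu: dict  # instance id -> free CPU units


class SharedClusterState:
    """Hypothetical shared state store; the lock makes each commit atomic."""

    def __init__(self, free_cpu: dict):
        self._lock = threading.Lock()
        self._version = 0
        self._free_cpu = dict(free_cpu)

    def read(self) -> Snapshot:
        with self._lock:
            return Snapshot(self._version, dict(self._free_cpu))

    def commit(self, expected_version: int, instance: str, cpu: int) -> bool:
        """Apply a placement only if nobody else has committed since our read."""
        with self._lock:
            if self._version != expected_version or self._free_cpu[instance] < cpu:
                return False
            self._free_cpu[instance] -= cpu
            self._version += 1
            return True


def place_task(state: SharedClusterState, cpu: int, max_attempts: int = 5):
    """Optimistic scheduler: decide from a snapshot, then try to commit."""
    for _ in range(max_attempts):
        snap = state.read()
        candidates = [i for i, free in snap.free_cpu.items() if free >= cpu]
        if not candidates:
            return None
        # Simple placement policy (an assumption): most free CPU wins.
        target = max(candidates, key=lambda i: snap.free_cpu[i])
        if state.commit(snap.version, target, cpu):
            return target
        # Conflict: another scheduler changed the state; re-read and retry.
    return None


state = SharedClusterState({"i-a": 1024, "i-b": 512})
stale = state.read()                                # scheduler B reads version 0
first = place_task(state, 768)                      # scheduler A commits first
rejected = state.commit(stale.version, "i-a", 512)  # B's stale write is refused
retried = place_task(state, 512)                    # B re-reads and retries
print(first, rejected, retried)                     # i-a False i-b
```

The optimistic approach lets many schedulers work in parallel against the same state. The cost is an occasional retry when two of them race for the same resources.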

Like other cluster managers, ECS monitors the health of your cluster, including the availability of master nodes that provide scheduling and resource management services. The architectural objectives for ECS are stability, scalability, and high availability. Although it lacks massively scalable use cases today, those are coming.

In fact, ECS was built to exist in larger container-based cluster management ecosystems. As such, ECS is compatible with all of the cluster management systems presented above. If you use a framework to execute tasks on a Mesos-managed cluster, for example, it's a compatible framework for ECS. The system is also extensible through a low-level API that lets you build higher-level services on top of ECS.

Where container clusters fall short

Containers and cluster managers still aren't fully mature, and there's a long list of things that need to improve. Here are six key areas where you may run into issues.

  • Network and storage layers need to be more portable. Portability and scalability are critical to the true value of containers and cluster managers, and there's room for improvement here. The organizations behind most cluster managers available today say that they plan to address this issue in future releases, but the ones that best solve the problem will quickly gain market share.
  • Security services are incomplete. The latest release of Docker (Docker 1.8) includes Docker Content Trust, which uses public key infrastructure (PKI) to sign and verify images. However, many cluster managers lack even the basic services needed to address security at the level enterprises require. These tools need to be able to trust containers, and to manage and secure the data associated with them. Identity and access management services also need a bigger presence inside cluster managers than they have today. Currently, you have to integrate these services using third-party tools.
  • Cluster managers and orchestration tools should work better together. The ability to automate the use of containers and cluster managers should be driven by orchestration tools that place a layer of abstraction between cluster managers, containers, and the processes that manage them, and sequence them to create a solution. Unfortunately, there's still much work to be done here.
  • Cluster managers and container frameworks need more plug-ins. With Docker, you can extend the capabilities of the Docker Engine by loading third-party plug-ins through two plug-in mechanisms (also called extension points). Network plug-ins allow third-party container networking tools to connect containers to container networks. Volume plug-ins enable third-party container data management tools to provide data volumes for containers that operate on data. What's missing are more middleware-related plug-ins, such as connections to messaging systems and transactional engines.
  • Open standards are lacking. While Docker itself is open source, the vendors building on Docker are moving in different directions. You can see this most clearly with the container cluster manager offerings mentioned above. The mechanisms each vendor uses follow similar patterns but are, in fact, very different. The value of using containers derives from the technology's portability and scalability, and you achieve this by using container cluster managers. The industry needs to get behind a common set of basic standards and continue on that path. Otherwise, containers won't deliver the value that enterprises expect.
  • Containers that support microservices still need improvement. A microservices architecture requires creating small, lightweight services that can be deployed, scaled, and ported independently. Container technology provides an ideal environment for deploying microservices with respect to speed, isolation management, and lifecycle. But there aren't yet many well-understood best practices, or sound development mechanisms, that you can use to create repeatable microservices within containers.

The number of potential applications for containers and cluster managers is vast, especially given how quickly organizations are moving to public and private cloud-based platforms. Moreover, with lack of portability and lock-in on the minds of enterprise IT organizations, containers could be the answer that gets many IT shops off the fence, with more developers coming on board every day.

For containerized applications to be viable, however, they need to be scalable, fault tolerant, security-oriented, and manageable.

Cluster managers are the answer. That's why over the next few years enterprises will make massive investments in this technology.
