5 critical elements to building next-generation cloud-native apps
It's undeniable. The nature of IT is changing, from an inward-directed, process automation focus to an outward-directed, business offering focus. IT is changing from "support the business" to "be the business."
This poses many challenges to "IT as usual," including development and operations processes, learning the language and behavior of a minimum viable product (MVP), building teams that mix IT and business personnel, and finding the right skills mix.
Of all the challenges facing IT as it moves to its new role, perhaps none is harder than developing the applications to support "be the business." Simply stated, the traditional application architectures, operations, and pace of development are completely inadequate in this new world. One phrase often used for this new application paradigm is "cloud-native apps." That's not a bad way to characterize applications that typically have more rapid release cycles and highly erratic workloads, and use DevOps as an inter-group process methodology.
The five elements of a cloud-native app
The question is, what are the elements of a cloud-native app? There are five critical aspects:
Application design: The move to microservices
API exposure: Internal and external access via standardized methods
Operational integration: Aggregating log and monitoring information to enable application management
DevOps: Automation across the application lifecycle
Testing: Changing the role and use of quality assurance (QA)
Each of these elements is an important part in bringing a cloud-native application into production. Failing to address any of them is likely to result in an application that fails to satisfy external users and internal constituencies. However, successfully addressing them increases the probability of delivering an application that meets the needs of an important business initiative.
1. Application design: The new microservices architecture
One of the biggest requirements of cloud-native applications is speed: rapidly delivering and iterating application functionality. One of the biggest impediments is the traditional application architecture. In such applications, all of the code that makes up the application is bundled into one monolithic executable.
This simplifies deployment—there's only one executable to install on a server—but it makes development much more complex. Any code change, no matter how small, requires a rebuild of the entire executable. That in turn requires integration and testing of the code from the developers or even development groups involved in building the application, even if their code did not change at all! In typical IT groups, integration and testing can take up to two weeks. Worse still is the fact that development teams can release new features only during a "release window," because the overhead of the integration process restricts how often releases can occur. The net effect is that monolithic application architectures reduce the frequency of updates and hamper the ability of the company to respond to market demands.
By contrast, the microservices approach, popularized by early adopters such as video streaming provider Netflix, reduces the integration and testing burden by changing the executable deployment model. Instead of a single large executable, microservices deconstruct a single application into a number of separately executing functional components.
This deconstruction enables each functional component to be updated and deployed without affecting the other parts of the application. In turn, this reduces many of the issues with the monolithic architecture:
Each microservice can be updated independently on a schedule appropriate to the functionality it contains, because all of the microservice executables operate separately. So, for example, if one microservice provides functionality associated with a business offering that is experiencing rapid change (e.g., an e-commerce user promotion offering), that microservice can be frequently changed without affecting parts of the application that change less frequently (e.g., user identity).
Integration overhead is reduced, which accelerates functionality deployment times. Because each microservice operates independently, there is no need to integrate its updated code with code from other microservices. This has the effect of reducing (or eliminating) integration efforts, making it much faster to roll out new functionality.
Erratic workloads can be handled much more easily. With monolithic architectures, handling a traffic spike requires that you install another copy of the entire executable and join it to the application pool. With very large executables, this can take a significant amount of time, making it difficult to respond to erratic traffic loads. Moreover, if only part of the application (such as a video serving function) experiences heavy use, it is still necessary to deploy copies of the entire monolithic executable, which results in wasted resources. With microservices, it is only necessary to scale those components associated with executing a particular function (e.g., serving up video), which reduces scaling time and wasted resources.
Testing is simplified, since only the functionality associated with a particular microservice needs to be validated, rather than a code change requiring the functionality of the entire application to go through validation.
Of course, moving to a microservices architecture raises issues of its own. You'll face two primary challenges: how to partition functionality into individual services, and how to connect the individual microservices so that the aggregate serves as a complete application. You address the second issue with API connectivity. I'll get to that in the next section.
Partitioning functionality appropriately is extremely important. Fine-grained microservices enable rapid functionality iteration and reduced integration efforts, but impose application monitoring and management complexity. More coarsely grained microservices make application monitoring and management simpler, but can require more integration effort because each service aggregates a larger number of functions.
One way to partition these functions is to look for natural clumps and make them separate microservices. For example, the functions associated with user identity and authentication are commonly self-contained and often used by multiple applications. It would make sense, then, to hive them off into their own microservice.
Another strategy is to follow Conway's Law, which notes that system designs often mirror organizational structures. In the microservices context, this means a natural way of partitioning microservices is to see how your technical teams are organized. If you have a "payment transaction" team, that might be a good individual microservice, and so on, through the organizational structure. Of course, one has to look carefully to ensure that these structures make sense as a mirrored microservices architecture, but microservices that span organizations will be challenging as well.
Overall, the move to microservices is an important and growing trend. The microservices architecture lends good support to cloud-native applications, with their unique needs.
2. APIs: How microservices talk
One obvious challenge in microservices application architectures is getting different services to communicate: how to accept service requests and return data. Furthermore, it is necessary for the "front end" client-facing microservice to respond to user requests from a browser, a mobile phone, or another type of device.
Use RESTful APIs to handle communication in microservices-based applications. These APIs offer an interface that you can call by way of a standardized protocol. That makes it easy for external callers, whether from another service located on the same LAN or one located across the Internet, to know how to format a service request.
Each service treats its API as a "contract." If a call is formatted properly, has the appropriate identification and authentication, and carries the correct data payload, the service will execute and respond with proper data. Conceptually, APIs are quite straightforward, but in practice, there are several elements that are vital to make APIs a viable connectivity mechanism. These include:
API versioning: One of the great benefits of a microservices architecture is how it enables frequent updates of functions. Sometimes, new functions require a new API format due to additional required arguments, different return payloads, and so on. Just updating the existing API to require callers to interact with the updated information is a poor strategy, because until the API callers are updated to support the new format, everything will break. Therefore, you should keep the existing API format and behavior in place for each microservice while providing a new version of the API that supports the new format and behavior.
Throttling: Too many calls to an API can overwhelm the ability of the service to respond, reducing the performance of the overall application. Moreover, externally facing APIs can be hit with distributed denial-of-service attacks, which attempt to break applications with enormous traffic loads. For these reasons, it is important to monitor API traffic and shed load during very heavy periods by refusing calls.
Circuit breakers: Just as a service can find its API under too much load, a service may not be able to respond quickly enough to legitimate requests. Because performance of the application may be hindered by a slow-responding microservice, "circuit breakers" are an important part of each service. The circuit breaker puts a stopwatch on microservice response time. If the service takes too long to respond, the circuit breaker stops the call and returns standby data that allows the overall application to keep executing. Of course, this requires the application designer to build in standby data and have the service prepared to return it if complete request execution is impossible.
Data caching: This is the support mechanism for circuit breakers, but you can also use it in normal service execution. If portions of a service's data change infrequently (e.g., a user home address), it might make sense to cache that data and return it, rather than requiring a database lookup each time a user is validated.
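The circuit breaker and data caching items above can be sketched together. The following is a minimal Python illustration, not a production implementation: the class name CircuitBreaker, its parameters (timeout_s, max_failures, reset_after_s), and the idea of using the last good response as standby data are all assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class CircuitBreaker:
    """Time-boxes a service call and falls back to cached (standby) data.

    Hypothetical sketch: names and thresholds are illustrative only.
    """

    def __init__(self, timeout_s, max_failures=3, reset_after_s=30.0):
        self.timeout_s = timeout_s          # the "stopwatch" on each call
        self.max_failures = max_failures    # failures before the breaker opens
        self.reset_after_s = reset_after_s  # cool-down before retrying the service
        self.failures = 0
        self.opened_at = None
        self.cache = {}                     # last good response per key
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _is_open(self):
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.reset_after_s:
            # Cool-down elapsed: close the breaker and allow a trial call.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, key, func, default=None):
        # While open, skip the slow service entirely and serve standby data.
        if self._is_open():
            return self.cache.get(key, default)
        future = self._pool.submit(func)
        try:
            result = future.result(timeout=self.timeout_s)
        except Exception:
            # Covers both a timeout and an error raised by the service call.
            # Note: a timed-out call keeps running in its worker thread.
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return self.cache.get(key, default)
        self.failures = 0
        self.cache[key] = result            # refresh the standby copy
        return result
```

A caller would wrap each downstream request, for example breaker.call("home_address:42", fetch_address, default={}), where fetch_address is whatever function performs the real lookup. The design choice here is that the cache doubles as the standby data source, which matches the article's point that caching supports the circuit breaker.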
These four elements demonstrate that moving to a microservices architecture imposes additional requirements beyond merely exposing an interface mechanism. Despite these complexities, APIs are here to stay. They are more useful, flexible, and convenient than any other interface mechanism.
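To make one of these additional requirements concrete, throttling is commonly implemented with a token bucket: each call spends a token, tokens refill at a fixed rate, and calls that arrive when the bucket is empty are refused. The sketch below is one possible Python version. TokenBucket, handle_request, and the chosen rates are hypothetical names and values for illustration; 429 is the standard HTTP status code for "Too Many Requests."

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate_per_s` calls per second."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable so the logic can be tested
        self.last = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def handle_request(bucket, handler):
    """Return (status, body); refuse the call when the bucket is empty."""
    if not bucket.allow():
        return 429, "Too Many Requests"
    return 200, handler()
```

In practice this logic usually lives in an API gateway or management layer in front of the service rather than in the service itself, and one bucket is typically kept per caller or per API key.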
3. Operational design: Simpler, yet more complex
One of the biggest challenges for operations in traditional environments is the overhead of moving new code releases into production. Because monolithic architectures incorporate all of an application's code into a single executable, new code releases require the entire application to be deployed. Frequently, this causes problems because:
The production environment differs significantly from the development environment. The classic refrain when bugs surface in production is a developer saying, "It worked in my environment!"
It's challenging to test new functionality in a production environment without shifting the entire environment over to the new application version. This makes it a major effort to release new code.
Most production environments have no way to back out code changes with new functions, so if there are issues with the code, it's an emergency to re-create the previous production environment.
Microservices can simplify this situation enormously.
Because the environment is partitioned, code changes are segregated to specific executables, allowing updates to leave most of an application unchanged. This makes the process of rolling out code changes simpler, easier, and less risky.
Furthermore, because most microservices environments have redundancy for each microservice, it is possible to gradually roll out new functionality by shutting down part of the microservice pool and bringing up a replacement instance that represents the new code. Overall microservices remain operational during this changeover because there's always at least one instance running, and updates can be performed during normal working hours, when the most experienced staff are on duty.
In addition, because microservices communicate using APIs, you can expose new functions by exposing a new API version. If the new function proves failure-prone, you can configure the API management system to shut off access to the new API, ensuring that the older application version still is operational while you perform remediation on the new function. This enables backout of code changes, and addresses one of the most troublesome issues associated with monolithic code updates.
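The two mechanisms just described, gradual replacement of instances and backout by switching off a new API version, can be sketched in a few lines of Python. This is a simplified model under stated assumptions: rolling_update treats the instance pool as a list of version strings, and ApiGateway is a hypothetical stand-in for an API management system, not the interface of any real product.

```python
def rolling_update(instances, new_version):
    """Replace instances one at a time and return the pool after each step.

    `instances` is a list of version strings, one per running instance.
    With at least two instances, one is always left serving traffic
    while another is being swapped.
    """
    if len(instances) < 2:
        raise ValueError("need at least two instances to stay available")
    pool = list(instances)
    snapshots = []
    for i in range(len(pool)):
        pool[i] = new_version           # shut down instance i, start its replacement
        snapshots.append(list(pool))
    return snapshots


class ApiGateway:
    """Routes calls by API version; a failure-prone version can be switched off."""

    def __init__(self):
        self.handlers = {}              # version -> callable
        self.enabled = set()

    def register(self, version, handler):
        self.handlers[version] = handler
        self.enabled.add(version)

    def disable(self, version):
        # Backout: callers of the old version are unaffected.
        self.enabled.discard(version)

    def call(self, version, *args):
        if version not in self.enabled:
            raise LookupError(f"API version {version!r} is not available")
        return self.handlers[version](*args)
```

For example, after registering handlers for "v1" and "v2", calling gateway.disable("v2") leaves "v1" callers working while the new function is repaired. Keeping both versions registered side by side is also what makes the API versioning approach described earlier possible.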
However, microservices are not a panacea. This new application architecture presents challenges for operations groups, which require IT organizations to prepare new monitoring and management systems that are capable of dealing with microservices-based applications.
One immediate challenge is that there are many more executables running that make up a given application. Because of that, monitoring and management systems must be ready to incorporate many more sources of data and make them comprehensible (and useful) to operations personnel.
Here are some elements of monitoring and management that need to be addressed in a microservices-based architecture:
Dynamic application topologies. In production environments, microservice instances come and go in response to code updates, application load, and underlying resource failure (i.e., the server on which the microservice executes fails or becomes unreachable on the network). Because instances are ephemeral, application monitoring and management systems must be flexible in terms of being able to have microservice instances join and leave the application topology.
Centralized logging and monitoring. The ephemeral nature of instances means that logs and monitor records are not persistent. To ensure application information availability, it is necessary to forward logging and monitoring records to a centralized location where they can be persistently stored. Commonly, this takes the form of log and monitoring record ingestion using a real-time event consumption service and storage in an unstructured database environment. This facilitates search and analysis, and also enables time-series comparison (e.g., looking for all days in which an application experiences 200 percent traffic spikes).
Root cause analysis. There's no denying that microservices architectures are more complex than traditional monolithic architectures. This means that a problem that surfaces in one service, for example, by way of a user-received error, may actually originate "further down" in the application in a service caching layer. This makes identifying the true cause of a problem more difficult, which, in turn, reinforces the need for centralized logging and monitoring. When you centralize and aggregate all of the application's different service logs, it is possible to see that when a user-visible error occurred, a service caching error also occurred, so you can pursue debugging in the proper place.
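The root cause scenario above can be illustrated with a small Python sketch of centralized logging. It assumes, beyond what the article states, that every service tags its log records with a shared request ID so that records from different services can be correlated. CentralLogStore, log_line, and the field names are hypothetical; a real deployment would use an event ingestion service and a database rather than an in-memory list.

```python
import json


def log_line(service, request_id, ts, level, message):
    """Format one structured log record as a service would before forwarding it."""
    return json.dumps({
        "service": service,
        "request_id": request_id,   # assumed correlation ID shared across services
        "ts": ts,
        "level": level,
        "message": message,
    })


class CentralLogStore:
    """In-memory stand-in for a persistent, centralized log store."""

    def __init__(self):
        self.records = []

    def ingest(self, line):
        self.records.append(json.loads(line))

    def trace(self, request_id):
        """All records for one request, across services, in time order."""
        matching = [r for r in self.records if r["request_id"] == request_id]
        return sorted(matching, key=lambda r: r["ts"])

    def first_error(self, request_id):
        """Earliest error for a request: a likely starting point for debugging."""
        for record in self.trace(request_id):
            if record["level"] == "ERROR":
                return record
        return None
```

If the front-end service logs a user-visible error at one moment and the caching service logged an error for the same request slightly earlier, first_error returns the caching record, pointing the investigation at the right service.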
Given the additional operational complexities associated with microservices, IT organizations should evaluate their current operational systems to identify areas that need to be upgraded or replaced. The benefits of the microservices architecture are so clear with respect to business requirements that this application approach will become the de facto architecture over the next five years.
4. DevOps: Table stakes for tomorrow's applications
Today's IT consists of distinct groups, each chartered with responsibility for one part of the application lifecycle, including development, application build, QA, deployment, and operations. Unfortunately, in most IT organizations, each group has unique processes that focus on internal optimization. This results in manual handovers from one group to another, often with each group creating an entirely new application executable in a new execution environment. Frequently, you have long gaps between when one group hands over to another and the second group takes up its task. These IT silos create extremely lengthy deployment time frames, which are disastrous in the new world of IT, where rapid deployment and frequent updates are the norm.
DevOps is an attempt to break down the barriers between these IT groups, and a key element is the substitution of automation for manual processes. In DevOps, your goal is to reduce, as much as possible, the time between when developers write the code and when you place it into production.
Implementing DevOps processes is not trivial. Most organizations start in one of two ways:
Address a widely known pain point in the application lifecycle. For example, QA groups often struggle to obtain sufficient resources and therefore delay testing while they locate servers, install and configure software, and test the new code. This can mean lengthy delays before they can give quality feedback to developers. Some organizations migrate testing to cloud-based environments, where QA can obtain resources more quickly. Many others go further, making the developers themselves responsible for creating tests that exercise any new code they write. That means quality assessment can execute as part of the development process rather than occurring later with the testing group.
Assess the overall application lifecycle via a technique referred to as value chain mapping (VCM). VCM reviews the entire lifecycle, identifying which groups are involved, what each group does, and how long each group's task takes. Each group then develops a plan to streamline its individual process, while all groups collaborate to identify methods to avoid manual handovers.
Most IT organizations embarking on a DevOps initiative eventually find that they need to examine the entire application lifecycle using a VCM process. The reason: Streamlining or automating one group's work while leaving others undisturbed does not appreciably accelerate overall application delivery timescales. Only by performing a VCM, optimizing each silo, and then ensuring automation across groups can an IT organization respond to the new demands of "be the business."
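A little arithmetic shows why optimizing a single group is not enough. The Python sketch below uses entirely made-up numbers: the stage names, work days, and handover waits in STAGES are hypothetical and chosen only to illustrate the reasoning, not taken from any real VCM exercise.

```python
# Each stage: (name, days of actual work, days spent waiting for the handover).
# All figures are hypothetical, for illustration only.
STAGES = [
    ("development", 10, 0),
    ("build", 1, 2),
    ("QA", 5, 7),
    ("deployment", 1, 4),
    ("operations", 1, 3),
]


def lead_time(stages):
    """Total days from first line of code to production."""
    return sum(work + wait for _name, work, wait in stages)


def with_changes(stages, work_overrides=None, drop_waits=False):
    """Return a copy of the stages with some work times replaced and/or waits removed."""
    work_overrides = work_overrides or {}
    return [
        (name, work_overrides.get(name, work), 0 if drop_waits else wait)
        for name, work, wait in stages
    ]


baseline = lead_time(STAGES)
qa_only = lead_time(with_changes(STAGES, {"QA": 1}))
no_handover_waits = lead_time(with_changes(STAGES, drop_waits=True))
both = lead_time(with_changes(STAGES, {"QA": 1}, drop_waits=True))
```

With these illustrative figures, cutting QA work from five days to one reduces lead time only from 34 to 30 days, while eliminating handover waits brings it to 18, and doing both brings it to 14. The waits between silos, not the work inside any one of them, dominate the total.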
The effects of DevOps can be quite dramatic. Many IT organizations find they can go from struggling to deploy new releases in less than six weeks to being able to do so weekly or even more frequently. Exemplars of DevOps, like Amazon, can deploy code changes hundreds of times per hour. While this is impressive, most enterprise IT organizations find that weekly or even daily releases are sufficient.
5. Testing: A dramatic restructuring
In most IT organizations, testing has been performed by a QA group that is thinly staffed and underfunded. That effectively restricts its testing to manual, functional testing to ensure that application functions operate correctly. Moreover, most IT organizations wait to perform QA until just before deployment, resulting in last-minute application rework or, worse, deployment of unsatisfactory code.
This approach may have seemed acceptable for "support the business" applications, but it is totally unacceptable for "be the business" applications. Application quality can't be an afterthought. QA testing must be a core part of the development process, and it must be performed earlier in the process so that issues can be identified and addressed before they cause a panic response or application outages.
The primary way to achieve this is to move testing earlier in the development lifecycle. Responsibility for functional test development moves to the developers, who create tests that exercise all new functions they write.
Doing this, however, entails more than merely shifting manual work from one group to another. You need to create an automated test execution environment that you can call as soon as new code gets checked in. When a developer checks in code (which includes the new tests), the code repository should automatically kick off a set of functional tests that exercise the portion of the code on which the developer has been working.
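As a rough illustration of developer-written tests that an automated environment can run at check-in, here is a minimal Python sketch using the standard unittest module. The function apply_promotion and the entry point run_checkin_tests are hypothetical examples. How a repository actually triggers the run (a hook, a CI server, and so on) depends on the tooling in use and is not shown.

```python
import io
import unittest


def apply_promotion(price_cents, percent_off):
    """New code under development: apply a percentage discount to a price in cents."""
    if not 0 <= percent_off <= 100:
        raise ValueError("percent_off must be between 0 and 100")
    return price_cents * (100 - percent_off) // 100


class ApplyPromotionTest(unittest.TestCase):
    """Functional tests written by the developer alongside the new code."""

    def test_discount_is_applied(self):
        self.assertEqual(apply_promotion(2000, 25), 1500)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_promotion(2000, 0), 2000)

    def test_invalid_percentage_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_promotion(2000, 150)


def run_checkin_tests():
    """Entry point an automated check-in step could call; True means all tests passed."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ApplyPromotionTest)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()
```

The check-in step would accept or reject the change based on the boolean returned by run_checkin_tests, so that quality feedback reaches the developer within minutes instead of waiting for a later QA cycle.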
This shift to developer-driven functional testing frees QA groups to focus on three other aspects of testing that traditionally have been neglected but are increasingly important for "be the business" applications:
Integration testing. This testing addresses end-to-end testing of an application, and exercises all portions of an application. Integration testing ensures that new code does not inadvertently impair existing functions when new functions are implemented. You can automate this testing to be performed at initial code check-in, and it can be extremely useful as a way to avoid unexpected errors in production. Implementing an integration testing environment requires an automated testing capability, dedicated testing resources, and investment in test development.
Client testing. With the increasing shift to mobile application access, it is critical to test applications on all of the most common mobile devices. Many IT organizations start with an informal "mobile lab" stocked with a collection of mobile phones for testing purposes, but they quickly find it to be inadequate, as it typically can't keep up with all of the new devices coming out. So most organizations turn to a mobile testing service, which provides an external collection of devices that's comprehensive and contains enough resources to support large testing volumes.
Performance/load testing. Cloud-native applications by their very nature tend to experience highly erratic load volumes. Many applications fail or perform poorly at higher volumes, either because some functions cannot handle heavy traffic or because the application was never designed for elasticity, the ability to grow and shrink in response to changing volumes. In a "be the business" world, application failure or poor performance can be harmful to a company's top line, so performance/load testing takes on a critical role for cloud-native applications.
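For the performance/load testing item, the basic idea can be sketched as a small Python harness that issues many concurrent calls and reports the error count and 95th-percentile latency. This is only a sketch under simplifying assumptions: load_test and its parameters are hypothetical names, the handler is an in-process function standing in for a real network call, and dedicated load testing tools offer far more than this.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(handler, total_requests, concurrency):
    """Call handler(i) for i in range(total_requests) using `concurrency` workers."""

    def one_call(i):
        start = time.perf_counter()
        try:
            handler(i)
            ok = True
        except Exception:
            ok = False                  # count failures instead of aborting the run
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(total_requests)))

    latencies = sorted(duration for _ok, duration in results)
    errors = sum(1 for ok, _duration in results if not ok)
    # Simple nearest-rank style estimate of the 95th percentile.
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return {"requests": total_requests, "errors": errors, "p95_s": p95}
```

Running the same harness at increasing concurrency levels and comparing the reports is one simple way to see where errors start to appear or latency starts to climb, which is the point at which the application's elasticity needs attention.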
Be the business
To paraphrase Charles Dickens, these are the best of times and the worst of times for IT organizations.
IT's charter is changing dramatically, from a "support the business" role to a "be the business" role as businesses move to a digital world. For IT organizations that traditionally have been shut out from core business discussions, this is an exciting prospect. But because many organizations are mired in legacy processes and organized into functional silos, the changes they need to make are daunting. The key to success lies in addressing the five key areas outlined above. Only in this way can IT organizations fully support their "be the business" role and transform the business.