How to reengineer your IT organization for cloud

The cloud computing revolution is upon us. Every large IT organization is embracing the cloud, drawn by its agile, fast access to on-demand resources and pay-as-you-go pricing. The mantra of the day is “be like Uber” (or Airbnb or a host of other agile startups born in the cloud) and move at hyperspeed to deliver new functionality to customers. Agility in the cloud is essential.

Unfortunately, many IT organizations struggle to emulate these cloud-native companies or haven't embraced the cloud at all. It’s important to understand why, because an IT organization that fails to achieve cloud agility puts its company’s prospects at risk. Put bluntly, most IT organizations fail to recognize that achieving cloud-native results means more than merely using agile infrastructure; it means reengineering the IT organization itself for agility. Unless you make the entire application lifecycle agile, fast infrastructure alone will not improve overall IT performance.

Here are the four areas that you must reengineer to make IT capable of achieving cloud-native speed:

Agile development

If your development process is still stuck in a lengthy waterfall model, with extensive upfront specification efforts, feature-stuffed release plans, and no user involvement between specification and delivery, you’ll never accelerate your application lifecycle.

To be successful with cloud computing, your development process and tools need to be more agile. This means changing your approach to specification and development.

  • Instead of trying to define the complete application up front, start with a minimal, feature-poor prototype, and get user feedback for changes and improvements. Incrementally add new functionality to implement the feedback, and then confirm that the new features meet user needs.

  • Shoot for short-duration development updates of two to four weeks. This allows constant feedback to ensure that the final product meets user needs. A side benefit to ongoing user involvement is increased commitment to the ultimate deliverable—if users have been involved all along, it’s much harder for them to reject the final product.

  • Look to tools that support incremental development, frequent integration, and common documentation. GitHub is a popular tool that allows individual developers to work independently and then submit a request to have their code pulled into a common build. A new generation of application lifecycle tools tracks work requests, assignments, and status, fostering a shared understanding.

Agile QA

As an engineering head, I found quality assurance one of the most frustrating parts of the development process. Always short-resourced, our QA groups were challenged to get the product installed and operational, delaying actual quality assessment. Worse, QA happened at the end of the entire release cycle, causing it to be compressed and seen as the bottleneck for release.

Cloud-native applications require a different approach to QA. More frequent releases mean that testing can be done incrementally and as part of the development process, which is good. But there’s more that can be done. Here are some actions that can improve QA: 

  • Leverage the cloud for QA resources. Instead of having QA fight for a limited pool of test servers, use the cloud as the QA testbed. The cloud's easy access and pay-per-use model make it a natural fit for QA, which typically consists of short-duration bursts of work.

  • Move to automated tests that allow fast, repeated test suite execution. Too many QA groups are locked into manual testing approaches that take a long time to execute. Worse, every new release (and remember, there are going to be many more of them under the incremental development approach outlined above) requires just as much effort. Automated tests require more work up front, but they save enormous amounts of time on an ongoing basis.

  • Move QA earlier in the development lifecycle. Even with cloud infrastructure and automation, QA remains an after-the-fact effort if it is performed only once developers are finished. A better approach is test-driven development (TDD), in which developers create individual unit tests while developing new code. With TDD, you add unit tests to the automated suite, which executes upon code check-in. This means QA occurs during development, not after. Furthermore, moving functional QA into development means QA groups can focus on other quality aspects that were previously unaddressed due to time and resource pressures: load and resiliency testing, security verification (e.g., application testing against SQL injection attacks), and so on.
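As a minimal sketch of the TDD practice described above, the following uses Python's standard unittest module. The function apply_discount and its business rule are hypothetical, invented purely for illustration; in a test-first workflow the three tests would be written before the function body. The run_suite helper stands in for whatever check-in hook your build tooling provides.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent (hypothetical business rule)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    """Unit tests written alongside (ideally before) the code they cover."""

    def test_reduces_price(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 120)


def run_suite() -> bool:
    """Run the tests as a check-in hook might; True means every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A check-in pipeline could call run_suite() and refuse the change when it returns False, which is what moves functional QA into development.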

Agile artifacts

One of the most time-consuming elements of traditional application lifecycle practices is the repeated re-creation of application artifacts. Each group in the application lifecycle builds and configures the application from fundamental components, that is, from source code. Even worse, different groups frequently use different tools to create the application. This can lead to endless confusion, as one group’s build uses, say, an older library version than the developers used. Tracking down these environmental differences is challenging and time-consuming, and frequently leads to strife and finger-pointing within the organization.

It's far better to build the application once and use this canonical version across all groups involved in the application lifecycle. Each group takes the official version and performs its tasks upon it, which avoids the problems caused by environmental differences and repeated builds.
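One way to make "build once, use everywhere" checkable is to identify the canonical build by a content digest that every downstream group verifies before working on it. The sketch below assumes that approach, which the article itself does not prescribe; the names Artifact, build_once, and verify are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """The canonical build, identified by the SHA-256 digest of its bytes."""
    name: str
    version: str
    sha256: str


def build_once(name: str, version: str, payload: bytes) -> Artifact:
    """Record the digest of the single official build (hypothetical helper)."""
    return Artifact(name, version, hashlib.sha256(payload).hexdigest())


def verify(artifact: Artifact, received: bytes) -> bool:
    """True only if the received bytes are exactly the canonical build."""
    return hashlib.sha256(received).hexdigest() == artifact.sha256
```

With this in place, a QA or operations group that receives a rebuilt or altered copy gets False from verify and knows immediately that it is not testing what the developers shipped.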

But some things need to change in order to move to common artifacts:

  • Integrate identity and access management into the application so that only appropriate personnel can access artifacts at each stage of the lifecycle. Good practices prevent developers from accessing applications in production, so an identity and access management system that adds and drops access permissions as the application progresses through its lifecycle is important.

  • Implement security measures and tools using policy and automation. If an application is sidelined to await a manual security review, you forfeit the acceleration gained from common artifacts, a loss compounded by the more frequent releases that come with agile development.

  • Give each group dedicated infrastructure on which to run the artifacts it receives from an upstream group. If groups need to fight over a limited pool of resources to perform their tasks, common artifacts are pointless. Cloud environments can mitigate this issue.
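The first point above, access that is added and dropped as an artifact moves through its lifecycle, can be sketched as a table mapping each stage to the roles allowed to touch the artifact there. The stage names, role names, and helper functions are hypothetical, and a real deployment would delegate these decisions to its identity and access management system rather than to an in-memory table.

```python
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = 1
    QA = 2
    PRODUCTION = 3


# Hypothetical policy: which roles may access the artifact at each stage.
# Developers lose access once the artifact reaches production.
ACCESS_POLICY: dict[Stage, frozenset[str]] = {
    Stage.DEVELOPMENT: frozenset({"developer"}),
    Stage.QA: frozenset({"developer", "qa"}),
    Stage.PRODUCTION: frozenset({"operations"}),
}


def can_access(role: str, stage: Stage) -> bool:
    """Check a role against the policy for the artifact's current stage."""
    return role in ACCESS_POLICY.get(stage, frozenset())


def promote(stage: Stage) -> Stage:
    """Move the artifact to the next stage; permissions change with it."""
    if stage is Stage.PRODUCTION:
        raise ValueError("artifact is already in production")
    return Stage(stage.value + 1)
```

Because permissions are derived from the artifact's stage, promoting the artifact is the only action needed to grant and revoke access; nobody has to remember to remove developers by hand.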

Agile operations

The need for agility extends even to applications that are in production. Too many IT organizations starting out with cloud infrastructure assume that application deployments are static and configurations are unchanging. That assumption is unworkable with a frequently updated application that's likely to experience highly erratic user loads.

Instead, operations must move to a set of tooling and practices that can accept constant change in execution resources and frequent updates to execution artifacts. The operational changes required include:

  • Application operation descriptions that use role names (e.g., cache server) that are symbolically linked to actual instances and can easily accept updates as resources join or leave the application environment. Hard-coded resource names are anathema in this world.

  • A centralized logging and monitoring system that allows important information to be collected and correlated across dynamic resource pools. This enables operations to track issues quickly, which can be challenging in constantly morphing application environments.

  • Application deployment practices that avoid “all at once” releases. As you deliver new functionality, best practices dictate installing the changes on a small portion of the application resource pool to confirm functionality and user acceptance. The overall application should gradually shift over to the new version. This requires deployment practices and tools that can add and retire individual resources, as well as track which resources in the pool have which application code version.
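Two of the practices above, addressing resources by role rather than by hard-coded name and shifting a pool gradually to a new version, can be sketched together. Everything here (RolePool, join, leave, roll_forward, the instance IDs) is hypothetical and kept in memory for illustration. A real tool would retire and add actual instances instead of relabeling them, and would check health and user acceptance between steps.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RolePool:
    """Instances currently filling one role, e.g. "cache-server"."""
    role: str
    # Maps instance ID to the application version that instance runs.
    instances: dict[str, str] = field(default_factory=dict)

    def join(self, instance_id: str, version: str) -> None:
        """An instance joins the role; callers never hard-code its name."""
        self.instances[instance_id] = version

    def leave(self, instance_id: str) -> None:
        """An instance leaves the role (scale-down or failure)."""
        self.instances.pop(instance_id, None)

    def versions(self) -> dict[str, int]:
        """Count how many instances run each version."""
        counts: dict[str, int] = {}
        for version in self.instances.values():
            counts[version] = counts.get(version, 0) + 1
        return counts

    def roll_forward(self, new_version: str, fraction: float) -> list[str]:
        """Upgrade instances until `fraction` of the pool runs new_version.

        Returns the IDs upgraded in this step, in sorted order.
        """
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must be between 0 and 1")
        target = math.ceil(len(self.instances) * fraction)
        done = sum(1 for v in self.instances.values() if v == new_version)
        upgraded: list[str] = []
        for instance_id in sorted(self.instances):
            if done >= target:
                break
            if self.instances[instance_id] != new_version:
                self.instances[instance_id] = new_version
                done += 1
                upgraded.append(instance_id)
        return upgraded
```

Calling roll_forward with a small fraction first, then with 1.0 once the new version has proved itself, mirrors the gradual shift described above, and versions() gives operations the per-version count it needs at any moment.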

Reengineering your application lifecycle

Gaining the full benefits of cloud adoption requires a host of changes. Be prepared to examine every part of your application lifecycle to identify manual steps, bottlenecks, and practices that assume a static operational environment. Becoming a truly cloud-native IT organization requires reengineering the entire application lifecycle and each of the groups associated with it.