DevOps at scale: How to keep customers coming back
If your software development and delivery organizations are to sustain their competitive edge, rapid, agile software releases are essential, and continuous delivery should be the goal. That's why DevOps is so popular among today’s most effective software delivery teams—those that focus on the continuous roll-out of changes to subscribed customers.
But despite its popularity, DevOps exists more as a mindset than as a codified set of processes that teams must follow. Your main focus must be on customer need, and on satisfying that need with each update. Teams that focus on process instead do so at their peril.
If you’re part of a DevOps team that has enjoyed early success, what are the areas for improvement, or greater focus, that will most benefit your users, particularly as you scale?
Know your most valuable DevOps metrics
“As a part of agile’s evolution, DevOps represents a way to embrace the full spectrum of the software lifecycle,” says Anders Wallgren, CTO at Electric Cloud. “In some ways, the key metrics of DevOps are the ones that are closest to the customer.”
That means using the sort of metrics that draw the customer's attention. For instance, what’s the impact of downtime on the customer? Teams also need to bear in mind which services within their microservices-based apps are more critical than others.
“Let’s put this in perspective,” Wallgren continues. “If my avatar service is down, and my users don’t see their images in the forums where they’re discussing my product, no big deal. But if my shopping cart service goes down, that’s a huge deal. Not just to my customer, but to me.”
DevOps should help teams focus not just on the geek metrics (such as change lead time, uptime, and mean time to recover) but on the ones that have the greatest impact on the customer. If the impact is small, that’s OK. But if it’s big, then your priorities become clear.
“If you’re spending all your time perfecting an avatar service, that means you’re not spending your time doing something more valuable to your business and for your customer,” says Wallgren.
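Wallgren's avatar-versus-shopping-cart comparison can be sketched as a small ranking exercise. The example below is only an illustration: the `ServiceOutage` type, the revenue-per-minute estimates, and the downtime figures are all hypothetical, and a real team would choose its own measure of customer impact.

```python
from dataclasses import dataclass


@dataclass
class ServiceOutage:
    name: str
    downtime_minutes: float    # observed downtime in the reporting window
    revenue_per_minute: float  # hypothetical estimate of revenue tied to the service


def customer_impact(outage: ServiceOutage) -> float:
    """Estimated revenue lost while the service was down (illustrative formula)."""
    return outage.downtime_minutes * outage.revenue_per_minute


def rank_by_impact(outages):
    """Return outages sorted from highest to lowest estimated customer impact."""
    return sorted(outages, key=customer_impact, reverse=True)


# Hypothetical sample data: a long avatar outage and a short shopping-cart outage.
outages = [
    ServiceOutage("avatar", downtime_minutes=120, revenue_per_minute=0.0),
    ServiceOutage("shopping-cart", downtime_minutes=10, revenue_per_minute=250.0),
]

for outage in rank_by_impact(outages):
    print(f"{outage.name}: estimated impact {customer_impact(outage):.2f}")
```

With these made-up numbers, the ten-minute shopping cart outage ranks above the two-hour avatar outage, because raw downtime alone says little about what the customer actually lost.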
Speed versus quality: Decide your priority
Some customers need the speed that you can deliver from your efficient release cycles. Others, such as banks and insurance companies, rank quality over the speed of releases, since regulations and mandates demand high levels of accountability. In other words, speed is relative. But even slight improvements can lead to a quality shift.
Topo Pal, Director and Platform Engineering Fellow at financial services firm Capital One, is not worried about setting records for release cycle speed. Nevertheless, his team is faster than most in the banking industry. From code-commit to any non-production deployment environment, today’s process typically requires a few minutes, and if it involves a complicated code base, it can take 10 to 30 minutes.
“We moved from an average of one week to 20 minutes, which was quite an improvement,” says Pal. Before his team adopted a DevOps approach, getting a particular code base to build took as much as 48 hours. "That’s because someone was manually running repositories, and so on," he says. Then deployment would take another 72 hours, and some test cases ran for weeks. "And of course all the change management and approval processes added much more time, typically.”
Now, Pal says, both the business and the developers love it. "Because we can deploy faster, we can fix things faster—people are much more relaxed now. They’re not afraid that after a week of testing, the old finger pointing will return.”
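To make the scale of that change concrete, here is a minimal sketch of how a team might compute its average change lead time and compare it with the one-week baseline Pal mentions. The timestamps are hypothetical sample data, and the function names are illustrative rather than anything Capital One has described.

```python
from datetime import datetime, timedelta
from statistics import mean


def lead_time(committed_at: datetime, deployed_at: datetime) -> timedelta:
    """Time from code commit to deployment for a single change."""
    return deployed_at - committed_at


def average_lead_time_minutes(changes) -> float:
    """Average lead time in minutes over (committed_at, deployed_at) pairs."""
    return mean(lead_time(c, d).total_seconds() / 60 for c, d in changes)


# Hypothetical sample data: two changes taking 15 and 25 minutes.
changes = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 15)),
    (datetime(2024, 1, 8, 11, 0), datetime(2024, 1, 8, 11, 25)),
]

current_minutes = average_lead_time_minutes(changes)
baseline_minutes = timedelta(weeks=1).total_seconds() / 60  # one week in minutes
speedup = baseline_minutes / current_minutes

print(f"Average lead time: {current_minutes:.0f} minutes")
print(f"Improvement over a one-week baseline: about {speedup:.0f}x")
```

One week is 10,080 minutes, so moving from a one-week average to 20 minutes is roughly a 500-fold reduction in lead time.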
Pal’s team has also moved to fully automated test cases going into production, which means it can usually correct errors within 30 minutes. “Automated test cases make everything predictable and repeatable, and our source code is auditable, so we are much more easily compliant than five years ago,” says Pal. That includes his internal policies, as well as Sarbanes-Oxley and Payment Card Industry (PCI) compliance.
Even in an industry where time to release is not the most critical goal, the efficiency of DevOps practices quickly leads to deployment quality. “One of the greatest benefits at Capital One is that developers take more ownership, because they can make corrections much more quickly.” That’s a big win for customers.
Scale DevOps with large-scale agile frameworks
While continuous delivery is the main focus of most DevOps practices, it’s also one of the core values of many larger-scale agile frameworks, including SAFe, LeSS, DAD, and Nexus. Steve Mayner, senior program consultant at SAFe Agile, agrees. His company's Scaled Agile Framework (SAFe) includes a DevOps practice, although SAFe does not prescribe what DevOps teams ought to focus on.
“Every organization has to come up with its own definition of continuous,” says Mayner. “That largely depends on your customers’ needs. If you’re Google, then you’re deploying to production every 11.5 seconds. That’s one definition. But if you’re an established brick-and-mortar organization, early in the stages of lean/agile transformation out of older methods, continuous delivery can be on a continuum, from once per year to once per month.”
That includes large government organizations that Mayner’s team frequently coaches. The main difference has to do with secret security clearances and high levels of funding that come with government contracts, he says. Otherwise, he adds, “SAFe in government looks a lot like SAFe in commerce.” These are typically large organizations that have been siloed in different domains and that struggle with delivery frequency, customer satisfaction, and quality issues stemming from immature technology practices and an immature architecture.
Recently, one of Mayner's government customers launched its first agile release train, and the feedback he received from practitioners there was extremely positive, he says. The organization had to overcome challenges with communication and collaboration between operations and development that had been brewing for years.
“The infrastructure folks, the security folks, and the sysadmins in the room were saying, ‘We met with these teams, and found out what was coming from the devs… none of that was anywhere on our radar,’” Mayner says. “‘Now we’re much more aligned in terms of our next release.’”
Eliminate your ops teams...but keep Ops
Scott Prugh, Chief Architect and Vice President of Software Development Operations at CSG International, took what sounds like a radical approach to team organization.
“My first task when I took over operations was to get rid of the operations teams. That didn’t mean operations went away. We still had operations engineers in charge of stability, deployment, running, and support of the software. They still participate in the build lifecycle, and they bring to the discussion all the pain points in running the software, but what we wanted ops to focus on was learning engineering skills.”
This change of focus within the new world of DevOps “build and run teams” has meant that CSG’s ops teams have developed their skills across the engineering lifecycle. Some improved by learning how to write tests. “Others learned how to write deployments, or automate some of the tedious work so they could add higher value to the team,” he says.
“The traditional ops role can meld into a true engineering role, where ops can begin to use engineering skill sets to solve problems at scale.”
These kinds of changes can at first present hardships for transitional organizations that didn’t grow up as build-run teams. “It requires a lot of retraining, because lots needs to be fixed and managed differently,” says Prugh.
Two of his major recommendations include:
1. Find out who wants to change and learn
“This is a people and culture challenge,” says Prugh. “Do you have the right people on the bus? Learning is key, and that needs to take place across all roles.” Traditional management styles in large organizations tend to be process focused, and that leaves out the learning piece. If your organization has been run like this for years, you need to identify the types of people who really like to learn and move them into DevOps roles.
2. Bring everything into one view so teams can work with a feature backlog
“This is about work management and visibility, which means you should consider using a Kanban-based change backlog,” Prugh says. The various types of work management systems created to manage different organizations have created an issue when it comes to understanding the full work value stream.
Delivering new features to customers as fast as possible, while also managing change and other non-creative work, requires a clear view into status and next-day planning. “How do you bring things into one view so that teams can work with a feature backlog, and use, say, a Kanban change backlog?” Prugh asks. You need to determine how to manage that effectively.
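One way to picture the single view Prugh describes is to merge work items from separate systems into one Kanban-style board. The sketch below is an assumption-laden illustration, not CSG's tooling: the `WorkItem` fields, the column names, and the tracker names are all hypothetical.

```python
from dataclasses import dataclass

COLUMNS = ["todo", "in_progress", "done"]  # hypothetical Kanban columns


@dataclass
class WorkItem:
    title: str
    kind: str    # "feature" or "change"
    source: str  # hypothetical name of the system the item came from
    status: str  # must be one of COLUMNS


def build_board(*backlogs):
    """Merge several backlogs into one board keyed by Kanban column."""
    board = {column: [] for column in COLUMNS}
    for backlog in backlogs:
        for item in backlog:
            board[item.status].append(item)
    return board


# Hypothetical sample data from two separate work management systems.
feature_backlog = [
    WorkItem("Saved payment methods", "feature", "product-tracker", "in_progress"),
    WorkItem("Order history export", "feature", "product-tracker", "todo"),
]
change_backlog = [
    WorkItem("Patch database hosts", "change", "ops-ticketing", "todo"),
    WorkItem("Rotate TLS certificates", "change", "ops-ticketing", "done"),
]

board = build_board(feature_backlog, change_backlog)
for column in COLUMNS:
    titles = [f"{item.title} ({item.kind})" for item in board[column]]
    print(f"{column}: {titles}")
```

Keeping the `kind` and `source` fields on each item lets a team see feature work and change work side by side without losing track of where each item originated.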
Executive buy-in essential to DevOps at scale
Sometimes agile enthusiasm first bubbles up into the larger organization from individual teams, and higher management then gets on board. But DevOps at scale is most often successful when business leaders become early advocates. They then explain to customers the new level of involvement and commitment to their needs.
“In one case we saw, the leadership understood the challenges they had been facing historically,” says Mayner. They recognized that this new approach, involving SAFe and DevOps, required new thinking if they wanted a better outcome. So management carved out the time required for the overhead and training. "It was hard to get people all together in the same room, at the same time, to talk about what they want to accomplish in the release train over the next three months,” he says.
New meetings, new breakthroughs
As the customer-facing teams, including leadership, worked to help customers understand how progress happens in two-week sprints, the dev and ops teams at the company worked to better understand each other. Teams that had never worked together held collective breakout sessions to discuss the vision, priorities, and key deliverables.
“That included how they would decompose those features, and figure out what they could accomplish in each sprint,” says Mayner. It was in those team breakouts that the ops folks, being invited for the first time to discussions like this, were able to speak up about the implications for other projects regarding infrastructure and security, challenges of which the dev teams were unaware.
Face-to-face interaction between individuals, which is a core agile concept, makes people realize they’re not just members of a silo but part of a larger, integrated train. “This is when the breakthroughs happen,” Mayner says.