How they did it: 6 companies that scaled DevOps

Scaling DevOps is still a challenge for most organizations, but some companies have incorporated its concepts both broadly and deeply. DevOps thought leadership "is not about just going faster," said Rob Stroud, a former principal analyst for DevOps at Forrester who is now chief product officer at XebiaLabs.

"Velocity is fine, but velocity without quality is unacceptable. I can deliver a million widgets quickly, but if they're all wrong, I've wasted my organization's money."
Rob Stroud

The DevOps transition is further evolving the way enterprise development is done. Operational requirements are being built into the code, and security is becoming a focus from end to end in the tool chain, Stroud said.

This year, DevOps will emerge as a board-driven initiative from the CIO down, Stroud said. DevOps leaders must deliver a strong and clear-cut message to the troops in development and operations: "We've got a job for you; it may be different than before, but at the end of the day it's going to serve our customers better and your career better," he said.

Still, there isn’t broad industry consensus regarding how to do DevOps best, said DevOps consultant Matthew Skelton.

"Some places are stuck with large DevOps teams, busy with all kinds of automation but not really facilitating joined-up communications between dev and ops." 
Matthew Skelton

These organizations have decided that DevOps is mostly about automation; they are moving forward, but not as effectively as they could and should be, he said. Meanwhile, other organizations are adopting the site reliability engineering (SRE) pattern from Google (Type 7 in Skelton's DevOps Topologies catalog), with mixed success.

The SRE pattern seems to be gaining traction in heavily outsourced situations with multiple suppliers, Skelton added, "so I expect we will see some new 'SRE-as-a-Service' offerings emerging over the coming years to meet the need for highly skilled operations and reliability engineers in the enterprise."

Here's how some high-profile companies are successfully implementing DevOps at scale today.

1. PayPal

PayPal has more than 200 million active accounts and processes more than $100 billion worth of payments each quarter.

Some 4,500 developers work on a 50 million-line code base, and 100TB of application artifacts get pushed into production across 2,600 applications. PayPal's systems receive 230 billion hits a day, according to a presentation by Rama Kolli, product manager for PayPal's middleware platform, at the most recent DevOps Enterprise Summit.

As recently as 2013, a PayPal developer had to file a dozen tickets in the course of creating a new application, including deploying test machines, getting more hosts for the production pools, and other steps along the way, Kolli said. "Developers were drowning in tickets. In fact, their primary responsibility was not to write code but write and follow up on tickets."

Prior to PayPal's DevOps journey, it took days to create a new application, weeks to deploy it on a test server, and months to get it into production, Kolli said.

PayPal's answer to this complexity and sprawl was a new, entirely self-service system for the software development lifecycle (SDLC). And it has worked: Developers report that they are getting code from the creation stage to production in less than two weeks, Kolli said. "At this point, I think I can say that rate of change is not bound by the limits of the platform but by how quickly developers can write their own code, and certify and deploy it."

So far, more than 1,000 new applications have been built with PayPal's SDLC platform, which has evolved to support public clouds, Docker containers, and other features.

But this freedom comes at a cost, Kolli noted. "They can do what they want and when they want to, but is that free?" There are some 12,000 test playgrounds, and if they're not being used effectively, that's a waste of money. PayPal is working on ways to mitigate this problem going forward, as well as adding significant new security measures into the application lifecycle.

2. Kaiser Permanente

Healthcare consortium Kaiser Permanente (KP) has more than 200,000 employees and 11.8 million customers. While KP has made some strides on the customer-facing services front—its website gets 300 million hits per year, and more than 6 million customers are signed up for digital channels—overall its IT operations weren't keeping up with modern healthcare needs, representatives said at the 2017 DevOps Summit in San Francisco. 

In the past, six "big batch" releases would be scheduled each year. And those were fraught with last-minute scope creep, siloed communications, no feedback for developers, and other familiar bogeymen of enterprise IT development.

The company nonetheless had to fight through inertia and resistance to cultural change as it rolled out its DevOps plan in April 2017. It broke through barriers by working toward smaller successes first, building a top-down coalition of KP leaders and, in a bit of reverse psychology, assigning naysayers the responsibility of finding solutions to problems.

KP hasn't quite finished rolling out DevOps "squads" across all of its organizations, but the ones that are active are responding to service requests 47% faster and change requests 53% faster than industry standards.

3. Starbucks

The massive coffee chain has more than 330,000 employees across tens of thousands of stores. Enterprise development culture there had largely aligned around old-school waterfall practices. Sarah Shewell, Scrum master and agile lead, led an effort to move away from waterfall to Scrum beginning in 2015, as she described in a talk at the DevOps Enterprise event in November.

One unexpected dynamic came as development teams adopted a "push" mentality, delivering code to testing teams before they were ready to give it their full attention, said Suzanne Nielsen, application manager. This caused some "cognitive overload" and a rethink of how Starbucks did Scrum.

Work-in-progress (WIP) limits are a key tool for imposing discipline on the app-dev process during a given workflow, to ensure that quality results are pushed down the line. "It's really easy to keep yourself busy all the time and move onto other tasks, but if you don't hold tight to that WIP limit, you're not showing where you have gaps," Nielsen said. This is an area where Starbucks' DevOps leadership saw room for improvement and quickly moved to make changes.
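To make the WIP-limit idea concrete, here is a minimal sketch of a board that refuses to accept more items in a column than its limit allows. It is purely illustrative and is not a description of Starbucks' tooling; the `Board` class, the column names, and the limits are all assumptions introduced for the example.

```python
class WipLimitExceeded(Exception):
    """Raised when a column is already at its WIP limit."""


class Board:
    """Hypothetical kanban-style board that enforces per-column WIP limits."""

    def __init__(self, limits):
        # limits maps a column name to the maximum number of items
        # allowed in that column at the same time.
        self.limits = dict(limits)
        self.columns = {name: [] for name in self.limits}

    def _check_capacity(self, column):
        if len(self.columns[column]) >= self.limits[column]:
            raise WipLimitExceeded(
                f"'{column}' is at its WIP limit of {self.limits[column]}"
            )

    def add(self, column, item):
        """Start new work in a column, if the limit allows it."""
        self._check_capacity(column)
        self.columns[column].append(item)

    def move(self, item, src, dst):
        """Hand an item downstream only if the destination has capacity."""
        if item not in self.columns[src]:
            raise ValueError(f"{item!r} is not in '{src}'")
        self._check_capacity(dst)
        self.columns[src].remove(item)
        self.columns[dst].append(item)
```

With a limit of one item in a "test" column, a second `move` into that column raises `WipLimitExceeded`. The blocked handoff is the point: it makes the gap visible instead of letting work pile up in front of the testing team.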

Starbucks found massive value early on in taking the DevOps Research and Assessment (DORA) survey, Nielsen said. "We're really early in our journey, and the gap between what's in the handbook and where we are is really wide. You need to go into it knowing you can get all this information that will be really great, but you need to develop a plan on how to act on it." Still, Starbucks has reduced the steps in its application development value stream by 41%, to 111, and its cycle time by 74%, to 22 days.
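For a sense of scale, the reported reductions can be worked backward to approximate starting points. This is a back-of-the-envelope sketch that assumes each percentage is measured against the original value; the function name is made up for illustration, and the resulting baselines are estimates, not figures Starbucks reported.

```python
def baseline_before_reduction(current, reduction_pct):
    """Estimate the original value, given the current value and the
    percentage by which the original was reduced."""
    return current / (1 - reduction_pct / 100)


# Value-stream steps: reduced by 41% to 111.
steps_before = baseline_before_reduction(111, 41)

# Cycle time in days: reduced by 74% to 22.
cycle_days_before = baseline_before_reduction(22, 74)

print(round(steps_before), round(cycle_days_before))
```

Under that assumption, the value stream started at roughly 188 steps and the cycle time at roughly 85 days.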

4. Yahoo

For several years, Yahoo has been ushering in a DevOps culture to serve its more than 1 billion users better. Kishore Jalleda, who now works for Microsoft but was then a senior production engineer at Yahoo, described the journey and some of the lessons learned during a presentation at DevOps Enterprise Summit 2017.

When Jalleda first arrived at Yahoo, he interviewed several employees in a fact-finding mission. One, whom he identified as "Bob," had on his own time been working on a stock-recommendation application. Bob expressed frustration over roadblocks in the way, citing familiar problems such as an inability to get server capacity provisioned quickly and a high-overhead approvals process.

Companies seeking to evolve their DevOps culture must identify and empower as many "Bobs," or "intrapreneurs," as possible, Jalleda said. These are the employees who want to be excited about coming to work every day and make "awesome" products for customers, he added.

Jalleda also described how he found a close ally or "co-founder" of Yahoo's DevOps initiative, someone with whom he had "radical" alignment on its goals. This is crucial beyond getting the usual top-down buy-in from the C-suite.

Any major cultural overhaul is going to have struggles and setbacks. The key is to turn failures, if not into successes, then at least into sources of insight. Conduct detailed postmortems after a failure, Jalleda advised. If a team is straggling, "infiltrate" it with evangelists and use a level of peer pressure to drive change. 

5. Capital One

Capital One has been aggressively moving its operations to the cloud over the last few years, mostly to Amazon Web Services. The process has involved breaking up Capital One's monolithic applications into hundreds of microservices. This presents a double-edged sword, with microservices providing more flexibility but also greater complexity in Capital One's environment.

As it moved onto AWS, Capital One turned its attention to keeping its environments reliable while emphasizing continuous improvement. It developed a tool called Cloud Detour, which is similar to Netflix's Chaos Monkey. Both purposely inject failures into running application environments, so teams can examine what happens and then take steps to remedy weaknesses going forward.
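The general idea of failure injection can be sketched in a few lines. The example below is a toy, in-memory simulation and does not reflect how Cloud Detour or Chaos Monkey is actually implemented; `Instance`, `Service`, `inject_failure`, and `run_experiment` are hypothetical names invented for illustration.

```python
import random


class Instance:
    """A simulated server that is either healthy or failed."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class Service:
    """A simulated service backed by a pool of instances."""

    def __init__(self, instances):
        self.instances = list(instances)

    def handle_request(self):
        # Route the request to the first healthy instance, if any.
        for instance in self.instances:
            if instance.healthy:
                return instance.name
        raise RuntimeError("no healthy instances available")


def inject_failure(service, rng):
    """Fail one randomly chosen healthy instance and return it (or None)."""
    healthy = [i for i in service.instances if i.healthy]
    if not healthy:
        return None
    victim = rng.choice(healthy)
    victim.healthy = False
    return victim


def run_experiment(service, failures, rng):
    """Inject failures one at a time, recording after each whether the
    service could still handle a request."""
    results = []
    for _ in range(failures):
        victim = inject_failure(service, rng)
        try:
            service.handle_request()
            survived = True
        except RuntimeError:
            survived = False
        results.append((victim.name if victim else None, survived))
    return results
```

Running `run_experiment` against a three-instance pool shows the service surviving the first two injected failures and going down on the third. That is the kind of weakness such an experiment is meant to surface before a real outage does.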

A pair of Capital One team members gave a deep dive into Cloud Detour during DevOps Enterprise 2017. It's worth hearing their presentation, but the overarching point is that Capital One stands out as an organization that not only has entrenched DevOps in its culture, but has also been able to build out new types of infrastructure to support it.

6. Intel

The chip maker is another large enterprise at the forefront of DevOps enablement, having worked on a company-wide initiative for several years. (Sherry Chang, chief architect of Intel's DevOps initiative, discussed it in an in-depth 2015 interview.)

A massive and crucial part of Intel's business is ensuring that software updates are released in timely and sound form across more than 20,000 combinations of chip variants, features, and platforms.

Prior to undertaking a project to automate testing, Intel was shipping code on a quarterly basis, with poor code coverage in the testing process. "We didn't know what we were shipping," said Manish Aggarwal, an Intel software engineering manager, during a recent webcast.

The project highlighted the need for roles to change along the DevOps journey. Rather than remain "test executors," those same individuals became test engineers, Aggarwal said. Ultimately, Intel went from 4 to 100 releases per year while dealing with 4 million lines of code, 1 million more than before. Known code coverage rocketed to 87%, whereas 85% of coverage was unknown previously.

"There is less reliance on individuals and more reliance on automation," Aggarwal said. Going forward, Intel plans to expand the level of automated testing it does even further, he added.

Listen up

Trends come and go in enterprise tech, but DevOps is sure to continue to thrive and evolve as an extension of agile, combined with more productive, respectful communications among development teams, operations, and the C-suite.

Consultant Skelton offered a cautionary tale to enterprises just beginning their DevOps journey. "Lack of leadership buy-in will usually prevent DevOps initiatives," he said. "Leadership must want to improve, to go faster (safely), to change the culture within the organization."

However, DevOps transformations can founder "if we focus too much on the technology and not enough on flow, feedback, and metrics," he added. "There is a danger for some companies of hiring 'DevOps' engineers who then go on to spend months building some complicated tooling thing which does not align to the value delivery stream."

In the meantime, some of the world's largest companies certainly have put wood behind the DevOps arrow to great success; skeptics and laggards should pay attention.