Gene Kim on DevOps in the enterprise: Architecture the next big thing

Mike Perrow Technology Evangelist, Vertica

We recently interviewed Gene Kim—author of The Phoenix Project and tireless enthusiast for DevOps—to discuss the upcoming DevOps Enterprise Summit (DOES) in London, June 30-July 1. He gave TechBeacon a sneak peek at what some of the speakers would be discussing at the event. If you can get to London by the end of June, the event is something you won’t want to miss.

But aside from our questions about the DOES16 event, we couldn’t help asking Kim about specific trends he’s noticed regarding the practices around DevOps itself. His answers are insightful, sometimes provocative, and always engaging.

TechBeacon: Four years ago, when people thought of DevOps, they considered it in terms of culture shift—how to extend agile principles, you might say, to include operations and more IT processes. Over the last two years, we seem to have focused more on the tools and how to get the DevOps process working and automated. What do you think will be the next wave for DevOps?

Gene Kim: Let me answer that by first saying there are really three things required to make DevOps happen. One is the cultural component. Certainly the other part is tools and technology. The third part is architecture, and that’s supported through our benchmarking of 20,000 organizations. You need great technology practices, which includes automation. You need a high-trust culture. And it is architecture that enables you to get there.

If you think about how DevOps is being used—and I don’t mean in the unicorns like Google, Amazon, Netflix, but instead in large, complex organizations that have been around for decades or even centuries—these are organizations that may have outsourced IT away almost entirely. For them DevOps requires a lot of bravery and leadership. Essentially, they’re taking on decades of habits and practices, making the necessary changes, and taking the plunge into DevOps.

Organizations adopting DevOps principles and practices want their engineers to be as productive as a Google, an Amazon, or a Netflix.

We now know what is required to achieve good DevOps outcomes, right? Fast flow, high deployment rates, great reliability, and great security. The question is, how do we get from here to there? Going into the third year of the DevOps Enterprise Summit, we’re hearing from leaders who are driving these transformations in large, complex organizations. They’re telling stories at an epic, heroic scale, some with air cover, some with much less than they need for what they’re doing. But the results speak for themselves.

Of the nearly 100 speakers at DevOps Enterprise Summits over the last two years, about one in three have been promoted. My interpretation is that these leaders have created something of incredible value. Now executive leadership is telling them: We want you to make a bigger contribution, not just work on your area, but help elevate the entire organization.

I’ve been studying high-performing technology organizations since 1999, and the biggest surprise was how I stumbled into the DevOps community. I think that organizations adopting DevOps principles and practices want their engineers to be as productive as a Google, an Amazon, or a Netflix. We know from the benchmarking that these high performers are about 200 times faster than their peers in terms of how quickly they can move and change, whether it’s features or security patches.

How quickly can they move that into the production environment? High performers can do it in minutes or hours. For most large, complex organizations, to get anything into production requires weeks, months, or maybe even quarters. In an age where speed matters more than ever, if they can do that in minutes versus months, they will have an immense competitive advantage.       

TB: What are you hearing from these leaders in large, complex organizations, who have a lot of legacy infrastructure and are trying to implement DevOps principles and practices? 

GK: There are many, many stories from across a lot of different industries: a publishing company that’s over 100 years old, or a multibillion-dollar insurance company, or an entertainment ticket brokerage built on an application that was written over 30 years ago, or one of the world’s largest producers of consumer products that’s nearly 90 years old. I’m hearing from all these storied organizations who are saying that, in order to compete and win in the marketplace, they need to build a world-class technology organization.

These larger, older companies are kind of like sleeping giants; once they wake up, they’re going to unleash an incredible force. 

For me, these large, established companies present such a contrast to what's been written about the unicorns—Google, Amazon, Facebook, Netflix—the businesses that most people would agree are very different from established consumer product companies. By contrast, these larger, older companies are kind of like sleeping giants, as the United States was referred to at the dawn of WWII; once they wake up, they’re going to unleash an incredible force. In this case, that force will create an incredible economy. If we can elevate developer and engineer productivity in these organizations so that they're as productive as Google, there's no doubt in my mind that the value of DevOps is going to be fully realized.

TB: From a DevOps perspective, especially looking at some of these more traditional organizations, what is your view of some of the large-scale agile frameworks, things like SAFe, DAD, LeSS, and Scrum of Scrums? Are you seeing that they are helpful in a DevOps context, and if so, how?

GK: One of the things I noticed in the San Francisco summit for the last two years is that there’s wide adoption of the Scaled Agile Framework. I think the reason is that it provides a lot of the practices you need. There’s a specific practice called release planning, which is absolutely necessary when you have large, monolithic applications that are very difficult to change. Every time you want to make a small change, you have to coordinate with a thousand developers. I think release planning, Scrum of Scrums, those are the things we need to do to just coordinate and communicate and be able to move 1,300 pieces simultaneously in order to get something done. By the way, Dean Leffingwell is brilliant. I have a lot of respect for him.

A big part of what makes Google and Amazon and Facebook so productive is that they don’t have to do a lot of communication and coordination to get small things done. 

Here’s where I think it eventually goes. With practices like continuous integration, continuous delivery, with loosely coupled architectures, you end up in a situation where small development teams can deploy value to customers independently, without a lot of communication and integration overhead. The reliance on these heavyweight planning processes goes down significantly. I think that’s a big part of what makes Google and Amazon and Facebook so productive, that they don’t have to do a lot of communication and coordination to get small things done. They’re able to work with much more autonomy and safety.

TB: Sounds like you’re suggesting this is because of a more granular, more modular interoperability between the various modules without having to do a lot of integration testing and so forth.

GK: Exactly. Less integration testing and, even before that, less scheduling. When you have, say, only 10 integration test environments, you have to jockey for position to get use of those scarce resources. And even to get there, you have to talk to 30 different committees, 30 different product groups, because you’re all trying to make your changes at the same time, and you can only go through one at a time.

TB: In what way does DevOps change IT’s relationship with its vendors?  

GK: That’s interesting. I think there are examples of where you can get DevOps outcomes through outsourcing certain portions that are not internally staffed. Across many of the DevOps enterprise stories is a shift from optimizing for cost to optimizing for speed.

We don’t rely so much on firm, fixed-price contracts. Instead, we just want the best experts in this or that area, which means more time and materials. We’re saying, Hey, we don’t want the lowest-cost labor. We want the skills that we need in order to achieve our goals as quickly as we can. I think that shift is pretty dramatic.

TB: What do you see next for ops and security? For example, what are some of the specific challenges for teams who really want to embrace DevOps, but at the same time comply with security standards?

GK: I think it boils down to about three things.

The first is security integration. The days are over when we just reviewed application security and the environment at the end of a project. Now we have to integrate that into daily work. For 13 years, I was the CTO of Tripwire, so I have a lot of compassion for security teams, whose job always came at the end. There was never enough time to fix issues. Back then it was really the only way; it was either that or no security at all.

How do we shift from that old way of thinking into building the tools and platforms so we can integrate security testing into everyone’s daily work? That comes with automated testing in the deployment pipeline, when security teams begin using the same tools that developers use. By the way, that also means security has to speak in the language of development. So it’s not a GRC system that lives to the side. It’s inside, whether it’s JIRA, ServiceNow, whatever; we’re using their language.

The second is to embed our expertise as security engineers into tools, so we’re building and running security tools every time a developer commits code. Nothing goes into production without vulnerability scanning and static code analysis, and dynamic scanning runs in production. The security supply chain itself is secure. All those things happen all the time. That only happens when security engineers are not bogged down writing reports and mailing PDF files to developers.
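
The commit-triggered scanning Kim describes can be sketched as a simple "security gate" in the deployment pipeline: every commit's scan results are collected, and the deploy step refuses to proceed if anything serious turns up. This is a minimal illustration, not any particular vendor's API; the scanner names and `Finding` fields are hypothetical.

```python
# Minimal sketch of a pipeline security gate: scanners run on every
# commit, and deployment is blocked if any finding meets the threshold.
from dataclasses import dataclass

@dataclass
class Finding:
    scanner: str   # e.g. "static-analysis", "dependency-scan" (illustrative names)
    severity: str  # "low", "medium", or "high"
    detail: str

def gate(findings, block_at="high"):
    """Return (allowed, blockers): deploy only if no finding at/above the threshold."""
    order = {"low": 0, "medium": 1, "high": 2}
    blockers = [f for f in findings if order[f.severity] >= order[block_at]]
    return (len(blockers) == 0, blockers)

# Example: one high-severity static-analysis hit blocks the deploy.
findings = [
    Finding("dependency-scan", "medium", "outdated TLS library"),
    Finding("static-analysis", "high", "SQL built by string concatenation"),
]
allowed, blockers = gate(findings)
print(allowed, [b.detail for b in blockers])
```

The point of the sketch is that the gate runs automatically on every commit, in the same pipeline developers already use, rather than as a separate review at the end of the project.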

How do we shift from that old way of thinking into building the tools and platforms so we can integrate security testing into everyone’s daily work? 

The third one is, how do we help bridge the DevOps security world to compliance? I think auditors have been trained to audit the same way for the last 30 years. If there are 1,000 servers, we sample 100 of them. We ask where they are, we ask for screenshots. We put them into Word docs. We put them on SharePoint. Instead, we don’t need to sample anymore. We can give the auditors everything, because all that telemetry is there.
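
The shift from sampling to auditing the full population can be illustrated with a short sketch: once per-server configuration telemetry is collected automatically, a control check runs over every server rather than a 10 percent sample. The field names here are hypothetical, chosen only to show the idea.

```python
# Sketch of "audit the whole population" instead of sampling:
# a compliance control is evaluated against telemetry from every
# server, so the auditor sees the complete list of violations.
servers = [
    {"host": f"web-{i:04d}", "ssh_root_login": False, "disk_encrypted": True}
    for i in range(1000)
]
servers[42]["ssh_root_login"] = True  # one machine has drifted out of compliance

def control_violations(telemetry):
    """Return every server violating the control, not a sample of them."""
    return [
        s["host"] for s in telemetry
        if s["ssh_root_login"] or not s["disk_encrypted"]
    ]

print(control_violations(servers))  # the full list, not a screenshot of 100 hosts
```

Because the evidence is machine-readable and complete, the auditor can inspect all of it instead of extrapolating from a sample.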

How do we actually make the auditing process more efficient and more effective? Because I’m certain that doing things in a DevOps way results in better security outcomes. But now we really have to show our work better, so an auditor can conclude that the security controls are actually effective. That’s a big chasm to cross, in some ways a bigger chasm than the dev and ops chasm, but I think it’s inevitable that we’ll get there.

This needs to be designed into our deployment pipeline, so a by-product of how we work is a simple, efficient auditing process. I mean, orders of magnitude easier than ever. By the way, there’s no doubt in my mind that the organizations pioneering these practices are large, complex enterprises.

TB: How is DevOps meshing the promise of continuous integration to continuous delivery, increased delivery velocity, with compliance and security? How do you maintain effective controls without having to make compromises in those areas?

GK: I think we all know that the new way of working is far better than the old one. There’s this kind of joke in the security community that so much of what we do is risk management theater. We draw these diagrams on the board, and we produce this audit evidence. But it doesn’t actually really resemble the way we do work. That’s the theater part, and that’s been the case for at least 15 years.

What’s so exciting about DevOps, from a security perspective, is that instead of doing things periodically, like once a year, we’re doing them all the time. Every time we commit code, we’re running static code analysis, so we find vulnerable code before it goes into production. This happens alongside our automated testing, and we’re doing it all the time. In production, we’re collecting production telemetry and running dynamic vulnerability testing. Through that we discover attackers, and ideally find the vulnerabilities before attackers do.

We should be creating libraries where all the collective knowledge on how to do things securely is embedded, so any developer can use that in their applications. If they do, then we don’t have to re-validate them, right? If they’re using that library, or that configuration setting, then we know that they are as secure as we know how to secure things. I love this quote from Justin Arbuckle, who was chief architect at GE Capital; now he’s at Chef: “The best architecture document is not the Word doc, it’s something in the source code that every development team can use with just one command.”
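
The "secure-by-default library" idea Kim describes might look like the following sketch: instead of every team choosing its own cryptographic parameters, they call one vetted helper that the security team maintains. The module and the iteration count are illustrative assumptions, built here on Python's standard-library `hashlib` and `hmac`.

```python
# Sketch of a secure-by-default helper library: teams call these
# functions instead of picking their own hashing parameters, so a
# single vetted configuration is used (and re-validated) in one place.
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 600_000  # one approved setting, owned by the security team

def hash_password(password, salt=None):
    """Hash a password with the organization's single approved configuration."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return salt, digest

def verify_password(password, salt, digest):
    """Constant-time check of a candidate password against a stored hash."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Any application using this helper inherits the vetted settings automatically, which is exactly why, as Kim says, teams that use the library don't need to be re-validated for this control.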

There’s no doubt in my mind that the organizations pioneering these practices are large, complex enterprises.

That’s how we get more secure, by validating that what’s in production always matches our best-known, most secure base. The result is better security than we’ve ever had.

We need to be able to say to compliance officers and the lawyers that these controls are effective, and that the evidence for that is available for you to inspect. Not just to sample in the old way of auditing, but you can look at all of it. We get far better security outcomes, and it doesn’t slow the business down. Done this way, it actually supports all the great benefits of DevOps—faster time to market, faster lead times, better reliability outcomes, which includes better security and compliance results.

