What we learned from 3 years of sciencing the crap out of DevOps
As researchers on the Puppet Labs State of DevOps report (along with Gene Kim), we have had the opportunity to investigate the DevOps space for the past few years, and people always ask us: What are your biggest surprises? What did you love finding? Why do you even trust survey data? Well, we (briefly) answered these questions at the RSA Conference in San Francisco a few weeks ago, and TechBeacon has invited us to share and expand on that discussion here.
First, let’s start with the data question: Why do we think we can trust survey data?
Most of us have seen awful surveys, and poorly written surveys will give you bad data. The answer, the reasoning goes, is to use system-generated data instead! Well, how many of us have seen bad data in our system logs? When we posed these questions to the crowd at RSA, more hands went up for bad data in system logs than for bad survey data!
Clean data counts
The moral of this little story is that all data has the opportunity to be bad, whether it comes from people or from systems. Our goal is to make the data we use for analysis as good and clean as possible. That is, we want to have a reasonable assurance that our data is telling us what we think it is telling us. When we are working with survey data, this is called psychometrics, and it involves both the careful and artful craft of survey question writing as well as the statistical evaluation of the questions to ensure that the questions we so carefully wrote are capturing what we think they are.
An in-depth discussion of psychometrics is outside the scope of this article, but we encourage you to check it out; it's super interesting, and it's something that sets the State of DevOps report apart from many other vendor reports out there: The study is designed to be rigorous and to produce peer-reviewed academic research in addition to the marketing reports that are released.
But why surveys? We use survey data for the State of DevOps study because we want to capture data from across several industries, dozens of countries, thousands of companies, and tens of thousands of people—and getting access to system data just isn't possible. Additionally, some of the data we're interested in deals with perceptions, and things like culture and employee engagement just can't be collected from databases. (Some proxies exist in HR databases, but they aren't perfect, and many of them aren't even good.)
For example, attrition can be explained by a bad organizational culture, or it can be explained by an employee leaving the company to follow a spouse who got a job out of state. The latter has nothing to do with culture, but it still gets counted in attrition numbers, which some HR proxies roll into their culture metrics. And as we said earlier, not all system data is perfect, either.
What are we proud of finding?
IT performance. We love being able to say that IT performance can be captured and measured by three things: lead time for changes (measured as the time from code commit to code deploy), deployment frequency, and mean time to restore service (MTTR) following an outage or incident. These reflect what those of us in DevOps have talked about for years, covering both throughput measures (lead time and deployment frequency) and stability measures (MTTR). We've heard stories that throughput and stability are both possible for teams embracing DevOps principles, and now we see it in the data. This is super exciting.
Continuous delivery (CD) practices make work better AND make it feel better. We find evidence that CD improves IT and organizational performance AND makes life better for technical teams by reducing feelings of burnout and decreasing deployment pain. So great.
Lean management practices make work better AND make it feel better. Things like WIP limits, the use of visualization of work, and monitoring to make business decisions decrease burnout and contribute to a good organizational culture, all the while improving IT and organizational performance. Win-win.
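To make the IT performance measures above concrete, here is a minimal sketch of how a team might compute them from its own deployment and incident records. All of the data, field names, and the choice of median vs. mean are hypothetical illustrations, not how the State of DevOps survey itself measures them (the survey asks respondents directly):

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment records: (code commit time, production deploy time)
deploys = [
    (datetime(2016, 3, 1, 9, 0),  datetime(2016, 3, 1, 11, 0)),
    (datetime(2016, 3, 2, 10, 0), datetime(2016, 3, 2, 13, 0)),
    (datetime(2016, 3, 4, 9, 0),  datetime(2016, 3, 4, 10, 0)),
    (datetime(2016, 3, 7, 9, 0),  datetime(2016, 3, 7, 12, 0)),
]

# Hypothetical incident records: (outage start, service restored)
incidents = [
    (datetime(2016, 3, 3, 2, 0),  datetime(2016, 3, 3, 3, 30)),
    (datetime(2016, 3, 6, 14, 0), datetime(2016, 3, 6, 14, 45)),
]

# Lead time for changes: time from code commit to code deploy
# (median is robust to the occasional very slow change).
lead_times = [deploy - commit for commit, deploy in deploys]
median_lead_time = median(lead_times)

# Deployment frequency: deploys per day over the observed window.
window = deploys[-1][1] - deploys[0][1]
deploys_per_day = len(deploys) / (window.total_seconds() / 86400)

# MTTR: mean time to restore service following an outage or incident.
restore_times = [restored - start for start, restored in incidents]
mttr = sum(restore_times, timedelta()) / len(restore_times)

print("Median lead time:", median_lead_time)
print("Deploys per day:", round(deploys_per_day, 2))
print("MTTR:", mttr)
```

The first two are throughput measures, the third a stability measure, mirroring the split described above.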
What are some surprises?
Change failure rate, or the degree to which changes introduced into the system are not successful, is not part of IT performance from a statistical standpoint. This surprised us, because it is a stability measure. But it is also a quality measure, so this year’s report is looking at additional measures of quality to see if this distinction is meaningful in the data.
Commercial configuration management tools aren’t correlated with IT performance, but open source is. What’s not so surprising is this: Third-party scripts, homegrown scripts, golden images, and manual configuration management are negatively correlated with IT performance—if you’re doing this manually and ad hoc, things are going to go wrong, especially in complex systems.
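For reference, change failure rate as described above is just the fraction of changes that don't succeed. A tiny sketch, using a hypothetical change log (what counts as a "failed" change—rollback, hotfix, degraded service—is up to the team):

```python
# Hypothetical change log: True = change succeeded,
# False = change failed (rollback, hotfix, degraded service, etc.)
changes = [True, True, False, True, True, True, False, True, True, True]

# Change failure rate: fraction of changes that were not successful.
change_failure_rate = changes.count(False) / len(changes)
print(f"Change failure rate: {change_failure_rate:.0%}")  # 20%
```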
Think like a scientist, and act like one
Whenever you embark on any kind of measurement journey, you should take some steps to make sure your data is good and is telling you what you think it’s telling you. If you’re doing it right, you should be surprised some of the time by the results—but not all of the time.
These are just a few highlights. For more information, check out the 2015 Puppet Labs State of DevOps report, the 2014 State of DevOps Report, and stay tuned for the 2016 State of DevOps report, which will be released in late June. If you’re a dev or ops practitioner, take the survey, which launched March 22.
Have you embarked on your own measurement journey? Share your experiences in the comments section below.
Jez Humble, formerly vice president at Chef and now deputy director of Delivery Architecture and Infrastructure Services at GSA's 18F, co-authored this post.
See the presentation slides from RSA Conference 2016: