Monitorama show recap: Are you putting the 'M' and 'S' in DevOps?

J. Paul Reed Principal Consultant, Release Engineering Approaches

The four cornerstones that constitute DevOps' much-discussed CAMS—culture, automation, metrics, and sharing—are, by now, old hat to most practitioners. Despite this, our conversations are still dominated by talk of "tools and culture," with metrics and sharing taking a back seat.

Not so at Monitorama, a conference where deep sharing about the power and promise of metrics takes center stage. The annual event's fourth installment, which recently concluded in Portland, Oregon on June 29, consisted of three full days featuring experts from both tech startups and industry titans. The conference's single-track format provides a laser focus on the tools, techniques, tribulations, and triumphs encountered while trying to build observability into internet infrastructure and applications at scale, as well as what it means to incorporate healthy monitoring practices into an engineering culture, DevOps or otherwise.

Mainstays of the conference this year included presentations on the nuts and bolts of monitoring, with case studies of both massive and modest monitoring deployments, refreshers in statistics basics that many of us have forgotten since leaving school (but that are required to monitor anything effectively), and detailed discussions on why long-standing tropes on "Monitoring All the Things" may not, in fact, be the correct answer.

In case you missed this year's event (all of Monitorama's talks have always been streamed, for free), here's a short and sweet rundown of everything you need to know about the state of the art in metrics and sharing.


Monitoring: Still an unsolved (and ill-defined) problem

Battery Ventures Technology fellow and Netflix alum Adrian Cockcroft kicked the event off with a review of the areas where the industry has improved since he spoke at Monitorama two years ago, and the areas where it hasn't. He also discussed huge shifts in the industry that affect all of us and have interesting implications for monitoring: New technologies running on incredibly ephemeral infrastructure present particular challenges for creating observability. These trends, combined with increasing price disruptions that make it trivial to scale, are making the practice of monitoring more complex, despite the fact that many organizations are still grappling with monitoring fundamentals.

Greg Poirier, chief technology officer at Opsee, followed Cockcroft's analysis with his own warning about falling into the "DevOps trap of asking five people for a definition and getting 86 answers." Poirier's talk drew his line in the sand for a definition of monitoring, one that accounts for the fact that we are now monitoring distributed systems (in the academic sense) and therefore must not ignore the 30-plus years of research done on them. Poirier emphasized the point at the end of his talk by publishing a bibliography with more than 20 items.

Cory Watson, a software engineer at Stripe, also emphasized the importance of a working knowledge of academic theories on observability, discussed monitoring's definition (which, happily, was quite similar to Poirier's), and walked the audience through control theory as he discussed what was required to promote a "culture of observability."

Innovative monitoring: From big data to big databases

Theory and definitions are critically important, and Monitorama makes sure to create space for those conversations within the monitoring community. But implementing those theories in production was also part of the show: Eron Nicholson and Noah Lorang, engineers at 37 Signals, presented on "Chicken and Waffles," their home-crafted system for automatically reacting to the barrage of hacking attempts that Basecamp fends off every day. Lorang, a data scientist, offered fascinating insights into the number crunching and heuristics required to walk the tightrope between blocking threatening nuisances that could blossom into denial-of-service attacks and accidentally blocking legitimate access from paying customers.

If you think monitoring is only for applications and infrastructure, Tammy Butow, site reliability engineering manager for storage infrastructure at Dropbox, illustrated how critical it is to monitor other parts of the stack: Her team at Dropbox is responsible for monitoring the company's internal database-as-a-service offerings. But Butow said it's not all about the technology: The way the team shares the monitoring tasks related to a huge production database is just as important as how they implement the technology behind the monitoring.

As it has every year, Monitorama featured more monitoring tales of both technology and culture from a list of veritable household names, including Netflix, Twitter, Etsy, Facebook, Airbnb, Pinterest, and Google.

Create a culture of monitoring to monitor and improve your culture

While creating a culture of monitoring has always been a theme at Monitorama, this year the focus was on the importance of "monitoring your culture."

Nicole Forsgren, director of organizational performance and analytics at Chef and co-author of the 2016 State of DevOps Report, discussed why it's critical to monitor your company culture (and the practices that result from it) to find ways to improve the often squishy 'people stuff.' Forsgren also presented surprising bits of data from this year's report showing that changing how people work, both in relation to each other and the technical systems they interact with, has a measurable business impact.

Sarah Hagan, an experimental psychologist working on ways to improve employee productivity at online real estate broker Redfin, echoed the importance of setting up ways to monitor your culture to see if any of the changes or improvements discussed in the DevOps community are actually working. Redfin is unusual in that it hires employees for positions that other companies tend to fill with independent contractors. To make this work, it needed to come up with complicated algorithms to manage hiring in new markets. Hagan also discussed the importance of conducting all of this "cultural monitoring" in a way that's not totally creepy.

These cultural shifts directly feed back into monitoring implementations: Kelsey Hightower, staff developer advocate at Google, gave an exquisite demonstration of the importance of bringing monitoring closer to the application by doing exactly that in his presentation. He integrated Google's healthz checks into an application and showed how monitoring can feed into canary deployment scenarios, so that customers never experience downtime. To do this, Hightower argued, we need to stop reverse engineering applications to monitor them and start monitoring them from within the application. Developers of frameworks that don't support this pattern will have some explaining to do if they want to be considered "modern."
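Hightower's demo code is not reproduced in this recap, so what follows is only a rough sketch of the general pattern he described: the application itself reports on its own health over HTTP. It assumes a plain Python standard-library server; the `/healthz` path, the `check_database` function, the `CHECKS` table, and the port are all hypothetical names chosen for illustration, not anything shown in the talk.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_database() -> bool:
    """Hypothetical dependency check; a real one would ping the database."""
    return True


# Hypothetical registry of named checks the application knows how to run.
CHECKS = {"database": check_database}


def run_checks(checks):
    """Run every check and return (overall_ok, per-check results).

    A check that raises is recorded as failed rather than crashing
    the health endpoint.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return all(results.values()), results


class HealthHandler(BaseHTTPRequestHandler):
    """Serves GET /healthz: 200 when all checks pass, 503 otherwise."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        ok, results = run_checks(CHECKS)
        body = json.dumps({"ok": ok, "checks": results}).encode("utf-8")
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8080):
    """Start the server (blocks until interrupted); port is an assumption."""
    HTTPServer((host, port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint. In a canary deployment along the lines the talk described, a load balancer or orchestrator could poll this endpoint and send traffic only to instances that answer 200, though how that is wired up depends entirely on the platform.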

Putting the 'M' and 'S' back in DevOps

If you feel you've automated your way to nirvana and your culture is humming along nicely, but there's still something missing, it may very well be time to take a closer look at your metrics practice and sharing strategies.

Librato's Dave Josephsen said it best this year: "I think good monitoring changes people." Conferences are a great way to get up to speed, and Monitorama is the only event dedicated to bringing people together to discuss how to implement DevOps' "fast feedback loops" and the back half of CAMS: metrics and sharing.

Even if you couldn't join this year, you can still catch up: Monitorama just finished publishing video of all this year's sessions. Check them out and see how you can start putting the "M" and "S" back into your DevOps strategy.


