How to scale ChatOps to an enterprise-friendly level

Abbas Haider Ali CTO, xMatters Inc.

Gone are the days when chat and instant messaging were vilified, and even banned, in the workplace. Today, chat has evolved to become a place to get work done. With integrations between chat and the applications that companies use, you can turn chat into a productivity powerhouse.

ChatOps is an example of applying this approach to IT operations. Powerful, highly integrable collaboration platforms like Slack and Hipchat let teams build unified consoles that combat information overload and alert fatigue, and help companies achieve their digital transformation and innovation goals.

It’s not hard to see how smaller operations teams can successfully implement a relatively unstructured ChatOps model. With a single #major-incident channel, you can easily loop in the right employees on an as-needed basis; all it takes is a simple @mention. Onboarding new users into the channel isn’t a big deal, because the environment isn’t large or complex enough to make getting up to speed difficult. In addition, there’s no real need to keep conversation records for compliance purposes.

Challenges for ChatOps in the enterprise

Although ChatOps is an obvious advantage for smaller teams, it becomes very tricky at scale. A 10-person startup has very different processes and needs than a global corporation with thousands of employees. How can you adhere to the same model when the same alerts now ping hundreds of employees in the interest of getting the attention of one or two? At the enterprise level, even slick tools like Slack start to become clumsy and noisy, and ultimately create more inefficiencies than they solve.

Take, for example, some of the typical challenges faced by a large enterprise. In juggling multiple task and issue management platforms, operations teams need to update items both directly and indirectly to keep various groups within the company up to date. This might include updating incident records as well as liaising with application and customer-facing teams. From ServiceNow to Zendesk to Atlassian’s JIRA, the complexities increase with every class of system and organization that operations teams must handle.

So we find ourselves back where we started: too much noise, and too much irrelevant information. We need a way to scale ChatOps so that it can work just as flexibly and efficiently, no matter how many employees are using it. And while this may seem a daunting task, it’s actually possible.

How to effectively scale ChatOps 

The answer is to make it more intelligent by expanding the very power that gave rise to ChatOps in the first place—integrations. Using bots and interactions with other applications, we can adjust the structure and workflow of ChatOps to fit the scalability needs of the enterprise.

Let’s take a look at a specific example. A telecommunications provider uses Hipchat, and alerts from Sensu and Splunk are directed at specific operations team engineers. These alerts are first refined using some basic heuristics—on-call schedule, priority, availability, skills, location—so that only specific team members are targeted. Already, the volume of irrelevant messages has been drastically reduced—instead of alerting the entire team of 150 operations engineers, the company can contact the 6 or 7 relevant team members for the job.
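The targeting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Engineer` and `Alert` classes, their field names, and the rule that lower-priority alerts go to a single person are all hypothetical, not taken from any particular product or from the telecommunications provider’s actual setup.

```python
from dataclasses import dataclass, field


@dataclass
class Engineer:
    # Hypothetical record of what the router knows about each engineer.
    name: str
    on_call: bool
    available: bool
    skills: set = field(default_factory=set)
    location: str = ""


@dataclass
class Alert:
    # Hypothetical alert shape; real Sensu or Splunk payloads differ.
    source: str
    priority: int          # assumed convention: 1 is the most urgent
    required_skill: str
    region: str


def select_recipients(alert, engineers, max_recipients=7):
    """Pick the small group of engineers who should see this alert."""
    # On-call schedule, availability, and skills act as hard filters.
    matches = [
        e for e in engineers
        if e.on_call and e.available and alert.required_skill in e.skills
    ]
    # Location is a soft preference: engineers in the alert's region come first.
    ranked = sorted(matches, key=lambda e: e.location != alert.region)
    # Assumed rule: only the most urgent alerts fan out to several people.
    limit = max_recipients if alert.priority == 1 else 1
    return ranked[:limit]
```

Treating some heuristics as hard filters and others as ranking preferences is a design choice; a real deployment would tune which is which.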

From here, let’s say one of the engineers contacted takes action based on the alert. He or she might do this directly through Hipchat (or a mobile app, email, SMS, etc.), and the action automatically creates and assigns a JIRA ticket with the full details of the incident at hand. This is not only incredibly efficient for the operations engineer (since he or she doesn’t need to manually create the ticket in JIRA), but also starts an important record-keeping process. With 150 or more team members, not to mention thousands of additional tech team members, this is a necessary step. At some point, anybody at the company might need to refer back to this incident.
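As a rough sketch of the ticket-creation step, the function below assembles the data an integration might send when an engineer acknowledges an alert. The nested `fields` layout is loosely modeled on the shape of JIRA’s REST create-issue request, but treat it as an assumption and check the API documentation for your JIRA version. The project key `OPS`, the issue type `Incident`, and the function name are hypothetical.

```python
import json


def build_ticket_payload(summary, details, assignee, project_key="OPS"):
    """Assemble a ticket body from an acknowledged alert.

    All values here are illustrative; the project key and issue type
    are placeholders, not real configuration.
    """
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Incident"},
            "summary": summary,
            "description": details,
            # Assign the ticket to whoever responded in chat.
            "assignee": {"name": assignee},
        }
    }


payload = build_ticket_payload(
    summary="High packet loss on edge router",
    details="Raised by monitoring; acknowledged from chat.",
    assignee="jdoe",
)
body = json.dumps(payload)  # what an integration would send over HTTP
```

The sketch stops short of making the HTTP call, since authentication and endpoint details vary by installation.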

In addition to these integrations, this company could leverage the power of a bot with directives. In our example, the operations engineer could issue a single command like “I need help with ticket 96507 from network, database, and payment processing,” and this request would automatically be logged in the JIRA system of record. Then, a specific engineer from each of the necessary teams would be automatically engaged and pulled into a new Hipchat channel created specifically for collaboration on the incident at hand. And closing this channel automatically adds all activities that took place to the JIRA ticket as well, ensuring that compliance and record-keeping requirements are met.
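To make the bot directive concrete, here is a minimal sketch of how a bot might pull the ticket number and team names out of a message like the one quoted above. The pattern and the `parse_directive` function are assumptions for illustration; a production bot would likely support more phrasings and validate team names against a directory.

```python
import re

# Assumed phrasing: "... help with ticket <number> from <team list>"
DIRECTIVE = re.compile(
    r"help with ticket (?P<ticket>\d+) from (?P<teams>.+)", re.IGNORECASE
)


def parse_directive(message):
    """Return (ticket_id, [team names]) or None if the message is not a directive."""
    match = DIRECTIVE.search(message)
    if match is None:
        return None
    raw = match.group("teams").rstrip(". ")
    # Split on commas (optionally followed by "and") or on a bare "and".
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", raw)
    teams = [p.strip() for p in parts if p.strip()]
    return match.group("ticket"), teams


request = parse_directive(
    "I need help with ticket 96507 from network, database, and payment processing"
)
```

From the parsed result, the bot would go on to log the request against the ticket, pick one engineer per team, and open the incident channel.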

Action item: Take action within your ChatOps 

This is just an example, but you can already see how this kind of ChatOps model can work at scale. Empower your team to take actions directly from the chat console, automate smart targeting, add teams to specific channels as necessary, and eliminate irrelevant, mass-blasted alerts.

Tackling ChatOps at scale means refining the communication model. It requires buy-in from your team, but it’s worth it when done right.

How is your small or large company using ChatOps? Can you share any best practices you've learned?

Image credit: Flickr
