The problem with DIY tools and how they can hurt DevOps

Engineers in DevOps have a bit of a MacGyver complex. How else to describe their fascination with do-it-yourself (DIY) tools and scripts? If there’s a problem provisioning servers, then we build a scripting tool. Have an issue with monitoring? Create a tool to gather and report data. I am at times guilty of this myself. And, usually, the justification is that “I can implement exactly the process I want with less effort and cost than purchasing and integrating an off-the-shelf solution.”

But that's no way to run a DevOps organization. There are serious drawbacks to this approach, the least of which is the time required to build a DIY tool in the first place. The negatives include:

Creation of technical debt.
Creation of tools that aren’t thoroughly tested before deployment to a live instance.
Problems with integrating DIY tools into existing infrastructure.
Problems with integrations with other APIs in the stack.
Tool maintenance and upkeep.

Yet despite being aware of these drawbacks, teams often press ahead anyway, choosing to embark on their own DIY adventure.

A cautionary tale of DIY disaster

I recently spoke with Nick Simmonds, lead operations engineer at Rhode Island-based startup Datrista. He has dozens of years of experience in operations, and has seen many of the real-life problems that result from DIY. At a previous start-up, one of his first jobs was to gain control over a DIY scaling tool that had been developed in-house to provision new Amazon EC2 instances. The tool was created to address previous infrastructure choices taken on by the development team before any operations engineers had been hired.

The tool Simmonds' team built eliminated the need for manual scaling of the product’s microservices by automating the process. But the tool did not actually push code up to the new EC2 instances as they spun up. Instead, each new instance was created with no code on it at all. When the new servers were spun down, the scripting tool would scale down too, but it never checked whether the code was working on the newest servers before it destroyed the old ones. And, because the tool didn’t push code to the new instances, the company was left with new servers that had no code running on them.
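The safeguard that was missing here, a check that the new servers actually run the code before the old ones are destroyed, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the tool Simmonds' team built: `Instance`, `is_healthy`, `safe_to_retire`, and `scale_over` are hypothetical names, and the health check only compares a recorded code version where a real tool would probe the running service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Instance:
    """Hypothetical stand-in for a provisioned server."""
    instance_id: str
    code_version: Optional[str]  # None means no code was ever deployed


def is_healthy(instance: Instance, expected_version: str) -> bool:
    """Assumed health check: the instance runs the expected code version."""
    return instance.code_version == expected_version


def safe_to_retire(new_instances: Iterable[Instance], expected_version: str) -> bool:
    """Old servers may go only if there are new ones and all of them are healthy."""
    candidates = list(new_instances)
    if not candidates:
        return False
    return all(is_healthy(inst, expected_version) for inst in candidates)


def scale_over(
    old: Iterable[Instance],
    new: Iterable[Instance],
    expected_version: str,
    terminate: Callable[[Instance], None],
) -> List[str]:
    """Terminate old instances only after the new ones pass the check.

    Returns the IDs that were terminated (empty if the check failed).
    """
    if not safe_to_retire(new, expected_version):
        return []
    terminated = []
    for inst in old:
        terminate(inst)
        terminated.append(inst.instance_id)
    return terminated
```

With a gate like this, an empty new instance (one whose `code_version` is `None`) blocks the scale-down instead of silently replacing a working server.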

You could write a book about the comedy of errors here. There had been little testing of the initial script, so Simmonds’ team only knew how the script should work. Only when the microservices stopped working, live and in production, did the team realize that the tool had failed. No monitoring, no alerting, nothing was in place to let Simmonds' team know the deployment had failed.

Lessons learned

While you might view this fiasco as a mistake of just one team, that would be the wrong lesson to take away from this experience. What Simmonds' team learned, and what many DevOps teams learn, is that DIY tools create a waterfall of problems. The tool was not sufficiently tested. The tool was deployed, and there was no monitoring or alerting attached to it. And the team only realized the tool’s failure when the whole system failed.

The other lesson was that DIY tools are often a response to the technical debt left by developers. In early-stage start-ups, and this was the case at Simmonds' start-up, the focus is on rapid development and deployment of code at the expense of proper networking and automation. Development often just wants code to work. As such, technical debt gets created, and when operations arrives, they must work through the debt acquired before progress can be made. In his case, the Ops team attempted that work-through using a DIY tool.

When DIY tools are not thoroughly vetted and don’t provide proper monitoring and alerting components, they create more problems than they solve. Proper monitoring and alerting cannot be an afterthought because both are critical to the proper functioning of any stack.

Make your IT monitoring and alerting effective

I won’t argue with a start-up's choice to focus on developers at the expense of operations. Nor is it fair to diminish every DIY attempt. But do not view these scenarios through rose-colored glasses; realize the problems to which these teams leave themselves open.

Simmonds learned the hard way that critical alerting is needed all along the stack. He needed monitoring to provide a visible health check on the system. Alerting needed to be loud and in his face to let him know when tools fail. Unfortunately, DIY tools don’t provide this because it’s not an expertise most operations engineers have.

To ensure that you have adequate monitoring, invest in one of the many effective monitoring tools out there. These tools can help users with different levels of monitoring along the stack. For effective IT alerting, you need a tool whose developer has thought through the different scenarios of how messages are delivered, as well as the various contingency plans.

Essentially, you want an alerting tool that:

Is persistent. It will continue to alert the on-call individual beyond the time of initial contact.
Will alert appropriately. Not all alerts are high-priority, so you need both high- and low-priority alerting.
Provides escalation. If the on-call engineer doesn’t react within a given amount of time, enable the alert to escalate to the next person on call.
Provides an audit trail. You need an interface where you can ensure that sent messages have been received and read.
Enables messaging. Enable teams to communicate with one another from within the tool.
Ensures message security. Messages should be encrypted and secure.
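To make the first three requirements concrete (persistence, priority-appropriate alerting, and escalation), here is a minimal Python sketch of the decision logic such a tool has to get right. It is an illustration only, not how any particular product works: `EscalationPolicy`, `current_target`, and `should_notify` are hypothetical names, the two priority levels are assumed to be the strings "high" and "low", and the policy is assumed to have a non-empty on-call chain and positive intervals.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EscalationPolicy:
    """Hypothetical policy; all field names are assumptions for illustration."""
    on_call_chain: Tuple[str, ...]  # ordered, primary on-call first
    escalate_after_min: int         # minutes without acknowledgement per step
    repeat_every_min: int           # how often a high-priority alert repeats


def current_target(policy: EscalationPolicy, minutes_unacked: int) -> str:
    """Escalation: move down the chain as the alert stays unacknowledged."""
    step = minutes_unacked // policy.escalate_after_min
    index = min(step, len(policy.on_call_chain) - 1)  # stay on the last person
    return policy.on_call_chain[index]


def should_notify(policy: EscalationPolicy, priority: str, minutes_unacked: int) -> bool:
    """Persistence and priority: high alerts repeat, low alerts fire once."""
    if priority == "low":
        return minutes_unacked == 0
    return minutes_unacked % policy.repeat_every_min == 0
```

Even this toy version has edge cases (what happens at the end of the chain, how often to repeat), which is a hint at how much thought a full alerting tool requires.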

DIY tools are usually not robust enough to provide this level of alerting or alerting integration. As Simmonds stated, “Monitoring and alerting are too critical to the infrastructure to leave it all to a DIY tool.”

DIY precautions: Avoid technical debt

One question teams should ask themselves before embarking on a DIY project is, do we really want to invest the time and energy in creating, testing and maintaining a DIY tool? Or might there be some existing tool out there that could solve the problem? If you find yourself thinking that the best option is to build the tool yourself, try doing more research, and find a tool off the shelf.

Indeed, there are times when the scripting tool is simple, and researching and finding an existing tool is not worth the effort. But more often than not, the DevOps team fails to consider all of the necessary components of monitoring and alerting that will be needed to ensure that the tool works successfully.

Look at existing code libraries before jumping into DIY. And when it comes to monitoring and alerting, you need it to cover everything that’s important to your end product. If you have effective monitoring and alerting in place, you’ll recognize when a problem arises. The tool will check itself in this scenario, and highlight the errors. And you’ll never find yourself facing the problems that Simmonds and his team had to deal with.
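For the simple scripts that do get written in-house, the idea of a tool that checks itself and highlights its errors can be sketched as a small wrapper. This is a hedged Python illustration: `run_with_alert` is a hypothetical helper, and the `alert` callable stands in for whatever alerting tool the team has adopted.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("diy_tool")


def run_with_alert(name: str, step: Callable[[], T], alert: Callable[[str], None]) -> T:
    """Run one step of a script; on any failure, alert before re-raising.

    `alert` is an assumed hook into the team's alerting tool.
    """
    try:
        result = step()
    except Exception as exc:
        alert(f"{name} failed: {exc!r}")
        raise  # never swallow the error; the caller still sees it
    log.info("%s succeeded", name)
    return result
```

A deployment step wrapped this way cannot fail silently: either it returns a result or someone is notified.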
