How to build in DevOps metrics without interfering with speed

Chris Riley, DevOps Analyst, Fixate IO

Nothing should throttle the move to modern software delivery—that would be a contradiction. But the problem with a self-service, just-go mentality is that getting started is so easy that you may not know where you were, where you are, or where you're going. Teams do themselves a disservice if they don't use strong metrics to keep DevOps up and running over the long term.

For DevOps to extend and expand, it needs a path. And that path can only be established when it's measured, analyzed, and acted upon. Teams need to consider metrics for their entire development practice and develop best practices for building those metrics in without interfering with DevOps speed.

We're addicted to data—especially in the tech world—and we want to log everything and then visualize it. But for some reason, in software development, there are large blind spots. You can tell a person everything about a running application but very little about how it got there. This is mostly the result of teams thinking only in terms of releases and not the process of releasing.

In a DevOps environment, there's information everywhere. We log application data with application performance monitoring/management (APM) tools and system data with log analysis platforms; we even log data about application components with component monitoring tools. But there are gaps. For example, pipeline-release frequency and failures aren't commonly logged. In DevOps, it isn't any one piece of information that matters most; it's the continuity between them all. Just as the releases should be continuous, so should the data associated with them at each step of the way.

Your POV

But data isn't the same for everyone. Even something as simple as your role will determine what data is important to you. For the most part, you won't consider your peers' measurement goals. For example, if you use the word "logging" with a developer, they're going to reference APM, which helps them react to user behavior. They're aware of system monitoring, but they'd rather have IT ops think about it. The inverse is true as well.

Because of the difference in POV, what each person measures is different, and thus the entire population of data is disconnected. One way to reconcile this discrepancy is to create a shared objective. If the entire organization is shooting for the same goal, there will be a large incentive to share data and be on the same page. But this also requires that teams start monitoring their pipeline as well.

Pipelines as code

Organizations need to reconsider their pipeline as its own entity. Why do we consider logging the data in the applications we release but not the processes that release them? This usually stems from the fact that everyone owns a portion of the process, but rarely is one person or group watching all of it. Seeing the big picture of the entire development process requires a holistic point of view.

If teams consider their pipeline the meta-application, it can be transformative, and it quickly becomes clear how beneficial this can be to proving success over time. At the end of the year, the entire team can claim success for the modernization of their delivery process.
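To make that concrete, here's a minimal sketch in Python of what instrumenting the pipeline itself might look like; the stage names, release version, and JSON-lines log file are illustrative assumptions, not any particular CI tool's API. Each stage is timed and its pass/fail result recorded, so the pipeline produces data just like the applications it releases.

```python
import json
import time
from contextlib import contextmanager
from datetime import datetime, timezone

# Illustrative sink: one JSON object per line, readable by most log analysis tools.
PIPELINE_LOG = "pipeline-events.jsonl"

@contextmanager
def pipeline_stage(name, release_version):
    """Time a pipeline stage and record its pass/fail result as a structured event."""
    start = time.monotonic()
    event = {
        "source": "pipeline",
        "stage": name,
        "release_version": release_version,
        "started_at": datetime.now(timezone.utc).isoformat(),
    }
    try:
        yield
        event["result"] = "pass"
    except Exception as exc:
        event["result"] = "fail"
        event["error"] = str(exc)
        raise
    finally:
        event["duration_seconds"] = round(time.monotonic() - start, 3)
        with open(PIPELINE_LOG, "a") as fh:
            fh.write(json.dumps(event) + "\n")

# Usage: wrap each stage so release times, frequency, and pass/fail are captured.
with pipeline_stage("build", release_version="1.4.2"):
    pass  # run the real build step here
with pipeline_stage("integration-tests", release_version="1.4.2"):
    pass  # run the real test step here
```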

Continuous analytics

But pipeline data is just one more source in a mountain of data. At a minimum, organizations need to log data elements such as these (a sketch of a shared event shape for them follows the list):

  • System monitoring: Uptime, system events, security, machine performance

  • Performance data: Response times, load balancing

  • Exception monitoring: Stack traces, exception repetition, exception criticality

  • Configuration management: Script type and version audits

  • Pipeline data: Release versions, release times, release frequency, pass/fail releases, backlog performance

  • Component monitoring: License compliance, common vulnerabilities, dated components

  • Testing data: Scripts run, script run times, pass/fail, repeating issues

  • Marketing/application data: User actions, user behavior, user cohort
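To give these sources the continuity described above, it helps to emit them in one shared shape. The sketch below is hypothetical; the field names and the `DeliveryEvent` type are assumptions for illustration, not a standard schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DeliveryEvent:
    """Hypothetical common shape for events from any of the sources listed above."""
    source: str                  # "system", "pipeline", "testing", "component", ...
    event_type: str              # e.g. "release", "exception", "test_run"
    release_version: str         # ties every event back to a specific release
    timestamp: str
    detail: Optional[dict] = None

def make_event(source, event_type, release_version, **detail):
    """Build one normalized event record, whatever tool the data came from."""
    return asdict(DeliveryEvent(
        source=source,
        event_type=event_type,
        release_version=release_version,
        timestamp=datetime.now(timezone.utc).isoformat(),
        detail=detail or None,
    ))

# The same shape serves very different teams:
make_event("pipeline", "release", "1.4.2", result="pass", duration_seconds=412)
make_event("testing", "test_run", "1.4.2", scripts_run=230, failed=3)
make_event("component", "outdated_dependency", "1.4.2", package="example-lib", severity="medium")
```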

Below is an example reference architecture for setup, data sources, and their relationship to pipeline stages.

Each data point doesn't represent a new tool, but there certainly is an ecosystem of one or more tools that make it happen.

This is a lot of arrows and a lot of data points. So let's spell out why it's worth doing the data origami:

  • One language: In stand-up meetings, a good portion of time is spent just explaining what a term means. Or worse, a term with different meanings across the team is used without any explanation. A good analytics environment has everyone using a common language.

  • Continuous documentation: In DevOps, documentation doesn't go away, but the waterfall method of creating it has to. The best documentation is the kind the system creates automatically. At any point in time, a good analytics system can produce clear documentation, whether for individual components or all of them (see the sketch after this list).

  • Sharing info without sharing the keys: Sharing information beyond a user's role, such as giving developers direct access to machines, has been a bottleneck. With a good analytics environment, it isn't necessary to share access, because the data provides the relevant information without requiring direct access to its sources.

  • Change control: When you move fast, it isn't clear what specialization has occurred, who owns what, or what the impact of team member attrition would be. Good analytics gives every team member the autonomy to access data when they're new or on duty and don't necessarily know who the subject-matter expert is.
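As an example of what continuous documentation can mean in practice, the sketch below (assuming JSON-lines events like those in the earlier sketches) builds an on-demand release summary straight from the logged data instead of from a hand-written document.

```python
import json
from collections import Counter

def release_summary(event_log_path, release_version):
    """Summarize a release on demand from logged events (illustrative fields)."""
    stages, test_failures = [], 0
    with open(event_log_path) as fh:
        for line in fh:
            event = json.loads(line)
            if event.get("release_version") != release_version:
                continue
            if event.get("source") == "pipeline":
                stages.append(event.get("result"))
            elif event.get("source") == "testing":
                test_failures += (event.get("detail") or {}).get("failed", 0)
    results = Counter(stages)
    return {
        "release": release_version,
        "pipeline_stages_run": len(stages),
        "pipeline_stages_failed": results.get("fail", 0),
        "test_failures": test_failures,
    }

# Usage: generate the "documentation" for a release whenever someone asks for it.
print(release_summary("pipeline-events.jsonl", "1.4.2"))
```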

Remember, you aren't only justifying analytics from a business perspective—you have to convince individual stakeholders why they should care about more than just their slice of the data pie.

  • Developers: You're increasingly accountable for security and bugs. Knowing about issues sooner lets you respond faster, or even before they surface to the user. This improves your ability to be a team player and hones your understanding of the code's impact on production.

  • IT ops: Knowing what code is coming to production helps you prepare for its impact. For example, new components might have you turn the magnifying glass on system vulnerabilities, or increased release frequency might have you reevaluate resource-planning mechanisms.

  • Management: Your applications are increasingly tied to your bottom line. Identifying potential legal issues and user backlash early improves your ability to respond to critical events or amplify big wins. The management team should be exposed to synthesized analytics and insights on a regular basis.

  • QA: Understanding the issues developers faced before testing allows you to create an expectation for upcoming testing cycles and helps you design test cases or refine existing ones. Knowing about potential negative impacts in production allows you to prepare for questions on test cases that have been run and ultimately refine a test strategy.

  • Growth hacking and marketing: The rate of new functionality and the impact of poor software quality are as important as user behavior, because at some point in the lifespan of the product, a new feature or outage will impact user behavior in the short and long term. Broader analytics can help you realize the pros and cons of such an event, and identify the relationship between an event and changes in user behavior.

As stated above, there are lots of moving parts. It isn't easy, but the expectation isn't to get the analysis down in one fell swoop—that's waterfall—it's that you enable the data visibility required to accomplish these types of analyses in the future.

Challenges

Here are the top challenges to providing all this data. Don't worry—they're all manageable.

  • Ownership: Each of the above components is owned by different groups. This is unlikely to change, but there can be a facilitator of data, such as the DevOps team, or even the QA team, which already has the whole picture in their mind. They may not own the data, but they can connect the dots. There are no longer data owners, but rather data stewards and experts who help the data facilitators know what to share with the broader team.

  • Disparate silos: It isn't likely that an environment will have a single system of record, but it's possible. With the APIs in log analysis platforms, data from any source can be collected, stored, and analyzed (see the sketch after this list). This requires a lot of discipline from the data stewards to provide that data and from the organization to set up standards for doing so. Even if a single system of record is out of reach, aiming to reduce the number of visualization and analysis platforms is a good idea.

  • Data paralysis: Team members can burn a lot of time digging into data, skewing their understanding of the system and making them less effective in their core responsibilities. Avoid this by focusing on dashboards and insights and making deep dives into data a less frequent activity. Modern platforms will push anomalies and patterns to you without the hunt. Data paralysis can also stem from having too many dashboards, many without a purpose.

  • New things: The modern development shop is changing rapidly, so building analytics systems for change is critical. Most notable is the introduction of container-driven pipelines. Here, in addition to all the data above, you now need to measure container registries, microservices, large clusters, rogue and running containers, and so on.
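On the disparate-silos point, shipping events into a log analysis platform usually amounts to posting JSON over HTTP. The sketch below shows the general pattern only; the ingestion URL, token, and event fields are placeholders, not any specific product's API.

```python
import json
import urllib.request

# Placeholders: substitute your platform's real ingestion endpoint and credentials.
INGEST_URL = "https://logs.example.com/api/events"
API_TOKEN = "replace-me"

def ship_event(event):
    """POST one structured event to a log analysis platform over its HTTP API."""
    request = urllib.request.Request(
        INGEST_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status

# Usage (once a real endpoint is configured):
# ship_event({"source": "pipeline", "stage": "deploy", "result": "pass", "release_version": "1.4.2"})
```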

Best practices

  • Just start: You won't get the analysis and insights right on day one. That's OK; just log everything, and log early. Respond to business questions and drivers to create dashboards and insights as the environment evolves.

  • Be proactive, not reactive: Your goal should be to respond to events, both good and bad, as or before they happen. This isn't possible without active analysis and systems that push interesting data to you.

  • Share: Putting it all together is the second most challenging element. The first is people: getting them on board in an effective and useful way. You can do that by sharing data that's useful to them to bring them into the conversation. And train them; the training shouldn't necessarily be on how to use the data, but on establishing a consistent way to do so.

The best, and only, way to start is to identify who the data facilitators are and give them the autonomy to understand what's available, who needs it, and what's missing. Being data-driven is a mentality, and doing it effectively requires a strategy, not just a project. Focus on identifying data elements and having a vision for how they're going to be used. When you do this right, the speed of modern development won't be its own undoing.
