Why metrics don't matter in software development (unless you pair them with business goals)

Steven A. Lowe Product Technology Manager, Google

All software-quality metrics are interesting, but none of them matter intrinsically. Cyclomatic complexity, for example, is irrelevant if users hate your software and the business loses money. Consistently producing quality software has little to do with the traditional source-code metrics and more to do with your process and production environment. And it has everything to do with the value you deliver.
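For readers unfamiliar with the metric, cyclomatic complexity is simply a count of independent paths through a function (decision points plus one). As a minimal sketch of what such a number measures, here is a simplified version using Python's standard ast module; the sample function is hypothetical, and the set of nodes counted is a simplification of what real tools such as radon handle:

```python
import ast

# Sketch: estimate McCabe cyclomatic complexity as
# (number of decision points) + 1. This counts only common
# branch nodes; production tools cover more cases.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.And, ast.Or,
                  ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES)
                    for node in ast.walk(tree))
    return decisions + 1

sample = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "prime-ish"
"""
print(cyclomatic_complexity(sample))  # prints 5: 3 ifs + 1 for + 1
```

A score of 5 is easy to compute, but as the article argues, the number alone says nothing about whether `classify` delivers value to anyone.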

Choosing metrics requires considerable thought and care to support specific business goals. This is critical: Measurements should be designed to answer business questions. And those questions are never, “How many KLOCs are we up to now?”

Because everything in software development is unique, nothing we can measure has any predictive value in isolation. Trends (not fluctuations) in objective metrics remain useful, but only when they support a specific business goal. This is why modern software development focuses on subjective metrics tied to specific features and business goals, which lets us use metrics as an effective tool for continuous learning and improvement.
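To illustrate the difference between a trend and a fluctuation, here is a minimal sketch that smooths a noisy weekly metric with a trailing moving average; the metric name, window size, and numbers are all made up for illustration:

```python
# Sketch: smoothing a noisy weekly metric (say, lead time in hours)
# to expose its trend. Week-to-week values bounce around, but the
# moving average reveals steady improvement.

def moving_average(values, window=4):
    """Trailing moving average, emitted only once the window is full."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

weekly_lead_time = [50, 62, 48, 55, 47, 52, 44, 49, 41, 45]
trend = moving_average(weekly_lead_time, window=4)
print([round(t, 1) for t in trend])
# prints [53.8, 53.0, 50.5, 49.5, 48.0, 46.5, 44.8]
```

The raw series fluctuates up and down, yet every point of the smoothed series is lower than the one before it: a genuine downward trend in lead time, which is the kind of signal worth attaching to a business goal.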

The snowflake metaphor: Why we can’t use traditional metrics

Much of the information available on the web regarding software-quality metrics is still oriented around waterfall and silos (and manual testing, yikes!), but modern software development organizations burned down the silos and went agile years ago. Now it’s all about delivering value rapidly using continuous integration, continuous delivery, and continuous improvement. But what happened to all of the traditional software-quality metrics? Don’t we still need them?

Software development is analogous to manufacturing, except that we don't make the same widgets over and over. We can't just measure for defects, reject some products, and ship the others. In software development, everything we build is a snowflake: unique, valuable, and incomparable.

  • Each component is a snowflake. It’s extremely rare that we would need two components with the exact same functionality, so comparing component metrics is pointless. We also cannot know what the “right” metrics are for each component in advance; there is no “Goldilocks metric” that tells us when something is just right.
  • Each person is a snowflake. Individual skills, traits, backgrounds, worldviews, interests, and motivations combine to affect each individual’s contribution. You like building user interfaces. I like streamlining architectures and uncovering domain models. We are both developers, but our performance can only be compared to our own history and goals, not to each other.
  • Each team is a snowflake. The pattern of communication and cooperation between team members is what distinguishes a team from a group of hermits. Because teams are composed of individuals, adding or removing team members creates a different team, which will manifest its own natural patterns. We cannot compare teams to one another, since they are composed of different people producing different things.
  • Each project or product is a snowflake. Comparing the invoicing system to the customer registration system is nonsense; each project contributes its own unique value to the business.

While we can easily measure all kinds of things about software components, individuals, teams, and projects, the metrics we choose will have no basis for comparison outside of their own scope. The only valid comparison in all of these cases is relative to individual history, as an indicator of progress toward a goal—ideally a business goal.

Continuous snowflake delivery

In addition, we don’t build software in big chunks and toss them over silo walls anymore (yay!). Instead, we build software continuously, in small, interlocking pieces, and seek feedback on each piece as rapidly as possible so we can learn what customers really want, which is what really adds value. Every software system or component embodies a hypothesis about adding value that can only be verified or disproved by delivering it. Until we deliver the software and see how it is used (or not used), we cannot learn.

So we seek to produce a steady stream of useful snowflakes, release them into the world as rapidly as possible, and learn everything we can from them. We don’t measure success as conformance to a plan or a preconceived specification (even if we call it a project story backlog); we measure success—and hence quality—as customer happiness and business value delivered.

Does that mean we don't need metrics? That software quality is unquantifiable? Of course not; we still need metrics, but different ones—ones that reflect business value. Thus, the subjective aspects of quality become more important than the traditional, objective ones.

Defining quality

So do we need to redefine quality? Not really. When we say “software quality,” we really mean three different things:

  1. The structural and logical quality of the software: “academic” metrics, elegance of design, ease of understanding and maintenance, etc. There’s a lot of literature and folklore on what to measure for each of these, but it’s not terribly informative, given the snowflake comparison.
  2. The reliability, throughput, and time-and-motion efficiency of the software-development process. Here we automate everything that can be automated, from build to deployment to measurement, and monitor the production environment for stability and throughput.
  3. The suitability for use, i.e., user happiness. The performance and subjective usability of the software are most important here; users do not care about your fabulous McCabe complexity scores. They care that the software works in harmony with how they work and is reasonably reliable and responsive. There is no point in producing academically perfect software with five nines of reliability that the users won’t use.
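To put "five nines" in perspective, a quick back-of-the-envelope calculation shows just how little downtime each extra nine allows per year:

```python
# "Five nines" means 99.999% availability. Each added nine cuts the
# allowed annual downtime by a factor of ten.
minutes_per_year = 365 * 24 * 60          # 525,600 minutes

for nines in (2, 3, 4, 5):
    availability = 1 - 10 ** -nines       # e.g., 0.99999 for five nines
    downtime = minutes_per_year * (1 - availability)
    print(f"{nines} nines: {downtime:,.1f} minutes of downtime/year")
```

Five nines works out to roughly 5.3 minutes of downtime per year, an expensive target to hit, and wasted effort if users never adopt the software in the first place.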

We can measure all sorts of things about each of these aspects of software quality, but it is the subjective quality aspects that drive modern software development: the delivery of business value. This makes the traditional metrics largely irrelevant. They don’t go away; they just fade into the background.

Who owns quality?

The other difference from the traditional silo QA approach is that the entire team is responsible for quality and bakes it into the process. Practices such as test-driven development bring considerations of testability to the fore, embedding QA thinking into development activities. This organically improves the quality of the software (and the developers) without requiring any counting or tabulation. Objective quality becomes more or less ubiquitous as the test suites grow with the software (ideally).
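The test-driven rhythm described above can be sketched in a few lines; the `shipping_cost` function, its pricing rules, and the test names are all hypothetical, but the shape is the point: the tests state the expected behavior, and the implementation exists to satisfy them:

```python
import unittest

# Sketch of the TDD rhythm: the tests below are written first (and fail),
# then the minimal implementation makes them pass. The function and its
# rules are invented for illustration.

def shipping_cost(weight_kg):
    """Flat rate up to 1 kg, then 2.50 per additional kg."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        return 5.00
    return 5.00 + 2.50 * (weight_kg - 1)

class ShippingCostTest(unittest.TestCase):
    def test_flat_rate_up_to_one_kg(self):
        self.assertEqual(shipping_cost(0.5), 5.00)

    def test_per_kg_surcharge_above_one_kg(self):
        self.assertEqual(shipping_cost(3), 10.00)

    def test_rejects_non_positive_weight(self):
        with self.assertRaises(ValueError):
            shipping_cost(0)

# Run with: python -m unittest <module_name>
```

Notice that no one tabulates a quality score here; the growing test suite itself is the quality record, which is exactly the organic effect the paragraph above describes.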

The team is also responsible for subjective quality—everything from intuitive design to color schemes to that faster-than-expected load time—but this is not often discussed. In theory, there is no set boundary on the quality aspects of user experience or sentiment, yet in practice we can measure things that correlate with subjective quality. We just need a hypothesis to tie the metrics to the goals.

The problem is, how do we determine the correct success metrics for each system, feature, or component? And what about the objective metrics for our process?

The metrics that really do matter

What matters most is success metrics. To get there, you can indeed use several specific and objective measurements, in combination, to answer real business questions. The goal is to determine what value—e.g., a set of values, competitive benefits, game-changers for your customers, etc.—you want to deliver. Then you use several strong measurements to test your “value hypothesis.”

The specific set of metrics, and the approach to building a value hypothesis, is the subject of part 2, “9 metrics that can make a difference to today's software development teams.” Stay tuned!
