Big benefits from tiny types: How to make your code's domain concepts explicit

Primitive types such as string, integer, and float are tremendously useful for representing data, but the types themselves are uninformative: you must rely on the variable name to know what a value means. Normally that's fine, but sometimes it isn't, especially in extreme cases of primitive obsession.

Fortunately, you can use tiny types to solve the problem. Here's how.


Just read teh codez?

Take this snippet of code, for example:

void Publish(String channel, String message)

You can guess at the intent from the parameter names, but you can't be sure. If you look only at the type signature, you see:

void Publish(String, String)

The types provide no useful information to the developer or the compiler.

So what? Developers do this all the time. The code works, they figure it out, so what’s the problem?

There are holes in your code. You might just fill these holes with primitives instead of thinking about the implied domain type. But doing so degrades the domain model, causing needless confusion and avoidable errors. Plant tiny types in these holes instead.

When you choose a type to represent some information in a program, you have the opportunity to add domain knowledge into the code, to inform the next developer of your intentions. Also, from a type-purity point of view, it is incorrect to assert that, for example, a channel is just a string. A string may hold any value, of any length, in any language. A channel likely has only a few valid values.

Some languages make it easy to define a tiny type; others less so. The amount of effort required will vary, but in general the effort is slight, and the benefits enormous.

Wait, what benefits?

A tiny type is just a lightweight value object, an object whose value is also its identity. If two different instances of the same value object hold the same value, they are equal. The value of a value object cannot change once it is set. So a tiny type must be immutable and comparable.

Precisely how to do this varies by language. In Java, your tiny type's class must override the equals and hashCode methods to support comparison by value properly, and all fields are marked as "final" to prevent modification after construction. In the examples that follow I ignore these syntactic details in my code for brevity.
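For reference, here is a minimal sketch of what those details could look like in Java for a channel tiny type. It is an illustration rather than a definitive implementation: the validation rules are left out, and the toString format is my own assumption.

```java
import java.util.Objects;

// Sketch of a tiny type with value semantics; validation is omitted here.
final class Channel {
    // final: the value cannot be reassigned after construction
    private final String channelId;

    Channel(String channelId) {
        this.channelId = Objects.requireNonNull(channelId, "channelId");
    }

    String getChannelId() { return channelId; }

    // Two channels are equal when they hold the same ID.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Channel)) return false;
        return channelId.equals(((Channel) other).channelId);
    }

    // Equal objects must produce equal hash codes.
    @Override
    public int hashCode() { return channelId.hashCode(); }

    @Override
    public String toString() { return "Channel(" + channelId + ")"; }
}
```

With this in place, two separately constructed channels holding the same ID compare as equal and behave as the same key in a HashSet or HashMap. On Java 16 or later, a record generates equals and hashCode automatically.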

Replacing primitive types with informative domain tiny types brings four main benefits:

  • Domain intention: The tiny type tells the developer what the item means in the domain.
  • Compiler information: The compiler can perform stronger and more relevant type-checking.
  • Consolidated validation: Validation is moved to the constructor/factory, so instances are always valid.
  • Immutable value objects: Values cannot be changed after construction, and thus are thread-safe.

How to extract a tiny type

To extract a tiny type, first decide what the tiny type means in the domain (hint: It’s not “string”). In this case it’s pretty simple: A channel tiny type represents a communications topic on a message bus.

Second, find all of the relevant validations and invariants for the new type, and enforce them in the constructor.

To find everything relevant to your new tiny type, you must examine all uses of the concept in the code. In a more realistic system, this might require some sleuthing, paying careful attention to existing and necessary unit test coverage so you can safely refactor the code.

But in my test script, the user enters the channel ID, which just gets passed along. In this case, the code does not tell you what constitutes a valid channel value. To figure this out, you'll need to examine the bus library documentation (or code) and use a bit of common sense. For example, there’s probably a limit on the length of a channel ID that the bus library can handle, and null or blank channel IDs probably won’t work. Let’s say we discover that channel IDs cannot be blank or null, and can be at most 16 characters.

Here’s the original (and terrible) test script:

  Scanner scanner = new Scanner(System.in);
  String channelId = scanner.next();
  String message = scanner.next();
  Bus bus = new Bus();
  bus.Publish(channelId, message);  //but no one is listening

And our channel class:

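  public class Channel {
    private static final int MAX_CHANNEL_ID_LENGTH = 16;
    private String _channelId;
    public Channel(String channelId) {
      validate(channelId);
      _channelId = channelId;
    }
    // All channel rules live here, so every constructed Channel is valid.
    private static void validate(String channelId) {
      if (channelId == null) { throw new IllegalArgumentException("channel ID is null"); }
      if (channelId.length() == 0) { throw new IllegalArgumentException("channel ID is empty"); }
      if (channelId.length() > MAX_CHANNEL_ID_LENGTH) { throw new IllegalArgumentException("channel ID is too long"); }
    }
    public String getChannelId() { return _channelId; }
  }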

Notice that it is impossible to construct an invalid channel object with this class: the constructor throws before one can exist. However, the bus library class still accepts strings, not tiny types.

If you didn’t control this library, you'd need to create a facade or adapter class to map from the world of trusted, strongly typed value objects to the library’s world of suspicious strings. For the purposes of this article, I'll assume that I control the bus library, so I can change it to use the new tiny types directly.
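If you did need such an adapter, it might look something like the sketch below. StringBus stands in for a third-party library interface and TypedBus is the adapter; both names are hypothetical, as is the exception type used in the validation.

```java
// Hypothetical stand-in for a third-party bus API that only accepts strings.
interface StringBus {
    void publish(String channelId, String message);
}

// The tiny type, with the validation rules discovered earlier.
final class Channel {
    private static final int MAX_CHANNEL_ID_LENGTH = 16;
    private final String channelId;

    Channel(String channelId) {
        if (channelId == null || channelId.isEmpty()
                || channelId.length() > MAX_CHANNEL_ID_LENGTH) {
            throw new IllegalArgumentException("invalid channel ID");
        }
        this.channelId = channelId;
    }

    String getChannelId() { return channelId; }
}

// Adapter: callers pass a validated Channel; only this class touches raw strings.
final class TypedBus {
    private final StringBus delegate;

    TypedBus(StringBus delegate) { this.delegate = delegate; }

    void publish(Channel channel, String message) {
        delegate.publish(channel.getChannelId(), message);
    }
}
```

The rest of the application depends only on TypedBus, so unwrapping to a string happens in exactly one place.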

Thus:

bus.Publish(String channelId, String message)

...becomes:

bus.Publish(Channel channel, String message)

But wait, why didn’t I name that method getValue instead of getChannelId?

The name getChannelId makes the domain intention more obvious than getValue would. I also suspect that later on the channel tiny type will accumulate additional properties, at which point having getValue return the channel ID would cease to make sense.

So now I can rewrite the test script to use the new channel tiny type:

  Scanner scanner = new Scanner(System.in);
  String channelId = scanner.next();
  Channel channel = new Channel(channelId);  //throws exceptions for invalid channel IDs
  String message = scanner.next();
  Bus bus = new Bus();
  bus.Publish(channel, message);  //but no one is listening

But wait, won’t that throw exceptions when invalid channel IDs are entered?

Yes. And while that’s a fairly heinous user experience, the original code would likely also throw exceptions for invalid channel IDs, only much later/deeper in the code. Or it might not, and you’d never know that publish failed, which is even worse.

You can easily trap exceptions and make the test script report failure gracefully, even run in a loop until a valid channel ID is entered, etc., and there are ways other than exceptions to report validation violations. But that’s not the point here. If the channel constructor succeeds, you are guaranteed to have a valid channel object because channel now owns all of the validation logic for the type.
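As one possible illustration of a non-exception approach, the sketch below uses a factory method that returns an empty Optional for invalid input, plus a loop that keeps reading until a valid channel ID arrives. The names tryCreate, ChannelPrompt, and readChannel are hypothetical, and the rules are the ones assumed earlier (not null, not blank, at most 16 characters).

```java
import java.util.Optional;
import java.util.Scanner;

final class Channel {
    private static final int MAX_CHANNEL_ID_LENGTH = 16;
    private final String channelId;

    // Private constructor: the factory below is the only way in.
    private Channel(String channelId) { this.channelId = channelId; }

    // Hypothetical factory: reports invalid input as an empty Optional
    // instead of throwing.
    static Optional<Channel> tryCreate(String channelId) {
        if (channelId == null || channelId.isEmpty()
                || channelId.length() > MAX_CHANNEL_ID_LENGTH) {
            return Optional.empty();
        }
        return Optional.of(new Channel(channelId));
    }

    String getChannelId() { return channelId; }
}

final class ChannelPrompt {
    // Reads tokens until one is a valid channel ID.
    // Returns an empty Optional if the input runs out first.
    static Optional<Channel> readChannel(Scanner scanner) {
        while (scanner.hasNext()) {
            Optional<Channel> channel = Channel.tryCreate(scanner.next());
            if (channel.isPresent()) {
                return channel;
            }
            System.out.println("Invalid channel ID, please try again.");
        }
        return Optional.empty();
    }
}
```

Either way, the guarantee is the same: any Channel instance the caller holds has already passed validation.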

One more example, please?

Extracting a tiny type for the message field follows a similar path. But should you bother? The message really is just a string, isn’t it?

Maybe. But it probably can't be null, it's pointless for it to be empty, and there's most likely an upper limit to its size. (In a realistic bus implementation, it might have further structure as well.) Suppose you get something like this:

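  public class Message {
    // 2^32 does not fit in an int, so the limit is declared as a long.
    private static final long MAX_MESSAGE_LENGTH = 4294967296L;
    private String _message;
    public Message(String message) { validate(message); _message = message; }
    private static void validate(String message) {
      if (message == null) { throw new IllegalArgumentException("message is null"); }
      if (message.length() == 0) { throw new IllegalArgumentException("message is empty"); }
      if (message.length() > MAX_MESSAGE_LENGTH) { throw new IllegalArgumentException("message is too long"); }
    }
    public String getMessage() { return _message; }
  }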

And now your test script looks like this:

  Scanner scanner = new Scanner(System.in);
  String channelId = scanner.next();
  Channel channel = new Channel(channelId);  //throws exceptions for invalid channel IDs
  String messageText = scanner.next();
  Message message = new Message(messageText);
  Bus bus = new Bus();
  bus.Publish(channel,message);  //but no one is listening

In the original code, what would have happened if we called:

bus.Publish(message,channel);

It would not complain, and might even work if the message text happens to be the same as a channel ID.

In the old code, the channel and message are just strings to the compiler, and one string is as good as another. In the new code, the tiny types provide domain information to the compiler, and it will complain about passing a message where it expects a channel. The compiler helps keep the semantics straight, and the rest of the code can trust instances of channel and message to be valid, because each concept now has a dedicated code element responsible for it rather than an uninteresting primitive.
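Here is a minimal sketch of that compile-time check. The tiny types are stripped down (validation omitted), and the Bus shown is a hypothetical stand-in that just records the last delivery so the result can be inspected.

```java
// Minimal tiny types; validation is omitted to keep the sketch short.
final class Channel {
    private final String channelId;
    Channel(String channelId) { this.channelId = channelId; }
    String getChannelId() { return channelId; }
}

final class Message {
    private final String text;
    Message(String text) { this.text = text; }
    String getMessage() { return text; }
}

// Hypothetical bus that records the last delivery for inspection.
final class Bus {
    private String lastDelivery = "";

    void publish(Channel channel, Message message) {
        lastDelivery = channel.getChannelId() + ": " + message.getMessage();
    }

    String getLastDelivery() { return lastDelivery; }
}

final class SwapDemo {
    static String run() {
        Bus bus = new Bus();
        Channel channel = new Channel("news");
        Message message = new Message("hello");

        bus.publish(channel, message);     // compiles: types match the signature
        // bus.publish(message, channel);  // does not compile: a Message is not a Channel

        return bus.getLastDelivery();
    }
}
```

Uncommenting the swapped call turns a silent runtime bug into a compile error.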

Prior to refactoring, the concepts of channel and message did not exist explicitly in the code. They were implied by the names of some variables, but no code element was responsible for the representation. Any changes to channel or message would require hunting through all of the old code to find strings pretending to be channels and messages. Now those concepts truly exist in the code and can respond to changes more efficiently.

Resiliency regained

For example, what would happen if you were given a new rule that channel IDs could not contain white space?

All you'd have to do is change the validate method in the channel class to test the new condition and throw an appropriate exception. It can’t get much simpler than that.
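A sketch of that change might look like the following, assuming the null, blank, and 16-character rules from earlier. The exception type and messages are my own choices for illustration.

```java
final class Channel {
    private static final int MAX_CHANNEL_ID_LENGTH = 16;
    private final String channelId;

    Channel(String channelId) {
        validate(channelId);
        this.channelId = channelId;
    }

    private static void validate(String channelId) {
        if (channelId == null) {
            throw new IllegalArgumentException("channel ID must not be null");
        }
        if (channelId.isEmpty()) {
            throw new IllegalArgumentException("channel ID must not be empty");
        }
        if (channelId.length() > MAX_CHANNEL_ID_LENGTH) {
            throw new IllegalArgumentException("channel ID is too long");
        }
        // New rule: reject any whitespace character anywhere in the ID.
        if (channelId.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("channel ID must not contain white space");
        }
    }

    String getChannelId() { return channelId; }
}
```

No caller has to change: every place that constructs a Channel picks up the new rule automatically.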

Note that if the bus library changed in other ways, such as if it used a GUID as a channel identifier or required a message to report its type and encoding format, the changes would have a single point of focus: the tiny type class. In the original code, by comparison, the validation and structural criteria were either strewn about the code or missing completely.

In the original code, domain knowledge was lost, but it was regained by refactoring out tiny types. The question “What is a channel?” is now easily answered by a few lines of code in one logical place, and a channel can never be mistaken for a message again.

Use tiny types to make the invisible visible

A tiny type plants the seed of a domain concept firmly in the code. As such, it may inform both developers and the compiler, and may evolve over time. Many tiny types will stay tiny, while others may grow into more complex value object or entity classes. The fundamental value of a tiny type is to make an invisible domain concept visible, and it is this visibility that yields benefits.

Look through your code and notice the places where you are using a primitive type for an item that has a domain-meaningful name. Then consider extracting a tiny type to make the implicit domain concept explicit.

Are you ready to give it a shot? Let me know how it goes by posting your comments and questions below.
