The code etymologist, Part 6: Data structures

The Code Etymologist is a multi-part series that covers typical software knowledge that deserves to be documented and shared within development teams.

There's a neverending debate between compile time and runtime. Between types and values. Between static analysis and dynamic analysis. Between type checkers and unit tests. Which side is more valuable? I've been pivoting myself a lot between the two sides quite a lot. They are both valuable, but in different ways.

Runtime values and unit tests provide information about what things actually are. They don't tell how things should or could be. Therefore, when we encounter some untyped code, we might have some questions about specific runtime values.

Let's consider an example of a Rest API call to fetch a list of messages for a chat component. The response contains an array of objects with various fields, but some of them, in particular, might raise some questions:

// GET: /messages
[
  {
    "id": 8356,
    "content": "good morning, everyone!",
    // ... other fields
    "status": 2, // 1️⃣
    "messageType": "text", // 2️⃣
    "annotation": null // 3️⃣
  }
]

For instance, you might ask yourself:

what does 2 represent, and what other values could the status field have?
are there any other types of messages besides "text"?
when could the annotation field be different than null, and what data type is it?

Types and static analisys on the other hand describe the theory, what things should or could be. They describe the contract and rules instead of the execution.

But in JS/TS land, types are completely stripped away at runtime, providing no actual guarantees. To add salt to injury, there are various levels of type-safety in TS, depending on the strictness settings in tsconfig.json. So, any contract we might have at compile time can be completely broken during execution.

Also, TypeScript per se doesn't implicitly help us a lot. Even if we use types, the response might be typed as:

type MessageListResponse = Array<Message>;

type Message = {
  id: number;
  content: string;
  // ... other fields
  status: number; // 1️⃣
  messageType: string; // 2️⃣
  annotation: Annotation | null; // 3️⃣
};

The above type definition adds some additional information, but not too much:

we still don't know the possible values for status and what they represent;
we still don't know the possible values for messageType;
we know that annotation is nullable, but we still don't know the conditions of having the Annotation structure.

The reality is that TypeScript includes a gradual typing system. There are many shades of type definitions. Some are wider, permissive, and encapsulate little information. Others are narrower, stricter, and provide lots of insights.

Now, if we encounter the above situation in real life, we don't have much of a choice but to analyze the code and figure out the possible values of those fields. Alternatively, we could ask our teammates who have been around longer and might recall the details.

So, let's improve the type definitions to convey more information about the data structure.

Enums

Whenever we have to deal with a limited set of values, it's very helpful to define them separately:

const MessageStatus = {
    Unsent: 0,
    Sent: 1,
    Deleted: 2,
} as const;

type MessageStatus = typeof MessageStatus[keyof typeof MessageStatus];

type Message = {
    // ... other fields
    status: MessageStatus, // 1️⃣
    messageType: string,
    annotation: Annotation | null,
}

The above declaration includes additional valuable information, regardless if we're familiar with the code or not. MessageStatus clearly documents what are the possible values for the status field and what they represent.

NoteWe could also use TypeScript Enums as well, but for the purpose of this article, it doesn't matter which syntax is used to define them. The only relevant motivation for the above approach using plain JS objects is the type-stripping feature added in Node.js 22, which encourages the usage of ECMAScript-compliant syntax.

Literal Unions

An alternative to Enums is to define all possible values as a Union of literal types. This applies especially for string values, because they are self-explanatory and don't require an additional key to explain the value. For instance, we might deal with 2 types of messages:

type MessageType = "text" | "comment";

type Message = {
    // ... other fields
    status: MessageStatus,
    messageType: MessageType, // 2️⃣
    annotation: Annotation | null,
}

The above improvement answers our 2nd question above regarding the possible values for messageType.

NoteChoosing between Enums or Unions is a matter of personal preferences. Both of them provide intellisense and autocomplete, both are type-safe, and both work with rename refactoring tools. Enums work with both number and string types, while Literal Unions work better with strings.

Discriminated Unions

Whenever we are dealing with different variations of a single object, there's a good chance that their structures are slightly different, even if they have a lot of common fields. In such scenarios, a discriminated union is usually hiding, waiting to be discovered.

Another hint for discriminated unions, even when there's no obvious discriminant, is when we have data structures with a bunch of optional or nullable fields.

We can refactor the Message type as a union of two different types, using the messageType as a discriminant:

type Message = TextMessage | CommentMessage;

type TextMessage = {
  // ... other fields
  messageType: "text";
  status: MessageStatus;
  annotation: null;
};

type CommentMessage = {
  // ... other fields
  messageType: "comment";
  status: MessageStatus;
  annotation: Annotation;
};

Using the above discriminated union, we're answering the 3rd question above. The annotation is not really nullable. Instead, there are two types of Messages:

there's a TextMessage with no annotation;
and there's an CommentMessage that includes some Annotation data.

Discriminated unions provide valuable information about the entities and the relationships between them. Additionally, the TypeScript language service deeply understands them, providing great control flow analysis and type narrowing.

Schemas

As we've seen so far, types help a lot during development, at compile time. However, they don't provide 100% guarantees at runtime.

Schema validators fill this gap, providing runtime data parsing and validation. Below, we have the same data structure we worked with so far, but defined as a schema using Zod:

import { z } from "zod";

enum MessageStatus { Unsent = 0, Sent = 1, Deleted = 2 };

const TextMessage = z.object({
  id: z.number(),
  content: z.string(),
  status: z.nativeEnum(MessageStatus),
  messageType: z.literal("text"),
  annotation: z.null(),
});

const CommentMessage = z.object({
  id: z.number(),
  content: z.string(),
  status: z.nativeEnum(MessageStatus),
  messageType: z.literal("comment"),
  annotation: Annotation,
});

const Message = z.discriminatedUnion("messageType", [TextMessage, CommentMessage]);

Additionally, modern libraries are powerful enough to infer the TypeScript static type from the schema definition. This enables us to have a single source of truth for a data structure definition.

type Message = z.infer<typeof Message>;

Explaining Type

Let's look at a different example of improving a type definition. In the example below, we have a Record type for storing the last read messages in order to display which messages are unread for each chat room. In practice, this is a key-value pair between a chatRoomId and a messageId.

type LastReadMessage = Record<number, number>;

However, the above code doesn't describe what those number types represent. A self-explanatory approach, without using code comments, is to extract the primitive number into 2 differently named type aliases to describe their nature:

type ChatRoomId = number;
type MessageId = number;
type LastReadMessage = Record<ChatRoomId, MessageId>;

This type definition might seem useless, but it clearly explains the structure of the Record. This refactoring is similar to Martin Fowler's Introducing Explaining Variable, but at the primitive type level.

Continue reading Part 7: Technical decisions