Item Collections in Web APIs

August 9, 2023

When designing or evolving HTTP/JSON APIs, we sometimes need to model a collection of items within a parent API resource. In this post, we will evaluate the pros and cons of a few patterns.

Let’s start by taking a stripped-down API with a user resource as an example. There are operations for creating a new user, fetching the user by its ID, and updating the user properties.

curl https://example.api/users -X POST -d '{"name": "Yogi"}'
{
  "id": 1,
  "name": "Yogi"
}

curl https://example.api/users/1
{
  "id": 1,
  "name": "Yogi"
}

curl https://example.api/users/1 -X PATCH -d '{"name": "Boo-Boo"}'
{
  "id": 1,
  "name": "Boo-Boo"
}

Simple, right? Now let’s say that our user can have a number of hats. As with the user’s name, we want to support changes to those hats, because, well, hats go in and out of fashion.

Object of Dynamic Keys

One way we might try to solve this is with an object that represents a collection: an object whose dynamic keys map to item objects. Here, we would define a user “hats” property that is an object. Each key is either generated, such as a unique ID or a timestamp, or an indexed number. Each value is the item itself.

What might that look like in our example?

curl https://example.api/users/1 -X PATCH -d '{
  "hats": {
    "hat_0": { "style": "Sun hat", "occasion": "beach" },
    "hat_1": { "style": "Fedora", "occasion": "lounging" }
  }
}'
{
  "id": 1,
  "name": "Yogi",
  "hats": {
    "hat_0": { "style": "Sun hat", "occasion": "beach" },
    "hat_1": { "style": "Fedora", "occasion": "lounging" }
  }
}

There’s a lot that is wrong with this approach, but what are its advantages? First, when the keys are uniquely named like this, changing a property of one specific hat is trivial, provided you already know that hat’s key. A second advantage is that all our keys are guaranteed to be strings. This can avoid some confusion in JSON libraries that provide property access via notations like .[0] or .0.

curl https://example.api/users/1 -X PATCH -d '{
  "hats": {
    "hat_0": { "style": "Bucket" }
  }
}'
{
  "id": 1,
  "name": "Yogi",
  "hats": {
    "hat_0": { "style": "Bucket", "occasion": "beach" },
    "hat_1": { "style": "Fedora", "occasion": "lounging" }
  }
}
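
The response above implies that the server merges nested objects recursively instead of replacing them. A minimal sketch of that behavior follows, assuming a hypothetical merge_patch helper. It is loosely in the spirit of JSON Merge Patch (RFC 7396) but leaves out that standard's null handling, and it is not necessarily how any particular API implements PATCH.

```python
def merge_patch(target, patch):
    """Return a copy of `target` with `patch` merged in.

    Assumption: nested objects are merged key by key, while any
    non-object value (string, number, list) replaces the stored value.
    """
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        result[key] = merge_patch(result.get(key), value)
    return result


user = {
    "id": 1,
    "name": "Yogi",
    "hats": {
        "hat_0": {"style": "Sun hat", "occasion": "beach"},
        "hat_1": {"style": "Fedora", "occasion": "lounging"},
    },
}

# Only the "style" of hat_0 is sent; everything else is preserved.
patched = merge_patch(user, {"hats": {"hat_0": {"style": "Bucket"}}})
```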

However, in my experience, this design only really works if the “hats” collection property is immutable: either settable only when the resource is created, or referenced from other data and never part of any write operation on the parent resource.

On the other hand, the disadvantages are numerous.

One inconvenience is that the keys are dynamic. Schema definition standards typically handle dynamic keys awkwardly. Documenting an object that represents a collection requires either statically defining the first key, with a note explaining that further properties of the same type can be added by incrementing the key suffix, or using a regular expression pattern to define the shape of the keys.
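
To illustrate the regular expression option, here is a small sketch of the check a schema or validator would need to express for the keys. The pattern and the invalid_hat_keys helper are assumptions made for this example; the post does not define an exact key format beyond “hat_” followed by a number.

```python
import re

# Hypothetical key format: "hat_" followed by one or more digits.
HAT_KEY_PATTERN = re.compile(r"hat_[0-9]+")


def invalid_hat_keys(hats):
    """Return the keys of `hats` that do not match the expected pattern."""
    return [key for key in hats if not HAT_KEY_PATTERN.fullmatch(key)]


bad_keys = invalid_hat_keys({"hat_0": {}, "hat_12": {}, "hat_x": {}, "hat0": {}})
```

The documentation can then only promise “any key matching this pattern”, not a fixed list of property names.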

When working with code that implements such an API, codebases cannot be searched by the key text of “hat_0” unless constants are defined, and there is no knowing how many constants are enough. We instead turn to searching via the string “hat_”, or some combination of “hat” and “_” if the code uses string concatenation.

The real problem is that the existing resource data must be known before changes can be made to the “hats” property. If we want to add a hat, we cannot blindly PATCH with “hat_0”, or we risk overwriting data. The next hat number must first be determined by fetching the existing record.
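
A rough sketch of the extra client-side step this forces, assuming every existing key has the form “hat_” plus a number. The next_hat_key function is hypothetical and stands in for whatever logic a caller would run after fetching the user.

```python
def next_hat_key(hats):
    """Compute the next free key from the hats of an already fetched user.

    Assumption: every key looks like "hat_<number>".
    """
    indexes = [int(key.split("_", 1)[1]) for key in hats]
    return f"hat_{max(indexes, default=-1) + 1}"


fetched_hats = {
    "hat_0": {"style": "Sun hat", "occasion": "beach"},
    "hat_1": {"style": "Fedora", "occasion": "lounging"},
}
new_key = next_hat_key(fetched_hats)
```

Even with this step, two callers that fetch at the same time would compute the same key, so one write could still overwrite the other unless the API adds some form of concurrency control.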

Defining item deletion also requires unconventional semantics. In the user hats example, the options are either to allow a hat value to be “null” or to first set the entire “hats” object to “null” before patching back the hats we want to keep. The former breaks iteration logic when programming against the hats, and the latter requires keeping state across multiple API calls.
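
To make the iteration problem concrete, here is a small sketch of client code reading hat styles when a deleted hat is stored as null. The function names are made up for this example.

```python
# A "deleted" hat is represented as null (None in Python).
hats = {
    "hat_0": None,
    "hat_1": {"style": "Fedora", "occasion": "lounging"},
}


def styles_naive(hats):
    """Assumes every value is a hat object; fails on a null entry."""
    return [hat["style"] for hat in hats.values()]


def styles_guarded(hats):
    """Every consumer now has to remember to skip null entries."""
    return [hat["style"] for hat in hats.values() if hat is not None]
```

Calling styles_naive(hats) raises a TypeError because None cannot be indexed, so the null check has to spread to every place that loops over the hats.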

Array of Objects

A better alternative for representing a collection of items in a parent API resource is to use an actual array of objects.

In our user hats example, that would look like:

curl https://example.api/users/1 -X PATCH -d '{
  "hats": [
    { "style": "Sun hat", "occasion": "beach" },
    { "style": "Fedora", "occasion": "lounging" }
  ]
}'
{
  "id": 1,
  "name": "Yogi",
  "hats": [
    { "style": "Sun hat", "occasion": "beach" },
    { "style": "Fedora", "occasion": "lounging" }
  ]
}

This design trades the dynamic keys for static keys. This gives us back the ability to properly document this kind of API with standard tools. It also gives back the small convenience of being able to better search code for the specific key name.

With arrays of objects, update semantics still need to be defined. The two options are either to treat the array itself as a primitive scalar value or to define a unique key named something like “reference_id” in each item object.

Treating the array value like a primitive scalar value is more conventional, and this update semantic is easy to define. To update one item of the collection, the API caller must pass in the full end result of the data, including all subobjects from the previous array that are not being modified. Any item left out is treated as removed. The advantage of this is simplicity, which makes the API hard to misuse. Idempotency is also kept: the end result is the same no matter how many times the same update operation is attempted. This strategy is quite useful when the objects of the array are relatively simple, small, and have few keys.
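
A minimal sketch of this replace-the-whole-array semantic, assuming a hypothetical apply_patch function on the server side. Each top-level property in the request, arrays included, simply overwrites the stored value.

```python
def apply_patch(resource, patch):
    """Shallow update: every property in the patch replaces the stored one."""
    return {**resource, **patch}


user = {
    "id": 1,
    "name": "Yogi",
    "hats": [
        {"style": "Sun hat", "occasion": "beach"},
        {"style": "Fedora", "occasion": "lounging"},
    ],
}

# To change one hat, the caller sends the full desired array.
patch = {
    "hats": [
        {"style": "Bucket", "occasion": "beach"},
        {"style": "Fedora", "occasion": "lounging"},
    ]
}

once = apply_patch(user, patch)
twice = apply_patch(once, patch)  # repeating the request changes nothing
```

Because the request states the complete end result, applying it once or many times leaves the record in the same state.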

The other update semantic is defining a reference ID property that uniquely identifies items that should be updated. During an update operation, any items in the existing record with a reference ID matching a reference ID in the request are replaced. What gets stored after the update request is an array containing the objects from the request body plus any existing subobjects whose reference IDs did not match any in the request. Explained another way, the existing items are filtered by the set of reference IDs in the request, dropping any matches, and then concatenated with the items specified in the request.

In our user hats example, it might look like this:

curl https://example.api/users/1
{
  "id": 1,
  "name": "Yogi",
  "hats": [
    { "style": "Sun hat", "occasion": "beach", "reference_id": "f8RdS79" },
    { "style": "Fedora", "occasion": "lounging", "reference_id": "rbj43kA" }
  ]
}

curl https://example.api/users/1 -X PATCH -d '{
  "hats": [
    { "style": "Bucket", "occasion": "beach", "reference_id": "f8RdS79" }
  ]
}'
{
  "id": 1,
  "name": "Yogi",
  "hats": [
    { "style": "Fedora", "occasion": "lounging", "reference_id": "rbj43kA" },
    { "style": "Bucket", "occasion": "beach", "reference_id": "f8RdS79" }
  ]
}
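
The exchange above follows the filter-then-concatenate rule, which might be sketched as below. The merge_by_reference_id name is hypothetical, and the sketch assumes every item, stored or requested, carries a “reference_id”.

```python
def merge_by_reference_id(existing, requested):
    """Drop stored items whose reference ID appears in the request,
    then append the requested items."""
    requested_ids = {item["reference_id"] for item in requested}
    kept = [item for item in existing if item["reference_id"] not in requested_ids]
    return kept + requested


stored_hats = [
    {"style": "Sun hat", "occasion": "beach", "reference_id": "f8RdS79"},
    {"style": "Fedora", "occasion": "lounging", "reference_id": "rbj43kA"},
]
request_hats = [
    {"style": "Bucket", "occasion": "beach", "reference_id": "f8RdS79"},
]

merged_hats = merge_by_reference_id(stored_hats, request_hats)
```

Note that the updated hat moves to the end of the array, which matches the ordering in the response shown above.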

With this semantic, specific items can be modified via their reference IDs. However, this makes defining the update much more complicated. Supporting item deletion via reference IDs involves special keywords that would break the representational structure of the schema.

Sub-resource Endpoint

As the schema of the item becomes more complex, defining a sub-resource becomes more compelling. Defining the item as a distinct API resource allows the manipulation of instances of those items via HTTP verbs and conventions. Defining a distinct sub-resource endpoint also makes the API more flexible for future integration use cases by enabling relationships between the sub-resource items and other API resources by item ID.

There are two ways of identifying instances of sub-resources. The first way is the conventional service-generated ID. For this, the API caller must create an instance of the sub-resource via a POST call. The second way is a stable list of named strings. An example for our user hats API might be to identify the hat by the occasion it is used in, resulting in hat endpoints like /users/1/hats/beach and /users/1/hats/lounging. This second way of identifying items is useful if the name strings are available by default and do not need explicit create requests to contain valid data.
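
The two identification styles can be sketched with a small in-memory store. The HatStore class, its method names, and the use of UUIDs for generated IDs are all assumptions for illustration; the docstrings note which HTTP call each method would roughly correspond to.

```python
import uuid


class HatStore:
    """In-memory stand-in for a hats sub-resource, keyed by (user_id, hat_id)."""

    def __init__(self):
        self._hats = {}

    def create(self, user_id, hat):
        """POST /users/{user_id}/hats: the service generates the ID."""
        hat_id = uuid.uuid4().hex
        self._hats[(user_id, hat_id)] = dict(hat)
        return hat_id

    def put(self, user_id, hat_id, hat):
        """PUT /users/{user_id}/hats/{hat_id}: the caller names the item,
        for example by occasion, so no prior create call is needed."""
        self._hats[(user_id, hat_id)] = dict(hat)

    def get(self, user_id, hat_id):
        """GET /users/{user_id}/hats/{hat_id}: None if the hat is absent."""
        return self._hats.get((user_id, hat_id))

    def delete(self, user_id, hat_id):
        """DELETE /users/{user_id}/hats/{hat_id}."""
        self._hats.pop((user_id, hat_id), None)


store = HatStore()
generated_id = store.create(1, {"style": "Fedora", "occasion": "lounging"})
store.put(1, "beach", {"style": "Sun hat"})
store.delete(1, "beach")
```

Each hat is addressed on its own, so updating or deleting one never requires resending or inspecting the rest of the collection.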

In conclusion, when designing an item collection in a REST API, consider using simple arrays of objects or sub-resource endpoints for more complex objects to make the API easier to use, harder to misuse, and more flexible for future use cases.