Designing Resilient APIs with Idempotency

How can we design APIs to be resilient when our networks necessarily aren’t? An API should be robust enough to handle failure scenarios such as connectivity drops and timeouts in the downstream calls a request triggers. If a client makes a request to our API and loses its connection mid-request, how can we ensure that a subsequent identical request doesn’t alter the state of the system in a way that we weren’t expecting? This is where idempotency comes into play. An idempotent request is one which can be made any number of times with the guarantee that any resulting logic, or side effects, only happen once.

HTTP defines several methods as idempotent. The read-only methods, such as GET and HEAD, are idempotent because they don’t change state at all; of the methods that do modify state, PUT and DELETE are idempotent by definition, while POST is not. A PUT request replaces an entire entity with the payload included in the request, so we can make it multiple times, safe in the knowledge that we’ll simply overwrite the entity with the same contents. DELETE requests are similar: whether or not the first DELETE reached the server before the connection failed, a subsequent request leaves the system in the same intended state. Multiple successful DELETE requests might return different status codes in the response (200 for the first request, 410 or 404 for the second), but again, the state of the system remains the same. We should be careful not to interpret idempotency as “I should receive the same response from multiple identical requests” but as “The state of the system should be the same when multiple identical requests are made”.
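To make the distinction between "same response" and "same state" concrete, here is a minimal sketch in Python. The in-memory `store` and the `put` and `delete` functions are hypothetical stand-ins for real endpoints, and the status codes chosen (201 on create, 200 on replace, 404 on a repeated delete) are one reasonable convention rather than the only one.

```python
# Hypothetical in-memory resource store, used only for illustration.
store = {}

def put(resource_id, payload):
    """Replace the whole entity; repeating the call leaves the same state."""
    created = resource_id not in store
    store[resource_id] = dict(payload)
    return 201 if created else 200

def delete(resource_id):
    """Remove the entity; a repeated call finds nothing left to remove."""
    if resource_id in store:
        del store[resource_id]
        return 200
    return 404

picard = {"first_name": "Jean Luc", "surname": "Picard", "rank": "Captain"}

first_put = put("1", picard)
second_put = put("1", picard)      # different status code, identical state
state_after_puts = dict(store)

first_delete = delete("1")
second_delete = delete("1")        # different status code, identical state
```

The responses differ between the first and second call, but the contents of `store` after two PUTs (or two DELETEs) are the same as after one.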

How can we implement idempotency in our APIs?

We’ve identified the need to ensure that our API is capable of serving multiple identical requests under conditions of volatility, but how do we implement that in practice? One way of handling such cases is through the use of idempotency keys.

An idempotency key is a unique token generated by the client and passed in a request header. When a server receives a request containing an idempotency key, it stores the key for potential later use. Once the server finishes handling the request, it updates the details stored against the idempotency key to mark the request as completed. If possible, the server can also store the result. If a client makes a further request containing the same idempotency key (perhaps it lost connection before the result was retrieved), the server finds the key it stored previously and serves up the cached result. In scenarios where the server does not store a result, it could instead return a 409 status code, indicating that a resource already exists against the idempotency key passed in the request header.

Let’s take a look at these examples in more detail. Firstly, our client makes a request to create a new resource by calling our POST HTTP endpoint, passing an idempotency key in the header and a payload in the request body:

POST https://an-api/v1/resources HTTP/1.1
Idempotency-Key: 845c52a3-6b91-4358-9004-e2f94eec48fa
Content-Type: application/json

{
    "first_name": "Jean Luc",
    "surname": "Picard",
    "rank": "Captain"
}
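On the client side, the important detail is that the key is generated once per logical operation and reused for every retry. The following is a rough sketch of that idea, not a definitive client: `build_request` and `post_with_retry` are hypothetical helpers, and `send` is an assumed callable standing in for whatever HTTP library actually performs the request.

```python
import json
import uuid

def build_request(payload, idempotency_key=None):
    """Return (headers, body) for a POST, generating a key if none is given."""
    key = idempotency_key or str(uuid.uuid4())
    headers = {"Idempotency-Key": key, "Content-Type": "application/json"}
    return headers, json.dumps(payload).encode("utf-8")

def post_with_retry(send, payload, attempts=3):
    """Call send(headers, body) until it succeeds, reusing one key throughout.

    `send` is an assumed transport function that raises ConnectionError
    when the connection drops.
    """
    headers, body = build_request(payload)  # key is fixed before the first attempt
    for attempt in range(attempts):
        try:
            return send(headers, body)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller decide what to do
```

Generating a fresh key inside the retry loop would defeat the purpose, because the server would treat every retry as a brand new request.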

Server side, we create a new resource with the attributes specified in the request body and store the idempotency key along with the status of the request, which is ‘complete’. However, the connection between the client and the server drops before we can return a response. The client therefore retries the request, re-sending the identical idempotency key and payload. The server cross-references the incoming idempotency key with those in storage, identifies it as a duplicate and, because in this scenario it has not stored the original result, returns the following response:

HTTP/1.1 409 Conflict

{
    "error": "A resource has previously been created using this idempotency key"
}
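A minimal sketch of the server logic behind this first scenario might look like the following. The names `handle_post`, `seen_keys` and `resources` are hypothetical, the stores are plain in-memory structures rather than a real database, and the handling of a key that is still ‘in_progress’ is an assumption that the scenario above doesn’t cover.

```python
# Hypothetical in-memory stores; a real service would use a database or cache.
resources = []
seen_keys = {}  # idempotency key -> status of the request

def handle_post(idempotency_key, payload):
    """Create a resource once per key and reject repeats with a 409."""
    status = seen_keys.get(idempotency_key)
    if status == "complete":
        return 409, {"error": "A resource has previously been created using this idempotency key"}
    if status == "in_progress":
        # Assumption: a concurrent duplicate is also rejected rather than queued.
        return 409, {"error": "A request with this idempotency key is still being processed"}

    seen_keys[idempotency_key] = "in_progress"   # store the key on arrival
    resources.append(dict(payload))              # the actual side effect
    seen_keys[idempotency_key] = "complete"      # mark the request as finished
    return 201, {"message": "Resource created successfully"}
```

Calling `handle_post` twice with the same key creates one resource: the first call returns 201 and the second returns 409.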

Let’s re-use the above example, but consider this time that the server has stored the response it would have sent had the connection between client and server not dropped. Again, the client retries the request, re-sending the identical idempotency key and payload. The server again cross-references the incoming idempotency key with those in storage, identifies it as a duplicate and returns the cached response:

HTTP/1.1 201 Created

{
    "message": "Resource created successfully"
}
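The cached-response variant can be sketched in much the same way. Again, this is an illustration under assumptions rather than a production design: `handle_post_replay`, `cached_responses` and `created` are hypothetical names, and the cache is a plain dictionary.

```python
# Hypothetical in-memory stores, for illustration only.
created = []
cached_responses = {}  # idempotency key -> (status_code, body) of the first attempt

def handle_post_replay(idempotency_key, payload):
    """Create a resource once per key and replay the stored response on retries."""
    if idempotency_key in cached_responses:
        # Duplicate key: skip the side effect and return what we sent before.
        return cached_responses[idempotency_key]

    created.append(dict(payload))
    response = (201, {"message": "Resource created successfully"})
    cached_responses[idempotency_key] = response
    return response
```

From the client’s point of view the retry is indistinguishable from a first attempt that succeeded, which keeps the retry logic on the client simple.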

In our final scenario, perhaps the server was unable to complete the request due to a failure part-way through processing. The logic and resultant behaviour here depend on how idempotency is implemented on the server. The server might have stored the state of the request against the idempotency key at certain points of operation, in which case, when the client retries, the server can cross-reference the incoming idempotency key with those in storage and identify the point at which the transaction was aborted. The server can then continue processing from that point before sending back a response. Another implementation might roll back the entire operation inside an ACID database transaction, meaning that the server can re-process the request from scratch.
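The rollback approach can be sketched with Python’s built-in sqlite3 module. The table names, the `create_resource` function and the `fail_midway` flag are all hypothetical and exist only to simulate a crash; the point is that the resource and its idempotency key are written in the same transaction, so a failure leaves neither behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (surname TEXT)")
conn.execute("CREATE TABLE idempotency_keys (idem_key TEXT PRIMARY KEY, status TEXT)")

def create_resource(idem_key, surname, fail_midway=False):
    """Insert the resource and its key atomically; any error undoes both."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO resources VALUES (?)", (surname,))
        if fail_midway:
            raise RuntimeError("simulated failure part-way through processing")
        conn.execute(
            "INSERT INTO idempotency_keys VALUES (?, 'complete')", (idem_key,)
        )
```

After a failed call, both tables are empty again, so a retry with the same key is processed from scratch without leaving a duplicate resource behind.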

Idempotency keys stored on the server should be expired periodically. We wouldn’t expect a retry caused by a dropped connection to arrive 24 hours after the original request, so this isn’t the kind of data which we need to store long term.
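One simple way to expire keys is to record when each one was stored and periodically purge anything older than a chosen window. The sketch below assumes a 24-hour window and an in-memory dictionary; `remember` and `purge_expired` are hypothetical helpers, and many caches offer a built-in time-to-live that would replace this logic entirely.

```python
import time

TTL_SECONDS = 24 * 60 * 60  # assumed retention window of 24 hours
stored_keys = {}            # idempotency key -> time it was stored (epoch seconds)

def remember(key, now=None):
    """Record when a key was first seen."""
    stored_keys[key] = time.time() if now is None else now

def purge_expired(now=None):
    """Delete keys older than the retention window and return how many were removed."""
    now = time.time() if now is None else now
    expired = [k for k, stored_at in stored_keys.items() if now - stored_at >= TTL_SECONDS]
    for k in expired:
        del stored_keys[k]
    return len(expired)
```

The optional `now` argument is there only so the behaviour can be exercised without waiting a day.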

In a future blog post I’ll look at a lightweight implementation of idempotency in both Flask and Django web applications.