
Cache ListCorpusWorkerVersions

Goal: cache the API results of ListCorpusWorkerVersions using Django's low-level cache API, and create a generic mixin that other API list views can reuse.

The mixin would extend ListAPIView and expose a method to build a cache key for a given request. An API view using that mixin would be able to define its own key logic, and would get out of the box:

  • a check on cache as soon as possible on the HTTP GET request flow
  • if no cached version is available, call the generic workflow and save the result in the cache for the next call
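The flow above could look roughly like the sketch below. It is only an illustration: `CachedListMixin`, `build_cache_key`, `cache_timeout`, `FakeCache` and `FakeListView` are hypothetical names, the request is a plain dict, and the two `Fake*` classes are stand-ins for `django.core.cache.cache` and DRF's `ListAPIView` so the snippet runs without a Django project.

```python
class FakeCache:
    """In-memory stand-in exposing the get/set subset of Django's low-level cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # The timeout is accepted but ignored by this stand-in.
        self._data[key] = value


cache = FakeCache()  # in a Django project: from django.core.cache import cache


class CachedListMixin:
    """Hypothetical mixin: check the cache first, else run the generic list workflow."""

    cache_timeout = 300  # seconds; arbitrary value for illustration

    def build_cache_key(self, request):
        raise NotImplementedError("each view defines its own key logic")

    def list(self, request, *args, **kwargs):
        key = self.build_cache_key(request)
        cached = cache.get(key)
        if cached is not None:
            # Cache hit: return as early as possible in the GET flow.
            return cached
        # Cache miss: call the generic workflow and store the result.
        result = super().list(request, *args, **kwargs)
        cache.set(key, result, self.cache_timeout)
        return result


class FakeListView:
    """Stand-in for ListAPIView; counts how many times the real workflow runs."""

    calls = 0

    def list(self, request, *args, **kwargs):
        self.calls += 1
        return ["version-a", "version-b"]


class ListCorpusWorkerVersions(CachedListMixin, FakeListView):
    def build_cache_key(self, request):
        return f"ListCorpusWorkerVersions:{request['corpus_id']}:page_{request['page']}"


view = ListCorpusWorkerVersions()
request = {"corpus_id": "abc", "page": 2}
first = view.list(request)   # miss: runs FakeListView.list and fills the cache
second = view.list(request)  # hit: served from the cache
```

With a real DRF view, the cached value would more likely be the serialized `response.data` than the response object itself, with a fresh `Response` rebuilt on a cache hit.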

The main issue lies in building smart cache keys:

  • they need to be easy to find so they can be deleted (cache invalidation from other parts of the code)
  • they need to be easy & fast to build in the HTTP GET request flow
  • they need to be precise, and reflect the request attributes (pagination especially)

As Django's cache API does not provide a way to list cache keys matching a pattern, we'll need to maintain a list of keys for each related object (the "cache busting DB" approach). With that in place, it becomes easy to create cache keys:

  • generate a unique ID for a given request (serialize all request parameters: URL query string, path, etc.)
  • register that ID under another cache key that is tied to the related object
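One possible way to generate such a unique ID is to canonicalize the path and query string and hash the result. This is a sketch under assumptions: `request_cache_key` is a hypothetical helper, and hashing is just one option (it keeps keys short and free of special characters, which matters for backends with key length limits). In a Django view the `full_path` argument could come from `request.get_full_path()`.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit


def request_cache_key(prefix, full_path):
    """Build a deterministic cache key from a request path and its query string."""
    parts = urlsplit(full_path)
    # Sort the parameters so ?a=1&b=2 and ?b=2&a=1 map to the same key.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    canonical = f"{parts.path}?{urlencode(params)}"
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"


key_a = request_cache_key(
    "ListCorpusWorkerVersions", "/api/v1/workers/abc/versions/?page=2&page_size=20"
)
key_b = request_cache_key(
    "ListCorpusWorkerVersions", "/api/v1/workers/abc/versions/?page_size=20&page=2"
)
key_c = request_cache_key(
    "ListCorpusWorkerVersions", "/api/v1/workers/abc/versions/?page=3&page_size=20"
)
```

Here `key_a` and `key_b` are identical because only the parameter order differs, while `key_c` differs because the page changed. A hashed key is not human-readable, which is one more reason to track keys in a per-object reference instead of trying to guess them later.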

This does not change anything for the cache check, and only adds 2 steps to the cache build:

  1. retrieve the current cache reference for the target
  2. update that cache reference
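The two extra steps could be sketched as follows. `cache_with_reference` and `reference_key` are hypothetical helpers, and `FakeCache` is an in-memory stand-in for `django.core.cache.cache` (same `get`/`set` method names) so the example runs on its own.

```python
class FakeCache:
    """In-memory stand-in exposing the get/set subset of Django's low-level cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # The timeout is accepted but ignored by this stand-in.
        self._data[key] = value


cache = FakeCache()  # in a Django project: from django.core.cache import cache


def reference_key(target_type, target_id):
    """Key holding the list of cache keys that depend on one target object."""
    return f"cache:{target_type}:{target_id}"


def cache_with_reference(key, value, target_type, target_id, timeout=300):
    """Store a value, then record its key in the target's cache reference."""
    cache.set(key, value, timeout)
    ref = reference_key(target_type, target_id)
    # Step 1: retrieve the current cache reference for the target.
    keys = cache.get(ref, [])
    # Step 2: update that cache reference (skip keys already listed).
    if key not in keys:
        # timeout=None asks Django to keep the reference until it is deleted.
        cache.set(ref, keys + [key], None)


cache_with_reference("ListCorpusWorkerVersions:abc:page_2", ["v1"], "corpus", "abc")
cache_with_reference("ListCorpusWorkerVersions:abc:page_3", ["v2"], "corpus", "abc")
cache_with_reference("ListCorpusWorkerVersions:abc:page_2", ["v1"], "corpus", "abc")
```

Note that this read-then-write on the reference is not atomic: two concurrent requests could each read the same list and one update could be lost. Whether that matters, and how to guard against it, depends on the cache backend in use.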

Example for /api/v1/workers/{corpus_id}/versions/?page=2:

  1. cache key is ListCorpusWorkerVersions:{corpus_id}:page_2 (or something close)
  2. target element is corpus corpus_id
  3. target cache key is cache:corpus:{corpus_id}
  4. ... and it holds a pickled list including ListCorpusWorkerVersions:{corpus_id}:page_2

When we want to bust the whole cache for that corpus:

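  1. retrieve the list of keys stored under the corpus cache key (cache:corpus:{corpus_id})
  2. call delete_many on all of those keys (and on the corpus cache key itself, so the reference starts empty again)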
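A minimal sketch of that busting step, assuming the reference layout from the example above. `bust_corpus_cache` is a hypothetical helper, and `FakeCache` is an in-memory stand-in for `django.core.cache.cache` that mirrors its `get`, `set` and `delete_many` methods.

```python
class FakeCache:
    """In-memory stand-in for the get/set/delete_many subset of Django's cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # The timeout is accepted but ignored by this stand-in.
        self._data[key] = value

    def delete_many(self, keys):
        for key in keys:
            self._data.pop(key, None)


cache = FakeCache()  # in a Django project: from django.core.cache import cache


def bust_corpus_cache(corpus_id):
    """Delete every cached entry registered for a corpus; return how many were listed."""
    ref = f"cache:corpus:{corpus_id}"
    # Step 1: retrieve the list of keys held by the corpus cache key.
    keys = cache.get(ref, [])
    # Step 2: delete all of them, plus the reference itself.
    cache.delete_many(keys + [ref])
    return len(keys)


# Two cached pages for corpus "abc", one for corpus "xyz".
cache.set("ListCorpusWorkerVersions:abc:page_1", ["v1"])
cache.set("ListCorpusWorkerVersions:abc:page_2", ["v2"])
cache.set("cache:corpus:abc", [
    "ListCorpusWorkerVersions:abc:page_1",
    "ListCorpusWorkerVersions:abc:page_2",
])
cache.set("ListCorpusWorkerVersions:xyz:page_1", ["v9"])
cache.set("cache:corpus:xyz", ["ListCorpusWorkerVersions:xyz:page_1"])

deleted = bust_corpus_cache("abc")
```

After the call, both pages for corpus "abc" and its reference are gone, while the entries for corpus "xyz" are untouched. Calling the helper for a corpus with no reference is harmless, since the list defaults to empty.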
Edited by Bastien Abadie