List and cache all worker runs used on a corpus

Refs https://redmine.teklia.com/issues/5590

As we want to transition away from worker versions to filter and reference ML results, we need a way to list all worker runs used in a corpus by ML results.

We can use the WorkerRun model directly, as it is tied to a process which is itself linked to a corpus: we'll simply extend it with a boolean field has_results (defaulting to False).

A dedicated Django command cache_worker_runs will look up all ML results in a corpus that reference a worker run (similar to cache_worker_versions), and update these worker runs with has_results:

  1. reset has_results to False on all worker runs from processes in that corpus
  2. iterate over results (see CorpusWorkerVersionManager for the lookup) to set has_results to True for matching worker runs
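
The two steps above can be sketched in plain Python, without the ORM, to make the expected behaviour explicit. This is only an illustration under assumptions: the WorkerRun and MLResult dataclasses are simplified stand-ins, the corpus_id shortcut on WorkerRun replaces the real process-to-corpus link, and the function signature is hypothetical. The actual command would work on querysets instead.

```python
from dataclasses import dataclass
from typing import Iterable, Optional
from uuid import UUID, uuid4


@dataclass
class WorkerRun:
    # Simplified stand-in for the WorkerRun model (assumption).
    id: UUID
    corpus_id: UUID  # the real model reaches the corpus through its process
    has_results: bool = False


@dataclass
class MLResult:
    # Stand-in for any ML result row that may reference a worker run (assumption).
    corpus_id: UUID
    worker_run_id: Optional[UUID] = None


def cache_worker_runs(
    corpus_id: UUID,
    worker_runs: Iterable[WorkerRun],
    results: Iterable[MLResult],
) -> int:
    """Recompute has_results for one corpus; return how many runs were flagged."""
    in_corpus = {run.id: run for run in worker_runs if run.corpus_id == corpus_id}

    # Step 1: reset has_results on every worker run of the corpus.
    for run in in_corpus.values():
        run.has_results = False

    # Step 2: collect the worker runs referenced by ML results in the corpus,
    # skipping results that have no worker run at all.
    used_ids = {
        result.worker_run_id
        for result in results
        if result.corpus_id == corpus_id and result.worker_run_id is not None
    }

    flagged = 0
    for run_id in used_ids:
        run = in_corpus.get(run_id)
        if run is not None:
            run.has_results = True
            flagged += 1
    return flagged
```

Resetting first means a worker run whose results were deleted since the last execution goes back to has_results=False, so the command can be re-run safely on the same corpus.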

Finally, we need an endpoint ListCorpusWorkerRuns which uses the same serializer structure (not necessarily the same class) as ListCorpusWorkerVersions, to expose:

  • worker run ID
  • worker version details
  • configuration details
  • model details
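
As a rough illustration of the payload listed above, the sketch below builds one item of a ListCorpusWorkerRuns response from plain dataclasses. All class names and field names (worker_version, configuration, model_version, and the nested keys) are assumptions made for the example; the real structure should follow whatever ListCorpusWorkerVersions already returns.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class WorkerVersionDetails:
    # Hypothetical subset of the worker version details.
    id: str
    worker_name: str


@dataclass
class CorpusWorkerRunItem:
    # One entry of the list response; field names are assumptions.
    id: str
    worker_version: WorkerVersionDetails
    configuration: Optional[dict[str, Any]] = None
    model_version: Optional[dict[str, Any]] = None


def serialize_worker_run(item: CorpusWorkerRunItem) -> dict[str, Any]:
    """Turn an item into a JSON-ready dict, including nested details."""
    return asdict(item)


example = serialize_worker_run(
    CorpusWorkerRunItem(
        id="run-1",
        worker_version=WorkerVersionDetails(id="version-1", worker_name="demo-worker"),
        configuration={"id": "config-1", "name": "default"},
    )
)
```

Configuration and model details are optional in this sketch because a worker run may have neither; whether the endpoint returns null or omits those keys is left open here.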