
Create fake worker runs for ML results with only worker versions

https://redmine.teklia.com/issues/10463

Elements, classifications, transcriptions, metadata, entities, and transcription entities still carry both a worker_version and a worker_run foreign key. Having a worker_run implies that a worker_version is also set, but a worker_version can still be set without any worker_run, so that older ML results can still exist.

To remove the extra worker_version FK without losing any information, we need to set a worker_run on all ML results that have a worker_version but no worker_run.

A new migration should:

  1. List all (corpus_id, worker_version_id) sets for all 6 tables of ML results, where the worker_version_id is set but there is no worker_run_id.
  2. If nothing has been found, end here.
  3. Bulk create a Workers process on each of these corpora, named "Migration of ML results without worker runs".
  4. Bulk create WorkerRuns within these processes for all the listed worker versions.
  5. Update all ML results to set their worker_run_id to the newly created worker runs. It will probably be fastest to do one update for each (corpus_id, worker_version_id) set since this should be able to use indexes for fast access.
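The five steps above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not the actual migration: the schema is a hypothetical simplification (two stand-in tables instead of six, each assumed to carry its own corpus_id column), and the table names, column names, the 'workers' mode value, and the create_schema and migrate helpers are all assumptions made for the example.

```python
import sqlite3
import uuid

PROCESS_NAME = "Migration of ML results without worker runs"
# Hypothetical stand-ins for the 6 ML result tables.
ML_TABLES = ["element", "transcription"]


def create_schema(db):
    """Create a simplified, assumed schema for the sketch."""
    db.execute(
        "CREATE TABLE process "
        "(id TEXT PRIMARY KEY, corpus_id TEXT, name TEXT, mode TEXT)"
    )
    db.execute(
        "CREATE TABLE worker_run "
        "(id TEXT PRIMARY KEY, process_id TEXT, version_id TEXT)"
    )
    for table in ML_TABLES:
        db.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, corpus_id TEXT, "
            "worker_version_id TEXT, worker_run_id TEXT)"
        )


def migrate(db):
    """Attach a fake worker run to every result that only has a version.

    Returns the number of worker runs created.
    """
    # Step 1: collect distinct (corpus_id, worker_version_id) pairs.
    pairs = set()
    for table in ML_TABLES:
        pairs.update(db.execute(
            f"SELECT DISTINCT corpus_id, worker_version_id FROM {table} "
            "WHERE worker_version_id IS NOT NULL AND worker_run_id IS NULL"
        ))

    # Step 2: nothing to migrate.
    if not pairs:
        return 0

    # Step 3: one process per corpus, created in bulk.
    processes = {corpus_id: str(uuid.uuid4()) for corpus_id, _ in pairs}
    db.executemany(
        "INSERT INTO process (id, corpus_id, name, mode) "
        "VALUES (?, ?, ?, 'workers')",
        [(pid, cid, PROCESS_NAME) for cid, pid in processes.items()],
    )

    # Step 4: one worker run per pair, inside the corpus' process.
    runs = {pair: str(uuid.uuid4()) for pair in pairs}
    db.executemany(
        "INSERT INTO worker_run (id, process_id, version_id) VALUES (?, ?, ?)",
        [(run_id, processes[cid], vid) for (cid, vid), run_id in runs.items()],
    )

    # Step 5: one UPDATE per pair and per table.
    for (cid, vid), run_id in runs.items():
        for table in ML_TABLES:
            db.execute(
                f"UPDATE {table} SET worker_run_id = ? "
                "WHERE corpus_id = ? AND worker_version_id = ? "
                "AND worker_run_id IS NULL",
                (run_id, cid, vid),
            )
    return len(runs)
```

Running migrate a second time on the same database finds no remaining pairs and returns early, which corresponds to step 2. In the real tables, some ML results may only reach their corpus through another table, in which case steps 1 and 5 would need a join instead of a direct corpus_id filter.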

Finally, all *_worker_run_requires_worker_version constraints should be renamed *_worker_run_and_worker_version, and should now require that (worker_run_id IS NULL) = (worker_version_id IS NULL). This ensures that the version and the run are either both set or both unset, removing the case where there is a version and no run.
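To illustrate what the new condition accepts and rejects, here is a small sketch using Python's built-in sqlite3 module. The transcription table is a hypothetical, stripped-down stand-in, and the make_db and accepts helpers only exist for this example; the real constraints would be declared through the project's own migrations.

```python
import sqlite3


def make_db():
    """Build an in-memory table carrying the new CHECK constraint."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE transcription (
            id INTEGER PRIMARY KEY,
            worker_version_id TEXT,
            worker_run_id TEXT,
            CONSTRAINT transcription_worker_run_and_worker_version
                CHECK ((worker_run_id IS NULL) = (worker_version_id IS NULL))
        )
        """
    )
    return db


def accepts(db, version_id, run_id):
    """Return True if the row passes the constraint, False otherwise."""
    try:
        db.execute(
            "INSERT INTO transcription (worker_version_id, worker_run_id) "
            "VALUES (?, ?)",
            (version_id, run_id),
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

With this constraint, a row with neither value or with both values is accepted, while a row with only a version or only a run is rejected. This also means the constraint can only be added once the data migration has left no result with a version and no run.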

These migrations will need to be tested on a rather large database as there is a high chance of hitting performance issues.