Migrate workers & their versions from a repository towards a generic implementation
Closes #1481
This new Django command allows an instance administrator to quickly migrate all workers from a repository towards a generic worker, along with a model.
As we are publishing our models on Arkindex instances, and removing them from the workers Docker image, we should only have a generic implementation in the worker, and a lot of models (instead of lots of workers).
This command does the following (limited to the specified repository):
- find a generic worker semi-automatically (looking by name, otherwise asking the user to choose one worker)
- check that the generic worker has an available version on main/master, and crash otherwise: we do not want to migrate towards a bad image
- migrate all workers (except training ones) by iterating over them and, for each one:
  - remove all totally unused worker versions (there are a lot on preprod!)
  - find a model matching the name of the worker (asking the user to pick one)
  - mark the model as compatible with the generic worker (this will help for later work)
  - use the last available model version, as we cannot match worker versions and model versions
  - ask for confirmation
  - migrate every worker version by doing the following:
    - list all corpora with some ML results linked to that worker version
    - iterate over these corpora and, for each one:
      - create a new worker process for that corpus to migrate that worker
      - create a new worker run in that process, using the generic worker version and the model version found previously
      - update all ML results to use that new worker run and the generic worker version
    - delete that worker version, as it is now useless
  - delete the worker if it is now unused (no versions remaining)
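The per-version part of the steps above can be illustrated with a small in-memory sketch. This is not the command's actual code: `MLResult`, `migrate_worker_version`, the tuple used as a worker run and the `migration-...` process name are all hypothetical stand-ins for the real Django models and queries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a worker run: (process name, worker version, model version)
Run = Tuple[str, str, str]


@dataclass
class MLResult:
    corpus: str
    worker_version: str
    worker_run: Optional[Run] = None


def migrate_worker_version(
    results: List[MLResult],
    old_version: str,
    generic_version: str,
    model_version: str,
) -> Dict[str, Run]:
    """Re-attach every result of old_version to a generic worker run, one process per corpus."""
    # List all corpora holding ML results for the old worker version
    corpora = sorted({r.corpus for r in results if r.worker_version == old_version})
    runs: Dict[str, Run] = {}
    for corpus in corpora:
        # One new process per corpus (the naming scheme is an assumption)
        process = f"migration-{corpus}-{old_version}"
        # One worker run in that process: generic worker version + model version
        run = (process, generic_version, model_version)
        runs[corpus] = run
        # Point all matching ML results to the new run and the generic version
        for r in results:
            if r.corpus == corpus and r.worker_version == old_version:
                r.worker_version = generic_version
                r.worker_run = run
    # At this point nothing references old_version anymore, so it could be deleted
    return runs


results = [
    MLResult("corpus-a", "v1"),
    MLResult("corpus-a", "v1"),
    MLResult("corpus-b", "v1"),
    MLResult("corpus-b", "other"),
]
runs = migrate_worker_version(results, "v1", "generic-v3", "model-v7")
```

Results attached to other worker versions are left untouched, which is why the old version can only be deleted once every impacted corpus has been processed.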
I developed this command against a local database loaded from a preprod dump, so it works on real data (I had to rename the oddly named `U-FCN` model from preprod to `Doc-UFCN`, as on prod/demo, and cheat by enabling a few `GitRef` for master branches on generic worker versions).
I also had to open the constraint on `WorkerRun` so we can have several worker runs on a process using the same worker version (but different models).
I now create a `Workers` process for each worker to migrate and for each corpus impacted by the change.
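To illustrate what the relaxed constraint allows, here is a minimal sketch. It assumes that uniqueness is now checked on the (worker version, model version) pair within a process; the actual constraint definition is not shown in this description, and the `Process` class below is purely hypothetical.

```python
from typing import Optional, Set, Tuple


class Process:
    """Hypothetical in-memory process that tracks its worker runs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._runs: Set[Tuple[str, Optional[str]]] = set()

    def add_run(self, worker_version: str, model_version: Optional[str] = None):
        # Assumed rule: a run is unique per (worker version, model version) pair,
        # not per worker version alone
        key = (worker_version, model_version)
        if key in self._runs:
            raise ValueError(f"duplicate worker run {key} on process {self.name}")
        self._runs.add(key)
        return key


process = Process("migration")
process.add_run("generic-v3", "model-a")
process.add_run("generic-v3", "model-b")  # same worker version, different model: accepted

try:
    process.add_run("generic-v3", "model-a")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```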