Execute docker tasks in RQ
TODO:
- Handle task failure + recursive update (children tasks)
- Ponos auth (via env)
- Publish logs as artifact
- Read artifact of the parent task
- Publish artifact
- Fix existing tests
- Fix exception mentioned in !2227 (comment 224950)
- Add tests for tasks scheduler
Follow-ups:
- Support downloading and extracting ZST extra files (`download_extra_files` method) #1695 (closed)
- Retry single task #1696 (closed)
- Add tests for RQ agent (docker mocks) #1697 (closed)
Activity
changed milestone to %Arkindex 1.5.4
assigned to @vrigal
added 6 commits
- d610949b...bcb51a73 - 2 commits from branch master
- 09b302e2 - POC: RQ tasks execution
- a1211240 - Upload logs
- d52ef3be - Publish artifacts
- 31fbeacf - WIP: Support GPU in docker
I did not implement the `download_extra_files` feature (which downloads and decompresses .zst archives), but it should be easy to add by copying the method and mounting a second temporary folder to /data/extra_files. Implementing the "stop" action should be easy too.
Also, I frequently hit a case where the initialization task is stuck at "Awaiting workers activity initialization" (when the task is started immediately).
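A minimal sketch of the extra-files mount mentioned in this comment, assuming the agent starts containers through the Docker SDK for Python, whose `containers.run(volumes=...)` accepts a mapping of this shape. Only the /data/extra_files path comes from the comment; the `build_volumes` helper, the /data bind for the task folder and the read-only mode are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Mount point named in the comment; everything else here is hypothetical.
EXTRA_FILES_MOUNT = "/data/extra_files"


def build_volumes(task_dir: Path, extra_files_dir: Path) -> dict:
    """Build a volumes mapping in the shape docker-py expects for containers.run()."""
    return {
        # Assumption: the task's own temporary folder is bound to /data.
        str(task_dir): {"bind": "/data", "mode": "rw"},
        # Second temporary folder holding the decompressed extra files.
        # Assumption: read-only is enough, since the files are prepared on the host.
        str(extra_files_dir): {"bind": EXTRA_FILES_MOUNT, "mode": "ro"},
    }


with tempfile.TemporaryDirectory() as task_dir, tempfile.TemporaryDirectory() as extra_dir:
    volumes = build_volumes(Path(task_dir), Path(extra_dir))
```

The archives would be downloaded and decompressed into the second folder before the container starts, which is why the sketch only covers the mount itself.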
requested review from @babadie
I had some race conditions when checking the state while reading logs (e.g. if there is no text at all, the container will not stop), so I used simple polling (which is a little slower but makes fewer requests to boto). I also stop the container (which is actually a kill after 10s).
example: stop
note: I found a weird bug with boto sending an empty payload (freezing for a minute), but it has never been resolved (https://github.com/minio/minio/issues/6540).
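The polling approach described in this comment could look roughly like the sketch below. It is not the code from this MR: `wait_for_exit` and `should_stop` are hypothetical names, and the container is only assumed to expose `reload()`, `status` and `stop(timeout=...)` the way Docker SDK for Python containers do.

```python
import time


def wait_for_exit(container, should_stop, poll_interval=1.0, stop_timeout=10):
    """Poll a container until it exits, stopping it if a stop was requested.

    `should_stop` is a hypothetical callable that reports whether the task
    was asked to stop (for example by re-reading the task state).
    """
    while True:
        container.reload()  # refresh the locally cached container state
        if container.status in ("exited", "dead"):
            return container.status
        if should_stop():
            # Sends SIGTERM, then SIGKILL once the timeout has elapsed.
            container.stop(timeout=stop_timeout)
            return "stopped"
        time.sleep(poll_interval)
```

Because the loop only waits for a terminal status and never requires the container to be seen as running, it does not depend on log output arriving, at the cost of reacting up to one `poll_interval` late.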
added 9 commits
Note: This MR does not handle retrying a single task, but it should be easy to add.
I suppose `schedule_tasks` and `run_task_rq` could be overridden in the EE so that no RQ job actually runs?
Also, I frequently got this message from the RQ agent:
Exception: Container was not updated to state running
This happens because the container runs too fast (created → exited); removing the polling (a478eb08) does not fix it.
Edited by Valentin Rigal
added 12 commits
- 363d400d...24db1780 - 3 commits from branch master
- 3540f59a - POC: RQ tasks execution
- ab70f20f - Upload logs
- 7559bcdc - Publish artifacts
- c3587730 - Support GPU in docker
- 3de6f797 - Fixes
- 6c403bd3 - Download parents artifacts
- 69b30464 - Support stopping task
- def994d1 - Use polling to check stopping task
- a478eb08 - Remove polling waiting for the container to be available
added 10 commits
- 18883039 - 1 commit from branch master
- e8934f19 - POC: RQ tasks execution
- 1ae1af4c - Upload logs
- 9bd00a91 - Publish artifacts
- 58a2f544 - Support GPU in docker
- bade26e2 - Fixes
- 3addcbd0 - Download parents artifacts
- 62df505c - Support stopping task
- b3ce6c7d - Use polling to check stopping task
- 6293a154 - Remove polling waiting for the container to be available
mentioned in merge request !2233 (merged)
added 17 commits
- 3faa7057...609b7e55 - 3 commits from branch master
- 609b7e55...d6f56cfb - 4 earlier commits
- 7d7e8d6c - Fixes
- 244506ad - Download parents artifacts
- 4a516997 - Support stopping task
- 84b09676 - Use polling to check stopping task
- d28da981 - Remove polling waiting for the container to be available
- 6d060da5 - Prevent listing process' tasks in user jobs
- 98f791a9 - Update existing tests
- 579f963e - WIP: Prevent tasks to be executed in RQ during tests
- 6401dcee - Revert "WIP: Prevent tasks to be executed in RQ during tests"
- 311d7e1e - Use a generic way to patch process.run() in tests
added 1 commit
- ad1bb84a - Use a more generic way to patch process.run() in tests
added 15 commits
- 69a08e79 - 1 commit from branch master
- 69a08e79...e478a7ca - 4 earlier commits
- e813ba10 - Fixes
- 8fcfd079 - Download parents artifacts
- 5ee1e6d9 - Support stopping task
- 933812e5 - Use polling to check stopping task
- b0a7bb4c - Remove polling waiting for the container to be available
- aad9813a - Prevent listing process' tasks in user jobs
- b673b085 - Update existing tests
- 0df1e7cb - WIP: Prevent tasks to be executed in RQ during tests
- 3fdc1419 - Revert "WIP: Prevent tasks to be executed in RQ during tests"
- 94ade017 - Use a more generic way to patch process.run() in tests
added 15 commits
- 87e7af5d - 1 commit from branch community
- 87e7af5d...35705377 - 4 earlier commits
- 517b6e42 - Fixes
- e237009f - Download parents artifacts
- e65ac63a - Support stopping task
- ce77823d - Use polling to check stopping task
- 2fbc5361 - Remove polling waiting for the container to be available
- f2776cdf - Prevent listing process' tasks in user jobs
- a1f4f284 - Update existing tests
- a54a8ff5 - WIP: Prevent tasks to be executed in RQ during tests
- dcf8c329 - Revert "WIP: Prevent tasks to be executed in RQ during tests"
- 8191d353 - Use a more generic way to patch process.run() in tests
- Resolved by Valentin Rigal
@vrigal there are CI issues
changed milestone to %Arkindex 1.6.0
- Resolved by Valentin Rigal
You need to implement the restart in this MR; it would make this a lot faster to test.
added 2 commits
- Resolved by Valentin Rigal
Let's now focus on adding a unit test that validates that a process `build_workflow` will generate RQ tasks.
Please create follow-up issues for all other remaining points:
- exception from !2227 (comment 224950)
- model download (extra files)
- restart task
Finally, introduce a setting `PONOS_RQ_EXECUTION=true|false` so that enterprise can disable this behaviour.
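One way the requested setting could be wired, as a sketch only: the name `PONOS_RQ_EXECUTION` and its true|false values come from the comment above, while the `env_bool` parser, the default of `True`, the extra accepted spellings and the `schedule_if_enabled` helper are assumptions made for illustration, not the project's actual settings code.

```python
import os


def env_bool(name: str, default: bool = True) -> bool:
    """Parse a true|false environment variable, using `default` when it is unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise ValueError(f"{name} must be true or false, got {raw!r}")


# Assumption: RQ execution stays enabled unless explicitly disabled.
PONOS_RQ_EXECUTION = env_bool("PONOS_RQ_EXECUTION", default=True)


def schedule_if_enabled(tasks, enqueue, enabled=PONOS_RQ_EXECUTION):
    """Call enqueue(task) for each task when RQ execution is enabled.

    Returns the number of tasks enqueued. `enqueue` is a hypothetical callable
    standing in for whatever pushes a task to an RQ queue.
    """
    if not enabled:
        return 0
    count = 0
    for task in tasks:
        enqueue(task)
        count += 1
    return count
```

Keeping the check in a single helper means a deployment that sets `PONOS_RQ_EXECUTION=false` never enqueues anything, and tests can pass `enabled` explicitly instead of patching the environment.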