IntegrityError when updating a task that already has a finished date to Running
Sentry Issue: ARKINDEX-BACKEND-2D7
A Slurm Ponos agent tried over 7000 times to update a few tasks to running, which was apparently an allowed transition here (I don't know what their original state was). PostgreSQL blocked the update, because updating a task to running only changes the started date and not the finished date. Those tasks already had a finished date, and it was earlier than the new started date, causing a check constraint to fail.
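The failure mode can be reproduced outside of Arkindex with a small sketch. It uses SQLite instead of PostgreSQL, and the table layout, the exact condition behind task_finished_after_started, and the timestamps are assumptions made for illustration, not the real ponos_task schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        state TEXT NOT NULL,
        started TEXT,
        finished TEXT,
        -- Assumed condition, guessed from the constraint name in the traceback
        CONSTRAINT task_finished_after_started
            CHECK (started IS NULL OR finished IS NULL OR finished >= started)
    )
    """
)

# A task that is back in a non-final state but kept an old finished date.
conn.execute(
    "INSERT INTO task (id, state, started, finished) VALUES (?, ?, ?, ?)",
    (1, "pending", None, "2024-12-28 04:49:10"),
)


def mark_running(task_id, now):
    """Mimic the agent's update: only state and started change."""
    conn.execute(
        "UPDATE task SET state = 'running', started = ? WHERE id = ?",
        (now, task_id),
    )


try:
    mark_running(1, "2025-01-06 07:15:26")
    outcome = "updated"
except sqlite3.IntegrityError:
    # The stale finished date is now earlier than the new started date.
    outcome = "rejected"
```

Because the update never touches the finished column, retrying it cannot succeed, which would match the thousands of repeated attempts seen in Sentry.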
A finished date is only expected on a task that is in a final state, and once it is in such a state, it will never go back to unscheduled or pending, where it could be updated to running. I think the only way this could occur is if someone messes with the task in the Django admin.
We can add another check constraint to only allow a finished date to be set when the task is in a final state. This constraint can be validated by the Django admin, and it can show a nice error message through its violation_error_message, requiring the admin user to do the right thing. This constraint also means we should not be able to end up in the situation that caused this Sentry issue again, even if the problem did not come from a manual update in the admin.
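As a rough sketch of the condition such a constraint would enforce, the snippet below expresses it as a plain SQL CHECK in SQLite. The constraint name, the list of final states and the table layout are hypothetical. In Arkindex, the same condition would presumably be declared as a Django check constraint on the task model.

```python
import sqlite3

# Hypothetical list of final states; the real Ponos state names may differ.
FINAL_STATES = ("completed", "failed", "error", "stopped")

conn = sqlite3.connect(":memory:")
final_states_sql = ", ".join(f"'{state}'" for state in FINAL_STATES)
conn.execute(
    f"""
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        state TEXT NOT NULL,
        started TEXT,
        finished TEXT,
        -- Hypothetical name: a finished date requires a final state
        CONSTRAINT task_finished_requires_final_state
            CHECK (finished IS NULL OR state IN ({final_states_sql}))
    )
    """
)


def try_write(sql, params=()):
    """Run a write and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A completed task with a finished date is accepted.
created = try_write(
    "INSERT INTO task (id, state, started, finished) VALUES (?, ?, ?, ?)",
    (1, "completed", "2024-12-23 09:55:24", "2024-12-28 04:49:10"),
)
# Moving it back to pending while keeping the finished date is rejected.
reopened_keeping_date = try_write("UPDATE task SET state = 'pending' WHERE id = 1")
# Clearing the dates in the same update is accepted.
reopened_clearing_date = try_write(
    "UPDATE task SET state = 'pending', started = NULL, finished = NULL WHERE id = 1"
)
# The task can then be set to running without a stale finished date.
now_running = try_write(
    "UPDATE task SET state = 'running', started = ? WHERE id = 1",
    ("2025-01-06 07:15:26",),
)
```

With this condition in place, a manual edit that reopens a task has to clear the finished date at the same time, so the later update to running no longer conflicts with task_finished_after_started.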
CheckViolation: new row for relation "ponos_task" violates check constraint "task_finished_after_started"
DETAIL: Failing row contains (858efc1b-04b6-433b-be9d-6d168e94165b, 0, 3, dan_offline_publish_4a4e6b_4, running, null, 2024-12-23 09:55:24.384586+00, 2025-01-06 07:15:26.265048+00, 6d449d7e-67e6-c5a3-8e15-5580c9eddea5, worker-dan-offline-publish, registry.gitlab.teklia.com/workers/dan:commit-ceaeda40, null, f, 10, 2025-01-22 09:55:24.342747+00, null, "TASK_ELEMENTS"=>"/data/initialisation/elements_chunk_4.json", "..., , oqmiESutSxmS81y1A0LHgr4qOllLjUaYjNNIBjLj+H0=, 01de7786-80c5-4c1c-a18a-8ad921a68be7, 4a4e6b0b-5a2a-434d-923b-d21b887603c7, null, 2024-12-28 04:49:10.969451+00, 2025-01-06 07:15:26.264618+00, 432000).
File "django/db/backends/utils.py", line 105, in _execute
return self.cursor.execute(sql, params)
IntegrityError: new row for relation "ponos_task" violates check constraint "task_finished_after_started"
DETAIL: Failing row contains (858efc1b-04b6-433b-be9d-6d168e94165b, 0, 3, dan_offline_publish_4a4e6b_4, running, null, 2024-12-23 09:55:24.384586+00, 2025-01-06 07:15:26.265048+00, 6d449d7e-67e6-c5a3-8e15-5580c9eddea5, worker-dan-offline-publish, registry.gitlab.teklia.com/workers/dan:commit-ceaeda40, null, f, 10, 2025-01-22 09:55:24.342747+00, null, "TASK_ELEMENTS"=>"/data/initialisation/elements_chunk_4.json", "..., , oqmiESutSxmS81y1A0LHgr4qOllLjUaYjNNIBjLj+H0=, 01de7786-80c5-4c1c-a18a-8ad921a68be7, 4a4e6b0b-5a2a-434d-923b-d21b887603c7, null, 2024-12-28 04:49:10.969451+00, 2025-01-06 07:15:26.264618+00, 432000).
(23 additional frame(s) were not displayed)
...
File "arkindex/ponos/serializers.py", line 155, in update