Allow agents to update a task from Stopping to Error
Sentry's weekly report showed me that the one and only Sentry issue of Ponos has exploded again since Thursday evening. xenarque has been trying to stop a container that has disappeared entirely:
févr. 19 09:22:11 xenarque ponos-agent[2508203]: [WARNING] Missing container ponos-9322-558bbb: 404 Client Error for http+docker://localhost/v1.40/containers/ponos-9322-558bbb/json: Not Found ("No such container: ponos-9322-558bbb")
févr. 19 09:22:11 xenarque ponos-agent[2508203]: [WARNING] Missing container ef8fb902cb211a12485a9ba6f1255b33caecd3641773e8fd4f3c7c556bd996a7: 404 Client Error for http+docker://localhost/v1.40/containers/ef8fb902cb211a12485a9ba6f1255b33caecd3641773e8fd4f3c7c556bd996a7/json: Not Found ("No such container: ef8fb902cb211a12485a9ba6f1255b33caecd3641773e8fd4f3c7c556bd996a7")
févr. 19 09:22:11 xenarque ponos-agent[2508203]: [WARNING] Container for task initialisation (558bbbed-15c5-4b24-ae63-d6bc6ca86c26) not found
févr. 19 09:22:11 xenarque ponos-agent[2508203]: [ERROR] Main loop error: 400 Client Error: Bad Request for url: https://demo.arkindex.org/api/v1/task/558bbbed-15c5-4b24-ae63-d6bc6ca86c26/from-agent/
The task is marked as Stopping, so the backend tells the agent to stop the task. The task's container is not found, so the agent cannot stop it and cannot report it as stopped: it doesn't know whether the task actually started, got killed, ended (un)successfully, or anything else. So it tries to mark the task as Error and gets an HTTP 400, because that transition is not among the allowed state updates implemented in #1603 (closed).
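To make the requested change concrete, here is a minimal sketch of the kind of check the backend could apply when an agent updates a task. Everything in it is an assumption made for illustration: the `State` enum, the `ALLOWED_FROM_AGENT` table, and `can_agent_update` are hypothetical names, and the table does not reproduce the actual rules from #1603. It only shows Stopping to Error being accepted alongside Stopping to Stopped.

```python
from enum import Enum


class State(Enum):
    # Hypothetical subset of task states, for illustration only
    Running = "running"
    Stopping = "stopping"
    Stopped = "stopped"
    Error = "error"


# Hypothetical table: for each current state, the states an agent may set.
# Only the Stopping row is sketched; the real rules live in the backend.
ALLOWED_FROM_AGENT = {
    State.Stopping: {State.Stopped, State.Error},
}


def can_agent_update(current: State, requested: State) -> bool:
    """Return True if an agent may move a task from `current` to `requested`."""
    return requested in ALLOWED_FROM_AGENT.get(current, set())
```

With a rule like this, an agent that cannot find the container of a Stopping task could report Error instead of looping on an HTTP 400, while other updates from Stopping (for example back to Running) would still be rejected.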