Possible memory leak in the binary build
Sentry Issue: ARKINDEX-BACKEND-4E
SystemExit: 1
(32 additional frame(s) were not displayed)
...
  File "arkindex_common/ml_tool.py", line 257, in iter
  File "glob.py", line 72, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "glob.py", line 92, in _glob0
    if os.path.lexists(os.path.join(dirname, basename)):
  File "posixpath.py", line 181, in lexists
    os.lstat(path)
  File "gunicorn/workers/base.py", line 201, in handle_abort
    sys.exit(1)
Gunicorn workers in preprod have been killed a few times after preprod ran out of memory, taking it down for about 10 minutes each time, while runaway tasks (ponos#12) were flooding the CreateClassifications endpoint. Memory usage of the backend Gunicorn workers grew slowly but visibly before the tasks were killed. While there were other API calls during that time,
Status pages for the three runaway tasks that were running during the OOM kills:
- Google Vision (4 days) https://preprod.arkindex.teklia.com/process/253bda68-ec74-428a-9786-6d3c29d3ab8d/0
- Madcat (3-4 weeks) https://preprod.arkindex.teklia.com/process/8b165379-1c72-45ad-946b-80409ead9182/0
- Madcat (3-4 weeks) https://preprod.arkindex.teklia.com/process/1d06cc13-8777-4c29-842e-5745e1659fa9/0
Edited by Erwan Rouchet