CreateExportProcess might start an SQLite export before it is saved
Some export processes in production, like this one, stopped almost immediately after starting because they found that their CorpusExport was already marked as failed. The RQ task for the CorpusExport had failed as soon as it tried to mark the export as running:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "documents_corpusexport_pkey"
DETAIL: Key (id)=(0d439ed7-05cb-4326-96ef-8a6b80b93ad4) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/rq/worker.py", line 1431, in perform_job
    rv = job.perform()
  File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
    self._result = self._execute()
  File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/usr/share/arkindex/documents/export/__init__.py", line 142, in local_export
  File "/usr/share/arkindex/documents/export/__init__.py", line 156, in export_corpus
  File "/usr/local/lib/python3.10/site-packages/django/db/models/base.py", line 822, in save
    self.save_base(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/base.py", line 909, in save_base
    updated = self._save_table(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/base.py", line 1071, in _save_table
    results = self._do_insert(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/base.py", line 1112, in _do_insert
    return manager._insert(
  File "/usr/local/lib/python3.10/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/django/db/models/query.py", line 1847, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/usr/local/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1823, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/utils.py", line 1713, in runner
    return sentry_patched_function(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/integrations/django/__init__.py", line 650, in execute
    result = real_execute(self, sql, params)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
  File "/usr/local/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.10/site-packages/django/db/backends/utils.py", line 105, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.IntegrityError: duplicate key value violates unique constraint "documents_corpusexport_pkey"
DETAIL: Key (id)=(0d439ed7-05cb-4326-96ef-8a6b80b93ad4) already exists.
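This does not seem to be a stale read, since only write operations are involved (creating, then updating, the CorpusExport). However, CorpusExport.start() is called before the transaction is committed, so the RQ task can start too early, before the CorpusExport is actually saved. That would also explain why the traceback ends in an INSERT rather than an UPDATE: when save() is called on an instance that already has a primary key, Django normally tries an UPDATE first and falls back to an INSERT if no row was updated. If the row is not yet visible to the worker, the UPDATE matches nothing, and the INSERT then collides with the row once the creating transaction commits. We should use on_commit so that the export only starts after the transaction is committed.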