CreateExportProcess endpoint

https://redmine.teklia.com/issues/7497

Requires #1858 (closed), #1863 (closed), arkindex/workers/export#7 (closed), arkindex/workers/export#8 (closed)

A new CreateExportProcess endpoint at /api/v1/process/export/{id}/, where {id} is the ID of a corpus, accepts the following parameters:

  • export_id: UUID or None. Required: it must always be sent explicitly, even when its value is None.

    • When set to a UUID, it should be the ID of a CorpusExport belonging to the corpus. If that export is in an error state, return HTTP 400.

    • When set to None, this will start an export, so this field should enforce the same rules as StartExport:

      • No export can be started if one is already created or running on the corpus.
      • No export can be started if one has been created less than EXPORT_TTL_SECONDS ago and is done.

      You may want to move the code from arkindex.documents.serializers.export to a method on CorpusExport so that the same checks are done on both endpoints.

    This behavior should be documented in the help_text for this field.

  • format: Enum, required.

    You can define a new arkindex.process.models.ExportFormat enum, which for now will only have pdf and pagexml. We will add more options later depending on which workers are available. You can add a comment on ArkindexFeature to remind developers to update both ArkindexFeature and ExportFormat when adding export feature workers.

  • element_id: UUID or None. Optional, defaults to None.

    • When this element does not exist or belongs to a corpus the user does not have guest access to, return HTTP 404.
    • When the user has guest access to the element's corpus, but it is not the corpus from the URL, return HTTP 400.

  • selection: Boolean. Optional, defaults to False.

    • If this is enabled and element_id is set, return HTTP 400.
    • If this is enabled and there are no elements in the user's selection for this corpus, return HTTP 400.

  • configuration: Dict. Optional, defaults to {}.

    This should be validated as a user configuration for the WorkerVersion that provides the ArkindexFeature that matches the specified format. For example, with pdf, sending {"order_by_name": "mayhaps"} should fail, because order_by_name should be a boolean. Pay particular attention to checking that no required parameters are missing.
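The parameter rules above can be sketched in plain Python, without Django or the real models. Everything below is an assumption made for illustration: the helper names export_start_error and configuration_errors, the lowercase state names, the error messages, and the simplified PDF_PARAMETERS structure standing in for the user configuration declared by a WorkerVersion. Only ExportFormat and its two values come from this issue.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Iterable, Optional, Tuple


class ExportFormat(Enum):
    # Keep in sync with the export features declared on ArkindexFeature.
    PDF = "pdf"
    PageXML = "pagexml"


def export_start_error(
    exports: Iterable[Tuple[str, datetime]],
    ttl_seconds: int,
    now: datetime,
) -> Optional[str]:
    """Return an error message if a new export cannot be started, else None.

    `exports` holds one (state, created) pair per existing export on the corpus.
    The state names used here are assumptions, not the real enum values.
    """
    for state, created in exports:
        if state in ("created", "running"):
            return "An export is already created or running on this corpus."
        if state == "done" and now - created < timedelta(seconds=ttl_seconds):
            return "A finished export was created too recently on this corpus."
    return None


# Hypothetical, simplified description of the parameters a WorkerVersion accepts.
PDF_PARAMETERS = {"order_by_name": {"type": bool, "required": False}}


def configuration_errors(configuration: dict, parameters: dict) -> list:
    """List wrongly typed values and missing required parameters."""
    errors = []
    for name, spec in parameters.items():
        if name not in configuration:
            if spec["required"]:
                errors.append(f"{name}: this parameter is required.")
        elif not isinstance(configuration[name], spec["type"]):
            errors.append(f"{name}: expected a {spec['type'].__name__}.")
    return errors
```

With this sketch, configuration_errors({"order_by_name": "mayhaps"}, PDF_PARAMETERS) returns one error, which the endpoint would turn into an HTTP 400. Keeping export_start_error independent of the serializer is one way to share the checks between StartExport and CreateExportProcess, as suggested above.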

When the user does not have a verified email, return HTTP 403.
When the user does not have guest access to the corpus, return a generic HTTP 404.
When the user has guest access to the corpus but not admin access, return HTTP 403 with an explicit message.
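The order of these three checks matters, because a user without guest access must get a 404 and never learn that the corpus exists. A minimal sketch of that ordering follows. The access_status helper and the numeric GUEST and ADMIN levels are assumptions made for this example, not the actual permission API.

```python
from typing import Optional

# Hypothetical numeric access levels; higher means more rights.
GUEST = 10
ADMIN = 100


def access_status(verified_email: bool, access_level: Optional[int]) -> Optional[int]:
    """Return the HTTP error status to send, or None when access is granted.

    `access_level` is the user's level on the corpus from the URL,
    or None when the user has no access at all.
    """
    if not verified_email:
        return 403
    if access_level is None or access_level < GUEST:
        # Generic 404: do not reveal that the corpus exists.
        return 404
    if access_level < ADMIN:
        # 403 with an explicit message: the user can already see the corpus.
        return 403
    return None
```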

When everything is valid, this endpoint should, in a single transaction:

  1. Create and start a new CorpusExport on the corpus if export_id was None.
  2. Create a new Export process on the corpus, with the authenticated user as its creator and element_id set if one was specified.
  3. If selection is enabled, bulk create new ProcessElement instances for all of the elements selected by the user on this corpus.
  4. Get or create a WorkerConfiguration for the worker of the version that provides the feature selected by the format, containing the configuration and an extra export_id parameter set to the ID of the CorpusExport that was specified or created in step 1.
  5. Create a WorkerRun for the WorkerVersion providing the feature, with the selected WorkerConfiguration.
  6. Start the process.
  7. Return HTTP 201 with the same payload as a RetrieveProcess.
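Step 4 can be illustrated without the ORM. The sketch below uses a plain dict as a stand-in for the WorkerConfiguration table, and both helper names are hypothetical. It also assumes that the server-side export_id should win over any export_id key sent by the user in configuration, which this issue does not specify.

```python
import json
from typing import Tuple
from uuid import UUID


def build_configuration(user_configuration: dict, export_id: UUID) -> dict:
    # export_id is applied last, so a user-supplied key of the same name
    # cannot override it (an assumption, see the paragraph above).
    return {**user_configuration, "export_id": str(export_id)}


def get_or_create_configuration(
    store: dict, worker_id: str, configuration: dict
) -> Tuple[dict, bool]:
    """Mimic get-or-create semantics on an in-memory store.

    Serializing with sorted keys makes the lookup independent of key order.
    """
    key = (worker_id, json.dumps(configuration, sort_keys=True))
    created = key not in store
    if created:
        store[key] = configuration
    return store[key], created
```

Calling get_or_create_configuration twice with the same worker and an equivalent configuration returns the existing entry the second time, so repeated export processes on the same CorpusExport with the same options would share one configuration.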
Edited by Erwan Rouchet