
Import the export!

We need to move some corpus data from preprod to prod for Ocapi (some ML experts have also asked for this feature).

We can now build a Django management command (named load_export) to import the export from one instance into another. The overall workflow will be:

  1. create a new corpus with the name provided as --name on the CLI (defaults to Data import <date>)

  2. create all element types (from table elements); each type's name is its slug with .title() applied

  3. create all image servers (from table image, using url). Some code from ImageServerManager.form_url should be reused, or simply called as-is (though that might not be fast enough)

  4. create all images, using previously created image servers

  5. create all worker versions, respecting the hierarchy:

    • get or create a repo with the URL http://data.import
    • get or create workers, using the provided slug as identifier
    • get or create versions, using the provided ID
    • note: the export lacks some of the data needed to do this cleanly
  6. create all elements using previously created images and types

  7. create all transcriptions

  8. create all metadata (no allowed here)

  9. create all ML classes, using all distinct classification class names

  10. create all classifications
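
A rough sketch of steps 1, 2 and 9, to make the intent concrete. This is not the actual command: it uses only argparse and sqlite3 from the standard library rather than Django (a real management command would declare the same option in add_arguments). It assumes the export is an SQLite file, and the table and column names (element.type, classification.class_name), the export_path argument and the ISO date format in the default name are all hypothetical.

```python
import argparse
import sqlite3
from datetime import date


def build_parser():
    # Step 1: the corpus name comes from --name, with a dated default.
    # The ISO date format is an assumption; the issue only says "<date>".
    parser = argparse.ArgumentParser(prog="load_export")
    parser.add_argument("export_path", help="path to the export (assumed SQLite)")
    parser.add_argument(
        "--name",
        default=f"Data import {date.today().isoformat()}",
        help="name of the corpus to create",
    )
    return parser


def type_display_name(slug):
    # Step 2: the type name is the slug with .title() applied.
    return slug.title()


def distinct_values(db, table, column):
    # Table and column names are interpolated into the query, so they
    # must come from trusted code, never from user input.
    rows = db.execute(f"SELECT DISTINCT {column} FROM {table} ORDER BY {column}")
    return [row[0] for row in rows]


def plan_import(db):
    # Hypothetical schema: element(type) and classification(class_name).
    type_slugs = distinct_values(db, "element", "type")
    return {
        # Step 2: one element type per distinct slug.
        "element_types": {slug: type_display_name(slug) for slug in type_slugs},
        # Step 9: one ML class per distinct classification class name.
        "ml_classes": distinct_values(db, "classification", "class_name"),
    }
```

Note that .title() capitalizes after every non-letter, so a slug such as text_line becomes Text_Line; if that is not wanted, the slug would need extra cleanup before step 2.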

Please note in the comments on this issue every piece of data that is missing and should be added to the export.

Do not import entities for now.

Edited by Bastien Abadie