
Implement data generation

Depends on #1 (closed)

Refs https://redmine.teklia.com/issues/3676

  • the worker should work on a corpus (taken from the process configuration) and find its latest available corpus export
  • iterate over all filtered elements from the process
  • extract all information about each element and ALL its children (transcriptions, metadata, entities)
  • store it in an SQLite cache database compatible with our existing workers
  • download all images used by these elements
  • build a tar+zstd archive with the resulting payload
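The extraction and storage steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the worker's implementation: the `Element` dataclass, the `SCHEMA` tables, and `store_element` are all hypothetical, and the schema is NOT the actual cache schema used by the existing workers. Entities are left out for brevity; they would be stored with the same pattern as metadata. The function walks an element and all its descendants and returns the set of image IDs they reference, which is the list of images to download.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


# Hypothetical in-memory view of an element read from the corpus export.
@dataclass
class Element:
    id: str
    type: str
    image_id: Optional[str] = None
    transcriptions: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)


# Hypothetical schema for illustration only: the real cache schema
# expected by the existing workers must be used instead.
SCHEMA = """
CREATE TABLE IF NOT EXISTS elements (
    id TEXT PRIMARY KEY,
    parent_id TEXT,
    type TEXT NOT NULL,
    image_id TEXT
);
CREATE TABLE IF NOT EXISTS transcriptions (
    element_id TEXT NOT NULL REFERENCES elements(id),
    text TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS metadata (
    element_id TEXT NOT NULL REFERENCES elements(id),
    name TEXT NOT NULL,
    value TEXT NOT NULL
);
"""


def store_element(conn: sqlite3.Connection, root: Element) -> Set[str]:
    """Store an element and ALL its descendants; return the image IDs they use."""
    images: Set[str] = set()
    # Iterative depth-first walk, carrying each element's parent ID along.
    stack = [(root, None)]
    while stack:
        element, parent_id = stack.pop()
        conn.execute(
            "INSERT OR IGNORE INTO elements (id, parent_id, type, image_id) "
            "VALUES (?, ?, ?, ?)",
            (element.id, parent_id, element.type, element.image_id),
        )
        conn.executemany(
            "INSERT INTO transcriptions (element_id, text) VALUES (?, ?)",
            [(element.id, text) for text in element.transcriptions],
        )
        conn.executemany(
            "INSERT INTO metadata (element_id, name, value) VALUES (?, ?, ?)",
            [(element.id, name, value) for name, value in element.metadata.items()],
        )
        if element.image_id is not None:
            images.add(element.image_id)
        stack.extend((child, element.id) for child in element.children)
    return images
```

Typical usage is to open the database, run `conn.executescript(SCHEMA)`, then call `store_element` for each filtered element inside a `with conn:` block so the inserts are committed as one transaction. The union of the returned sets gives the images to download before building the archive.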