Build corpus export as SQLite database

This issue introduces only the low-level code to build and fill the SQLite database (basically the async task). It should not introduce the API endpoint or any extra behaviour (email sending, storage, etc.); these will all be introduced in other issues.

Format

The database should use a format similar to the worker cache:

  • element
    • id
    • created
    • updated
    • name
    • type (the slug)
    • polygon
    • image_id
    • worker_version_id
  • element_path
    • id
    • parent_id
    • child_id
    • ordering
  • image
    • id
    • url
    • width
    • height
  • transcription
    • id
    • type
    • text
    • worker_version_id
    • confidence
  • classification
    • id
    • class_name
    • state
    • confidence
    • high_confidence
    • worker_version_id
  • entity
    • id
    • name
    • type
    • validated
    • moderator (email)
    • metas (JSON)
    • worker_version_id
  • transcription_entity
    • transcription_id
    • entity_id
    • offset
    • length
    • worker_version_id (soon™)
  • entity_link
    • id
    • parent_id
    • child_id
    • role_id
  • entity_role
    • id
    • parent_name
    • child_name
    • parent_type
    • child_type
  • metadata
    • id
    • element_id
    • name
    • type
    • value
    • entity_id
    • worker_version_id
  • worker_version
    • id
    • name (from worker)
    • slug (from worker)
    • type (from worker)
    • revision (hash from revision)
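
As a rough illustration of the format above, here is a minimal sketch that creates a few of the listed tables with Python's standard `sqlite3` module and fills one of them. The table and column names come from the list; the SQL types, the foreign keys, and the helper names `build_export` and `insert_rows` are assumptions made for this example, not the actual implementation. The remaining tables would follow the same pattern.

```python
import sqlite3

# Column names follow the format listed above. The SQL types and the
# REFERENCES constraints are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE worker_version (
    id TEXT PRIMARY KEY,
    name TEXT,
    slug TEXT,
    type TEXT,
    revision TEXT
);
CREATE TABLE image (
    id TEXT PRIMARY KEY,
    url TEXT,
    width INTEGER,
    height INTEGER
);
CREATE TABLE element (
    id TEXT PRIMARY KEY,
    created REAL,
    updated REAL,
    name TEXT,
    type TEXT,
    polygon TEXT,
    image_id TEXT REFERENCES image(id),
    worker_version_id TEXT REFERENCES worker_version(id)
);
CREATE TABLE element_path (
    id TEXT PRIMARY KEY,
    parent_id TEXT REFERENCES element(id),
    child_id TEXT REFERENCES element(id),
    ordering INTEGER
);
"""


def build_export(path=":memory:"):
    """Create the export database (hypothetical helper) and return the connection."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def insert_rows(db, table, columns, rows):
    """Bulk-insert rows into one table (hypothetical helper).

    `table` and `columns` must come from trusted code, never from user
    input, since they are interpolated into the SQL statement.
    """
    placeholders = ", ".join("?" for _ in columns)
    query = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    db.executemany(query, rows)
    db.commit()
```

A possible usage, with made-up values:

```python
db = build_export()
insert_rows(
    db,
    "image",
    ["id", "url", "width", "height"],
    [("img1", "http://example.com/a.jpg", 100, 200)],
)
```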