
Rework the worker due to `Dataset` API changes

Merged Eva Bardou requested to merge rework-worker into main
Edited by Eva Bardou


Activity

  • added P1 label

  • assigned to @ebardou

  • Eva Bardou added 1 commit


  • Eva Bardou added 1 commit


  • Eva Bardou (Author, Developer)

    Tested locally using this dataset:

    export ARKINDEX_CORPUS_ID=b3a664de-eecc-4ca0-9ce5-939a650eee02
    export ARKINDEX_API_URL=https://ee.preprod.arkindex.teklia.com/api/v1/ ARKINDEX_API_TOKEN=XXX
    
    worker-generic-training-dataset --set a65c5e0c-5126-44de-8d68-674f6ac071cc:training a65c5e0c-5126-44de-8d68-674f6ac071cc:validation a65c5e0c-5126-44de-8d68-674f6ac071cc:test --dev
    
    2024-04-04 16:43:11,038 WARNING/arkindex_worker: Missing ARKINDEX_WORKER_RUN_ID environment variable, worker is in read-only mode
    2024-04-04 16:43:11,038 INFO/arkindex_worker: Worker will use /home/eva/.local/share/arkindex as working directory
    2024-04-04 16:43:12,172 WARNING/arkindex_worker: Running without any extra configuration
    2024-04-04 16:43:12,172 INFO/arkindex.pagination: Loading first page on try 1/5
    2024-04-04 16:43:12,364 INFO/arkindex.pagination: Pagination will load a total of 1 page.
    2024-04-04 16:43:12,364 INFO/worker_generic_training_dataset.worker: Downloading export (4a3af150-730b-42b5-8289-23b9f0d084fd)...
    2024-04-04 16:43:13,050 INFO/worker_generic_training_dataset.worker: Downloaded export (4a3af150-730b-42b5-8289-23b9f0d084fd) @ `/tmp/test-eva-20240404-143748.sqlite`
    2024-04-04 16:43:13,111 INFO/worker_generic_training_dataset.worker: Building Dataset (a65c5e0c-5126-44de-8d68-674f6ac071cc) (1/1)
    2024-04-04 16:43:13,111 WARNING/arkindex_worker: Cannot update dataset as this worker is in read-only mode
    2024-04-04 16:43:13,112 INFO/worker_generic_training_dataset.worker: Processing Dataset (a65c5e0c-5126-44de-8d68-674f6ac071cc) (1/1)
    2024-04-04 16:43:13,112 INFO/worker_generic_training_dataset.worker: Cached database will be saved at `/tmp/tmpsf1y946a-arkindex-data/db.sqlite`.
    2024-04-04 16:43:13,113 INFO/arkindex_worker: Connected to cache on /tmp/tmpsf1y946a-arkindex-data/db.sqlite
    2024-04-04 16:43:13,264 INFO/worker_generic_training_dataset.worker: Images will be saved at `/tmp/tmpsf1y946a-arkindex-data/images`.
    2024-04-04 16:43:13,265 INFO/worker_generic_training_dataset.worker: Inserting dataset (a65c5e0c-5126-44de-8d68-674f6ac071cc)
    2024-04-04 16:43:13,272 INFO/arkindex.pagination: Loading first page on try 1/5
    2024-04-04 16:43:13,363 INFO/arkindex.pagination: Pagination will load a total of 1 page.
    2024-04-04 16:43:13,365 INFO/worker_generic_training_dataset.worker: Filling the cache with information from elements in the split training
    2024-04-04 16:43:13,366 INFO/worker_generic_training_dataset.worker: Processing `training` element (1/1)
    2024-04-04 16:43:13,366 INFO/worker_generic_training_dataset.worker: Processing element (735fa0da-f7de-45fd-b0a3-eceb93b342ae)
    2024-04-04 16:43:13,366 INFO/worker_generic_training_dataset.worker: Downloading image
    2024-04-04 16:43:14,120 INFO/arkindex_worker: Downloaded image https://preprod-arkindex-iiif.europe.iiif.teklia.com/iiif/2/ddf8fcbd-579e-449c-9d89-8f5ef9773d69/0,0,800,711/full/0/default.jpg - size=800x711 in 0:00:00.664373
    2024-04-04 16:43:14,142 INFO/worker_generic_training_dataset.worker: Inserting image
    2024-04-04 16:43:14,156 INFO/worker_generic_training_dataset.worker: Inserting element
    2024-04-04 16:43:14,165 INFO/worker_generic_training_dataset.worker: Listing classifications
    2024-04-04 16:43:14,166 INFO/worker_generic_training_dataset.worker: Listing transcriptions
    2024-04-04 16:43:14,167 INFO/worker_generic_training_dataset.worker: Listing entities
    2024-04-04 16:43:14,167 INFO/worker_generic_training_dataset.worker: Linking element 735fa0da-f7de-45fd-b0a3-eceb93b342ae to dataset (a65c5e0c-5126-44de-8d68-674f6ac071cc)
    2024-04-04 16:43:14,182 INFO/arkindex.pagination: Loading first page on try 1/5
    2024-04-04 16:43:14,266 INFO/arkindex.pagination: Pagination will load a total of 1 page.
    2024-04-04 16:43:14,268 INFO/worker_generic_training_dataset.worker: Filling the cache with information from elements in the split validation
    2024-04-04 16:43:14,268 INFO/worker_generic_training_dataset.worker: Processing `validation` element (1/1)
    2024-04-04 16:43:14,268 INFO/worker_generic_training_dataset.worker: Processing element (ab3fbf5b-8644-4d84-a111-adde4415a4fd)
    2024-04-04 16:43:14,268 INFO/worker_generic_training_dataset.worker: Downloading image
    2024-04-04 16:43:15,587 INFO/arkindex_worker: Downloaded image https://preprod-arkindex-iiif.europe.iiif.teklia.com/iiif/2/e8dcc924-f711-489c-9c55-8479d79baf43/0,0,1200,1674/full/0/default.jpg - size=1200x1674 in 0:00:01.195585
    2024-04-04 16:43:15,626 INFO/worker_generic_training_dataset.worker: Inserting image
    2024-04-04 16:43:15,635 INFO/worker_generic_training_dataset.worker: Inserting element
    2024-04-04 16:43:15,643 INFO/worker_generic_training_dataset.worker: Listing classifications
    2024-04-04 16:43:15,645 INFO/worker_generic_training_dataset.worker: Listing transcriptions
    2024-04-04 16:43:15,646 INFO/worker_generic_training_dataset.worker: Listing entities
    2024-04-04 16:43:15,646 INFO/worker_generic_training_dataset.worker: Linking element ab3fbf5b-8644-4d84-a111-adde4415a4fd to dataset (a65c5e0c-5126-44de-8d68-674f6ac071cc)
    2024-04-04 16:43:15,659 INFO/arkindex.pagination: Loading first page on try 1/5
    2024-04-04 16:43:15,746 INFO/arkindex.pagination: Pagination will load a total of 1 page.
    2024-04-04 16:43:15,747 INFO/worker_generic_training_dataset.worker: Filling the cache with information from elements in the split test
    2024-04-04 16:43:15,747 INFO/worker_generic_training_dataset.worker: Processing `test` element (1/1)
    2024-04-04 16:43:15,747 INFO/worker_generic_training_dataset.worker: Processing element (10262955-e628-4e6c-bfd9-04e3060aa74d)
    2024-04-04 16:43:15,748 INFO/worker_generic_training_dataset.worker: Downloading image
    2024-04-04 16:43:17,210 INFO/arkindex_worker: Downloaded image https://preprod-arkindex-iiif.europe.iiif.teklia.com/iiif/2/1b158a8f-b9d9-4e64-a5db-f9fbc8a50db0/0,0,1500,998/full/0/default.jpg - size=1500x998 in 0:00:01.251458
    2024-04-04 16:43:17,254 INFO/worker_generic_training_dataset.worker: Inserting image
    2024-04-04 16:43:17,272 INFO/worker_generic_training_dataset.worker: Inserting element
    2024-04-04 16:43:17,281 INFO/worker_generic_training_dataset.worker: Listing classifications
    2024-04-04 16:43:17,282 INFO/worker_generic_training_dataset.worker: Listing transcriptions
    2024-04-04 16:43:17,282 INFO/worker_generic_training_dataset.worker: Listing entities
    2024-04-04 16:43:17,282 INFO/worker_generic_training_dataset.worker: Linking element 10262955-e628-4e6c-bfd9-04e3060aa74d to dataset (a65c5e0c-5126-44de-8d68-674f6ac071cc)
    2024-04-04 16:43:17,291 INFO/worker_generic_training_dataset.worker: Compressing the images to /home/eva/.local/share/arkindex/a65c5e0c-5126-44de-8d68-674f6ac071cc.tar.zst
    2024-04-04 16:43:17,302 INFO/worker_generic_training_dataset.worker: Completed Dataset (a65c5e0c-5126-44de-8d68-674f6ac071cc) (1/1)
    2024-04-04 16:43:17,302 WARNING/arkindex_worker: Cannot update dataset as this worker is in read-only mode
    2024-04-04 16:43:17,302 INFO/worker_generic_training_dataset.worker: Ran on 1 dataset: 1 completed, 0 failed

    This generated the archive `a65c5e0c-5126-44de-8d68-674f6ac071cc.tar.zst`.

    I could not test the worker directly on preprod because the frontend for launching DatasetProcesses is currently broken there.

  • Eva Bardou requested review from @yschneider


  • Eva Bardou added 4 commits


  • Eva Bardou added 1 commit


  • Eva Bardou changed the description


  • Eva Bardou resolved all threads


  • Eva Bardou requested review from @yschneider


  • added 1 commit


  • Yoann Schneider enabled an automatic merge when the pipeline for 951bf9f1 succeeds


  • Yoann Schneider approved this merge request


  • Yoann Schneider canceled the automatic merge


  • added 5 commits


  • Yoann Schneider enabled an automatic merge when the pipeline for a5726c5c succeeds


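
In the local test above, the three `--set` values share one dataset ID, and the log accordingly reports a single dataset (1/1) processed with the splits `training`, `validation` and `test`. The sketch below shows one way such values could be grouped by dataset, assuming the `<dataset_id>:<set_name>` format seen in the command. `parse_sets` is a hypothetical helper written for illustration only; it is not the worker's or `arkindex_worker`'s actual code.

```python
from collections import defaultdict
from uuid import UUID


def parse_sets(values):
    """Group `<dataset_id>:<set_name>` strings by dataset ID (hypothetical helper)."""
    grouped = defaultdict(list)
    for value in values:
        dataset_id, sep, set_name = value.partition(":")
        if not sep or not set_name:
            raise ValueError(f"Expected <dataset_id>:<set_name>, got {value!r}")
        # Raises ValueError if the dataset ID is not a well-formed UUID.
        UUID(dataset_id)
        grouped[dataset_id].append(set_name)
    return dict(grouped)


# Values copied from the command used in the local test.
DATASET_ID = "a65c5e0c-5126-44de-8d68-674f6ac071cc"
sets = parse_sets(
    [
        f"{DATASET_ID}:training",
        f"{DATASET_ID}:validation",
        f"{DATASET_ID}:test",
    ]
)
print(sets)
```

Grouping by dataset ID keeps the order in which the splits were passed, which matches the order they appear in the log (training, validation, test).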