Move the unknown token replacement step to download

Gabriel Bermes Poupeau requested to merge move-unknown-token-replace into main

Closes #291 (closed)

I was able to extract a real dataset with the same commands as in https://redmine.teklia.com/issues/7433#note-23:

teklia-dan dataset entities \
    /home/training_data/ATR_page/Heritus/heritus-20240619-073640.sqlite

teklia-dan dataset tokens \
    ./entities.yml

teklia-dan dataset extract \
    /home/training_data/ATR_page/Heritus/heritus-20240619-073640.sqlite \
    --dataset-id fc6d4c69-ca18-4f39-8479-b891cc93bd29 \
    --element-type double_page \
    --output . \
    --tokens ./tokens.yml \
    --transcription-worker-runs 3947a85e-0661-42bc-8d71-4852adb94375 54e9034a-0ce9-43ab-8f12-2885fcc83842 ad2e5af5-a61a-4ade-827a-0129b0cfa493 \
    --entity-worker-runs 3947a85e-0661-42bc-8d71-4852adb94375 54e9034a-0ce9-43ab-8f12-2885fcc83842 ad2e5af5-a61a-4ade-827a-0129b0cfa493

# Merge dataset (override with Callico transcriptions)
teklia-dan dataset extract \
    /home/training_data/ATR_page/Heritus/heritus-20240619-073640.sqlite \
    --dataset-id 10f85201-3a7c-4190-806a-ae5452503280 \
    --element-type double_page \
    --output . \
    --tokens ./tokens.yml \
    --transcription-worker-runs ef07caeb-191c-4609-aee7-72308a7201ab \
    --entity-worker-runs ef07caeb-191c-4609-aee7-72308a7201ab

teklia-dan dataset download \
    --output . \
    --tokens ./tokens.yml \
    --max-width 1800

teklia-dan dataset analyze \
    --labels ./labels.json \
    --tokens ./tokens.yml \
    --output-file ./analyze.md
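
For readers unfamiliar with the step named in the title, here is a minimal sketch of what unknown token replacement could look like. It is an illustration only, not the code in this merge request: the function name `replace_unknown_tokens`, the charset built from the training transcriptions, the `⁇` placeholder, and the character-by-character handling (which ignores multi-character entity tokens) are all assumptions.

```python
# Hypothetical sketch: replace characters unseen in the training set
# with a placeholder token. Not the actual teklia-dan implementation.

def build_charset(train_texts: list[str]) -> set[str]:
    """Collect every character that appears in the training transcriptions."""
    return {char for text in train_texts for char in text}


def replace_unknown_tokens(text: str, charset: set[str], unknown_token: str = "⁇") -> str:
    """Replace each character missing from `charset` with `unknown_token`.

    Assumption: tokens are single characters; multi-character entity
    tokens would need dedicated handling.
    """
    return "".join(char if char in charset else unknown_token for char in text)


# Example: the validation text contains "d", which the training set never saw.
train_texts = ["abc", "cab"]
charset = build_charset(train_texts)
cleaned = replace_unknown_tokens("abcd", charset)
print(cleaned)
```

Under these assumptions the charset is only known once all training transcriptions are available, which is one plausible reason to run the replacement after extraction rather than during it.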
Edited by Manon Blanco
