Compute confidence scores by char, word or line

Merged: Solene Tarride requested to merge 33-compute-confidence-scores-by-char-word-or-line into main
All threads resolved!

Closes #33 (closed)

Activity

  • changed milestone to %ML Prod - Next

  • added P2 Quick Win labels

  • Solene Tarride added 1 commit

    • 828a5db1 - compute word/line confidence scores

  • Solene Tarride added 1 commit

  • Solene Tarride requested review from @schneider-y

  • Solene Tarride (Author, Maintainer)

    I added the parameter --confidence-score-levels to compute confidence scores averaged by word and/or line.

    The following command will compute confidence scores for each line and for each word.

    teklia-dan predict --image dan_humu/example.jpg --model dan_humu/model.pt --parameters dan_humu/parameters.yml --charset dan_humu/charset.pkl --output dan_humu/out/ --scale 0.5 --confidence-score --confidence-score-levels line word

    Output JSON:

    {
        "text": "Hansteensgt. 2 IV 28/4 - 19\nKj\u00e6re Gerhard.\nTak for Brevet om Boken og Haven\nog Crokus og Blaaveis og tak fordi\nDu vilde be mig derut sammen\nmed Kris og Ragna. Men vet Du\nda ikke, at Kris reiste med sin S\u00f8-\nster Fru Cr\u00f8ger til Lillehammer\nnogle Dage efter Begravelsen? Hen\ndes Adresse er Amtsingeni\u00f8r\nCr\u00f8ger. Hun skriver at de blir\nder til lidt ut i Mai. Nu er hun\nnoksaa medtat skj\u00f8nner jeg af Sorg\nog af L\u00e6ngsel, skriver saameget r\u00f8-\nrende om Oluf. Ragna har det\nherligt, skriver hun. Hun er bare\ngla, og det vet jeg, at \"Oluf er gla over,\nder hvor han nu er. Jeg har saa in-\nderlig ondt af hende, og om Du skrev\net Par Ord tror jeg det vilde gj\u00f8re\nhende godt. - Jeg gl\u00e6der mig over,\nat Du har skrevet en Bok, og\njeg er vis paa, at den er god.",
        "confidences": {
            "total": 0.99,
            "word": [
                1.0,
                1.0,
                1.0,
                0.99,
                ...
            ],
            "line": [
                0.99,
                0.99,
                1.0,
                0.99,
                1.0,
                1.0,
                0.99,
                1.0,
                1.0,
                0.99,
                0.99,
                1.0,
                1.0,
                1.0,
                1.0,
                0.99,
                0.99,
                0.98,
                1.0,
                1.0,
                0.99,
                1.0,
                1.0
            ]
        }
    }

    By default, only the total (overall) confidence score will be computed.
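
    The averaging described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code from this merge request. It assumes one confidence value per predicted character. It treats spaces and newlines as word separators and newlines as line separators. It computes "total" as the mean over every character, and it rounds to two decimals as in the example output. The function name average_confidences is hypothetical.

```python
import re
from statistics import mean

# Assumed separators: words end at spaces or newlines, lines end at newlines.
SEGMENT_PATTERNS = {"word": re.compile(r"[^ \n]+"), "line": re.compile(r"[^\n]+")}


def average_confidences(text, char_confidences, levels=()):
    """Average per-character confidence scores by word and/or line.

    Hypothetical helper: expects exactly one confidence per character of `text`.
    """
    if len(text) != len(char_confidences):
        raise ValueError("expected one confidence score per character")

    # "total" is the mean over every character, separators included (an assumption).
    result = {"total": round(mean(char_confidences), 2)}
    for level in levels:
        # Each regex match is one word or line; average the confidences it covers.
        result[level] = [
            round(mean(char_confidences[m.start():m.end()]), 2)
            for m in SEGMENT_PATTERNS[level].finditer(text)
        ]
    return result


print(average_confidences("ab c\nd", [1.0, 0.5, 1.0, 0.75, 1.0, 0.25], ["line", "word"]))
# {'total': 0.75, 'line': [0.81, 0.25], 'word': [0.75, 0.75, 0.25]}
```

    If levels is omitted, the function returns only the total. This mirrors the default behaviour described above.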

  • Solene Tarride marked this merge request as ready

  • Solene Tarride added 1 commit

  • Solene Tarride added 2 commits

    • db79c0e1 - small improvements
    • 96327069 - add arguments for word and line separators

  • Solene Tarride resolved all threads

  • Solene Tarride (Author, Maintainer)

    As previously discussed, I added two arguments, --word-separators and --line-separators. They set the separators used to split the text into words and lines.

    These parameters are used to compute confidence scores and to plot attention maps at word or line level.
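
    Here is a minimal sketch of how configurable separators could turn the predicted text into word or line segments. The helper name split_spans and the choice to return (start, end) character offsets are assumptions for illustration. Offsets are convenient because the same spans can index both per-character confidences and per-character attention maps.

```python
import re


def split_spans(text, separators):
    """Return (start, end) offsets of the segments between separators.

    Hypothetical helper: separators are literal strings, and runs of
    consecutive separators never produce empty segments.
    """
    if not separators or any(sep == "" for sep in separators):
        raise ValueError("separators must be non-empty strings")
    # Try longer separators first so that e.g. "--" wins over "-".
    ordered = sorted(separators, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(sep) for sep in ordered))

    spans, start = [], 0
    for match in pattern.finditer(text):
        # Keep the text between the previous separator and this one, if any.
        if match.start() > start:
            spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


text = "Hansteensgt. 2 IV\nKjære Gerhard."
print(split_spans(text, [" ", "\n"]))  # [(0, 12), (13, 14), (15, 17), (18, 23), (24, 32)]
print(split_spans(text, ["\n"]))       # [(0, 17), (18, 32)]
```

    With spans like these, the confidences of a word are simply confidences[start:end]. Changing --word-separators therefore only changes the spans, not the averaging step.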

  • Solene Tarride added 2 commits

  • Solene Tarride (Author, Maintainer)

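    The separators are now handled with regular expressions.

    We cannot use a custom type to parse the list of separators, because argparse calls type on each command-line string separately. From the Python argparse documentation:

    "The argument to type can be any callable that accepts a single string."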
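
    argparse applies type to each string on its own. One workaround is therefore to collect the raw separators as a list with nargs="+" and compile them into a regular expression after parsing. The sketch below assumes that design. The default values, the prog name and the helper compile_separators are hypothetical, not necessarily what teklia-dan uses.

```python
import argparse
import re


def compile_separators(separators):
    """Turn a list of literal separators into one alternation regex (longest first)."""
    if not separators or any(sep == "" for sep in separators):
        raise ValueError("separators must be non-empty strings")
    ordered = sorted(separators, key=len, reverse=True)
    # Escape each separator so characters such as "." or "|" are matched literally.
    return re.compile("|".join(re.escape(sep) for sep in ordered))


def build_parser():
    parser = argparse.ArgumentParser(prog="predict-sketch")
    # nargs="+" collects every following string into a list; a custom `type`
    # would only ever see one of those strings at a time.
    # The default values below are hypothetical.
    parser.add_argument("--word-separators", nargs="+", default=[" ", "\n"])
    parser.add_argument("--line-separators", nargs="+", default=["\n"])
    return parser


args = build_parser().parse_args(["--word-separators", " ", ";"])
word_regex = compile_separators(args.word_separators)
line_regex = compile_separators(args.line_separators)
print(word_regex.split("a;b c"))  # ['a', 'b', 'c']
print(line_regex.split("a b\nc"))  # ['a b', 'c']
```

    The list-level conversion happens after parse_args returns. It therefore sees all separators at once, which type cannot do.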

  • Solene Tarride resolved all threads

  • Solene Tarride added 1 commit

    • bf51947e - Simplify compute_prob_by_separator

  • Solene Tarride added 1 commit

  • Yoann Schneider resolved all threads

  • Yoann Schneider approved this merge request
