Add Language Model Decoder
Closes #142 (closed)
Taking over from !222 (closed)
Activity
changed milestone to %ML Prod - Next
added P1 label
assigned to @starride
added 1 commit
- a3d3370c - Use CTCHypothesis.tokens instead of CTCHypothesis.words
added 26 commits
- 482981a3...5e109064 - 5 commits from branch main
- 5e109064...8d268f96 - 11 earlier commits
- ee823c36 - Update tests for data extraction
- f94e2acb - Support batch_size>1
- a1551bc7 - Write tests for LM decoding
- b23826ea - Fix CTC token probability
- b9f4f3e8 - Use CTCHypothesis.tokens instead of CTCHypothesis.words
- b59f73e8 - Move tensor to correct device and trim prediction
- 06f5ef65 - Fix tests
- 28aaa477 - Simplify and document data extraction
- 51219ae4 - Document prediction with language model
- e28ddf96 - Document prediction command
Here are the main changes:
- the `teklia-dan dataset extract` command now also generates the resources needed to build an n-gram LM (default behavior):

  ```
  output/
  ├── charset.pkl
  ├── labels.json
  ├── images
  │   ├── train
  │   ├── val
  │   └── test
  ├── language_model
  │   ├── corpus.txt
  │   ├── lexicon.txt
  │   └── tokens.txt
  ```
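The MR does not spell out the file formats, but for character-level CTC decoders (e.g. torchaudio's `ctc_decoder`, which exposes the `CTCHypothesis` objects mentioned in the commits above), `tokens.txt` typically lists one token per line and `lexicon.txt` maps each word to its space-separated token spelling. A minimal sketch of how such entries could be derived from `corpus.txt`; the character-level spelling convention here is an assumption, not necessarily what `dataset extract` produces:

```python
def build_lm_resources(corpus_lines):
    """Derive token and lexicon entries from a text corpus.

    Assumption: character-level tokens, with each word spelled out as
    space-separated characters (torchaudio-style lexicon format).
    """
    tokens = set()
    lexicon = {}
    for line in corpus_lines:
        for word in line.split():
            tokens.update(word)
            # e.g. lexicon.txt line: "cat c a t"
            lexicon[word] = " ".join(word)
    return sorted(tokens), lexicon


corpus = ["the cat sat", "the dog"]
tokens, lexicon = build_lm_resources(corpus)
# tokens -> one symbol per line of tokens.txt
# lexicon -> one "word s p e l l i n g" line per word in lexicon.txt
```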
- the `teklia-dan predict` command supports a new argument `--use-language-model`. Note that the other LM parameters should be set in `inference_parameters.yml`:

  ```yaml
  parameters:
    ...
    language_model:
      model: path/to/language_model.arpa
      lexicon: path/to/lexicon.txt
      tokens: path/to/tokens.txt
      weight: 1.0
  ```
I have documented the prediction example here.
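For context on the `weight` parameter: in LM-fused CTC beam search, the language-model log-probability is typically added to the CTC score scaled by this weight (shallow fusion). A hypothetical sketch of that scoring rule, not the project's actual implementation:

```python
import math


def fused_score(ctc_logprob, lm_logprob, lm_weight=1.0):
    """Combine CTC and LM log-probabilities; higher is better.

    Hypothetical shallow-fusion rule: score = log P_ctc + weight * log P_lm.
    With lm_weight=0 the LM is ignored; larger weights trust the LM more.
    """
    return ctc_logprob + lm_weight * lm_logprob


# The LM can flip the ranking of two hypotheses:
a = fused_score(math.log(0.6), math.log(0.1), lm_weight=1.0)
b = fused_score(math.log(0.4), math.log(0.5), lm_weight=1.0)
# b > a: the second hypothesis wins despite its lower CTC score
```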