
Update doc with `--discount_fallback` option for LM

Merged Solene Tarride requested to merge update-doc-discount_fallback into main
1 file  +13  −4
@@ -20,9 +20,12 @@ At character-level, we recommend building a 6-gram model. Use the following comm
 ```sh
 bin/lmplz --order 6 \
 --text my_dataset/language_model/corpus_characters.txt \
---arpa my_dataset/language_model/model_characters.arpa
+--arpa my_dataset/language_model/model_characters.arpa \
+--discount_fallback
 ```
+Note that the `--discount_fallback` option can be removed if your corpus is very large.
 The following message should be displayed if the language model was built successfully:
 ```sh
@@ -62,16 +65,19 @@ Chain sizes: 1:1308 2:27744 3:159140 4:412536 5:717920 6:1028896
 Name:lmplz VmPeak:12643224 kB VmRSS:6344 kB RSSMax:1969316 kB user:0.196445 sys:0.514686 CPU:0.711161 real:0.682693
 ```
-### Subord-level
+### Subword-level
 At subword-level, we recommend building a 6-gram model. Use the following command:
 ```sh
 bin/lmplz --order 6 \
 --text my_dataset/language_model/corpus_subwords.txt \
---arpa my_dataset/language_model/model_subwords.arpa
+--arpa my_dataset/language_model/model_subwords.arpa \
+--discount_fallback
 ```
+Note that the `--discount_fallback` option can be removed if your corpus is very large.
 ### Word-level
 At word-level, we recommend building a 3-gram model. Use the following command:
@@ -79,9 +85,12 @@ At word-level, we recommend building a 3-gram model. Use the following command:
 ```sh
 bin/lmplz --order 3 \
 --text my_dataset/language_model/corpus_words.txt \
---arpa my_dataset/language_model/model_words.arpa
+--arpa my_dataset/language_model/model_words.arpa \
+--discount_fallback
 ```
+Note that the `--discount_fallback` option can be removed if your corpus is very large.
 ## Predict with a language model
 See the [dedicated example](../predict/index.md#predict-with-an-external-n-gram-language-model).
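A note on why the added flag matters: KenLM estimates its Kneser-Ney discounts from the distribution of n-gram counts, and on a small corpus nearly every n-gram is a singleton, so that estimation can fail unless `--discount_fallback` is passed. The stdlib-only Python sketch below (the corpus string and helper name are hypothetical, for illustration only) counts character 6-grams the way `lmplz` tallies them before smoothing, showing the degenerate all-singleton count distribution a tiny corpus produces:

```python
from collections import Counter

def char_ngrams(text: str, order: int = 6) -> Counter:
    """Count character n-grams of the given order, analogous to the
    raw tallies lmplz collects before Kneser-Ney smoothing."""
    return Counter(text[i:i + order] for i in range(len(text) - order + 1))

# Hypothetical tiny "corpus": a single short line of text.
counts = char_ngrams("language model")

# Every 6-gram occurs exactly once, so there are no repeated counts
# for discount estimation to work with -- the situation in which
# lmplz asks for --discount_fallback.
print(len(counts), all(v == 1 for v in counts.values()))  # → 9 True
```

With a realistically large corpus the count distribution has plenty of repeats, which is why the doc notes the flag can be dropped in that case.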