Commit cfc54b7a authored by Solene Tarride, committed by Mélodie Boillet
Update doc with `--discount_fallback` option for LM

At character-level, we recommend building a 6-gram model. Use the following command:
```sh
bin/lmplz --order 6 \
    --text my_dataset/language_model/corpus_characters.txt \
    --arpa my_dataset/language_model/model_characters.arpa \
    --discount_fallback
```

Note that the `--discount_fallback` option can be removed if your corpus is very large.
The following message should be displayed if the language model was built successfully:
```sh
Chain sizes: 1:1308 2:27744 3:159140 4:412536 5:717920 6:1028896
Name:lmplz VmPeak:12643224 kB VmRSS:6344 kB RSSMax:1969316 kB user:0.196445 sys:0.514686 CPU:0.711161 real:0.682693
```
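As a reminder of the input format, `lmplz` treats each line of the corpus as one sequence of whitespace-separated tokens, so a character-level corpus stores each character as its own token. A minimal sketch of this preprocessing (the `to_char_tokens` helper and the `▁` space symbol are illustrative assumptions, not part of the toolkit):

```python
def to_char_tokens(line: str) -> str:
    # Split a text line into space-separated character tokens.
    # Real spaces are mapped to a visible placeholder ("▁", an assumption)
    # so they survive as tokens rather than being eaten as separators.
    return " ".join("▁" if c == " " else c for c in line.strip())

print(to_char_tokens("the cat"))  # → t h e ▁ c a t
```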
### Subword-level
At subword-level, we recommend building a 6-gram model. Use the following command:
```sh
bin/lmplz --order 6 \
    --text my_dataset/language_model/corpus_subwords.txt \
    --arpa my_dataset/language_model/model_subwords.arpa \
    --discount_fallback
```

Note that the `--discount_fallback` option can be removed if your corpus is very large.
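For subword-level modelling, each line of the corpus holds space-separated subword units (e.g. produced by a BPE-style tokenizer). A toy sketch of how a fixed merge table turns characters into such units (the `apply_merges` helper and the merge list are illustrative assumptions, not the actual tokenizer used here):

```python
def apply_merges(word, merges):
    # Greedily apply each (left, right) merge pair in order,
    # fusing adjacent pieces into larger subword units.
    pieces = list(word)
    for a, b in merges:
        i = 0
        while i < len(pieces) - 1:
            if pieces[i] == a and pieces[i + 1] == b:
                pieces[i:i + 2] = [a + b]
            else:
                i += 1
    return pieces

merges = [("t", "h"), ("th", "e")]  # toy merge table (assumption)
print(" ".join(apply_merges("there", merges)))  # → the r e
```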
### Word-level
At word-level, we recommend building a 3-gram model. Use the following command:
```sh
bin/lmplz --order 3 \
    --text my_dataset/language_model/corpus_words.txt \
    --arpa my_dataset/language_model/model_words.arpa \
    --discount_fallback
```

Note that the `--discount_fallback` option can be removed if your corpus is very large.
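Whatever the level, the statistics `lmplz` estimates are built from counts of contiguous n-grams over the corpus tokens. A toy sketch of those raw counts at word level, order 3 (the `ngrams` helper is illustrative, not part of the toolkit):

```python
from collections import Counter

def ngrams(tokens, order):
    # All contiguous n-grams of the given order.
    return [tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)]

line = "the cat sat on the mat".split()
counts = Counter(ngrams(line, 3))
print(counts[("the", "cat", "sat")])  # → 1
```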
## Predict with a language model
See the [dedicated example](../predict/index.md#predict-with-an-external-n-gram-language-model).