Update doc with `--discount_fallback` option for LM
The command used to build the LM can fail if the text corpus is too small. In that case, the `--discount_fallback` option should be added (see here).
We need to add this information to DAN's documentation.
```
bin/lmplz --order 3 --text ../../dan/data/madcat/language_model/corpus_words.txt --arpa ../../dan/data/madcat/model_words.arpa
=== 1/5 Counting and sorting n-grams ===
Reading corpus_words.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 1356499 types 36791
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:441492 2:4435973632 3:8317450752
/home/mboillet/Desktop/Git/kenlm/lm/builder/adjust_counts.cc:49 in void lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const lm::builder::DiscountConfig&) threw BadDiscountException because `s.n[j] == 0'.
Could not calculate Kneser-Ney discounts for 1-grams with adjusted count 4 because we didn't observe any 1-grams with adjusted count 3; Is this small or artificial data?
Try deduplicating the input. To override this error for e.g. a class-based model, rerun with --discount_fallback
Aborted (core dumped)
```
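
For the documentation, the corrected command could look like the sketch below (same paths as in the failing run above, which are only an example from this setup):

```
bin/lmplz --order 3 --discount_fallback \
  --text ../../dan/data/madcat/language_model/corpus_words.txt \
  --arpa ../../dan/data/madcat/model_words.arpa
```

With `--discount_fallback`, lmplz falls back to default discount values when the Kneser-Ney discounts cannot be estimated from the corpus, instead of aborting.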