Support subword and word language models

Merged Solene Tarride requested to merge subword-and-word-lm into main
2 files changed: +2, -0
@@ -365,6 +365,7 @@ class ArkindexExtractor:
self.mapping.encode[token]
) if token in self.mapping.encode else self.language_tokens.append(token)
self.language_tokens.append(self.mapping.ctc.encoded)
self.language_tokens.append(self.unknown_token)
# Build LM corpus
train_corpus = [
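For readers without the full file, the visible part of the hunk can be read roughly as follows. This is a minimal sketch, not the project's code: `build_language_tokens`, its parameters, and the placeholder strings are hypothetical, and it assumes that the elided lines above the hunk loop over the character set. It only mirrors what the visible lines show: each token is stored in its encoded form when the mapping has one, otherwise as-is, and the CTC token and the unknown token are appended at the end.

```python
from typing import Dict, Iterable, List


def build_language_tokens(
    charset: Iterable[str],
    encode: Dict[str, str],
    ctc_token: str,
    unknown_token: str,
) -> List[str]:
    """Hypothetical stand-in for the token-list logic visible in the hunk."""
    tokens: List[str] = []
    for token in sorted(charset):
        # Use the encoded form when the mapping knows the token,
        # otherwise keep the raw token.
        tokens.append(encode[token] if token in encode else token)
    # Special tokens go last: the CTC token, then the unknown token.
    tokens.append(ctc_token)
    tokens.append(unknown_token)
    return tokens


# Placeholder values, chosen only for illustration.
print(build_language_tokens({"a", " "}, {" ": "<space>"}, "<ctc>", "<unk>"))
# ['<space>', 'a', '<ctc>', '<unk>']
```

Writing the conditional as an expression inside a single `append` call avoids using a ternary purely for its side effects, which is how the original lines appear to be written.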