Support subword and word language models

Merged: Solene Tarride requested to merge subword-and-word-lm into main.
1 file changed, +4 -3
```diff
@@ -226,9 +226,10 @@ class Tokenizer:
         :param text: Text to be tokenized.
         """
         return " ".join(
-            self.encode(
-                [char if char in self.charset else self.unknown_token for char in text]
-            )
+            [
+                char if char in self.charset else self.unknown_token
+                for char in self.encode(text)
+            ]
         )
 
     def encode(self, text: List[str]) -> List[str]:
```
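The change swaps the order of two steps. The text is now passed to `self.encode` first, and the unknown-token substitution runs on the encoded units instead of on raw characters. Under the previous ordering, any multi-character entry in the charset (a word or subword) could never match, because each lookup compared a single character against it. The sketch below reproduces only the new ordering. `tokenize_text`, `char_encode`, `word_encode`, and the `<space>`/`<unk>` markers are hypothetical stand-ins, not the repository's API; the real `encode` implementation is not shown in this diff.

```python
from typing import Callable, List, Set


def tokenize_text(
    text: str,
    encode: Callable[[str], List[str]],
    charset: Set[str],
    unknown_token: str = "<unk>",
) -> str:
    """Hypothetical mirror of the post-merge logic: encode the text first,
    then replace every encoded unit missing from `charset` with `unknown_token`."""
    return " ".join(
        token if token in charset else unknown_token
        for token in encode(text)
    )


def char_encode(text: str) -> List[str]:
    # Hypothetical character-level encoder: one unit per character,
    # with spaces mapped to a visible marker.
    return ["<space>" if char == " " else char for char in text]


def word_encode(text: str) -> List[str]:
    # Hypothetical word-level encoder: whole words separated by a space marker.
    tokens: List[str] = []
    for i, word in enumerate(text.split(" ")):
        if i > 0:
            tokens.append("<space>")
        tokens.append(word)
    return tokens


print(tokenize_text("hello world", char_encode, set("helowrd") | {"<space>"}))
print(tokenize_text("hello world", word_encode, {"hello", "<space>"}))
```

With `char_encode` every character is in the charset, so the output keeps one unit per character. With `word_encode` and a two-entry vocabulary, the out-of-vocabulary word `world` is replaced by `<unk>`, which is the behavior a word-level charset needs.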