Support subword and word language models

Merged Solene Tarride requested to merge subword-and-word-lm into main
All threads resolved!
8 files, +37 -246
@@ -87,16 +87,18 @@ def insert_token(text: str, entity_type: EntityType, offset: int, length: int) -
 def normalize_linebreaks(text: str) -> str:
     """
-    Remove begin/ending linebreaks
-    Replace \r with regular linebreak and consecutive linebreaks
+    Remove begin/ending linebreaks.
+    Replace \r with regular linebreak and consecutive linebreaks.
+    :param text: Text to normalize.
     """
     return TRIM_RETURN_REGEX.sub("\n", text.strip())


 def normalize_spaces(text: str) -> str:
     """
-    Remove begin/ending spaces
-    Replace \t with regular space and consecutive spaces
+    Remove begin/ending spaces.
+    Replace \t with regular space and consecutive spaces.
+    :param text: Text to normalize.
     """
     return TRIM_SPACE_REGEX.sub(" ", text.strip())
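The hunk does not show how `TRIM_RETURN_REGEX` and `TRIM_SPACE_REGEX` are defined. As a minimal sketch of how the two helpers behave, assuming the patterns below (they are guesses based on the docstrings, not the project's actual definitions):

```python
import re

# Assumed patterns: the diff does not show the real definitions.
TRIM_SPACE_REGEX = re.compile(r"[\t ]+")    # runs of tabs and spaces
TRIM_RETURN_REGEX = re.compile(r"[\r\n]+")  # runs of \r and \n


def normalize_linebreaks(text: str) -> str:
    """
    Remove begin/ending linebreaks.
    Replace \\r with regular linebreak and consecutive linebreaks.
    :param text: Text to normalize.
    """
    return TRIM_RETURN_REGEX.sub("\n", text.strip())


def normalize_spaces(text: str) -> str:
    """
    Remove begin/ending spaces.
    Replace \\t with regular space and consecutive spaces.
    :param text: Text to normalize.
    """
    return TRIM_SPACE_REGEX.sub(" ", text.strip())


if __name__ == "__main__":
    # Tabs and repeated spaces collapse to a single space.
    print(repr(normalize_spaces("  a\t\tb   c ")))
    # Mixed \r\n runs collapse to a single \n.
    print(repr(normalize_linebreaks("\n\nline 1\r\n\r\nline 2\n")))
```

Note that `str.strip()` with no argument removes all leading and trailing whitespace, so both helpers also trim spaces and linebreaks at the edges, whichever pattern is applied afterwards.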