
Fix dataset extraction offset

Merged Solene Tarride requested to merge fix-dataset-extraction-offset into main
1 file changed: +4 -1
@@ -22,6 +22,7 @@ def save_json(path, dict):
 def insert_token(text, count, start_token, end_token, offset, length):
     """
     Insert the given tokens at the right position in the text
+    start_token or end_token can be empty strings
     """
     text = (
         # Text before entity
@@ -35,7 +36,9 @@ def insert_token(text, count, start_token, end_token, offset, length):
         # Text after entity
         + text[count + 1 + offset + length :]
     )
-    return text, count + 2
+    token_offset = len(start_token) + len(end_token)
+    return text, count + token_offset
 def parse_tokens(filename):
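To illustrate why the returned shift has to depend on the token lengths, here is a minimal sketch. `insert_token_sketch`, the sample text, and the tag strings are hypothetical; the slicing is a simplified stand-in for the part of `insert_token` that the diff does not show, and only the returned value mirrors the changed lines.

```python
def insert_token_sketch(text, count, start_token, end_token, offset, length):
    """Wrap one entity in start/end tokens (simplified, hypothetical helper).

    `offset` and `length` refer to the original, untagged text; `count` is the
    number of characters inserted so far by earlier calls.
    """
    start = count + offset
    end = start + length
    text = text[:start] + start_token + text[start:end] + end_token + text[end:]
    # Shift by what was actually inserted, as in the updated return statement.
    return text, count + len(start_token) + len(end_token)


text = "Alice met Bob"
# (offset, length, start_token, end_token), offsets relative to the raw text
entities = [(0, 5, "<p>", "</p>"), (10, 3, "<name>", "</name>")]

count = 0
for offset, length, start_token, end_token in entities:
    text, count = insert_token_sketch(
        text, count, start_token, end_token, offset, length
    )

print(text)   # <p>Alice</p> met <name>Bob</name>
print(count)  # 20
```

A fixed shift of 2 is only correct when both tokens are exactly one character long. With multi-character tokens, or with an empty start or end token, later entities would be inserted at the wrong position.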