[Extraction] The "train" set must be extracted first

During extraction, if the train set is not first in dataset.sets (for example, val,test,train instead of train,val,test), the self.charset variable is still empty while the earlier sets are processed, so every character in the sets that come before the train set is replaced by the unknown token.

To reproduce, use sets=",".join([VAL_NAME,TRAIN_NAME,TEST_NAME]) and run the tests: they fail.
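The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the project's actual extraction code: the `extract` and `train_first` functions, the `UNKNOWN` token value, and the sample data are all hypothetical, and the sketch assumes only that the charset is filled from the train set and that every set is filtered against it in iteration order.

```python
UNKNOWN = "<unk>"  # hypothetical unknown token, for illustration only


def extract(sets, data):
    """Process the sets in the given order, filtering against a shared charset."""
    charset = set()  # stands in for self.charset; empty until train is seen
    output = {}
    for name in sets:
        texts = data[name]
        if name == "train":
            # Assumption: the charset is built from the train set only.
            for text in texts:
                charset.update(text)
        # Any character missing from the charset becomes the unknown token.
        output[name] = [
            "".join(c if c in charset else UNKNOWN for c in text)
            for text in texts
        ]
    return output


def train_first(sets):
    """Possible workaround: move "train" to the front, keep the other sets in order."""
    return sorted(sets, key=lambda name: name != "train")


data = {"train": ["abc"], "val": ["cab"], "test": ["bca"]}

good = extract(["train", "val", "test"], data)
bad = extract(["val", "test", "train"], data)

print(good["val"])  # characters preserved
print(bad["val"])   # every character replaced by the unknown token
```

With train first, the val and test texts keep their characters. With train last, val and test are processed against an empty charset and are reduced entirely to unknown tokens. Reordering the sets before the loop, as `train_first` does, is one way to avoid this; building the charset in a separate pass before any filtering would be another.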