Draft: Decode without <unk>

Solene Tarride requested to merge decode-without- into master

Some tokens rarely or never appear in the training set, so the network cannot learn to recognize them accurately. To handle this, we map these tokens to a single <unk> token.
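A minimal sketch of this mapping, assuming a character-level vocabulary and a hypothetical frequency threshold `min_count`. `build_vocab` and `encode` are illustrative names, not code from this merge request.

```python
from collections import Counter
from typing import Dict, Iterable, List

UNK = "<unk>"


def build_vocab(training_texts: Iterable[str], min_count: int = 2) -> Dict[str, int]:
    """Hypothetical helper: keep characters seen at least `min_count` times.

    Index 0 is reserved for <unk>; every rarer or unseen character shares it.
    """
    counts = Counter(char for text in training_texts for char in text)
    vocab = {UNK: 0}
    for char in sorted(counts):
        if counts[char] >= min_count:
            vocab[char] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical helper: replace any character missing from `vocab` with <unk>."""
    return [vocab.get(char, vocab[UNK]) for char in text]
```

For example, `build_vocab(["abba", "abc"])` keeps `a` and `b` (three occurrences each) but not `c` (one occurrence), so `encode("abcz", vocab)` returns `[1, 2, 0, 0]`: both the rare `c` and the unseen `z` become <unk>.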

However, we need to prevent the network from predicting the <unk> token during decoding, since an <unk> prediction is always an incorrect transcription.

Example: for the line at https://demo.arkindex.org/element/bd15be49-8c40-47a9-a627-991ae3754209?highlight=ac2ab5b4-63ea-4916-8f65-db550a5fda58, the model currently predicts "<unk> Mange ved jo digre si".
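One common way to forbid a token at decoding time is to set its logit to negative infinity before picking the next token, so the decoder falls back to the best real character instead of emitting <unk>. The sketch below assumes a simple greedy decoder over per-step logits; `mask_token`, `greedy_decode`, `unk_index` and the toy vocabulary are hypothetical and are not taken from this merge request.

```python
import math
from typing import List, Sequence


def mask_token(logits: Sequence[float], token_index: int) -> List[float]:
    """Hypothetical helper: return a copy where `token_index` can never win an argmax."""
    masked = list(logits)
    masked[token_index] = -math.inf
    return masked


def greedy_decode(
    step_logits: Sequence[Sequence[float]],
    vocab: Sequence[str],
    unk_index: int,
) -> str:
    """Hypothetical greedy decoder: pick the best non-<unk> token at every step."""
    output = []
    for logits in step_logits:
        masked = mask_token(logits, unk_index)
        # argmax over the masked scores; <unk> now has score -inf
        best = max(range(len(masked)), key=masked.__getitem__)
        output.append(vocab[best])
    return "".join(output)
```

With `vocab = ["<unk>", "M", "a", " "]` and `step_logits = [[5.0, 1.0, 0.5, 0.1], [0.2, 0.1, 3.0, 0.0]]`, an unmasked argmax would emit <unk> at the first step, while `greedy_decode(step_logits, vocab, unk_index=0)` returns `"Ma"`. Masking the scores, rather than deleting <unk> from the output afterwards, keeps a character in that position, which also matters if the decoder feeds its own predictions back as input.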
