
Evaluate predictions with nerval

Depends on #229 (closed) and ner/nerval#11 (closed)

We'll use the new dan.bio module to evaluate batches. We will support a new metric name, "ner", to trigger this new computation. This behaviour should only be triggered when the model uses NER tokens.
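A minimal sketch of the gating rule, assuming the requested metrics arrive as a list of names and the NER tokens as a (possibly empty) dict. The function `select_metrics` and its parameters are hypothetical and not part of the existing dan code.

```python
from typing import Dict, Iterable, List


def select_metrics(requested: Iterable[str], ner_tokens: Dict[str, str]) -> List[str]:
    """Return the metrics to compute, dropping "ner" when no NER tokens are configured.

    Hypothetical helper: names and signature are illustrative only.
    """
    metrics = []
    for name in requested:
        # The "ner" metric only makes sense when the model predicts NER tokens.
        if name == "ner" and not ner_tokens:
            continue
        metrics.append(name)
    return metrics
```

With no NER tokens, `select_metrics(["cer", "wer", "ner"], {})` silently skips "ner" and keeps the other metrics.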

During evaluation, we will compute scores for each prediction and store them. We will then average these scores per split and display the results in a single Markdown table.
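A rough sketch of the averaging and table rendering, assuming per-prediction scores are stored as one dict of metric values per prediction, grouped by split name. Both helpers are hypothetical; the real table layout may differ.

```python
from statistics import mean
from typing import Dict, List


def average_per_split(scores: Dict[str, List[Dict[str, float]]]) -> Dict[str, Dict[str, float]]:
    """Average each metric over all predictions of a split (hypothetical helper)."""
    averages = {}
    for split, per_prediction in scores.items():
        metric_names = per_prediction[0].keys() if per_prediction else []
        averages[split] = {m: mean(s[m] for s in per_prediction) for m in metric_names}
    return averages


def to_markdown(averages: Dict[str, Dict[str, float]]) -> str:
    """Render one row per split and one column per metric as a Markdown table."""
    metrics = sorted({m for row in averages.values() for m in row})
    lines = [
        "| Split | " + " | ".join(metrics) + " |",
        "|" + "---|" * (len(metrics) + 1),
    ]
    for split, row in averages.items():
        # Use "-" when a metric was not computed for this split.
        cells = [f"{row[m]:.2f}" if m in row else "-" for m in metrics]
        lines.append(f"| {split} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Keeping the per-prediction scores around (rather than only running totals) means the same stored data can later be averaged differently without re-running inference.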

We need scores per split and per entity, as well as macro-averaged scores. The evaluation will reuse the code from ner/nerval#11 (closed).
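To make "per entity and macro-averaged" concrete, here is a sketch that assumes the matching step (e.g. nerval) has already produced, for each entity type, the counts of matched, predicted and gold entities. The count layout and the `entity_scores` helper are assumptions; the macro average shown is the plain unweighted mean over entity types.

```python
from typing import Dict, Tuple


def entity_scores(
    counts: Dict[str, Tuple[int, int, int]],
) -> Tuple[Dict[str, Dict[str, float]], Dict[str, float]]:
    """Compute P/R/F1 per entity type and their macro average.

    counts[entity] = (matched, predicted, gold) -- an assumed layout.
    """
    per_entity = {}
    for entity, (matched, predicted, gold) in counts.items():
        precision = matched / predicted if predicted else 0.0
        recall = matched / gold if gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_entity[entity] = {"P": precision, "R": recall, "F1": f1}

    # Macro average: every entity type weighs the same, regardless of frequency.
    n = len(per_entity)
    macro = {
        key: (sum(s[key] for s in per_entity.values()) / n if n else 0.0)
        for key in ("P", "R", "F1")
    }
    return per_entity, macro
```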

Store the predictions during the evaluation step (under the key "str_x", also keeping the ground truth under "str_y") and return them so that they can be stored at the upper level.
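A minimal sketch of this data flow, assuming each batch carries its ground truth under "str_y" and that a `predict` callable returns one string per sample. `evaluation_step` and `evaluate_split` are hypothetical names, not the actual dan functions.

```python
from typing import Callable, Dict, Iterable, List


def evaluation_step(batch: Dict, predict: Callable[[Dict], List[str]]) -> Dict[str, List[str]]:
    """Run inference on one batch and return predictions alongside the ground truth."""
    return {"str_x": predict(batch), "str_y": list(batch["str_y"])}


def evaluate_split(batches: Iterable[Dict], predict: Callable[[Dict], List[str]]) -> Dict[str, List[str]]:
    """Collect predictions ("str_x") and ground truth ("str_y") over a whole split."""
    stored: Dict[str, List[str]] = {"str_x": [], "str_y": []}
    for batch in batches:
        output = evaluation_step(batch, predict)
        stored["str_x"].extend(output["str_x"])
        stored["str_y"].extend(output["str_y"])
    return stored
```

Returning both lists from the step, instead of scoring inside it, lets the caller run nerval once per split on the full set of predictions.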

For each split,

Expose the nerval threshold as a new CLI argument, defaulting to 0.3.
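One way to expose it with the standard library's argparse; the flag name `--nerval-threshold` is an assumption, and the real CLI may name or wire it differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the new threshold option."""
    parser = argparse.ArgumentParser(description="Evaluate a trained model.")
    parser.add_argument(
        "--nerval-threshold",  # assumed flag name
        type=float,
        default=0.3,
        help="Matching threshold passed to nerval (default: 0.3).",
    )
    return parser
```

argparse exposes the value as `args.nerval_threshold`, which can then be forwarded to the nerval evaluation code.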

Edited by Yoann Schneider