Support empty lines in BIO parser
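Nerval evaluation fails on Socface images because the BIO input contains empty lines: `parse_line` rejects any line that does not match the IOB regex, so an empty line (here line 0) aborts the whole evaluation.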
Evaluation: 1%| | 1/126 [00:04<09:08, 4.39s/it, values={'cer': nan, 'cer_no_token': nan, 'wer': 0.0,
2024-02-08 10:03:08,759 INFO/dan.ocr.evaluate: Evaluating on set `test`
Evaluation: 1%| | 1/109 [00:22<40:30, 22.51s/it, values={'cer': 0.0599, 'cer_no_token': 0.0651, 'wer':
#### DAN evaluation
| Split | CER (HTR-NER) | CER (HTR) | WER (HTR-NER) | WER (HTR) | WER (HTR no punct) | NER |
|:-----:|:-------------:|:---------:|:-------------:|:---------:|:------------------:|:----:|
| val | nan | nan | 0.0 | 0.0 | 0.0 | nan |
| test | 5.99 | 6.51 | 17.24 | 16.14 | 16.14 | 1.84 |
#### Nerval evaluation
Traceback (most recent call last):
File "/gpfsdswork/projects/rech/rxm/ulb79yw/nerval/nerval/parse.py", line 45, in parse_line
assert match_iob, f"Line {line} does not match IOB regex"
AssertionError: Line does not match IOB regex
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/env/bin/teklia-dan", line 8, in <module>
sys.exit(main())
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/cli.py", line 26, in main
status = args.pop("func")(**args)
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 221, in run
eval(0, config, nerval_threshold, mlflow_logging)
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 193, in eval
eval_nerval(
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 126, in eval_nerval
ground_truths = inferences_to_parsed_bio("ground_truth")
File "/gpfsdswork/projects/rech/rxm/ulb79yw/dan/dan/ocr/evaluate.py", line 121, in inferences_to_parsed_bio
return parse_bio(bio_values)
File "/gpfsdswork/projects/rech/rxm/ulb79yw/nerval/nerval/parse.py", line 75, in parse_bio
word, label = parse_line(index, line)
File "/gpfsdswork/projects/rech/rxm/ulb79yw/nerval/nerval/parse.py", line 49, in parse_line
raise Exception(f"The file is not in BIO format: check line {index} ({line})")
Exception: The file is not in BIO format: check line 0 ()
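One possible way to tolerate empty lines is sketched below. This is not Nerval's actual code: the `BIO_LINE` pattern and the `parse_bio_lines` helper are hypothetical names introduced for illustration, and the regex is only an assumption about what a valid `word label` line looks like. The idea is to skip blank lines before applying the format check, while still raising on lines that are genuinely malformed.

```python
import re

# Hypothetical pattern (an assumption, not Nerval's regex):
# "<word> <label>", where the label is "O" or "B-"/"I-" followed by a tag name.
BIO_LINE = re.compile(r"^(?P<word>\S+) (?P<label>O|[BI]-\S+)$")


def parse_bio_lines(lines):
    """Parse BIO lines into (word, label) pairs, ignoring empty lines."""
    parsed = []
    for index, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            # Empty or whitespace-only line: skip it instead of failing.
            continue
        match = BIO_LINE.match(stripped)
        if match is None:
            # Non-empty lines that do not match are still reported as errors.
            raise ValueError(
                f"The file is not in BIO format: check line {index} ({line})"
            )
        parsed.append((match.group("word"), match.group("label")))
    return parsed


if __name__ == "__main__":
    # An input consisting of a single empty line yields no tokens.
    print(parse_bio_lines([""]))
    print(parse_bio_lines(["", "John B-PER", "Doe I-PER", "", "left O"]))
```

With this approach, an empty ground truth or prediction yields an empty token list, so the caller still has to decide how to score such documents (for example, counting them as having no entities).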
Edited by Mélodie Boillet