Convert NER prediction to BIO format

Tested with the provided Hennessy archive and the following script:

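import logging
from pathlib import Path

# Assumption: the module name is missing in the original import line
from dan.bio import convert
from dan.utils import parse_tokens

logger = logging.getLogger(__name__)

# Load the entity tokens shipped with the Hennessy archive
ner_tokens = parse_tokens("data_hennessy/tokens.yml")

for directory in ["GT", "preds"]:
    base_output = Path(f"bio/{directory}")
    base_output.mkdir(parents=True, exist_ok=True)

    for path in Path(f"data_hennessy/{directory}").iterdir():
        logger.info(f"Converting file {path}")

        try:
            conversion = convert(path.read_text(), ner_tokens)
        except Exception as e:
            # Assumed handling: the body of this branch is missing in the original
            logger.error(f"Failed to convert {path}: {e}")
            continue

        # Assumed file name and write step: the original line is truncated after `/`
        output = base_output / path.name
        output.write_text(conversion)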

Conversion on 2b325b53-0a27-4d6c-8325-0190e1c5c51b.txt

Text input
London Ⓐ20th January 1792Ⓑ
I wrote My Dear ⒸJimmyⒹ the 17th Instant and yesterday received yours of the 9th Ditto and Note Its contents am surprized my missing letters were not Come to hand, tI do not believe that ⒸBennet and HinshowⒹ had wrote you, I mentioned to you in my last that tho there was a Mistake in their order to me for the 20 Puncheons they would accept of them at the price you would have Purchased on the 7th Instant I mentioned some posts ago that Messieurs ⒸKneyon and PeileⒹ also accepted of the Purchase you in suspence you are well covered should they leave them for y a which I do not Immagin will be the case, I told you that when Master ⒸScholeyⒹ spoak to me of your Proposal he did not tell me if he would accept of it or not, but said he would write, I have heared nothing from him since on the subject, I call on him to today he was out and not Expected untill 4 ⁇Clock, I was 3 times today at ⒸT. BrownsⒹ 3 times at ⒸBevanⒹ’s saw neither of them I saw all the others in this quarter Nothing new ⒸTuffinⒹ only talkes to me of the affairs of ⒸTranceⒹ could wish himself a french man to take an active part in support of Liberty which he has now no doubt but they will Enjoy how is it that you Conceive so Bad an oppinion of this affair, I see it in a Very different light I have scare a doubt of its success, sir ab… and ⒸWilsonⒹ in the Country the samples per ⒸWilsonⒹ please them, my last Visit this afternoon was at ⒸHarrison W. and JohnsonsⒹ

BIO output
London O
20th B-date de la lettre
January I-date de la lettre
1792 I-date de la lettre
wrote O
My O
Dear O
Jimmy B-nom propre
the O
17th O
Instant O
and O
yesterday O
received O
yours O
of O
the O
9th O
Ditto O
and O
Note O
Its O
contents O
am O
surprized O
my O
missing O
letters O
were O
not O
Come O
to O
hand, O
tI O
do O
not O
believe O
that O
Bennet B-nom propre
and I-nom propre
Hinshow I-nom propre
had O
wrote O
you, O
mentioned O
to O
you O
in O
my O
last O
that O
tho O
there O
was O
a O
Mistake O
in O
their O
order O
to O
me O
for O
the O
20 O
Puncheons O
they O
would O
accept O
of O
them O
at O
the O
price O
you O
would O
have O
Purchased O
on O
the O
7th O
Instant O
mentioned O
some O
posts O
ago O
that O
Messieurs O
Kneyon B-nom propre
and I-nom propre
Peile I-nom propre
also O
accepted O
of O
the O
Purchase O
you O
in O
suspence O
you O
are O
well O
covered O
should O
they O
leave O
them O
for O
y O
a O
which O
do O
not O
Immagin O
will O
be O
the O
case, O
told O
you O
that O
when O
Master O
Scholey B-nom propre
spoak O
to O
me O
of O
your O
Proposal O
he O
did O
not O
tell O
me O
if O
he O
would O
accept O
of O
it O
or O
not, O
but O
said O
he O
would O
write, O
have O
heared O
nothing O
from O
him O
since O
on O
the O
subject, O
call O
on O
him O
to O
today O
he O
was O
out O
and O
not O
Expected O
untill O
4 O
⁇Clock, O
was O
3 O
times O
today O
at O
T. B-nom propre
Browns I-nom propre
3 O
times O
at O
Bevan B-nom propre
’s O
saw O
neither O
of O
them O
saw O
all O
the O
others O
in O
this O
quarter O
Nothing O
new O
Tuffin B-nom propre
only O
talkes O
to O
me O
of O
the O
affairs O
of O
Trance B-nom propre
could O
wish O
himself O
a O
french O
man O
to O
take O
an O
active O
part O
in O
support O
of O
Liberty O
which O
he O
has O
now O
no O
doubt O
but O
they O
will O
Enjoy O
how O
is O
it O
that O
you O
Conceive O
so O
Bad O
an O
oppinion O
of O
this O
affair, O
see O
it O
in O
a O
Very O
different O
light O
have O
scare O
a O
doubt O
of O
its O
success, O
sir O
ab… O
and O
Wilson B-nom propre
in O
the O
Country O
the O
samples O
per O
Wilson B-nom propre
please O
them, O
my O
last O
Visit O
this O
afternoon O
was O
at O
Harrison B-nom propre
W. I-nom propre
and I-nom propre
Johnsons I-nom propre
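
For readers unfamiliar with the format, the mapping from the token-delimited text to the tags above can be sketched as follows. This is only an illustration, not the `convert` implementation from this change: the `to_bio` function and the `TOKENS` table are hypothetical names, and the marker-to-label pairs are the ones visible in the example (Ⓐ…Ⓑ for "date de la lettre", Ⓒ…Ⓓ for "nom propre"). The first word inside an entity gets a `B-` tag, the following ones get `I-`, and everything outside an entity gets `O`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical token table: start marker -> (end marker, label).
# The pairs below are taken from the example in this description.
TOKENS: Dict[str, Tuple[str, str]] = {
    "Ⓐ": ("Ⓑ", "date de la lettre"),
    "Ⓒ": ("Ⓓ", "nom propre"),
}


def to_bio(text: str, tokens: Dict[str, Tuple[str, str]] = TOKENS) -> List[str]:
    """Return one "word tag" line per word of a token-delimited text."""
    lines: List[str] = []
    word = ""                          # characters of the word being built
    label: Optional[str] = None        # label of the open entity, if any
    end_marker: Optional[str] = None   # marker that closes the open entity
    first = True                       # True until the entity's first word is emitted

    def flush() -> None:
        """Emit the current word with the tag matching the current state."""
        nonlocal word, first
        if not word:
            return
        if label is None:
            lines.append(f"{word} O")
        else:
            lines.append(f"{word} {'B' if first else 'I'}-{label}")
            first = False
        word = ""

    for char in text:
        if label is None and char in tokens:
            # A start marker ends the current word and opens an entity
            flush()
            end_marker, label = tokens[char]
            first = True
        elif label is not None and char == end_marker:
            # The end marker closes the entity after emitting its last word
            flush()
            label, end_marker = None, None
        elif char.isspace():
            flush()
        else:
            word += char
    flush()
    return lines


for line in to_bio("London Ⓐ20th January 1792Ⓑ"):
    print(line)
```

Scanning character by character, rather than splitting on whitespace first, lets a suffix glued to an end marker (as in `ⒸBevanⒹ’s`) come out as a separate `O` word, which matches the output shown above.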
