Support nested entities
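We need to support nested entities in a custom BIO format, where each token is followed by one tag per nesting level (outer entity first, inner entity second). A token with no inner tag is outside any inner entity: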
```
Charles B-child B-name
né I-child
à I-child
Beaune I-child B-location
en I-child
1836 I-child B-date
père O
Jean B-father B-name
Bigre I-father B-surname
charpentier I-father B-occupation
de I-father
cette I-father B-location
paroisse I-father I-location
mère O
Marie B-mother B-name
```
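A minimal sketch of reading this format, assuming one token per line with whitespace-separated fields (the word first, then its tags from the outermost level inward) and blank lines ignored. `Token` and `parse_nested_bio` are hypothetical names for illustration, not existing classes of the project.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    """One line of the nested BIO format: a word and its tags."""
    text: str
    tags: list[str]  # outermost level first, e.g. ["B-child", "B-name"]


def parse_nested_bio(content: str) -> list[Token]:
    """Parse `word tag [tag ...]` lines into Token objects.

    Assumption: fields are whitespace-separated and the first tag is always
    the level-0 tag ("O", "B-<category>" or "I-<category>").
    """
    tokens = []
    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines
        text, *tags = line.split()
        if not tags:
            raise ValueError(f"Missing tag on line: {line!r}")
        tokens.append(Token(text=text, tags=tags))
    return tokens
```

Inner tags are kept as written here, so `né` gets only `["I-child"]`; filling the missing inner level is left to a later step.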
We could define the hierarchy between entity categories by listing the categories allowed at each nesting level:
```yaml
levels:
  0:
    - child
    - father
    - mother
  1:
    - name
    - surname
    - location
    - date
    - occupation
```
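One way to use this configuration, sketched under assumptions: the `LEVELS` dictionary below is what PyYAML's `yaml.safe_load` would return for the `levels` key above (the `0:` and `1:` keys are read as integers). The hypothetical helpers `category_levels` and `tags_per_level` place each tag on the level of its category rather than relying on column position, and fill levels without a tag with `O`; that choice is an assumption, not something the format mandates.

```python
from __future__ import annotations

# Hypothetical: the `levels` mapping as yaml.safe_load would return it.
LEVELS = {
    0: ["child", "father", "mother"],
    1: ["name", "surname", "location", "date", "occupation"],
}


def category_levels(levels: dict[int, list[str]]) -> dict[str, int]:
    """Map each category to its nesting level, rejecting duplicates."""
    mapping: dict[str, int] = {}
    for level, categories in levels.items():
        for category in categories:
            if category in mapping:
                raise ValueError(f"{category!r} is declared on several levels")
            mapping[category] = level
    return mapping


def tags_per_level(tags: list[str], mapping: dict[str, int], depth: int) -> list[str]:
    """Return exactly one tag per level, using "O" for levels with no tag."""
    result = ["O"] * depth
    for tag in tags:
        if tag == "O":
            continue  # an explicit O leaves its level untouched
        prefix, _, category = tag.partition("-")
        if prefix not in ("B", "I") or category not in mapping:
            raise ValueError(f"Malformed or unknown tag {tag!r}")
        level = mapping[category]
        if result[level] != "O":
            raise ValueError(f"Two tags on level {level}: {result[level]!r} and {tag!r}")
        result[level] = tag
    return result
```

For example, `tags_per_level(["I-child"], category_levels(LEVELS), depth=len(LEVELS))` returns `["I-child", "O"]`, and an unknown category such as `B-animal` raises a `ValueError`.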
We need to modify the `Tokens`, `Spans` and `Document` classes to support this new format, and add a new `hierarchy` property that returns a dictionary (the input of David's metrics):
```json
{
  "hierarchy": [
    {
      "category": "child",
      "children": [
        {"category": "name", "children": ["Charles"]},
        {"category": null, "children": ["n\u00e9"]},
        {"category": null, "children": ["\u00e0"]},
        {"category": "location", "children": ["Beaune"]},
        {"category": null, "children": ["en"]},
        {"category": "date", "children": ["1836"]}
      ]
    },
    {
      "category": "father",
      "children": [
        {"category": "name", "children": ["Jean"]},
        {"category": "surname", "children": ["Bigre"]},
        {"category": "occupation", "children": ["charpentier"]},
        {"category": null, "children": ["de"]},
        {"category": "location", "children": ["cette", "paroisse"]}
      ]
    },
    {
      "category": "mother",
      "children": [
        {"category": "name", "children": ["Marie"]}
      ]
    }
  ]
}
```
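A minimal sketch of how such a `hierarchy` property could be computed, assuming every token already carries exactly one tag per level (missing inner tags filled with `O`). It follows the conventions of the example output: top-level `O` tokens (`père`, `mère`) are left out, each token outside a level-1 entity becomes its own `null` node, and an `I-` tag extends the entity opened by the preceding tag of the same category. `Document`, `group_spans` and `build_hierarchy` are hypothetical stand-ins, not the project's actual classes, and treating a stray `I-` tag as the start of a new entity is an assumption.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

# A token is (text, tags) with exactly one tag per nesting level.
TaggedToken = Tuple[str, List[str]]


def group_spans(tokens: list[TaggedToken], level: int) -> list[tuple[Optional[str], list[TaggedToken]]]:
    """Split tokens into consecutive spans according to their tag at `level`."""
    spans: list[tuple[Optional[str], list[TaggedToken]]] = []
    open_category: Optional[str] = None
    for token in tokens:
        tag = token[1][level]
        if tag == "O":
            spans.append((None, [token]))  # each O token is its own span
            open_category = None
            continue
        prefix, _, category = tag.partition("-")
        if prefix == "I" and category == open_category:
            spans[-1][1].append(token)  # continue the open entity
        else:
            spans.append((category, [token]))  # B- (or stray I-) opens an entity
            open_category = category
    return spans


def build_hierarchy(tokens: list[TaggedToken], depth: int, level: int = 0) -> list[dict[str, Any]]:
    """Recursively nest spans: inner levels become children of outer ones."""
    nodes = []
    for category, span in group_spans(tokens, level):
        if category is None and level == 0:
            continue  # top-level O tokens are dropped, as in the example
        if category is None or level == depth - 1:
            children = [text for text, _ in span]  # leaves are token texts
        else:
            children = build_hierarchy(span, depth, level + 1)
        nodes.append({"category": category, "children": children})
    return nodes


@dataclass
class Document:
    tokens: list[TaggedToken]
    depth: int

    @property
    def hierarchy(self) -> dict[str, Any]:
        return {"hierarchy": build_hierarchy(self.tokens, self.depth)}


if __name__ == "__main__":
    tokens = [
        ("Jean", ["B-father", "B-name"]),
        ("de", ["I-father", "O"]),
        ("cette", ["I-father", "B-location"]),
        ("paroisse", ["I-father", "I-location"]),
        ("mère", ["O", "O"]),
        ("Marie", ["B-mother", "B-name"]),
    ]
    print(json.dumps(Document(tokens, depth=2).hierarchy, indent=2))
```

Serializing with `json.dumps(..., indent=2)` gives JSON in the shape shown above; its default `ensure_ascii=True` is what escapes `né` as `n\u00e9`.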