Dataset tokens
Description
Use the teklia-dan dataset tokens command to generate a YAML file containing entities and their token(s) to train a DAN model.
Parameter | Description | Type | Default |
---|---|---|---|
entities | Path to a YAML file containing the extracted entities. | pathlib.Path | |
--end-tokens | Whether to generate end tokens along with starting tokens. | bool | False |
--output-file | Path to a YAML file to save the entities and their token(s). | pathlib.Path | tokens.yml |
The entities argument expects a YAML-formatted file with the list of entity names. This file can be generated by the teklia-dan dataset entities command; see its dedicated page for more details.
entities:
- INTITULE
- DATE
- ANALYSE_COMPL.
- PRECISIONS_SUR_COTE
- COTE_ARTICLE
- CLASSEMENT
Examples
Start tokens
teklia-dan dataset tokens \
entities.yml
This command creates a YAML-formatted file named tokens.yml. Each entry describes one NER entity: the entity label is the key of a mapping that holds its start and end tokens. Without --end-tokens, every end token is an empty string.
INTITULE: # Type of the entity on Arkindex
  start: Ⓐ # Starting token for this entity
  end: ''
DATE:
  start: Ⓑ
  end: ''
ANALYSE_COMPL.:
  start: Ⓒ
  end: ''
PRECISIONS_SUR_COTE:
  start: Ⓓ
  end: ''
COTE_ARTICLE:
  start: Ⓔ
  end: ''
CLASSEMENT:
  start: Ⓕ
  end: ''
Start tokens + End tokens
teklia-dan dataset tokens \
entities.yml \
--end-tokens
This command creates a YAML-formatted file named tokens.yml in which each entity label is the key of a mapping that holds both a start token and an end token.
INTITULE: # Type of the entity on Arkindex
  start: Ⓐ # Starting token for this entity
  end: Ⓑ # Ending token for this entity
DATE:
  start: Ⓒ
  end: Ⓓ
ANALYSE_COMPL.:
  start: Ⓔ
  end: Ⓕ
PRECISIONS_SUR_COTE:
  start: Ⓖ
  end: Ⓗ
COTE_ARTICLE:
  start: Ⓘ
  end: Ⓙ
CLASSEMENT:
  start: Ⓚ
  end: Ⓛ
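
The two outputs above suggest that tokens are taken from consecutive circled capital letters, starting at Ⓐ (U+24B6), in the order the entities are listed. The following Python sketch reproduces that pattern for illustration only. The helper name assign_tokens and its parameters are hypothetical and not part of teklia-dan, and the sketch makes no claim about how the command picks tokens once the circled capital letters run out.

```python
def assign_tokens(entities, end_tokens=False, first_codepoint=0x24B6):
    """Map each entity name to consecutive circled-letter tokens.

    Hypothetical helper written for illustration; it is not the
    teklia-dan implementation, only a sketch matching the sample output.
    """
    tokens = {}
    codepoint = first_codepoint  # U+24B6 is the character "Ⓐ"
    for name in entities:
        # The start token is the next unused character.
        start = chr(codepoint)
        codepoint += 1
        # Without end tokens, the end value stays an empty string.
        end = ""
        if end_tokens:
            end = chr(codepoint)
            codepoint += 1
        tokens[name] = {"start": start, "end": end}
    return tokens


entities = ["INTITULE", "DATE", "ANALYSE_COMPL."]
print(assign_tokens(entities))
print(assign_tokens(entities, end_tokens=True))
```

With end tokens enabled, each entity consumes two characters instead of one, which is why DATE starts at Ⓑ in the first example and at Ⓒ in the second.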