From b614b3a5a3fc3f61c7ef9d1526ca02c58e4b897f Mon Sep 17 00:00:00 2001
From: manonBlanco <blanco@teklia.com>
Date: Thu, 20 Jul 2023 11:11:06 +0200
Subject: [PATCH] Update documentation

---
 docs/usage/datasets/extract.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usage/datasets/extract.md b/docs/usage/datasets/extract.md
index fb65fcbb..c22537c1 100644
--- a/docs/usage/datasets/extract.md
+++ b/docs/usage/datasets/extract.md
@@ -12,8 +12,8 @@ Use the `teklia-dan dataset extract` command to extract a dataset from an Arkind
 | `--parent-element-type` | Type of the parent element containing the data. | `str` | `page` |
 | `--output` | Folder where the data will be generated. | `Path` | |
 | `--load-entities` | Extract text with their entities. Needed for NER tasks. | `bool` | `False` |
-| `--only-entities` | Remove all text that does not belong to the tokens. | `bool` | `False` |
 | `--allow-unknown-entities` | Ignore entities that do not appear in the list of tokens. | `bool` | `False` |
+| `--entity-separators` | Removes all text that does not appear in an entity or in the list of given characters. Do not give any arguments for keeping the whole text. | `str` | |
 | `--tokens` | Mapping between starting tokens and end tokens. Needed for NER tasks. | `Path` | |
 | `--use-existing-split` | Use the specified folder IDs for the dataset split. | `bool` | |
 | `--train-folder` | ID of the training folder to import from Arkindex. | `uuid` | |
-- 
GitLab