diff --git a/docs/usage/datasets/extract.md b/docs/usage/datasets/extract.md
index fb65fcbbb77e51e7d2e5a6e839b1bd9e04361f0d..c22537c1463a36cb8373aeebd35419871251dd5a 100644
--- a/docs/usage/datasets/extract.md
+++ b/docs/usage/datasets/extract.md
@@ -12,8 +12,8 @@ Use the `teklia-dan dataset extract` command to extract a dataset from an Arkind
 | `--parent-element-type` | Type of the parent element containing the data. | `str` | `page` |
 | `--output` | Folder where the data will be generated. | `Path` | |
 | `--load-entities` | Extract text with their entities. Needed for NER tasks. | `bool` | `False` |
-| `--only-entities` | Remove all text that does not belong to the tokens. | `bool` | `False` |
 | `--allow-unknown-entities` | Ignore entities that do not appear in the list of tokens. | `bool` | `False` |
+| `--entity-separators` | Removes all text that does not appear in an entity or in the list of given characters. Do not give any arguments for keeping the whole text. | `str` | |
 | `--tokens` | Mapping between starting tokens and end tokens. Needed for NER tasks. | `Path` | |
 | `--use-existing-split` | Use the specified folder IDs for the dataset split. | `bool` | |
 | `--train-folder` | ID of the training folder to import from Arkindex. | `uuid` | |