
Implement extraction command

Merged: Yoann Schneider requested to merge `implement-extraction-command` into `main`.
1 file changed: 13 lines added, 13 lines removed.
```diff
@@ -104,19 +104,19 @@ The available arguments are
 | Parameter | Description | Type | Default |
 | ------------------------------ | ----------------------------------------------------------------------------------- | -------- | ------- |
-| `--parent` | UUID of the folder to import from Arkindex. You may specify multiple UUIDs. | str/uuid | |
-| `--element-type` | Type of the elements to extract. You may specify multiple types. | str | |
-| `--output` | Folder where the data will be generated. Must exist. | Path | |
-| `--load-entities` | Extract text with their entities. Needed for NER tasks. | bool | False |
-| `--tokens` | Mapping between starting tokens and end tokens. Needed for NER tasks. | Path | |
-| `--use-existing-split` | Use the specified folder IDs for the dataset split. | bool | |
-| `--train-folder` | ID of the training folder to import from Arkindex. | uuid | |
-| `--val-folder` | ID of the validation folder to import from Arkindex. | uuid | |
-| `--test-folder` | ID of the training folder to import from Arkindex. | uuid | |
-| `--transcription-worker-version` | Filter transcriptions by worker_version. Use ‘manual’ for manual filtering. | str/uuid | |
-| `--entity-worker-version` | Filter transcriptions entities by worker_version. Use ‘manual’ for manual filtering | str/uuid | |
-| `--train-prob` | Training set split size | float | 0,7 |
-| `--val-prob` | Validation set split size | float | 0,15 |
+| `--parent` | UUID of the folder to import from Arkindex. You may specify multiple UUIDs. | `str/uuid` | |
+| `--element-type` | Type of the elements to extract. You may specify multiple types. | `str` | |
+| `--output` | Folder where the data will be generated. Must exist. | `Path` | |
+| `--load-entities` | Extract text with their entities. Needed for NER tasks. | `bool` | `False` |
+| `--tokens` | Mapping between starting tokens and end tokens. Needed for NER tasks. | `Path` | |
+| `--use-existing-split` | Use the specified folder IDs for the dataset split. | `bool` | |
+| `--train-folder` | ID of the training folder to import from Arkindex. | `uuid` | |
+| `--val-folder` | ID of the validation folder to import from Arkindex. | `uuid` | |
+| `--test-folder` | ID of the training folder to import from Arkindex. | `uuid` | |
+| `--transcription-worker-version` | Filter transcriptions by worker_version. Use ‘manual’ for manual filtering. | `str/uuid` | |
+| `--entity-worker-version` | Filter transcriptions entities by worker_version. Use ‘manual’ for manual filtering | `str/uuid` | |
+| `--train-prob` | Training set split size | `float` | `0.7` |
+| `--val-prob` | Validation set split size | `float` | `0.15` |
```
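
The `--train-prob` and `--val-prob` defaults (`0.7` and `0.15`) suggest that elements assigned to neither training nor validation end up in the test set, i.e. a test share of 1 - 0.7 - 0.15 = 0.15. The sketch below shows one way such a random split could work. It is an illustration under assumptions, not the extraction command's code: the function `split_elements`, its `seed` parameter, and the per-element independent draw are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def split_elements(
    element_ids: Sequence[str],
    train_prob: float = 0.7,
    val_prob: float = 0.15,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Hypothetical sketch: randomly assign each element to train, val or test.

    Assumes the test share is whatever remains: 1 - train_prob - val_prob.
    """
    if train_prob < 0 or val_prob < 0 or train_prob + val_prob > 1:
        raise ValueError("train_prob and val_prob must be >= 0 and sum to at most 1")

    rng = random.Random(seed)  # dedicated generator so the split is reproducible
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for element_id in element_ids:
        draw = rng.random()  # uniform float in [0.0, 1.0)
        if draw < train_prob:
            splits["train"].append(element_id)
        elif draw < train_prob + val_prob:
            splits["val"].append(element_id)
        else:
            # Remaining probability mass goes to the test set.
            splits["test"].append(element_id)
    return splits


if __name__ == "__main__":
    ids = [f"element-{i}" for i in range(1000)]
    splits = split_elements(ids)
    # Sizes come out close to, but not exactly, 70% / 15% / 15%.
    print({name: len(members) for name, members in splits.items()})
```

Drawing independently per element keeps the sketch simple, but it only approximates the requested proportions; shuffling the list and slicing it at fixed indices would give exact sizes instead.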
The `--tokens` argument expects a file with the following format.
```yaml