Filter entities by name when extracting data from Arkindex

Merged: Manon Blanco requested to merge allow-unknown-entities into main
1 file, +3 −3
@@ -55,7 +55,7 @@ CLASSEMENT:
 ### HTR and NER data from one source
-To extract HTR+NER data from **pages** from a folder, you have to define a end token for each entity and use the following command:
+To extract HTR+NER data from **pages** from a folder, you have to define an end token for each entity and use the following command:
 ```shell
 teklia-dan dataset extract \
@@ -71,7 +71,7 @@ with `tokens.yml` compliant with the format described before.
 ### HTR and NER data from multiple source
-To do the same but only use the data from three folders, you have to define a end token for each entity and the commands becomes:
+To do the same but only use the data from three folders, you have to define an end token for each entity and the commands becomes:
 ```shell
 teklia-dan dataset extract \
@@ -102,7 +102,7 @@ teklia-dan dataset extract \
 ### HTR from multiple element types with some parent filtering
-To extract HTR data from **annotations** and **text_zones** from a folder, but only keep those that are children of **single_pages**, you have to define a end token for each entity and use the following command:
+To extract HTR data from **annotations** and **text_zones** from a folder, but only keep those that are children of **single_pages**, you have to define an end token for each entity and use the following command:
 ```shell
 teklia-dan dataset extract \