Commit 0c2612fd authored by Manon Blanco, committed by Mélodie Boillet

Remove add_eot and add_sot parameters from training configuration

parent c8bd6c91
1 merge request: !161 Remove add_eot and add_sot parameters from training configuration
@@ -174,11 +174,9 @@ class OCRDataset(GenericDataset):
         sample["label"] = full_label
         sample["token_label"] = token_to_ind(self.charset, full_label)
-        if "add_eot" in self.params["config"]["constraints"]:
-            sample["token_label"].append(self.tokens["end"])
+        sample["token_label"].append(self.tokens["end"])
         sample["label_len"] = len(sample["token_label"])
-        if "add_sot" in self.params["config"]["constraints"]:
-            sample["token_label"].insert(0, self.tokens["start"])
+        sample["token_label"].insert(0, self.tokens["start"])
         return sample
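With the constraint checks gone, every label now receives both the end-of-transcription and the start-of-transcription token. The following minimal sketch replays that logic outside the class. `convert_label` is a hypothetical helper. `token_to_ind` is re-implemented here on the assumption that it maps each character to its index in the charset. The `end`/`start` token ids are made-up values placed right after the charset indices.

```python
# Minimal sketch of the label conversion after this commit.
# Assumption: token_to_ind maps each character to its index in the charset;
# the real helper lives in the DAN codebase and may differ in details.


def token_to_ind(charset, label):
    """Return the charset index of every character in `label`."""
    return [charset.index(char) for char in label]


def convert_label(charset, tokens, full_label):
    """Mimic the updated OCRDataset label handling (hypothetical helper)."""
    sample = {"label": full_label}
    sample["token_label"] = token_to_ind(charset, full_label)
    # The end-of-transcription token is now appended unconditionally.
    sample["token_label"].append(tokens["end"])
    # label_len is taken before the start token is inserted, so it counts
    # the characters plus the end token only.
    sample["label_len"] = len(sample["token_label"])
    # The start-of-transcription token is now prepended unconditionally.
    sample["token_label"].insert(0, tokens["start"])
    return sample


# Hypothetical charset and token ids, placed right after the charset indices.
charset = ["a", "b", "c"]
tokens = {"end": len(charset), "start": len(charset) + 1}
sample = convert_label(charset, tokens, "cab")
print(sample["token_label"])  # [4, 2, 0, 1, 3]
print(sample["label_len"])  # 4
```

The ordering is unchanged by the commit: `label_len` still excludes the start token, because it is measured before the `insert(0, ...)` call.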
@@ -109,10 +109,7 @@ def get_config():
                 "height_divisor": 32, # Image height will be divided by 32
                 "padding_value": 0, # Image padding value
                 "padding_token": None, # Label padding value
-                "constraints": [
-                    "add_eot",
-                    "add_sot",
-                ], # add end-of-transcription and start-of-transcription tokens in labels
+                "constraints": [],
                 "preprocessings": [
                     {
                         "type": "to_RGB",
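The label code in this commit no longer reads `add_eot` or `add_sot`, so configurations written before it may still list them under `dataset_params.config.constraints`. The sketch below shows one way to clean such a dict so that it matches the new `"constraints": []` default. `drop_legacy_constraints` is a hypothetical helper and not part of DAN. It assumes the dict has the shape shown in this hunk.

```python
# Hypothetical migration helper, not part of DAN.
LEGACY_CONSTRAINTS = {"add_eot", "add_sot"}


def drop_legacy_constraints(config):
    """Return a shallow copy of `config` without add_eot/add_sot in "constraints"."""
    cleaned = dict(config)
    # Build a new list so the caller's original config is left untouched.
    cleaned["constraints"] = [
        name for name in config.get("constraints", []) if name not in LEGACY_CONSTRAINTS
    ]
    return cleaned


# A dataset config written before this commit (only a few keys shown).
legacy = {
    "height_divisor": 32,
    "padding_value": 0,
    "padding_token": None,
    "constraints": ["add_eot", "add_sot"],
}
cleaned = drop_legacy_constraints(legacy)
print(cleaned["constraints"])  # []
```

Any other entry in `constraints` is kept as-is, since this commit does not say whether the list is used elsewhere.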
@@ -8,7 +8,7 @@ All hyperparameters are specified and editable in the training scripts (meaning
 | `dataset_name` | Name of the dataset. | `str` | |
 | `dataset_level` | Level of the dataset. Should be named after the element type. | `str` | |
 | `dataset_variant` | Variant of the dataset. Usually empty for HTR datasets, `"_sem"` for HTR+NER datasets. | `str` | |
-| `dataset_path` | Path to the dataset. | `str` |
+| `dataset_path` | Path to the dataset. | `str` | |
 | `dataset_params.config.dataset_manager` | Dataset manager class. | custom class | `OCRDatasetManager` |
 | `dataset_params.config.dataset_class` | Dataset class. | custom class | `OCRDataset` |
 | `dataset_params.config.datasets` | Dataset dictionary with the dataset name as key and dataset path as value. | `dict` | |
@@ -18,7 +18,7 @@ All hyperparameters are specified and editable in the training scripts (meaning
 | `dataset_params.config.width_divisor` | Factor to reduce the height of the feature vector before feeding the decoder. | `int` | `32` |
 | `dataset_params.config.padding_value` | Image padding value. | `int` | `0` |
 | `dataset_params.config.padding_token` | Transcription padding value. | `int` | `None` |
-| `dataset_params.config.constraints` | Whether to add end-of-transcription and start-of-transcription tokens in labels. | `list` | `["add_eot", "add_sot"]` |
+| `dataset_params.config.constraints` | Whether to add end-of-transcription and start-of-transcription tokens in labels. | `list` | `[]` |
 | `dataset_params.config.preprocessings` | List of pre-processing functions to apply to input images. | `list` | (see [dedicated section](#data-preprocessing)) |
 | `dataset_params.config.augmentation` | Configuration for data augmentation. | `dict` | (see [dedicated section](#data-augmentation)) |
@@ -72,10 +72,7 @@ def training_config():
                 "height_divisor": 32, # Image height will be divided by 32
                 "padding_value": 0, # Image padding value
                 "padding_token": None, # Label padding value
-                "constraints": [
-                    "add_eot",
-                    "add_sot",
-                ], # add end-of-transcription and start-of-transcription tokens in labels
+                "constraints": [],
                 "preprocessings": [
                     {
                         "type": "to_RGB",