Generic Pylaia worker
While the training worker is being implemented in #8 (closed), we can already outline the Pylaia generic worker.
This worker's user configuration parameters will at least include:
- use_language_model, bool, defaults to False, whether a language model is required (the worker fails if one is requested but cannot be found)
- from_right_to_left, bool, defaults to False, use right-to-left orientation
- extraction_mode, enum, (see current param)
- batch_size, int, defaults to 2
- line_element_type, str
- line_worker_version_id, str
- scale_x, float
- scale_y_top, float
- scale_y_bottom, float
- color_mode (old image_convert), enum
- lm_weight, float
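The parameter list above could be captured as a typed configuration object. This is only a sketch: the class name `PylaiaUserConfig` and the `from_dict` helper are hypothetical, and every default other than those stated in the list (False, False and 2) is assumed to be None because the issue does not specify it. The enum values are not listed here either, so they are kept as plain strings.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PylaiaUserConfig:
    """Hypothetical container for the user configuration listed in this issue."""

    use_language_model: bool = False
    from_right_to_left: bool = False
    extraction_mode: Optional[str] = None  # enum, values not listed in this issue
    batch_size: int = 2
    line_element_type: Optional[str] = None
    line_worker_version_id: Optional[str] = None
    scale_x: Optional[float] = None  # default is an assumption
    scale_y_top: Optional[float] = None  # default is an assumption
    scale_y_bottom: Optional[float] = None  # default is an assumption
    color_mode: Optional[str] = None  # enum (old image_convert), values not listed
    lm_weight: Optional[float] = None  # default is an assumption

    @classmethod
    def from_dict(cls, raw: dict) -> "PylaiaUserConfig":
        """Build a config from a plain dict, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - known)
        if unknown:
            raise ValueError(f"Unknown configuration keys: {', '.join(unknown)}")
        return cls(**raw)
```

Rejecting unknown keys early is a design choice of this sketch, meant to surface typos in parameter names before any model is loaded.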
The current models have to be ported to the right format via the CLI, i.e. an archive containing:
- model
- syms.txt
- weights.ckpt
- language_model.arpa.gz (optional)
- lexicon.txt (optional)
- tokens.txt (optional)
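A ported archive, once extracted, could be checked against the list above with something like the following. This is a minimal sketch, not part of any existing CLI: the function name `inspect_model_directory` is hypothetical, and it assumes the archive contents sit directly at the top level of the extracted folder.

```python
from pathlib import Path

# File names taken from the list in this issue.
REQUIRED_FILES = ("model", "syms.txt", "weights.ckpt")
OPTIONAL_FILES = ("language_model.arpa.gz", "lexicon.txt", "tokens.txt")


def inspect_model_directory(directory):
    """Return the files found in `directory`, keyed by name.

    Raises FileNotFoundError if any required file is missing.
    Optional files are included only when present.
    """
    directory = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (directory / name).exists()]
    if missing:
        raise FileNotFoundError(
            f"Missing required files in {directory}: {', '.join(missing)}"
        )
    return {
        name: directory / name
        for name in REQUIRED_FILES + OPTIONAL_FILES
        if (directory / name).exists()
    }
```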
We need to do something similar to the U-FCN generic worker:
- load the model version configuration using self.model_configuration
- load the model and optional language model in the right folder
- if "model" is in config, keep the current behavior
- else, find the path to the model using self.find_model_directory() and retrieve what's available there (including the language model if self.config["use_language_model"] is True)
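The branching described in the steps above might look roughly like this. It is a sketch under several assumptions: `resolve_model_files` is a hypothetical standalone function (in the worker these values would come from self.model_configuration, self.config and self.find_model_directory()), the existing "model" code path is left out and signalled by returning None, and only language_model.arpa.gz is treated as mandatory when a language model is requested.

```python
from pathlib import Path

LANGUAGE_MODEL_FILE = "language_model.arpa.gz"


def resolve_model_files(model_configuration, user_config, find_model_directory):
    """Decide where the model and the optional language model come from.

    Returns None when "model" is in the configuration, meaning the caller
    should keep its current behavior (not covered by this sketch).
    """
    if "model" in model_configuration:
        return None

    # New behavior: everything lives in the published model directory.
    model_dir = Path(find_model_directory())
    resolved = {"model_dir": model_dir, "language_model": None}

    if user_config.get("use_language_model", False):
        lm_path = model_dir / LANGUAGE_MODEL_FILE
        if not lm_path.exists():
            # Requested but not found: fail, as described for use_language_model.
            raise FileNotFoundError(f"Language model not found: {lm_path}")
        resolved["language_model"] = lm_path

    return resolved
```

Passing `find_model_directory` in as a callable keeps the sketch testable without a running worker.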
Just like the U-FCN generic worker, we need a new CI job that publishes the model on https://arkindex.teklia.com. In this case there is no need to rename the folder, but we do need the --use-parent-folder option to publish the whole folder instead of only the model binary file.
Edited by Yoann Schneider