# YAML configuration

This page is a reference for version 2 of the YAML configuration file for Git repositories handled by Arkindex. Version 1 is not supported.

The configuration file is always named `.arkindex.yml` and should be found at the root of the repository.

## Required attributes

The following attributes are required in every `.arkindex.yml` file:

`version`
: Version of the configuration file in use. An error will occur if the version number is not set to `2`.

### Example configuration

```yaml
---
version: 2

workers:
  - workers/config.yml
```

This would match `workers/config.yml` starting at the root of the repository.

## Worker repository attributes

The `workers` attribute is a list of the following:

- Paths to a YAML file holding the configuration for a single worker
- Unix-style patterns matching paths to YAML files holding the configuration for a single worker
- The configuration of a single worker embedded directly into the file

### Single worker configuration

The following describes the attributes of a YAML file configuring one worker, or of the configuration embedded directly in the `.arkindex.yml` file.

All attributes are optional unless explicitly specified.

`name`
: Mandatory. Name of the worker, for display purposes.

`slug`
: Mandatory. Slug of this worker. The slug must be unique across the repository and must only hold alphanumerical characters, underscores or hyphens.

`type`
: Mandatory. Type of the worker, for display purposes only. Some common values include:

    - `classifier`
    - `recognizer`
    - `ner`
    - `dla`
    - `word-segmenter`
    - `paragraph-creator`

`gpu_usage`
: Whether or not this worker requires or supports GPUs. Defaults to `disabled`. May take one of the following values:

    `required`
    : This worker requires a GPU, and will only be run on Ponos agents whose hosts have a GPU.

    `supported`
    : This worker supports using a GPU, but may run on any available host, including those without GPUs.

    `disabled`
    : This worker does not support GPUs.
    It may run on a host that has a GPU, but it will ignore it.

`model_usage`
: Whether or not this worker requires a model version to run. Defaults to `disabled`. May take one of the following values:

    `required`
    : This worker requires a model version, and will only be run on processes with a model.

    `supported`
    : This worker supports a model version, but may run on any process, including those without a model.

    `disabled`
    : This worker does not support model versions. It may run on a process that has a model, but it will ignore it.

`docker`
: Groups Docker-related configuration attributes:

    - `build`: Path to a Dockerfile used to build this worker, relative to the root of the repository. Defaults to `Dockerfile`.
    - `command`: Custom command line to be used when launching the Docker container for this worker. By default, the command specified in the Dockerfile will be used.
    - `shm_size`: Size of the available shared memory in `/dev/shm`. The default value is `64M`, but when training machine learning models an increase might be necessary. The given value must be either an integer, or an integer followed by a unit (`b` for bytes, `k` for kilobytes, `m` for megabytes and `g` for gigabytes). If no unit is specified, the default unit is `bytes`. See the [Docker documentation](https://docs.docker.com/engine/reference/run/#runtime-constraints-on-resources).
    - `environment`: Mapping of string keys and string values to define environment variables to be set when the Docker image runs.

`configuration`
: Mapping holding any string keys and values that can be later accessed in the worker's Python code. Can be used to define settings on your own worker, such as a file's location.

`user_configuration`
: Mapping defining settings on your worker that can be modified by users. [See below](#setting-up-user-configurable-parameters) for details.

`secrets`
: List of required secret names for that specific worker.
For more information, learn how to use secrets in workers on the official Arkindex [documentation](https://doc.arkindex.org/secrets).

### Setting up user-configurable parameters

The YAML file can define parameters that users will be able to change when they use this worker in a process on Arkindex. These parameters are listed in a `user_configuration` attribute.

A parameter is defined using the following settings:

`title`
: Mandatory. The parameter's title.

`type`
: Mandatory. A value type. The supported types are:

    - `int`
    - `bool`
    - `float`
    - `string`
    - `enum`
    - `list`
    - `dict`
    - `model`

`default`
: Optional. A default value for the parameter. Must be of the defined parameter `type`.

`required`
: Optional. A boolean, defaults to `false`.

`choices`
: Optional. A list of options for `enum` type parameters.

`subtype`
: Optional. The type of the elements of `list` type parameters.

This definition allows for both validation of the input and the display of a form to make configuring workers easy for Arkindex users.

#### String parameters

String-type parameters must be defined using a `title` and the `string` `type`. You can also set a `default` value for this parameter, which must be a string, as well as make it a `required` parameter, which prevents users from leaving it blank.

For example, a string-type parameter can be defined like this:

```yaml
subfolder_name:
  title: Created Subfolder Name
  type: string
  default: My Neat Subfolder
```

Which will result in the following display for the user:

#### Integer parameters

Integer-type parameters must be defined using a `title` and the `int` `type`. You can also set a `default` value for this parameter, which must be an integer, as well as make it a `required` parameter, which prevents users from leaving it blank.

For example, an integer-type parameter can be defined like this:

```yaml
input_size:
  title: Input Size
  type: int
  default: 768
  required: True
```

Which will result in the following display for the user:

#### Float parameters

Float-type parameters must be defined using a `title` and the `float` `type`. You can also set a `default` value for this parameter, which must be a float, as well as make it a `required` parameter, which prevents users from leaving it blank.

For example, a float-type parameter can be defined like this:

```yaml
wip:
  title: Word Insertion Penalty
  type: float
  required: True
```

Which will result in the following display for the user:

#### Boolean parameters

Boolean-type parameters must be defined using a `title` and the `bool` `type`. You can also set a `default` value for this parameter, which must be a boolean, as well as make it a `required` parameter, which prevents users from leaving it blank.

In the configuration form, boolean parameters are displayed as toggles.

For example, a boolean-type parameter can be defined like this:

```yaml
score:
  title: Run Worker in Evaluation Mode
  type: bool
  default: False
```

Which will result in the following display for the user:

#### Enum (choices) parameters

Enum-type parameters must be defined using a `title`, the `enum` `type` and at least two `choices`. You cannot define an enum-type parameter without `choices`. You can also set a `default` value for this parameter, which must be one of the available `choices`, as well as make it a `required` parameter, which prevents users from leaving it blank.

Enum-type parameters should be used when you want to limit the users to a given set of options.

In the configuration form, enum parameters are displayed as selects.

For example, an enum-type parameter can be defined like this:

```yaml
parent_type:
  title: Target Parent Element Type
  type: enum
  default: paragraph
  choices:
    - paragraph
    - text_zone
    - page
```

Which will result in the following display for the user:

#### List parameters

List-type parameters must be defined using a `title`, the `list` `type` and a `subtype` for the elements inside the list. You can also set a `default` value for this parameter, which must be a list containing elements of the given `subtype`, as well as make it a `required` parameter, which prevents users from leaving it blank.

The allowed `subtype`s are `int`, `float` and `string`.

In the configuration form, list parameters are displayed as rows of input fields.

For example, a list-type parameter can be defined like this:

```yaml
a_list:
  title: A List of Values
  type: list
  subtype: int
  default: [4, 3, 12]
```

Which will result in the following display for the user:

#### Dictionary parameters

Dictionary-type parameters must be defined using a `title` and the `dict` `type`. You can also set a `default` value for this parameter, which must be a dictionary, as well as make it a `required` parameter, which prevents users from leaving it blank.

You can use dictionary parameters, for example, to specify a correspondence between the classes that are predicted by a worker and the elements that are created on Arkindex from these predictions.

Dictionary-type parameters only accept strings as values.

In the configuration form, dictionary parameters are displayed as a table with one column for keys and one column for values.

For example, a dictionary-type parameter can be defined like this:

```yaml
classes:
  title: Output Classes to Elements Correspondence
  type: dict
  default:
    a: page
    b: text_line
```

Which will result in the following display for the user:

#### Model parameters

Model-type parameters must be defined using a `title` and the `model` `type`.
You can also set a `default` value for this parameter, which must be the UUID of an existing Model, and make it a `required` parameter, which prevents users from leaving it blank.

You can use a model parameter to specify to which Model the Model Version that is created by a Training process will be attached.

Model-type parameters only accept Model UUIDs as values.

In the configuration form, model parameters are displayed as an input field. Users can select a model from a list of available Models: what they type into the input field filters that list, allowing them to search for a model using its name or UUID.

For example, a model-type parameter can be defined like this:

```yaml
model_param:
  title: Training Model
  type: model
```

Which will result in the following display for the user:

#### Example user_configuration

```yaml
user_configuration:
  vertical_padding:
    type: int
    default: 0
    title: Vertical Padding
  element_base_name:
    type: string
    required: true
    title: Element Base Name
  create_confidence_metadata:
    type: bool
    default: false
    title: Create confidence metadata on elements
  some_other_parameter:
    type: enum
    required: true
    default: 23
    choices:
      - 12
      - 23
      - 56
    title: Another Parameter
  a_model_parameter:
    type: model
    title: Model to train
```

#### Fallback to free JSON input

If you have defined user-configurable parameters using these specifications, Arkindex users can choose between using the form or the free JSON input field by toggling the **JSON** toggle.

If there are unsupported parameter types in the defined `user_configuration`, the frontend will automatically fall back to the free JSON input field. The same is true if you have not defined user-configurable parameters using these specifications.
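
The rules given above for user-configurable parameters (mandatory `title` and `type`, at least two `choices` for `enum`, a `subtype` of `int`, `float` or `string` for `list`, a `default` matching the declared type) can be checked locally before pushing a repository. The following is a minimal Python sketch, not Arkindex's own validation: `validate_parameter`, `_matches` and the mapping from type names to Python types are hypothetical and introduced only for illustration, and the input is assumed to be one entry of an already-parsed `user_configuration` mapping.

```python
from typing import Any

# Type names come from this reference; the Python counterparts are assumptions.
PYTHON_TYPES = {
    "int": int,
    "float": float,
    "bool": bool,
    "string": str,
    "list": list,
    "dict": dict,
    "model": str,  # a Model UUID, handled as a plain string in this sketch
}
SUPPORTED_TYPES = set(PYTHON_TYPES) | {"enum"}
LIST_SUBTYPES = {"int", "float", "string"}


def _matches(value: Any, type_name: str) -> bool:
    """Tell whether a value has the Python type mapped to a type name."""
    expected = PYTHON_TYPES[type_name]
    # bool is a subclass of int in Python, so reject booleans for `int`.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)


def validate_parameter(name: str, spec: dict[str, Any]) -> list[str]:
    """Return the problems found in one parameter (hypothetical helper)."""
    errors = [
        f"{name}: missing mandatory '{key}'"
        for key in ("title", "type")
        if key not in spec
    ]
    if "type" not in spec:
        return errors
    type_name = spec["type"]
    if not isinstance(type_name, str) or type_name not in SUPPORTED_TYPES:
        return errors + [f"{name}: unsupported type {type_name!r}"]
    if not isinstance(spec.get("required", False), bool):
        errors.append(f"{name}: 'required' must be a boolean")

    if type_name == "enum":
        choices = spec.get("choices")
        if not isinstance(choices, list) or len(choices) < 2:
            errors.append(f"{name}: an enum needs at least two 'choices'")
        elif "default" in spec and spec["default"] not in choices:
            errors.append(f"{name}: 'default' must be one of 'choices'")
    elif type_name == "list":
        subtype = spec.get("subtype")
        if not isinstance(subtype, str) or subtype not in LIST_SUBTYPES:
            errors.append(f"{name}: a list needs a 'subtype' of int, float or string")
        elif "default" in spec:
            default = spec["default"]
            if not isinstance(default, list) or not all(
                _matches(item, subtype) for item in default
            ):
                errors.append(f"{name}: 'default' must be a list of {subtype} values")
    elif "default" in spec:
        default = spec["default"]
        if not _matches(default, type_name):
            errors.append(f"{name}: 'default' must be of type '{type_name}'")
        elif type_name == "dict" and not all(
            isinstance(item, str) for item in default.values()
        ):
            errors.append(f"{name}: dictionary values must be strings")
    return errors
```

An empty list means the parameter passed every check in this sketch. Calling the function on each entry of `user_configuration` and printing the collected messages is enough to catch a misspelled type or an enum `default` missing from `choices`.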

### Example configuration

```yaml
---
version: 2
workers:
  # Path to a single YAML file
  - path/to/worker.yml

  # Pattern matching any YAML file in the configuration folder
  # or in its sub-directories
  - configuration/**/*.yml

  # Configuration embedded directly into this file
  - name: Book of hours
    slug: book_of_hours
    type: classifier
    docker:
      build: project/Dockerfile
      image: hub.docker.com/project/image:tag
      command: python mysuperscript.py --blabla
      shm_size: 128m
      environment:
        TOKEN: deadBeefToken
    configuration:
      model: path/to/model
      anyKey: anyValue
      classes: [X, Y, Z]
    user_configuration:
      vertical_padding:
        type: int
        default: 0
        title: Vertical Padding
    secrets:
      - path/to/secret.json
```
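
The three kinds of `workers` entries shown above (a path, a Unix-style pattern and an embedded configuration) could be told apart as in the following Python sketch. It is an illustration under assumptions, not a description of how Arkindex processes the file: `resolve_workers` is a hypothetical helper, the `workers` list is assumed to have been parsed from `.arkindex.yml` already, and patterns are expanded with `pathlib.Path.glob`, whose matching rules may differ from those Arkindex applies.

```python
from pathlib import Path
from typing import Any


def resolve_workers(
    repo_root: Path, workers: list[Any]
) -> tuple[list[Path], list[dict]]:
    """Split a parsed `workers` list into YAML file paths and embedded configurations.

    Hypothetical helper: string entries are expanded relative to the
    repository root, mapping entries are returned untouched.
    """
    files: list[Path] = []
    embedded: list[dict] = []
    for entry in workers:
        if isinstance(entry, dict):
            # Configuration embedded directly in .arkindex.yml
            embedded.append(entry)
        elif isinstance(entry, str):
            # A plain path is a pattern without wildcards,
            # so a single glob call covers both paths and patterns.
            matches = sorted(p for p in repo_root.glob(entry) if p.is_file())
            files.extend(matches)
        else:
            raise TypeError(f"Unsupported workers entry: {entry!r}")
    return files, embedded
```

In this sketch, an entry that matches no file simply contributes nothing to the result. Whether a missing file should instead be reported as an error is a design decision left open here.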