# YAML configuration

This page is a reference for version 2 of the YAML configuration file for
Git repositories handled by Arkindex. Version 1 is not supported.

The configuration file is always named `.arkindex.yml` and should be found at
the root of the repository.

## Required attributes

The following attributes are required in every `.arkindex.yml` file:

`version`
: Version of the configuration file in use. An error will occur if the version
  number is not set to `2`.

### Example configuration

```yaml
---
version: 2

workers:
  - workers/config.yml
```

The path `workers/config.yml` is resolved starting at the root of
the repository.

## Worker repository attributes

The `workers` attribute is a list. Each item is one of the following:

- A path to a YAML file holding the configuration for a single worker
- A Unix-style pattern matching paths to YAML files that each hold the
  configuration for a single worker
- The configuration of a single worker, embedded directly into the file

### Single worker configuration

The following describes the attributes of a YAML file configuring one worker, or
of the configuration embedded directly in the `.arkindex.yml` file.

All attributes are optional unless explicitly specified.

`name`
: Mandatory. Name of the worker, for display purposes.

`slug`
: Mandatory. Slug of this worker. The slug must be unique across the repository and may only contain alphanumeric characters, underscores or hyphens.

`type`
: Mandatory. Type of the worker, for display purposes only. Some common values
include:

    - `classifier`
    - `recognizer`
    - `ner`
    - `dla`
    - `word-segmenter`
    - `paragraph-creator`

`gpu_usage`
: Whether or not this worker requires or supports GPUs. Defaults to `disabled`. May take one of the following values:

    `required`
    : This worker requires a GPU, and will only be run on Ponos agents whose hosts have a GPU.

    `supported`
    : This worker supports using a GPU, but may run on any available host, including those without GPUs.

    `disabled`
    : This worker does not support GPUs. It may run on a host that has a GPU, but it will ignore it.

`model_usage`
: Whether or not this worker requires a model version to run. Defaults to `disabled`. May take one of the following values:

    `required`
    : This worker requires a model version, and will only be run on processes with a model.

    `supported`
    : This worker supports a model version, but may run on any process, including those without a model.

    `disabled`
    : This worker does not support model versions. It may run on a process that has a model, but it will ignore it.

`docker`
: Groups Docker-related configuration attributes:

    `build`
    : Path to the Dockerfile used to build this worker, relative to the root of the repository. Defaults to `Dockerfile`.

    `command`
    : Custom command line to be used when launching the Docker container for this worker. By default, the command specified in the Dockerfile will be used.

    `shm_size`
    : Size of the available shared memory in `/dev/shm`. The default value is `64M`, but an increase might be necessary when training machine learning models. The given value must be either an integer, or an integer followed by a unit (`b` for bytes, `k` for kilobytes, `m` for megabytes and `g` for gigabytes). If no unit is specified, the value is interpreted as bytes. See the [Docker documentation](https://docs.docker.com/engine/reference/run/#runtime-constraints-on-resources).

    `environment`
    : Mapping of string keys and string values defining the environment variables to be set when the Docker image runs.

`configuration`
: Mapping holding any string keys and values that can later be accessed in the
worker's Python code. Use it to define settings for your own worker, such as
a file's location.

`user_configuration`
: Mapping defining settings on your worker that can be modified by users. [See below](#setting-up-user-configurable-parameters) for details.

`secrets`
: List of the names of the secrets that this worker requires. For more information on using secrets in workers, see the official Arkindex [documentation](https://doc.arkindex.org/secrets).
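
As an illustration of the attributes above, here is a minimal sketch of a standalone worker file. It assumes that such a file holds the worker attributes at its top level, just as they appear when embedded in `.arkindex.yml`. The file path, worker name, slug and secret name are hypothetical.

```yaml
---
# workers/config.yml (hypothetical path, listed under `workers` in .arkindex.yml)
name: Line recognizer          # hypothetical display name
slug: line_recognizer          # alphanumeric characters, underscores or hyphens
type: recognizer
gpu_usage: supported           # may run with or without a GPU
model_usage: required          # only runs on processes that have a model
docker:
  build: Dockerfile            # relative to the root of the repository
  shm_size: 1g                 # integer followed by a unit
secrets:
  - recognizer/credentials     # hypothetical secret name
```

Only `name`, `slug` and `type` are mandatory. The other attributes are shown here with non-default values to make their effect visible.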

### Setting up user-configurable parameters

The YAML file can define parameters that users will be able to change when they use this worker in a process on Arkindex. These parameters are listed in a `user_configuration` attribute.

A parameter is defined using the following settings:

`title`
: Mandatory. The parameter's title.

`type`
: Mandatory. A value type. The supported types are:

    - `int`
    - `bool`
    - `float`
    - `string`
    - `enum`
    - `list`
    - `dict`
    - `model`

`default`
: Optional. A default value for the parameter. Must be of the defined parameter `type`.

`required`
: Optional. A boolean stating whether users must provide a value for this parameter. Defaults to `false`.

`choices`
: Optional. A list of options for `enum` type parameters.

`subtype`
: Optional. The type of the elements of `list` type parameters.

This definition allows for both validation of the input and the display of a form to make configuring workers easy for Arkindex users.

![User configuration](user_configuration/configuration_form.png "User configuration form on Arkindex")

#### String parameters

String-type parameters must be defined using a `title` and the `string` `type`. You can also set a `default` value for this parameter, which must be a string, as well as make it a `required` parameter, which prevents users from leaving it blank.

For example, a string-type parameter can be defined like this:

```yaml
subfolder_name:
  title: Created Subfolder Name
  type: string
  default: My Neat Subfolder
```

Which will result in the following display for the user:

![String-type parameter](user_configuration/string_config.png "Example string-type parameter.")

#### Integer parameters

Integer-type parameters must be defined using a `title` and the `int` `type`. You can also set a `default` value for this parameter, which must be an integer, as well as make it a `required` parameter, which prevents users from leaving it blank.

For example, an integer-type parameter can be defined like this:

```yaml
input_size:
  title: Input Size
  type: int
  default: 768
  required: true
```

Which will result in the following display for the user:

![integer-type parameter](user_configuration/integer_config.png "Example integer-type parameter.")

#### Float parameters

Float-type parameters must be defined using a `title` and the `float` `type`. You can also set a `default` value for this parameter, which must be a float, as well as make it a `required` parameter, which prevents users from leaving it blank.

For example, a float-type parameter can be defined like this:

```yaml
wip:
  title: Word Insertion Penalty
  type: float
  required: true
```

Which will result in the following display for the user:

![Float-type parameter](user_configuration/float_config.png "Example float-type parameter.")

#### Boolean parameters

Boolean-type parameters must be defined using a `title` and the `bool` `type`. You can also set a `default` value for this parameter, which must be a boolean, as well as make it a `required` parameter, which prevents users from leaving it blank.

In the configuration form, boolean parameters are displayed as toggles.

For example, a boolean-type parameter can be defined like this:

```yaml
score:
  title: Run Worker in Evaluation Mode
  type: bool
  default: false
```

Which will result in the following display for the user:

![Boolean-type parameter](user_configuration/bool_config.png "Example boolean-type parameter.")

#### Enum (choices) parameters

Enum-type parameters must be defined using a `title`, the `enum` `type` and at least two `choices`. You cannot define an enum-type parameter without `choices`. You can also set a `default` value for this parameter, which must be one of the available `choices`, as well as make it a `required` parameter, which prevents users from leaving it blank. Enum-type parameters should be used when you want to limit the users to a given set of options.

In the configuration form, enum parameters are displayed as selects.

For example, an enum-type parameter can be defined like this:

```yaml
parent_type:
  title: Target Parent Element Type
  type: enum
  default: paragraph
  choices:
    - paragraph
    - text_zone
    - page
```

Which will result in the following display for the user:

![Enum-type parameter](user_configuration/enum_config.png "Example enum-type parameter.")

#### List parameters

List-type parameters must be defined using a `title`, the `list` `type` and a `subtype` for the elements inside the list. You can also set a `default` value for this parameter, which must be a list containing elements of the given `subtype`, as well as make it a `required` parameter, which prevents users from leaving it blank.

The allowed `subtype`s are `int`, `float` and `string`.

In the configuration form, list parameters are displayed as rows of input fields.

For example, a list-type parameter can be defined like this:

```yaml
a_list:
  title: A List of Values
  type: list
  subtype: int
  default: [4, 3, 12]
```

Which will result in the following display for the user:

![List-type parameter](user_configuration/list_config.png "Example list-type parameter.")
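
The other allowed subtypes work the same way. As a sketch, a required list of strings could be declared as follows. The parameter name `ignored_types`, its title and its default values are hypothetical.

```yaml
ignored_types:                     # hypothetical parameter name
  title: Element Types to Ignore
  type: list
  subtype: string                  # every element of the list must be a string
  required: true                   # users cannot leave this parameter blank
  default:
    - page
    - text_line
```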

#### Dictionary parameters

Dictionary-type parameters must be defined using a `title` and the `dict` `type`. You can also set a `default` value for this parameter, which must be a dictionary, as well as make it a `required` parameter, which prevents users from leaving it blank. For example, you can use a dictionary parameter to map the classes predicted by a worker to the elements that are created on Arkindex from these predictions.

Dictionary-type parameters only accept strings as values.

In the configuration form, dictionary parameters are displayed as a table with one column for keys and one column for values.

For example, a dictionary-type parameter can be defined like this:

```yaml
classes:
  title: Output Classes to Elements Correspondence
  type: dict
  default:
    a: page
    b: text_line
```

Which will result in the following display for the user:

![Dictionary-type parameter](user_configuration/dict_config.png "Example dictionary-type parameter.")

#### Model parameters

Model-type parameters must be defined using a `title` and the `model` `type`. You can also set a `default` value for this parameter, which must be the UUID of an existing Model, as well as make it a `required` parameter, which prevents users from leaving it blank. You can use a model parameter to specify the Model to which the Model Version created by a Training process will be attached.

Model-type parameters only accept Model UUIDs as values.

In the configuration form, model parameters are displayed as an input field. Users can select a model from a list of available Models: what they type into the input field filters that list, allowing them to search for a model using its name or UUID.

For example, a model-type parameter can be defined like this:

```yaml
model_param:
  title: Training Model
  type: model
```

Which will result in the following display for the user:

![Model-type parameter](user_configuration/model_config.png "Example model-type parameter.")
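
A model-type parameter can also be made mandatory and given a default. The sketch below uses a placeholder UUID purely for illustration. It does not refer to a real Model and would need to be replaced with the UUID of a Model that exists on your Arkindex instance.

```yaml
model_param:
  title: Training Model
  type: model
  required: true
  # Placeholder value for illustration only: use the UUID of an existing Model
  default: 00000000-0000-0000-0000-000000000000
```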

#### Example user_configuration

```yaml
user_configuration:
  vertical_padding:
    type: int
    default: 0
    title: Vertical Padding
  element_base_name:
    type: string
    required: true
    title: Element Base Name
  create_confidence_metadata:
    type: bool
    default: false
    title: Create confidence metadata on elements
  some_other_parameter:
    type: enum
    required: true
    default: 23
    choices:
      - 12
      - 23
      - 56
    title: Another Parameter
  a_model_parameter:
    type: model
    title: Model to train
```

#### Fallback to free JSON input

If you have defined user-configurable parameters using these specifications, Arkindex users can switch between the form and the free JSON input field using the **JSON** toggle. If the defined `user_configuration` contains unsupported parameter types, the frontend will automatically fall back to the free JSON input field. The same is true if you have not defined user-configurable parameters using these specifications.

### Example configuration

```yaml
---
version: 2

workers:
  # Path to a single YAML file
  - path/to/worker.yml
  # Pattern matching any YAML file in the configuration folder
  # or in its sub-directories
  - configuration/**/*.yml
  # Configuration embedded directly into this file
  - name: Book of hours
    slug: book_of_hours
    type: classifier
    docker:
      build: project/Dockerfile
      image: hub.docker.com/project/image:tag
      command: python mysuperscript.py --blabla
      shm_size: 128m
      environment:
        TOKEN: deadBeefToken
    configuration:
      model: path/to/model
      anyKey: anyValue
      classes: [X, Y, Z]
    user_configuration:
      vertical_padding:
        type: int
        default: 0
        title: Vertical Padding
    secrets:
      - path/to/secret.json
```