black==22.8.0
doc8==0.11.1
mkdocs==1.3.1
mkdocstrings==0.19.0
mkdocstrings-python==0.7.1
recommonmark==0.7.1
Sphinx==5.1.1
sphinx-rtd-theme==1.0.0
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
# -*- coding: utf-8 -*-
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
import os
import sys
from recommonmark.transform import AutoStructify
sys.path.insert(0, os.path.abspath(".."))
# -- Project information -----------------------------------------------------
project = "Arkindex Base Worker"
copyright = "2022, Teklia"
author = "Teklia"
# The full version, including alpha/beta/rc tags
with open("../VERSION") as f:
    release = f.read().strip()
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.coverage",
    "sphinx.ext.viewcode",
    "recommonmark",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "README.md"]
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
# -- Extension configuration -------------------------------------------------
autodoc_default_options = {
    "members": True,
    "undoc-members": True,
    "member-order": "bysource",
}
def setup(app):
    app.add_config_value(
        "recommonmark_config", {"auto_toc_tree_section": "Contents"}, True
    )
    app.add_transform(AutoStructify)
# GitLab CI for workers
This page describes how continuous integration (CI) is used in workers created
using the `base-worker` template.
For more information on creating workers, see
[Setting up a worker](../create).
## Default template
When you create a worker from our official template, it includes a
`.gitlab-ci.yml` file that defines a few jobs, which run on every push you make.
The CI jobs will run in the following order:
<img style="display:block;float:none;margin-left:auto;margin-right:auto;" src="./pipeline.svg" alt="CI pipeline execution order">
## Git Flow
At Teklia, we use a simple version of [Git Flow][gitflow]:
- The `default` branch should always have validated code and should be deployable
in production at any time.
- Developments should happen in branches, with merge requests to enable code
review and GitLab CI pipelines.
- Project maintainers should use Git tags to create official releases, by
updating the `VERSION` file and using the same version string as the tag name.
This process is reflected in the template's `.gitlab-ci.yml` file.
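
To illustrate the last rule, a pipeline could check that the `VERSION` file matches the tag being built. This is a hypothetical sketch and not a job shipped with the template. It relies on GitLab's predefined `CI_COMMIT_TAG` variable, which is only set in tag pipelines, and the name `version_matches_tag` is illustrative.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def version_matches_tag(version: str, tag: Optional[str]) -> bool:
    """Return True when there is no tag, or when the tag equals the VERSION content."""
    if not tag:
        # Branch pipelines have no tag, so there is nothing to compare
        return True
    return version.strip() == tag


if __name__ == "__main__":
    # CI_COMMIT_TAG is a predefined GitLab CI variable, only set on tag pipelines
    tag = os.environ.get("CI_COMMIT_TAG")
    version = Path("VERSION").read_text()
    if not version_matches_tag(version, tag):
        sys.exit(f"VERSION ({version.strip()}) does not match tag {tag!r}")
```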
## Linting
The `lint` job uses [pre-commit] to run source code linters on your
project and validate various rules:
- Check that your Python code is PEP 8 compliant
- Auto-format your Python code using [black]
- Sort your Python imports
- Check that you don't have any trailing whitespace
- Check that your YAML files are well formatted
- Fix some common spelling errors
You can set up pre-commit to run locally too; see
[Activating the pre-commit hook](../create#activating-the-pre-commit-hook).
## Testing
The `test` job uses [tox] and [pytest] to run the unit tests written for your
repository.
Every unit test you have added to your project is executed on each Git push,
so you can check the validity of your code before merging it.
Unit tests help you prevent regressions when making changes, and find bugs
before they make their way into production.
<!-- TODO:
For more information, see [Writing unit tests for your worker](../tests).
-->
## Building
When the `test` and `lint` jobs succeed, the `docker` job runs. It tries to
build a Docker image from your `Dockerfile`, which checks that your
`Dockerfile` is valid and builds an image successfully.
This build step is only used as a check, as Arkindex builds Docker images on
its own.
## Generating release notes
When the `docker` job succeeds and the CI pipeline is running for a Git
tag, the `release-notes` job runs. It lists all the commits since the
previous tag and aggregates them to publish release notes on the GitLab project.
We provide an [open source Docker image](https://gitlab.com/teklia/devops/) to build these release notes,
but you will need to provide your own GitLab access token so that the job can
publish release notes on your own repository.
You can generate an access token with the `api` scope on GitLab's [User Settings > Access Tokens](https://gitlab.com/-/profile/personal_access_tokens) page.
The token must then be set as a CI variable on your GitLab project:
1. Go to your project settings.
1. Go to the **CI / CD** section.
1. Click on `Expand` in the **Variables** section.
1. Add a new variable named `DEVOPS_GITLAB_TOKEN` whose value is your token.
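
We do not document how the devops image works internally. The following is a hypothetical sketch of the same idea. It lists the commit subjects since the previous tag with `git`, then prepares a call to the GitLab Releases API (`POST /projects/:id/releases`), authenticated with the `DEVOPS_GITLAB_TOKEN` variable set up above. `CI_API_V4_URL`, `CI_PROJECT_ID` and `CI_COMMIT_TAG` are predefined GitLab CI variables. All function names are illustrative.

```python
import json
import os
import subprocess
import urllib.request
from typing import List


def commit_subjects_since(previous_tag: str) -> List[str]:
    """List the subjects of commits made after `previous_tag` (requires git)."""
    output = subprocess.run(
        ["git", "log", f"{previous_tag}..HEAD", "--format=%s"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in output.splitlines() if line]


def build_release_notes(subjects: List[str]) -> str:
    """Aggregate commit subjects into a Markdown bullet list."""
    return "\n".join(f"- {subject}" for subject in subjects)


def build_release_request(
    api_url: str, project_id: str, tag: str, notes: str, token: str
) -> urllib.request.Request:
    """Prepare (but do not send) a call to the GitLab Releases API."""
    payload = {"tag_name": tag, "name": tag, "description": notes}
    return urllib.request.Request(
        f"{api_url}/projects/{project_id}/releases",
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Most recent tag reachable from the parent of the tagged commit
    previous_tag = subprocess.run(
        ["git", "describe", "--tags", "--abbrev=0", "HEAD^"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    request = build_release_request(
        os.environ["CI_API_V4_URL"],  # predefined GitLab CI variables
        os.environ["CI_PROJECT_ID"],
        os.environ["CI_COMMIT_TAG"],
        build_release_notes(commit_subjects_since(previous_tag)),
        os.environ["DEVOPS_GITLAB_TOKEN"],  # the CI variable configured above
    )
    urllib.request.urlopen(request)
```

On the very first tag, `git describe` finds no previous tag and fails. A real job would need to handle that case, for example by listing every commit instead.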
[black]: https://github.com/psf/black
[gitflow]: https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow
[pre-commit]: https://pre-commit.com/
[pytest]: https://docs.pytest.org/
[tox]: https://tox.readthedocs.io/
`pipeline.svg`: CI pipeline diagram. The `test` and `lint` jobs both lead to `docker`, which leads to `release-notes` on tags only.
# Setting up a new worker
This page will guide you through creating a new Arkindex worker locally and
preparing a development environment.
This guide assumes you are using Ubuntu 20.04 or later and have root access.
## Preparing your environment
This section will guide you through preparing your system to create a new
Arkindex worker from our [official template][base-worker].
### Installing system dependencies
To retrieve the Arkindex worker template, you will need to have both Git and
SSH. Git is a version control system that you will later use to manage multiple
versions of your worker. SSH allows secure connections to remote machines, and
will be used in our case to retrieve the template from a Git server.
#### To install system dependencies
1. Run the following command:
```
sudo apt install git ssh
```
### Checking your version of Python
Our Arkindex worker template requires Python 3.6 or later. Checking if a
compatible version of Python is installed avoids further issues in the setup
process.
#### To check your version of Python
1. Run the following command: `python3 --version`
This command will have an output similar to the following:
```
Python 3.6.9
```
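
Once a Python 3 interpreter is available, you can run the same check from Python itself by comparing `sys.version_info` with the minimum version. This is a minimal sketch; `is_supported` is a hypothetical helper and not part of the template.

```python
import sys

# Minimum version required by the worker template, as stated above
MINIMUM_PYTHON = (3, 6)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    status = "OK" if is_supported() else "is too old"
    print("Python", sys.version.split()[0], status)
```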
### Installing Python
If you were unable to check your Python version as stated above because
`python3` was not found, you will need to install Python 3 on your system.
#### To install Python on Ubuntu
1. Run the following command:
```
sudo apt install python3 python3-pip python3-virtualenv
```
1. Check your Python version again, as instructed in the previous section.
### Installing Python dependencies
To bootstrap a new Arkindex worker, some Python dependencies will be required:
- [pre-commit] will be used to automatically check the
syntax of your source code.
- [tox] will be used to run unit tests.
<!--
TODO: Link to [unit tests](tests)
-->
- [cookiecutter] will be used to bootstrap the project.
- [virtualenvwrapper] will be used to manage Python virtual
environments.
#### To install Python dependencies
1. Run the following command:
```
pip3 install pre-commit tox cookiecutter virtualenvwrapper
```
1. Follow the
[official virtualenvwrapper setup instructions][virtualenvwrapper-setup]
until you are able to run `workon`.
`workon` should have an empty output, as no Python virtual environments have
been set up yet.
## Creating the project
This section will guide you through creating a new worker from our official
template and making it available on a GitLab instance.
### Creating a GitLab project
For a worker to be accessible from an Arkindex instance, it needs to be pushed
to a GitLab project's repository. A GitLab project also allows you to
manage different versions of a worker and run
[automated checks](ci/index) on your code.
#### To create a GitLab project
1. Open the **New project** form [on GitLab.com](https://gitlab.com/projects/new)
or on another GitLab instance
1. Enter your worker name as the **Project name**
1. Define a **Project slug** related to your worker, e.g.:
- `tesseract` for a Tesseract worker
- `opencv-foo` for an OpenCV worker related to project Foo
1. Click on the **Create project** button
### Bootstrapping the project
This section guides you through using our [official template][base-worker]
to get a basic structure for your worker.
#### To bootstrap the project
1. Open a terminal and go to the folder in which you want to create your worker.
1. Enter this command and fill in the required information:
```
cookiecutter git@gitlab.com:teklia/workers/base-worker.git
```
Cookiecutter will ask you for several options:
`slug`
: A slug for the worker. This should use lowercase alphanumeric characters or
underscores to meet the code formatting requirements that the template
automatically enforces via [black].
`name`
: A name for the worker, purely used for display purposes.
`description`
: A general description of the worker. This will be used to initialize the `README.md` of your repository as well as the `help` command output.
`worker_type`
: An arbitrary string purely used for display purposes.
For example:
- `recognizer`,
- `classifier`,
- `dla`,
- `entity-recognizer`, etc.
`author`
: A name for the worker's author. Usually your first and last name.
`email`
: Your e-mail address. This will be used to contact you if any administrative need arises.
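
To catch an invalid `slug` before running Cookiecutter, you can check it against the rule above: lowercase alphanumeric characters or underscores. This is a hypothetical helper, not part of the template. Unlike a GitLab project slug such as `opencv-foo`, this slug rejects dashes. The package name follows the `worker_[slug]` layout generated by the template.

```python
import re

# Lowercase letters, digits and underscores only, as required for the slug above
SLUG_PATTERN = re.compile(r"[a-z0-9_]+")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug only holds lowercase alphanumeric characters or underscores."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def package_name(slug: str) -> str:
    """Name of the Python package generated by the template (worker_[slug])."""
    if not is_valid_slug(slug):
        raise ValueError(f"Invalid slug: {slug!r}")
    return f"worker_{slug}"
```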
### Pushing to GitLab
This section guides you through pushing the newly created worker from your
system to the GitLab project's repository.
This section assumes you have Maintainer or Owner access to the GitLab project.
#### To push to GitLab
1. Enter the newly created directory, whose name starts with `worker-` and ends
with your worker's slug.
1. Add your GitLab project as a Git remote:
```
git remote add origin git@my-gitlab-instance.com:path/to/worker.git
```
You will need to use your own instance's URL and the path to your own
project. For example, a project named `hello` in the `teklia` group
on `gitlab.com` will use the following command:
```
git remote add origin git@gitlab.com:teklia/hello.git
```
1. Push the new branch to GitLab:
```
git push --set-upstream origin master
```
If you want to push a different branch, you first need to create it. For example,
if you want to push to a new branch named `bootstrap`, you will use:
```
git checkout -b bootstrap
git push --set-upstream origin bootstrap
```
1. Open your GitLab project in a browser.
1. Click on the blue icon indicating that [CI](ci/index)
is running on your repository, and wait for it to turn green to confirm
everything worked.
## Setting up your development environment
This section guides you through setting up a Python development environment
specifically for your worker.
### Activating the pre-commit hook
The official template includes code syntax checks, such as detecting trailing
whitespace, as well as code linting using [black]. Those checks run on GitLab as
soon as you push new code, but you can also run them automatically whenever you
create a new commit by using the [pre-commit] hook.
#### To activate the pre-commit hook
1. Run `pre-commit install`.
### Setting up the Python virtual environment
To install Python dependencies that are specific to your worker, and prevent
other dependencies installed on your system from interfering, it is recommended
to use a virtual environment.
#### To set up a Python virtual environment
1. Run `mkvirtualenv my_worker`, where `my_worker` is any name of your choice.
1. Install your worker in editable mode: `pip install -e .`
[base-worker]: https://gitlab.com/teklia/workers/base-worker/
[black]: https://github.com/psf/black
[cookiecutter]: https://cookiecutter.readthedocs.io/
[pre-commit]: https://pre-commit.com/
[tox]: https://tox.readthedocs.io/
[virtualenvwrapper]: https://virtualenvwrapper.readthedocs.io
[virtualenvwrapper-setup]: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
# Workers
Arkindex has a powerful system to run asynchronous tasks. Those are based on
Docker images and can do almost anything: machine learning processing, but also
importing data into Arkindex or exporting it to another system or file format.
This section consists of the following guides:
## Contents
* [Setting up a new worker](create)
* [Running your worker locally](run-local)
* [Maintaining a worker](maintenance)
* [GitLab CI for workers](ci/index)
* [YAML configuration](yaml)
* [Template structure](template-structure)
# Maintaining a worker
This page guides you through common tasks when maintaining an Arkindex
worker.
## Updating the template
To apply the changes we make to our [official template][base-worker] to your
worker, you will need to re-apply the template to the worker and resolve any
conflicts that may arise.
### To update the template
1. Run the following command:
```
cookiecutter base-worker -f --config-file YOURFILE.yaml --no-input
```
Where `YOURFILE.yaml` is the path to the YAML file holding your template options, such as the `.cookiecutter.yaml` file stored at the root of your worker.
1. Answer `yes` when Cookiecutter requests confirmation to delete and
re-download the template.
1. Cookiecutter overwrites existing files, so review its changes using the Git
diff and resolve any conflicts yourself.
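
If you maintain several workers, you may want to script this update. The sketch below only wraps the command above with `subprocess` and then prints a `git diff --stat` summary. It assumes you run it from the directory that contains your worker folder, because Cookiecutter writes to the current directory by default. `update_template_command` is a hypothetical name. `-f` is Cookiecutter's short form of `--overwrite-if-exists`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def update_template_command(config_file: Path, template: str = "base-worker") -> List[str]:
    """Build the Cookiecutter command that re-applies the template without prompts."""
    return [
        "cookiecutter",
        template,
        "-f",  # overwrite the existing worker directory
        "--config-file",
        str(config_file),
        "--no-input",
    ]


if __name__ == "__main__":
    config_file = Path(sys.argv[1])  # e.g. worker-demo/.cookiecutter.yaml
    subprocess.run(update_template_command(config_file), check=True)
    # Summarize what the template overwrote, from inside the worker's Git repository
    subprocess.run(["git", "diff", "--stat"], cwd=config_file.parent, check=True)
```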
[base-worker]: https://gitlab.com/teklia/workers/base-worker/
# Running your worker locally
Once you have implemented a worker, you can run it on some Arkindex elements
on your own machine to test it.
## Retrieving credentials
For a worker to run properly, you will need two types of credentials:
- An API token that gives the worker access to the API
- A worker version ID that lets the worker send results to Arkindex and report
that those come from this particular worker version
### Retrieving a token
For the worker to run, you will need an Arkindex authentication token.
You can use your own account's token when testing on your own machine.
You can retrieve your personal API token from your [profile page](https://doc.arkindex.org/users/auth/index.md#personal-token).
### Retrieving a worker version ID
A worker version ID will be required in order to publish results. If your worker
does not create any Arkindex element, classification, transcription, etc., you
may skip this step.
If this particular worker was already configured on this instance, you can use
its existing worker version ID; otherwise, you will need to ask an Arkindex
administrator to create a fake worker version.
#### To retrieve a worker version ID from an existing worker
1. Open a web browser and browse to the Arkindex instance.
2. In the top-right user menu, click on **My repositories**.
3. Click on your worker, listed in the **Workers** column.
4. Rewrite the URL in your browser's address bar to look like
`https://<arkindex_url>/api/v1/workers/<worker_id>/versions/`:
- Replace `process` with `api/v1`
- Add a slash character (`/`) at the end
In the JSON output from this API endpoint, the first value next to `"id"` is
the worker version ID.
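
You can also query the same endpoint from Python instead of rewriting URLs by hand. This sketch rests on two assumptions that have not been checked against the Arkindex API reference. First, the endpoint returns a paginated JSON object with a `results` list. Second, it accepts an `Authorization: Token <token>` header built from your personal token. The function names are hypothetical.

```python
import json
import urllib.request


def worker_versions_url(arkindex_url: str, worker_id: str) -> str:
    """Build the API URL described above, including its trailing slash."""
    return f"{arkindex_url.rstrip('/')}/api/v1/workers/{worker_id}/versions/"


def first_version_id(payload: dict) -> str:
    """Return the ID of the first worker version in the API response."""
    # Assumption: the response looks like {"results": [{"id": ...}, ...], ...}
    results = payload.get("results", [])
    if not results:
        raise ValueError("This worker has no versions")
    return results[0]["id"]


def fetch_first_version_id(arkindex_url: str, worker_id: str, token: str) -> str:
    """Query the endpoint and extract the first worker version ID."""
    request = urllib.request.Request(
        worker_versions_url(arkindex_url, worker_id),
        # Assumption: token authentication uses the "Token <token>" scheme
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return first_version_id(json.load(response))
```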
#### To create a fake worker as an administrator
This action can only be done as an Arkindex administrator with shell access.
1. In the backend's Docker image, run:
```
arkindex fake_worker_version --name <NAME> --slug <SLUG> --url <URL>
```
Replace `<NAME>`, `<SLUG>` and `<URL>` with the name, slug and GitLab
repository URL, respectively.
A Git repository is created with a fake OAuth access token. A fake Git revision
is added to this repository, and a fake worker version from a fake worker is
linked to this revision. You should get the following output:
```
Created a worker version: 392bd299-bc8f-4ec6-aa3c-e6503ecc7730
```
> **Warning:** This feature should only be used when a normal worker cannot be created using the Git workflow.
## Setting credentials
In a shell, you need to set three environment variables to pass your credentials
and Arkindex instance information to the worker:
`ARKINDEX_API_URL`
: URL that points to the root of the Arkindex instance you are using.
`ARKINDEX_API_TOKEN`
: The API token you retrieved earlier, on your profile page.
`WORKER_VERSION_ID`
: The worker version ID you retrieved earlier. Can be omitted if the worker does
not create new data in Arkindex.
### To set credentials for your worker
1. In a shell, run:
```sh
export ARKINDEX_API_URL="https://arkindex.teklia.com"
export ARKINDEX_API_TOKEN="YOUR_TOKEN_HERE"
export WORKER_VERSION_ID="xxxxx"
```
> **Warning:** Do not add these instructions to a script such as `.bashrc`;
> this would mean storing credentials in plaintext and can lead to security
> breaches.
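
The worker reads these variables from its environment. The hypothetical pre-flight check below is not part of base-worker. It shows which variables are required and which is optional, so a forgotten `export` surfaces before launching a long run. The `Credentials` and `read_credentials` names are illustrative.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Credentials(NamedTuple):
    api_url: str
    api_token: str
    worker_version_id: Optional[str]


def read_credentials(environ: Mapping[str, str] = os.environ) -> Credentials:
    """Read the variables above, failing early if a required one is missing or empty."""
    required = ("ARKINDEX_API_URL", "ARKINDEX_API_TOKEN")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Credentials(
        api_url=environ["ARKINDEX_API_URL"],
        api_token=environ["ARKINDEX_API_TOKEN"],
        # Optional: only needed when the worker creates data in Arkindex
        worker_version_id=environ.get("WORKER_VERSION_ID") or None,
    )
```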
## Running your worker
With the credentials configured, you can now run your worker.
You will need a list of element IDs to run your worker on, which can be found
in the browser's address bar when browsing an element on Arkindex.
### To run your worker
1. Activate the Python environment: run `workon X` where `X` is the name of
your Python environment.
2. Run `worker-X`, where `X` is the slug of your worker, followed by
`--element=Y` where `Y` is the ID of an element. You can repeat `--element`
as many times as you need to process multiple elements.
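
When you process many elements, building the command line programmatically saves typing each `--element` flag. This is a minimal sketch using a hypothetical `worker_command` helper. It assumes the Python environment is already activated and the credentials above are exported, since a child process inherits the current environment. The slug and element ID in the example are placeholders.

```python
import subprocess
from typing import List


def worker_command(slug: str, element_ids: List[str]) -> List[str]:
    """Build the worker command line, repeating --element once per element ID."""
    if not element_ids:
        raise ValueError("At least one element ID is required")
    return [f"worker-{slug}"] + [f"--element={element_id}" for element_id in element_ids]


if __name__ == "__main__":
    # Placeholder slug and element ID: replace them with your own values
    command = worker_command("demo", ["11111111-1111-1111-1111-111111111111"])
    # The exported credentials are inherited from the current environment
    subprocess.run(command, check=True)
```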
# Template structure
When building a new worker from our [official template][base-worker], a file
structure gets created for you to ease the burden of setting up a Python
package and a Docker build, following development best practices:
`.arkindex.yml`
: YAML configuration file that allows Arkindex to understand what it should do
with this repository.
To learn more about this file, see [YAML configuration](yaml.md).
`.cookiecutter.yaml`
: YAML file that stores the options you defined when creating a new worker.
This file can be reused to [fetch template updates][template-updates].
`.dockerignore`
: Lists which files to exclude from the Docker build context.
For more information, see the [Docker documentation][dockerignore].
`.flake8`
: Specifies configuration options for the Flake8 linter.
For more information, see the [Flake8 documentation][flake8].
`.gitignore`
: Lists which files to exclude from Git versioning.
For more information, see the [Git docs][gitignore].
`.gitlab-ci.yml`
: Configures the GitLab CI jobs and pipelines.
To learn more about the configuration we provide, see
[GitLab Continuous Integration for workers](ci/index).
`.isort.cfg`
: Configures the automatic Python import sorting rules.
For more information, see the [isort docs][isort].
`.pre-commit-config.yaml`
: Configures the [pre-commit hook](create#activating-the-pre-commit-hook).
`Dockerfile`
: Specifies how the Docker image will be built.
You can change the instructions in this file to update the image to the needs
of your worker, for example to install system dependencies.
`requirements.txt`
: Lists the Python dependencies your worker relies on. Those are automatically
installed by the default Dockerfile.
`tox.ini`
: Configures the Python unit test runner.
For more information, see the [tox docs][tox].
`setup.py`
: Configures the worker's Python package.
`VERSION`
: Official version number of your worker. Defaults to `0.1.0`.
`ci/build.sh`
: Script that gets run by [CI](ci/index) pipelines
to build the Docker image.
`tests/test_worker.py`
: An example unit test file.
<!--
TODO: For more information, see [Writing tests for your worker](tests).
-->
`worker_[slug]/__init__.py`
: Declares the folder as a Python package.
`worker_[slug]/worker.py`
: The core part of the worker. This is where you can write code that processes
Arkindex elements.
<!-- TODO:
For more information, see
[Implementing a Machine Learning worker](implement.md).
-->
[base-worker]: https://gitlab.com/teklia/workers/base-worker/
[dockerignore]: https://docs.docker.com/engine/reference/builder/#dockerignore-file
[flake8]: https://flake8.pycqa.org/en/latest/user/configuration.html
[gitignore]: https://git-scm.com/docs/gitignore
[isort]: https://pycqa.github.io/isort/docs/configuration/config_files/
[template-updates]: maintenance#updating-the-template
[tox]: https://tox.readthedocs.io/en/latest/config.html
# YAML configuration
This page is a reference for version 2 of the YAML configuration file for
Git repositories handled by Arkindex. Version 1 is not supported.
The configuration file is always named `.arkindex.yml` and should be found at
the root of the repository.
## Required attributes
The following attributes are required in every `.arkindex.yml` file:
`version`
: Version of the configuration file in use. An error will occur if the version
number is not set to `2`.
`type`
: Type of the repository. Has to be set to `worker` for a repository holding Arkindex
workers.
### Example configuration
```yaml
---
version: 2
type: worker
workers:
- workers/config.yml
```
This loads the worker configuration from `workers/config.yml`, a path relative
to the root of the repository.
## Worker repository attributes
When the `type` is set to `worker`, the `workers` attribute is mandatory.
The `workers` attribute is a list of the following:
- Paths to a YAML file holding the configuration for a single worker
- Unix-style patterns matching paths to YAML files holding the configuration
for a single worker
- The configuration of a single worker embedded directly into the file
### Single worker configuration
The following describes the attributes of a YAML file configuring one worker, or
of the configuration embedded directly in the `.arkindex.yml` file.
All attributes are optional unless explicitly specified.
`name`
: Mandatory. Name of the worker, for display purposes.
`slug`
: Mandatory. Slug of this worker. The slug must be unique across the repository and must only hold alphanumeric characters, underscores or dashes.
`type`
: Mandatory. Type of the worker, for display purposes only. Some common values
include:
- `classifier`
- `recognizer`
- `ner`
- `dla`
- `word-segmenter`
- `paragraph-creator`
`docker`
: Groups Docker-related configuration attributes:
<!--
TODO: Make the path relative to the YAML file itself, in the case of a
separate file for a single worker?
https://gitlab.com/teklia/arkindex/tasks/-/issues/95
-->
<!--
TODO: Implement this!
https://gitlab.com/teklia/arkindex/tasks/-/issues/93
`image`: Tag of an existing Docker image to use for this worker instead of building a
custom image from a Dockerfile.
-->
- `build`
: Path to the Dockerfile used to build this worker's image, relative to the root of
the repository. Defaults to `Dockerfile`.
- `command`
: Custom command line used when launching the Docker container for
this worker. By default, the command specified in the Dockerfile is used.
- `environment`
: Mapping of string keys to string values, defining the environment variables to
set when the Docker container runs.
`configuration`
: Mapping of string keys to values that can later be accessed in the
worker's Python code. Use it to define settings for your own worker, such as
the location of a file.
`user_configuration`
: Mapping defining settings on your worker that can be modified by users. [See below](#setting-up-user-configurable-parameters) for details.
`secrets`
: List of the names of the secrets required by this worker. To learn how to use secrets in workers, see the official Arkindex [documentation](https://doc.arkindex.org/secrets).
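The rules above for `name`, `slug`, `type` and `docker` can be sketched in Python as follows. `check_worker` is a hypothetical helper that operates on an already parsed worker configuration and is not part of Arkindex. Reading "alphanumeric" as ASCII letters and digits is an assumption.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits
SLUG_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_worker(worker: dict, seen_slugs: set) -> dict:
    """Validate one worker configuration and return its Docker settings.

    `seen_slugs` collects the slugs already used in the repository, so that
    duplicates are detected across successive calls.
    """
    for attribute in ("name", "slug", "type"):
        if not worker.get(attribute):
            raise ValueError(f"Missing mandatory attribute `{attribute}`")

    slug = worker["slug"]
    if not isinstance(slug, str) or not SLUG_PATTERN.fullmatch(slug):
        raise ValueError(f"Invalid slug {slug!r}")
    if slug in seen_slugs:
        raise ValueError(f"Duplicate slug {slug!r}")

    docker = worker.get("docker") or {}
    environment = docker.get("environment") or {}
    # Environment variables must map strings to strings
    for key, value in environment.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(f"Environment variable {key!r} must map a string to a string")

    # Only record the slug once the whole worker is known to be valid
    seen_slugs.add(slug)
    return {
        "build": docker.get("build", "Dockerfile"),
        # None means the command from the Dockerfile is used
        "command": docker.get("command"),
        "environment": dict(environment),
    }
```

YAML turns unquoted values such as `8080` or `true` into numbers or booleans. Quote environment values in your YAML file to keep them as strings.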
### Setting up user-configurable parameters
The YAML file can define parameters that users will be able to change when they use this worker in a process on Arkindex. These parameters are listed in a `user_configuration` attribute.
A parameter is defined using the following settings:
`title`
: Mandatory. The parameter's title.
`type`
: Mandatory. A value type. The supported types are:
- `int`
- `bool`
- `float`
- `string`
- `enum`
- `dict`
`default`
: Optional. A default value for the parameter. Must be of the defined parameter `type`.
`required`
: Optional. A boolean, defaults to `false`.
`choices`
: Mandatory for `enum` type parameters, unused otherwise. The list of options users can choose from, holding at least two items.
This definition allows for both validation of the input and the display of a form to make configuring workers easy for Arkindex users.
![User configuration](user_configuration/configuration_form.png "User configuration form on Arkindex")
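As a sketch of the validation mentioned above, the hypothetical `normalize_parameter` function below checks the two mandatory settings and fills in the default value of `required`. Per-type rules are handled separately. This is an illustration, not Arkindex's own validation code.

```python
def normalize_parameter(name: str, definition: dict) -> dict:
    """Check one `user_configuration` entry and return a copy with `required` set."""
    # `title` and `type` are mandatory for every parameter
    for setting in ("title", "type"):
        if setting not in definition:
            raise ValueError(f"Parameter {name!r} is missing the mandatory `{setting}` setting")
    # `required` is an optional boolean that defaults to false
    required = definition.get("required", False)
    if not isinstance(required, bool):
        raise ValueError(f"`required` must be a boolean for parameter {name!r}")
    return {**definition, "required": required}
```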
#### String parameters
String-type parameters must be defined using a `title` and the `string` `type`. You can also set a `default` value for this parameter, which must be a string, as well as make it a `required` parameter, which prevents users from leaving it blank.
For example, a string-type parameter can be defined like this:
```yaml
subfolder_name:
  title: Created Subfolder Name
  type: string
  default: My Neat Subfolder
```
This results in the following display for the user:
![String-type parameter](user_configuration/string_config.png "Example string-type parameter.")
#### Integer parameters
Integer-type parameters must be defined using a `title` and the `int` `type`. You can also set a `default` value for this parameter, which must be an integer, as well as make it a `required` parameter, which prevents users from leaving it blank.
For example, an integer-type parameter can be defined like this:
```yaml
input_size:
  title: Input Size
  type: int
  default: 768
  required: True
```
This results in the following display for the user:
![integer-type parameter](user_configuration/integer_config.png "Example integer-type parameter.")
#### Float parameters
Float-type parameters must be defined using a `title` and the `float` `type`. You can also set a `default` value for this parameter, which must be a float, as well as make it a `required` parameter, which prevents users from leaving it blank.
For example, a float-type parameter can be defined like this:
```yaml
wip:
  title: Word Insertion Penalty
  type: float
  required: True
```
This results in the following display for the user:
![Float-type parameter](user_configuration/float_config.png "Example float-type parameter.")
#### Boolean parameters
Boolean-type parameters must be defined using a `title` and the `bool` `type`. You can also set a `default` value for this parameter, which must be a boolean, as well as make it a `required` parameter, which prevents users from leaving it blank.
In the configuration form, boolean parameters are displayed as toggles.
For example, a boolean-type parameter can be defined like this:
```yaml
score:
  title: Run Worker in Evaluation Mode
  type: bool
  default: False
```
This results in the following display for the user:
![Boolean-type parameter](user_configuration/bool_config.png "Example boolean-type parameter.")
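The string, integer, float and boolean parameters above share one rule: the `default` must match the declared `type`. The following sketch shows one way to check it. The helper name is hypothetical. Accepting integers such as `1` as float defaults is an assumption, made because YAML parses `1` as an integer. Python's `bool` is a subclass of `int`, which is why booleans have to be rejected explicitly for `int`.

```python
def default_matches_type(param_type: str, value) -> bool:
    """Return True if `value` is a valid default for a scalar parameter type."""
    if param_type == "string":
        return isinstance(value, str)
    if param_type == "bool":
        return isinstance(value, bool)
    if param_type == "int":
        # bool is a subclass of int in Python, so True/False must be excluded
        return isinstance(value, int) and not isinstance(value, bool)
    if param_type == "float":
        # Assumption: integer values such as 1 are accepted for float parameters
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"Type not handled by this sketch: {param_type}")
```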
#### Enum (choices) parameters
Enum-type parameters must be defined using a `title`, the `enum` `type` and at least two `choices`. You cannot define an enum-type parameter without `choices`. You can also set a `default` value for this parameter, which must be one of the available `choices`, as well as make it a `required` parameter, which prevents users from leaving it blank. Enum-type parameters should be used when you want to limit the users to a given set of options.
In the configuration form, enum parameters are displayed as selects.
For example, an enum-type parameter can be defined like this:
```yaml
parent_type:
  title: Target Parent Element Type
  type: enum
  default: paragraph
  choices:
    - paragraph
    - text_zone
    - page
```
This results in the following display for the user:
![Enum-type parameter](user_configuration/enum_config.png "Example enum-type parameter.")
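The enum-specific rules can be sketched as follows: there must be at least two `choices`, and the `default`, if set, must be one of them. `check_enum_parameter` is a hypothetical helper, not part of Arkindex.

```python
def check_enum_parameter(name: str, definition: dict) -> None:
    """Raise ValueError if an enum parameter definition breaks the rules above."""
    choices = definition.get("choices")
    # Enum parameters cannot be defined without at least two choices
    if not isinstance(choices, list) or len(choices) < 2:
        raise ValueError(f"Enum parameter {name!r} needs at least two `choices`")
    # The default, when set, must be one of the available choices
    if "default" in definition and definition["default"] not in choices:
        raise ValueError(f"Default of {name!r} is not one of its `choices`")
```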
#### Dictionary parameters
Dictionary-type parameters must be defined using a `title` and the `dict` `type`. You can also set a `default` value for this parameter, which must be a dictionary, as well as make it a `required` parameter, which prevents users from leaving it blank. For example, you can use dictionary parameters to specify a correspondence between the classes predicted by a worker and the elements created on Arkindex from these predictions.
Dictionary-type parameters only accept strings as values.
In the configuration form, dictionary parameters are displayed as a table with one column for keys and one column for values.
For example, a dictionary-type parameter can be defined like this:
```yaml
classes:
  title: Output Classes to Elements Correspondence
  type: dict
  default:
    a: page
    b: text_line
```
This results in the following display for the user:
![Dictionary-type parameter](user_configuration/dict_config.png "Example dictionary-type parameter.")
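The following hypothetical sketch checks the dictionary rules: the value must be a mapping whose values are all strings. It then shows how a worker might use such a mapping to pick an element type for a predicted class. Neither `check_dict_value` nor the lookup is Arkindex code.

```python
def check_dict_value(name: str, value) -> None:
    """Raise ValueError unless `value` is a mapping with string values only."""
    if not isinstance(value, dict):
        raise ValueError(f"Parameter {name!r} must be a dictionary")
    for key, item in value.items():
        # Dictionary-type parameters only accept strings as values
        if not isinstance(item, str):
            raise ValueError(f"Value for key {key!r} of {name!r} must be a string")


# Usage: map a predicted class to the type of element to create
classes = {"a": "page", "b": "text_line"}
check_dict_value("classes", classes)
element_type = classes.get("b")
```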
#### Example user_configuration
```yaml
user_configuration:
  vertical_padding:
    type: int
    default: 0
    title: Vertical Padding
  element_base_name:
    type: string
    required: true
    title: Element Base Name
  create_confidence_metadata:
    type: bool
    default: false
    title: Create confidence metadata on elements
  some_other_parameter:
    type: enum
    required: true
    default: 23
    choices:
      - 12
      - 23
      - 56
    title: Another Parameter
```
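The sketch below shows one plausible way to combine this definition with the values a user entered. It makes two assumptions: user values override defaults, and a missing or blank value for a `required` parameter without a default is an error. `effective_configuration` is a hypothetical helper for illustration only. It is not how Arkindex actually computes a process's configuration.

```python
def effective_configuration(user_configuration: dict, user_values: dict) -> dict:
    """Merge user-provided values over the defaults declared in `user_configuration`."""
    result = {}
    for name, definition in user_configuration.items():
        value = user_values.get(name)
        if value is not None and value != "":
            # A value entered by the user takes precedence over the default
            result[name] = value
        elif "default" in definition:
            result[name] = definition["default"]
        elif definition.get("required", False):
            raise ValueError(f"Missing value for required parameter {name!r}")
        # Optional parameters without a default or a user value are left out
    return result
```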
#### Fallback to free JSON input
If you have defined user-configurable parameters using these specifications, Arkindex users can choose between the form and the free JSON input field by switching the **JSON** toggle. If the defined `user_configuration` contains unsupported parameter types, the frontend automatically falls back to the free JSON input field. The same is true if you have not defined user-configurable parameters using these specifications.
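The fallback rule described above can be sketched as a single predicate. `can_use_form` is a hypothetical name. Treating an empty or missing `user_configuration` as a case for the JSON field is an assumption.

```python
# Parameter types supported by the configuration form
SUPPORTED_TYPES = ("int", "bool", "float", "string", "enum", "dict")


def can_use_form(user_configuration) -> bool:
    """Return True if the form can be shown, False to fall back to free JSON input."""
    # Nothing defined with these specifications: use the JSON input field
    if not isinstance(user_configuration, dict) or not user_configuration:
        return False
    # Any unsupported or malformed parameter triggers the JSON fallback
    return all(
        isinstance(definition, dict) and definition.get("type") in SUPPORTED_TYPES
        for definition in user_configuration.values()
    )
```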
### Example configuration
```yaml
---
version: 2
type: worker
workers:
  # Path to a single YAML file
  - path/to/worker.yml
  # Pattern matching any YAML file in the configuration folder
  # or in its sub-directories
  - configuration/**/*.yml
  # Configuration embedded directly into this file
  - name: Book of hours
    slug: book_of_hours
    type: classifier
    docker:
      build: project/Dockerfile
      image: hub.docker.com/project/image:tag
      command: python mysuperscript.py --blabla
      environment:
        TOKEN: deadBeefToken
    configuration:
      model: path/to/model
      anyKey: anyValue
      classes: [X, Y, Z]
    user_configuration:
      vertical_padding:
        type: int
        default: 0
        title: Vertical Padding
    secrets:
      - path/to/secret.json
```