# Arkindex Backend

[![pipeline status](https://gitlab.teklia.com/arkindex/backend/badges/master/pipeline.svg)](https://gitlab.teklia.com/arkindex/backend/commits/master)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

This project is the open-source backend of Arkindex, used to manage and process image documents with Machine Learning tools.

It is licensed under the [AGPL-v3 license](./LICENSE).

## Requirements

* Git
* Make
* Python 3.10+
* pip
* [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/)
* [Docker 24+](https://docs.docker.com/engine/install/#supported-platforms)
* [mkcert](https://github.com/FiloSottile/mkcert?tab=readme-ov-file#installation)
* [GeoDjango system dependencies](https://docs.djangoproject.com/en/3.1/ref/contrib/gis/install/geolibs/): `sudo apt install binutils libproj-dev gdal-bin`

## Setup for developers

You'll also need the [Arkindex frontend](https://gitlab.teklia.com/arkindex/frontend) to be able to develop on the whole platform.

```console
git clone git@gitlab.teklia.com:arkindex/backend.git
git clone git@gitlab.teklia.com:arkindex/frontend.git
cd backend
mkvirtualenv ark -a . -p /usr/bin/python3.10
pip install -e .[test]
```

The Arkindex backend relies on some open-source services to store data and communicate with asynchronous workers.
To run all the required services, please run in a dedicated shell:

```console
make services
```

On a first run, you'll need to:

1. Configure the instance by enabling the sample configuration.
2. Populate the database structure.
3. Initialize some fields in the database.
4. Create an administration account.

All of these steps are done through:

```console
cp config.yml.sample arkindex/config.yml
arkindex migrate
arkindex bootstrap
arkindex createsuperuser
```

Finally, you can run the backend:

```console
arkindex runserver
```

At this stage, you can use `http://localhost:8000/admin` to access the administration interface.

### Asynchronous tasks

To run asynchronous tasks, run in another shell:

```console
make worker
```

### Dockerized stack

It is also possible to run the whole Arkindex stack through Docker containers. This is useful to quickly test the platform.

This command will build all the required Docker images (backend & frontend) and run them as Docker containers:

```console
make stack
```

You'll be able to access the platform at the URL `https://ark.localhost`.

### Local configuration

For development purposes, you can customize the Arkindex settings by adding a YAML file as `arkindex/config.yml`. This file is not tracked by Git; if it exists, any configuration directive set in this file will be used for exposed settings from `settings.py`. You can view the full list of settings [on the wiki](https://redmine.teklia.com/projects/arkindex/wiki/Backend_configuration).

Another way to customize your Arkindex instance is to add a Python file in `arkindex/project/local_settings.py`. Here you are not limited to exposed settings, and can customize any setting, or even load Python dependencies at boot time. This is not recommended, as your customization may not be available to real-world Arkindex instances.

### Local image server

Arkindex splits image URLs into an image server and an image path. For example, a IIIF server at `http://iiif.irht.cnrs.fr/iiif/` and an image at `/Paris/JJ042/1.jpg` would be represented as an ImageServer instance holding one Image.

Since Arkindex has a local IIIF server for image uploads and thumbnails, a special instance of ImageServer is required to point to this local server. In local development, this server should be available at `https://ark.localhost/iiif`. You will therefore need to create an ImageServer via the Django admin or the Django shell with this URL. To set the local server ID, you can add a custom setting in `arkindex/config.yml`:

```yaml
local_imageserver_id: 999
```

Here is how to quickly create the ImageServer using the shell. The `id` passed here should match the `local_imageserver_id` value set in your configuration:

```python
$ arkindex shell
>>> from arkindex.images.models import ImageServer
>>> ImageServer.objects.create(id=1, display_name='local', url='https://ark.localhost/iiif')
```

Note that this local server will only work inside Docker.
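
To illustrate the server/path split described above, here is a minimal standalone sketch. The `split_image_url` helper and the `SplitImageURL` type are hypothetical names made up for this example; they are not part of the Arkindex codebase, and the backend's own matching logic may differ.

```python
from typing import NamedTuple, Optional


class SplitImageURL(NamedTuple):
    server_url: str  # base URL of the IIIF server
    path: str        # image path relative to that server


def split_image_url(url: str, server_urls: list[str]) -> Optional[SplitImageURL]:
    """Split a full image URL into (server URL, image path).

    Hypothetical helper: picks the longest known server URL that prefixes
    `url`, and returns None when no known server matches.
    """
    candidates = [s for s in server_urls if url.startswith(s.rstrip("/") + "/")]
    if not candidates:
        return None
    server = max(candidates, key=len)
    # Keep the leading slash on the path, as in the example above
    path = url[len(server.rstrip("/")):]
    return SplitImageURL(server, path)


servers = ["http://iiif.irht.cnrs.fr/iiif/", "https://ark.localhost/iiif"]
result = split_image_url("http://iiif.irht.cnrs.fr/iiif/Paris/JJ042/1.jpg", servers)
print(result)
```

With the URLs from this section, the image path comes out as `/Paris/JJ042/1.jpg` and the server as `http://iiif.irht.cnrs.fr/iiif/`.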

## Usage

At the root of the repository is a Makefile that provides commands for common operations:

* `make` or `make all`: Clean and build;
* `make base`: Create and push the `arkindex-base` Docker image that is used to build the `arkindex-app` image;
* `make clean`: Clean up the Python package build and cache files;
* `make clean-docker`: Delete all running containers to avoid naming and network port conflicts;
* `make build`: Build the arkindex Python package and recreate the `arkindex-app:latest` image without pushing to the GitLab container registry;
* `make test-fixtures`: Create the unit tests fixtures on a temporary PostgreSQL database and save them to the `data.json` file used by most Django unit tests.

### Django commands

Aside from the usual Django commands, some custom commands are available via `arkindex`:

* `build_fixtures`: Create a set of database elements designed for use by unit tests in a fixture (see `make test-fixtures`).
* `delete_corpus`: Delete a big corpus using an RQ task.
* `reindex`: Reindex elements into Solr.
* `move_lines_to_parents`: Move element children to their geographical parents.

See `arkindex <command> --help` to view more details about a specific command.

## Code validation

Once your code appears to be working on a local server, a few checks have to be performed:

* **Migrations:** Ensure that all migrations have been created by typing `arkindex makemigrations`.
* **Unit tests:** Run `arkindex test` to perform unit tests.
   - Use `arkindex test module_name` to perform tests on a single module, if you wish to spend less time waiting for all tests to complete.

### Linting

We use [pre-commit](https://pre-commit.com/) to check the Python source code syntax of this project.

To be efficient, you should run pre-commit before committing (hence the name...).

To do that, run once:
```console
pip install pre-commit
pre-commit install
```

The linting workflow will now run on modified files before committing, and may fix issues for you.

If you want to run the full workflow on all the files: `pre-commit run -a`.

Run `pip install ipython django-debug-toolbar django_extensions` to install all the available optional dev tools for the backend.

IPython will give you a nicer shell with syntax highlighting, auto reloading and much more via `arkindex shell`.

[Django Debug Toolbar](https://django-debug-toolbar.readthedocs.io/en/latest/) provides you with a neat debug sidebar that will help diagnose slow API endpoints or weird template bugs. Since the Arkindex frontend is completely decoupled from the backend, you will need to browse to an API endpoint to see the debug toolbar.

[Django Extensions](https://django-extensions.readthedocs.io/en/latest/) adds a *lot* of `arkindex` commands; the most important one is `arkindex shell_plus`, which runs the usual shell but with all the available models pre-imported. You can add your own imports with the `local_settings.py` file. Here is an example that imports some of the backend's enums and some special QuerySet features:
```python
SHELL_PLUS_POST_IMPORTS = [
    ('django.db.models', ('Value', )),
    ('django.db.models.functions', '*'),
    ('arkindex.documents.models', (
        'ElementType',
        'Right',
    )),
    ('arkindex.process.models', (
        'ProcessMode',
    )),
    ('arkindex.project.aws', (
        'S3FileStatus',
    )),
]
```

## Asynchronous tasks

We use [rq](https://python-rq.org/), integrated via [django-rq](https://pypi.org/project/django-rq/), to run tasks without blocking an API request or causing timeouts. To call them in Python code, you should use the trigger methods in `arkindex.project.triggers`; these perform some safety checks that make errors easier to catch during development. The actual tasks are in `arkindex.documents.tasks`, or in other `tasks` modules within each Django app. The following tasks exist:

* Delete a corpus: `corpus_delete`
* Delete a list of elements: `element_trash`
* Delete worker results (transcriptions, classifications, etc. of a worker version): `worker_results_delete`
* Move an element to another parent: `move_element`
* Create `WorkerActivity` instances for all elements of a process: `initialize_activity`
* Delete a process and its worker activities: `process_delete`
* Export a corpus to an SQLite database: `export_corpus`

To run them, use `make worker` to start an RQ worker. You will need to have Redis running; `make services` will provide it. `make stack` also provides an RQ worker running in Docker from a binary build.
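
The trigger pattern described above (check the arguments first, then enqueue the task) can be sketched without any dependency. Everything below is a hypothetical stand-in written for illustration: `InMemoryQueue` replaces a real RQ queue backed by Redis, and `corpus_delete_trigger` and the `corpus_delete` body are placeholders, not the backend's actual code or signatures.

```python
from collections import deque
from typing import Any, Callable
from uuid import UUID


class InMemoryQueue:
    """Hypothetical stand-in for an RQ queue: keeps jobs in memory instead of Redis."""

    def __init__(self) -> None:
        self.jobs = deque()

    def enqueue(self, func: Callable[..., Any], **kwargs: Any) -> None:
        self.jobs.append((func, kwargs))

    def work(self) -> list:
        """Run all pending jobs in order, as a worker process would."""
        results = []
        while self.jobs:
            func, kwargs = self.jobs.popleft()
            results.append(func(**kwargs))
        return results


def corpus_delete(corpus_id: str) -> str:
    """Placeholder task body (assumption); the real task lives in the backend."""
    return f"deleted corpus {corpus_id}"


def corpus_delete_trigger(queue: InMemoryQueue, corpus_id: str) -> None:
    """Validate the arguments early, then enqueue the task."""
    UUID(corpus_id)  # raises ValueError on a malformed ID, before anything is queued
    queue.enqueue(corpus_delete, corpus_id=corpus_id)


queue = InMemoryQueue()
corpus_delete_trigger(queue, "12345678-1234-5678-1234-567812345678")
results = queue.work()
print(results)
```

Failing in the trigger rather than in the worker means the error surfaces in the calling code, where it is easier to trace than in a worker log.
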
## Metrics

The application serves metrics for Prometheus under the `/metrics` prefix.
A specific port can be used by setting the `PROMETHEUS_METRICS_PORT` environment variable, thus separating the application from the metrics API.

## Migration from `architecture` setup

If you were previously using the `architecture` repository to run Arkindex, you'll need to migrate MinIO data from a static path on your computer to a new Docker volume.

```console
docker volume create arkindex_miniodata
mv /usr/share/arkindex/s3/data/iiif /var/lib/docker/volumes/arkindex_miniodata/_data/uploads
mv /usr/share/arkindex/s3/data/{export,iiif-cache,ponos-logs,ponos-artifacts,staging,thumbnails,training} /var/lib/docker/volumes/arkindex_miniodata/_data/
```

You will also need to set up [mkcert](https://github.com/FiloSottile/mkcert?tab=readme-ov-file#installation), as we no longer use the Teklia development Certificate Authority. `mkcert` will take care of SSL certificates automatically, updating your browsers and system certificate store.

Finally, you can remove the `architecture` project from your work folder, as it's now archived and could be confusing.