Backend for Historical Manuscripts Indexing
===========================================

[![pipeline status](https://gitlab.com/teklia/arkindex/backend/badges/master/pipeline.svg)](https://gitlab.com/teklia/arkindex/backend/commits/master)

## Requirements

* Clone of the [architecture](https://gitlab.com/teklia/arkindex/architecture)
* Git
* Make
* Python 3.6+
* pip
* [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/)

## Dev Setup

```
git clone git@gitlab.com:teklia/arkindex/backend.git
cd backend
mkvirtualenv ark -a .
pip install -e .[test]
```

When the [architecture](https://gitlab.com/teklia/arkindex/architecture) is running locally to provide required services:

```
arkindex/manage.py migrate
arkindex/manage.py createsuperuser
```

### Local configuration

For development purposes, you can customize the Arkindex settings by adding a YAML file as `arkindex/config.yml`. This file is not tracked by Git; if it exists, any configuration directive set in this file will be used for exposed settings from `settings.py`. You can view the full list of settings [on the wiki](https://wiki.vpn/en/arkindex/deploy/configuration).
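
As a rough illustration of how file-based overrides take precedence over defaults, here is a minimal sketch. It is not the loader used by `settings.py`: the `apply_overrides` helper and the default values in `DEFAULTS` are assumptions made for this example, and the `overrides` dict stands in for an already parsed `config.yml`.

```python
from copy import deepcopy

# Hypothetical defaults; the real list of exposed settings is on the wiki.
DEFAULTS = {
    "local_imageserver_id": 1,
    "gitlab": {"app_id": None, "app_secret": None},
}


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` where values from `overrides` take precedence.

    Nested mappings are merged key by key; any other value is replaced.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for a parsed config.yml that only sets `gitlab.app_id`.
overrides = {"gitlab": {"app_id": "my-app-id"}}
settings = apply_overrides(DEFAULTS, overrides)
```

Directives missing from the file keep their default value, which is why a `config.yml` only needs to list what you want to change.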


Another way to customize your Arkindex instance is to add a Python file in `arkindex/project/local_settings.py`. Here you are not limited to exposed settings, and can customize any setting, or even load Python dependencies at boot time. This is not recommended, as your customization may not be available to real-world Arkindex instances.

### ImageMagick setup

PDF and image imports in Arkindex require ImageMagick. Because it can take any computer down when given the right parameters (for example, converting a 1000-page PDF file into JPEG files at 30 000 DPI), ImageMagick has a security policy file. By default, on Ubuntu, this policy forbids PDF conversion.

You will need to edit the ImageMagick policy file to get PDF and image imports to work in Arkindex. The file is located at `/etc/ImageMagick-6/policy.xml`.

The line that sets the PDF policy is `<policy domain="coder" rights="none" pattern="PDF" />`. Replace `none` with `read|write` for it to work. See [this StackOverflow question](https://stackoverflow.com/questions/52998331) for more info.
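
If you prefer to script this change, the replacement can be sketched in Python as below. This is only an illustration working on a string: the `allow_pdf` helper is a name made up for this example, and it assumes the policy line has the exact attribute order shown above.

```python
import re

# Matches the PDF coder policy line and captures the text around the rights value.
PDF_POLICY = re.compile(
    r'(<policy\s+domain="coder"\s+rights=")none("\s+pattern="PDF"\s*/>)'
)


def allow_pdf(policy_xml: str) -> str:
    """Return the policy text with the PDF coder rights set to read|write."""
    return PDF_POLICY.sub(r"\g<1>read|write\g<2>", policy_xml)


sample = (
    "<policymap>\n"
    '  <policy domain="coder" rights="none" pattern="PDF" />\n'
    "</policymap>\n"
)
patched = allow_pdf(sample)
```

To apply it to the real file, you would read `/etc/ImageMagick-6/policy.xml`, pass its content through `allow_pdf` and write the result back, which requires root privileges. Keeping a backup of the original file is a good idea.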

### GitLab OAuth setup

Arkindex uses OAuth to let a user connect their GitLab account(s) and register Git repositories. In local development, you will need to register Arkindex as a GitLab OAuth application for it to work.

Go to GitLab's [Applications settings](https://gitlab.com/profile/applications) and create a new application with the `api` scope and add the following callback URIs:

```
http://127.0.0.1:8000/api/v1/oauth/providers/gitlab/callback/
http://ark.localhost:8000/api/v1/oauth/providers/gitlab/callback/
https://ark.localhost/api/v1/oauth/providers/gitlab/callback/
```

Once the application is created, GitLab will provide you with an application ID and a secret. Use the `arkindex/config.yml` file to set them:

```yaml
gitlab:
  app_id: 24cacf5004bf68ae9daad19a5bba391d85ad1cb0b31366e89aec86fad0ab16cb
  app_secret: 9d96d9d5b1addd7e7e6119a23b1e5b5f68545312bfecb21d1cdc6af22b8628b8
```

### Local image server

Arkindex splits image URLs into an image server URL and an image path. For example, an IIIF server at `http://iiif.irht.cnrs.fr/iiif/` and an image at `/Paris/JJ042/1.jpg` would be represented as an ImageServer instance holding one Image.

Since Arkindex has a local IIIF server for image uploads and thumbnails, a special ImageServer instance is required to point to this local server. In local development, this server should be available at `https://ark.localhost/iiif`, so you will need to create an ImageServer with this URL via the Django admin or the Django shell. To set the local server ID, you can add a custom setting in `arkindex/config.yml`:

```yaml
local_imageserver_id: 999
```

Here is how to quickly create the ImageServer using the shell. The `id` must match the `local_imageserver_id` value set above:

```
backend/arkindex$ ./manage.py shell
>>> from arkindex.images.models import ImageServer
>>> ImageServer.objects.create(id=999, display_name='local', url='https://ark.localhost/iiif')
```

Note that this local server will only work inside Docker.
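
To make the server/path split described at the start of this section more concrete, here is a small standalone sketch. It does not use the Arkindex models: `split_image_url` and `join_image_url` are hypothetical helpers written for this example only.

```python
def split_image_url(url: str, server_url: str) -> str:
    """Return the image path of `url` relative to `server_url`.

    Raises ValueError when the image is not hosted on that server.
    """
    base = server_url.rstrip("/")
    if not url.startswith(base + "/"):
        raise ValueError(f"{url} is not served by {server_url}")
    return url[len(base):]


def join_image_url(server_url: str, path: str) -> str:
    """Rebuild the full image URL from a server URL and an image path."""
    return server_url.rstrip("/") + "/" + path.lstrip("/")


server = "http://iiif.irht.cnrs.fr/iiif/"
path = split_image_url("http://iiif.irht.cnrs.fr/iiif/Paris/JJ042/1.jpg", server)
```

With the example from this section, `path` is `/Paris/JJ042/1.jpg`, and joining it back with the server URL gives the original image URL.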

### User groups

We use a custom group model in `arkindex.users.models` (not the `django.contrib.auth` one).
In this early version, groups do not grant any rights yet.

## Usage

### Makefile

At the root of the repository is a Makefile that provides commands for common operations:

* `make` or `make all`: Clean and build;
* `make base`: Create and push the `arkindex-base` Docker image that is used to build the `arkindex-app` image;
* `make clean`: Cleanup the Python package build and cache files;
* `make build`: Build the arkindex Python package and recreate the `arkindex-app:latest` image without pushing to the GitLab container registry;
* `make test-fixtures`: Create the unit tests fixtures on a temporary PostgreSQL database and save them to the `data.json` file used by most Django unit tests.

### Django commands

Aside from the usual Django commands, some custom commands are available via `manage.py`:

* `build_fixtures`: Create a set of database elements designed for use by unit tests in a fixture (see `make test-fixtures`);
* `from_csv`: Import manifests and index files from a CSV list;
* `import_annotations`: Import index files from a folder into a specific volume;
* `import_acts`: Import XML surface files and CSV act files;
* `delete_corpus`: Delete a big corpus using a Ponos task;
* `reindex`: Reindex elements into Solr;
* `telegraf`: A special command with InfluxDB-compatible output for Grafana statistics;
* `move_lines_to_parents`: Move element children to their geographical parents.

See `manage.py <command> --help` to view more details about a specific command.

## Code validation

Once your code appears to be working on a local server, a few checks have to be performed:

* **Migrations:** Ensure that all migrations have been created by typing `./manage.py makemigrations`.
* **Unit tests:** Run `./manage.py test` to perform unit tests.
   - Use `./manage.py test module_name` to perform tests on a single module, if you wish to spend less time waiting for all tests to complete.

### Linting

We use [pre-commit](https://pre-commit.com/) to check the Python source code syntax of this project.

To be efficient, you should run pre-commit before committing (hence the name...).

To do that, run the following once:

```
pip install pre-commit
pre-commit install
```

The linting workflow will now run on modified files before committing, and may fix issues for you.

If you want to run the full workflow on all the files: `pre-commit run -a`.

## Debugging tools

Run `pip install ipython django-debug-toolbar django_extensions` to install all the available optional dev tools for the backend.

IPython will give you a nicer shell with syntax highlighting, auto reloading and much more via `./manage.py shell`.

[Django Debug Toolbar](https://django-debug-toolbar.readthedocs.io/en/latest/) provides you with a neat debug sidebar that will help diagnose slow API endpoints or weird template bugs. Since the Arkindex frontend is completely decoupled from the backend, you will need to browse to an API endpoint to see the debug toolbar.

[Django Extensions](https://django-extensions.readthedocs.io/en/latest/) adds a *lot* of `manage.py` commands; the most important one is `./manage.py shell_plus`, which runs the usual shell but with all the available models pre-imported. You can add your own imports with the `local_settings.py` file. Here is an example that imports most of the backend's enums and some special QuerySet features:

``` python
SHELL_PLUS_POST_IMPORTS = [
    ('django.db.models', ('Value', )),
    ('django.db.models.functions', '*'),
    ('arkindex.documents.models', (
        'ElementType',
        'Right',
        'PageType',
        'PageDirection',
        'PageComplement',
    )),
    ('arkindex.dataimport.models', (
        'DataImportMode',
    )),
    ('arkindex.project.aws', (
        'S3FileStatus',
    )),
    ('arkindex.users.models', (
        'OAuthStatus',
    ))
]
```

You may also want to uninstall `django-nose`, as it is an optional test runner that is only used for code coverage in the CI. Uninstalling it will remove about a hundred useless lines from the `./manage.py test` output, so you will no longer have to scroll to reach the list of test errors.

## Asynchronous tasks

We use [rq](https://python-rq.org/), integrated via [django-rq](https://pypi.org/project/django-rq/), to run tasks without blocking an API request or causing timeouts. To call them in Python code, you should use the trigger methods in `arkindex.project.triggers`; those will do some safety checks to make catching some errors easier in dev. The actual tasks are in `arkindex.documents.tasks`. The following tasks exist:

* Delete a corpus: `corpus_delete`
* Delete a list of elements: `element_trash`
* Delete worker results (transcriptions, classifications, etc. of a worker version): `worker_results_delete`
* Move an element to another parent: `move_element`
* Create `WorkerActivity` instances for all elements of a process: `initialize_activity`
* Delete a process and its worker activities: `process_delete`
* Export a corpus to an SQLite database: `export_corpus`

To run them, use `make worker` to start an RQ worker. You will need to have Redis running; `make slim` or `make` in the architecture will provide it. `make` in the architecture also provides an RQ worker running in Docker from a binary build.
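
The trigger pattern mentioned above (check the arguments first, then enqueue the task) can be sketched as follows. This is a simplified illustration, not the code in `arkindex.project.triggers`: `FakeQueue`, `trigger_corpus_delete`, the `corpus_delete` signature and the UUID check are all assumptions made for this example, and the queue only records jobs instead of sending them to Redis.

```python
from uuid import UUID


class FakeQueue:
    """Stand-in for a task queue: records jobs instead of sending them to Redis."""

    def __init__(self):
        self.jobs = []

    def enqueue(self, func, **kwargs):
        self.jobs.append((func.__name__, kwargs))


def corpus_delete(corpus_id: str) -> None:
    """Placeholder task body; the real tasks live in arkindex.documents.tasks."""


def trigger_corpus_delete(queue, corpus_id) -> None:
    """Validate the argument up front, then enqueue the task."""
    if not isinstance(corpus_id, UUID):
        # Failing here raises in the caller rather than later inside the worker.
        raise TypeError(f"corpus_id must be a UUID, got {type(corpus_id).__name__}")
    # Pass a plain string so the job arguments are trivially serializable.
    queue.enqueue(corpus_delete, corpus_id=str(corpus_id))
```

The benefit of checking in the trigger is that a bad call fails immediately in the API request or shell session that made it, instead of surfacing later in the worker logs.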