Backend for Historical Manuscripts Indexing
Requirements
- Clone of the architecture
- Git
- Make
- Python 3.6+
- pip
- virtualenvwrapper
Dev Setup
git clone git@gitlab.com:arkindex/backend.git
cd backend
mkvirtualenv ark -a .
pip install -e .[test]
Once the architecture is running locally and providing the required services, run:
arkindex/manage.py migrate
arkindex/manage.py createsuperuser
Local settings
For development purposes, you can customize the Arkindex settings in arkindex/project/local_settings.py. This file is not tracked by Git; if it exists, any setting defined in this file overrides the same setting from settings.py. This is useful for debugging and for configuring your instance to suit your workflow. Those settings are ignored when deploying to a server.
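As a hypothetical example, a minimal local_settings.py could simply turn on Django's standard DEBUG setting. Any module-level name assigned in this file replaces the value from settings.py; the value below is an illustrative assumption, not an Arkindex default.

```python
# arkindex/project/local_settings.py (not tracked by Git)
# Hypothetical example: every module-level name here overrides settings.py
DEBUG = True  # standard Django setting: detailed error pages, never use in production
```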
ImageMagick setup
PDF and image imports in Arkindex require ImageMagick. Because ImageMagick can take any computer down if you give it the right parameters (for example, converting a 1000-page PDF file into JPEG files at 30 000 DPI), it uses a security policy file. By default, on Ubuntu, PDF conversion is forbidden.
You will need to edit the ImageMagick policy file to get PDF and image imports to work in Arkindex. The file is located at /etc/ImageMagick-6/policy.xml.
The line that sets the PDF policy is <policy domain="coder" rights="none" pattern="PDF" />. Replace none with read|write for it to work. See this StackOverflow question for more info.
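If you prefer to script the change instead of editing the file by hand, here is a minimal sketch. The function name allow_pdf_policy is hypothetical, and it assumes the PDF line uses exactly the attribute order quoted above. It only rewrites that one line, so comments and other policies in the file are preserved.

```python
import re

# Matches the PDF coder policy line, capturing the text before and after "none"
PDF_POLICY = re.compile(
    r'(<policy\s+domain="coder"\s+rights=")none("\s+pattern="PDF"\s*/>)'
)


def allow_pdf_policy(policy_xml: str) -> str:
    """Return policy.xml contents with the PDF rights changed from none to read|write.

    Any other line of the file is returned unchanged.
    """
    return PDF_POLICY.sub(r"\1read|write\2", policy_xml)
```

To apply it, read /etc/ImageMagick-6/policy.xml, pass its contents through allow_pdf_policy and write the result back. Writing to that path requires root privileges, so keep a backup of the original file.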
GitLab OAuth setup
Arkindex uses OAuth to let a user connect their GitLab account(s) and register Git repositories. In local development, you will need to register Arkindex as a GitLab OAuth application for it to work.
Go to GitLab's Applications settings and create a new application with the api scope and the following callback URIs:
http://127.0.0.1:8000/api/v1/oauth/providers/gitlab/callback/
http://ark.localhost:8000/api/v1/oauth/providers/gitlab/callback/
https://ark.localhost/api/v1/oauth/providers/gitlab/callback/
Once the application is created, GitLab will provide you with an application ID and a secret. Set them in the local_settings.py file:
GITLAB_APP_ID = "24cacf5004bf68ae9daad19a5bba391d85ad1cb0b31366e89aec86fad0ab16cb"
GITLAB_APP_SECRET = "9d96d9d5b1addd7e7e6119a23b1e5b5f68545312bfecb21d1cdc6af22b8628b8"
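To see why the callback URIs have to be registered, here is a minimal sketch of the standard OAuth 2.0 authorization-code request that sends a user to GitLab. It is not taken from Arkindex's code: the function gitlab_authorize_url and its parameters are hypothetical, and https://gitlab.com is assumed as the GitLab instance. After the user approves, GitLab redirects to redirect_uri, and it rejects redirect URIs that were not registered on the application.

```python
from urllib.parse import urlencode


def gitlab_authorize_url(app_id, redirect_uri, state, scope="api",
                         gitlab_url="https://gitlab.com"):
    """Build the URL a user visits to authorize the OAuth application."""
    params = {
        "client_id": app_id,           # the application ID (GITLAB_APP_ID)
        "redirect_uri": redirect_uri,  # must be one of the registered callback URIs
        "response_type": "code",       # authorization-code flow
        "state": state,                # random value, checked again on callback
        "scope": scope,                # the api scope chosen at registration
    }
    return "{}/oauth/authorize?{}".format(gitlab_url, urlencode(params))
```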
Local image server
Arkindex splits image URLs into their image server and the image path. For example, a IIIF server at http://iiif.irht.cnrs.fr/iiif/ and an image at /Paris/JJ042/1.jpg would be represented as an ImageServer instance holding one Image. Since Arkindex has a local IIIF server for image uploads and thumbnails, a special instance of ImageServer is required to point to this local server. In local development, this server should be available at https://ark.localhost/iiif. You will therefore need to create an ImageServer with this URL via the Django admin or the Django shell. To set the local server ID, use the LOCAL_IMAGESERVER_ID environment variable or a custom setting in local_settings.py; the ID must match the ImageServer you create (1 in the shell example below):
LOCAL_IMAGESERVER_ID = 1
Here is how to quickly create the ImageServer using the shell:
backend/arkindex$ ./manage.py shell
>>> from arkindex.images.models import ImageServer
>>> ImageServer.objects.create(id=1, display_name='local', url='https://ark.localhost/iiif')
Note that this local server will only work inside Docker.
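The server/path split described in this section can be illustrated with a small sketch that picks the longest known server URL prefixing a full image URL. This is not Arkindex's implementation: split_image_url is a hypothetical helper that works on plain strings rather than ImageServer instances.

```python
from typing import Iterable, Tuple


def split_image_url(url: str, server_urls: Iterable[str]) -> Tuple[str, str]:
    """Split a full image URL into (server URL, image path).

    The longest matching server URL wins, so nested servers resolve
    to the most specific one.
    """
    best = None
    for server in server_urls:
        base = server.rstrip("/")
        # Require a "/" after the base so ".../iiif" does not match ".../iiif2/..."
        if url.startswith(base + "/"):
            if best is None or len(base) > len(best.rstrip("/")):
                best = server
    if best is None:
        raise ValueError("No image server matches {!r}".format(url))
    # Keep the leading slash on the path, as in /Paris/JJ042/1.jpg
    return best, url[len(best.rstrip("/")):]
```

With the servers ["https://ark.localhost/iiif", "http://iiif.irht.cnrs.fr/iiif/"], the URL http://iiif.irht.cnrs.fr/iiif/Paris/JJ042/1.jpg splits into the IRHT server and the path /Paris/JJ042/1.jpg.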
User groups
Two groups have a special meaning in Arkindex: the Demo group, for demo users, and the Internal group, for special users whose tokens are used by workers. Those groups are configured using the INTERNAL_GROUP_ID and DEMO_GROUP_ID settings. If the groups do not exist, the development server will show warnings but still start, so you can access the Django admin and create the groups from there. To create them using the shell:
backend/arkindex$ ./manage.py shell
>>> from django.contrib.auth.models import Group
>>> Group.objects.create(id=1, name='Demo')
>>> Group.objects.create(id=2, name='Internal')
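The shell example above creates the groups with IDs 1 and 2, so the settings have to point at those IDs. Assuming DEMO_GROUP_ID and INTERNAL_GROUP_ID can be overridden in local_settings.py like any other setting, a matching configuration would be:

```python
# arkindex/project/local_settings.py
# Assumption: these IDs match the groups created in the shell example
DEMO_GROUP_ID = 1      # "Demo" group, for demo users
INTERNAL_GROUP_ID = 2  # "Internal" group, whose users' tokens are used by workers
```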
Usage
Makefile
At the root of the repository is a Makefile that provides commands for common operations:
- make or make all: Clean and build;
- make base: Create and push the arkindex-base Docker image that is used to build the arkindex-app image;
- make clean: Clean up the Python package build and cache files;
- make build: Build the arkindex Python package and recreate the arkindex-app:latest image without pushing it to the GitLab container registry;
- make latest: Build and push the latest Docker image to the GitLab container registry;
- make release: Build and push a release Docker image to the GitLab container registry (use the VERSION file to update the version number);
- make worker: Start a local (non-Docker) Celery worker;
- make tunnel: Open an SSH tunnel via the preproduction server, making your dev server available on arkindex.dev.teklia.com:8000 (useful for webhook-related development);
- make test-fixtures: Create the unit test fixtures on a temporary PostgreSQL database and save them to the data.json file used by most Django unit tests.
Django commands
Aside from the usual Django commands, some custom commands are available via manage.py:
- build_fixtures: Create a set of database elements designed for use by unit tests in a fixture (see make test-fixtures);
- from_csv: Import manifests and index files from a CSV list;
- import_annotations: Import index files from a folder into a specific volume;
- import_acts: Import XML surface files and CSV act files;
- delete_corpus: Delete a big corpus using a Ponos task;
- generate_thumbnails: Generate thumbnails for volumes;
- reindex: Run asynchronous tasks on the Celery worker to reindex transcriptions in ElasticSearch;
- telegraf: A special command with InfluxDB-compatible output for Grafana statistics.
See manage.py <command> --help to view more details about a specific command.
Code validation
Once your code appears to be working on a local server, a few checks have to be performed:
- Migrations: Ensure that all migrations have been created by typing ./manage.py makemigrations.
- Unit tests: Run ./manage.py test to perform unit tests. Use ./manage.py test module_name to run the tests of a single module if you wish to spend less time waiting for all tests to complete.
- Code linting: Type flake8 inside the backend/arkindex directory. Our Flake8 settings allow 120 characters per line instead of PEP 8's 79.
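The three checks can be chained in a small script. This is a hypothetical helper, not part of the repository: it assumes it is run from the backend/arkindex directory and uses Django's makemigrations --check --dry-run, which exits with a non-zero status when model changes have no migration, without writing any migration file.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Hypothetical helper: assumes the current directory is backend/arkindex
CHECKS = [
    # --check exits non-zero if a migration is missing; --dry-run writes nothing
    ["./manage.py", "makemigrations", "--check", "--dry-run"],
    ["./manage.py", "test"],
    ["flake8"],
]


def run_checks(
    checks: Sequence[List[str]] = CHECKS,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> Optional[List[str]]:
    """Run each check in order; return the first failing command, or None."""
    for command in checks:
        result = runner(command)
        if result.returncode != 0:
            return command  # stop early: later checks are skipped
    return None
```

Calling run_checks() returns None when every check passes. Stopping at the first failure avoids waiting for the whole test suite when a migration is already missing.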
Debugging tools
Run pip install ipython django-debug-toolbar django_extensions to install all the available optional dev tools for the backend.
IPython gives you a nicer ./manage.py shell, with syntax highlighting, auto-reloading and much more.
Django Debug Toolbar provides a neat debug sidebar that helps diagnose slow API endpoints or weird template bugs. Since the Arkindex frontend is completely decoupled from the backend, you will need to browse to an API endpoint to see the debug toolbar.
Django Extensions adds a lot of manage.py commands; the most important one is ./manage.py shell_plus, which runs the usual shell with all the available models pre-imported. You can add your own imports in the local_settings.py file. Here is an example that imports most of the backend's enums and some special QuerySet features:
SHELL_PLUS_POST_IMPORTS = [
    ('django.db.models', ('Value', )),
    ('django.db.models.functions', '*'),
    ('arkindex.documents.models', (
        'ElementType',
        'Right',
        'PageType',
        'PageDirection',
        'PageComplement',
    )),
    ('arkindex_common.enums', '*'),
    ('arkindex.dataimport.models', (
        'DataImportMode',
        'EventType',
    )),
    ('arkindex.project.aws', (
        'S3FileStatus',
    )),
    ('arkindex.users.models', (
        'OAuthStatus',
    )),
]
You may also want to uninstall django-nose, an optional test runner that is used for code coverage in the CI. Uninstalling it removes about a hundred useless lines from the ./manage.py test output, so you no longer have to scroll up to reach the list of test errors.