ATR training data generator

This script downloads pages with their transcriptions from Arkindex and converts the data to ATR format. It also generates reproducible train, val and test splits.

Documentation is available at https://teklia.gitlab.io/atr/data-generator/.

Environment variables

The ARKINDEX_API_TOKEN and ARKINDEX_API_URL environment variables must both be defined.

You can create an alias by adding this line to your ~/.bashrc:

alias set_demo='export ARKINDEX_API_URL=https://demo.arkindex.org/;export ARKINDEX_API_TOKEN=my_api_token'

Then run:

source ~/.bashrc
set_demo
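
To confirm that both variables are set in the current shell before running the script, a small check along these lines may help. This is only a sketch: the function name check_arkindex_env is hypothetical and is not part of the data generator. It reports which variable is missing without printing the token itself.

```shell
# Hypothetical helper, not provided by the data generator.
# Returns 0 when both variables are set and non-empty, 1 otherwise.
check_arkindex_env() {
    status=0
    for name in ARKINDEX_API_URL ARKINDEX_API_TOKEN; do
        # Portable indirect lookup: read the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            # Report only the variable name, never its value.
            echo "Missing environment variable: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

After defining the function, calling check_arkindex_env prints nothing and returns 0 when both variables are set. Otherwise it lists each missing variable on standard error and returns 1.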