
ATR training data generator

This script downloads pages with transcriptions from Arkindex and converts the data to ATR format. It also generates reproducible train, val and test splits.
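One common way to make splits reproducible is to sort the items and then shuffle them with a fixed seed. The sketch below is illustrative only. It assumes splits are made at the page level, and `split_pages`, the 80/10/10 ratios and the seed value are hypothetical; they are not this tool's actual API or defaults.

```python
import random


def split_pages(page_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    """Hypothetical helper: deterministically split page IDs into train/val/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Sort first so the result does not depend on the order the IDs arrive in.
    ids = sorted(page_ids)
    # A dedicated Random instance with a fixed seed yields the same shuffle on every run.
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    # Any remainder from rounding down ends up in the test split.
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


pages = [f"page-{i}" for i in range(10)]
splits = split_pages(pages)
```

Because the IDs are sorted before the seeded shuffle, re-running with the same pages and seed gives identical splits, even if the pages are fetched in a different order.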

Documentation is available at https://teklia.gitlab.io/atr/data-generator/.

Installation

  • Use a virtualenv (e.g. with virtualenvwrapper: mkvirtualenv -a . atr-data-gen)
  • Install atr-data-generator as a package from a local clone:
    1. The teklia-document-processing library is set up as a git submodule. Please run git submodule update --init.
    2. Then install both packages via pip install ./document-processing -e .