ATR training data generator
This script downloads pages with transcriptions from Arkindex and converts data to ATR format. It also generates reproducible train, val and test splits.
Documentation is available at https://teklia.gitlab.io/atr/data-generator/.
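As an illustration of what "reproducible" splits can mean in practice, here is a minimal sketch. It is not the generator's actual implementation: the `split_ids` helper, the 80/10/10 ratios, and the seed value are all assumptions made for this example. The idea is to sort the page identifiers and shuffle them with a fixed seed, so the same input always produces the same split.

```python
import random


def split_ids(ids, train_ratio=0.8, val_ratio=0.1, seed=42):
    """Hypothetical helper: split identifiers into train/val/test deterministically."""
    # Sort first so the result does not depend on the order of the input.
    ordered = sorted(ids)
    # A dedicated, seeded generator keeps the shuffle repeatable across runs.
    rng = random.Random(seed)
    rng.shuffle(ordered)

    n_train = int(len(ordered) * train_ratio)
    n_val = int(len(ordered) * val_ratio)
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        # Whatever remains after train and val goes to test.
        "test": ordered[n_train + n_val:],
    }


if __name__ == "__main__":
    page_ids = [f"page-{i}" for i in range(10)]  # placeholder identifiers
    for name, subset in split_ids(page_ids).items():
        print(name, len(subset))
```

Sorting before shuffling is the design choice that matters here: without it, two downloads returning the same pages in a different order would yield different splits even with the same seed.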
Installation
- Use a virtualenv (e.g. with virtualenvwrapper: `mkvirtualenv -a . atr-data-gen`)
- Install atr-data-generator as a package from a local clone:
  - The `teklia-document-processing` library is set up via a git submodule. Please run `git submodule update --init`.
  - Then install both packages via `pip install ./document-processing -e .`
- The