# ATR training data generator

This script downloads pages with transcriptions from Arkindex
and converts data to ATR format.
It also generates reproducible train, val and test splits.

Documentation is available at https://teklia.gitlab.io/atr/data-generator/.

## Environment variables

The `ARKINDEX_API_TOKEN` and `ARKINDEX_API_URL` environment variables must be defined.

You can create an alias by adding this line to your `~/.bashrc`:

```sh
alias set_demo='export ARKINDEX_API_URL=https://demo.arkindex.org/;export ARKINDEX_API_TOKEN=my_api_token'
```

Then run:

```sh
source ~/.bashrc
set_demo
```
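
To confirm that both variables are actually set before running the script, a small check along these lines may help. This is only a sketch: `check_arkindex_env` is a hypothetical helper name, not something provided by this project, and it only tests that the variables are non-empty, not that the token is valid.

```sh
# Hypothetical helper (not part of this project): report any required
# variable that is unset or empty. Returns 0 if both are set, 1 otherwise.
check_arkindex_env() {
    missing=0
    for name in ARKINDEX_API_URL ARKINDEX_API_TOKEN; do
        # Indirect lookup through eval keeps this compatible with POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            # Print only the variable name, never the token itself.
            echo "Missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After defining the function (for example in `~/.bashrc`), running `set_demo && check_arkindex_env` should print nothing and exit with status 0 when both variables are defined.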