Release notes
- 5386b77f Remove typesystem dep
- a903b466 Bump Python requirement arkindex-client to 1.0.11
- 27b26254 Bump Python requirement teklia-line-image-extractor to 0.2.7
- b6481873 Bump Python requirement tqdm to 4.64.1
- 406d9dc9 Remove apistar dep
- cdd1a6a6 Bump pre-commit hooks
- 54312b78 Allow filtering by direct parent's metadata
- 8a241052 Only look for classes when we load them
- 33f6dc16 Use new official repository for flake8 on CI
- 7be2623c Implement metadata filtering
- 9be8a911 Bump Python requirement jsonargparse to 4.13.2
- 94056a19 Bump Python requirement jsonargparse to 4.13.1
- 48d67da0 Bump Python requirement jsonargparse to 4.11.0
- cca26d5e Export parameters json
- 32371eab Split by page
- 95c80791 Add basic tests for downloading and splitting
- 1b72fc39 Bump Python requirement tqdm to 4.64.0
- d0fd44fa Rename images with line id
- fd06f077 Bump Python requirement teklia-line-image-extractor to 0.2.4
- 5eb33cdf Bump Python requirement arkindex-client to 1.0.9
- dcbea608 Update CI
- d62f5241 Use line image extractor
- 2f72d0a5 Revert "use cached paginate in selection"
- 2dd8a442 use cached paginate in selection
- 10377420 fix linting
- aa982900 support choosing elements by arkindex selection
- f2d4d49d fix linting
- e86b736b fail when transcriptions contain a newline
- 9b537f02 Add style filter (handwritten, typewritten); support ignored_classes
- 3fbdf930 no more best_classes
- a5a40182 choose only one transcription based on the accepted worker version ids (order is important)
- 5d328548 add username to default cache path to avoid conflicts with other user caches
- 32de35af fix linting
- 7c56a81c add cached api client
- d6cf517d Raise an exception if there are multiple transcriptions for the same text_line
- 9ab6085f add skew extraction modes
- a297c651 cache full size images to make using several extractions of the same images faster
- 641d6c3f remove deprecated filtering by source.slug
- 681f7ff5 Don't filter vertical lines with rotation class
- 9f61d8c8 Polygon resize
- f096d85d fix color arg (previously always grayscale)
- 47be8292 Use rotation class
- 3353a171 forgot to commit main
- a53f8f96 rename script to avoid clash with package name
- 540884e9 Add deskew extraction
- fcfdfcbe add sorts to make the splits reproducible
- ada7640d use text_line by default
- 389d8bb5 update gitignore
- 3e6e387a add codespell ignore lines file
- dceab604 add gitlab ci, first test
- d1c3fc0b move kaldi_data_generator to dir
- 447a1551 remove duplicate f.close()
- 63e24774 fix kraken format with polygon
- 493b10c8 use manual instead of None for manual transcriptions
- a0c0910c applied filter by worker version commit
- 45c9e42e Add option to skip vertical lines
- b263ee7c integrating kraken
- c6363c89 --dataset_name is not required for --split_only
- 5a280bd7 use latest version of arkindex-client
- a24a0f08 fix formatting
- d8a643d0 add option to use existing split; add option to create a split from already downloaded lines
- 338fda47 support new transcriptions
- 96c2464a Filter elements according to their classes
- 9556bdbc fix formatting
- a21bc941 select folders by ids
- 2b036eb1 add volume_type option
- 48746884 set random seed
- 2f412361 sort lines before extracting
- aac3d0f7 add tqdm to requirements
- c77030ab add argument to select slugs
- f5adb3b4 add tqdm for duration estimation
- 050824d0 fix bug if no s3_url
- 94bd75e2 fix formatting
- 0989a2df use logger
- c77cdf57 use corpus id
- 1d3e5576 Update README.md
- f318c3d9 add types
- a55ed1c7 clean up
- 3e331778 update readme
- c98795ce add option to extract polygon images
- e050fe21 deal with negative polygons
- e0214ac2 update readme
- 918c600b add requirements
- d202ed1b fix line_id bug
- 6aee925c remove example fn
- 9d08e020 add readme
- 788aa635 add argparse
- 5b8081dd add split enum
- 967fb987 extract kaldi partition splitter
- 4a76d32d refactor, use class
- 830ddf72 initial commit