ML Prod - Summer 2023
Milestone (closed): Jul 2, 2023–Sep 1, 2023
Unstarted Issues (open and unassigned): 0
Ongoing Issues (open and assigned): 0
Completed Issues (closed): 21
- Unit tests for extraction command
- Type hints in dan.ocr.predict
- Expose start_token parameter on the predict command
- Parse the tokens
- Investigate oom error
- Missing condition for charset transfer
- Directly format the data during the extraction stage
- Migrate to pathlib
- Ruff minimal setup
- Multithreading for dataset extraction
- Remove arkindex.teklia.com mentions
- Remove data splitting code
- Cache page images and crop them for subelements
- Compute dataset statistics after extraction/formatting
- Investigation: Performance issues for data extraction
- Update pages link
- Add constraints on shape of text-lines
- Output tensor of logits from DAN prediction for LM rescoring
- Do not normalize tensor for attention map visualization
- Broken warning bubble
- Document batch inference