diff --git a/docs/usage/predict.md b/docs/usage/predict.md
index d6d46350d3db18aaa35b9ff732d077d48dd767f3..e3e68385773070aceef5742574e92ba711964a5a 100644
--- a/docs/usage/predict.md
+++ b/docs/usage/predict.md
@@ -4,27 +4,57 @@ Use the `teklia-dan predict` command to apply a trained DAN model on an image.
 
 ## Description of parameters
 
-| Parameter | Description | Type | Default |
-| --------------------------- | --------------------------------------------------------------------------------------------------------------------------- | ------- | ------------- |
-| `--image` | Path to the image to predict. Must not be provided with `--image-dir`. | `Path` | |
-| `--image-dir` | Path to the folder where the images to predict are stored. Must not be provided with `--image`. | `Path` | |
-| `--image-extension` | The extension of the images in the folder. Ignored if `--image-dir` is not provided. | `str` | .jpg |
-| `--model` | Path to the model to use for prediction | `Path` | |
-| `--parameters` | Path to the YAML parameters file. | `Path` | |
-| `--charset` | Path to the charset file. | `Path` | |
-| `--output` | Path to the output folder. Results will be saved in this directory. | `Path` | |
-| `--confidence-score` | Whether to return confidence scores. | `bool` | `False` |
-| `--confidence-score-levels` | Level to return confidence scores. Should be any combination of `["line", "word", "char"]`. | `str` | |
-| `--attention-map` | Whether to plot attention maps. | `bool` | `False` |
-| `--attention-map-scale` | Image scaling factor before creating the GIF. | `float` | `0.5` |
-| `--attention-map-level` | Level to plot the attention maps. Should be in `["line", "word", "char"]`. | `str` | `"line"` |
-| `--predict-objects` | Whether to return polygons coordinates. | `bool` | `False` |
-| `--word-separators` | List of word separators. | `list` | `[" ", "\n"]` |
-| `--line-separators` | List of line separators. | `list` | `["\n"]` |
-| `--threshold-method` | Method to use for attention mask thresholding. Should be in `["otsu", "simple"]`. | `str` | `"otsu"` |
-| `--threshold-value ` | Threshold to use for the "simple" thresholding method. | `int` | `0` |
-| `--batch-size ` | Size of the batches for prediction. | `int` | `1` |
-| `--start-token ` | Use a specific starting token at the beginning of the prediction. Useful when making predictions on different single pages. | `str` | `None` |
+| Parameter | Description | Type | Default |
+| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ------------- |
+| `--image` | Path to the image to predict. Must not be provided with `--image-dir`. | `Path` | |
+| `--image-dir` | Path to the folder where the images to predict are stored. Must not be provided with `--image`. | `Path` | |
+| `--image-extension` | The extension of the images in the folder. Ignored if `--image-dir` is not provided. | `str` | .jpg |
+| `--model` | Path to the model to use for prediction | `Path` | |
+| `--parameters` | Path to the YAML parameters file. | `Path` | |
+| `--charset` | Path to the charset file. | `Path` | |
+| `--output` | Path to the output folder. Results will be saved in this directory. | `Path` | |
+| `--confidence-score` | Whether to return confidence scores. | `bool` | `False` |
+| `--confidence-score-levels` | Level to return confidence scores. Should be any combination of `["line", "word", "char"]`. | `str` | |
+| `--attention-map` | Whether to plot attention maps. | `bool` | `False` |
+| `--attention-map-scale` | Image scaling factor before creating the GIF. | `float` | `0.5` |
+| `--attention-map-level` | Level to plot the attention maps. Should be in `["line", "word", "char"]`. | `str` | `"line"` |
+| `--predict-objects` | Whether to return polygons coordinates. | `bool` | `False` |
+| `--word-separators` | List of word separators. | `list` | `[" ", "\n"]` |
+| `--line-separators` | List of line separators. | `list` | `["\n"]` |
+| `--threshold-method` | Method to use for attention mask thresholding. Should be in `["otsu", "simple"]`. | `str` | `"otsu"` |
+| `--threshold-value ` | Threshold to use for the "simple" thresholding method. | `int` | `0` |
+| `--batch-size ` | Size of the batches for prediction. | `int` | `1` |
+| `--start-token ` | Use a specific starting token at the beginning of the prediction. Useful when making predictions on different single pages. | `str` | `None` |
+| `--use-language-model` | Whether to use an external n-gram language model to rescore hypotheses. See [the next section](#rescoring-hypotheses-with-an-n-gram-language-model) for details. | `bool` | `False` |
+
+## Rescoring hypotheses with an n-gram language model
+
+A dataset extracted with the `teklia-dan dataset extract` command should contain the files required to build a language model (in the `language_model` folder).
+
+To refine DAN's predictions with a language model, follow these steps:
+
+1. Install and build [kenlm](https://github.com/kpu/kenlm).
+1. Build a 6-gram language model using the following command:
+
+```sh
+bin/lmplz --order 6 \
+    --text my_dataset/language_model/corpus.txt \
+    --arpa my_dataset/language_model/model.arpa
+```
+
+1. Update `inference_parameters.yml`. The `weight` parameter defines how much weight to give to the language model. It should be set carefully (usually between 0.5 and 2.0) as it will affect the quality of the predictions.
+
+```yaml
+parameters:
+  ...
+  language_model:
+    model: my_dataset/language_model/model.arpa
+    lexicon: my_dataset/language_model/lexicon.txt
+    tokens: my_dataset/language_model/tokens.txt
+    weight: 1.0
+```
+
+1. Predict with the `--use-language-model` argument.
 
 ## Examples
 
@@ -158,3 +188,53 @@ It will create the following JSON file named `dan_humu_page/predict/example.json
 ```
 
 <img src="../../assets/example_line_polygon.gif" >
+
+### Predict with an external n-gram language model
+
+To run a prediction with the n-gram language model, run this command:
+
+```shell
+teklia-dan predict \
+    --image dan_humu_page/example.jpg \
+    --model dan_humu_page/model.pt \
+    --parameters dan_humu_page/parameters.yml \
+    --charset dan_humu_page/charset.pkl \
+    --use-language-model \
+    --output dan_humu_page/predict/
+```
+
+It will create the following JSON file named `dan_humu_page/predict/example.json`
+
+```json
+{
+  "text": "Oslo\n39 \nOresden den 24te Rasser!\nH\u00f8jst\u00e6redesherr Hartvig - assert!\nUllereder fra den f\u00f8rste tide da\njeg havder den tilfredsstillelser at vide den ar-\ndistiske ledelser af Kristiania theater i Deres\nhronder, har jeg g\u00e5t hernede med et stille\nh\u00e5b om fra Dem at modtage et forelag, sig -\nsende tils at lade \"K\u00e6rlighedens \u00abKomedie\u00bb\nopf\u00f8re fore det norske purblikum.\nEt s\u00e5dant forslag er imidlertid, imod\nforventning; ikke fremkommet, og jeg n\u00f8des der-\nfor tils self at grivbe initiativet, hvilket hervede\nsker, idet jeg\nbeder\nbet\nragte stigkket some ved denne\nskrivelse officielde indleveret til theatret. No-\nget exemplar af bogen vedlagger jeg ikke da\ndenne (i 2den udgave) med Lethed kan er -\nholdet deroppe.\nDe bet\u00e6nkeligheder, jeg i sin tid n\u00e6-\nrede mod stykkets opf\u00f8relse, er for l\u00e6nge si -\ndem forsvundne. Af mange begn er jeg kom-\nmen til den overbevisning at almenlreden\naru har f\u00e5tt sine \u00f8gne opladte for den sand -\nMed at dette arbejde i sin indersten id\u00e9 hviler\np\u00e5 et ubedinget meralsk grundlag, og brad\nstykkets hele kunstneriske struktuve ang\u00e5r,",
+  "language_model": [
+    {
+      "confidence": 0.68,
+      "polygon": [
+        [
+          264,
+          118
+        ],
+        [
+          410,
+          118
+        ],
+        [
+          410,
+          185
+        ],
+        [
+          264,
+          185
+        ]
+      ],
+      "text": "Oslo",
+      "text_confidence": 0.8
+    }
+  ],
+  "attention_gif": "dan_humu_page/predict/example_line.gif"
+}
+```
+
+<img src="../../assets/example_line_polygon.gif" >
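
The new section requires three files to exist in the `language_model` folder before `--use-language-model` can work. A small pre-flight check can catch a missing file before a prediction is launched. The sketch below is not part of DAN: the helper name `missing_language_model_files` is hypothetical, and the only assumption is that the file names match those in the `language_model` block of the YAML example in this patch.

```python
from pathlib import Path

# File names taken from the `language_model` block of the YAML example
# in this patch; adjust them if your parameters file points elsewhere.
REQUIRED_FILES = ("model.arpa", "lexicon.txt", "tokens.txt")


def missing_language_model_files(folder):
    """Return the names of the required files that are absent from `folder`.

    Hypothetical helper for illustration only; it is not provided by DAN.
    """
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Folder name reused from the documentation example.
    missing = missing_language_model_files("my_dataset/language_model")
    if missing:
        print("Missing language model files:", ", ".join(missing))
    else:
        print("All language model files are present.")
```

Running the script before `teklia-dan predict` lists any file that still has to be generated, for instance `model.arpa` if the `lmplz` step was skipped.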