Document prediction with language model

Commit c64f30f2 authored 1 year ago by Solene Tarride
--- a/docs/usage/predict.md
+++ b/docs/usage/predict.md
@@ -4,27 +4,57 @@ Use the `teklia-dan predict` command to apply a trained DAN model on an image.

 ## Description of parameters

-| Parameter                   | Description                                                                                                                 | Type    | Default       |
-| --------------------------- | --------------------------------------------------------------------------------------------------------------------------- | ------- | ------------- |
-| `--image`                   | Path to the image to predict. Must not be provided with `--image-dir`.                                                      | `Path`  |               |
-| `--image-dir`               | Path to the folder where the images to predict are stored. Must not be provided with `--image`.                             | `Path`  |               |
-| `--image-extension`         | The extension of the images in the folder. Ignored if `--image-dir` is not provided.                                        | `str`   | .jpg          |
-| `--model`                   | Path to the model to use for prediction                                                                                     | `Path`  |               |
-| `--parameters`              | Path to the YAML parameters file.                                                                                           | `Path`  |               |
-| `--charset`                 | Path to the charset file.                                                                                                   | `Path`  |               |
-| `--output`                  | Path to the output folder. Results will be saved in this directory.                                                         | `Path`  |               |
-| `--confidence-score`        | Whether to return confidence scores.                                                                                        | `bool`  | `False`       |
-| `--confidence-score-levels` | Level to return confidence scores. Should be any combination of `["line", "word", "char"]`.                                 | `str`   |               |
-| `--attention-map`           | Whether to plot attention maps.                                                                                             | `bool`  | `False`       |
-| `--attention-map-scale`     | Image scaling factor before creating the GIF.                                                                               | `float` | `0.5`         |
-| `--attention-map-level`     | Level to plot the attention maps. Should be in `["line", "word", "char"]`.                                                  | `str`   | `"line"`      |
-| `--predict-objects`         | Whether to return polygons coordinates.                                                                                     | `bool`  | `False`       |
-| `--word-separators`         | List of word separators.                                                                                                    | `list`  | `[" ", "\n"]` |
-| `--line-separators`         | List of line separators.                                                                                                    | `list`  | `["\n"]`      |
-| `--threshold-method`        | Method to use for attention mask thresholding. Should be in `["otsu", "simple"]`.                                           | `str`   | `"otsu"`      |
-| `--threshold-value `        | Threshold to use for the "simple" thresholding method.                                                                      | `int`   | `0`           |
-| `--batch-size `             | Size of the batches for prediction.                                                                                         | `int`   | `1`           |
-| `--start-token `            | Use a specific starting token at the beginning of the prediction. Useful when making predictions on different single pages. | `str`   | `None`        |
+| Parameter                   | Description                                                                                                                                                     | Type    | Default       |
+| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ------------- |
+| `--image`                   | Path to the image to predict. Must not be provided with `--image-dir`.                                                                                          | `Path`  |               |
+| `--image-dir`               | Path to the folder where the images to predict are stored. Must not be provided with `--image`.                                                                 | `Path`  |               |
+| `--image-extension`         | The extension of the images in the folder. Ignored if `--image-dir` is not provided.                                                                            | `str`   | `.jpg`        |
+| `--model`                   | Path to the model to use for prediction.                                                                                                                        | `Path`  |               |
+| `--parameters`              | Path to the YAML parameters file.                                                                                                                               | `Path`  |               |
+| `--charset`                 | Path to the charset file.                                                                                                                                       | `Path`  |               |
+| `--output`                  | Path to the output folder. Results will be saved in this directory.                                                                                             | `Path`  |               |
+| `--confidence-score`        | Whether to return confidence scores.                                                                                                                            | `bool`  | `False`       |
+| `--confidence-score-levels` | Level to return confidence scores. Should be any combination of `["line", "word", "char"]`.                                                                     | `str`   |               |
+| `--attention-map`           | Whether to plot attention maps.                                                                                                                                 | `bool`  | `False`       |
+| `--attention-map-scale`     | Image scaling factor before creating the GIF.                                                                                                                   | `float` | `0.5`         |
+| `--attention-map-level`     | Level to plot the attention maps. Should be in `["line", "word", "char"]`.                                                                                      | `str`   | `"line"`      |
+| `--predict-objects`         | Whether to return polygon coordinates.                                                                                                                          | `bool`  | `False`       |
+| `--word-separators`         | List of word separators.                                                                                                                                        | `list`  | `[" ", "\n"]` |
+| `--line-separators`         | List of line separators.                                                                                                                                        | `list`  | `["\n"]`      |
+| `--threshold-method`        | Method to use for attention mask thresholding. Should be in `["otsu", "simple"]`.                                                                               | `str`   | `"otsu"`      |
+| `--threshold-value`         | Threshold to use for the "simple" thresholding method.                                                                                                          | `int`   | `0`           |
+| `--batch-size`              | Size of the batches for prediction.                                                                                                                             | `int`   | `1`           |
+| `--start-token`             | Use a specific starting token at the beginning of the prediction. Useful when making predictions on different single pages.                                     | `str`   | `None`        |
+| `--use-language-model`      | Whether to use an external n-gram language model to rescore hypotheses. See [the next section](#rescoring-hypotheses-with-an-n-gram-language-model) for details. | `bool`  | `False`       |
+
+## Rescoring hypotheses with an n-gram language model
+
+A dataset extracted with the `teklia-dan dataset extract` command should contain the files required to build a language model (in the `language_model` folder).
+
+To refine DAN's predictions with a language model, follow these steps:
+
+1. Install and build [KenLM](https://github.com/kpu/kenlm).
+1. Build a 6-gram language model using the following command:
+
+```sh
+bin/lmplz --order 6 \
+    --text my_dataset/language_model/corpus.txt \
+    --arpa my_dataset/language_model/model.arpa
+```
+
+1. Update `inference_parameters.yml`. The `weight` parameter controls how strongly the language model influences the final prediction. Tune it carefully (typical values lie between 0.5 and 2.0), as it directly affects the quality of the predictions.
+
+```yaml
+parameters:
+  ...
+  language_model:
+    model: my_dataset/language_model/model.arpa
+    lexicon: my_dataset/language_model/lexicon.txt
+    tokens: my_dataset/language_model/tokens.txt
+    weight: 1.0
+```
+
+1. Predict with the `--use-language-model` argument.

 ## Examples

@@ -158,3 +188,53 @@ It will create the following JSON file named `dan_humu_page/predict/example.json
 ```

 <img src="../../assets/example_line_polygon.gif" >
+
+### Predict with an external n-gram language model
+
+To predict with the n-gram language model, run this command:
+
+```shell
+teklia-dan predict \
+    --image dan_humu_page/example.jpg \
+    --model dan_humu_page/model.pt \
+    --parameters dan_humu_page/parameters.yml \
+    --charset dan_humu_page/charset.pkl \
+    --use-language-model \
+    --output dan_humu_page/predict/
+```
+
+It will create the following JSON file named `dan_humu_page/predict/example.json`:
+
+```json
+{
+  "text": "Oslo\n39 \nOresden den 24te Rasser!\nH\u00f8jst\u00e6redesherr Hartvig - assert!\nUllereder fra den f\u00f8rste tide da\njeg havder den tilfredsstillelser at vide den ar-\ndistiske ledelser af Kristiania theater i Deres\nhronder, har jeg g\u00e5t hernede med et stille\nh\u00e5b om fra Dem at modtage et forelag, sig -\nsende tils at lade \"K\u00e6rlighedens \u00abKomedie\u00bb\nopf\u00f8re fore det norske purblikum.\nEt s\u00e5dant forslag er imidlertid, imod\nforventning; ikke fremkommet, og jeg n\u00f8des der-\nfor tils self at grivbe initiativet, hvilket hervede\nsker, idet jeg\nbeder\nbet\nragte stigkket some ved denne\nskrivelse officielde indleveret til theatret. No-\nget exemplar af bogen vedlagger jeg ikke da\ndenne (i 2den udgave) med Lethed kan er -\nholdet deroppe.\nDe bet\u00e6nkeligheder, jeg i sin tid n\u00e6-\nrede mod stykkets opf\u00f8relse, er for l\u00e6nge si -\ndem forsvundne. Af mange begn er jeg kom-\nmen til den overbevisning at almenlreden\naru har f\u00e5tt sine \u00f8gne opladte for den sand -\nMed at dette arbejde i sin indersten id\u00e9 hviler\np\u00e5 et ubedinget meralsk grundlag, og brad\nstykkets hele kunstneriske struktuve ang\u00e5r,",
+  "language_model": [
+    {
+      "confidence": 0.68,
+      "polygon": [
+        [
+          264,
+          118
+        ],
+        [
+          410,
+          118
+        ],
+        [
+          410,
+          185
+        ],
+        [
+          264,
+          185
+        ]
+      ],
+      "text": "Oslo",
+      "text_confidence": 0.8
+    }
+  ],
+  "attention_gif": "dan_humu_page/predict/example_line.gif"
+}
+```
+
+<img src="../../assets/example_line_polygon.gif" >
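
The rescoring steps documented in this change depend on three files being present in the `language_model` folder before `--use-language-model` is passed. As a minimal sketch, the check below could be run before predicting. The function name `check_lm_files` is hypothetical and not part of `teklia-dan`; the file names are the ones used in the YAML example above, and the check assumes all three sit in the same folder.

```sh
# Hypothetical helper (not part of teklia-dan): check that the language model
# files referenced in the parameters file exist and are non-empty.
check_lm_files() {
    lm_dir="$1"
    rc=0
    for name in model.arpa lexicon.txt tokens.txt; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$lm_dir/$name" ]; then
            echo "missing or empty: $lm_dir/$name" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_lm_files my_dataset/language_model` returns a non-zero status and lists the offending files on standard error when `model.arpa` has not been built yet, so it can be chained with `&&` in front of the `teklia-dan predict` command.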