Add batch prediction code

changed milestone to %ML Prod - February 2023 n°2

added P2 label

assigned to @melodie.boillet

This code can be tested locally with the following DAN worker function. Note that for this test, the two images are stored locally.

    # cv2, torch, logger, resize, pad_images and process_text_confidence are
    # assumed to be available in worker.py, as in the original function.
    def process_element(self, element):
        # Local test: each image is read from "<element id>.png" in the current directory
        element_ids = [
            "8fa7330a-f971-4f21-a6a5-76d3627177c0",
            "e3647d68-bd0b-41ca-aace-dd4e6b94568e",
        ]

        input_images = []
        input_sizes = []
        for element_id in element_ids:
            logger.info("Downloading image...")
            # cv2.imread returns None instead of raising when the file cannot be read,
            # so check the result before using it
            input_image = cv2.imread(element_id + ".png")
            assert input_image is not None, "Image has not been downloaded"

            input_image = resize(
                input_image,
                max_height=self.h_max,
                max_width=self.w_max,
                output_height=self.config.get("input_height"),
                output_width=self.config.get("input_width"),
            )

            # OpenCV loads images as BGR
            input_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB)
            logger.info("Image loaded.")

            # Record the size of each image before it is padded
            input_sizes.append(input_image.shape[:2])
            input_images.append(self.model.preprocess(input_image))
            logger.debug("Image pre-processed")

        # Create the batch: pad the images, then stack them as batch x 3 x H x W
        input_images = pad_images(input_images, self.config.get("padding_value", 0))
        input_tensor = torch.stack([torch.tensor(image).permute((2, 0, 1)) for image in input_images])

        # Run prediction
        logger.info("Predicting NER tags...")
        texts, confidence_scores = self.model.predict(input_tensor, input_sizes, confidences=True)

        for element_id, text, confidence_score in zip(element_ids, texts, confidence_scores):
            # Remove whitespaces before and after predicted text
            text, confidence_score = process_text_confidence(
                -1, len(text), text, confidence_score
            )
            assert len(text) == len(
                confidence_score
            ), "The number of tokens doesn't match the number of confidences."

            # PostProcessing
            logger.info("PostProcessing.")
            text, confidence_score = self.post_processor.post_process(
                text, confidence_score
            )
            assert len(text) == len(
                confidence_score
            ), "The number of tokens doesn't match the number of confidences after post-processing."

            # Create the transcription and the entities
            # (in this local test, the element ID string is passed)
            self.create_transcription_entities(element_id, text, confidence_score)

This function is an update of the original process_element function (https://gitlab.com/teklia/workers/dan/-/blob/main/worker_dan/worker.py#L260). Two parts have been added:

- A call to the model preprocessing
- The creation of the batch (with padding)

Note: We need to add a padding_value: XX entry to the parameters of the model. In most cases, this value is 0.
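
To make the batch creation step concrete, here is a minimal sketch of padding images to a common size before stacking them. It is not DAN's actual pad_images: the helper name pad_to_batch, the choice to pad on the bottom and right, and the H x W x C input layout are assumptions made for this illustration.

```python
import numpy as np


def pad_to_batch(images, padding_value=0):
    """Pad H x W x C images on the bottom and right to a common size, then stack them.

    Hypothetical helper for illustration only; DAN's own pad_images may differ.
    """
    max_height = max(image.shape[0] for image in images)
    max_width = max(image.shape[1] for image in images)
    channels = images[0].shape[2]

    # Start from an array filled with the padding value, then copy each image in
    batch = np.full(
        (len(images), max_height, max_width, channels),
        padding_value,
        dtype=images[0].dtype,
    )
    for index, image in enumerate(images):
        height, width = image.shape[:2]
        batch[index, :height, :width] = image
    return batch


# Two dummy "images" of different sizes
small = np.ones((2, 3, 3), dtype=np.float32)
large = np.ones((4, 5, 3), dtype=np.float32)

# The original sizes are recorded separately, as the worker does with input_sizes
input_sizes = [image.shape[:2] for image in (small, large)]
batch = pad_to_batch([small, large], padding_value=0)

# Move channels first: batch x H x W x C -> batch x C x H x W
batch = batch.transpose(0, 3, 1, 2)
print(batch.shape, input_sizes)
```

With a padding value of 0, the area outside each original image is simply filled with zeros, which is why the original sizes are worth keeping alongside the batch.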

After testing this code with the DAN POPP single page model, I obtained the following logs/results:

    2023-02-21 08:20:48,386 INFO/dan: MLflow Logging available.
    2023-02-21 08:20:48,393 WARNING/arkindex_worker: Missing ARKINDEX_WORKER_RUN_ID environment variable, worker is in read-only mode
    2023-02-21 08:20:48,393 INFO/arkindex_worker: Worker will use /home/mboillet/.local/share/arkindex as working directory
    2023-02-21 08:20:54,772 INFO/arkindex_worker: Running with local configuration from dev.yml
    2023-02-21 08:20:54,773 INFO/arkindex_worker: Starting ML report for Local worker
    2023-02-21 08:20:54,774 WARNING/worker_dan.worker: No GPU available, using CPU
    2023-02-21 08:20:55,317 INFO/worker_dan.worker: Registered tokens : ['Ⓢ', 'Ⓕ', 'Ⓑ', 'Ⓛ', 'Ⓝ', 'Ⓒ', 'Ⓚ', 'Ⓔ', 'Ⓞ', 'Ⓟ']
    2023-02-21 08:20:55,317 INFO/arkindex_worker: No worker activity will be stored as it is disabled for this process
    2023-02-21 08:20:56,570 INFO/arkindex_worker: Processing page AD075DP_D2M8_273_0077_left_page.tif (8fa7330a-f971-4f21-a6a5-76d3627177c0) (1/1)
    2023-02-21 08:20:56,570 INFO/worker_dan.worker: Downloading image...
    2023-02-21 08:20:56,642 INFO/worker_dan.worker: Image loaded.
    2023-02-21 08:20:56,727 INFO/worker_dan.worker: Downloading image...
    2023-02-21 08:20:56,779 INFO/worker_dan.worker: Image loaded.
    2023-02-21 08:20:57,012 INFO/worker_dan.worker: Predicting NER tags...
    2023-02-21 08:21:09,676 INFO/root: Images processed
    8fa7330a-f971-4f21-a6a5-76d3627177c0 ⓈSaurin ⒻRobert Ⓑ22 [0.9999556541442871, 0.9999924898147583, 0.999996542930603, 0.9999899864196777, 0.9999990463256836, 1.0, 1.0, 0.9999998807907104, 0.9999998807907104, 1.0, 0.9999983310699463, 0.9999997615814209, 0.9999995231628418, 0.9999978542327881, 1.0, 0.9999998807907104, 0.9999994039535522, 0.9999997615814209, 0.9999995231628418]
    e3647d68-bd0b-41ca-aace-dd4e6b94568e ⓈBuge ⒻJules Ⓑ68 ⓁC [0.9999412298202515, 0.9999886751174927, 0.9999793767929077, 1.0, 0.9999942779541016, 0.9999947547912598, 0.9999991655349731, 0.9999998807907104, 0.9999998807907104, 1.0, 0.9999996423721313, 1.0, 1.0, 0.9999998807907104, 0.9999995231628418, 1.0, 0.9999994039535522, 0.9999982118606567, 0.9993196725845337]
    2023-02-21 08:21:09,680 INFO/arkindex_worker: Saving ML report to /home/mboillet/.local/share/arkindex/ml_report.json

Here both images are processed in a single batch, and I print the image ID, the predicted characters and the confidences. For simplicity, I stopped the prediction at 20 predicted characters.

added 1 commit

4af23766 - Fix lint

mentioned in merge request !62 (merged)

requested review from @schneider-y

changed title from POC: Add batch prediction code to Add batch prediction code

We need to keep supporting the single image mode as well. Do we only need to pass input_tensor=torch.tensor(image), input_sizes=input_image.shape[:2] to this new predict method to make it work? Or does your processing code work already when we use a single image?

It still works for a single image, as long as the dimensions are correct:

- Input tensor should be of size batch x 3 x H x W, with batch = 1 in this case
- Input sizes should be of size batch x 2, with batch = 1 in this case

To add this batch dimension, we can call the .unsqueeze(0) method on the tensor, which inserts a new dimension at the first position. So if the input image is of shape 3 x H x W, image.unsqueeze(0) returns a tensor of shape 1 x 3 x H x W.
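
As a small illustration of the shapes described above, the sketch below builds a batch of one from a single image. The image is random data and its size is arbitrary; both are assumptions for the example, and the call to the model's predict method is deliberately left out.

```python
import torch

# Dummy pre-processed image in H x W x C layout (random data, for illustration only)
height, width = 32, 48
image = torch.rand(height, width, 3)

# Channels first, as in the batched code: H x W x C -> 3 x H x W
image = image.permute(2, 0, 1)

# Add the batch dimension at position 0: 3 x H x W -> 1 x 3 x H x W
input_tensor = image.unsqueeze(0)

# The sizes need a batch dimension too: one (H, W) pair per image
input_sizes = [(height, width)]

print(input_tensor.shape)  # torch.Size([1, 3, 32, 48])
print(len(input_sizes), len(input_sizes[0]))  # 1 2
```

Since no padding is needed for a single image, the only extra step compared with the batched path is adding this leading dimension.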

LGTM

We will need to patch the worker for single-image mode right away, or we won't be able to bump dan anymore.

merged
