Add batch prediction code
Closes #31 (closed)
changed milestone to %ML Prod - February 2023 n°2
added P2 label
This code can be tested locally with the following DAN worker function. Note that for this test, the two images are stored locally.
def process_element(self, element):
    # Local test: the two images are read from disk instead of being downloaded.
    elements = [
        "8fa7330a-f971-4f21-a6a5-76d3627177c0",
        "e3647d68-bd0b-41ca-aace-dd4e6b94568e",
    ]
    input_images = []
    input_sizes = []
    for element in elements:
        logger.info("Downloading image...")
        image_file = cv2.imread(element + ".png")
        # cv2.imread returns None when the file cannot be read, so check before converting
        assert image_file is not None, "Image has not been downloaded"
        input_image = np.asarray(image_file)
        input_image = resize(
            input_image,
            max_height=self.h_max,
            max_width=self.w_max,
            output_height=self.config.get("input_height"),
            output_width=self.config.get("input_width"),
        )
        input_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB)
        logger.info("Image loaded.")
        # Keep the size before padding, the model needs it at prediction time
        input_sizes.append(input_image.shape[:2])
        input_images.append(self.model.preprocess(input_image))
        logger.debug("Image pre-processed")

    # Pad all images to a common size, then stack them as batch x 3 x H x W
    input_images = pad_images(input_images, self.config.get("padding_value", 0))
    input_tensor = torch.stack(
        [torch.tensor(image).permute((2, 0, 1)) for image in input_images]
    )

    # Run prediction
    logger.info("Predicting NER tags...")
    texts, confidence_scores = self.model.predict(
        input_tensor, input_sizes, confidences=True
    )

    for element, text, confidence_score in zip(elements, texts, confidence_scores):
        # Remove whitespaces before and after predicted text
        text, confidence_score = process_text_confidence(
            -1, len(text), text, confidence_score
        )
        assert len(text) == len(
            confidence_score
        ), "The number of tokens doesn't match the number of confidences."

        # PostProcessing
        logger.info("PostProcessing.")
        text, confidence_score = self.post_processor.post_process(
            text, confidence_score
        )
        assert len(text) == len(
            confidence_score
        ), "The number of tokens doesn't match the number of confidences after post-processing."

        # Create the transcription and the entities
        self.create_transcription_entities(element, text, confidence_score)
This function is an update of the original process_element function (https://gitlab.com/teklia/workers/dan/-/blob/main/worker_dan/worker.py#L260). Two parts have been added:
- A call to the model preprocessing
- The creation of the batch (with padding)
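To illustrate the batch creation step, here is a minimal sketch of what padding and stacking could look like. It is not the actual pad_images implementation: pad_images_sketch is a hypothetical helper, it works on NumPy arrays only, and it assumes that padding is added on the bottom and right edges.

```python
import numpy as np


def pad_images_sketch(images, padding_value=0):
    """Hypothetical helper: pad H x W x C arrays to a common height and width.

    Assumption: padding goes on the bottom and right edges only.
    """
    max_height = max(image.shape[0] for image in images)
    max_width = max(image.shape[1] for image in images)
    padded = []
    for image in images:
        height, width = image.shape[:2]
        padded.append(
            np.pad(
                image,
                ((0, max_height - height), (0, max_width - width), (0, 0)),
                mode="constant",
                constant_values=padding_value,
            )
        )
    return padded


# Two dummy images with different sizes (H x W x C)
images = [
    np.ones((4, 6, 3), dtype=np.float32),
    np.ones((5, 3, 3), dtype=np.float32),
]
# Sizes are recorded before padding, as in the worker function
input_sizes = [image.shape[:2] for image in images]

padded = pad_images_sketch(images, padding_value=0)
# Stack to batch x H x W x C, then move channels first: batch x C x H x W
batch = np.stack(padded).transpose(0, 3, 1, 2)
print(batch.shape)
```

The original sizes are kept in input_sizes so that the padded area can be told apart from real image content at prediction time.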
After testing this code with the DAN POPP single page model, I obtained the following logs/results:
2023-02-21 08:20:48,386 INFO/dan: MLflow Logging available.
2023-02-21 08:20:48,393 WARNING/arkindex_worker: Missing ARKINDEX_WORKER_RUN_ID environment variable, worker is in read-only mode
2023-02-21 08:20:48,393 INFO/arkindex_worker: Worker will use /home/mboillet/.local/share/arkindex as working directory
2023-02-21 08:20:54,772 INFO/arkindex_worker: Running with local configuration from dev.yml
2023-02-21 08:20:54,773 INFO/arkindex_worker: Starting ML report for Local worker
2023-02-21 08:20:54,774 WARNING/worker_dan.worker: No GPU available, using CPU
2023-02-21 08:20:55,317 INFO/worker_dan.worker: Registered tokens : ['Ⓢ', 'Ⓕ', 'Ⓑ', 'Ⓛ', 'Ⓝ', 'Ⓒ', 'Ⓚ', 'Ⓔ', 'Ⓞ', 'Ⓟ']
2023-02-21 08:20:55,317 INFO/arkindex_worker: No worker activity will be stored as it is disabled for this process
2023-02-21 08:20:56,570 INFO/arkindex_worker: Processing page AD075DP_D2M8_273_0077_left_page.tif (8fa7330a-f971-4f21-a6a5-76d3627177c0) (1/1)
2023-02-21 08:20:56,570 INFO/worker_dan.worker: Downloading image...
2023-02-21 08:20:56,642 INFO/worker_dan.worker: Image loaded.
2023-02-21 08:20:56,727 INFO/worker_dan.worker: Downloading image...
2023-02-21 08:20:56,779 INFO/worker_dan.worker: Image loaded.
2023-02-21 08:20:57,012 INFO/worker_dan.worker: Predicting NER tags...
2023-02-21 08:21:09,676 INFO/root: Images processed
8fa7330a-f971-4f21-a6a5-76d3627177c0 ⓈSaurin ⒻRobert Ⓑ22 [0.9999556541442871, 0.9999924898147583, 0.999996542930603, 0.9999899864196777, 0.9999990463256836, 1.0, 1.0, 0.9999998807907104, 0.9999998807907104, 1.0, 0.9999983310699463, 0.9999997615814209, 0.9999995231628418, 0.9999978542327881, 1.0, 0.9999998807907104, 0.9999994039535522, 0.9999997615814209, 0.9999995231628418]
e3647d68-bd0b-41ca-aace-dd4e6b94568e ⓈBuge ⒻJules Ⓑ68 ⓁC [0.9999412298202515, 0.9999886751174927, 0.9999793767929077, 1.0, 0.9999942779541016, 0.9999947547912598, 0.9999991655349731, 0.9999998807907104, 0.9999998807907104, 1.0, 0.9999996423721313, 1.0, 1.0, 0.9999998807907104, 0.9999995231628418, 1.0, 0.9999994039535522, 0.9999982118606567, 0.9993196725845337]
2023-02-21 08:21:09,680 INFO/arkindex_worker: Saving ML report to /home/mboillet/.local/share/arkindex/ml_report.json
Here both images are processed in a batch and I print the image id, predicted characters and the confidences. For simplicity, I stopped the process at 20 predicted characters.
mentioned in merge request !62 (merged)
We need to keep supporting the single image mode as well. Do we only need to pass input_tensor=torch.tensor(image), input_sizes=input_image.shape[:2] to this new predict method to make it work? Or does your processing code already work when we use a single image?
Edited by Yoann Schneider
It still works for a single image, as long as the dimensions are correct:
- Input tensor should be of size batch x 3 x H x W, with batch = 1 in this case
- Input sizes should be of size batch x 2, with batch = 1 in this case
To add this first batch dimension, we can call .unsqueeze(0) on a tensor. It inserts a new dimension at the first position of the tensor. So if the input image is of shape 3 x H x W, applying image.unsqueeze(0) will give a tensor of shape 1 x 3 x H x W.
Edited by Mélodie Boillet
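As a quick check of the shapes described above, here is a small sketch of the single image case. It uses NumPy only, where np.expand_dims(x, axis=0) plays the same role as unsqueeze(0) on a torch tensor. The helper name to_single_batch is hypothetical and not part of the worker.

```python
import numpy as np


def to_single_batch(image_hwc):
    """Hypothetical helper: turn one H x W x 3 image into a batch of size 1."""
    height, width = image_hwc.shape[:2]
    # H x W x 3 -> 3 x H x W (channels first)
    image_chw = np.transpose(image_hwc, (2, 0, 1))
    # 3 x H x W -> 1 x 3 x H x W (NumPy counterpart of tensor.unsqueeze(0))
    batch = np.expand_dims(image_chw, axis=0)
    # Sizes must also have a batch dimension: batch x 2
    sizes = [(height, width)]
    return batch, sizes


image = np.zeros((32, 48, 3), dtype=np.float32)
batch, sizes = to_single_batch(image)
print(batch.shape, sizes)  # (1, 3, 32, 48) [(32, 48)]
```

Note that the sizes are wrapped in a list, so input_image.shape[:2] alone would not have the expected batch x 2 layout.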