Commit c697ce22 authored by Eva Bardou :frog:, committed by Bastien Abadie

Write a tutorial on how to generate ground truth data for segmentation training

parent 92ad87ee
1 merge request: !116 Write a tutorial on how to generate ground truth data for segmentation training
Pipeline #171498 passed
content/tutorial/callico_track_import.jpg

54.1 KiB

+++
title = "Creating ground truth for segmentation"
weight = 50
draft = true
+++
In this tutorial, you will learn how to create ground truth data for segmentation training using [the Callico collaborative annotation platform](https://teklia.com/our-solutions/callico/).
Follow this section after completing the first steps described on [this page](@/tutorial/corpus.md).
## Preliminary requirement
You are about to learn how to manually segment text lines and illustrations on pages from the [**PELLET Casimir Marius**](https://europeana.transcribathon.eu/documents/story/?story=121795) dataset using Callico.
Before working with Callico, you have to create a new [element type](@/project/type.md) in Arkindex. To do so, navigate to the details page of the Arkindex project that you created in the previous steps.
Once on your project details page, open the `Types` tab and add a new type as presented below:
* **Display name** - The name of your element type, you can use `Illustration`.
* **Slug** - The slug of your element type, you can use `illustration`.
* **Folder** - Whether your type is a folder or not, you have to leave it unchecked.
* **Color** (*optional*) - The color to display elements holding this type in Arkindex.
* **Actions** - Click on the `+` button to validate your type creation.
{{ figure(image="tutorial/arkindex_illustration_type.jpg", height=400, caption="Create the illustration type in Arkindex") }}
As you can see, the `Text line` element type already exists by default, so you don't need to create it.
## Import data to segment in Callico
{% info() %}
This section expects you to have a Callico account. You can follow [this link](https://demo.callico.eu/users/signup/) to register on Callico's demonstration instance.
{% end %}
### Create a Callico project
Once logged in to Callico, you have to create a new annotation project by clicking the `Create a project` green button at the top-right of [the homepage](https://demo.callico.eu/projects/).
{{ figure(image="tutorial/callico_homepage.jpg", height=500, caption="Callico's homepage") }}
Then, fill in the creation form as presented below:
{{ figure(image="tutorial/callico_create_project.jpg", height=500, caption="Callico's pre-filled project creation form") }}
* **Name** - Name of your project. It must be unique, so **do not copy** the name from the screenshot above word for word.
* **Description** (*optional*) - Description of your project, it supports Markdown.
* **Illustration** (*optional*) - Illustration of your project to display on various pages across Callico, you can use [this image](https://storage.teklia.com/tools-hedgedoc-uploads/uploads/713e4c86-68da-4c34-86f7-5e50b31cd3e7.jpeg) for example.
* **Provider** - Provider from which to import and export your project data, in this tutorial we will pick `Arkindex demo`.
* **Object identifier in provider** - The identifier, in the provider, of the object your Callico project should be linked to. As shown in the screenshot above, this is an Arkindex project UUID: replace the `aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee` value with your own, which can be copied from your Arkindex project details page, just below its name.
{{ figure(image="tutorial/arkindex_project_id.jpg", height=200, caption="Find your project's UUID on Arkindex") }}
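A mistyped identifier would link your Callico project to the wrong object, or to none at all, so it can help to check the copied value before pasting it. Below is a minimal sketch using Python's standard `uuid` module. The `is_valid_uuid` helper is a hypothetical name, and it only checks the format of the string, not that the project actually exists in Arkindex.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Return True if `value` is a canonically formatted UUID string."""
    candidate = value.strip()
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return False
    # uuid.UUID also accepts variants without hyphens or wrapped in braces,
    # so compare against the canonical hyphenated form to reject those.
    return str(parsed) == candidate.lower()


# The placeholder from the form is well formed, so it must still be
# replaced by hand: a format check cannot tell it apart from a real UUID.
print(is_valid_uuid("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"))
print(is_valid_uuid("not-a-uuid"))
```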
### Import an Arkindex dataset
Once your project is created, you are ready to import an Arkindex dataset and its elements into Callico.
To do so, you can click on the `Import from Arkindex` action in the **Elements** section from the menu on the left of the project details page:
{{ figure(image="tutorial/callico_project_details.jpg", height=500, caption="Callico's project details page") }}
Then, fill in the import form as presented below, to import all `page` elements from your dataset containing data scraped from Europeana:
{{ figure(image="tutorial/callico_import_dataset.jpg", height=800, caption="Callico's pre-filled Arkindex import form") }}
* **Process name** - Name of your process to import elements from Arkindex.
* ~~**Element**~~ - To ignore in this tutorial.
* **Dataset** - UUID of your Arkindex dataset. Replace the `aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee` value with your own, which can be copied from your Arkindex dataset details page, just below its name.
{{ figure(image="tutorial/arkindex_dataset_id.jpg", height=200, caption="Find your dataset's UUID on Arkindex") }}
* **Filter sets to import** - To filter the sets from your Arkindex dataset that should be imported to Callico, we do not use it in this tutorial.
* **Filter types to import** - To filter the Arkindex elements to import by their type, here we only want to import `page` elements from your dataset.
* All subsequent fields in this form can be ignored in this tutorial. You can learn more about them on [a dedicated page of Callico's documentation](https://doc.callico.eu/projects/elements/import/arkindex/).
#### Track your import's progress
Once you have started the data import, you will be redirected to a new page where you can track its progress. Be aware that this page is not refreshed dynamically: you need to reload it manually to view the updated status and logs.
{{ figure(image="tutorial/callico_track_import.jpg", height=500, caption="Track the Arkindex import progress in Callico") }}
## Create segmentation tasks to annotate
While Arkindex elements are being imported in your Callico project, you can start setting up your annotation campaign.
### Create a segmentation campaign
First, navigate back to your Callico project's details page by clicking on your project's name in the navbar at the top of the page.
{{ figure(image="tutorial/callico_process_back_to_project.jpg", height=200, caption="Navigate back to your project from the process page") }}
From there, you can click on the `Create` action in the **Campaigns** section from the menu on the left of the project details page:
{{ figure(image="tutorial/callico_project_details.jpg", height=500, caption="Callico's project details page") }}
Then, fill in the creation form as presented below:
{{ figure(image="tutorial/callico_create_campaign.jpg", height=500, caption="Callico's pre-filled campaign creation form") }}
* **Name** - Name of your campaign, you can copy the name from the screenshot above.
* **Mode** - [Mode of your campaign](https://doc.callico.eu/campaigns/), you have to pick the `Elements` one to follow this tutorial.
* **Description** (*optional*) - Description of your campaign, it supports Markdown.
### Configure the newly created campaign
Once your segmentation campaign is created, you will be redirected to its configuration page, fill in the configuration form as presented below:
{{ figure(image="tutorial/callico_configure_campaign.jpg", height=700, caption="Callico's pre-filled campaign configuration form") }}
* **Name** - Keep it unchanged.
* **Description** (*optional*) - Keep it unchanged.
* **Number of tasks to assign per volunteer** - Set this field value to `10`, this will allow annotators to request tasks by batch containing 10 pages.
* ~~**Number of allowed assignments for available tasks**~~ - Ignore this field.
* **Element types to use to annotate** - The element types that will be used to draw elements on your pages, here we want to only keep `Illustration` and `Text line` types and uncheck all others.
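To get a feel for what a batch size of `10` means in practice, the arithmetic can be sketched as follows. This is only an illustration: it assumes each request hands out at most 10 tasks and that every page is annotated exactly once, and both the `count_batches` helper and the page count of 25 are hypothetical.

```python
import math


def count_batches(total_pages: int, batch_size: int = 10) -> int:
    """Number of task requests needed to cover every page once."""
    if total_pages < 0 or batch_size <= 0:
        raise ValueError("total_pages must be >= 0 and batch_size must be > 0")
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(total_pages / batch_size)


# A hypothetical dataset of 25 pages needs two full batches and one partial one.
print(count_batches(25))
```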
### Create annotation tasks
After configuring your campaign, you will be redirected to its details page. From there, you can access the form to create annotation tasks by clicking the `Create` action in the **Tasks** section from the menu on the left:
{{ figure(image="tutorial/callico_campaign_details.jpg", height=500, caption="Callico's campaign details page") }}
{% warning() %}
Please [make sure your import process is complete](@/tutorial/segmentation-ground-truth.md#track-your-import-s-progress) before creating your annotation tasks, otherwise you may miss pages during annotation.
{% end %}
Then, fill in the creation form as presented below:
{{ figure(image="tutorial/callico_create_tasks.jpg", height=500, caption="Callico's pre-filled task creation form") }}
* **Element type** - The element type to create your tasks on, here we are annotating `Pages`.
* ~~**Users**~~ - Ignore this field (do not worry, we will add users to your project in the next step).
* **Sequential** - Keep it unchanged, pages will be annotated by preserving their ordering, e.g.: `page 1` before `page 2` before `page 3` and so on.
* **Elements to use** - Keep it unchanged, we want to annotate all the imported pages.
* ~~**Maximum number of tasks per user**~~ - Ignore this field.
* **Create unassigned tasks** - This option allows the creation of annotation tasks that annotators will request as they go. In this tutorial, you must check it.
Once tasks are created, you will be redirected to the task list which should contain many items, one for each page to annotate from your dataset.
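The combination of the **Sequential** and **Create unassigned tasks** options can be pictured as a simple first-in, first-out queue from which annotators pull work. The sketch below is a toy model under that assumption, not Callico's actual implementation. The `request_tasks` function and the page names are hypothetical.

```python
from collections import deque


def request_tasks(pending: deque, batch_size: int) -> list:
    """Hand out up to `batch_size` tasks from the front of the queue."""
    batch = []
    while pending and len(batch) < batch_size:
        # popleft() preserves the original page ordering.
        batch.append(pending.popleft())
    return batch


# Five unassigned tasks, one per page, kept in page order.
pending = deque(f"page {number}" for number in range(1, 6))

first = request_tasks(pending, 2)   # first annotator asks for two tasks
second = request_tasks(pending, 2)  # second annotator asks for two more
print(first, second, list(pending))
```

In this model a task leaves the queue as soon as it is requested, so two annotators never receive the same page.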
You can navigate back to your Callico project's details page by clicking on your project's name in the navbar at the top of the page.
{{ figure(image="tutorial/callico_tasks_back_to_project.jpg", height=300, caption="Navigate back to your project from the task list") }}
## Invite collaborators
To invite users to join your project, you can click on the `Invite users` action in the **Project** section from the menu on the left of the project details page:
{{ figure(image="tutorial/callico_project_details_v2.jpg", height=500, caption="Callico's project details page") }}
From there, you can copy the invite link of your project to your clipboard by clicking on the `Copy` button on the right:
{{ figure(image="tutorial/callico_invite_link.jpg", height=200, caption="Copy the invite link of your project") }}
Once the link is copied, you can share it with users to invite them to collaborate on your project.
When they open the link, they will be asked to log in to Callico (or register if they do not already have an account) before being added as `Contributors` to your project, meaning they will have the rights to annotate tasks from the campaign you just created.
{{ figure(image="tutorial/callico_join_project.jpg", height=300, caption="Join the project as a contributor") }}
If you wish to annotate the tasks by yourself, you can:
1. **Log out of Callico**
2. Open the invite link that was just copied
3. **Register using a different email address**
4. Accept the invitation to join the project as a contributor
## Annotate the segmentation tasks
In this section, we will place ourselves in the shoes of a `Contributor` user whose role is to annotate tasks from one or more campaigns.
Once you have joined the project as a contributor, you can request tasks.
{{ figure(image="tutorial/callico_project_details_v3.jpg", height=300, caption="Callico's project details page for contributors") }}
To request a task on your campaign, you can click on the `My tasks` blue button displayed next to the **Segment text lines and illustrations from pages** campaign name on the current page:
{{ figure(image="tutorial/callico_access_task_list.jpg", height=100, caption="Access the campaign task list") }}
You will be redirected to the task list for this campaign, showing available tasks, i.e. the ones that can be requested for annotation. From there, you can click on the `Annotate` green button on any task to request it.
{{ figure(image="tutorial/callico_request_task.jpg", height=600, caption="Request a single task") }}
You will be redirected to the annotation page of this specific task.
{{ figure(image="tutorial/callico_request_task_redirection.jpg", height=200, caption="Redirection after requesting a single task") }}
You can [annotate](@/tutorial/segmentation-ground-truth.md#annotate-your-tasks) this task, before being redirected to your task list for this campaign.
{{ figure(image="tutorial/callico_contrib_task_list_v2.jpg", height=600, caption="Redirection to the task list after a single annotation") }}
From there, you can request one or more tasks to continue annotating.
{% info() %}
You can also request 10 tasks simultaneously by clicking on the `Requests tasks` button instead of the `My tasks` button described above. You will be redirected to your first task, then to the next one in the stack.
{% end %}
### Annotate your tasks
Now that you know how to request tasks, you will learn how to annotate segmentation tasks. Here is an annotation page:
{{ figure(image="tutorial/callico_annotate.jpg", height=700, caption="Callico's annotation page for a segmentation campaign") }}
#### Select a type
At any time during the annotation, you can pick the element type that will be assigned to the next elements that you will draw. In this tutorial, we want to segment:
- illustrations and
- text lines.
{{ figure(image="tutorial/callico_annotate_pick_type.jpg", height=200, caption="Pick an element type before drawing elements") }}
#### Draw a rectangle
Once you have selected a type, you can start drawing rectangles and/or [polygons](@/tutorial/segmentation-ground-truth.md#draw-a-polygon).
To draw a rectangle, you need to select the `Rectangle` tool (highlighted in blue just below) and start dragging your mouse on the image:
{{ figure(image="tutorial/callico_annotate_draw_rectangle.jpg", height=400, caption="Drag your mouse to draw a rectangle") }}
Then, simply release your mouse to validate your drawing:
{{ figure(image="tutorial/callico_annotate_draw_rectangle_v2.jpg", height=200, caption="Release your mouse to validate your drawing") }}
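A rectangle drawn this way is fully described by the two points where the drag starts and ends. As an illustration, the sketch below turns those two corners into a four-point polygon. It assumes image coordinates with the origin at the top-left corner. The `rectangle_to_polygon` helper is hypothetical and does not describe how Callico stores shapes internally.

```python
def rectangle_to_polygon(start, end):
    """Turn the two corners of a mouse drag into a four-point polygon."""
    (x1, y1), (x2, y2) = start, end
    # Sorting makes the result independent of the drag direction.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    # Clockwise from the top-left corner (y grows downwards in images).
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


# Dragging from bottom-right to top-left yields the same rectangle
# as dragging from top-left to bottom-right.
print(rectangle_to_polygon((300, 200), (100, 50)))
```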
#### Draw a polygon
Once you have selected a type, you can start drawing polygons and/or [rectangles](@/tutorial/segmentation-ground-truth.md#draw-a-rectangle).
To draw a polygon, you need to select the `Polygon` tool (highlighted in blue just below) and start adding points on the image by clicking your mouse:
{{ figure(image="tutorial/callico_annotate_draw_polygon.jpg", height=400, caption="Add points to draw a polygon") }}
Then, finish your polygon by clicking on the very first point you added:
{{ figure(image="tutorial/callico_annotate_draw_polygon_v2.jpg", height=200, caption="Click on the first added point to validate your drawing") }}
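Closing a polygon by clicking on its first point usually means clicking close enough to it, within some tolerance. The sketch below illustrates that idea. The 5-pixel tolerance, the minimum of three points and the `is_closing_click` name are all assumptions made for this example, not documented Callico behavior.

```python
import math


def is_closing_click(points, click, tolerance=5.0):
    """Return True if `click` lands on the first point of the polygon."""
    # A polygon needs at least three points before it can be closed.
    if len(points) < 3:
        return False
    return math.dist(points[0], click) <= tolerance


points = [(0, 0), (10, 0), (10, 10)]
print(is_closing_click(points, (2, 1)))  # near the first point
print(is_closing_click(points, (8, 8)))  # far from the first point
```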
#### Edit drawn elements
If one of your rectangles or polygons is poorly drawn:
{{ figure(image="tutorial/callico_annotate_edit_element.jpg", height=200, caption="Poorly drawn rectangle") }}
You can edit its placement or its points using the `Mouse` tool (highlighted in blue just below):
{{ figure(image="tutorial/callico_annotate_edit_element_v2.jpg", height=100, caption="Edit an element placement and points") }}
#### Delete wrongfully drawn elements
{% danger() %}
Deleting a drawn element is irreversible, so be careful when using this feature. If an element is poorly placed, you can always [edit it](@/tutorial/segmentation-ground-truth.md#edit-drawn-elements) instead.
{% end %}
When adding elements, you may have forgotten to select the right element type. Such elements can be deleted using the `Trash` red icon displayed in the drawn elements table on the right side of the annotation page:
{{ figure(image="tutorial/callico_annotate_delete_element.jpg", height=150, caption="Delete the drawn illustration that should be a text line") }}
#### Other tools on the image component
To ease the annotation, a few other tools are at your disposal:
{{ figure(image="tutorial/callico_annotate_extra_tools.jpg", height=200, caption="Image component extra tools") }}
* A slider to `Zoom in` or `Zoom out` the image being worked on,
* An `Open in a new tab` tool to better visualize large images,
* Two `Rotate left` and `Rotate right` tools to pivot your image.
{% warning() %}
Do not forget to validate your task by clicking the `Submit` green button when you are done annotating.
{% end %}
### Correct an annotated task
If you submitted a task without finishing your annotation or wish to correct drawn elements, you can edit it by accessing the `Annotated` tab in your task list and clicking on the `Change annotation` green button:
{{ figure(image="tutorial/callico_contrib_task_list_v3.jpg", height=600, caption="Correct an annotated task") }}
You will be redirected to the task annotation page, pre-filled with the last annotation you made:
{{ figure(image="tutorial/callico_correct.jpg", height=400, caption="Annotation page pre-filled with a previous version") }}
In this case, we may have forgotten to segment the stamps as illustrations, we can add them on the image and submit a new version for our task:
{{ figure(image="tutorial/callico_correct_v2.jpg", height=400, caption="Add illustrations and submit a new version") }}
The last version of an annotation task is the one that will be exported to the provider. In our tutorial, it is the one published back to [Arkindex](https://demo.arkindex.org/).
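The "last version wins" rule can be sketched as follows. This is a simplified model for illustration only: the `AnnotationVersion` structure, its fields and the `latest_versions` helper are hypothetical and do not reflect Callico's data model.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationVersion:
    task: str       # identifier of the annotated task (hypothetical)
    version: int    # increases with each submission of the task
    elements: list = field(default_factory=list)


def latest_versions(versions):
    """Keep only the highest-numbered version for each task."""
    latest = {}
    for item in versions:
        current = latest.get(item.task)
        if current is None or item.version > current.version:
            latest[item.task] = item
    return latest


history = [
    AnnotationVersion("page 1", 1, ["text line"]),
    AnnotationVersion("page 1", 2, ["text line", "illustration"]),
    AnnotationVersion("page 2", 1, ["text line"]),
]
to_export = latest_versions(history)
print(to_export["page 1"].elements)
```

In this model, the corrected submission for `page 1`, which adds the forgotten illustration, replaces the first one at export time.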
## Track and export annotations back to Arkindex
{% info() %}
If necessary, log out of your `Contributor` account and log in with your first email address.
{% end %}
Back on your `Manager` account, you can track the progress of your segmentation campaign from its details page:
{{ figure(image="tutorial/callico_campaign_details_v2.jpg", height=400, caption="Callico's campaign details page showing the ongoing progress") }}
Once it is completed, i.e. when all tasks from this tutorial are annotated, you can proceed with the export to Arkindex.
### Export results to Arkindex
To export your results back to Arkindex, you have to click on the `To Arkindex` action in the **Export results** section from the menu on the left of the campaign details page.
Then, fill in the export form as presented below:
{{ figure(image="tutorial/callico_export_results.jpg", height=300, caption="Callico's pre-filled Arkindex export form") }}
* **Process name** - Name of your process to export annotations to Arkindex.
* **Status of tasks to be exported** - Pick the `Annotated` value to export your tasks.
* ~~**Force the republication of annotations**~~ - To ignore in this tutorial.
* ~~**Publish each annotation separately**~~ - To ignore in this tutorial.
#### Track your export's progress
Once you have started the results export, you will be redirected to a new page where you can track its progress. Be aware that this page is not refreshed dynamically: you need to reload it manually to view the updated status and logs.
{{ figure(image="tutorial/callico_track_export.jpg", height=300, caption="Track the export to Arkindex progress in Callico") }}
#### Check that your Arkindex export went smoothly
Once your export process is complete, you should check that the annotations for your segmentation tasks were properly published to Arkindex by browsing your dataset elements:
{{ figure(image="tutorial/arkindex_dataset_viewer.jpg", height=500, caption="Arkindex dataset details page showing its elements") }}
Congratulations, you have successfully segmented pages in Callico and exported the annotations back to Arkindex!
{{ figure(image="tutorial/arkindex_published_text_lines.jpg", height=300, caption="Annotated text lines and illustrations are available in Arkindex") }}
## Next step
Now that the ground truth has been annotated in Callico and published back to Arkindex, you can proceed to [train a Machine Learning segmentation model](@/tutorial/segmentation-training.md).
......@@ -20,6 +20,15 @@ div.info {
margin-bottom: 1rem;
}
div.danger {
border-radius: .375em;
background: #ffebeb;
color: #940000;
padding: 1.5rem;
padding-bottom: 0.2rem;
margin-bottom: 1rem;
}
.page__content table {
margin-bottom: 2rem;
}
......
<div class="danger">
<strong>Danger</strong>
<br />
{{ body | markdown | safe }}
</div>