
Releases

0.3.6

Released on 22 Dec 2023 • View on Gitlab

Breaking changes

  • The arkindex_worker.git module was removed. It was not used by any worker; this module only exposed some workflows from python-gitlab. Please refer to the python-gitlab documentation if your worker needs to communicate with a GitLab instance.

  • Following the Arkindex 1.5.3 release, the model_usage configuration parameter is now an enum with three possible values. To migrate your workers:

    • model_usage: false becomes model_usage: disabled
    • model_usage: true becomes model_usage: required

    The new supported value means that the worker can use a model but does not require one to work.
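As a rough sketch of this migration, the mapping from the old boolean values to the new ones can be written as a small function. migrate_model_usage is hypothetical and not part of arkindex-worker, and the config dict is illustrative; only the three string values come from the notes above. Note that supported has no boolean equivalent, so it can only appear in already-migrated configurations.

```python
# Hypothetical migration helper (not part of arkindex-worker): maps the
# old boolean model_usage values to the new enum values.
OLD_TO_NEW = {False: "disabled", True: "required"}
NEW_VALUES = {"disabled", "supported", "required"}


def migrate_model_usage(value):
    """Return the new model_usage value for an old or already-migrated one."""
    if isinstance(value, bool):
        return OLD_TO_NEW[value]
    if value in NEW_VALUES:
        # Already migrated; "supported" only exists in the new format.
        return value
    raise ValueError(f"Unknown model_usage value: {value!r}")


# Example: migrating an (illustrative) worker configuration loaded as a dict
config = {"slug": "my-worker", "model_usage": True}
config["model_usage"] = migrate_model_usage(config["model_usage"])
print(config["model_usage"])  # required
```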

Project architecture

  • PEP 621 encourages users to store most of a package's metadata in pyproject.toml. We followed this recommendation for both the arkindex-worker package and the worker template.

Arkindex API

  • The details of the model available to the worker are now stored in the model_details attribute.
  • The list_corpus_entities API helper now stores the entities in the entities attribute instead of returning them.
  • A reminder was added to prevent changes to the Arkindex cache schema without bumping the cache version.
  • Each dataset's archive is now properly deleted after processing.
  • The path to a Dataset's archive is now stored under the filepath property.
  • The new create_element_parent API helper can be used to link two elements.
  • The create_sub_element helper was updated to support creating child elements without a zone, as well as under a parent without a zone.
  • A new user configuration type was introduced to select Arkindex models. Learn more about it in the documentation.

Worker template

  • When the provided slug contained more than one word, it was invalid for either:

    • the package name, if the user used _ as the word delimiter,
    • the module directory's name, if the user used - as the word delimiter.

    The package name and the module directory's name are now both computed from the slug, making sure that:

    • the package name uses - as the word delimiter,
    • the module directory's name uses _ as the word delimiter.
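The naming rule above can be sketched as follows. names_from_slug is a hypothetical helper, not the template's actual code; it assumes that slug words may be separated by -, _ or spaces, and it lowercases them.

```python
import re


def names_from_slug(slug: str) -> tuple[str, str]:
    """Return (package name, module directory name) for a worker slug.

    Hypothetical helper: words may be separated by "-", "_" or whitespace.
    """
    words = [word for word in re.split(r"[-_\s]+", slug.strip().lower()) if word]
    if not words:
        raise ValueError("The slug must contain at least one word")
    # Package names use "-", Python module directories use "_"
    return "-".join(words), "_".join(words)


print(names_from_slug("my_ocr-worker"))  # ('my-ocr-worker', 'my_ocr_worker')
```

Splitting on every delimiter first means both names are derived from the same list of words, whichever delimiter the user typed.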

Documentation

  • A link to the documentation was added:

    • in the README,
    • as a GitLab badge on the repo.
  • Some sections in the documentation were renamed to improve readability.

Misc

0.3.5

Released on 8 Nov 2023 • View on Gitlab

Breaking changes

  • The arkindex_worker.reporting module has been removed as the JSON report file was no longer needed.
  • The --model-dir CLI argument was renamed to --extras-dir to better reflect its use. This folder now stores dataset archives, hence the more generic name.

Arkindex API

  • Following the Arkindex 1.5.2 release:
    • new helpers for Task-related endpoints were introduced,
    • a new worker class is available to support Dataset processes,
    • new helpers for Dataset-related endpoints were introduced.
  • Added a uniqueness check on the input of the create_transcription_entities helper.
  • The partial_update_element helper was updated to better match the endpoint.

Documentation

  • Some modules were poorly displayed in the documentation. Class methods are now only listed under their class's section.

Release Management

  • A Makefile was added to the worker template to deploy new releases more easily. It expects master as the default branch; change it to main if your repository uses that name.
  • The base image of the worker's Docker image was changed from python:3.11 to python:3.11-slim to reduce the image size.

Misc

  • During the configuration stage, a summary of the worker is now logged instead of the revision's hash. This was changed to support workers not linked to any revision on Arkindex.
  • A retry mechanism on HTTP 50x errors was added. Additionally, when the requested size exceeds the maximum size allowed by the IIIF server, a new try is done with max instead of full as size parameter. More information about these parameters in the IIIF documentation.
  • When running the worker locally without the ARKINDEX_CORPUS_ID variable set in the environment, an explicit exception will be raised when trying to access the corpus_id attribute.
  • This release adds support for Python 3.12.
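The retry and fallback behaviour described above can be sketched as follows. This is a minimal sketch, not arkindex-worker's implementation: fetch is any callable returning (status, body), retries happen without backoff, and the server is assumed to answer 400 when the full size exceeds its maximum. The URL layout follows the IIIF Image API pattern {base}/{region}/{size}/{rotation}/{quality}.{format}.

```python
from typing import Callable, Optional, Tuple

# A fetch function takes a URL and returns (HTTP status code, body).
Fetch = Callable[[str], Tuple[int, bytes]]


def iiif_url(base: str, size: str = "full") -> str:
    """Build an IIIF Image API URL for the full region, no rotation."""
    return f"{base.rstrip('/')}/full/{size}/0/default.jpg"


def download_image(base: str, fetch: Fetch, max_attempts: int = 3) -> bytes:
    status: Optional[int] = None
    for size in ("full", "max"):
        url = iiif_url(base, size)
        for _ in range(max_attempts):
            status, body = fetch(url)
            if not 500 <= status < 600:
                break  # only HTTP 50x errors are retried (no backoff here)
        if status == 200:
            return body
        if status != 400:
            break  # not a size problem: falling back to "max" would not help
    raise RuntimeError(f"Could not download {base} (last status: {status})")
```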

0.3.4

Released on 14 Sept 2023 • View on Gitlab

  • The worker template was updated to correctly install Git submodules if it depends on any.
  • Base-worker now uses ruff for linting. This tool replaces isort and flake8.
  • New Arkindex API helper to update an element, calling PartialUpdateElement.
  • New Arkindex API helper to list an element's parents, calling ListElementParents.
  • Worker Activity API is now disabled when the worker runs in read-only mode instead of relying on the --dev CLI argument. The update_activity API helper was updated following Arkindex 1.5.1 changes.
  • Workers can now resize an element's image when opening it. This uses the IIIF resizing API.
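Resizing through IIIF comes down to the size segment of the image URL. As a sketch, assuming IIIF Image API 2.x syntax, a hypothetical helper could build that segment from optional maximum dimensions. The arguments actually accepted by arkindex-worker's image helpers are not described here.

```python
from typing import Optional


def iiif_size(max_width: Optional[int] = None, max_height: Optional[int] = None) -> str:
    """Return an IIIF Image API 2.x size parameter (hypothetical helper)."""
    if max_width is None and max_height is None:
        return "full"  # no resizing
    if max_height is None:
        return f"{max_width},"  # scale to this width, keeping the aspect ratio
    if max_width is None:
        return f",{max_height}"  # scale to this height, keeping the aspect ratio
    return f"!{max_width},{max_height}"  # best fit inside the width x height box


print(iiif_size(800, 600))  # !800,600
```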

0.3.3

Released on 26 May 2023 • View on Gitlab

  • The Timer class previously defined in arkindex_worker.utils was removed, as it was already defined in Teklia's Python toolbox:
```python
# Old usage
from arkindex_worker.utils import Timer

# New usage
from teklia_toolbox.time import Timer
```
  • The create_element_transcriptions API helper now accepts an element_confidence float field in the dictionaries provided through the transcriptions field. This confidence will be set on the created element.
  • More query filters are available on the list_element_children API helper. More details about their usage is available in the documentation:
    • transcription_worker_version
    • transcription_worker_run
    • with_metadata
    • worker_run
  • Arkindex Base-Worker now fully uses pathlib to handle filesystem paths as suggested by PEP 428.
  • Many helpers were added to handle ZSTD and TAR archives as well as delete files cleanly. More details about that in the documentation of the arkindex_worker.utils module.
  • A bug affecting the parsing of the configuration of workers that use a Machine learning model stored on an Arkindex instance was fixed.

0.3.2

Released on 8 March 2023 • View on Gitlab

  • A helper to use the new API endpoint to create transcription entities more efficiently was implemented.
  • Training workers may now publish a model configuration when creating a new model version on Arkindex. This will make the execution of a generic worker much smoother.
  • The model version API endpoints were updated in the latest Arkindex release and a new helper was introduced subsequently. However, there are no breaking changes and the main helper, publish_model_version, still has the same signature and behaviour.
  • The latest Arkindex release changed the way NER entities are stored and published.
    • the EntityType enum was removed, as entity type slugs are no longer restricted to a small set of options,
    • create_entity now expects a type slug as a string,
    • a new helper, list_corpus_entity_types, was added to load the entity types of the corpus,
    • a new helper, check_required_entity_types, was added to make sure that the needed entity types are available in the corpus. Missing ones are created by default (this can be disabled).
  • The create_classifications helper now expects the UUID of each MLClass instead of their name.
  • In developer mode, the only way to set the corpus_id attribute is to use the ARKINDEX_CORPUS_ID environment variable. When it is not set, all API requests using the corpus_id as a path parameter fail with a 500 status code. A warning log was added to help developers troubleshoot this error by advising them to set this variable.
  • The create_transcriptions helper no longer makes the API call in developer mode. This behaviour aligns with all other publication helpers.
  • Fixed hash computation when publishing a model using publish_model_version.
  • If a process is linked to a model version, its id will be available to the worker through its model_version_id attribute.
  • The URLs of the API endpoint related to Ponos were changed in the latest Arkindex release. Some changes were needed in the test suite.
  • The classes attribute now directly contains the classes of the corpus of the processed element:
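```python
# Old usage: classes grouped by corpus ID
self.classes = {
    "corpus_id": {
        "ml_class_1": "class_uuid",
        # ...
    }
}

# New usage: only the classes of the processed element's corpus
self.classes = {
    "ml_class_1": "class_uuid",
    # ...
}
```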
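To illustrate the developer-mode behaviour described above, here is a minimal sketch of a corpus_id property that reads ARKINDEX_CORPUS_ID and fails early, in the spirit of the explicit exception added later in 0.3.5. The DeveloperModeWorker class and the choice of RuntimeError are assumptions; arkindex-worker's actual implementation may differ.

```python
import os


class DeveloperModeWorker:
    """Hypothetical stand-in for a worker running locally in developer mode."""

    @property
    def corpus_id(self) -> str:
        corpus_id = os.environ.get("ARKINDEX_CORPUS_ID")
        if not corpus_id:
            # Fail early with an explicit message instead of a 500 from the API
            raise RuntimeError(
                "Missing ARKINDEX_CORPUS_ID environment variable: "
                "set it to the ID of the corpus to use in developer mode."
            )
        return corpus_id
```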

0.3.1

Released on 8 November 2022 • View on Gitlab

  • A breaking change, affecting mostly the API, was introduced in Arkindex's 1.3.4 release:
    • Workers were mostly unaffected but the REST schema was updated.
  • Workers will progressively lose the ability to publish results on Arkindex with a worker_version_id. They will have to use a related but more general field, worker_run_id:
    • Most publication API endpoint helpers have been updated accordingly,
    • A new version of the cache was released with the updated Django models.
  • Improvements to our Machine Learning training API to allow workers to use models published on Arkindex.
  • Support workers that have no configuration.
  • Allow publishing metadata with falsy but non-null values.
  • Add .polygon attribute shortcut on Element.
  • Significantly speed up the tests of our worker template.
  • Support cache usage on our metadata API endpoint helpers.
  • Drop support for Python 3.6 and add support for Python 3.11.
  • Update arkindex-client to version 1.0.11.
  • Update shapely to version 1.8.5.post1.
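Allowing falsy but non-null metadata values comes down to the difference between a truthiness test and an explicit None check. A minimal sketch; both function names are hypothetical and not part of arkindex-worker.

```python
def is_publishable_truthy(value) -> bool:
    # Truthiness test: wrongly rejects falsy values such as 0 or 0.0
    return bool(value)


def is_publishable(value) -> bool:
    # Explicit check: only a missing (None) value is rejected
    return value is not None


print(is_publishable_truthy(0), is_publishable(0))  # False True
```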

0.3.0

Released on 12 September 2022 • View on Gitlab

  • A large refactoring effort was made on the worker initialization, to streamline most of the workflow:
    • developer setup is now set in a dedicated method configure_for_developers
    • cache setup is now set in a dedicated method configure_cache
    • deprecate the useless features attribute
    • add a simpler debug mode for developers
    • depend only on Arkindex RetrieveWorkerRun API to get all the information needed, instead of relying on multiple API calls.
    • remove usage of the ARKINDEX_CORPUS_ID environment variable, replaced by corpus information from the API, except for developers
    • do not erase defaults when reading configuration
  • Support new Machine Learning training APIs on Arkindex to allow workers to create model versions and publish them as zstandard archives on a remote S3-compatible bucket.
  • Add API helpers
    • list_corpus_entities
    • create_metadatas
    • list_metadata
    • list_transcription_entities
    • create_required_types
    • publish_model_version
    • create_model_version
    • upload_to_s3
  • Create missing element types when checking if they are available on the Arkindex instance (disabled by default).
  • Update arkindex-client to version 1.0.9.
  • Update automated rotation code (revert_orientation) to support reverse application
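Reading the configuration without erasing defaults can be sketched as a layered merge. This is a minimal sketch, assuming three layers with increasing priority: the worker's own defaults, the default of each user_configuration field, then the values explicitly provided. The function name and the data layout (including the field specs in the example) are hypothetical.

```python
def merge_configuration(defaults, user_configuration, provided):
    """Build a worker configuration without erasing defaults (hypothetical)."""
    config = dict(defaults)  # copy, so the defaults themselves stay untouched
    for key, spec in user_configuration.items():
        if "default" in spec:
            config[key] = spec["default"]
    config.update(provided)  # explicit values win, other keys are kept
    return config


defaults = {"batch_size": 8, "lang": "en"}
user_configuration = {
    "threshold": {"title": "Threshold", "type": "float", "default": 0.5},
    "model": {"title": "Model", "type": "string"},  # no default: not added
}
print(merge_configuration(defaults, user_configuration, {"lang": "fr"}))
# {'batch_size': 8, 'lang': 'fr', 'threshold': 0.5}
```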

0.2.4

Released on 6 July 2022 • View on Gitlab

  • Document source code using Sphinx and docstrings with parameters. Documentation is available here.
  • Update the worker's inner config with default values from user_configuration
  • Support confidence in the create_sub_element and create_elements API helpers, as element confidence is now available in Arkindex
  • Port rotation code from tesseract worker
  • Add helper to trim polygons so that they fit inside their image
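A minimal sketch of polygon trimming, assuming a polygon is a list of [x, y] points and the image spans [0, width] × [0, height]. The trim_polygon name is hypothetical. It clamps coordinates rather than geometrically clipping the polygon (as Sutherland-Hodgman would), so a slanted edge crossing the border is not cut at its exact intersection point.

```python
def trim_polygon(polygon, width, height):
    """Clamp every point of `polygon` inside a width x height image."""
    if width <= 0 or height <= 0:
        raise ValueError("Image dimensions must be positive")
    trimmed = []
    for x, y in polygon:
        point = [min(max(x, 0), width), min(max(y, 0), height)]
        # Skip consecutive duplicates created by clamping
        if not trimmed or trimmed[-1] != point:
            trimmed.append(point)
    if len({tuple(point) for point in trimmed}) < 3:
        raise ValueError("The polygon does not overlap the image")
    return trimmed


print(trim_polygon([[-10, -10], [50, -10], [50, 50], [-10, 50]], 40, 40))
# [[0, 0], [40, 0], [40, 40], [0, 40]]
```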

0.2.3

Released on 28 March 2022 • View on Gitlab

  • Update arkindex-client to version 1.0.8.
  • Replace all transcription scores with confidences (also renamed on Arkindex)
  • Support cache versioning and detect compatibility in workers
  • Support confidence in create_transcription_entity API helper
  • Support Text orientation for transcriptions
  • Return the response payload in all creation helpers so that workers can use them
  • Support new metadata type URL

0.2.2

Released on 17 September 2021 • View on Gitlab

  • Update arkindex-client to version 1.0.7.
  • Detect already processed elements using worker activity, and skip them
  • Support rotation, mirroring and fix image crop in open_image method used by a lot of workers
  • Change the default value of user_configuration from None to {}, which simplifies usage code in workers
  • Support new metadata type Numeric
  • Add API helper create_classifications
  • Set worker version in transcription entities API helpers

0.2.1

Released on 30 June 2021 • View on Gitlab

  • Add API helper check_required_types
  • Add a developer mode via --dev argument to simplify boot process for local development
  • Send process_id when updating worker activities
  • Remove nb_best from ML classes list as it's not supported anymore by Arkindex

0.2.0

Released on 6 May 2021 • View on Gitlab

This is a larger release which brings a new caching system to share data across workers (avoiding a lot of API calls in some workflows), and splits the codebase into multiple files for helpers & unit tests (one file per topic).

  • Add a cache system using a local SQLite database, shared between workers. It currently supports these Arkindex models:
    • elements and their hierarchy,
    • transcriptions,
    • images,
    • classifications,
    • entities,
  • Add API helpers:
    • create_elements
    • create_transcriptions
    • create_transcription_entity
  • Split ElementsWorker API helpers and unit tests in sub files
  • Drop TranscriptionType & DataSource as they are not used anymore in Arkindex
  • Retry all managed API calls that result in an HTTP 50x error
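To give an idea of how a shared SQLite cache avoids API calls, here is a minimal sketch using Python's sqlite3 module: one worker writes elements with their parent, and a later worker reads the children back from the same file instead of querying the API. The table layout and function names are hypothetical and do not reflect arkindex-worker's actual cache schema.

```python
import sqlite3
from contextlib import closing


def write_elements(db_path, elements):
    """Store (id, parent_id, type) rows in the shared cache file."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits the transaction on success
            conn.execute(
                "CREATE TABLE IF NOT EXISTS elements "
                "(id TEXT PRIMARY KEY, parent_id TEXT, type TEXT)"
            )
            conn.executemany("INSERT OR REPLACE INTO elements VALUES (?, ?, ?)", elements)


def read_children(db_path, parent_id):
    """Read the children of an element from the cache instead of the API."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT id FROM elements WHERE parent_id = ? ORDER BY id", (parent_id,)
        ).fetchall()
    return [row[0] for row in rows]
```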

0.1.14

Released on 8 April 2021 • View on Gitlab

  • Support weak SSL DH key when downloading images (needed for some outdated IIIF servers with old SSL certs).

0.1.13

Released on 2 March 2021 • View on Gitlab

  • Support new Arkindex feature Worker Activity, to track process progress.
  • Add new API helpers:
    • list_element_children
    • list_transcriptions
    • create_metadata
  • Extend git support with merge & rebase operations
  • Allow any worker type in cookiecutter template

0.1.12

Released on 8 December 2020 • View on Gitlab

  • Bugfix to avoid loading remote images from local file system
  • Deprecate TranscriptionType.

0.1.11

Released on 26 November 2020 • View on Gitlab

0.1.10

Released on 23 November 2020 • View on Gitlab

  • Support git base operations to allow workers to clone and checkout repositories
  • Set up an automated CI task to update Python dependencies
  • Update arkindex-client to version 1.0.5.

0.1.9

Released on 19 October 2020 • View on Gitlab

  • Update arkindex-client to version 1.0.4.
  • Add API helpers:
    • get_worker_version
    • get_worker_version_slug
    • get_ml_result_slug

0.1.8

Released on 30 September 2020 • View on Gitlab

0.1.7

Released on 30 September 2020 • View on Gitlab

  • Support Arkindex secrets for workers, using API but also local storage for developers. More information on Arkindex documentation.
  • Do not crash when a worker tries to create a classification that already exists.

0.1.6

Released on 23 September 2020 • View on Gitlab

  • Automatically create missing Arkindex ML classes when using get_ml_class_id and creating classifications through API helpers.
  • Update arkindex-client to version 1.0.2.

0.1.5

Released on 22 September 2020 • View on Gitlab

  • Update arkindex-client to version 1.0.1.
  • Bugfix on score & confidence type checks in API helpers

0.1.4

Released on 2 September 2020 • View on Gitlab

  • Load worker configuration from Arkindex API, or local file (for developers)
  • Add API helpers:
    • load_corpus_classes
    • get_ml_class_id

0.1.3

Released on 25 August 2020 • View on Gitlab

  • Add API helper create_element_transcriptions
  • Return created instance ID in API helpers
  • Add cookiecutter variables to be able to easily rebuild

0.1.2

Released on 19 August 2020 • View on Gitlab

  • Use WORKER_VERSION_ID environment var in helper methods to identify the worker automatically
  • Add API helpers:
    • create_transcription
    • create_classification
    • create_entity
  • Extend cookiecutter template to generate clean Python packages
  • Add the Timer helper class in tools submodule

0.1.1

Released on 7 August 2020 • View on Gitlab

  • Add API helper create_sub_element
  • Add unit tests in cookiecutter template & base project.
  • Change cookiecutter base to use ElementsWorker

0.1.0

Released on 21 July 2020 • View on Gitlab

Initial version of the base worker, with cookiecutter support to easily create workers using this project.