Yoann Schneider authored
Releases
0.3.4
Released on 14 Sept 2023 • View on Gitlab
- The worker template was updated to correctly install Git submodules if it depends on any.
- Base-worker now uses ruff for linting. This tool replaces isort and flake8.
- New Arkindex API helper to update an element, calling PartialUpdateElement.
- New Arkindex API helper to list an element's parents, calling ListElementParents.
- Worker Activity API is now disabled when the worker runs in read-only mode instead of relying on the --dev CLI argument. The update_activity API helper was updated following Arkindex 1.5.1 changes.
- Worker can now resize the image of an element when opening it. This uses the IIIF resizing API.
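In the IIIF Image API, resizing is requested through the size segment of the image URL ({base}/{region}/{size}/{rotation}/{quality}.{format}). The sketch below shows how such a URL could be built. The function name iiif_resized_url and the example server are hypothetical, and this is not the base-worker implementation.

```python
def iiif_resized_url(base_url: str, max_width: int, max_height: int) -> str:
    """Build a IIIF Image API URL asking the server for a downscaled image.

    The "!w,h" size form asks for the image to be scaled to fit within
    w x h while keeping its aspect ratio.
    """
    if max_width <= 0 or max_height <= 0:
        raise ValueError("max_width and max_height must be positive")
    size = f"!{max_width},{max_height}"
    # full region, no rotation, default quality, JPEG output
    return f"{base_url.rstrip('/')}/full/{size}/0/default.jpg"


# Hypothetical IIIF server and image identifier
url = iiif_resized_url("https://iiif.example.com/iiif/2/page-001/", 1000, 1000)
print(url)
```

Letting the server do the resizing avoids downloading a full-resolution image only to shrink it locally.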
0.3.3
Released on 26 May 2023 • View on Gitlab
- The Timer class previously defined in arkindex_worker.utils was removed as it was already defined in Teklia's python toolbox.
# Old usage
from arkindex_worker.utils import Timer
# New usage
from teklia_toolbox.time import Timer
- The create_element_transcriptions API helper now accepts an element_confidence float field in the dictionaries provided through the transcriptions field. This confidence will be set on the created element.
- More query filters are available on the list_element_children API helper. More details about their usage are available in the documentation:
transcription_worker_version
transcription_worker_run
with_metadata
worker_run
- Arkindex Base-Worker now fully uses pathlib to handle filesystem paths as suggested by PEP 428.
- Many helpers were added to handle ZSTD and TAR archives as well as delete files cleanly. More details are available in the documentation of the arkindex_worker.utils module.
- A bug affecting the parsing of the configuration of workers that use a Machine learning model stored on an Arkindex instance was fixed.
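As an illustration of the pathlib style described in PEP 428, the sketch below contrasts string-based path handling with Path objects. The directory and file names are made up for the example and do not come from base-worker.

```python
import os.path
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # String-based style (os.path)
    old_style = os.path.join(tmp, "models", "weights.bin")

    # pathlib style: the / operator joins path segments
    new_style = Path(tmp) / "models" / "weights.bin"

    # Path objects carry their own filesystem helpers
    new_style.parent.mkdir(parents=True, exist_ok=True)
    new_style.write_bytes(b"\x00\x01")

    same_path = str(new_style) == old_style
    suffix = new_style.suffix
    size = new_style.stat().st_size
```

Worker code that previously passed strings to such helpers may need to wrap them in Path(...) or call str(...) at the boundaries.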
0.3.2
Released on 8 March 2023 • View on Gitlab
- A helper to use the new API endpoint to create transcription entities more efficiently was implemented.
- Training workers may now publish a model configuration when creating a new model version on Arkindex. This will make the execution of a generic worker much smoother.
- The model version API endpoints were updated in the latest Arkindex release and a new helper was introduced subsequently. However, there are no breaking changes and the main helper, publish_model_version, still has the same signature and behaviour.
- The latest Arkindex release changed the way NER entities are stored and published.
- The EntityType enum was removed as type slugs are no longer restricted to a small set of options,
- create_entity now expects a type slug as a String,
- a new helper list_corpus_entity_types was added to load the Entity types in the corpus,
- a new helper check_required_entity_types to make sure that needed entity types are available in the corpus was added. Missing ones are created by default (this can be disabled).
- The create_classifications helper now expects the UUID of each MLClass instead of their name.
- In developer mode, the only way to set the corpus_id attribute is to use the ARKINDEX_CORPUS_ID environment variable. When it's not set, all API requests using the corpus_id as a path parameter will fail with a 500 status code. A warning log was added to help developers troubleshoot this error by advising them to set this variable.
- The create_transcriptions helper no longer makes the API call in developer mode. This behaviour aligns with all other publication helpers.
- Fixes hash computation when publishing a model using publish_model_version.
- If a process is linked to a model version, its id will be available to the worker through its model_version_id attribute.
- The URLs of the API endpoints related to Ponos were changed in the latest Arkindex release. Some changes were needed in the test suite.
- The classes attribute now directly contains the classes of the corpus of the processed element.
# Old usage
self.classes = {
    "corpus_id": {
        "ml_class_1": "class_uuid",
        ...
    }
}
# New usage
self.classes = {
    "ml_class_1": "class_uuid",
    ...
}
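For worker code that reads the classes attribute, the change amounts to dropping one level of lookup. A minimal sketch, with plain dictionaries standing in for the worker attribute (the corpus ID, class name and UUID values are placeholders):

```python
corpus_id = "corpus_id"

# Before 0.3.2: classes were nested under the corpus ID
old_classes = {corpus_id: {"ml_class_1": "class_uuid"}}
old_lookup = old_classes[corpus_id]["ml_class_1"]

# From 0.3.2: classes of the processed element's corpus sit at the top level
new_classes = {"ml_class_1": "class_uuid"}
new_lookup = new_classes["ml_class_1"]
```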
0.3.1
Released on 8 November 2022 • View on Gitlab
- A breaking change, affecting mostly the API, was introduced in Arkindex's 1.3.4 release:
- Workers were mostly unaffected but the REST schema was updated.
- Workers will progressively not be able to publish results with a worker_version_id anymore on Arkindex. They will have to use a related but more general field, worker_run_id:
- Most publication API endpoint helpers have been updated accordingly,
- A new version of the cache was released with the updated Django models.
- Improvements to our Machine Learning training API to allow workers to use models published on Arkindex.
- Support workers that have no configuration.
- Allow publishing metadatas with falsy but non-null values.
- Add .polygon attribute shortcut on Element.
- Add a major test speedup on our worker template.
- Support cache usage on our metadata API endpoint helpers.
- Drop support for Python 3.6 and add support for Python 3.11.
- Update arkindex-client to version 1.0.11.
- Update shapely to version 1.8.5-post1
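The item about falsy but non-null metadata values comes down to the difference between a truthiness check and an explicit None check in Python. The sketch below only illustrates that difference with a hypothetical is_publishable function. It is not the validation code used by base-worker.

```python
def is_publishable(value) -> bool:
    """Accept any value that is not None, including 0, 0.0 and False."""
    return value is not None


values = [0, 0.0, False, "text", None]

# A truthiness check would wrongly reject 0, 0.0 and False
truthy = [v for v in values if v]
# An explicit None check keeps every non-null value
non_null = [v for v in values if is_publishable(v)]
```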
0.3.0
Released on 12 September 2022 • View on Gitlab
- A large refactoring effort was made on the worker initialization, to streamline most of the workflow:
- developer setup is now set in a dedicated method configure_for_developers
- cache setup is now set in a dedicated method configure_cache
- deprecated useless attribute features
- add a simpler debug mode for developers
- depend only on Arkindex RetrieveWorkerRun API to get all the information needed, instead of relying on multiple API calls.
- remove ARKINDEX_CORPUS_ID environment variable usage, replaced by corpus information from API, except for developers
- do not erase defaults when reading configuration
- Support new Machine Learning training APIs on Arkindex to allow workers to create model versions and publish them as zstandard archives on a remote S3-compatible bucket.
- Add API helpers
list_corpus_entities
create_metadatas
list_metadata
list_transcription_entities
create_required_types
publish_model_version
create_model_version
upload_to_s3
- Create missing element types when checking if they are available on the Arkindex instance (disabled by default).
- Update arkindex-client to version 1.0.9.
- Update automated rotation code (revert_orientation) to support reverse application
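One common way to keep defaults when reading a configuration is to start from the default values and overlay only the keys that were actually provided. This sketch uses made-up keys and a hypothetical merge_configuration function. The actual base-worker logic may differ.

```python
def merge_configuration(defaults: dict, provided: dict) -> dict:
    """Return a new dict where provided values override defaults."""
    merged = dict(defaults)  # copy so the defaults are left untouched
    merged.update(provided)
    return merged


defaults = {"threshold": 0.5, "batch_size": 8}
provided = {"batch_size": 16}

config = merge_configuration(defaults, provided)
```

Keys absent from the provided configuration (threshold here) keep their default value instead of disappearing.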
0.2.4
Released on 6 July 2022 • View on Gitlab
- Document source code using Sphinx and docstrings with parameters. Documentation is available here.
- Update workers' inner config with default values from user_configuration
- Support confidence in API helpers create_sub_element and create_elements as they are now available in Arkindex
- Port rotation code from tesseract worker
- Add helper to trim polygons so that they fit inside their image
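Trimming a polygon to its image can be pictured as clamping every point to the image bounds. The function below is a simplified sketch under that assumption, with a hypothetical name and signature. The helper shipped in base-worker may apply additional checks.

```python
def clamp_polygon(polygon, image_width, image_height):
    """Clamp each (x, y) point of a polygon to [0, width] x [0, height]."""
    return [
        [min(max(x, 0), image_width), min(max(y, 0), image_height)]
        for x, y in polygon
    ]


# A rectangle that overflows a 100 x 80 image on three sides
polygon = [[-10, 5], [120, 5], [120, 90], [-10, 90]]
trimmed = clamp_polygon(polygon, image_width=100, image_height=80)
```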
0.2.3
Released on 28 March 2022 • View on Gitlab
- Update arkindex-client to version 1.0.8.
- Replace all transcription scores with confidences (also renamed on Arkindex)
- Support cache versioning and detect compatibility in workers
- Support confidence in create_transcription_entity API helper
- Support Text orientation for transcriptions
- Return the response payload in all creation helpers so that workers can use them
- Support new metadata type URL
0.2.2
Released on 17 September 2021 • View on Gitlab
- Update arkindex-client to version 1.0.7.
- Detect already processed elements using worker activity, and skip them
- Support rotation, mirroring and fix image crop in open_image method used by a lot of workers
- Change default value for user_configuration from None to {} which simplifies usage code in workers
- Support new metadata type Numeric
- Add API helper create_classifications
- Set worker version in transcription entities API helpers
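With an empty dictionary as the default for user_configuration, worker code can read optional settings without first checking for None. A small sketch with a hypothetical setting name:

```python
# Before: the attribute could be None, so callers had to guard the lookup
user_configuration = None
threshold_before = (user_configuration or {}).get("threshold", 0.5)

# After: the default is {}, so .get() can be called directly
user_configuration = {}
threshold_after = user_configuration.get("threshold", 0.5)
```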
0.2.1
Released on 30 June 2021 • View on Gitlab
- Add API helper check_required_types
- Add a developer mode via --dev argument to simplify boot process for local development
- Send process_id when updating worker activities
- Remove nb_best from ML classes list as it's not supported anymore by Arkindex
0.2.0
Released on 6 May 2021 • View on Gitlab
This is a larger release which brings a new caching system to share data across workers (avoiding a lot of API calls in some workflows), and splits the codebase into multiple files for helpers & unit tests (one file per topic).
- Add cache system using a local SQLite database, shared between workers. Currently supports Arkindex models:
- elements and their hierarchy,
- transcriptions,
- images,
- classifications,
- entities,
- Add API helpers:
create_elements
create_transcriptions
create_transcription_entity
- Split ElementsWorker API helpers and unit tests in sub files
- Drop TranscriptionType & DataSource as they are not used anymore in Arkindex
- Retry all managed API calls that result in a 50x
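The retry behaviour can be sketched as a loop that repeats a call while it fails with a 5xx status. Everything here (ServerError, call_with_retry, the number of attempts and the fake endpoint) is hypothetical and only illustrates the idea. These notes do not describe base-worker's actual retry policy.

```python
class ServerError(Exception):
    """Stand-in for an API error carrying an HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_retry(func, max_attempts: int = 3):
    """Call func, retrying only when it raises a 5xx ServerError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ServerError as e:
            # Non-5xx errors and exhausted attempts are re-raised
            if not 500 <= e.status_code < 600 or attempt == max_attempts:
                raise


calls = []


def flaky_endpoint():
    """Fail twice with a 502, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise ServerError(502)
    return {"status": "ok"}


result = call_with_retry(flaky_endpoint)
```

A production version would usually wait between attempts (for example with an increasing delay). The sketch leaves that out to stay short.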
0.1.14
Released on 8 April 2021 • View on Gitlab
- Support weak SSL DH key when downloading images (needed for some outdated IIIF servers with old SSL certs).
0.1.13
Released on 2 March 2021 • View on Gitlab
- Support new Arkindex feature Worker Activity, to track process progress.
- Add new API helpers:
list_element_children
list_transcriptions
create_metadata
- Extend git support with merge & rebase operations
- Allow any worker type in cookiecutter template
0.1.12
Released on 8 December 2020 • View on Gitlab
- Bugfix to avoid loading remote images from local file system
- Deprecate TranscriptionType.
0.1.11
Released on 26 November 2020 • View on Gitlab
- Update arkindex-client to version 1.0.6.
0.1.10
Released on 23 November 2020 • View on Gitlab
- Support git base operations to allow workers to clone and checkout repositories
- Setup automated CI task to update Python dependencies
- Update arkindex-client to version 1.0.5.
0.1.9
Released on 19 October 2020 • View on Gitlab
- Update arkindex-client to version 1.0.4.
- Add API helpers:
get_worker_version
get_worker_version_slug
get_ml_result_slug
0.1.8
Released on 30 September 2020 • View on Gitlab
- Update arkindex-client to version 1.0.3.
0.1.7
Released on 30 September 2020 • View on Gitlab
- Support Arkindex secrets for workers, using API but also local storage for developers. More information on Arkindex documentation.
- Do not crash when a worker tries to create a classification that already exists.
0.1.6
Released on 23 September 2020 • View on Gitlab
- Automatically create missing Arkindex ML classes when using get_ml_class_id and creating classifications through API helpers.
- Update arkindex-client to version 1.0.2.
0.1.5
Released on 22 September 2020 • View on Gitlab
- Update arkindex-client to version 1.0.1.
- Bugfix on score & confidence type checks in API helpers
0.1.4
Released on 2 September 2020 • View on Gitlab
- Load worker configuration from Arkindex API, or local file (for developers)
- Add API helpers:
load_corpus_classes
get_ml_class_id
0.1.3
Released on 25 August 2020 • View on Gitlab
- Add API helper create_element_transcriptions
- Return created instance ID in API helpers
- Add cookiecutter variables to be able to easily rebuild
0.1.2
Released on 19 August 2020 • View on Gitlab
- Use WORKER_VERSION_ID environment var in helper methods to identify the worker automatically
- Add API helpers:
create_transcription
create_classification
create_entity
- Extend cookiecutter template to generate clean Python packages
- Add the Timer helper class in tools submodule
0.1.1
Released on 7 August 2020 • View on Gitlab
- Add API helper create_sub_element
- Add unit tests in cookiecutter template & base project.
- Change cookiecutter base to use ElementsWorker
0.1.0
Released on 21 July 2020 • View on Gitlab
Initial version of the base worker, with cookiecutter support to easily create workers using this project.