Compare revisions

Showing
with 1208 additions and 246 deletions
# Models
::: arkindex_worker.models
# Reporting
::: arkindex_worker.reporting
# Generic Utilities
::: arkindex_worker.utils
# Releases
## 0.3.1
Released on **8 November 2022** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.3.1)
- A breaking change, affecting mostly the API, was introduced in [Arkindex's 1.3.4 release](https://teklia.com/solutions/arkindex/releases/1-3-4/):
    - Workers were mostly unaffected, but the REST schema was updated.
    - Workers will progressively lose the ability to publish results with a `worker_version_id` on Arkindex. They will have to use a related but more general field, `worker_run_id`:
        - Most publication API endpoint helpers have been updated accordingly.
        - A new version of the cache was released with the updated Django models.
- Improvements to our Machine Learning training API to allow workers to use models published on Arkindex.
- Support workers that have no configuration.
- Allow publishing metadata with falsy but non-null values.
- Add `.polygon` attribute shortcut on `Element`.
- Add a major test speedup on our worker template.
- Support cache usage on our metadata API endpoint helpers.
- **Drop** support for Python 3.6 and **add** support for Python 3.11.
- Update arkindex-client to [version 1.0.11](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.11).
- Update shapely to [version 1.8.5-post1](https://github.com/shapely/shapely/releases/tag/1.8.5.post1)
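
The move from `worker_version_id` to `worker_run_id`, together with the acceptance of falsy but non-null values, can be illustrated with a minimal sketch. The function name `build_metadata_payload` and the `name` and `value` keys are assumptions made for illustration; only the `worker_run_id` field comes from the notes above, and this is not the library's actual helper.

```python
from typing import Any, Dict


def build_metadata_payload(name: str, value: Any, worker_run_id: str) -> Dict[str, Any]:
    """Build a hypothetical publication payload tagged with a worker run."""
    # Reject only None: falsy values such as 0 or False are legitimate data.
    if value is None:
        raise ValueError("metadata value must not be None")
    return {
        "name": name,  # assumed key name, for illustration only
        "value": value,  # assumed key name, for illustration only
        # The worker run replaces the former worker_version_id field.
        "worker_run_id": worker_run_id,
    }
```

Checking `value is None` rather than `not value` is what lets a metadata value of `0` or `False` through.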
## 0.3.0
Released on **12 September 2022** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.3.0)
- A large refactoring effort was made on the worker initialization, to streamline most of the workflow:
    - developer setup now lives in a dedicated method, `configure_for_developers`
    - cache setup now lives in a dedicated method, `configure_cache`
    - deprecate the unused `features` attribute
    - add a simpler debug mode for developers
    - depend only on the Arkindex `RetrieveWorkerRun` API to get all the information needed, instead of relying on multiple API calls
    - remove usage of the `ARKINDEX_CORPUS_ID` environment variable, replaced by corpus information from the API, except for developers
    - do not erase defaults when reading configuration
- Support new Machine Learning training APIs on Arkindex to allow workers to create model versions and publish them as zstandard archives on a remote S3-compatible bucket.
- Add API helpers:
    - `list_corpus_entities`
    - `create_metadatas`
    - `list_metadata`
    - `list_transcription_entities`
    - `create_required_types`
    - `publish_model_version`
    - `create_model_version`
    - `upload_to_s3`
- Create missing element types when checking if they are available on the Arkindex instance (disabled by default).
- Update arkindex-client to [version 1.0.9](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.9).
- Update automated rotation code (`revert_orientation`) to support reverse application
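
The "do not erase defaults when reading configuration" item can be pictured as a merge in which declared defaults come first and user-supplied values override them. The sketch below is an assumption about the behaviour, not the worker's real code; `apply_defaults` is a hypothetical name, and the shape of the declared parameters (a `default` key per parameter) mirrors the test payloads in this changeset.

```python
from typing import Any, Dict, Optional


def apply_defaults(
    declared: Dict[str, Dict[str, Any]],
    provided: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge declared default values with the values a user supplied."""
    # Start from every declared parameter that carries a default value.
    merged = {
        name: spec["default"] for name, spec in declared.items() if "default" in spec
    }
    # Supplied values win over defaults; a missing configuration counts as empty.
    merged.update(provided or {})
    return merged
```

With one declared parameter `integer_parameter` defaulting to `0` and a supplied configuration holding `param_3` and `param_5`, the result contains all three keys.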
## 0.2.4
Released on **6 July 2022** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.2.4)
- Document source code using [Sphinx](https://www.sphinx-doc.org/en/master/) and docstrings with parameters. Documentation is available [here](https://teklia.gitlab.io/workers/base-worker/).
- Update the worker's inner `config` with default values from `user_configuration`
- Support confidence in API helpers `create_sub_element` and `create_elements`, as it is now available in Arkindex
- Port rotation code from tesseract worker
- Add helper to trim polygons so that they fit inside their image
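
Trimming a polygon so that it fits inside its image amounts to clamping each coordinate to the image bounds. The following is a minimal sketch under that assumption; `clamp_polygon` is a hypothetical name, and the library's own helper may perform extra validation (for example rejecting polygons that collapse after trimming).

```python
from typing import List, Sequence


def clamp_polygon(
    polygon: Sequence[Sequence[int]], width: int, height: int
) -> List[List[int]]:
    """Clamp every point of a polygon to the [0, width] x [0, height] rectangle."""
    return [
        # Clamp x to [0, width] and y to [0, height] independently.
        [min(max(x, 0), width), min(max(y, 0), height)]
        for x, y in polygon
    ]
```

Points already inside the image are returned unchanged, so the function is safe to apply to every polygon before publication.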
## 0.2.3
Released on **28 March 2022** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.2.3)
- Update arkindex-client to [version 1.0.8](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.8).
- Replace all transcription scores with confidences (also renamed on Arkindex)
- Support cache versioning and detect compatibility in workers
- Support confidence in `create_transcription_entity` API helper
- Support text orientation for transcriptions
- Return the response payload in all creation helpers so that workers can use them
- Support new metadata type `URL`
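
Cache versioning can be sketched as a version number stored inside the SQLite file and compared on start-up. Everything below is an assumption for illustration: the `version` table, the constant `EXPECTED_CACHE_VERSION`, and both function names are hypothetical and do not describe the library's actual schema.

```python
import sqlite3

EXPECTED_CACHE_VERSION = 2  # hypothetical version number


def init_cache(conn: sqlite3.Connection, version: int = EXPECTED_CACHE_VERSION) -> None:
    """Record the cache format version inside the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS version (version INTEGER NOT NULL)")
    conn.execute("DELETE FROM version")
    conn.execute("INSERT INTO version (version) VALUES (?)", (version,))
    conn.commit()


def is_compatible(conn: sqlite3.Connection, expected: int = EXPECTED_CACHE_VERSION) -> bool:
    """Return True only when the stored version matches the expected one."""
    try:
        row = conn.execute("SELECT version FROM version LIMIT 1").fetchone()
    except sqlite3.OperationalError:
        # No version table at all: the database predates versioning.
        return False
    return row is not None and row[0] == expected
```

A worker following this pattern would refuse to reuse a cache file written by an incompatible release instead of failing later on a missing column.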
## 0.2.2
Released on **17 September 2021** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.2.2)
- Update arkindex-client to [version 1.0.7](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.7).
- Detect already processed elements using worker activity, and skip them
- Support rotation, mirroring and fix image crop in `open_image` method used by a lot of workers
- Change default value for `user_configuration` from `None` to `{}` which simplifies usage code in workers
- Support new metadata type `Numeric`
- Add API helper `create_classifications`
- Set worker version in transcription entities API helpers
## 0.2.1
Released on **30 June 2021** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.2.1)
- Add API helper `check_required_types`
- Add a developer mode via `--dev` argument to simplify boot process for local development
- Send `process_id` when updating worker activities
- Remove `nb_best` from ML classes list as it's not supported anymore by Arkindex
## 0.2.0
Released on **6 May 2021** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.2.0)
This is a larger release which brings a new caching system to share data across workers (avoiding a lot of API calls in some workflows), and splits the codebase into multiple files for helpers & unit tests (one file per topic).
- Add cache system using a local SQLite database, shared between workers. It currently supports these Arkindex models:
    - elements and their hierarchy,
    - transcriptions,
    - images,
    - classifications,
    - entities.
- Add API helpers:
    - `create_elements`
    - `create_transcriptions`
    - `create_transcription_entity`
- Split ElementsWorker API helpers and unit tests in sub files
- Drop `TranscriptionType` & `DataSource` as they are not used anymore in Arkindex
- Retry all managed API calls that result in a 50x
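
Retrying API calls that fail with a 50x status can be sketched with the standard library alone. The project's requirements list `tenacity`, which may well back the real implementation; the version below does not use it. `ApiError` and `retry_on_server_error` are hypothetical names, and the attempt count and delay are arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def retry_on_server_error(call: Callable[[], T], attempts: int = 5, delay: float = 0.0) -> T:
    """Run `call`, retrying only when it fails with a 5xx status code."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ApiError as error:
            is_server_error = 500 <= error.status_code < 600
            # Client errors and the final failed attempt are re-raised as-is.
            if not is_server_error or attempt == attempts:
                raise
            # Wait a little longer after each failure (linear backoff).
            time.sleep(delay * attempt)
    raise RuntimeError("unreachable")
```

Client errors such as 404 are deliberately not retried, since repeating the request would not change the outcome.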
## 0.1.14
Released on **8 April 2021** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.14)
- Support weak SSL DH key when downloading images (needed for some outdated IIIF servers with old SSL certs).
## 0.1.13
Released on **2 March 2021** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.13)
- Support new Arkindex feature **Worker Activity**, to track process progress.
- Add new API helpers:
    - `list_element_children`
    - `list_transcriptions`
    - `create_metadata`
- Extend git support with merge & rebase operations
- Allow any worker type in cookiecutter template
## 0.1.12
Released on **8 December 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.12)
- Bugfix to avoid loading remote images from local file system
- Deprecate `TranscriptionType`.
## 0.1.11
Released on **26 November 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.11)
- Update arkindex-client to [version 1.0.6](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.6).
## 0.1.10
Released on **23 November 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.10)
- Support git base operations to allow workers to clone and checkout repositories
- Setup automated CI task to update Python dependencies
- Update arkindex-client to [version 1.0.5](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.5).
## 0.1.9
Released on **19 October 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.9)
- Update arkindex-client to [version 1.0.4](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.4).
- Add API helpers:
    - `get_worker_version`
    - `get_worker_version_slug`
    - `get_ml_result_slug`
## 0.1.8
Released on **30 September 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.8)
- Update arkindex-client to [version 1.0.3](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.3).
## 0.1.7
Released on **30 September 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.7)
- Support Arkindex secrets for workers, using API but also local storage for developers. More information on [Arkindex documentation](https://doc.arkindex.org/secrets/workers/).
- Do not crash when a worker tries to create a classification that already exists.
## 0.1.6
Released on **23 September 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.6)
- Automatically create missing Arkindex ML classes when using `get_ml_class_id` and creating classifications through API helpers.
- Update arkindex-client to [version 1.0.2](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.2).
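
Automatic creation of missing ML classes follows a get-or-create pattern: look the class up locally and create it remotely only on a miss. The sketch below shows that pattern in isolation; `MLClassCache`, `get_or_create`, and the `create_remote` callback are hypothetical names, not the signature of `get_ml_class_id`.

```python
from typing import Callable, Dict


class MLClassCache:
    """Map ML class names to IDs, creating unknown classes on demand."""

    def __init__(self, create_remote: Callable[[str], str]) -> None:
        # Callback that creates the class remotely and returns its new ID.
        self._create_remote = create_remote
        self._ids: Dict[str, str] = {}

    def get_or_create(self, name: str) -> str:
        # Only call the remote side the first time a name is seen.
        if name not in self._ids:
            self._ids[name] = self._create_remote(name)
        return self._ids[name]
```

Repeated lookups of the same name hit the local dictionary, so the remote creation call happens at most once per class.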
## 0.1.5
Released on **22 September 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.5)
- Update arkindex-client to [version 1.0.1](https://gitlab.com/teklia/arkindex/api-client/-/releases/1.0.1).
- Bugfix on score & confidence type checks in API helpers
## 0.1.4
Released on **2 September 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.4)
- Load worker configuration from Arkindex API, or local file (for developers)
- Add API helpers:
    - `load_corpus_classes`
    - `get_ml_class_id`
## 0.1.3
Released on **25 August 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.3)
- Add API helper `create_element_transcriptions`
- Return created instance ID in API helpers
- Add cookiecutter variables to be able to easily rebuild
## 0.1.2
Released on **19 August 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.2)
- Use the `WORKER_VERSION_ID` environment variable in helper methods to identify the worker automatically
- Add API helpers:
    - `create_transcription`
    - `create_classification`
    - `create_entity`
- Extend cookiecutter template to generate clean Python packages
- Add the `Timer` helper class in tools submodule
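
A timing helper of this kind is commonly written as a context manager. The sketch below shows the idea with the standard library; `SimpleTimer` is a hypothetical stand-in, and the attributes of the project's own `Timer` class may differ.

```python
import time


class SimpleTimer:
    """Measure the wall-clock duration of a `with` block, in seconds."""

    def __enter__(self) -> "SimpleTimer":
        self.elapsed = None
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Record the duration even when the block raised an exception.
        self.elapsed = time.perf_counter() - self._start
        # Returning False lets any exception propagate to the caller.
        return False
```

Usage is `with SimpleTimer() as timer: ...`, after which `timer.elapsed` holds the duration of the block.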
## 0.1.1
Released on **7 August 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.1)
- Add API helper `create_sub_element`
- Add unit tests in cookiecutter template & base project.
- Change cookiecutter base to use ElementsWorker
## 0.1.0
Released on **21 July 2020** • View on [Gitlab](https://gitlab.com/teklia/workers/base-worker/-/releases/0.1.0)
Initial version of the base worker, with cookiecutter support to easily create workers using this project.
site_name: Arkindex Workers
site_dir: public
theme:
  name: material
  # Branding
  logo: assets/logo.png
  favicon: assets/favicon.png
  font:
    text: Roboto
    code: Roboto Mono
  features:
    - navigation.top
    - navigation.tracking
    - navigation.indexes
  palette:
    # Palette toggle for light mode
    - media: "(prefers-color-scheme: light)"
      scheme: default
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
    # Palette toggle for dark mode
    - media: "(prefers-color-scheme: dark)"
      scheme: slate
      toggle:
        icon: material/brightness-4
        name: Switch to light mode
plugins:
  - search
  - autorefs
  - mkdocstrings:
      custom_templates: templates
      handlers:
        python:
          import: # enable auto refs to the doc
            - https://docs.python.org/3/objects.inv
            - https://pillow.readthedocs.io/en/stable/objects.inv
            - http://docs.peewee-orm.com/en/latest/objects.inv
            - https://gnupg.readthedocs.io/en/latest/objects.inv
            - https://shapely.readthedocs.io/en/stable/objects.inv
            - https://tenacity.readthedocs.io/en/latest/objects.inv
          options:
            show_root_toc_entry: false
            show_object_full_path: false
            show_root_heading: yes
            show_source: true
            docstring_style: sphinx
            merge_init_into_class: yes
            show_category_heading: yes
            separate_signature: yes
            members_order: source
nav:
  - Home: index.md
  - How to create a worker:
    - contents/workers/index.md
    - Setting up a new worker: contents/workers/create.md
    - Running your worker locally: contents/workers/run-local.md
    - Maintaining a worker: contents/workers/maintenance.md
    - GitLab CI for workers: contents/workers/ci/index.md
    - YAML configuration: contents/workers/yaml.md
    - Template structure: contents/workers/template-structure.md
  - Using secrets in workers:
    - contents/secrets/index.md
    - Usage: contents/secrets/usage.md
  - Python Reference:
    - Base Worker: ref/base_worker.md
    - Elements Worker: ref/elements_worker.md
    - Arkindex API integration:
      - ref/api/index.md
      - Classification: ref/api/classification.md
      - Element: ref/api/element.md
      - Entity: ref/api/entity.md
      - Metadata: ref/api/metadata.md
      - Training: ref/api/training.md
      - Transcription: ref/api/transcription.md
      - WorkerVersion: ref/api/worker_version.md
    - Models: ref/models.md
    - Generic Utilities: ref/utils.md
    - Git & Gitlab support: ref/git.md
    - Image utilities: ref/image.md
    - Reporting: ref/reporting.md
    - Cache: ref/cache.md
  - Releases: releases.md
  - Documentation development: dev/build_docs.md
markdown_extensions:
  - smarty
  - toc:
      permalink: True
  - sane_lists
  - pymdownx.highlight
  - def_list # enable definition lists
  - admonition # syntax coloration in code blocks
  - codehilite
  - pymdownx.details
  - pymdownx.superfences
copyright: Copyright © Teklia
extra:
  social:
    - icon: fontawesome/regular/heart
      name: Teklia official website
      link: https://teklia.com
    - icon: fontawesome/brands/gitlab
      name: Git repository for this project
      link: https://gitlab.com/teklia/workers/base-worker
    - icon: fontawesome/brands/linkedin
      name: Teklia @ LinkedIn
      link: https://www.linkedin.com/company/teklia
arkindex-client==1.0.9
peewee==3.15.2
Pillow==9.2.0
python-gitlab==3.9.0
arkindex-client==1.0.11
peewee==3.15.4
Pillow==9.3.0
pymdown-extensions==9.9
python-gitlab==3.12.0
python-gnupg==0.5.0
sh==1.14.3
shapely==1.8.4
tenacity==8.0.1
zstandard==0.18.0
shapely==2.0.0
tenacity==8.1.0
zstandard==0.19.0
......@@ -23,7 +23,7 @@ setup(
author="Teklia",
author_email="contact@teklia.com",
url="https://teklia.com",
python_requires=">=3.6",
python_requires=">=3.7",
install_requires=install_requires,
extras_require={"docs": requirements("docs-requirements.txt")},
packages=find_packages(),
......
pytest==7.1.2
pytest-mock==3.8.2
pytest-responses==0.5.0
requests==2.27.1
pytest==7.2.0
pytest-mock==3.10.0
pytest-responses==0.5.1
requests==2.28.1
......@@ -104,9 +104,7 @@ def setup_api(responses, monkeypatch, cache_yaml):
@pytest.fixture(autouse=True)
def give_env_variable(request, monkeypatch):
    """Defines required environment variables"""
    monkeypatch.setenv("WORKER_VERSION_ID", "12341234-1234-1234-1234-123412341234")
    monkeypatch.setenv("ARKINDEX_WORKER_RUN_ID", "56785678-5678-5678-5678-567856785678")
    monkeypatch.setenv("ARKINDEX_CORPUS_ID", "11111111-1111-1111-1111-111111111111")
@pytest.fixture
......@@ -151,6 +149,7 @@ def mock_worker_run_api(responses):
"name": "string",
"configuration": {},
},
"model_version": None,
"process": {
"name": None,
"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff",
......@@ -184,7 +183,7 @@ def mock_worker_run_api(responses):
responses.add(
responses.GET,
"http://testserver/api/v1/imports/workers/56785678-5678-5678-5678-567856785678/",
"http://testserver/api/v1/process/workers/56785678-5678-5678-5678-567856785678/",
status=200,
body=json.dumps(payload),
content_type="application/json",
......@@ -207,7 +206,6 @@ def mock_activity_calls(responses):
def mock_elements_worker(monkeypatch, mock_worker_run_api):
    """Build and configure an ElementsWorker with fixed CLI parameters to avoid issues with pytest"""
    monkeypatch.setattr(sys, "argv", ["worker"])
    worker = ElementsWorker()
    worker.configure()
    return worker
......@@ -239,9 +237,9 @@ def mock_base_worker_with_cache(mocker, monkeypatch, mock_worker_run_api):
    """Build a BaseWorker using SQLite cache, also mocking a PONOS_TASK"""
    monkeypatch.setattr(sys, "argv", ["worker"])
    monkeypatch.setenv("PONOS_TASK", "my_task")
    worker = BaseWorker(support_cache=True)
    worker.setup_api_client()
    monkeypatch.setenv("PONOS_TASK", "my_task")
    return worker
......@@ -282,6 +280,11 @@ def model_file_dir():
    return SAMPLES_DIR / "model_files"


@pytest.fixture
def model_file_dir_with_subfolder():
    return SAMPLES_DIR / "root_folder"


@pytest.fixture
def fake_dummy_worker():
    api_client = MockApiClient()
......
Wow this is actually the data of the best model ever created on Arkindex
\ No newline at end of file
Wow this is actually the data of the best model ever created on Arkindex
\ No newline at end of file
Wow this is actually the data of the best model ever created on Arkindex
\ No newline at end of file
......@@ -11,13 +11,13 @@ import pytest
from arkindex.mock import MockApiClient
from arkindex_worker import logger
from arkindex_worker.worker import BaseWorker
from arkindex_worker.worker.base import ModelNotFoundError
def test_init_default_local_share(monkeypatch):
worker = BaseWorker()
assert worker.work_dir == os.path.expanduser("~/.local/share/arkindex")
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
def test_init_default_xdg_data_home(monkeypatch):
......@@ -26,14 +26,12 @@ def test_init_default_xdg_data_home(monkeypatch):
worker = BaseWorker()
assert worker.work_dir == f"{path}/arkindex"
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
def test_init_with_local_cache(monkeypatch):
worker = BaseWorker(support_cache=True)
assert worker.work_dir == os.path.expanduser("~/.local/share/arkindex")
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
assert worker.support_cache is True
......@@ -43,19 +41,6 @@ def test_init_var_ponos_data_given(monkeypatch):
worker = BaseWorker()
assert worker.work_dir == f"{path}/current"
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
def test_init_var_worker_version_id_missing(monkeypatch):
monkeypatch.setattr(sys, "argv", ["worker"])
monkeypatch.delenv("WORKER_VERSION_ID")
monkeypatch.delenv("ARKINDEX_WORKER_RUN_ID")
worker = BaseWorker()
worker.args = worker.parser.parse_args()
worker.configure_for_developers()
assert worker.worker_version_id is None
assert worker.is_read_only is True
assert worker.config == {} # default empty case
def test_init_var_worker_run_id_missing(monkeypatch):
......@@ -75,12 +60,11 @@ def test_init_var_worker_local_file(monkeypatch, tmp_path):
config.write_text("---\nlocalKey: abcdef123")
monkeypatch.setattr(sys, "argv", ["worker", "-c", str(config)])
monkeypatch.delenv("WORKER_VERSION_ID")
monkeypatch.delenv("ARKINDEX_WORKER_RUN_ID")
worker = BaseWorker()
worker.args = worker.parser.parse_args()
worker.configure_for_developers()
assert worker.worker_version_id is None
assert worker.worker_run_id is None
assert worker.is_read_only is True
assert worker.config == {"localKey": "abcdef123"} # Use a local file for devs
......@@ -94,7 +78,6 @@ def test_cli_default(mocker, mock_worker_run_api):
mocker.patch.object(sys, "argv", ["worker"])
worker.args = worker.parser.parse_args()
assert worker.is_read_only is False
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
assert worker.worker_run_id == "56785678-5678-5678-5678-567856785678"
worker.configure()
......@@ -102,6 +85,7 @@ def test_cli_default(mocker, mock_worker_run_api):
assert logger.level == logging.NOTSET
assert worker.api_client
assert worker.config == {"someKey": "someValue"} # from API
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
logger.setLevel(logging.NOTSET)
......@@ -113,7 +97,6 @@ def test_cli_arg_verbose_given(mocker, mock_worker_run_api):
mocker.patch.object(sys, "argv", ["worker", "-v"])
worker.args = worker.parser.parse_args()
assert worker.is_read_only is False
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
assert worker.worker_run_id == "56785678-5678-5678-5678-567856785678"
worker.configure()
......@@ -121,6 +104,7 @@ def test_cli_arg_verbose_given(mocker, mock_worker_run_api):
assert logger.level == logging.DEBUG
assert worker.api_client
assert worker.config == {"someKey": "someValue"} # from API
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
logger.setLevel(logging.NOTSET)
......@@ -133,13 +117,13 @@ def test_cli_envvar_debug_given(mocker, monkeypatch, mock_worker_run_api):
monkeypatch.setenv("ARKINDEX_DEBUG", "True")
worker.args = worker.parser.parse_args()
assert worker.is_read_only is False
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
assert worker.worker_run_id == "56785678-5678-5678-5678-567856785678"
worker.configure()
assert logger.level == logging.DEBUG
assert worker.api_client
assert worker.config == {"someKey": "someValue"} # from API
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
logger.setLevel(logging.NOTSET)
......@@ -155,7 +139,6 @@ def test_configure_dev_mode(mocker, monkeypatch):
assert worker.args.dev is True
assert worker.process_information is None
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
assert worker.worker_run_id == "56785678-5678-5678-5678-567856785678"
assert worker.is_read_only is True
assert worker.user_configuration == {}
......@@ -194,23 +177,27 @@ def test_configure_worker_run(mocker, monkeypatch, responses):
"configuration": {"configuration": {}},
},
"configuration": user_configuration,
"process": {"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff"},
"model_version": None,
"process": {
"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff",
"corpus": "11111111-1111-1111-1111-111111111111",
},
}
responses.add(
responses.GET,
"http://testserver/api/v1/imports/workers/56785678-5678-5678-5678-567856785678/",
"http://testserver/api/v1/process/workers/56785678-5678-5678-5678-567856785678/",
status=200,
body=json.dumps(payload),
content_type="application/json",
)
worker.args = worker.parser.parse_args()
assert worker.is_read_only is False
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
assert worker.worker_run_id == "56785678-5678-5678-5678-567856785678"
worker.configure()
assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
assert worker.user_configuration == {"a": "b"}
......@@ -246,7 +233,15 @@ def test_configure_user_configuration_defaults(
},
"revision": {"hash": "deadbeef1234"},
"configuration": {
"configuration": {"param_1": "/some/path/file.pth", "param_2": 12}
"configuration": {"param_1": "/some/path/file.pth", "param_2": 12},
"user_configuration": {
"integer_parameter": {
"type": "int",
"title": "Lambda",
"default": 0,
"required": False,
}
},
},
},
"configuration": {
......@@ -257,11 +252,15 @@ def test_configure_user_configuration_defaults(
"param_5": True,
},
},
"process": {"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff"},
"model_version": None,
"process": {
"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff",
"corpus": "11111111-1111-1111-1111-111111111111",
},
}
responses.add(
responses.GET,
"http://testserver/api/v1/imports/workers/56785678-5678-5678-5678-567856785678/",
"http://testserver/api/v1/process/workers/56785678-5678-5678-5678-567856785678/",
status=200,
body=json.dumps(payload),
content_type="application/json",
......@@ -271,6 +270,7 @@ def test_configure_user_configuration_defaults(
assert worker.config == {"param_1": "/some/path/file.pth", "param_2": 12}
assert worker.user_configuration == {
"integer_parameter": 0,
"param_3": "Animula vagula blandula",
"param_5": True,
}
......@@ -305,16 +305,20 @@ def test_configure_user_config_debug(mocker, monkeypatch, responses, debug):
"revision": {"hash": "deadbeef1234"},
"configuration": {"configuration": {}},
},
"model_version": None,
"configuration": {
"id": "af0daaf4-983e-4703-a7ed-a10f146d6684",
"name": "BBB",
"configuration": {"debug": debug},
},
"process": {"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff"},
"process": {
"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff",
"corpus": "11111111-1111-1111-1111-111111111111",
},
}
responses.add(
responses.GET,
"http://testserver/api/v1/imports/workers/56785678-5678-5678-5678-567856785678/",
"http://testserver/api/v1/process/workers/56785678-5678-5678-5678-567856785678/",
status=200,
body=json.dumps(payload),
content_type="application/json",
......@@ -356,12 +360,16 @@ def test_configure_worker_run_missing_conf(mocker, monkeypatch, responses):
"revision": {"hash": "deadbeef1234"},
"configuration": {"configuration": {}},
},
"model_version": None,
"configuration": {"id": "bbbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb", "name": "BBB"},
"process": {"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff"},
"process": {
"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff",
"corpus": "11111111-1111-1111-1111-111111111111",
},
}
responses.add(
responses.GET,
"http://testserver/api/v1/imports/workers/56785678-5678-5678-5678-567856785678/",
"http://testserver/api/v1/process/workers/56785678-5678-5678-5678-567856785678/",
status=200,
body=json.dumps(payload),
content_type="application/json",
......@@ -369,7 +377,7 @@ def test_configure_worker_run_missing_conf(mocker, monkeypatch, responses):
worker.args = worker.parser.parse_args()
worker.configure()
assert worker.user_configuration is None
assert worker.user_configuration == {}
def test_configure_worker_run_no_worker_run_conf(mocker, monkeypatch, responses):
......@@ -403,12 +411,16 @@ def test_configure_worker_run_no_worker_run_conf(mocker, monkeypatch, responses)
"revision": {"hash": "deadbeef1234"},
"configuration": {},
},
"model_version": None,
"configuration": None,
"process": {"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff"},
"process": {
"id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff",
"corpus": "11111111-1111-1111-1111-111111111111",
},
}
responses.add(
responses.GET,
"http://testserver/api/v1/imports/workers/56785678-5678-5678-5678-567856785678/",
"http://testserver/api/v1/process/workers/56785678-5678-5678-5678-567856785678/",
status=200,
body=json.dumps(payload),
content_type="application/json",
......@@ -416,7 +428,72 @@ def test_configure_worker_run_no_worker_run_conf(mocker, monkeypatch, responses)
worker.args = worker.parser.parse_args()
worker.configure()
assert worker.user_configuration is None
assert worker.user_configuration == {}
def test_configure_load_model_configuration(mocker, monkeypatch, responses):
    worker = BaseWorker()
    mocker.patch.object(sys, "argv", ["worker"])
    payload = {
        "id": "56785678-5678-5678-5678-567856785678",
        "parents": [],
        "worker_version_id": "12341234-1234-1234-1234-123412341234",
        "model_version_id": "12341234-1234-1234-1234-123412341234",
        "dataimport_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff",
        "worker": {
            "id": "deadbeef-1234-5678-1234-worker",
            "name": "Fake worker",
            "slug": "fake_worker",
            "type": "classifier",
        },
        "configuration_id": None,
        "worker_version": {
            "id": "12341234-1234-1234-1234-123412341234",
            "worker": {
                "id": "deadbeef-1234-5678-1234-worker",
                "name": "Fake worker",
                "slug": "fake_worker",
                "type": "classifier",
            },
            "revision": {"hash": "deadbeef1234"},
            "configuration": {"configuration": {}},
        },
        "configuration": None,
        "model_version": {
            "id": "12341234-1234-1234-1234-123412341234",
            "name": "Model version 1337",
            "configuration": {
                "param1": "value1",
                "param2": 2,
                "param3": None,
            },
        },
        "process": {
            "id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeffff",
            "corpus": "11111111-1111-1111-1111-111111111111",
        },
    }
    responses.add(
        responses.GET,
        "http://testserver/api/v1/process/workers/56785678-5678-5678-5678-567856785678/",
        status=200,
        body=json.dumps(payload),
        content_type="application/json",
    )
    worker.args = worker.parser.parse_args()
    assert worker.is_read_only is False
    assert worker.worker_run_id == "56785678-5678-5678-5678-567856785678"
    assert worker.model_configuration == {}
    worker.configure()
    assert worker.worker_version_id == "12341234-1234-1234-1234-123412341234"
    assert worker.model_configuration == {
        "param1": "value1",
        "param2": 2,
        "param3": None,
    }
def test_load_missing_secret():
......@@ -521,3 +598,55 @@ def test_load_local_secret(monkeypatch, tmpdir):
# The remote api is checked first
assert len(worker.api_client.history) == 1
assert worker.api_client.history[0].operation == "RetrieveSecret"
def test_find_model_directory_ponos(monkeypatch):
    monkeypatch.setenv("PONOS_TASK", "my_task")
    monkeypatch.setenv("PONOS_DATA", "/data")
    worker = BaseWorker()
    assert worker.find_model_directory() == Path("/data/current")


def test_find_model_directory_from_cli(monkeypatch):
    monkeypatch.setattr(sys, "argv", ["worker", "--model-dir", "models"])
    monkeypatch.setattr("pathlib.Path.exists", lambda x: True)
    worker = BaseWorker()
    worker.args = worker.parser.parse_args()
    worker.config = {}
    assert worker.find_model_directory() == Path("models")


def test_find_model_directory_from_config(monkeypatch):
    monkeypatch.setattr(sys, "argv", ["worker"])
    monkeypatch.setattr("pathlib.Path.exists", lambda x: True)
    worker = BaseWorker()
    worker.args = worker.parser.parse_args()
    worker.config = {"model_dir": "models"}
    assert worker.find_model_directory() == Path("models")


@pytest.mark.parametrize(
    "model_path, exists, error",
    (
        [
            None,
            True,
            "No path to the model was provided. Please provide model_dir either through configuration or as CLI argument.",
        ],
        ["models", False, "The path models does not link to any directory"],
    ),
)
def test_find_model_directory_not_found(monkeypatch, model_path, exists, error):
    if model_path:
        monkeypatch.setattr(sys, "argv", ["worker", "--model-dir", model_path])
    else:
        monkeypatch.setattr(sys, "argv", ["worker"])
    monkeypatch.setattr("pathlib.Path.exists", lambda x: exists)
    worker = BaseWorker()
    worker.args = worker.parser.parse_args()
    worker.config = {"model_dir": model_path}
    with pytest.raises(ModelNotFoundError, match=error):
        worker.find_model_directory()
......@@ -58,12 +58,12 @@ def test_create_tables(tmp_path):
init_cache_db(db_path)
create_tables()
-    expected_schema = """CREATE TABLE "classifications" ("id" TEXT NOT NULL PRIMARY KEY, "element_id" TEXT NOT NULL, "class_name" TEXT NOT NULL, "confidence" REAL NOT NULL, "state" VARCHAR(10) NOT NULL, "worker_version_id" TEXT, FOREIGN KEY ("element_id") REFERENCES "elements" ("id"))
-CREATE TABLE "elements" ("id" TEXT NOT NULL PRIMARY KEY, "parent_id" TEXT, "type" VARCHAR(50) NOT NULL, "image_id" TEXT, "polygon" text, "rotation_angle" INTEGER NOT NULL, "mirrored" INTEGER NOT NULL, "initial" INTEGER NOT NULL, "worker_version_id" TEXT, "confidence" REAL, FOREIGN KEY ("image_id") REFERENCES "images" ("id"))
-CREATE TABLE "entities" ("id" TEXT NOT NULL PRIMARY KEY, "type" VARCHAR(50) NOT NULL, "name" TEXT NOT NULL, "validated" INTEGER NOT NULL, "metas" text, "worker_version_id" TEXT)
+    expected_schema = """CREATE TABLE "classifications" ("id" TEXT NOT NULL PRIMARY KEY, "element_id" TEXT NOT NULL, "class_name" TEXT NOT NULL, "confidence" REAL NOT NULL, "state" VARCHAR(10) NOT NULL, "worker_run_id" TEXT, FOREIGN KEY ("element_id") REFERENCES "elements" ("id"))
+CREATE TABLE "elements" ("id" TEXT NOT NULL PRIMARY KEY, "parent_id" TEXT, "type" VARCHAR(50) NOT NULL, "image_id" TEXT, "polygon" text, "rotation_angle" INTEGER NOT NULL, "mirrored" INTEGER NOT NULL, "initial" INTEGER NOT NULL, "worker_version_id" TEXT, "worker_run_id" TEXT, "confidence" REAL, FOREIGN KEY ("image_id") REFERENCES "images" ("id"))
+CREATE TABLE "entities" ("id" TEXT NOT NULL PRIMARY KEY, "type" VARCHAR(50) NOT NULL, "name" TEXT NOT NULL, "validated" INTEGER NOT NULL, "metas" text, "worker_run_id" TEXT)
 CREATE TABLE "images" ("id" TEXT NOT NULL PRIMARY KEY, "width" INTEGER NOT NULL, "height" INTEGER NOT NULL, "url" TEXT NOT NULL)
-CREATE TABLE "transcription_entities" ("transcription_id" TEXT NOT NULL, "entity_id" TEXT NOT NULL, "offset" INTEGER NOT NULL CHECK (offset >= 0), "length" INTEGER NOT NULL CHECK (length > 0), "worker_version_id" TEXT, "confidence" REAL, PRIMARY KEY ("transcription_id", "entity_id"), FOREIGN KEY ("transcription_id") REFERENCES "transcriptions" ("id"), FOREIGN KEY ("entity_id") REFERENCES "entities" ("id"))
-CREATE TABLE "transcriptions" ("id" TEXT NOT NULL PRIMARY KEY, "element_id" TEXT NOT NULL, "text" TEXT NOT NULL, "confidence" REAL NOT NULL, "orientation" VARCHAR(50) NOT NULL, "worker_version_id" TEXT, FOREIGN KEY ("element_id") REFERENCES "elements" ("id"))"""
+CREATE TABLE "transcription_entities" ("transcription_id" TEXT NOT NULL, "entity_id" TEXT NOT NULL, "offset" INTEGER NOT NULL CHECK (offset >= 0), "length" INTEGER NOT NULL CHECK (length > 0), "worker_run_id" TEXT, "confidence" REAL, PRIMARY KEY ("transcription_id", "entity_id"), FOREIGN KEY ("transcription_id") REFERENCES "transcriptions" ("id"), FOREIGN KEY ("entity_id") REFERENCES "entities" ("id"))
+CREATE TABLE "transcriptions" ("id" TEXT NOT NULL PRIMARY KEY, "element_id" TEXT NOT NULL, "text" TEXT NOT NULL, "confidence" REAL NOT NULL, "orientation" VARCHAR(50) NOT NULL, "worker_version_id" TEXT, "worker_run_id" TEXT, FOREIGN KEY ("element_id") REFERENCES "elements" ("id"))"""
     actual_schema = "\n".join(
         [
@@ -239,7 +239,7 @@ def test_element_open_image(
     expected_url,
 ):
     open_mock = mocker.patch(
-        "arkindex_worker.cache.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     image = CachedImage(
@@ -2,6 +2,7 @@
 import pytest
 from requests import HTTPError
+from arkindex_worker.cache import CachedElement
 from arkindex_worker.models import Element
@@ -49,7 +50,7 @@ def test_image_url_s3_resize():
 def test_open_image(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -75,7 +76,7 @@ def test_open_image(mocker):
 def test_open_image_resize_portrait(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -120,7 +121,7 @@ def test_open_image_resize_portrait(mocker):
 def test_open_image_resize_partial_element(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -148,7 +149,7 @@ def test_open_image_resize_partial_element(mocker):
 def test_open_image_resize_landscape(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -193,7 +194,7 @@ def test_open_image_resize_landscape(mocker):
 def test_open_image_resize_square(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -237,7 +238,7 @@ def test_open_image_resize_square(mocker):
 def test_open_image_resize_tiles(mocker):
-    mocker.patch("arkindex_worker.models.open_image", return_value="an image!")
+    mocker.patch("arkindex_worker.image.open_image", return_value="an image!")
     elt = Element(
         {
             "zone": {
@@ -261,7 +262,7 @@ def test_open_image_requires_zone():
 def test_open_image_s3(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -281,7 +282,7 @@ def test_open_image_s3_retry(mocker):
     response_mock = mocker.MagicMock()
     response_mock.status_code = 403
     mocker.patch(
-        "arkindex_worker.models.open_image",
+        "arkindex_worker.image.open_image",
         return_value="an image!",
         side_effect=HTTPError(response=response_mock),
     )
@@ -303,7 +304,7 @@ def test_open_image_s3_retry_once(mocker):
     response_mock = mocker.MagicMock()
     response_mock.status_code = 403
     mocker.patch(
-        "arkindex_worker.models.open_image",
+        "arkindex_worker.image.open_image",
         side_effect=HTTPError(response=response_mock),
     )
     elt = Element(
@@ -321,7 +322,7 @@ def test_open_image_s3_retry_once(mocker):
 def test_open_image_use_full_image_false(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -344,7 +345,7 @@ def test_open_image_use_full_image_false(mocker):
 def test_open_image_resize_use_full_image_false(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -374,7 +375,7 @@ def test_open_image_resize_use_full_image_false(mocker):
 def test_open_image_rotation_mirror(mocker):
     open_mock = mocker.patch(
-        "arkindex_worker.models.open_image", return_value="an image!"
+        "arkindex_worker.image.open_image", return_value="an image!"
     )
     elt = Element(
         {
@@ -402,3 +403,17 @@ def test_setattr_setitem():
     element = Element({"name": "something"})
     element.type = "page"
     assert dict(element) == {"name": "something", "type": "page"}
+
+
+def test_element_polygon():
+    polygon = [[0, 0], [181, 0], [181, 240], [0, 240], [0, 0]]
+    element = Element({"zone": {"polygon": polygon}})
+    cached_element = CachedElement(polygon=polygon)
+    assert element.polygon == polygon
+    assert element.polygon == cached_element.polygon
+
+
+def test_element_no_polygon():
+    element = Element(id="element_id")
+    with pytest.raises(ValueError, match="Element element_id has no zone"):
+        _ = element.polygon
@@ -3,6 +3,6 @@
 BASE_API_CALLS = [
     (
         "GET",
-        "http://testserver/api/v1/imports/workers/56785678-5678-5678-5678-567856785678/",
+        "http://testserver/api/v1/process/workers/56785678-5678-5678-5678-567856785678/",
     ),
 ]