Bug on page_type

When starting an export (either page-xml or pdf), the process fails immediately with an error about a missing key in the configuration (KeyError: 'page_type' in the log below).

See process https://ee.preprod.arkindex.teklia.com/process/ffeabcac-bfb5-4ed9-be4d-98fc13a4c189/0

2024-12-06 13:09:10,327 INFO/arkindex_worker: Worker will use /data/current as working directory
2024-12-06 13:09:11,493 INFO/arkindex_worker: Loaded Worker ExportPageXML @ version 1 using configuration 'Configuration for process 6fbbda2d-a7e3-4f14-bc1a-5201b069a431' from API
2024-12-06 13:09:11,494 INFO/arkindex_worker: Loaded user configuration from WorkerRun
2024-12-06 13:09:11,494 INFO/arkindex_worker: User configuration retrieved
2024-12-06 13:09:11,794 INFO/arkindex_worker: Loaded 23 element types in corpus (fc323eb0-7a7a-4933-bff5-dcc3f4340bd2).
2024-12-06 13:09:12,095 INFO/arkindex_worker: Downloading export (d27fa9c7-2d07-485f-a3f5-f08737790280)...
2024-12-06 13:09:12,720 INFO/arkindex_worker: Downloaded export (d27fa9c7-2d07-485f-a3f5-f08737790280) @ `/tmp/test-bastien-20241206-130004.sqlite`
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/local/bin/worker-export-pagexml:8 in <module>                           │
│                                                                              │
│   5 from worker_export.pagexml import main                                   │
│   6 if __name__ == '__main__':                                               │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])     │
│ ❱ 8 │   sys.exit(main())                                                     │
│   9                                                                          │
│                                                                              │
│ /usr/local/lib/python3.11/site-packages/worker_export/pagexml.py:61 in main  │
│                                                                              │
│   58                                                                         │
│   59                                                                         │
│   60 def main() -> None:                                                     │
│ ❱ 61 │   PAGEXMLExporter(description="Export data from Arkindex to PAGE XML  │
│   62                                                                         │
│   63                                                                         │
│   64 if __name__ == "__main__":                                              │
│                                                                              │
│ /usr/local/lib/python3.11/site-packages/worker_export/base.py:97 in run      │
│                                                                              │
│    94 │   │   self.output_dir = Path(tempfile.mkdtemp(prefix=f"{self.mode}-e │
│    95 │                                                                      │
│    96 │   def run(self) -> None:                                             │
│ ❱  97 │   │   super().run()                                                  │
│    98 │   │                                                                  │
│    99 │   │   if not next(self.output_dir.rglob(f"*.{self.file_ext}"), None) │
│   100 │   │   │   logger.warning(self.empty_warning.format(self.page_type))  │
│                                                                              │
│ /usr/local/lib/python3.11/site-packages/arkindex_worker/worker/__init__.py:1 │
│ 51 in run                                                                    │
│                                                                              │
│   148 │   │   It calls [process_element][arkindex_worker.worker.ElementsWork │
│   149 │   │   catching exceptions, and handles saving WorkerActivity updates │
│   150 │   │   """                                                            │
│ ❱ 151 │   │   self.configure()                                               │
│   152 │   │                                                                  │
│   153 │   │   # List all elements either from JSON file                      │
│   154 │   │   # or direct list of elements on CLI                            │
│                                                                              │
│ /usr/local/lib/python3.11/site-packages/worker_export/pagexml.py:25 in       │
│ configure                                                                    │
│                                                                              │
│   22 │   def configure(self) -> None:                                        │
│   23 │   │   super().configure()                                             │
│   24 │   │                                                                   │
│ ❱ 25 │   │   self.page_type = self.config["page_type"]                       │
│   26 │   │   self.paragraph_type = self.config.get("paragraph_type")         │
│   27 │   │   self.line_type = self.config["line_type"]                       │
│   28 │   │   self.transcription_source = uuid_or_manual(                     │
╰──────────────────────────────────────────────────────────────────────────────╯
KeyError: 'page_type'
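
The traceback shows configure() reading self.config["page_type"] directly, so an absent key surfaces as a bare KeyError. One possible mitigation, sketched below, is to check the required keys up front and report every missing one in a single message. This is only an illustration, not the actual worker_export code: read_required, MissingConfigurationError and the "text_line" value are hypothetical names chosen for the example, and it does not address why the key is absent from the process configuration in the first place.

```python
from typing import Any, Mapping


class MissingConfigurationError(ValueError):
    """Raised when required configuration keys are absent (hypothetical helper)."""


def read_required(config: Mapping[str, Any], *keys: str) -> dict[str, Any]:
    """Return the requested keys, or fail with one message naming every missing key."""
    missing = [key for key in keys if key not in config]
    if missing:
        raise MissingConfigurationError(
            "Missing required configuration key(s): " + ", ".join(missing)
        )
    return {key: config[key] for key in keys}


# Example: a configuration that lacks "page_type", as in the traceback above.
# The "text_line" value is made up for the illustration.
config = {"line_type": "text_line"}
try:
    read_required(config, "page_type", "line_type")
except MissingConfigurationError as error:
    message = str(error)
```

With a check like this, the log would name the missing key(s) explicitly instead of ending on a raw KeyError, which makes a misconfigured process easier to diagnose.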