Commit e9dbd2a9 authored by kermorvant

Merge branch 'on-premise' into 'master'

On Premise documentation

See merge request teklia/arkindex/doc!75
parents 1abda0a7 44fd6e30
Pipeline #13845 passed
graph TD
lb[Load balancer] --> frontend[Arkindex frontend]
frontend --> backend
lb --> backend[Arkindex backend]
lb --> cantaloupe[IIIF server]
lb --> minio[S3-compatible storage]
cantaloupe --> minio
backend --> worker[Arkindex internal worker]
worker --> backend
backend --> redis[Cache]
backend --> db[Database]
backend --> solr[Search engine]
content/howto/on_premise/architecture.png

45 KiB

+++
title = "Deploy Arkindex on-premise"
description = "Deploy Arkindex on your own infrastructure"
weight = 100
+++
If you are interested in using Arkindex on your own documents, but **cannot publish them on our own instances** (due to privacy or regulatory concerns), it's possible to deploy the full Arkindex platform on your own infrastructure.
In the following sections, we describe the requirements for running an efficient and scalable Arkindex infrastructure using **Docker containers** on your own hardware. This setup can process millions of documents with multiple Machine Learning processes.
## Architecture
The main part of the architecture uses a set of open-source software along with our own proprietary software.
{{ figure(image="howto/on_premise/architecture.png", height=400, caption="Arkindex platform architecture") }}
The open-source components are:
- Traefik as load balancer,
- Cantaloupe as IIIF server,
- Minio as S3-compatible storage server,
- Redis as cache,
- PostgreSQL as database,
- Solr as search engine.
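
As a rough planning aid, the links in the architecture diagram can be written down as data to see which services receive traffic directly from the load balancer and which are only reached internally. This is a minimal sketch: the edges are transcribed from the Mermaid source of the architecture figure, while the names `EDGES`, `behind_load_balancer` and `internal_only` are illustrative and not part of Arkindex.

```python
# Edges transcribed from the architecture diagram: source -> targets.
# Node identifiers (lb, cantaloupe, minio, ...) are the ones used in the diagram.
EDGES = {
    "lb": ["frontend", "backend", "cantaloupe", "minio"],
    "frontend": ["backend"],
    "cantaloupe": ["minio"],
    "backend": ["worker", "redis", "db", "solr"],
    "worker": ["backend"],
}


def all_services(edges):
    """Collect every node that appears as a source or a target."""
    services = set(edges)
    for targets in edges.values():
        services.update(targets)
    return services


def behind_load_balancer(edges, lb="lb"):
    """Services that the load balancer forwards traffic to directly."""
    return set(edges.get(lb, []))


def internal_only(edges, lb="lb"):
    """Services that are never a direct target of the load balancer."""
    return all_services(edges) - behind_load_balancer(edges, lb) - {lb}


print(sorted(behind_load_balancer(EDGES)))  # ['backend', 'cantaloupe', 'frontend', 'minio']
print(sorted(internal_only(EDGES)))         # ['db', 'redis', 'solr', 'worker']
```

In this reading of the diagram, the cache, database, search engine and internal worker are only reached through the backend.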
You'll also need to run a set of workers on dedicated servers: this is where the Machine Learning processes will run.
{{ figure(image="howto/on_premise/workers.png", height=400, caption="Arkindex workers for Machine Learning") }}
Each worker in the diagram represents a dedicated server, running our in-house job scheduling agents and dedicated Machine Learning tasks.
## Hardware
### Platform
We recommend using Docker Swarm to aggregate several web servers along with at least one server for databases.
For efficient results in production, run at least 2 web nodes.
#### Web node spec
These servers can be virtual machines (VPS) or bare-metal dedicated servers, with the following recommended specifications:
- 4 CPU cores, 2 GHz per core
- 4 GB of RAM
- 80 GB of storage
Each web node should host these services:
- Arkindex backend & frontend
- Arkindex internal worker
- load balancer
- (optionally) IIIF server
#### Database server spec
This server must be a bare-metal dedicated server, using SSD for database storage, with the following recommended specifications:
- 8 to 12 cores, 2.6 GHz per core
- 32 GB of RAM
- 500 GB of storage (heavily depends on the size of your datasets)
The database server should host these services:
- PostgreSQL database
- Redis server
- (optionally) Solr server
- (optionally) Minio instance
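
To compare candidate machines against these recommendations, the figures can be captured in a small script. The following is only a sketch: the numbers are copied from the web node and database server lists above (using the lower bound of 8 cores for the database server), and `NodeSpec`, `shortfalls` and the `candidate` machine are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    cores: int
    ghz_per_core: float
    ram_gb: int
    storage_gb: int


# Recommended values from the specification lists above.
WEB_NODE = NodeSpec(cores=4, ghz_per_core=2.0, ram_gb=4, storage_gb=80)
# Lower bound of the "8 to 12 cores" recommendation.
DATABASE_SERVER = NodeSpec(cores=8, ghz_per_core=2.6, ram_gb=32, storage_gb=500)


def shortfalls(actual, recommended):
    """Return the names of the fields where `actual` is below `recommended`."""
    fields = ("cores", "ghz_per_core", "ram_gb", "storage_gb")
    return [
        name for name in fields
        if getattr(actual, name) < getattr(recommended, name)
    ]


# Hypothetical machine under evaluation.
candidate = NodeSpec(cores=4, ghz_per_core=2.4, ram_gb=2, storage_gb=100)
print(shortfalls(candidate, WEB_NODE))         # ['ram_gb']
print(shortfalls(candidate, DATABASE_SERVER))  # every field falls short
```

An empty list means the machine meets every recommended value. Storage needs for the database server still depend on the size of your datasets, so treat the 500 GB figure as a starting point.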
### Machine Learning Workers
Each worker can be an independent server and does not need to be directly connected to the platform: it only communicates through the platform's REST API, so no database access is needed.
The requirements for each server depend on the type of your processes and datasets. We recommend using bare-metal servers with at least 8 cores at 2 GHz and 16 GB of RAM. You may also need GPUs for specific use cases. When you contact us, please describe your datasets and include samples so that we can reply with specific requirements.
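
Because a worker only talks to the platform over its REST API, a quick way to validate a new worker server is to check that it can open a TCP connection to the platform's HTTPS endpoint. The sketch below uses only the Python standard library; the function name `can_reach`, the default port 443 and the host name in the usage note are assumptions, so replace them with the address of your own load balancer.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and connects;
        # the context manager closes the socket immediately afterwards.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, `can_reach("arkindex.example.com")` (a placeholder host name) run from a worker server tells you whether the network path to the platform is open. It does not check credentials or that the API itself responds correctly.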
## Requirements
- Use Linux servers and Docker. We provide support for the Ubuntu LTS distribution, and only provide Docker images to run our software.
- Your instance must be able to make regular API calls (once a day) to a remote server to validate its licence. The server does **not** need to be exposed to the Internet; it only needs to be able to make outbound requests to a domain.
## Deliverables
- Docker images:
  - backend
  - agent to run processes
  - relevant Machine Learning workers used in processes (DLA, HTR, NER, ...)
- frontend assets
- Documentation to deploy and manage an instance using an [Ansible playbook](https://www.ansible.com/)
## Pricing
Please [contact us](https://teklia.com/company/contact/) if you are interested in this solution for your company or institution.
We can also provide a private instance that we manage on our servers (hosted in Europe or North America).
graph TD
subgraph arkindex
lb[Load Balancer] --> backend[Arkindex backend]
backend --> database[Database]
end
subgraph worker 1
agent_1[Agent] -..-> lb
agent_1 --> task_1
agent_1 --> task_2
end
subgraph worker 2
agent_2[Agent] -..-> lb
agent_2 --> task_3
agent_2 --> task_4
end
content/howto/on_premise/workers.png

45.9 KiB
