+++
title = "Deployment"
sort_by = "weight"
weight = 90
insert_anchor_links = "right"
+++
This documentation is aimed at **system administrators** and **business leaders** who wish to deploy the Arkindex platform on their own hardware or cloud provider instead of using instances provided by Teklia (like [demo.arkindex.org](https://demo.arkindex.org)).
If you are interested in using Arkindex on your own documents, but **cannot publish them on publicly available instances** (due to privacy or regulatory concerns), it's possible to deploy the full Arkindex platform on your own infrastructure.
We currently offer [two editions of Arkindex](@/overview/license.md):
1. Community Edition, under the AGPL-v3 open-source license, suitable if your project is also open-source,
2. Enterprise Edition, suitable for all proprietary projects.
Please [contact us](https://teklia.com/company/contact/) if you are interested in Arkindex for your company or institution. We offer dedicated services for all your Arkindex needs (training, setup, project consulting, feature development, ...).
## Requirements
Arkindex has a few hard requirements to run on your own hardware:
- [Docker](https://docs.docker.com/get-docker/) is needed, as we only deploy through Docker images,
- Linux servers are the only supported operating system. We strongly recommend using [Ubuntu LTS](https://ubuntu.com/download/desktop),
- All your images must be hosted on an [IIIF](https://iiif.io/) server, or you'll need to expose them through a local IIIF server,
- A domain name for the platform server:
- ideally this is a public domain name if your server is reachable on the Internet (like `arkindex.company.com`),
- or an internal domain name, provided by your company's system administrator.
- An SSL certificate for that domain name:
- it can be provided freely and automatically by [Let's Encrypt](https://letsencrypt.org/) if your server is reachable on the Internet,
- otherwise an internal certificate, provided by your company's system administrator.
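As an illustration of the Let's Encrypt option: Traefik, the load balancer used elsewhere in this documentation, can obtain and renew certificates automatically. The snippet below is a minimal sketch of a Traefik static configuration, assuming an HTTP entry point named `web`; the email address and storage path are placeholders.

```yaml
# traefik.yml (static configuration) -- illustrative sketch only
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@company.com        # placeholder: your contact address
      storage: /letsencrypt/acme.json # placeholder: persistent volume path
      httpChallenge:
        entryPoint: web               # Let's Encrypt validates over plain HTTP
```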
To run the Enterprise Edition, your servers must be able to make regular API calls (a few times a day) to a remote server to validate their licence. The server does **not** necessarily need to be exposed to the Internet, but simply be able to make requests towards a domain.
## Content
In this section, you'll find out how to:
1. pick an [architecture](@/deployment/architecture.md) that suits your needs,
2. choose your [own hardware](@/deployment/hardware.md) to run Arkindex efficiently,
3. set up [Arkindex](@/deployment/setup.md) in production mode,
4. configure [Arkindex](@/deployment/configuration.md).
+++
title = "Architectures"
sort_by = "weight"
weight = 10
+++
We'll use different terms for the components of our product:
- **Platform server** is the server that will run the **Backend** code responsible for the **Rest API**,
- Arkindex needs to run some specific asynchronous tasks that require direct access to the database: the **local worker** will execute these tasks,
- In the Enterprise Edition, some intensive Machine Learning tasks will be executed by **Remote workers**, using a proprietary software called **Ponos**. One instance of Ponos is called an **Agent**.
# Overview
The main part of the architecture uses a set of open-source software along with our own software.
{{ figure(image="deployment/architecture/overview.png", height=400, caption="Arkindex platform architecture") }}
The open source components here are:
- Traefik as load balancer,
- Cantaloupe as IIIF server,
- Minio as S3-compatible storage server,
- Redis as cache,
- PostgreSQL as database,
- Solr as search engine.
## Machine Learning
In the Enterprise Edition, you'll also need to run a set of workers on dedicated servers: this is where the Machine Learning processes will run.
{{ figure(image="deployment/architecture/workers.png", height=400, caption="Arkindex workers for Machine Learning") }}
Each worker in the diagram represents a dedicated server, running our in-house job scheduling agents and dedicated Machine Learning tasks.
# Common cases
We only cover the most common cases here; if you have questions about your own architecture, please [contact us](https://teklia.com/company/contact/).
## Single Server
This is the simplest option: a standalone server that hosts all the services using **Docker containers**.
A single `docker-compose.yml` can efficiently deploy the whole stack.
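As a rough illustration (this is not the official compose file; image names and tags are placeholders, and volumes, networks and configuration are omitted), such a `docker-compose.yml` could look like:

```yaml
version: "3.8"
services:
  traefik:                  # load balancer
    image: traefik:v2.10
    ports: ["80:80", "443:443"]
  backend:                  # Arkindex backend (REST API)
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
    depends_on: [db, redis, solr, minio]
  cantaloupe:               # IIIF server
    image: cantaloupe:X.Y   # placeholder image name
  minio:                    # S3-compatible storage
    image: minio/minio
    command: server /data
  redis:                    # cache
    image: redis:7
  db:                       # database
    image: postgres:15
  solr:                     # search engine
    image: solr:9
```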
{{ figure(image="deployment/architecture/single.png", height=400, caption="Arkindex stack on a single server") }}
**Pros:**
- Simple to deploy and maintain
- Cheap
**Cons:**
- Limited disk space
- Limited performance
- Single point of failure
## Cluster
With more budget, you can deploy Arkindex across several servers, still using docker-compose along with placement constraints on [Docker Swarm](https://docs.docker.com/engine/swarm/).
A **Docker Swarm cluster** enables you to run Docker services instead of containers, with multiple containers per service, so you can benefit from higher throughput and eliminate single points of failure.
{{ figure(image="deployment/architecture/cluster.png", height=400, caption="Arkindex stack on a Docker Swarm cluster") }}
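The placement constraints mentioned above pin each service to a class of servers. Below is a sketch of a stack file excerpt, assuming nodes have been labelled beforehand (`role` is a hypothetical label, and the image tag is a placeholder):

```yaml
services:
  backend:
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
    deploy:
      replicas: 2                   # several containers for higher throughput
      placement:
        constraints:
          - node.labels.role == web # only schedule on nodes labelled "web"
```

Nodes would be labelled with `docker node update --add-label role=web <node-name>` and the stack deployed with `docker stack deploy`.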
**Pros:**
- High performance
- Service replicas for high availability
- Network segregation for better security
**Cons:**
- Limited disk space
- Harder to maintain and monitor
## Cloud provider
You can also deploy Arkindex using a cloud provider (like Amazon AWS, Google GCP, Microsoft Azure), using their managed services to replace self-hosted databases and shared S3-compatible storage.
Most cloud providers offer managed versions of the services required by Arkindex (load balancer, PostgreSQL, S3-compatible storage, search engine & Redis cache). You'll then just need to run the Arkindex containers:
- through managed Docker containers
- by building your own Docker swarm cluster on their VPS offering
{{ figure(image="deployment/architecture/cloud.png", height=400, caption="Arkindex stack on a cloud provider") }}
**Pros:**
- High performance
- Low maintenance for non-hosted services
- Unlimited disk space
**Cons:**
- Expensive
- Vendor lock-in
graph TD
lb[Load balancer] --> docker
subgraph docker[Containers]
backend[Arkindex backend]
cantaloupe[IIIF server]
worker[Arkindex internal worker]
end
subgraph services[Managed service]
minio[S3-compatible storage]
redis[Cache]
db[Database]
solr[Search engine]
end
subgraph gpu[GPU-enabled services]
ponos --> ml_task[Machine Learning Task]
end
docker --> services
gpu --> docker
content/deployment/architecture/cloud.png


graph TD
subgraph server_web1[WebServices n°1]
lb[Load balancer]
lb --> backend[Arkindex backend]
lb --> cantaloupe[IIIF server]
end
subgraph server_web2[WebServices n°2]
lb --> backend2[Arkindex backend]
lb --> cantaloupe2[IIIF server]
end
subgraph storage[File storage]
lb --> minio
end
subgraph server_db1[Databases]
minio[S3-compatible storage]
redis[Cache]
db[Database]
solr[Search engine]
end
subgraph server_worker1[Worker n°1]
worker[Arkindex internal worker]
worker2[Arkindex internal worker n°2]
end
subgraph server_worker2[Worker n°2]
ponos --> ml_task[Machine Learning Task]
end
server_web1 -..-> server_db1
server_web2 -..-> server_db1
server_worker1 -..-> server_db1
server_worker2 -..-> server_web1
server_worker2 -..-> server_web2
content/deployment/architecture/cluster.png


graph TD
subgraph server
lb[Load balancer] --> frontend[Arkindex frontend]
frontend --> backend
lb --> backend[Arkindex backend]
lb --> cantaloupe[IIIF server]
lb --> minio[S3-compatible storage]
cantaloupe --> minio
backend --> worker[Arkindex internal worker]
worker --> backend
backend --> redis[Cache]
backend --> db[Database]
backend --> solr[Search engine]
end
content/deployment/architecture/single.png


+++
title = "Settings"
description = "All the configuration options available to setup your Arkindex backend"
# Clean slug for parent folder
path = "howto/on-premise/configuration/"
weight = 40
+++
You will find on this page all the configuration settings available for the Arkindex backend. These settings must be stored in a YAML file and exposed to the backend and worker containers using a Docker volume. The configuration path is set through the `CONFIG_PATH` environment variable.
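As a sketch, a compose excerpt showing how such a file could be exposed; the in-container path `/arkindex/config.yml` and the image tag are illustrative:

```yaml
services:
  backend:
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
    environment:
      CONFIG_PATH: /arkindex/config.yml      # where the backend reads its settings
    volumes:
      - ./config.yml:/arkindex/config.yml:ro # mount the YAML file read-only
  worker:
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
    environment:
      CONFIG_PATH: /arkindex/config.yml      # same file for the local worker
    volumes:
      - ./config.yml:/arkindex/config.yml:ro
```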
......
+++
title = "Hardware"
sort_by = "weight"
weight = 20
+++
If you are interested in using Arkindex on your own documents, but **cannot publish them on our own instances** (due to privacy or regulatory concerns), it's possible to deploy the full Arkindex platform on your own infrastructure.
In the following sections, we'll describe the requirements needed to run an efficient and scalable Arkindex infrastructure using **Docker containers** on your own hardware. This setup can handle millions of documents processed by multiple Machine Learning processes.
## Architecture
The main part of the architecture uses a set of open-source software along with our own proprietary software.
{{ figure(image="howto/on_premise/architecture.png", height=400, caption="Arkindex platform architecture") }}
The open source components here are:
- Traefik as load balancer,
- Cantaloupe as IIIF server,
- Minio as S3-compatible storage server,
- Redis as cache,
- PostgreSQL as database,
- Solr as search engine.
You'll also need to run a set of workers on dedicated servers: this is where the Machine Learning processes will run.
{{ figure(image="howto/on_premise/workers.png", height=400, caption="Arkindex workers for Machine Learning") }}
Each worker in the diagram represents a dedicated server, running our in-house job scheduling agents and dedicated Machine Learning tasks.
## Hardware
### Platform
The requirements of each server depend on the type of your processes and datasets.
## Requirements
- Use Linux servers and Docker. We provide support for the Ubuntu LTS distribution, and only provide Docker images to run our software.
- Your instance must be able to make regular API calls (once a day) on a remote server to validate its licence. The server does **not** need to be exposed to the Internet, but simply be able to make requests towards a domain.
## Deliverables
- Docker images:
- backend
- agent to run processes
- relevant Machine Learning workers used in processes (DLA, HTR, NER, ...)
- frontend assets
- Documentation to deploy and manage an instance using [Ansible playbook](https://www.ansible.com/)
## Pricing
Please [contact us](https://teklia.com/company/contact/) if you are interested in this solution for your company or institution.
We can also provide a private instance that we manage on our servers (hosted in Europe or North America).
## Run with Docker
More information on [running Arkindex using docker-compose](@/howto/on_premise/docker_compose.md)
+++
title = "Deploy Arkindex with docker-compose"
title = "Docker setup"
description = "Deploy Arkindex on your own infrastructure using Linux and docker-compose"
# Clean slug for parent folder
path = "howto/on-premise/docker-compose/"
weight = 30
+++
This documentation is written for **system administrators**.
We'll use different terms for the components of our product:
- **Platform server** is the server that will run the **Backend** code responsible for the **Rest API**,
- Arkindex needs to run some specific asynchronous tasks that require direct access to the database: the **local worker** will execute these tasks,
- Some intensive Machine Learning tasks will be executed by **Remote workers**, using a proprietary software called **Ponos**. One instance of Ponos is called an **Agent**.
## Requirements
- A bare metal server running Linux Ubuntu LTS (20.04 or 22.04) for the platform
- If you plan to run Machine Learning processes, you'll need another server
- [Docker installed on that server](https://docs.docker.com/desktop/install/linux-install/)
- [docker-compose](https://docs.docker.com/compose/install/)
- A domain name for the platform server:
- ideally this is a public domain name if your server is reachable on Internet (like `arkindex.company.com`),
- or an internal domain name, provided by your company's system administrator.
- An SSL certificate for that domain name:
- it can be provided by [Let's Encrypt](https://letsencrypt.org/) freely and automatically if your server is reachable on Internet
- otherwise an internal certificate, provided by your company's system administrator.
## Third-party services
Teklia will provide you with several Docker images (to load using `docker load`):
- the tasks image, `registry.gitlab.teklia.com/arkindex/tasks:X.Y.Z`, will be used by the remote workers (file imports, thumbnails generation, ...).
- the ponos image, `registry.gitlab.teklia.com/arkindex/ponos-agent:X.Y.Z` will be used to actually run the asynchronous tasks across all your remote workers.
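For illustration, a compose excerpt referencing these images; the tags are placeholders, and mounting the Docker socket is an assumption about how the agent spawns task containers (check with Teklia for the supported setup):

```yaml
services:
  backend:
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
  ponos:
    image: registry.gitlab.teklia.com/arkindex/ponos-agent:X.Y.Z
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock # assumption: agent controls Docker
```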
{{ figure(image="deployment/stack.png", height=250, caption="Arkindex Platform and a single Worker") }}
The backend image mentioned above will run in two containers on your application server:
1. for the API, this is really the heart of Arkindex,
Of course, your setup may differ; you could use external services (databases, search engines, ...).
### Configuration
All the configuration options for the backend are detailed [on this page](@/deployment/configuration.md).
A minimal configuration file is also available in the [public repository](https://gitlab.teklia.com/arkindex/public-architecture/-/blob/master/config.yml).
......