Add an example on how to extract a DAN dataset from Arkindex

The various commands for dataset formatting are described at https://atr.pages.teklia.com/dan/usage/datasets/.

However, this section lacks a practical example that explains the order in which these commands should be run and why they are useful.

We should provide two examples starting from a publicly available Arkindex corpus:

  • text only
  • text + entities