Create two charsets for HTR and NER tokens

The teklia-dan dataset format command currently creates a single charset.pkl file that contains every character and every NER token. It would be useful to create two separate charsets instead:

charset_htr.pkl containing only characters, punctuation, etc.
charset_ner.pkl containing only NER tokens

To do that, we could add a new --tokens argument to the format subcommand for entity token mapping.
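
As a rough illustration of the split, here is a minimal sketch. It assumes the entity tokens from --tokens have already been parsed into a plain collection of strings; the function names split_charset and save_charsets are hypothetical and are not part of teklia-dan.

```python
import pickle
from pathlib import Path


def split_charset(charset, ner_tokens):
    """Split a combined charset into (HTR symbols, NER tokens), both sorted.

    `ner_tokens` is assumed to be the collection of entity tokens read
    from the --tokens mapping (hypothetical input format).
    """
    charset = set(charset)
    ner_tokens = set(ner_tokens)
    htr = sorted(charset - ner_tokens)  # characters, punctuation, etc.
    ner = sorted(charset & ner_tokens)  # only tokens actually seen in the data
    return htr, ner


def save_charsets(charset, ner_tokens, output_dir):
    """Write charset_htr.pkl and charset_ner.pkl into `output_dir`."""
    htr, ner = split_charset(charset, ner_tokens)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, symbols in (("charset_htr.pkl", htr), ("charset_ner.pkl", ner)):
        with (output_dir / name).open("wb") as f:
            pickle.dump(symbols, f)
    return htr, ner
```

Whether charset_ner.pkl should list only the tokens found in the dataset (as above) or every token declared in the mapping is an open design choice.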

Other things to update:

Prediction (loading the charset)
DAN worker (loading the charset)
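
On the loading side, both consumers could share something like the sketch below. The load_charsets helper is hypothetical, and the fallback to the old charset.pkl is an assumption added here for backward compatibility, not something the issue requires.

```python
import pickle
from pathlib import Path


def _load(path):
    """Unpickle a single charset file."""
    with path.open("rb") as f:
        return pickle.load(f)


def load_charsets(model_dir):
    """Return (htr_charset, ner_charset) from `model_dir`.

    Assumption: if the two new files are missing, fall back to the
    combined charset.pkl and report no separate NER tokens.
    """
    model_dir = Path(model_dir)
    htr_path = model_dir / "charset_htr.pkl"
    ner_path = model_dir / "charset_ner.pkl"
    if htr_path.exists() and ner_path.exists():
        return _load(htr_path), _load(ner_path)
    return _load(model_dir / "charset.pkl"), []
```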

Edited Jun 12, 2023 by Solene Tarride