Download and format a HuggingFace Dataset

We should support datasets hosted on HuggingFace, such as RIMES.

This would be a new command, pylaia-download, that downloads HuggingFace (HF) datasets and converts them to the PyLaia format. It would need to generate:

  • files needed for training
  • files needed for LM training
  • files needed for evaluation
  • files needed for prediction/testing
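A rough sketch of the formatting step, written under stated assumptions. It takes records that have already been downloaded, as (image id, transcription) pairs per split, and writes one file group for each need listed above. It assumes these PyLaia conventions: characters separated by spaces, word separators written as <space>, and the CTC blank <ctc> at index 0 of syms.txt. The file names (train.txt, train_ids.txt, corpus.txt, syms.txt) and all function names are hypothetical and should be checked against the PyLaia documentation. Downloading the data and saving the images are left out. In practice, the records could come from datasets.load_dataset, whose Image columns decode to PIL images that can be saved with .save().

```python
"""Hypothetical sketch of the formatting step behind a pylaia-download command."""
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed PyLaia conventions (verify against the docs): characters separated by
# spaces, word separators written as "<space>", CTC blank "<ctc>" at index 0.
SPACE_TOKEN = "<space>"
CTC_TOKEN = "<ctc>"

Split = List[Tuple[str, str]]  # (image id, transcription) pairs


def tokenize(text: str) -> str:
    """Turn 'ab c' into 'a b <space> c'."""
    return " ".join(SPACE_TOKEN if ch == " " else ch for ch in text)


def format_split(split: Split) -> Tuple[List[str], List[str]]:
    """Return (ground-truth lines, image-id lines) for one split."""
    gt_lines = [f"{img_id} {tokenize(text)}" for img_id, text in split]
    id_lines = [img_id for img_id, _ in split]
    return gt_lines, id_lines


def build_symbols(splits: Dict[str, Split]) -> List[str]:
    """Collect every non-space character seen in any split as 'symbol index' lines."""
    chars = sorted(
        {ch for split in splits.values() for _, text in split for ch in text if ch != " "}
    )
    symbols = [CTC_TOKEN, SPACE_TOKEN] + chars
    return [f"{sym} {idx}" for idx, sym in enumerate(symbols)]


def _write_lines(path: Path, lines: List[str]) -> None:
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def write_dataset(out_dir: Path, splits: Dict[str, Split]) -> None:
    """Write hypothetical training, LM, evaluation and prediction files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, split in splits.items():
        gt_lines, id_lines = format_split(split)
        # Training / evaluation: "<id> <tokenized text>" per line.
        _write_lines(out_dir / f"{name}.txt", gt_lines)
        # Prediction / testing: the bare list of image ids.
        _write_lines(out_dir / f"{name}_ids.txt", id_lines)
    # LM training: tokenized training transcriptions only (hypothetical file name).
    _write_lines(out_dir / "corpus.txt", [tokenize(t) for _, t in splits.get("train", [])])
    # Symbol table shared by training and decoding.
    _write_lines(out_dir / "syms.txt", build_symbols(splits))
```

Example call with toy data: write_dataset(Path("rimes_pylaia"), {"train": [("img1", "ab c")], "val": [("img2", "ca")]}). Splitting the pure helpers (tokenize, format_split, build_symbols) from the file writing makes it easier to test the format and adapt it if PyLaia's conventions differ from those assumed here.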