
Implement split command

Create a new split.py module to hold the code related to data splitting (mostly KaldiPartitionSplitter, renamed to Partitioner).

We will update the atr-data-generator split command so that it:

  • creates a Partitioner instance filled with the CLI args
  • calls the create_partitions method on it
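A minimal sketch of how the command could wire the CLI args into a Partitioner, assuming Partitioner is a dataclass whose field names mirror the CLI args. The field names, run_split, and the body of create_partitions are hypothetical placeholders. In particular, the assumption that an unspecified ratio is whatever remains of 1.0 is not stated in this issue.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional


@dataclass
class Partitioner:
    """Hypothetical stand-in for Partitioner (formerly KaldiPartitionSplitter)."""

    output: Path
    train_size: Optional[float] = None
    val_size: Optional[float] = None
    test_size: Optional[float] = None
    use_existing: bool = False

    def create_partitions(self) -> Dict[str, float]:
        # Placeholder logic: only resolves the ratio of each split. The real
        # method would read the files in self.output and write the partitions.
        sizes = {"train": self.train_size, "val": self.val_size, "test": self.test_size}
        missing = [name for name, size in sizes.items() if size is None]
        if len(missing) == 1:
            # Assumption: the unspecified ratio is whatever remains out of 1.0.
            sizes[missing[0]] = 1.0 - sum(s for s in sizes.values() if s is not None)
        return sizes


def run_split(args: argparse.Namespace) -> Dict[str, float]:
    """Hypothetical entry point: build a Partitioner from the CLI args and run it."""
    partitioner = Partitioner(
        output=args.output,
        train_size=args.train_size,
        val_size=args.val_size,
        test_size=args.test_size,
        use_existing=args.existing,
    )
    return partitioner.create_partitions()
```

Keeping the CLI parsing out of Partitioner means the class can also be built directly in tests or from other code, without going through argparse.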

Needed CLI args are:

  • --train-size, float, defaults to None
  • --val-size, float, defaults to None
  • --test-size, float, defaults to None
  • output, pathlib.Path, directory where the data will be stored (no more .../Lines subdirectory: look for the files directly in args.output)
  • --existing, boolean flag, behaves like the current split.use_existing_split argument (possibly useless? discuss with the users first)
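The arguments above could be declared with argparse roughly as follows. This sketch assumes output is a positional argument (it has no leading dashes in the list) and that split is a subcommand of an atr-data-generator parser. build_parser and the help strings are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the split subcommand and its arguments."""
    parser = argparse.ArgumentParser(prog="atr-data-generator")
    subcommands = parser.add_subparsers(dest="command", required=True)

    split = subcommands.add_parser("split", help="Split the data into train/val/test partitions.")
    # Positional: directory where the data is stored (no .../Lines subdirectory).
    split.add_argument("output", type=Path, help="Directory where the data will be stored.")
    # Each ratio is optional and defaults to None when omitted.
    split.add_argument("--train-size", type=float, default=None, help="Ratio of the train split.")
    split.add_argument("--val-size", type=float, default=None, help="Ratio of the validation split.")
    split.add_argument("--test-size", type=float, default=None, help="Ratio of the test split.")
    # Boolean flag mirroring split.use_existing_split.
    split.add_argument("--existing", action="store_true", help="Reuse an existing split.")
    return parser
```

argparse converts --train-size to args.train_size, and so on, so the parsed namespace can be passed directly to the code that builds the Partitioner.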

Notes:

  • either --existing or at least two of the --x-size arguments must be specified
  • if --existing is used, the --x-size arguments are ignored
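These two rules could be enforced right after parsing. The following is a minimal sketch, assuming the parsed namespace exposes existing, train_size, val_size and test_size attributes. validate_split_args is a hypothetical helper, and raising ValueError is one choice among others (calling parser.error would also work).

```python
import argparse


def validate_split_args(args: argparse.Namespace) -> None:
    """Hypothetical check: require --existing, or at least two of the --x-size arguments."""
    if args.existing:
        # The --x-size arguments are ignored when reusing an existing split.
        return
    sizes = [args.train_size, args.val_size, args.test_size]
    if sum(size is not None for size in sizes) < 2:
        raise ValueError(
            "Specify either --existing or at least two of "
            "--train-size, --val-size and --test-size."
        )
```

For example, `--train-size 0.8 --val-size 0.1` passes, while `--train-size 0.8` alone raises.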
Edited by Yoann Schneider