Use a dedicated command to build the language model
Since the charset is built into the download command, it doesn't make sense to have the --subword-vocab-size and --tokens parameters. Also, we (almost) never use a language model for our training. We should add a new command and move the code that builds the language model files into it.
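A rough sketch of how the split could look, assuming the CLI is built with argparse subcommands. The program name `tool`, the subcommand names `download` and `language-model`, the `--output` option, and the default values are all hypothetical placeholders, not the project's actual interface; only `--subword-vocab-size` and `--tokens` come from this issue.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "tool" and both subcommand names are hypothetical placeholders.
    parser = argparse.ArgumentParser(prog="tool")
    commands = parser.add_subparsers(dest="command", required=True)

    # The download command keeps only what it needs; it no longer
    # accepts any language model option.
    download = commands.add_parser("download", help="Download the dataset")
    download.add_argument("--output", required=True)  # assumed option

    # The new command owns the language model options.
    lm = commands.add_parser(
        "language-model", help="Build the language model files"
    )
    lm.add_argument("--output", required=True)  # assumed option
    # The default of 1000 is an assumption for illustration.
    lm.add_argument("--subword-vocab-size", type=int, default=1000)
    # Assumed here to be an optional path; the real type may differ.
    lm.add_argument("--tokens", default=None)
    return parser
```

With this layout, passing `--tokens` to `download` is rejected by argparse, while `language-model` accepts both options.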