Use a dedicated command to build the language model

Since the charset is built in the download command, it doesn't make sense for that command to also take the --subword-vocab-size and --tokens parameters. Also, we (almost) never use a language model for our training. We should add a new command and move the code that builds the language-model files into it.
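
To make the proposal concrete, here is a minimal sketch of how the split could look with `argparse` subcommands. Everything except `--subword-vocab-size` and `--tokens` is an assumption for illustration: the program name `tool`, the `language-model` subcommand name, the `--output` option, the default values, and the idea that both flags move to the new command as-is are all hypothetical, not the project's actual CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "tool" is a placeholder program name (assumption).
    parser = argparse.ArgumentParser(prog="tool")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # The download command keeps building the charset and no longer
    # accepts any language-model options.
    download = subparsers.add_parser("download", help="Download the dataset and build the charset")
    download.add_argument("--output", default="dataset")  # hypothetical option

    # Hypothetical dedicated command that builds the language-model files.
    lm = subparsers.add_parser("language-model", help="Build the language model files")
    lm.add_argument("--output", default="dataset")  # hypothetical option
    lm.add_argument("--subword-vocab-size", type=int, default=1000)  # default is an assumption
    lm.add_argument("--tokens", default=None)  # type and default are assumptions
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, passing `--tokens` to `download` is rejected by the parser, and users who never train with a language model never see those options.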