KaldiFormat extraction

Depends on #26 (closed)

Implement a new class KaldiDataGenerator(DataGenerator) that generates the dataset in Kaldi format.

Add a run(self) method to DataGenerator that is called when the atr-data-generator extract command is invoked.

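    def run(self):
        # Fetch the line transcriptions, then hand them to the format-specific exporter.
        transcriptions: List[Transcription] = self.get_line_transcriptions()
        self.export(transcriptions)

    def export(self, transcriptions):
        # Format-specific; subclasses must override this.
        raise NotImplementedError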

The KaldiDataGenerator will override the export method with whatever code is needed to write the Kaldi files. IMO that's just one exported file per transcription, but I could be wrong.
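
To make the proposal concrete, here is a rough sketch of how the override could look. It follows the "one file per transcription" guess above, which may not match what Kaldi actually needs. Everything beyond run, export, get_line_transcriptions and the two class names is an assumption made for illustration: the Transcription fields (element_id, text), the out_dir argument and the `<element_id>.txt` file naming are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Transcription:
    # Hypothetical stand-in: the real Transcription class may have other fields.
    element_id: str
    text: str


class DataGenerator:
    def get_line_transcriptions(self) -> List[Transcription]:
        # Placeholder for the existing retrieval logic.
        raise NotImplementedError

    def run(self) -> None:
        # Entry point called by the extract command.
        transcriptions = self.get_line_transcriptions()
        self.export(transcriptions)

    def export(self, transcriptions: List[Transcription]) -> None:
        # Format-specific; subclasses must override this.
        raise NotImplementedError


class KaldiDataGenerator(DataGenerator):
    def __init__(self, out_dir: Path) -> None:
        # out_dir is an assumed constructor argument, not an existing option.
        self.out_dir = Path(out_dir)

    def export(self, transcriptions: List[Transcription]) -> None:
        # Assumption: one text file per transcription, named after its element.
        self.out_dir.mkdir(parents=True, exist_ok=True)
        for transcription in transcriptions:
            target = self.out_dir / f"{transcription.element_id}.txt"
            target.write_text(transcription.text + "\n", encoding="utf-8")
```

With this split, run stays in the base class and any other output format only needs its own export.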