Directly format the data during the extraction stage
To speed up data generation, the data could be formatted directly during extraction: labels would no longer be written to intermediate .txt files, and the charset and labels.json files would be generated in the same pass. Generating the data would then take a single command.
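
A minimal sketch of what this could look like, not the project's actual code. It assumes the extraction step can yield (split, image name, transcription) tuples. The function name `extract_and_format`, the structure of labels.json, and the `charset.json` file name and format are all hypothetical and only serve to illustrate the single-pass idea.

```python
import json
from pathlib import Path


def extract_and_format(samples, output_dir):
    """Build labels.json and a charset file in a single pass.

    `samples` is an iterable of (split, image_name, transcription) tuples,
    standing in for whatever the extraction step produces (assumption).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    labels = {}
    charset = set()
    for split, image_name, text in samples:
        # Store the label in memory instead of writing a per-image .txt file.
        labels.setdefault(split, {})[image_name] = text
        # Collect every character seen in the transcriptions.
        charset.update(text)

    (output_dir / "labels.json").write_text(
        json.dumps(labels, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    # The charset file name and format are assumptions for this sketch.
    (output_dir / "charset.json").write_text(
        json.dumps(sorted(charset), ensure_ascii=False), encoding="utf-8"
    )
    return labels, sorted(charset)
```

With something along these lines, the separate formatting command would no longer be needed, since both files are written as soon as extraction finishes.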