Data loading optimization
GPU-CPU optimization
Training a Transformer