# qwen2_5_omni_all_1015_reverse
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6604
- Token Acc: 0.7973
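"Token Acc" here is presumably per-token top-1 accuracy over the label positions that are actually scored (padded/masked positions excluded). A minimal sketch of that metric, with illustrative names not taken from the training code:

```python
def token_accuracy(pred_ids, label_ids, ignore_index=-100):
    """Fraction of label tokens (excluding ignore_index) where the
    predicted token id matches the label id."""
    correct = 0
    total = 0
    for pred, label in zip(pred_ids, label_ids):
        if label == ignore_index:
            continue  # padded / masked positions are not scored
        total += 1
        if pred == label:
            correct += 1
    return correct / total if total else 0.0

# 4 scored tokens (one position is masked), 3 predictions match
print(token_accuracy([5, 9, 2, 7, 1], [5, 9, 2, -100, 3]))  # 0.75
```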
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: AdamW (torch fused) with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0
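The total train batch size of 64 is the per-device batch size scaled by the device count and the gradient accumulation steps; a quick check of that arithmetic:

```python
per_device_train_batch_size = 2
num_devices = 8
gradient_accumulation_steps = 4

total_train_batch_size = (per_device_train_batch_size
                          * num_devices
                          * gradient_accumulation_steps)
print(total_train_batch_size)  # 2 * 8 * 4 = 64

# Eval uses batch size 1 per device with no gradient accumulation:
total_eval_batch_size = 1 * num_devices
print(total_eval_batch_size)  # 8
```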
### Training results
| Training Loss | Epoch | Step | Validation Loss | Token Acc |
|---|---|---|---|---|
| 0.9096 | 0.1140 | 50 | 0.9297 | 0.7298 |
| 0.9652 | 0.2279 | 100 | 0.9691 | 0.7263 |
| 0.9185 | 0.3419 | 150 | 0.9586 | 0.7291 |
| 0.9112 | 0.4558 | 200 | 0.9426 | 0.7318 |
| 0.8981 | 0.5698 | 250 | 0.9181 | 0.7380 |
| 0.8853 | 0.6838 | 300 | 0.9057 | 0.7405 |
| 0.8656 | 0.7977 | 350 | 0.8893 | 0.7448 |
| 0.8565 | 0.9117 | 400 | 0.8699 | 0.7472 |
| 0.7611 | 1.0251 | 450 | 0.8599 | 0.7518 |
| 0.7483 | 1.1390 | 500 | 0.8348 | 0.7568 |
| 0.7281 | 1.2530 | 550 | 0.8193 | 0.7599 |
| 0.7189 | 1.3670 | 600 | 0.8009 | 0.7641 |
| 0.7267 | 1.4809 | 650 | 0.7874 | 0.7677 |
| 0.6823 | 1.5949 | 700 | 0.7717 | 0.7697 |
| 0.6717 | 1.7088 | 750 | 0.7521 | 0.7747 |
| 0.6650 | 1.8228 | 800 | 0.7360 | 0.7782 |
| 0.6431 | 1.9368 | 850 | 0.7218 | 0.7816 |
| 0.5259 | 2.0501 | 900 | 0.7128 | 0.7842 |
| 0.5419 | 2.1641 | 950 | 0.6984 | 0.7872 |
| 0.5329 | 2.2781 | 1000 | 0.6899 | 0.7905 |
| 0.5434 | 2.3920 | 1050 | 0.6797 | 0.7926 |
| 0.5034 | 2.5060 | 1100 | 0.6729 | 0.7942 |
| 0.5021 | 2.6199 | 1150 | 0.6675 | 0.7955 |
| 0.5372 | 2.7339 | 1200 | 0.6622 | 0.7969 |
| 0.5031 | 2.8479 | 1250 | 0.6609 | 0.7972 |
| 0.5067 | 2.9618 | 1300 | 0.6604 | 0.7971 |
| 0.4980 | 3.0 | 1317 | 0.6604 | 0.7973 |
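With a warmup ratio of 0.05 over the 1317 optimizer steps shown above, the schedule warms up for roughly 66 steps and then decays along a cosine to zero. A sketch of the usual linear-warmup-plus-cosine-decay multiplier (mirroring, but not copied from, the Transformers scheduler; the exact rounding of the warmup step count is an assumption):

```python
import math

def lr_multiplier(step, total_steps=1317, warmup_ratio=0.05):
    """Linear warmup to 1.0, then cosine decay to 0.0."""
    # Assumed rounding: ceil(1317 * 0.05) = 66 warmup steps
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

peak_lr = 1e-4
lr_at_peak = peak_lr * lr_multiplier(66)   # full learning rate after warmup
lr_at_end = peak_lr * lr_multiplier(1317)  # decayed to zero at the last step
```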
### Framework versions
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.22.1