qwen2_5_omni_all_1015_reverse

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6604
  • Token Acc: 0.7973
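
For intuition, the evaluation loss can be converted to a per-token perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy measured in nats, which this card does not state explicitly; the variable names are illustrative only.

```python
import math

# Values copied from the evaluation results above.
eval_loss = 0.6604
token_acc = 0.7973

# Assumption: eval_loss is a mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

# Fraction of evaluation tokens that were not predicted correctly.
token_error_rate = 1 - token_acc

print(f"perplexity ~ {perplexity:.3f}")
print(f"token error rate = {token_error_rate:.4f}")
```

If the loss were averaged differently (for example per sequence), this conversion would not apply.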

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 8
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 3.0
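
The batch-size totals and the learning-rate schedule listed above can be reproduced with a little arithmetic. The following is a minimal sketch, not the training code itself: it assumes the total number of optimizer steps is 1317 (the final step in this card's training results), that warmup steps are the warmup ratio times the total steps rounded up, and that the cosine schedule is a linear warmup followed by a half-cosine decay to zero. The function name `lr_at` is hypothetical.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 1e-4
train_batch_size = 2              # per device
eval_batch_size = 1               # per device
num_devices = 8
gradient_accumulation_steps = 4
warmup_ratio = 0.05

# Assumption: total optimizer steps, taken from the last row of the
# training results reported in this card.
total_steps = 1317

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: warmup steps are ratio * total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then half-cosine decay to zero (hypothetical helper)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
print("warmup_steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the peak learning rate of 0.0001 is reached at the end of warmup and decays to zero at the final step.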

Training results

Training Loss  Epoch   Step  Validation Loss  Token Acc
0.9096         0.1140  50    0.9297           0.7298
0.9652         0.2279  100   0.9691           0.7263
0.9185         0.3419  150   0.9586           0.7291
0.9112         0.4558  200   0.9426           0.7318
0.8981         0.5698  250   0.9181           0.7380
0.8853         0.6838  300   0.9057           0.7405
0.8656         0.7977  350   0.8893           0.7448
0.8565         0.9117  400   0.8699           0.7472
0.7611         1.0251  450   0.8599           0.7518
0.7483         1.1390  500   0.8348           0.7568
0.7281         1.2530  550   0.8193           0.7599
0.7189         1.3670  600   0.8009           0.7641
0.7267         1.4809  650   0.7874           0.7677
0.6823         1.5949  700   0.7717           0.7697
0.6717         1.7088  750   0.7521           0.7747
0.6650         1.8228  800   0.7360           0.7782
0.6431         1.9368  850   0.7218           0.7816
0.5259         2.0501  900   0.7128           0.7842
0.5419         2.1641  950   0.6984           0.7872
0.5329         2.2781  1000  0.6899           0.7905
0.5434         2.3920  1050  0.6797           0.7926
0.5034         2.5060  1100  0.6729           0.7942
0.5021         2.6199  1150  0.6675           0.7955
0.5372         2.7339  1200  0.6622           0.7969
0.5031         2.8479  1250  0.6609           0.7972
0.5067         2.9618  1300  0.6604           0.7971
0.4980         3.0     1317  0.6604           0.7973
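
One way to read the table is to track the gap between validation and training loss over time. The sketch below uses four rows copied from the table (the first logged step, the last logged step before each epoch boundary, and the final step). It is only a rough comparison: the card does not say how the logged training loss is averaged, so the two columns may not be measured on the same footing.

```python
# (step, training_loss, validation_loss) copied from selected rows of the table above.
rows = [
    (50, 0.9096, 0.9297),
    (400, 0.8565, 0.8699),
    (850, 0.6431, 0.7218),
    (1317, 0.4980, 0.6604),
]

# Gap between validation and training loss at each selected step.
gaps = {step: round(val - train, 4) for step, train, val in rows}

for step, gap in gaps.items():
    print(f"step {step:>4}: validation - training loss = {gap:.4f}")
```

In these rows the gap grows from about 0.02 early in training to about 0.16 at the final step, while validation loss keeps decreasing throughout.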

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
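
To check a local environment against the versions listed above, a small standard-library script is enough. This is a sketch under one assumption: that the packages are installed under the distribution names `transformers`, `torch`, `datasets`, and `tokenizers`. The helper name `compare_versions` is hypothetical.

```python
from importlib import metadata

# Versions listed in this card. Assumption: "Pytorch" is installed
# as the "torch" distribution.
expected = {
    "transformers": "4.57.1",
    "torch": "2.8.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected_versions):
    """Return {package: (expected, installed or None)} for each package."""
    report = {}
    for name, wanted in expected_versions.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(expected).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, found {installed} ({status})")
```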

Model size

  • Parameters: 9B (Safetensors)
  • Tensor type: BF16