qwen2_5_omni_all_1015_reverse

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.6604
Token Acc: 0.7973

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 2
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 64
total_eval_batch_size: 8
optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused PyTorch AdamW) with betas=(0.9, 0.95), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 3.0
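The batch-size entries in the list above are related by a simple product (per-device size × number of devices × gradient accumulation steps), and the optimizer entry describes a standard AdamW update. The following is a minimal pure-Python sketch of both, intended only as an illustration. The function names are hypothetical. The sketch is a scalar re-derivation of the AdamW math, not the fused PyTorch kernel. The card does not report weight decay, so the sketch takes it as a parameter with an assumed default of 0.0.

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def adamw_step(param, grad, m, v, step, lr=1e-4, betas=(0.9, 0.95),
               eps=1e-8, weight_decay=0.0):
    """One AdamW update for a single scalar parameter (illustrative only).

    weight_decay is NOT reported in the card; 0.0 here is an assumption.
    Returns the updated (param, m, v); step counts from 1.
    """
    beta1, beta2 = betas
    # Decoupled weight decay: shrink the parameter directly, independent of the gradient.
    param = param - lr * weight_decay * param
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised moment estimates.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Values taken from the hyperparameter list.
train_total = total_batch_size(per_device=2, num_devices=8, grad_accum_steps=4)
eval_total = total_batch_size(per_device=1, num_devices=8)
print(train_total, eval_total)  # 64 8

# With bias correction, the first Adam step moves the parameter by about lr.
p, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(round(p, 6))  # 0.9999
```

The product reproduces total_train_batch_size: 64 and total_eval_batch_size: 8. Evaluation has no gradient accumulation, so its factor is 1.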

Training results

Training Loss	Epoch	Step	Validation Loss	Validation Token Acc
0.9096	0.1140	50	0.9297	0.7298
0.9652	0.2279	100	0.9691	0.7263
0.9185	0.3419	150	0.9586	0.7291
0.9112	0.4558	200	0.9426	0.7318
0.8981	0.5698	250	0.9181	0.7380
0.8853	0.6838	300	0.9057	0.7405
0.8656	0.7977	350	0.8893	0.7448
0.8565	0.9117	400	0.8699	0.7472
0.7611	1.0251	450	0.8599	0.7518
0.7483	1.1390	500	0.8348	0.7568
0.7281	1.2530	550	0.8193	0.7599
0.7189	1.3670	600	0.8009	0.7641
0.7267	1.4809	650	0.7874	0.7677
0.6823	1.5949	700	0.7717	0.7697
0.6717	1.7088	750	0.7521	0.7747
0.6650	1.8228	800	0.7360	0.7782
0.6431	1.9368	850	0.7218	0.7816
0.5259	2.0501	900	0.7128	0.7842
0.5419	2.1641	950	0.6984	0.7872
0.5329	2.2781	1000	0.6899	0.7905
0.5434	2.3920	1050	0.6797	0.7926
0.5034	2.5060	1100	0.6729	0.7942
0.5021	2.6199	1150	0.6675	0.7955
0.5372	2.7339	1200	0.6622	0.7969
0.5031	2.8479	1250	0.6609	0.7972
0.5067	2.9618	1300	0.6604	0.7971
0.4980	3.0	1317	0.6604	0.7973
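The Step column ends at 1317 optimizer steps over 3 epochs, which is roughly 439 steps per epoch. This sketch assumes that the run used the Hugging Face Trainer's default warmup rule (warmup steps = ceil(warmup_ratio × total steps)). It also assumes a linear-warmup cosine schedule with a single half cycle. Under those assumptions, the learning rate at any logged step can be reconstructed as shown below. The formula is re-implemented in plain Python instead of calling transformers, and the function names are hypothetical.

```python
import math

PEAK_LR = 1e-4        # learning_rate from the card
WARMUP_RATIO = 0.05   # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 1317    # last step in the training-results table


def warmup_steps(total_steps: int, ratio: float) -> int:
    # Assumed rule: round the warmup fraction up to a whole number of steps.
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int, warmup: int, peak_lr: float) -> float:
    """Linear warmup to peak_lr, then a half-cosine decay to 0 at total_steps."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


W = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
print(W)  # 66
for s in (0, 50, 450, 900, 1317):
    print(s, f"{cosine_lr(s, TOTAL_STEPS, W, PEAK_LR):.3e}")
```

Under these assumptions, warmup ends at step 66, so the first logged row (step 50) still falls inside the warmup phase. The learning rate then decays smoothly to zero at the final step.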

Weights format: Safetensors
Model size: 9B params
Tensor type: BF16
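As a rough sizing check, BF16 stores each parameter in 2 bytes, so the weights of a model with about 9B parameters occupy on the order of 18 GB. The parameter count on the model page is rounded, so the following sketch gives only an estimate. The estimate excludes activations, caches, and optimizer state. The byte-width table and the function name are illustrative and are not taken from any library.

```python
# Bytes per stored parameter for a few common dtypes (illustrative table).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_bytes(num_params: float, dtype: str) -> float:
    """Approximate storage for the weights alone (no activations or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype]


params = 9e9  # "9B params" as rounded on the model page
b = weight_bytes(params, "bf16")
print(f"{b / 1e9:.1f} GB, {b / 2**30:.1f} GiB")  # 18.0 GB, 16.8 GiB
```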