| 2025-08-29 02:09:10 - pico-train - INFO - Step 500 -- Evaluation Results | |
| 2025-08-29 02:09:10 - pico-train - INFO - └── paloma: inf | |
| 2025-08-29 02:09:11 - pico-train - INFO - ================================================== | |
| 2025-08-29 02:09:11 - pico-train - INFO - ✨ Training Configuration | |
| 2025-08-29 02:09:11 - pico-train - INFO - ================================================== | |
| 2025-08-29 02:09:11 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ checkpointing: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ checkpoints_dir: checkpoints │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ evaluation: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ eval_results_dir: eval_results │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ fabric_checkpoint_dir: fabric_state │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ fabric_checkpoint_filename: checkpoint.pt │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ hf_checkpoint: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ collection_slug: null │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ repo_id: ThomasTheMaker/pico-decoder-tiny │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ learning_dynamics: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ batch_size: 1 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ eval_data: null │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ layer_suffixes: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ - attention.v_proj │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ - attention.o_proj │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ - swiglu.w_2 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ sequence_idx: -1 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ learning_dynamics_dir: learning_dynamics │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ logs_dir: logs │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ run_name: pico-decoder-tiny-dolma29k-v3 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ runs_dir: runs │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ save_every_n_steps: 500 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ save_to_hf: true │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ training: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ auto_resume: true │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ data: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ dataloader: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ batch_size: 4 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ dataset: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ name: pico-lm/pretokenized-dolma │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ tokenizer: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ name: allenai/OLMo-7B-0724-hf │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ vocab_size: 50304 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ evaluation: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ metrics: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ - paloma │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ paloma: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ batch_size: 1 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ dataset_name: pico-lm/pretokenized-paloma-tinsy │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ dataset_split: val │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ max_length: 2048 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ model: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ activation_hidden_dim: 384 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ attention_n_heads: 12 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ attention_n_kv_heads: 4 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ batch_size: 1024 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ d_model: 96 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ max_seq_len: 2048 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ model_type: pico_decoder │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ n_layers: 12 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ norm_eps: 1.0e-06 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ position_emb_theta: 10000.0 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ vocab_size: 50304 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ monitoring: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ logging: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ log_every_n_steps: 25 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ log_level: INFO │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ save_to_wandb: false │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ wandb: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ entity: boymyc │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ project: pico-decoder-tiny │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ training: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ fabric: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ accelerator: cuda │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ num_devices: 1 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ num_nodes: 1 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ precision: bf16-mixed │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ max_steps: 20000 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ optimization: │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ gradient_accumulation_steps: 4 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ lr: 5.0e-05 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ lr_scheduler: linear_with_warmup │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ lr_warmup_steps: 8000 │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ optimizer: adamw │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - │ │ | |
| 2025-08-29 02:09:11 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯ | |
| 2025-08-29 02:09:11 - pico-train - INFO - ================================================== | |
| 2025-08-29 02:09:11 - pico-train - INFO - Runtime Summary: | |
| 2025-08-29 02:09:11 - pico-train - INFO - ================================================== | |
| 2025-08-29 02:09:11 - pico-train - INFO - Starting from step: 500 | |
| 2025-08-29 02:09:11 - pico-train - INFO - Model Setup: | |
| 2025-08-29 02:09:11 - pico-train - INFO - ├─ Total Parameters: 11,282,784 | |
| 2025-08-29 02:09:11 - pico-train - INFO - └─ Trainable Parameters: 11,282,784 | |
| 2025-08-29 02:09:11 - pico-train - INFO - Distributed Setup: | |
| 2025-08-29 02:09:11 - pico-train - INFO - ├─ Number of Devices: 1 | |
| 2025-08-29 02:09:11 - pico-train - INFO - ├─ Device Type: NVIDIA GeForce RTX 5090 | |
| 2025-08-29 02:09:11 - pico-train - INFO - └─ Available Memory: 33.68 GB | |
| 2025-08-29 02:09:11 - pico-train - INFO - Software Setup: | |
| 2025-08-29 02:09:11 - pico-train - INFO - ├─ Python Version: 3.10.12 | |
| 2025-08-29 02:09:11 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128 | |
| 2025-08-29 02:09:11 - pico-train - INFO - ├─ CUDA Version: 12.8 | |
| 2025-08-29 02:09:11 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic | |
| 2025-08-29 02:09:11 - pico-train - INFO - Batch Size Configuration: | |
| 2025-08-29 02:09:11 - pico-train - INFO - ├─ Global Batch Size: 4 | |
| 2025-08-29 02:09:11 - pico-train - INFO - ├─ Per Device Batch Size: 1 | |
| 2025-08-29 02:09:11 - pico-train - INFO - └─ Gradient Accumulation Steps: 4 | |
| 2025-08-29 02:09:11 - pico-train - INFO - ================================================== | |
| 2025-08-29 02:09:12 - pico-train - INFO - Step 500 -- Training Metrics | |
| 2025-08-29 02:09:12 - pico-train - INFO - ├── Loss: 10.8854 | |
| 2025-08-29 02:09:12 - pico-train - INFO - ├── Learning Rate: 3.13e-06 | |
| 2025-08-29 02:09:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:09:12 - pico-train - INFO - Step 500 -- Saving Learning Dynamics | |
| 2025-08-29 02:09:27 - pico-train - INFO - Step 525 -- Training Metrics | |
| 2025-08-29 02:09:27 - pico-train - INFO - ├── Loss: 10.8890 | |
| 2025-08-29 02:09:27 - pico-train - INFO - ├── Learning Rate: 3.28e-06 | |
| 2025-08-29 02:09:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:09:40 - pico-train - INFO - Step 550 -- Training Metrics | |
| 2025-08-29 02:09:40 - pico-train - INFO - ├── Loss: 10.8846 | |
| 2025-08-29 02:09:40 - pico-train - INFO - ├── Learning Rate: 3.44e-06 | |
| 2025-08-29 02:09:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:09:53 - pico-train - INFO - Step 575 -- Training Metrics | |
| 2025-08-29 02:09:53 - pico-train - INFO - ├── Loss: 10.8657 | |
| 2025-08-29 02:09:53 - pico-train - INFO - ├── Learning Rate: 3.59e-06 | |
| 2025-08-29 02:09:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:10:05 - pico-train - INFO - Step 600 -- Training Metrics | |
| 2025-08-29 02:10:05 - pico-train - INFO - ├── Loss: 10.8590 | |
| 2025-08-29 02:10:05 - pico-train - INFO - ├── Learning Rate: 3.75e-06 | |
| 2025-08-29 02:10:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:10:18 - pico-train - INFO - Step 625 -- Training Metrics | |
| 2025-08-29 02:10:18 - pico-train - INFO - ├── Loss: 10.8328 | |
| 2025-08-29 02:10:18 - pico-train - INFO - ├── Learning Rate: 3.91e-06 | |
| 2025-08-29 02:10:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:10:31 - pico-train - INFO - Step 650 -- Training Metrics | |
| 2025-08-29 02:10:31 - pico-train - INFO - ├── Loss: 10.8166 | |
| 2025-08-29 02:10:31 - pico-train - INFO - ├── Learning Rate: 4.06e-06 | |
| 2025-08-29 02:10:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:10:43 - pico-train - INFO - Step 675 -- Training Metrics | |
| 2025-08-29 02:10:43 - pico-train - INFO - ├── Loss: 10.7913 | |
| 2025-08-29 02:10:43 - pico-train - INFO - ├── Learning Rate: 4.22e-06 | |
| 2025-08-29 02:10:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:10:56 - pico-train - INFO - Step 700 -- Training Metrics | |
| 2025-08-29 02:10:56 - pico-train - INFO - ├── Loss: 10.7609 | |
| 2025-08-29 02:10:56 - pico-train - INFO - ├── Learning Rate: 4.37e-06 | |
| 2025-08-29 02:10:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:11:09 - pico-train - INFO - Step 725 -- Training Metrics | |
| 2025-08-29 02:11:09 - pico-train - INFO - ├── Loss: 10.7322 | |
| 2025-08-29 02:11:09 - pico-train - INFO - ├── Learning Rate: 4.53e-06 | |
| 2025-08-29 02:11:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:11:22 - pico-train - INFO - Step 750 -- Training Metrics | |
| 2025-08-29 02:11:22 - pico-train - INFO - ├── Loss: 10.7121 | |
| 2025-08-29 02:11:22 - pico-train - INFO - ├── Learning Rate: 4.69e-06 | |
| 2025-08-29 02:11:22 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:11:34 - pico-train - INFO - Step 775 -- Training Metrics | |
| 2025-08-29 02:11:34 - pico-train - INFO - ├── Loss: 10.6877 | |
| 2025-08-29 02:11:34 - pico-train - INFO - ├── Learning Rate: 4.84e-06 | |
| 2025-08-29 02:11:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:11:47 - pico-train - INFO - Step 800 -- Training Metrics | |
| 2025-08-29 02:11:47 - pico-train - INFO - ├── Loss: 10.6436 | |
| 2025-08-29 02:11:47 - pico-train - INFO - ├── Learning Rate: 5.00e-06 | |
| 2025-08-29 02:11:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:12:00 - pico-train - INFO - Step 825 -- Training Metrics | |
| 2025-08-29 02:12:00 - pico-train - INFO - ├── Loss: 10.6256 | |
| 2025-08-29 02:12:00 - pico-train - INFO - ├── Learning Rate: 5.16e-06 | |
| 2025-08-29 02:12:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:12:13 - pico-train - INFO - Step 850 -- Training Metrics | |
| 2025-08-29 02:12:13 - pico-train - INFO - ├── Loss: 10.5961 | |
| 2025-08-29 02:12:13 - pico-train - INFO - ├── Learning Rate: 5.31e-06 | |
| 2025-08-29 02:12:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:12:25 - pico-train - INFO - Step 875 -- Training Metrics | |
| 2025-08-29 02:12:25 - pico-train - INFO - ├── Loss: 10.5443 | |
| 2025-08-29 02:12:25 - pico-train - INFO - ├── Learning Rate: 5.47e-06 | |
| 2025-08-29 02:12:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:12:38 - pico-train - INFO - Step 900 -- Training Metrics | |
| 2025-08-29 02:12:38 - pico-train - INFO - ├── Loss: 10.5197 | |
| 2025-08-29 02:12:38 - pico-train - INFO - ├── Learning Rate: 5.63e-06 | |
| 2025-08-29 02:12:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:12:51 - pico-train - INFO - Step 925 -- Training Metrics | |
| 2025-08-29 02:12:51 - pico-train - INFO - ├── Loss: 10.4854 | |
| 2025-08-29 02:12:51 - pico-train - INFO - ├── Learning Rate: 5.78e-06 | |
| 2025-08-29 02:12:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:13:04 - pico-train - INFO - Step 950 -- Training Metrics | |
| 2025-08-29 02:13:04 - pico-train - INFO - ├── Loss: 10.4826 | |
| 2025-08-29 02:13:04 - pico-train - INFO - ├── Learning Rate: 5.94e-06 | |
| 2025-08-29 02:13:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:13:17 - pico-train - INFO - Step 975 -- Training Metrics | |
| 2025-08-29 02:13:17 - pico-train - INFO - ├── Loss: 10.4557 | |
| 2025-08-29 02:13:17 - pico-train - INFO - ├── Learning Rate: 6.09e-06 | |
| 2025-08-29 02:13:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:13:29 - pico-train - INFO - Step 1000 -- 💾 Saving Checkpoint | |
| 2025-08-29 02:15:26 - pico-train - INFO - Step 1000 -- Evaluation Results | |
| 2025-08-29 02:15:26 - pico-train - INFO - └── paloma: 7.125172406420199e+27 | |
| 2025-08-29 02:15:28 - pico-train - INFO - Step 1000 -- Training Metrics | |
| 2025-08-29 02:15:28 - pico-train - INFO - ├── Loss: 10.4142 | |
| 2025-08-29 02:15:28 - pico-train - INFO - ├── Learning Rate: 6.25e-06 | |
| 2025-08-29 02:15:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:15:28 - pico-train - INFO - Step 1000 -- Saving Learning Dynamics | |
| 2025-08-29 02:15:43 - pico-train - INFO - Step 1025 -- Training Metrics | |
| 2025-08-29 02:15:43 - pico-train - INFO - ├── Loss: 10.3885 | |
| 2025-08-29 02:15:43 - pico-train - INFO - ├── Learning Rate: 6.41e-06 | |
| 2025-08-29 02:15:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:15:55 - pico-train - INFO - Step 1050 -- Training Metrics | |
| 2025-08-29 02:15:55 - pico-train - INFO - ├── Loss: 10.3737 | |
| 2025-08-29 02:15:55 - pico-train - INFO - ├── Learning Rate: 6.56e-06 | |
| 2025-08-29 02:15:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:16:08 - pico-train - INFO - Step 1075 -- Training Metrics | |
| 2025-08-29 02:16:08 - pico-train - INFO - ├── Loss: 10.3534 | |
| 2025-08-29 02:16:08 - pico-train - INFO - ├── Learning Rate: 6.72e-06 | |
| 2025-08-29 02:16:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:16:21 - pico-train - INFO - Step 1100 -- Training Metrics | |
| 2025-08-29 02:16:21 - pico-train - INFO - ├── Loss: 10.3219 | |
| 2025-08-29 02:16:21 - pico-train - INFO - ├── Learning Rate: 6.88e-06 | |
| 2025-08-29 02:16:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:16:34 - pico-train - INFO - Step 1125 -- Training Metrics | |
| 2025-08-29 02:16:34 - pico-train - INFO - ├── Loss: 10.3064 | |
| 2025-08-29 02:16:34 - pico-train - INFO - ├── Learning Rate: 7.03e-06 | |
| 2025-08-29 02:16:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:16:46 - pico-train - INFO - Step 1150 -- Training Metrics | |
| 2025-08-29 02:16:46 - pico-train - INFO - ├── Loss: 10.2761 | |
| 2025-08-29 02:16:46 - pico-train - INFO - ├── Learning Rate: 7.19e-06 | |
| 2025-08-29 02:16:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:16:59 - pico-train - INFO - Step 1175 -- Training Metrics | |
| 2025-08-29 02:16:59 - pico-train - INFO - ├── Loss: 10.2592 | |
| 2025-08-29 02:16:59 - pico-train - INFO - ├── Learning Rate: 7.34e-06 | |
| 2025-08-29 02:16:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:17:12 - pico-train - INFO - Step 1200 -- Training Metrics | |
| 2025-08-29 02:17:12 - pico-train - INFO - ├── Loss: 10.2420 | |
| 2025-08-29 02:17:12 - pico-train - INFO - ├── Learning Rate: 7.50e-06 | |
| 2025-08-29 02:17:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:17:24 - pico-train - INFO - Step 1225 -- Training Metrics | |
| 2025-08-29 02:17:24 - pico-train - INFO - ├── Loss: 10.2141 | |
| 2025-08-29 02:17:24 - pico-train - INFO - ├── Learning Rate: 7.66e-06 | |
| 2025-08-29 02:17:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:17:37 - pico-train - INFO - Step 1250 -- Training Metrics | |
| 2025-08-29 02:17:37 - pico-train - INFO - ├── Loss: 10.1882 | |
| 2025-08-29 02:17:37 - pico-train - INFO - ├── Learning Rate: 7.81e-06 | |
| 2025-08-29 02:17:37 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:17:50 - pico-train - INFO - Step 1275 -- Training Metrics | |
| 2025-08-29 02:17:50 - pico-train - INFO - ├── Loss: 10.1608 | |
| 2025-08-29 02:17:50 - pico-train - INFO - ├── Learning Rate: 7.97e-06 | |
| 2025-08-29 02:17:50 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:18:02 - pico-train - INFO - Step 1300 -- Training Metrics | |
| 2025-08-29 02:18:02 - pico-train - INFO - ├── Loss: 10.1460 | |
| 2025-08-29 02:18:02 - pico-train - INFO - ├── Learning Rate: 8.13e-06 | |
| 2025-08-29 02:18:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:18:15 - pico-train - INFO - Step 1325 -- Training Metrics | |
| 2025-08-29 02:18:15 - pico-train - INFO - ├── Loss: 10.0944 | |
| 2025-08-29 02:18:15 - pico-train - INFO - ├── Learning Rate: 8.28e-06 | |
| 2025-08-29 02:18:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:18:28 - pico-train - INFO - Step 1350 -- Training Metrics | |
| 2025-08-29 02:18:28 - pico-train - INFO - ├── Loss: 10.0885 | |
| 2025-08-29 02:18:28 - pico-train - INFO - ├── Learning Rate: 8.44e-06 | |
| 2025-08-29 02:18:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:18:41 - pico-train - INFO - Step 1375 -- Training Metrics | |
| 2025-08-29 02:18:41 - pico-train - INFO - ├── Loss: 10.0748 | |
| 2025-08-29 02:18:41 - pico-train - INFO - ├── Learning Rate: 8.59e-06 | |
| 2025-08-29 02:18:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:18:53 - pico-train - INFO - Step 1400 -- Training Metrics | |
| 2025-08-29 02:18:53 - pico-train - INFO - ├── Loss: 10.0425 | |
| 2025-08-29 02:18:53 - pico-train - INFO - ├── Learning Rate: 8.75e-06 | |
| 2025-08-29 02:18:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:19:06 - pico-train - INFO - Step 1425 -- Training Metrics | |
| 2025-08-29 02:19:06 - pico-train - INFO - ├── Loss: 10.0422 | |
| 2025-08-29 02:19:06 - pico-train - INFO - ├── Learning Rate: 8.91e-06 | |
| 2025-08-29 02:19:06 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:19:19 - pico-train - INFO - Step 1450 -- Training Metrics | |
| 2025-08-29 02:19:19 - pico-train - INFO - ├── Loss: 10.0039 | |
| 2025-08-29 02:19:19 - pico-train - INFO - ├── Learning Rate: 9.06e-06 | |
| 2025-08-29 02:19:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:19:32 - pico-train - INFO - Step 1475 -- Training Metrics | |
| 2025-08-29 02:19:32 - pico-train - INFO - ├── Loss: 9.9736 | |
| 2025-08-29 02:19:32 - pico-train - INFO - ├── Learning Rate: 9.22e-06 | |
| 2025-08-29 02:19:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:19:44 - pico-train - INFO - Step 1500 -- 💾 Saving Checkpoint | |
| 2025-08-29 02:21:41 - pico-train - INFO - Step 1500 -- Evaluation Results | |
| 2025-08-29 02:21:41 - pico-train - INFO - └── paloma: 6.5469212698356e+18 | |
| 2025-08-29 02:21:44 - pico-train - INFO - Step 1500 -- Training Metrics | |
| 2025-08-29 02:21:44 - pico-train - INFO - ├── Loss: 9.9729 | |
| 2025-08-29 02:21:44 - pico-train - INFO - ├── Learning Rate: 9.38e-06 | |
| 2025-08-29 02:21:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:21:44 - pico-train - INFO - Step 1500 -- Saving Learning Dynamics | |
| 2025-08-29 02:22:00 - pico-train - INFO - Step 1525 -- Training Metrics | |
| 2025-08-29 02:22:00 - pico-train - INFO - ├── Loss: 9.9379 | |
| 2025-08-29 02:22:00 - pico-train - INFO - ├── Learning Rate: 9.53e-06 | |
| 2025-08-29 02:22:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:22:13 - pico-train - INFO - Step 1550 -- Training Metrics | |
| 2025-08-29 02:22:13 - pico-train - INFO - ├── Loss: 9.8819 | |
| 2025-08-29 02:22:13 - pico-train - INFO - ├── Learning Rate: 9.69e-06 | |
| 2025-08-29 02:22:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:22:26 - pico-train - INFO - Step 1575 -- Training Metrics | |
| 2025-08-29 02:22:26 - pico-train - INFO - ├── Loss: 9.8702 | |
| 2025-08-29 02:22:26 - pico-train - INFO - ├── Learning Rate: 9.84e-06 | |
| 2025-08-29 02:22:26 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:22:38 - pico-train - INFO - Step 1600 -- Training Metrics | |
| 2025-08-29 02:22:38 - pico-train - INFO - ├── Loss: 9.8571 | |
| 2025-08-29 02:22:38 - pico-train - INFO - ├── Learning Rate: 1.00e-05 | |
| 2025-08-29 02:22:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:22:51 - pico-train - INFO - Step 1625 -- Training Metrics | |
| 2025-08-29 02:22:51 - pico-train - INFO - ├── Loss: 9.8356 | |
| 2025-08-29 02:22:51 - pico-train - INFO - ├── Learning Rate: 1.02e-05 | |
| 2025-08-29 02:22:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:23:04 - pico-train - INFO - Step 1650 -- Training Metrics | |
| 2025-08-29 02:23:04 - pico-train - INFO - ├── Loss: 9.7973 | |
| 2025-08-29 02:23:04 - pico-train - INFO - ├── Learning Rate: 1.03e-05 | |
| 2025-08-29 02:23:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:23:16 - pico-train - INFO - Step 1675 -- Training Metrics | |
| 2025-08-29 02:23:16 - pico-train - INFO - ├── Loss: 9.7745 | |
| 2025-08-29 02:23:16 - pico-train - INFO - ├── Learning Rate: 1.05e-05 | |
| 2025-08-29 02:23:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:23:29 - pico-train - INFO - Step 1700 -- Training Metrics | |
| 2025-08-29 02:23:29 - pico-train - INFO - ├── Loss: 9.7673 | |
| 2025-08-29 02:23:29 - pico-train - INFO - ├── Learning Rate: 1.06e-05 | |
| 2025-08-29 02:23:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:23:42 - pico-train - INFO - Step 1725 -- Training Metrics | |
| 2025-08-29 02:23:42 - pico-train - INFO - ├── Loss: 9.7406 | |
| 2025-08-29 02:23:42 - pico-train - INFO - ├── Learning Rate: 1.08e-05 | |
| 2025-08-29 02:23:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:23:55 - pico-train - INFO - Step 1750 -- Training Metrics | |
| 2025-08-29 02:23:55 - pico-train - INFO - ├── Loss: 9.7312 | |
| 2025-08-29 02:23:55 - pico-train - INFO - ├── Learning Rate: 1.09e-05 | |
| 2025-08-29 02:23:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:24:07 - pico-train - INFO - Step 1775 -- Training Metrics | |
| 2025-08-29 02:24:07 - pico-train - INFO - ├── Loss: 9.6563 | |
| 2025-08-29 02:24:07 - pico-train - INFO - ├── Learning Rate: 1.11e-05 | |
| 2025-08-29 02:24:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:24:20 - pico-train - INFO - Step 1800 -- Training Metrics | |
| 2025-08-29 02:24:20 - pico-train - INFO - ├── Loss: 9.6515 | |
| 2025-08-29 02:24:20 - pico-train - INFO - ├── Learning Rate: 1.13e-05 | |
| 2025-08-29 02:24:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:24:33 - pico-train - INFO - Step 1825 -- Training Metrics | |
| 2025-08-29 02:24:33 - pico-train - INFO - ├── Loss: 9.6241 | |
| 2025-08-29 02:24:33 - pico-train - INFO - ├── Learning Rate: 1.14e-05 | |
| 2025-08-29 02:24:33 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:24:45 - pico-train - INFO - Step 1850 -- Training Metrics | |
| 2025-08-29 02:24:45 - pico-train - INFO - ├── Loss: 9.6015 | |
| 2025-08-29 02:24:45 - pico-train - INFO - ├── Learning Rate: 1.16e-05 | |
| 2025-08-29 02:24:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:24:58 - pico-train - INFO - Step 1875 -- Training Metrics | |
| 2025-08-29 02:24:58 - pico-train - INFO - ├── Loss: 9.5933 | |
| 2025-08-29 02:24:58 - pico-train - INFO - ├── Learning Rate: 1.17e-05 | |
| 2025-08-29 02:24:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:25:11 - pico-train - INFO - Step 1900 -- Training Metrics | |
| 2025-08-29 02:25:11 - pico-train - INFO - ├── Loss: 9.5544 | |
| 2025-08-29 02:25:11 - pico-train - INFO - ├── Learning Rate: 1.19e-05 | |
| 2025-08-29 02:25:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:25:23 - pico-train - INFO - Step 1925 -- Training Metrics | |
| 2025-08-29 02:25:23 - pico-train - INFO - ├── Loss: 9.5407 | |
| 2025-08-29 02:25:23 - pico-train - INFO - ├── Learning Rate: 1.20e-05 | |
| 2025-08-29 02:25:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:25:36 - pico-train - INFO - Step 1950 -- Training Metrics | |
| 2025-08-29 02:25:36 - pico-train - INFO - ├── Loss: 9.5431 | |
| 2025-08-29 02:25:36 - pico-train - INFO - ├── Learning Rate: 1.22e-05 | |
| 2025-08-29 02:25:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:25:49 - pico-train - INFO - Step 1975 -- Training Metrics | |
| 2025-08-29 02:25:49 - pico-train - INFO - ├── Loss: 9.4853 | |
| 2025-08-29 02:25:49 - pico-train - INFO - ├── Learning Rate: 1.23e-05 | |
| 2025-08-29 02:25:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:26:01 - pico-train - INFO - Step 2000 -- 💾 Saving Checkpoint | |
| 2025-08-29 02:28:10 - pico-train - INFO - Step 2000 -- Evaluation Results | |
| 2025-08-29 02:28:10 - pico-train - INFO - └── paloma: 5.118641309912889e+18 | |
| 2025-08-29 02:28:12 - pico-train - INFO - Step 2000 -- Training Metrics | |
| 2025-08-29 02:28:12 - pico-train - INFO - ├── Loss: 9.4665 | |
| 2025-08-29 02:28:12 - pico-train - INFO - ├── Learning Rate: 1.25e-05 | |
| 2025-08-29 02:28:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:28:12 - pico-train - INFO - Step 2000 -- Saving Learning Dynamics | |
| 2025-08-29 02:28:29 - pico-train - INFO - Step 2025 -- Training Metrics | |
| 2025-08-29 02:28:29 - pico-train - INFO - ├── Loss: 9.4621 | |
| 2025-08-29 02:28:29 - pico-train - INFO - ├── Learning Rate: 1.27e-05 | |
| 2025-08-29 02:28:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:28:41 - pico-train - INFO - Step 2050 -- Training Metrics | |
| 2025-08-29 02:28:41 - pico-train - INFO - ├── Loss: 9.4031 | |
| 2025-08-29 02:28:41 - pico-train - INFO - ├── Learning Rate: 1.28e-05 | |
| 2025-08-29 02:28:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:28:54 - pico-train - INFO - Step 2075 -- Training Metrics | |
| 2025-08-29 02:28:54 - pico-train - INFO - ├── Loss: 9.3699 | |
| 2025-08-29 02:28:54 - pico-train - INFO - ├── Learning Rate: 1.30e-05 | |
| 2025-08-29 02:28:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:29:07 - pico-train - INFO - Step 2100 -- Training Metrics | |
| 2025-08-29 02:29:07 - pico-train - INFO - ├── Loss: 9.3422 | |
| 2025-08-29 02:29:07 - pico-train - INFO - ├── Learning Rate: 1.31e-05 | |
| 2025-08-29 02:29:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:29:20 - pico-train - INFO - Step 2125 -- Training Metrics | |
| 2025-08-29 02:29:20 - pico-train - INFO - ├── Loss: 9.3129 | |
| 2025-08-29 02:29:20 - pico-train - INFO - ├── Learning Rate: 1.33e-05 | |
| 2025-08-29 02:29:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:29:32 - pico-train - INFO - Step 2150 -- Training Metrics | |
| 2025-08-29 02:29:32 - pico-train - INFO - ├── Loss: 9.2917 | |
| 2025-08-29 02:29:32 - pico-train - INFO - ├── Learning Rate: 1.34e-05 | |
| 2025-08-29 02:29:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:29:45 - pico-train - INFO - Step 2175 -- Training Metrics | |
| 2025-08-29 02:29:45 - pico-train - INFO - ├── Loss: 9.2670 | |
| 2025-08-29 02:29:45 - pico-train - INFO - ├── Learning Rate: 1.36e-05 | |
| 2025-08-29 02:29:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:29:58 - pico-train - INFO - Step 2200 -- Training Metrics | |
| 2025-08-29 02:29:58 - pico-train - INFO - ├── Loss: 9.2512 | |
| 2025-08-29 02:29:58 - pico-train - INFO - ├── Learning Rate: 1.38e-05 | |
| 2025-08-29 02:29:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:30:10 - pico-train - INFO - Step 2225 -- Training Metrics | |
| 2025-08-29 02:30:10 - pico-train - INFO - ├── Loss: 9.2737 | |
| 2025-08-29 02:30:10 - pico-train - INFO - ├── Learning Rate: 1.39e-05 | |
| 2025-08-29 02:30:10 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:30:23 - pico-train - INFO - Step 2250 -- Training Metrics | |
| 2025-08-29 02:30:23 - pico-train - INFO - ├── Loss: 9.2357 | |
| 2025-08-29 02:30:23 - pico-train - INFO - ├── Learning Rate: 1.41e-05 | |
| 2025-08-29 02:30:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:30:36 - pico-train - INFO - Step 2275 -- Training Metrics | |
| 2025-08-29 02:30:36 - pico-train - INFO - ├── Loss: 9.1471 | |
| 2025-08-29 02:30:36 - pico-train - INFO - ├── Learning Rate: 1.42e-05 | |
| 2025-08-29 02:30:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:30:49 - pico-train - INFO - Step 2300 -- Training Metrics | |
| 2025-08-29 02:30:49 - pico-train - INFO - ├── Loss: 9.1305 | |
| 2025-08-29 02:30:49 - pico-train - INFO - ├── Learning Rate: 1.44e-05 | |
| 2025-08-29 02:30:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:31:01 - pico-train - INFO - Step 2325 -- Training Metrics | |
| 2025-08-29 02:31:01 - pico-train - INFO - ├── Loss: 9.1430 | |
| 2025-08-29 02:31:01 - pico-train - INFO - ├── Learning Rate: 1.45e-05 | |
| 2025-08-29 02:31:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:31:14 - pico-train - INFO - Step 2350 -- Training Metrics | |
| 2025-08-29 02:31:14 - pico-train - INFO - ├── Loss: 9.0948 | |
| 2025-08-29 02:31:14 - pico-train - INFO - ├── Learning Rate: 1.47e-05 | |
| 2025-08-29 02:31:14 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:31:27 - pico-train - INFO - Step 2375 -- Training Metrics | |
| 2025-08-29 02:31:27 - pico-train - INFO - ├── Loss: 9.0256 | |
| 2025-08-29 02:31:27 - pico-train - INFO - ├── Learning Rate: 1.48e-05 | |
| 2025-08-29 02:31:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:31:39 - pico-train - INFO - Step 2400 -- Training Metrics | |
| 2025-08-29 02:31:39 - pico-train - INFO - ├── Loss: 9.0664 | |
| 2025-08-29 02:31:39 - pico-train - INFO - ├── Learning Rate: 1.50e-05 | |
| 2025-08-29 02:31:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:31:52 - pico-train - INFO - Step 2425 -- Training Metrics | |
| 2025-08-29 02:31:52 - pico-train - INFO - ├── Loss: 9.0020 | |
| 2025-08-29 02:31:52 - pico-train - INFO - ├── Learning Rate: 1.52e-05 | |
| 2025-08-29 02:31:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:32:05 - pico-train - INFO - Step 2450 -- Training Metrics | |
| 2025-08-29 02:32:05 - pico-train - INFO - ├── Loss: 8.9518 | |
| 2025-08-29 02:32:05 - pico-train - INFO - ├── Learning Rate: 1.53e-05 | |
| 2025-08-29 02:32:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:32:18 - pico-train - INFO - Step 2475 -- Training Metrics | |
| 2025-08-29 02:32:18 - pico-train - INFO - ├── Loss: 8.9717 | |
| 2025-08-29 02:32:18 - pico-train - INFO - ├── Learning Rate: 1.55e-05 | |
| 2025-08-29 02:32:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:32:30 - pico-train - INFO - Step 2500 -- 💾 Saving Checkpoint | |
| 2025-08-29 02:34:27 - pico-train - INFO - Step 2500 -- ๐ Evaluation Results | |
| 2025-08-29 02:34:27 - pico-train - INFO - โโโ paloma: 3.37924315167126e+18 | |
| 2025-08-29 02:34:29 - pico-train - INFO - Step 2500 -- ๐ Training Metrics | |
| 2025-08-29 02:34:29 - pico-train - INFO - โโโ Loss: 8.9536 | |
| 2025-08-29 02:34:29 - pico-train - INFO - โโโ Learning Rate: 1.56e-05 | |
| 2025-08-29 02:34:29 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 02:34:29 - pico-train - INFO - Step 2500 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 02:34:45 - pico-train - INFO - Step 2525 -- Training Metrics | |
| 2025-08-29 02:34:45 - pico-train - INFO - ├── Loss: 8.8812 | |
| 2025-08-29 02:34:45 - pico-train - INFO - ├── Learning Rate: 1.58e-05 | |
| 2025-08-29 02:34:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:34:58 - pico-train - INFO - Step 2550 -- Training Metrics | |
| 2025-08-29 02:34:58 - pico-train - INFO - ├── Loss: 8.8824 | |
| 2025-08-29 02:34:58 - pico-train - INFO - ├── Learning Rate: 1.59e-05 | |
| 2025-08-29 02:34:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:35:11 - pico-train - INFO - Step 2575 -- Training Metrics | |
| 2025-08-29 02:35:11 - pico-train - INFO - ├── Loss: 8.8564 | |
| 2025-08-29 02:35:11 - pico-train - INFO - ├── Learning Rate: 1.61e-05 | |
| 2025-08-29 02:35:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:35:23 - pico-train - INFO - Step 2600 -- Training Metrics | |
| 2025-08-29 02:35:23 - pico-train - INFO - ├── Loss: 8.8419 | |
| 2025-08-29 02:35:23 - pico-train - INFO - ├── Learning Rate: 1.63e-05 | |
| 2025-08-29 02:35:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:35:36 - pico-train - INFO - Step 2625 -- Training Metrics | |
| 2025-08-29 02:35:36 - pico-train - INFO - ├── Loss: 8.7865 | |
| 2025-08-29 02:35:36 - pico-train - INFO - ├── Learning Rate: 1.64e-05 | |
| 2025-08-29 02:35:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:35:49 - pico-train - INFO - Step 2650 -- Training Metrics | |
| 2025-08-29 02:35:49 - pico-train - INFO - ├── Loss: 8.7493 | |
| 2025-08-29 02:35:49 - pico-train - INFO - ├── Learning Rate: 1.66e-05 | |
| 2025-08-29 02:35:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:36:02 - pico-train - INFO - Step 2675 -- Training Metrics | |
| 2025-08-29 02:36:02 - pico-train - INFO - ├── Loss: 8.7255 | |
| 2025-08-29 02:36:02 - pico-train - INFO - ├── Learning Rate: 1.67e-05 | |
| 2025-08-29 02:36:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:36:14 - pico-train - INFO - Step 2700 -- Training Metrics | |
| 2025-08-29 02:36:14 - pico-train - INFO - ├── Loss: 8.6469 | |
| 2025-08-29 02:36:14 - pico-train - INFO - ├── Learning Rate: 1.69e-05 | |
| 2025-08-29 02:36:14 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:36:27 - pico-train - INFO - Step 2725 -- Training Metrics | |
| 2025-08-29 02:36:27 - pico-train - INFO - ├── Loss: 8.6799 | |
| 2025-08-29 02:36:27 - pico-train - INFO - ├── Learning Rate: 1.70e-05 | |
| 2025-08-29 02:36:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:36:40 - pico-train - INFO - Step 2750 -- Training Metrics | |
| 2025-08-29 02:36:40 - pico-train - INFO - ├── Loss: 8.6974 | |
| 2025-08-29 02:36:40 - pico-train - INFO - ├── Learning Rate: 1.72e-05 | |
| 2025-08-29 02:36:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:36:53 - pico-train - INFO - Step 2775 -- Training Metrics | |
| 2025-08-29 02:36:53 - pico-train - INFO - ├── Loss: 8.6441 | |
| 2025-08-29 02:36:53 - pico-train - INFO - ├── Learning Rate: 1.73e-05 | |
| 2025-08-29 02:36:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:37:05 - pico-train - INFO - Step 2800 -- Training Metrics | |
| 2025-08-29 02:37:05 - pico-train - INFO - ├── Loss: 8.6689 | |
| 2025-08-29 02:37:05 - pico-train - INFO - ├── Learning Rate: 1.75e-05 | |
| 2025-08-29 02:37:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:37:18 - pico-train - INFO - Step 2825 -- Training Metrics | |
| 2025-08-29 02:37:18 - pico-train - INFO - ├── Loss: 8.5732 | |
| 2025-08-29 02:37:18 - pico-train - INFO - ├── Learning Rate: 1.77e-05 | |
| 2025-08-29 02:37:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:37:31 - pico-train - INFO - Step 2850 -- Training Metrics | |
| 2025-08-29 02:37:31 - pico-train - INFO - ├── Loss: 8.5955 | |
| 2025-08-29 02:37:31 - pico-train - INFO - ├── Learning Rate: 1.78e-05 | |
| 2025-08-29 02:37:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:37:44 - pico-train - INFO - Step 2875 -- Training Metrics | |
| 2025-08-29 02:37:44 - pico-train - INFO - ├── Loss: 8.5823 | |
| 2025-08-29 02:37:44 - pico-train - INFO - ├── Learning Rate: 1.80e-05 | |
| 2025-08-29 02:37:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:37:56 - pico-train - INFO - Step 2900 -- Training Metrics | |
| 2025-08-29 02:37:56 - pico-train - INFO - ├── Loss: 8.5968 | |
| 2025-08-29 02:37:56 - pico-train - INFO - ├── Learning Rate: 1.81e-05 | |
| 2025-08-29 02:37:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:38:09 - pico-train - INFO - Step 2925 -- Training Metrics | |
| 2025-08-29 02:38:09 - pico-train - INFO - ├── Loss: 8.4721 | |
| 2025-08-29 02:38:09 - pico-train - INFO - ├── Learning Rate: 1.83e-05 | |
| 2025-08-29 02:38:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:38:22 - pico-train - INFO - Step 2950 -- Training Metrics | |
| 2025-08-29 02:38:22 - pico-train - INFO - ├── Loss: 8.4672 | |
| 2025-08-29 02:38:22 - pico-train - INFO - ├── Learning Rate: 1.84e-05 | |
| 2025-08-29 02:38:22 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:38:34 - pico-train - INFO - Step 2975 -- Training Metrics | |
| 2025-08-29 02:38:34 - pico-train - INFO - ├── Loss: 8.4033 | |
| 2025-08-29 02:38:34 - pico-train - INFO - ├── Learning Rate: 1.86e-05 | |
| 2025-08-29 02:38:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:38:47 - pico-train - INFO - Step 3000 -- 💾 Saving Checkpoint | |
| 2025-08-29 02:40:44 - pico-train - INFO - Step 3000 -- Evaluation Results | |
| 2025-08-29 02:40:44 - pico-train - INFO - └── paloma: 6.892747900243237e+18 | |
| 2025-08-29 02:40:47 - pico-train - INFO - Step 3000 -- Training Metrics | |
| 2025-08-29 02:40:47 - pico-train - INFO - ├── Loss: 8.4947 | |
| 2025-08-29 02:40:47 - pico-train - INFO - ├── Learning Rate: 1.88e-05 | |
| 2025-08-29 02:40:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:40:47 - pico-train - INFO - Step 3000 -- Saving Learning Dynamics | |
| 2025-08-29 02:41:03 - pico-train - INFO - Step 3025 -- Training Metrics | |
| 2025-08-29 02:41:03 - pico-train - INFO - ├── Loss: 8.3780 | |
| 2025-08-29 02:41:03 - pico-train - INFO - ├── Learning Rate: 1.89e-05 | |
| 2025-08-29 02:41:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:41:16 - pico-train - INFO - Step 3050 -- Training Metrics | |
| 2025-08-29 02:41:16 - pico-train - INFO - ├── Loss: 8.3581 | |
| 2025-08-29 02:41:16 - pico-train - INFO - ├── Learning Rate: 1.91e-05 | |
| 2025-08-29 02:41:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:41:28 - pico-train - INFO - Step 3075 -- Training Metrics | |
| 2025-08-29 02:41:28 - pico-train - INFO - ├── Loss: 8.3341 | |
| 2025-08-29 02:41:28 - pico-train - INFO - ├── Learning Rate: 1.92e-05 | |
| 2025-08-29 02:41:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:41:41 - pico-train - INFO - Step 3100 -- Training Metrics | |
| 2025-08-29 02:41:41 - pico-train - INFO - ├── Loss: 8.3391 | |
| 2025-08-29 02:41:41 - pico-train - INFO - ├── Learning Rate: 1.94e-05 | |
| 2025-08-29 02:41:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:41:54 - pico-train - INFO - Step 3125 -- Training Metrics | |
| 2025-08-29 02:41:54 - pico-train - INFO - ├── Loss: 8.3670 | |
| 2025-08-29 02:41:54 - pico-train - INFO - ├── Learning Rate: 1.95e-05 | |
| 2025-08-29 02:41:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:42:07 - pico-train - INFO - Step 3150 -- Training Metrics | |
| 2025-08-29 02:42:07 - pico-train - INFO - ├── Loss: 8.2370 | |
| 2025-08-29 02:42:07 - pico-train - INFO - ├── Learning Rate: 1.97e-05 | |
| 2025-08-29 02:42:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:42:19 - pico-train - INFO - Step 3175 -- Training Metrics | |
| 2025-08-29 02:42:19 - pico-train - INFO - ├── Loss: 8.2879 | |
| 2025-08-29 02:42:19 - pico-train - INFO - ├── Learning Rate: 1.98e-05 | |
| 2025-08-29 02:42:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:42:32 - pico-train - INFO - Step 3200 -- Training Metrics | |
| 2025-08-29 02:42:32 - pico-train - INFO - ├── Loss: 8.2706 | |
| 2025-08-29 02:42:32 - pico-train - INFO - ├── Learning Rate: 2.00e-05 | |
| 2025-08-29 02:42:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:42:45 - pico-train - INFO - Step 3225 -- Training Metrics | |
| 2025-08-29 02:42:45 - pico-train - INFO - ├── Loss: 8.1983 | |
| 2025-08-29 02:42:45 - pico-train - INFO - ├── Learning Rate: 2.02e-05 | |
| 2025-08-29 02:42:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:42:57 - pico-train - INFO - Step 3250 -- Training Metrics | |
| 2025-08-29 02:42:57 - pico-train - INFO - ├── Loss: 8.2174 | |
| 2025-08-29 02:42:57 - pico-train - INFO - ├── Learning Rate: 2.03e-05 | |
| 2025-08-29 02:42:57 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:43:10 - pico-train - INFO - Step 3275 -- Training Metrics | |
| 2025-08-29 02:43:10 - pico-train - INFO - ├── Loss: 8.2229 | |
| 2025-08-29 02:43:10 - pico-train - INFO - ├── Learning Rate: 2.05e-05 | |
| 2025-08-29 02:43:10 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:43:23 - pico-train - INFO - Step 3300 -- Training Metrics | |
| 2025-08-29 02:43:23 - pico-train - INFO - ├── Loss: 8.1398 | |
| 2025-08-29 02:43:23 - pico-train - INFO - ├── Learning Rate: 2.06e-05 | |
| 2025-08-29 02:43:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:43:36 - pico-train - INFO - Step 3325 -- Training Metrics | |
| 2025-08-29 02:43:36 - pico-train - INFO - ├── Loss: 8.1430 | |
| 2025-08-29 02:43:36 - pico-train - INFO - ├── Learning Rate: 2.08e-05 | |
| 2025-08-29 02:43:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:43:49 - pico-train - INFO - Step 3350 -- Training Metrics | |
| 2025-08-29 02:43:49 - pico-train - INFO - ├── Loss: 8.1471 | |
| 2025-08-29 02:43:49 - pico-train - INFO - ├── Learning Rate: 2.09e-05 | |
| 2025-08-29 02:43:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:44:01 - pico-train - INFO - Step 3375 -- Training Metrics | |
| 2025-08-29 02:44:01 - pico-train - INFO - ├── Loss: 8.0908 | |
| 2025-08-29 02:44:01 - pico-train - INFO - ├── Learning Rate: 2.11e-05 | |
| 2025-08-29 02:44:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:44:14 - pico-train - INFO - Step 3400 -- Training Metrics | |
| 2025-08-29 02:44:14 - pico-train - INFO - ├── Loss: 8.1165 | |
| 2025-08-29 02:44:14 - pico-train - INFO - ├── Learning Rate: 2.13e-05 | |
| 2025-08-29 02:44:14 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:44:27 - pico-train - INFO - Step 3425 -- Training Metrics | |
| 2025-08-29 02:44:27 - pico-train - INFO - ├── Loss: 8.0957 | |
| 2025-08-29 02:44:27 - pico-train - INFO - ├── Learning Rate: 2.14e-05 | |
| 2025-08-29 02:44:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:44:39 - pico-train - INFO - Step 3450 -- Training Metrics | |
| 2025-08-29 02:44:39 - pico-train - INFO - ├── Loss: 8.1115 | |
| 2025-08-29 02:44:39 - pico-train - INFO - ├── Learning Rate: 2.16e-05 | |
| 2025-08-29 02:44:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:44:52 - pico-train - INFO - Step 3475 -- Training Metrics | |
| 2025-08-29 02:44:52 - pico-train - INFO - ├── Loss: 8.0623 | |
| 2025-08-29 02:44:52 - pico-train - INFO - ├── Learning Rate: 2.17e-05 | |
| 2025-08-29 02:44:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:45:04 - pico-train - INFO - Step 3500 -- 💾 Saving Checkpoint | |
| 2025-08-29 02:46:59 - pico-train - INFO - Step 3500 -- Evaluation Results | |
| 2025-08-29 02:46:59 - pico-train - INFO - └── paloma: 2.0436832271954907e+19 | |
| 2025-08-29 02:47:05 - pico-train - INFO - Step 3500 -- Training Metrics | |
| 2025-08-29 02:47:05 - pico-train - INFO - ├── Loss: 8.0527 | |
| 2025-08-29 02:47:05 - pico-train - INFO - ├── Learning Rate: 2.19e-05 | |
| 2025-08-29 02:47:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:47:05 - pico-train - INFO - Step 3500 -- Saving Learning Dynamics | |
| 2025-08-29 02:47:26 - pico-train - INFO - Step 3525 -- Training Metrics | |
| 2025-08-29 02:47:26 - pico-train - INFO - ├── Loss: 7.9975 | |
| 2025-08-29 02:47:26 - pico-train - INFO - ├── Learning Rate: 2.20e-05 | |
| 2025-08-29 02:47:26 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:47:38 - pico-train - INFO - Step 3550 -- Training Metrics | |
| 2025-08-29 02:47:38 - pico-train - INFO - ├── Loss: 7.9881 | |
| 2025-08-29 02:47:38 - pico-train - INFO - ├── Learning Rate: 2.22e-05 | |
| 2025-08-29 02:47:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:47:51 - pico-train - INFO - Step 3575 -- Training Metrics | |
| 2025-08-29 02:47:51 - pico-train - INFO - ├── Loss: 8.0060 | |
| 2025-08-29 02:47:51 - pico-train - INFO - ├── Learning Rate: 2.23e-05 | |
| 2025-08-29 02:47:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:48:04 - pico-train - INFO - Step 3600 -- Training Metrics | |
| 2025-08-29 02:48:04 - pico-train - INFO - ├── Loss: 7.9366 | |
| 2025-08-29 02:48:04 - pico-train - INFO - ├── Learning Rate: 2.25e-05 | |
| 2025-08-29 02:48:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:48:16 - pico-train - INFO - Step 3625 -- Training Metrics | |
| 2025-08-29 02:48:16 - pico-train - INFO - ├── Loss: 8.0252 | |
| 2025-08-29 02:48:16 - pico-train - INFO - ├── Learning Rate: 2.27e-05 | |
| 2025-08-29 02:48:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:48:29 - pico-train - INFO - Step 3650 -- Training Metrics | |
| 2025-08-29 02:48:29 - pico-train - INFO - ├── Loss: 7.9160 | |
| 2025-08-29 02:48:29 - pico-train - INFO - ├── Learning Rate: 2.28e-05 | |
| 2025-08-29 02:48:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:48:42 - pico-train - INFO - Step 3675 -- Training Metrics | |
| 2025-08-29 02:48:42 - pico-train - INFO - ├── Loss: 7.9470 | |
| 2025-08-29 02:48:42 - pico-train - INFO - ├── Learning Rate: 2.30e-05 | |
| 2025-08-29 02:48:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:48:55 - pico-train - INFO - Step 3700 -- Training Metrics | |
| 2025-08-29 02:48:55 - pico-train - INFO - ├── Loss: 7.8943 | |
| 2025-08-29 02:48:55 - pico-train - INFO - ├── Learning Rate: 2.31e-05 | |
| 2025-08-29 02:48:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:49:07 - pico-train - INFO - Step 3725 -- Training Metrics | |
| 2025-08-29 02:49:07 - pico-train - INFO - ├── Loss: 7.8951 | |
| 2025-08-29 02:49:07 - pico-train - INFO - ├── Learning Rate: 2.33e-05 | |
| 2025-08-29 02:49:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:49:20 - pico-train - INFO - Step 3750 -- Training Metrics | |
| 2025-08-29 02:49:20 - pico-train - INFO - ├── Loss: 7.9316 | |
| 2025-08-29 02:49:20 - pico-train - INFO - ├── Learning Rate: 2.34e-05 | |
| 2025-08-29 02:49:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:49:33 - pico-train - INFO - Step 3775 -- Training Metrics | |
| 2025-08-29 02:49:33 - pico-train - INFO - ├── Loss: 7.9407 | |
| 2025-08-29 02:49:33 - pico-train - INFO - ├── Learning Rate: 2.36e-05 | |
| 2025-08-29 02:49:33 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:49:46 - pico-train - INFO - Step 3800 -- Training Metrics | |
| 2025-08-29 02:49:46 - pico-train - INFO - ├── Loss: 7.9385 | |
| 2025-08-29 02:49:46 - pico-train - INFO - ├── Learning Rate: 2.38e-05 | |
| 2025-08-29 02:49:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:49:59 - pico-train - INFO - Step 3825 -- Training Metrics | |
| 2025-08-29 02:49:59 - pico-train - INFO - ├── Loss: 7.8800 | |
| 2025-08-29 02:49:59 - pico-train - INFO - ├── Learning Rate: 2.39e-05 | |
| 2025-08-29 02:49:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:50:11 - pico-train - INFO - Step 3850 -- Training Metrics | |
| 2025-08-29 02:50:11 - pico-train - INFO - ├── Loss: 7.9207 | |
| 2025-08-29 02:50:11 - pico-train - INFO - ├── Learning Rate: 2.41e-05 | |
| 2025-08-29 02:50:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:50:24 - pico-train - INFO - Step 3875 -- Training Metrics | |
| 2025-08-29 02:50:24 - pico-train - INFO - ├── Loss: 7.8258 | |
| 2025-08-29 02:50:24 - pico-train - INFO - ├── Learning Rate: 2.42e-05 | |
| 2025-08-29 02:50:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:50:37 - pico-train - INFO - Step 3900 -- Training Metrics | |
| 2025-08-29 02:50:37 - pico-train - INFO - ├── Loss: 7.9005 | |
| 2025-08-29 02:50:37 - pico-train - INFO - ├── Learning Rate: 2.44e-05 | |
| 2025-08-29 02:50:37 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:50:50 - pico-train - INFO - Step 3925 -- Training Metrics | |
| 2025-08-29 02:50:50 - pico-train - INFO - ├── Loss: 7.8232 | |
| 2025-08-29 02:50:50 - pico-train - INFO - ├── Learning Rate: 2.45e-05 | |
| 2025-08-29 02:50:50 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:51:03 - pico-train - INFO - Step 3950 -- Training Metrics | |
| 2025-08-29 02:51:03 - pico-train - INFO - ├── Loss: 7.7847 | |
| 2025-08-29 02:51:03 - pico-train - INFO - ├── Learning Rate: 2.47e-05 | |
| 2025-08-29 02:51:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:51:15 - pico-train - INFO - Step 3975 -- Training Metrics | |
| 2025-08-29 02:51:15 - pico-train - INFO - ├── Loss: 7.7909 | |
| 2025-08-29 02:51:15 - pico-train - INFO - ├── Learning Rate: 2.48e-05 | |
| 2025-08-29 02:51:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:51:28 - pico-train - INFO - Step 4000 -- 💾 Saving Checkpoint | |
| 2025-08-29 02:53:29 - pico-train - INFO - Step 4000 -- Evaluation Results | |
| 2025-08-29 02:53:29 - pico-train - INFO - └── paloma: 4.1410268232311005e+19 | |
| 2025-08-29 02:53:31 - pico-train - INFO - Step 4000 -- Training Metrics | |
| 2025-08-29 02:53:31 - pico-train - INFO - ├── Loss: 7.7419 | |
| 2025-08-29 02:53:31 - pico-train - INFO - ├── Learning Rate: 2.50e-05 | |
| 2025-08-29 02:53:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:53:31 - pico-train - INFO - Step 4000 -- Saving Learning Dynamics | |
| 2025-08-29 02:53:47 - pico-train - INFO - Step 4025 -- Training Metrics | |
| 2025-08-29 02:53:47 - pico-train - INFO - ├── Loss: 7.8031 | |
| 2025-08-29 02:53:47 - pico-train - INFO - ├── Learning Rate: 2.52e-05 | |
| 2025-08-29 02:53:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:54:00 - pico-train - INFO - Step 4050 -- Training Metrics | |
| 2025-08-29 02:54:00 - pico-train - INFO - ├── Loss: 7.7948 | |
| 2025-08-29 02:54:00 - pico-train - INFO - ├── Learning Rate: 2.53e-05 | |
| 2025-08-29 02:54:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:54:13 - pico-train - INFO - Step 4075 -- Training Metrics | |
| 2025-08-29 02:54:13 - pico-train - INFO - ├── Loss: 7.7259 | |
| 2025-08-29 02:54:13 - pico-train - INFO - ├── Learning Rate: 2.55e-05 | |
| 2025-08-29 02:54:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:54:25 - pico-train - INFO - Step 4100 -- Training Metrics | |
| 2025-08-29 02:54:25 - pico-train - INFO - ├── Loss: 7.8406 | |
| 2025-08-29 02:54:25 - pico-train - INFO - ├── Learning Rate: 2.56e-05 | |
| 2025-08-29 02:54:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:54:38 - pico-train - INFO - Step 4125 -- Training Metrics | |
| 2025-08-29 02:54:38 - pico-train - INFO - ├── Loss: 7.7938 | |
| 2025-08-29 02:54:38 - pico-train - INFO - ├── Learning Rate: 2.58e-05 | |
| 2025-08-29 02:54:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:54:51 - pico-train - INFO - Step 4150 -- Training Metrics | |
| 2025-08-29 02:54:51 - pico-train - INFO - ├── Loss: 7.7101 | |
| 2025-08-29 02:54:51 - pico-train - INFO - ├── Learning Rate: 2.59e-05 | |
| 2025-08-29 02:54:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:55:04 - pico-train - INFO - Step 4175 -- Training Metrics | |
| 2025-08-29 02:55:04 - pico-train - INFO - ├── Loss: 7.6633 | |
| 2025-08-29 02:55:04 - pico-train - INFO - ├── Learning Rate: 2.61e-05 | |
| 2025-08-29 02:55:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:55:17 - pico-train - INFO - Step 4200 -- Training Metrics | |
| 2025-08-29 02:55:17 - pico-train - INFO - ├── Loss: 7.6830 | |
| 2025-08-29 02:55:17 - pico-train - INFO - ├── Learning Rate: 2.63e-05 | |
| 2025-08-29 02:55:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:55:29 - pico-train - INFO - Step 4225 -- Training Metrics | |
| 2025-08-29 02:55:29 - pico-train - INFO - ├── Loss: 7.7106 | |
| 2025-08-29 02:55:29 - pico-train - INFO - ├── Learning Rate: 2.64e-05 | |
| 2025-08-29 02:55:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:55:42 - pico-train - INFO - Step 4250 -- Training Metrics | |
| 2025-08-29 02:55:42 - pico-train - INFO - ├── Loss: 7.7174 | |
| 2025-08-29 02:55:42 - pico-train - INFO - ├── Learning Rate: 2.66e-05 | |
| 2025-08-29 02:55:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:55:55 - pico-train - INFO - Step 4275 -- Training Metrics | |
| 2025-08-29 02:55:55 - pico-train - INFO - ├── Loss: 7.7508 | |
| 2025-08-29 02:55:55 - pico-train - INFO - ├── Learning Rate: 2.67e-05 | |
| 2025-08-29 02:55:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:56:08 - pico-train - INFO - Step 4300 -- Training Metrics | |
| 2025-08-29 02:56:08 - pico-train - INFO - ├── Loss: 7.6831 | |
| 2025-08-29 02:56:08 - pico-train - INFO - ├── Learning Rate: 2.69e-05 | |
| 2025-08-29 02:56:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:56:21 - pico-train - INFO - Step 4325 -- Training Metrics | |
| 2025-08-29 02:56:21 - pico-train - INFO - ├── Loss: 7.6498 | |
| 2025-08-29 02:56:21 - pico-train - INFO - ├── Learning Rate: 2.70e-05 | |
| 2025-08-29 02:56:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:56:33 - pico-train - INFO - Step 4350 -- Training Metrics | |
| 2025-08-29 02:56:33 - pico-train - INFO - ├── Loss: 7.6668 | |
| 2025-08-29 02:56:33 - pico-train - INFO - ├── Learning Rate: 2.72e-05 | |
| 2025-08-29 02:56:33 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:56:46 - pico-train - INFO - Step 4375 -- Training Metrics | |
| 2025-08-29 02:56:46 - pico-train - INFO - ├── Loss: 7.6852 | |
| 2025-08-29 02:56:46 - pico-train - INFO - ├── Learning Rate: 2.73e-05 | |
| 2025-08-29 02:56:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:56:59 - pico-train - INFO - Step 4400 -- Training Metrics | |
| 2025-08-29 02:56:59 - pico-train - INFO - ├── Loss: 7.6469 | |
| 2025-08-29 02:56:59 - pico-train - INFO - ├── Learning Rate: 2.75e-05 | |
| 2025-08-29 02:56:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:57:12 - pico-train - INFO - Step 4425 -- Training Metrics | |
| 2025-08-29 02:57:12 - pico-train - INFO - ├── Loss: 7.7448 | |
| 2025-08-29 02:57:12 - pico-train - INFO - ├── Learning Rate: 2.77e-05 | |
| 2025-08-29 02:57:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:57:25 - pico-train - INFO - Step 4450 -- Training Metrics | |
| 2025-08-29 02:57:25 - pico-train - INFO - ├── Loss: 7.7422 | |
| 2025-08-29 02:57:25 - pico-train - INFO - ├── Learning Rate: 2.78e-05 | |
| 2025-08-29 02:57:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:57:37 - pico-train - INFO - Step 4475 -- Training Metrics | |
| 2025-08-29 02:57:38 - pico-train - INFO - ├── Loss: 7.6918 | |
| 2025-08-29 02:57:38 - pico-train - INFO - ├── Learning Rate: 2.80e-05 | |
| 2025-08-29 02:57:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 02:57:50 - pico-train - INFO - Step 4500 -- 💾 Saving Checkpoint | |
| 2025-08-29 03:00:10 - pico-train - INFO - Step 4500 -- Evaluation Results | |
| 2025-08-29 03:00:10 - pico-train - INFO - └── paloma: 3.4524340411684053e+19 | |
| 2025-08-29 03:00:13 - pico-train - INFO - Step 4500 -- Training Metrics | |
| 2025-08-29 03:00:13 - pico-train - INFO - ├── Loss: 7.7084 | |
| 2025-08-29 03:00:13 - pico-train - INFO - ├── Learning Rate: 2.81e-05 | |
| 2025-08-29 03:00:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:00:13 - pico-train - INFO - Step 4500 -- Saving Learning Dynamics | |
| 2025-08-29 03:00:29 - pico-train - INFO - Step 4525 -- Training Metrics | |
| 2025-08-29 03:00:29 - pico-train - INFO - ├── Loss: 7.7220 | |
| 2025-08-29 03:00:29 - pico-train - INFO - ├── Learning Rate: 2.83e-05 | |
| 2025-08-29 03:00:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:00:41 - pico-train - INFO - Step 4550 -- Training Metrics | |
| 2025-08-29 03:00:41 - pico-train - INFO - ├── Loss: 7.6893 | |
| 2025-08-29 03:00:41 - pico-train - INFO - ├── Learning Rate: 2.84e-05 | |
| 2025-08-29 03:00:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:00:54 - pico-train - INFO - Step 4575 -- Training Metrics | |
| 2025-08-29 03:00:54 - pico-train - INFO - ├── Loss: 7.6454 | |
| 2025-08-29 03:00:54 - pico-train - INFO - ├── Learning Rate: 2.86e-05 | |
| 2025-08-29 03:00:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:01:07 - pico-train - INFO - Step 4600 -- Training Metrics | |
| 2025-08-29 03:01:07 - pico-train - INFO - ├── Loss: 7.6298 | |
| 2025-08-29 03:01:07 - pico-train - INFO - ├── Learning Rate: 2.87e-05 | |
| 2025-08-29 03:01:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:01:20 - pico-train - INFO - Step 4625 -- Training Metrics | |
| 2025-08-29 03:01:20 - pico-train - INFO - ├── Loss: 7.6420 | |
| 2025-08-29 03:01:20 - pico-train - INFO - ├── Learning Rate: 2.89e-05 | |
| 2025-08-29 03:01:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:01:32 - pico-train - INFO - Step 4650 -- Training Metrics | |
| 2025-08-29 03:01:32 - pico-train - INFO - ├── Loss: 7.6247 | |
| 2025-08-29 03:01:32 - pico-train - INFO - ├── Learning Rate: 2.91e-05 | |
| 2025-08-29 03:01:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:01:45 - pico-train - INFO - Step 4675 -- Training Metrics | |
| 2025-08-29 03:01:45 - pico-train - INFO - ├── Loss: 7.6448 | |
| 2025-08-29 03:01:45 - pico-train - INFO - ├── Learning Rate: 2.92e-05 | |
| 2025-08-29 03:01:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:01:58 - pico-train - INFO - Step 4700 -- Training Metrics | |
| 2025-08-29 03:01:58 - pico-train - INFO - ├── Loss: 7.6506 | |
| 2025-08-29 03:01:58 - pico-train - INFO - ├── Learning Rate: 2.94e-05 | |
| 2025-08-29 03:01:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:02:11 - pico-train - INFO - Step 4725 -- Training Metrics | |
| 2025-08-29 03:02:11 - pico-train - INFO - ├── Loss: 7.6356 | |
| 2025-08-29 03:02:11 - pico-train - INFO - ├── Learning Rate: 2.95e-05 | |
| 2025-08-29 03:02:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:02:24 - pico-train - INFO - Step 4750 -- Training Metrics | |
| 2025-08-29 03:02:24 - pico-train - INFO - ├── Loss: 7.6426 | |
| 2025-08-29 03:02:24 - pico-train - INFO - ├── Learning Rate: 2.97e-05 | |
| 2025-08-29 03:02:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:02:36 - pico-train - INFO - Step 4775 -- Training Metrics | |
| 2025-08-29 03:02:36 - pico-train - INFO - ├── Loss: 7.6388 | |
| 2025-08-29 03:02:36 - pico-train - INFO - ├── Learning Rate: 2.98e-05 | |
| 2025-08-29 03:02:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:02:49 - pico-train - INFO - Step 4800 -- Training Metrics | |
| 2025-08-29 03:02:49 - pico-train - INFO - ├── Loss: 7.5216 | |
| 2025-08-29 03:02:49 - pico-train - INFO - ├── Learning Rate: 3.00e-05 | |
| 2025-08-29 03:02:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:03:02 - pico-train - INFO - Step 4825 -- Training Metrics | |
| 2025-08-29 03:03:02 - pico-train - INFO - ├── Loss: 7.5367 | |
| 2025-08-29 03:03:02 - pico-train - INFO - ├── Learning Rate: 3.02e-05 | |
| 2025-08-29 03:03:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:03:15 - pico-train - INFO - Step 4850 -- Training Metrics | |
| 2025-08-29 03:03:15 - pico-train - INFO - ├── Loss: 7.5084 | |
| 2025-08-29 03:03:15 - pico-train - INFO - ├── Learning Rate: 3.03e-05 | |
| 2025-08-29 03:03:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:03:28 - pico-train - INFO - Step 4875 -- Training Metrics | |
| 2025-08-29 03:03:28 - pico-train - INFO - ├── Loss: 7.6092 | |
| 2025-08-29 03:03:28 - pico-train - INFO - ├── Learning Rate: 3.05e-05 | |
| 2025-08-29 03:03:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:03:40 - pico-train - INFO - Step 4900 -- Training Metrics | |
| 2025-08-29 03:03:40 - pico-train - INFO - ├── Loss: 7.5760 | |
| 2025-08-29 03:03:40 - pico-train - INFO - ├── Learning Rate: 3.06e-05 | |
| 2025-08-29 03:03:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:03:53 - pico-train - INFO - Step 4925 -- Training Metrics | |
| 2025-08-29 03:03:53 - pico-train - INFO - ├── Loss: 7.5686 | |
| 2025-08-29 03:03:53 - pico-train - INFO - ├── Learning Rate: 3.08e-05 | |
| 2025-08-29 03:03:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:04:06 - pico-train - INFO - Step 4950 -- Training Metrics | |
| 2025-08-29 03:04:06 - pico-train - INFO - ├── Loss: 7.5583 | |
| 2025-08-29 03:04:06 - pico-train - INFO - ├── Learning Rate: 3.09e-05 | |
| 2025-08-29 03:04:06 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:04:19 - pico-train - INFO - Step 4975 -- Training Metrics | |
| 2025-08-29 03:04:19 - pico-train - INFO - ├── Loss: 7.5818 | |
| 2025-08-29 03:04:19 - pico-train - INFO - ├── Learning Rate: 3.11e-05 | |
| 2025-08-29 03:04:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:04:31 - pico-train - INFO - Step 5000 -- 💾 Saving Checkpoint | |
| 2025-08-29 03:06:27 - pico-train - INFO - Step 5000 -- Evaluation Results | |
| 2025-08-29 03:06:27 - pico-train - INFO - └── paloma: 2.320698426399461e+19 | |
| 2025-08-29 03:06:30 - pico-train - INFO - Step 5000 -- Training Metrics | |
| 2025-08-29 03:06:30 - pico-train - INFO - ├── Loss: 7.6004 | |
| 2025-08-29 03:06:30 - pico-train - INFO - ├── Learning Rate: 3.13e-05 | |
| 2025-08-29 03:06:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:06:30 - pico-train - INFO - Step 5000 -- Saving Learning Dynamics | |
| 2025-08-29 03:06:46 - pico-train - INFO - Step 5025 -- Training Metrics | |
| 2025-08-29 03:06:46 - pico-train - INFO - ├── Loss: 7.5371 | |
| 2025-08-29 03:06:46 - pico-train - INFO - ├── Learning Rate: 3.14e-05 | |
| 2025-08-29 03:06:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:06:58 - pico-train - INFO - Step 5050 -- Training Metrics | |
| 2025-08-29 03:06:58 - pico-train - INFO - ├── Loss: 7.5179 | |
| 2025-08-29 03:06:58 - pico-train - INFO - ├── Learning Rate: 3.16e-05 | |
| 2025-08-29 03:06:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:07:11 - pico-train - INFO - Step 5075 -- Training Metrics | |
| 2025-08-29 03:07:11 - pico-train - INFO - ├── Loss: 7.5255 | |
| 2025-08-29 03:07:11 - pico-train - INFO - ├── Learning Rate: 3.17e-05 | |
| 2025-08-29 03:07:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:07:24 - pico-train - INFO - Step 5100 -- Training Metrics | |
| 2025-08-29 03:07:24 - pico-train - INFO - ├── Loss: 7.5155 | |
| 2025-08-29 03:07:24 - pico-train - INFO - ├── Learning Rate: 3.19e-05 | |
| 2025-08-29 03:07:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:07:37 - pico-train - INFO - Step 5125 -- Training Metrics | |
| 2025-08-29 03:07:37 - pico-train - INFO - ├── Loss: 7.5660 | |
| 2025-08-29 03:07:37 - pico-train - INFO - ├── Learning Rate: 3.20e-05 | |
| 2025-08-29 03:07:37 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:07:50 - pico-train - INFO - Step 5150 -- Training Metrics | |
| 2025-08-29 03:07:50 - pico-train - INFO - ├── Loss: 7.4797 | |
| 2025-08-29 03:07:50 - pico-train - INFO - ├── Learning Rate: 3.22e-05 | |
| 2025-08-29 03:07:50 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:08:02 - pico-train - INFO - Step 5175 -- Training Metrics | |
| 2025-08-29 03:08:02 - pico-train - INFO - ├── Loss: 7.6224 | |
| 2025-08-29 03:08:02 - pico-train - INFO - ├── Learning Rate: 3.23e-05 | |
| 2025-08-29 03:08:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:08:15 - pico-train - INFO - Step 5200 -- Training Metrics | |
| 2025-08-29 03:08:15 - pico-train - INFO - ├── Loss: 7.4821 | |
| 2025-08-29 03:08:15 - pico-train - INFO - ├── Learning Rate: 3.25e-05 | |
| 2025-08-29 03:08:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:08:28 - pico-train - INFO - Step 5225 -- Training Metrics | |
| 2025-08-29 03:08:28 - pico-train - INFO - ├── Loss: 7.4765 | |
| 2025-08-29 03:08:28 - pico-train - INFO - ├── Learning Rate: 3.27e-05 | |
| 2025-08-29 03:08:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:08:41 - pico-train - INFO - Step 5250 -- Training Metrics | |
| 2025-08-29 03:08:41 - pico-train - INFO - ├── Loss: 7.4680 | |
| 2025-08-29 03:08:41 - pico-train - INFO - ├── Learning Rate: 3.28e-05 | |
| 2025-08-29 03:08:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 03:08:54 - pico-train - INFO - Step 5275 -- ๐ Training Metrics | |
| 2025-08-29 03:08:54 - pico-train - INFO - โโโ Loss: 7.5165 | |
| 2025-08-29 03:08:54 - pico-train - INFO - โโโ Learning Rate: 3.30e-05 | |
| 2025-08-29 03:08:54 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:09:06 - pico-train - INFO - Step 5300 -- ๐ Training Metrics | |
| 2025-08-29 03:09:06 - pico-train - INFO - โโโ Loss: 7.5334 | |
| 2025-08-29 03:09:06 - pico-train - INFO - โโโ Learning Rate: 3.31e-05 | |
| 2025-08-29 03:09:06 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:09:19 - pico-train - INFO - Step 5325 -- ๐ Training Metrics | |
| 2025-08-29 03:09:19 - pico-train - INFO - โโโ Loss: 7.5053 | |
| 2025-08-29 03:09:19 - pico-train - INFO - โโโ Learning Rate: 3.33e-05 | |
| 2025-08-29 03:09:19 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:09:32 - pico-train - INFO - Step 5350 -- ๐ Training Metrics | |
| 2025-08-29 03:09:32 - pico-train - INFO - โโโ Loss: 7.5115 | |
| 2025-08-29 03:09:32 - pico-train - INFO - โโโ Learning Rate: 3.34e-05 | |
| 2025-08-29 03:09:32 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:09:45 - pico-train - INFO - Step 5375 -- ๐ Training Metrics | |
| 2025-08-29 03:09:45 - pico-train - INFO - โโโ Loss: 7.4736 | |
| 2025-08-29 03:09:45 - pico-train - INFO - โโโ Learning Rate: 3.36e-05 | |
| 2025-08-29 03:09:45 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:09:57 - pico-train - INFO - Step 5400 -- ๐ Training Metrics | |
| 2025-08-29 03:09:57 - pico-train - INFO - โโโ Loss: 7.4520 | |
| 2025-08-29 03:09:57 - pico-train - INFO - โโโ Learning Rate: 3.38e-05 | |
| 2025-08-29 03:09:57 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:10:10 - pico-train - INFO - Step 5425 -- ๐ Training Metrics | |
| 2025-08-29 03:10:10 - pico-train - INFO - โโโ Loss: 7.4596 | |
| 2025-08-29 03:10:10 - pico-train - INFO - โโโ Learning Rate: 3.39e-05 | |
| 2025-08-29 03:10:10 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:10:23 - pico-train - INFO - Step 5450 -- ๐ Training Metrics | |
| 2025-08-29 03:10:23 - pico-train - INFO - โโโ Loss: 7.4518 | |
| 2025-08-29 03:10:23 - pico-train - INFO - โโโ Learning Rate: 3.41e-05 | |
| 2025-08-29 03:10:23 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:10:36 - pico-train - INFO - Step 5475 -- ๐ Training Metrics | |
| 2025-08-29 03:10:36 - pico-train - INFO - โโโ Loss: 7.4308 | |
| 2025-08-29 03:10:36 - pico-train - INFO - โโโ Learning Rate: 3.42e-05 | |
| 2025-08-29 03:10:36 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:10:48 - pico-train - INFO - Step 5500 -- 💾 Saving Checkpoint | |
| 2025-08-29 03:12:44 - pico-train - INFO - Step 5500 -- ๐ Evaluation Results | |
| 2025-08-29 03:12:44 - pico-train - INFO - โโโ paloma: 3.1834097890526753e+19 | |
| 2025-08-29 03:12:46 - pico-train - INFO - Step 5500 -- ๐ Training Metrics | |
| 2025-08-29 03:12:46 - pico-train - INFO - โโโ Loss: 7.4627 | |
| 2025-08-29 03:12:46 - pico-train - INFO - โโโ Learning Rate: 3.44e-05 | |
| 2025-08-29 03:12:46 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:12:46 - pico-train - INFO - Step 5500 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 03:13:02 - pico-train - INFO - Step 5525 -- ๐ Training Metrics | |
| 2025-08-29 03:13:02 - pico-train - INFO - โโโ Loss: 7.4095 | |
| 2025-08-29 03:13:02 - pico-train - INFO - โโโ Learning Rate: 3.45e-05 | |
| 2025-08-29 03:13:02 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:13:14 - pico-train - INFO - Step 5550 -- ๐ Training Metrics | |
| 2025-08-29 03:13:14 - pico-train - INFO - โโโ Loss: 7.4423 | |
| 2025-08-29 03:13:14 - pico-train - INFO - โโโ Learning Rate: 3.47e-05 | |
| 2025-08-29 03:13:14 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:13:27 - pico-train - INFO - Step 5575 -- ๐ Training Metrics | |
| 2025-08-29 03:13:27 - pico-train - INFO - โโโ Loss: 7.4600 | |
| 2025-08-29 03:13:27 - pico-train - INFO - โโโ Learning Rate: 3.48e-05 | |
| 2025-08-29 03:13:27 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:13:40 - pico-train - INFO - Step 5600 -- ๐ Training Metrics | |
| 2025-08-29 03:13:40 - pico-train - INFO - โโโ Loss: 7.3457 | |
| 2025-08-29 03:13:40 - pico-train - INFO - โโโ Learning Rate: 3.50e-05 | |
| 2025-08-29 03:13:40 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:13:53 - pico-train - INFO - Step 5625 -- ๐ Training Metrics | |
| 2025-08-29 03:13:53 - pico-train - INFO - โโโ Loss: 7.4838 | |
| 2025-08-29 03:13:53 - pico-train - INFO - โโโ Learning Rate: 3.52e-05 | |
| 2025-08-29 03:13:53 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:14:06 - pico-train - INFO - Step 5650 -- ๐ Training Metrics | |
| 2025-08-29 03:14:06 - pico-train - INFO - โโโ Loss: 7.4556 | |
| 2025-08-29 03:14:06 - pico-train - INFO - โโโ Learning Rate: 3.53e-05 | |
| 2025-08-29 03:14:06 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:14:19 - pico-train - INFO - Step 5675 -- ๐ Training Metrics | |
| 2025-08-29 03:14:19 - pico-train - INFO - โโโ Loss: 7.4220 | |
| 2025-08-29 03:14:19 - pico-train - INFO - โโโ Learning Rate: 3.55e-05 | |
| 2025-08-29 03:14:19 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:14:31 - pico-train - INFO - Step 5700 -- ๐ Training Metrics | |
| 2025-08-29 03:14:31 - pico-train - INFO - โโโ Loss: 7.4307 | |
| 2025-08-29 03:14:31 - pico-train - INFO - โโโ Learning Rate: 3.56e-05 | |
| 2025-08-29 03:14:31 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:14:44 - pico-train - INFO - Step 5725 -- ๐ Training Metrics | |
| 2025-08-29 03:14:44 - pico-train - INFO - โโโ Loss: 7.3795 | |
| 2025-08-29 03:14:44 - pico-train - INFO - โโโ Learning Rate: 3.58e-05 | |
| 2025-08-29 03:14:44 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:14:57 - pico-train - INFO - Step 5750 -- ๐ Training Metrics | |
| 2025-08-29 03:14:57 - pico-train - INFO - โโโ Loss: 7.3855 | |
| 2025-08-29 03:14:57 - pico-train - INFO - โโโ Learning Rate: 3.59e-05 | |
| 2025-08-29 03:14:57 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:15:10 - pico-train - INFO - Step 5775 -- ๐ Training Metrics | |
| 2025-08-29 03:15:10 - pico-train - INFO - โโโ Loss: 7.3518 | |
| 2025-08-29 03:15:10 - pico-train - INFO - โโโ Learning Rate: 3.61e-05 | |
| 2025-08-29 03:15:10 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:15:23 - pico-train - INFO - Step 5800 -- ๐ Training Metrics | |
| 2025-08-29 03:15:23 - pico-train - INFO - โโโ Loss: 7.3794 | |
| 2025-08-29 03:15:23 - pico-train - INFO - โโโ Learning Rate: 3.63e-05 | |
| 2025-08-29 03:15:23 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:15:35 - pico-train - INFO - Step 5825 -- ๐ Training Metrics | |
| 2025-08-29 03:15:35 - pico-train - INFO - โโโ Loss: 7.3591 | |
| 2025-08-29 03:15:35 - pico-train - INFO - โโโ Learning Rate: 3.64e-05 | |
| 2025-08-29 03:15:35 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:15:48 - pico-train - INFO - Step 5850 -- ๐ Training Metrics | |
| 2025-08-29 03:15:48 - pico-train - INFO - โโโ Loss: 7.3489 | |
| 2025-08-29 03:15:48 - pico-train - INFO - โโโ Learning Rate: 3.66e-05 | |
| 2025-08-29 03:15:48 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:16:01 - pico-train - INFO - Step 5875 -- ๐ Training Metrics | |
| 2025-08-29 03:16:01 - pico-train - INFO - โโโ Loss: 7.4108 | |
| 2025-08-29 03:16:01 - pico-train - INFO - โโโ Learning Rate: 3.67e-05 | |
| 2025-08-29 03:16:01 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:16:14 - pico-train - INFO - Step 5900 -- ๐ Training Metrics | |
| 2025-08-29 03:16:14 - pico-train - INFO - โโโ Loss: 7.3580 | |
| 2025-08-29 03:16:14 - pico-train - INFO - โโโ Learning Rate: 3.69e-05 | |
| 2025-08-29 03:16:14 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:16:26 - pico-train - INFO - Step 5925 -- ๐ Training Metrics | |
| 2025-08-29 03:16:26 - pico-train - INFO - โโโ Loss: 7.3131 | |
| 2025-08-29 03:16:26 - pico-train - INFO - โโโ Learning Rate: 3.70e-05 | |
| 2025-08-29 03:16:26 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:16:39 - pico-train - INFO - Step 5950 -- ๐ Training Metrics | |
| 2025-08-29 03:16:39 - pico-train - INFO - โโโ Loss: 7.2905 | |
| 2025-08-29 03:16:39 - pico-train - INFO - โโโ Learning Rate: 3.72e-05 | |
| 2025-08-29 03:16:39 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:16:52 - pico-train - INFO - Step 5975 -- ๐ Training Metrics | |
| 2025-08-29 03:16:52 - pico-train - INFO - โโโ Loss: 7.3466 | |
| 2025-08-29 03:16:52 - pico-train - INFO - โโโ Learning Rate: 3.73e-05 | |
| 2025-08-29 03:16:52 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:17:04 - pico-train - INFO - Step 6000 -- 💾 Saving Checkpoint | |
| 2025-08-29 03:19:19 - pico-train - INFO - Step 6000 -- ๐ Evaluation Results | |
| 2025-08-29 03:19:19 - pico-train - INFO - โโโ paloma: 4.457139025979801e+19 | |
| 2025-08-29 03:19:20 - pico-train - INFO - Step 6000 -- ๐ Training Metrics | |
| 2025-08-29 03:19:20 - pico-train - INFO - โโโ Loss: 7.3765 | |
| 2025-08-29 03:19:20 - pico-train - INFO - โโโ Learning Rate: 3.75e-05 | |
| 2025-08-29 03:19:20 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:19:20 - pico-train - INFO - Step 6000 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 03:19:36 - pico-train - INFO - Step 6025 -- ๐ Training Metrics | |
| 2025-08-29 03:19:36 - pico-train - INFO - โโโ Loss: 7.2870 | |
| 2025-08-29 03:19:36 - pico-train - INFO - โโโ Learning Rate: 3.77e-05 | |
| 2025-08-29 03:19:36 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:19:49 - pico-train - INFO - Step 6050 -- ๐ Training Metrics | |
| 2025-08-29 03:19:49 - pico-train - INFO - โโโ Loss: 7.3333 | |
| 2025-08-29 03:19:49 - pico-train - INFO - โโโ Learning Rate: 3.78e-05 | |
| 2025-08-29 03:19:49 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:20:01 - pico-train - INFO - Step 6075 -- ๐ Training Metrics | |
| 2025-08-29 03:20:01 - pico-train - INFO - โโโ Loss: 7.3098 | |
| 2025-08-29 03:20:01 - pico-train - INFO - โโโ Learning Rate: 3.80e-05 | |
| 2025-08-29 03:20:01 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:20:14 - pico-train - INFO - Step 6100 -- ๐ Training Metrics | |
| 2025-08-29 03:20:14 - pico-train - INFO - โโโ Loss: 7.2594 | |
| 2025-08-29 03:20:14 - pico-train - INFO - โโโ Learning Rate: 3.81e-05 | |
| 2025-08-29 03:20:14 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:20:27 - pico-train - INFO - Step 6125 -- ๐ Training Metrics | |
| 2025-08-29 03:20:27 - pico-train - INFO - โโโ Loss: 7.3327 | |
| 2025-08-29 03:20:27 - pico-train - INFO - โโโ Learning Rate: 3.83e-05 | |
| 2025-08-29 03:20:27 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:20:40 - pico-train - INFO - Step 6150 -- ๐ Training Metrics | |
| 2025-08-29 03:20:40 - pico-train - INFO - โโโ Loss: 7.3030 | |
| 2025-08-29 03:20:40 - pico-train - INFO - โโโ Learning Rate: 3.84e-05 | |
| 2025-08-29 03:20:40 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:20:53 - pico-train - INFO - Step 6175 -- ๐ Training Metrics | |
| 2025-08-29 03:20:53 - pico-train - INFO - โโโ Loss: 7.2523 | |
| 2025-08-29 03:20:53 - pico-train - INFO - โโโ Learning Rate: 3.86e-05 | |
| 2025-08-29 03:20:53 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:21:06 - pico-train - INFO - Step 6200 -- ๐ Training Metrics | |
| 2025-08-29 03:21:06 - pico-train - INFO - โโโ Loss: 7.2546 | |
| 2025-08-29 03:21:06 - pico-train - INFO - โโโ Learning Rate: 3.87e-05 | |
| 2025-08-29 03:21:06 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:21:18 - pico-train - INFO - Step 6225 -- ๐ Training Metrics | |
| 2025-08-29 03:21:18 - pico-train - INFO - โโโ Loss: 7.3242 | |
| 2025-08-29 03:21:18 - pico-train - INFO - โโโ Learning Rate: 3.89e-05 | |
| 2025-08-29 03:21:18 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:21:31 - pico-train - INFO - Step 6250 -- ๐ Training Metrics | |
| 2025-08-29 03:21:31 - pico-train - INFO - โโโ Loss: 7.2035 | |
| 2025-08-29 03:21:31 - pico-train - INFO - โโโ Learning Rate: 3.91e-05 | |
| 2025-08-29 03:21:31 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:21:44 - pico-train - INFO - Step 6275 -- ๐ Training Metrics | |
| 2025-08-29 03:21:44 - pico-train - INFO - โโโ Loss: 7.2334 | |
| 2025-08-29 03:21:44 - pico-train - INFO - โโโ Learning Rate: 3.92e-05 | |
| 2025-08-29 03:21:44 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:21:57 - pico-train - INFO - Step 6300 -- ๐ Training Metrics | |
| 2025-08-29 03:21:57 - pico-train - INFO - โโโ Loss: 7.2295 | |
| 2025-08-29 03:21:57 - pico-train - INFO - โโโ Learning Rate: 3.94e-05 | |
| 2025-08-29 03:21:57 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:22:10 - pico-train - INFO - Step 6325 -- ๐ Training Metrics | |
| 2025-08-29 03:22:10 - pico-train - INFO - โโโ Loss: 7.3051 | |
| 2025-08-29 03:22:10 - pico-train - INFO - โโโ Learning Rate: 3.95e-05 | |
| 2025-08-29 03:22:10 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:22:22 - pico-train - INFO - Step 6350 -- ๐ Training Metrics | |
| 2025-08-29 03:22:22 - pico-train - INFO - โโโ Loss: 7.3188 | |
| 2025-08-29 03:22:22 - pico-train - INFO - โโโ Learning Rate: 3.97e-05 | |
| 2025-08-29 03:22:22 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:22:35 - pico-train - INFO - Step 6375 -- ๐ Training Metrics | |
| 2025-08-29 03:22:35 - pico-train - INFO - โโโ Loss: 7.3212 | |
| 2025-08-29 03:22:35 - pico-train - INFO - โโโ Learning Rate: 3.98e-05 | |
| 2025-08-29 03:22:35 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:22:48 - pico-train - INFO - Step 6400 -- ๐ Training Metrics | |
| 2025-08-29 03:22:48 - pico-train - INFO - โโโ Loss: 7.2465 | |
| 2025-08-29 03:22:48 - pico-train - INFO - โโโ Learning Rate: 4.00e-05 | |
| 2025-08-29 03:22:48 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:23:01 - pico-train - INFO - Step 6425 -- ๐ Training Metrics | |
| 2025-08-29 03:23:01 - pico-train - INFO - โโโ Loss: 7.2081 | |
| 2025-08-29 03:23:01 - pico-train - INFO - โโโ Learning Rate: 4.02e-05 | |
| 2025-08-29 03:23:01 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:23:14 - pico-train - INFO - Step 6450 -- ๐ Training Metrics | |
| 2025-08-29 03:23:14 - pico-train - INFO - โโโ Loss: 7.2852 | |
| 2025-08-29 03:23:14 - pico-train - INFO - โโโ Learning Rate: 4.03e-05 | |
| 2025-08-29 03:23:14 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:23:26 - pico-train - INFO - Step 6475 -- ๐ Training Metrics | |
| 2025-08-29 03:23:26 - pico-train - INFO - โโโ Loss: 7.2074 | |
| 2025-08-29 03:23:26 - pico-train - INFO - โโโ Learning Rate: 4.05e-05 | |
| 2025-08-29 03:23:26 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:23:39 - pico-train - INFO - Step 6500 -- 💾 Saving Checkpoint | |
| 2025-08-29 03:25:35 - pico-train - INFO - Step 6500 -- ๐ Evaluation Results | |
| 2025-08-29 03:25:35 - pico-train - INFO - โโโ paloma: 7.3062353841856406e+19 | |
| 2025-08-29 03:25:37 - pico-train - INFO - Step 6500 -- ๐ Training Metrics | |
| 2025-08-29 03:25:37 - pico-train - INFO - โโโ Loss: 7.2520 | |
| 2025-08-29 03:25:37 - pico-train - INFO - โโโ Learning Rate: 4.06e-05 | |
| 2025-08-29 03:25:37 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:25:37 - pico-train - INFO - Step 6500 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 03:25:52 - pico-train - INFO - Step 6525 -- ๐ Training Metrics | |
| 2025-08-29 03:25:52 - pico-train - INFO - โโโ Loss: 7.2115 | |
| 2025-08-29 03:25:52 - pico-train - INFO - โโโ Learning Rate: 4.08e-05 | |
| 2025-08-29 03:25:52 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:26:05 - pico-train - INFO - Step 6550 -- ๐ Training Metrics | |
| 2025-08-29 03:26:05 - pico-train - INFO - โโโ Loss: 7.2435 | |
| 2025-08-29 03:26:05 - pico-train - INFO - โโโ Learning Rate: 4.09e-05 | |
| 2025-08-29 03:26:05 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:26:18 - pico-train - INFO - Step 6575 -- ๐ Training Metrics | |
| 2025-08-29 03:26:18 - pico-train - INFO - โโโ Loss: 7.1962 | |
| 2025-08-29 03:26:18 - pico-train - INFO - โโโ Learning Rate: 4.11e-05 | |
| 2025-08-29 03:26:18 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:26:30 - pico-train - INFO - Step 6600 -- ๐ Training Metrics | |
| 2025-08-29 03:26:30 - pico-train - INFO - โโโ Loss: 7.1631 | |
| 2025-08-29 03:26:30 - pico-train - INFO - โโโ Learning Rate: 4.12e-05 | |
| 2025-08-29 03:26:30 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:26:43 - pico-train - INFO - Step 6625 -- ๐ Training Metrics | |
| 2025-08-29 03:26:43 - pico-train - INFO - โโโ Loss: 7.2525 | |
| 2025-08-29 03:26:43 - pico-train - INFO - โโโ Learning Rate: 4.14e-05 | |
| 2025-08-29 03:26:43 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:26:56 - pico-train - INFO - Step 6650 -- ๐ Training Metrics | |
| 2025-08-29 03:26:56 - pico-train - INFO - โโโ Loss: 7.2133 | |
| 2025-08-29 03:26:56 - pico-train - INFO - โโโ Learning Rate: 4.16e-05 | |
| 2025-08-29 03:26:56 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:27:09 - pico-train - INFO - Step 6675 -- ๐ Training Metrics | |
| 2025-08-29 03:27:09 - pico-train - INFO - โโโ Loss: 7.2248 | |
| 2025-08-29 03:27:09 - pico-train - INFO - โโโ Learning Rate: 4.17e-05 | |
| 2025-08-29 03:27:09 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:27:22 - pico-train - INFO - Step 6700 -- ๐ Training Metrics | |
| 2025-08-29 03:27:22 - pico-train - INFO - โโโ Loss: 7.1928 | |
| 2025-08-29 03:27:22 - pico-train - INFO - โโโ Learning Rate: 4.19e-05 | |
| 2025-08-29 03:27:22 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:27:34 - pico-train - INFO - Step 6725 -- ๐ Training Metrics | |
| 2025-08-29 03:27:34 - pico-train - INFO - โโโ Loss: 7.1698 | |
| 2025-08-29 03:27:34 - pico-train - INFO - โโโ Learning Rate: 4.20e-05 | |
| 2025-08-29 03:27:34 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:27:47 - pico-train - INFO - Step 6750 -- ๐ Training Metrics | |
| 2025-08-29 03:27:47 - pico-train - INFO - โโโ Loss: 7.3037 | |
| 2025-08-29 03:27:47 - pico-train - INFO - โโโ Learning Rate: 4.22e-05 | |
| 2025-08-29 03:27:47 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:28:00 - pico-train - INFO - Step 6775 -- ๐ Training Metrics | |
| 2025-08-29 03:28:00 - pico-train - INFO - โโโ Loss: 7.2451 | |
| 2025-08-29 03:28:00 - pico-train - INFO - โโโ Learning Rate: 4.23e-05 | |
| 2025-08-29 03:28:00 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:28:13 - pico-train - INFO - Step 6800 -- ๐ Training Metrics | |
| 2025-08-29 03:28:13 - pico-train - INFO - โโโ Loss: 7.1373 | |
| 2025-08-29 03:28:13 - pico-train - INFO - โโโ Learning Rate: 4.25e-05 | |
| 2025-08-29 03:28:13 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:28:25 - pico-train - INFO - Step 6825 -- ๐ Training Metrics | |
| 2025-08-29 03:28:25 - pico-train - INFO - โโโ Loss: 7.1390 | |
| 2025-08-29 03:28:25 - pico-train - INFO - โโโ Learning Rate: 4.27e-05 | |
| 2025-08-29 03:28:25 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:28:38 - pico-train - INFO - Step 6850 -- ๐ Training Metrics | |
| 2025-08-29 03:28:38 - pico-train - INFO - โโโ Loss: 7.1296 | |
| 2025-08-29 03:28:38 - pico-train - INFO - โโโ Learning Rate: 4.28e-05 | |
| 2025-08-29 03:28:38 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:28:51 - pico-train - INFO - Step 6875 -- ๐ Training Metrics | |
| 2025-08-29 03:28:51 - pico-train - INFO - โโโ Loss: 7.0961 | |
| 2025-08-29 03:28:51 - pico-train - INFO - โโโ Learning Rate: 4.30e-05 | |
| 2025-08-29 03:28:51 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:29:04 - pico-train - INFO - Step 6900 -- ๐ Training Metrics | |
| 2025-08-29 03:29:04 - pico-train - INFO - โโโ Loss: 7.1408 | |
| 2025-08-29 03:29:04 - pico-train - INFO - โโโ Learning Rate: 4.31e-05 | |
| 2025-08-29 03:29:04 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:29:16 - pico-train - INFO - Step 6925 -- ๐ Training Metrics | |
| 2025-08-29 03:29:16 - pico-train - INFO - โโโ Loss: 7.1852 | |
| 2025-08-29 03:29:16 - pico-train - INFO - โโโ Learning Rate: 4.33e-05 | |
| 2025-08-29 03:29:16 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:29:29 - pico-train - INFO - Step 6950 -- ๐ Training Metrics | |
| 2025-08-29 03:29:29 - pico-train - INFO - โโโ Loss: 7.2067 | |
| 2025-08-29 03:29:29 - pico-train - INFO - โโโ Learning Rate: 4.34e-05 | |
| 2025-08-29 03:29:29 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:29:42 - pico-train - INFO - Step 6975 -- ๐ Training Metrics | |
| 2025-08-29 03:29:42 - pico-train - INFO - โโโ Loss: 7.0681 | |
| 2025-08-29 03:29:42 - pico-train - INFO - โโโ Learning Rate: 4.36e-05 | |
| 2025-08-29 03:29:42 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:29:54 - pico-train - INFO - Step 7000 -- 💾 Saving Checkpoint | |
| 2025-08-29 03:31:56 - pico-train - INFO - Step 7000 -- ๐ Evaluation Results | |
| 2025-08-29 03:31:56 - pico-train - INFO - โโโ paloma: 1.2357969480287024e+20 | |
| 2025-08-29 03:31:58 - pico-train - INFO - Step 7000 -- ๐ Training Metrics | |
| 2025-08-29 03:31:58 - pico-train - INFO - โโโ Loss: 7.1813 | |
| 2025-08-29 03:31:58 - pico-train - INFO - โโโ Learning Rate: 4.37e-05 | |
| 2025-08-29 03:31:58 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:31:58 - pico-train - INFO - Step 7000 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 03:32:14 - pico-train - INFO - Step 7025 -- ๐ Training Metrics | |
| 2025-08-29 03:32:14 - pico-train - INFO - โโโ Loss: 7.1992 | |
| 2025-08-29 03:32:14 - pico-train - INFO - โโโ Learning Rate: 4.39e-05 | |
| 2025-08-29 03:32:14 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:32:27 - pico-train - INFO - Step 7050 -- ๐ Training Metrics | |
| 2025-08-29 03:32:27 - pico-train - INFO - โโโ Loss: 7.1409 | |
| 2025-08-29 03:32:27 - pico-train - INFO - โโโ Learning Rate: 4.41e-05 | |
| 2025-08-29 03:32:27 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:32:40 - pico-train - INFO - Step 7075 -- ๐ Training Metrics | |
| 2025-08-29 03:32:40 - pico-train - INFO - โโโ Loss: 7.1271 | |
| 2025-08-29 03:32:40 - pico-train - INFO - โโโ Learning Rate: 4.42e-05 | |
| 2025-08-29 03:32:40 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:32:52 - pico-train - INFO - Step 7100 -- ๐ Training Metrics | |
| 2025-08-29 03:32:52 - pico-train - INFO - โโโ Loss: 7.1720 | |
| 2025-08-29 03:32:52 - pico-train - INFO - โโโ Learning Rate: 4.44e-05 | |
| 2025-08-29 03:32:52 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:33:06 - pico-train - INFO - Step 7125 -- ๐ Training Metrics | |
| 2025-08-29 03:33:06 - pico-train - INFO - โโโ Loss: 7.1515 | |
| 2025-08-29 03:33:06 - pico-train - INFO - โโโ Learning Rate: 4.45e-05 | |
| 2025-08-29 03:33:06 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:33:19 - pico-train - INFO - Step 7150 -- ๐ Training Metrics | |
| 2025-08-29 03:33:19 - pico-train - INFO - โโโ Loss: 7.0898 | |
| 2025-08-29 03:33:19 - pico-train - INFO - โโโ Learning Rate: 4.47e-05 | |
| 2025-08-29 03:33:19 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:33:31 - pico-train - INFO - Step 7175 -- ๐ Training Metrics | |
| 2025-08-29 03:33:31 - pico-train - INFO - โโโ Loss: 7.0996 | |
| 2025-08-29 03:33:31 - pico-train - INFO - โโโ Learning Rate: 4.48e-05 | |
| 2025-08-29 03:33:31 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:33:44 - pico-train - INFO - Step 7200 -- ๐ Training Metrics | |
| 2025-08-29 03:33:44 - pico-train - INFO - โโโ Loss: 7.0610 | |
| 2025-08-29 03:33:44 - pico-train - INFO - โโโ Learning Rate: 4.50e-05 | |
| 2025-08-29 03:33:44 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:33:57 - pico-train - INFO - Step 7225 -- ๐ Training Metrics | |
| 2025-08-29 03:33:57 - pico-train - INFO - โโโ Loss: 7.1939 | |
| 2025-08-29 03:33:57 - pico-train - INFO - โโโ Learning Rate: 4.52e-05 | |
| 2025-08-29 03:33:57 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:34:10 - pico-train - INFO - Step 7250 -- ๐ Training Metrics | |
| 2025-08-29 03:34:10 - pico-train - INFO - โโโ Loss: 7.0355 | |
| 2025-08-29 03:34:10 - pico-train - INFO - โโโ Learning Rate: 4.53e-05 | |
| 2025-08-29 03:34:10 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:34:22 - pico-train - INFO - Step 7275 -- ๐ Training Metrics | |
| 2025-08-29 03:34:22 - pico-train - INFO - โโโ Loss: 7.0935 | |
| 2025-08-29 03:34:22 - pico-train - INFO - โโโ Learning Rate: 4.55e-05 | |
| 2025-08-29 03:34:22 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:34:35 - pico-train - INFO - Step 7300 -- ๐ Training Metrics | |
| 2025-08-29 03:34:35 - pico-train - INFO - โโโ Loss: 7.0689 | |
| 2025-08-29 03:34:35 - pico-train - INFO - โโโ Learning Rate: 4.56e-05 | |
| 2025-08-29 03:34:35 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:34:48 - pico-train - INFO - Step 7325 -- ๐ Training Metrics | |
| 2025-08-29 03:34:48 - pico-train - INFO - โโโ Loss: 7.0265 | |
| 2025-08-29 03:34:48 - pico-train - INFO - โโโ Learning Rate: 4.58e-05 | |
| 2025-08-29 03:34:48 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:35:01 - pico-train - INFO - Step 7350 -- ๐ Training Metrics | |
| 2025-08-29 03:35:01 - pico-train - INFO - โโโ Loss: 7.0963 | |
| 2025-08-29 03:35:01 - pico-train - INFO - โโโ Learning Rate: 4.59e-05 | |
| 2025-08-29 03:35:01 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:35:14 - pico-train - INFO - Step 7375 -- ๐ Training Metrics | |
| 2025-08-29 03:35:14 - pico-train - INFO - โโโ Loss: 7.1138 | |
| 2025-08-29 03:35:14 - pico-train - INFO - โโโ Learning Rate: 4.61e-05 | |
| 2025-08-29 03:35:14 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:35:26 - pico-train - INFO - Step 7400 -- ๐ Training Metrics | |
| 2025-08-29 03:35:26 - pico-train - INFO - โโโ Loss: 7.0414 | |
| 2025-08-29 03:35:26 - pico-train - INFO - โโโ Learning Rate: 4.63e-05 | |
| 2025-08-29 03:35:26 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:35:39 - pico-train - INFO - Step 7425 -- ๐ Training Metrics | |
| 2025-08-29 03:35:39 - pico-train - INFO - โโโ Loss: 7.0753 | |
| 2025-08-29 03:35:39 - pico-train - INFO - โโโ Learning Rate: 4.64e-05 | |
| 2025-08-29 03:35:39 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:35:52 - pico-train - INFO - Step 7450 -- ๐ Training Metrics | |
| 2025-08-29 03:35:52 - pico-train - INFO - โโโ Loss: 7.0603 | |
| 2025-08-29 03:35:52 - pico-train - INFO - โโโ Learning Rate: 4.66e-05 | |
| 2025-08-29 03:35:52 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:36:04 - pico-train - INFO - Step 7475 -- ๐ Training Metrics | |
| 2025-08-29 03:36:04 - pico-train - INFO - โโโ Loss: 7.0818 | |
| 2025-08-29 03:36:04 - pico-train - INFO - โโโ Learning Rate: 4.67e-05 | |
| 2025-08-29 03:36:04 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:36:17 - pico-train - INFO - Step 7500 -- 💾 Saving Checkpoint | |
| 2025-08-29 03:38:12 - pico-train - INFO - Step 7500 -- ๐ Evaluation Results | |
| 2025-08-29 03:38:12 - pico-train - INFO - โโโ paloma: 2.7199371732053928e+20 | |
| 2025-08-29 03:38:14 - pico-train - INFO - Step 7500 -- ๐ Training Metrics | |
| 2025-08-29 03:38:14 - pico-train - INFO - โโโ Loss: 7.0788 | |
| 2025-08-29 03:38:14 - pico-train - INFO - โโโ Learning Rate: 4.69e-05 | |
| 2025-08-29 03:38:14 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:38:14 - pico-train - INFO - Step 7500 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 03:38:29 - pico-train - INFO - Step 7525 -- ๐ Training Metrics | |
| 2025-08-29 03:38:29 - pico-train - INFO - โโโ Loss: 6.9952 | |
| 2025-08-29 03:38:29 - pico-train - INFO - โโโ Learning Rate: 4.70e-05 | |
| 2025-08-29 03:38:29 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:38:42 - pico-train - INFO - Step 7550 -- ๐ Training Metrics | |
| 2025-08-29 03:38:42 - pico-train - INFO - โโโ Loss: 7.0114 | |
| 2025-08-29 03:38:42 - pico-train - INFO - โโโ Learning Rate: 4.72e-05 | |
| 2025-08-29 03:38:42 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:38:55 - pico-train - INFO - Step 7575 -- ๐ Training Metrics | |
| 2025-08-29 03:38:55 - pico-train - INFO - โโโ Loss: 7.0611 | |
| 2025-08-29 03:38:55 - pico-train - INFO - โโโ Learning Rate: 4.73e-05 | |
| 2025-08-29 03:38:55 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:39:07 - pico-train - INFO - Step 7600 -- ๐ Training Metrics | |
| 2025-08-29 03:39:07 - pico-train - INFO - โโโ Loss: 7.0057 | |
| 2025-08-29 03:39:07 - pico-train - INFO - โโโ Learning Rate: 4.75e-05 | |
| 2025-08-29 03:39:07 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:39:21 - pico-train - INFO - Step 7625 -- ๐ Training Metrics | |
| 2025-08-29 03:39:21 - pico-train - INFO - โโโ Loss: 7.0182 | |
| 2025-08-29 03:39:21 - pico-train - INFO - โโโ Learning Rate: 4.77e-05 | |
| 2025-08-29 03:39:21 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:39:33 - pico-train - INFO - Step 7650 -- ๐ Training Metrics | |
| 2025-08-29 03:39:33 - pico-train - INFO - โโโ Loss: 7.0271 | |
| 2025-08-29 03:39:33 - pico-train - INFO - โโโ Learning Rate: 4.78e-05 | |
| 2025-08-29 03:39:33 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:39:46 - pico-train - INFO - Step 7675 -- ๐ Training Metrics | |
| 2025-08-29 03:39:46 - pico-train - INFO - โโโ Loss: 7.0817 | |
| 2025-08-29 03:39:46 - pico-train - INFO - โโโ Learning Rate: 4.80e-05 | |
| 2025-08-29 03:39:46 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:39:59 - pico-train - INFO - Step 7700 -- ๐ Training Metrics | |
| 2025-08-29 03:39:59 - pico-train - INFO - โโโ Loss: 7.0859 | |
| 2025-08-29 03:39:59 - pico-train - INFO - โโโ Learning Rate: 4.81e-05 | |
| 2025-08-29 03:39:59 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:40:12 - pico-train - INFO - Step 7725 -- ๐ Training Metrics | |
| 2025-08-29 03:40:12 - pico-train - INFO - โโโ Loss: 6.9859 | |
| 2025-08-29 03:40:12 - pico-train - INFO - โโโ Learning Rate: 4.83e-05 | |
| 2025-08-29 03:40:12 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:40:24 - pico-train - INFO - Step 7750 -- ๐ Training Metrics | |
| 2025-08-29 03:40:24 - pico-train - INFO - โโโ Loss: 7.0380 | |
| 2025-08-29 03:40:24 - pico-train - INFO - โโโ Learning Rate: 4.84e-05 | |
| 2025-08-29 03:40:24 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:40:37 - pico-train - INFO - Step 7775 -- ๐ Training Metrics | |
| 2025-08-29 03:40:37 - pico-train - INFO - โโโ Loss: 6.9784 | |
| 2025-08-29 03:40:37 - pico-train - INFO - โโโ Learning Rate: 4.86e-05 | |
| 2025-08-29 03:40:37 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:40:50 - pico-train - INFO - Step 7800 -- ๐ Training Metrics | |
| 2025-08-29 03:40:50 - pico-train - INFO - โโโ Loss: 7.0304 | |
| 2025-08-29 03:40:50 - pico-train - INFO - โโโ Learning Rate: 4.87e-05 | |
| 2025-08-29 03:40:50 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:41:03 - pico-train - INFO - Step 7825 -- ๐ Training Metrics | |
| 2025-08-29 03:41:03 - pico-train - INFO - โโโ Loss: 7.0000 | |
| 2025-08-29 03:41:03 - pico-train - INFO - โโโ Learning Rate: 4.89e-05 | |
| 2025-08-29 03:41:03 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:41:15 - pico-train - INFO - Step 7850 -- ๐ Training Metrics | |
| 2025-08-29 03:41:15 - pico-train - INFO - โโโ Loss: 7.0159 | |
| 2025-08-29 03:41:15 - pico-train - INFO - โโโ Learning Rate: 4.91e-05 | |
| 2025-08-29 03:41:15 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:41:28 - pico-train - INFO - Step 7875 -- ๐ Training Metrics | |
| 2025-08-29 03:41:28 - pico-train - INFO - โโโ Loss: 6.9859 | |
| 2025-08-29 03:41:28 - pico-train - INFO - โโโ Learning Rate: 4.92e-05 | |
| 2025-08-29 03:41:28 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:41:41 - pico-train - INFO - Step 7900 -- ๐ Training Metrics | |
| 2025-08-29 03:41:41 - pico-train - INFO - โโโ Loss: 6.9348 | |
| 2025-08-29 03:41:41 - pico-train - INFO - โโโ Learning Rate: 4.94e-05 | |
| 2025-08-29 03:41:41 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:41:54 - pico-train - INFO - Step 7925 -- ๐ Training Metrics | |
| 2025-08-29 03:41:54 - pico-train - INFO - โโโ Loss: 6.9541 | |
| 2025-08-29 03:41:54 - pico-train - INFO - โโโ Learning Rate: 4.95e-05 | |
| 2025-08-29 03:41:54 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:42:06 - pico-train - INFO - Step 7950 -- ๐ Training Metrics | |
| 2025-08-29 03:42:06 - pico-train - INFO - โโโ Loss: 6.9342 | |
| 2025-08-29 03:42:06 - pico-train - INFO - โโโ Learning Rate: 4.97e-05 | |
| 2025-08-29 03:42:06 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:42:19 - pico-train - INFO - Step 7975 -- ๐ Training Metrics | |
| 2025-08-29 03:42:19 - pico-train - INFO - โโโ Loss: 7.0294 | |
| 2025-08-29 03:42:19 - pico-train - INFO - โโโ Learning Rate: 4.98e-05 | |
| 2025-08-29 03:42:19 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:42:31 - pico-train - INFO - Step 8000 -- ๐พ Saving Checkpoint | |
| 2025-08-29 03:44:33 - pico-train - INFO - Step 8000 -- ๐ Evaluation Results | |
| 2025-08-29 03:44:33 - pico-train - INFO - โโโ paloma: 7.181862506006892e+20 | |
| 2025-08-29 03:44:34 - pico-train - INFO - Step 8000 -- ๐ Training Metrics | |
| 2025-08-29 03:44:34 - pico-train - INFO - โโโ Loss: 7.0412 | |
| 2025-08-29 03:44:34 - pico-train - INFO - โโโ Learning Rate: 5.00e-05 | |
| 2025-08-29 03:44:34 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:44:34 - pico-train - INFO - Step 8000 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 03:44:49 - pico-train - INFO - Step 8025 -- ๐ Training Metrics | |
| 2025-08-29 03:44:49 - pico-train - INFO - โโโ Loss: 6.9111 | |
| 2025-08-29 03:44:49 - pico-train - INFO - โโโ Learning Rate: 4.99e-05 | |
| 2025-08-29 03:44:49 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:45:02 - pico-train - INFO - Step 8050 -- ๐ Training Metrics | |
| 2025-08-29 03:45:02 - pico-train - INFO - โโโ Loss: 7.0142 | |
| 2025-08-29 03:45:02 - pico-train - INFO - โโโ Learning Rate: 4.98e-05 | |
| 2025-08-29 03:45:02 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:45:15 - pico-train - INFO - Step 8075 -- ๐ Training Metrics | |
| 2025-08-29 03:45:15 - pico-train - INFO - โโโ Loss: 6.9201 | |
| 2025-08-29 03:45:15 - pico-train - INFO - โโโ Learning Rate: 4.97e-05 | |
| 2025-08-29 03:45:15 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:45:28 - pico-train - INFO - Step 8100 -- ๐ Training Metrics | |
| 2025-08-29 03:45:28 - pico-train - INFO - โโโ Loss: 6.9100 | |
| 2025-08-29 03:45:28 - pico-train - INFO - โโโ Learning Rate: 4.96e-05 | |
| 2025-08-29 03:45:28 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:45:41 - pico-train - INFO - Step 8125 -- ๐ Training Metrics | |
| 2025-08-29 03:45:41 - pico-train - INFO - โโโ Loss: 6.9728 | |
| 2025-08-29 03:45:41 - pico-train - INFO - โโโ Learning Rate: 4.95e-05 | |
| 2025-08-29 03:45:41 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:45:54 - pico-train - INFO - Step 8150 -- ๐ Training Metrics | |
| 2025-08-29 03:45:54 - pico-train - INFO - โโโ Loss: 6.9963 | |
| 2025-08-29 03:45:54 - pico-train - INFO - โโโ Learning Rate: 4.94e-05 | |
| 2025-08-29 03:45:54 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:46:07 - pico-train - INFO - Step 8175 -- ๐ Training Metrics | |
| 2025-08-29 03:46:07 - pico-train - INFO - โโโ Loss: 7.0077 | |
| 2025-08-29 03:46:07 - pico-train - INFO - โโโ Learning Rate: 4.93e-05 | |
| 2025-08-29 03:46:07 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:46:19 - pico-train - INFO - Step 8200 -- ๐ Training Metrics | |
| 2025-08-29 03:46:19 - pico-train - INFO - โโโ Loss: 6.8808 | |
| 2025-08-29 03:46:19 - pico-train - INFO - โโโ Learning Rate: 4.92e-05 | |
| 2025-08-29 03:46:19 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:46:32 - pico-train - INFO - Step 8225 -- ๐ Training Metrics | |
| 2025-08-29 03:46:32 - pico-train - INFO - โโโ Loss: 6.8500 | |
| 2025-08-29 03:46:32 - pico-train - INFO - โโโ Learning Rate: 4.91e-05 | |
| 2025-08-29 03:46:32 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:46:45 - pico-train - INFO - Step 8250 -- ๐ Training Metrics | |
| 2025-08-29 03:46:45 - pico-train - INFO - โโโ Loss: 6.9328 | |
| 2025-08-29 03:46:45 - pico-train - INFO - โโโ Learning Rate: 4.90e-05 | |
| 2025-08-29 03:46:45 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:46:58 - pico-train - INFO - Step 8275 -- ๐ Training Metrics | |
| 2025-08-29 03:46:58 - pico-train - INFO - โโโ Loss: 6.8971 | |
| 2025-08-29 03:46:58 - pico-train - INFO - โโโ Learning Rate: 4.89e-05 | |
| 2025-08-29 03:46:58 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:47:11 - pico-train - INFO - Step 8300 -- ๐ Training Metrics | |
| 2025-08-29 03:47:11 - pico-train - INFO - โโโ Loss: 6.9635 | |
| 2025-08-29 03:47:11 - pico-train - INFO - โโโ Learning Rate: 4.87e-05 | |
| 2025-08-29 03:47:11 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:47:24 - pico-train - INFO - Step 8325 -- ๐ Training Metrics | |
| 2025-08-29 03:47:24 - pico-train - INFO - โโโ Loss: 6.8937 | |
| 2025-08-29 03:47:24 - pico-train - INFO - โโโ Learning Rate: 4.86e-05 | |
| 2025-08-29 03:47:24 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:47:36 - pico-train - INFO - Step 8350 -- ๐ Training Metrics | |
| 2025-08-29 03:47:36 - pico-train - INFO - โโโ Loss: 6.8578 | |
| 2025-08-29 03:47:36 - pico-train - INFO - โโโ Learning Rate: 4.85e-05 | |
| 2025-08-29 03:47:36 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:47:49 - pico-train - INFO - Step 8375 -- ๐ Training Metrics | |
| 2025-08-29 03:47:49 - pico-train - INFO - โโโ Loss: 6.9492 | |
| 2025-08-29 03:47:49 - pico-train - INFO - โโโ Learning Rate: 4.84e-05 | |
| 2025-08-29 03:47:49 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:48:02 - pico-train - INFO - Step 8400 -- ๐ Training Metrics | |
| 2025-08-29 03:48:02 - pico-train - INFO - โโโ Loss: 6.8896 | |
| 2025-08-29 03:48:02 - pico-train - INFO - โโโ Learning Rate: 4.83e-05 | |
| 2025-08-29 03:48:02 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:48:15 - pico-train - INFO - Step 8425 -- ๐ Training Metrics | |
| 2025-08-29 03:48:15 - pico-train - INFO - โโโ Loss: 6.9677 | |
| 2025-08-29 03:48:15 - pico-train - INFO - โโโ Learning Rate: 4.82e-05 | |
| 2025-08-29 03:48:15 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:48:28 - pico-train - INFO - Step 8450 -- ๐ Training Metrics | |
| 2025-08-29 03:48:28 - pico-train - INFO - โโโ Loss: 6.9071 | |
| 2025-08-29 03:48:28 - pico-train - INFO - โโโ Learning Rate: 4.81e-05 | |
| 2025-08-29 03:48:28 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:48:40 - pico-train - INFO - Step 8475 -- ๐ Training Metrics | |
| 2025-08-29 03:48:40 - pico-train - INFO - โโโ Loss: 6.8973 | |
| 2025-08-29 03:48:40 - pico-train - INFO - โโโ Learning Rate: 4.80e-05 | |
| 2025-08-29 03:48:40 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:48:53 - pico-train - INFO - Step 8500 -- ๐พ Saving Checkpoint | |
| 2025-08-29 03:50:48 - pico-train - INFO - Step 8500 -- ๐ Evaluation Results | |
| 2025-08-29 03:50:48 - pico-train - INFO - โโโ paloma: 1.5123285241831744e+21 | |
| 2025-08-29 03:50:50 - pico-train - INFO - Step 8500 -- ๐ Training Metrics | |
| 2025-08-29 03:50:50 - pico-train - INFO - โโโ Loss: 6.9139 | |
| 2025-08-29 03:50:50 - pico-train - INFO - โโโ Learning Rate: 4.79e-05 | |
| 2025-08-29 03:50:50 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:50:50 - pico-train - INFO - Step 8500 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 03:51:05 - pico-train - INFO - Step 8525 -- ๐ Training Metrics | |
| 2025-08-29 03:51:05 - pico-train - INFO - โโโ Loss: 6.8983 | |
| 2025-08-29 03:51:05 - pico-train - INFO - โโโ Learning Rate: 4.78e-05 | |
| 2025-08-29 03:51:05 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:51:18 - pico-train - INFO - Step 8550 -- ๐ Training Metrics | |
| 2025-08-29 03:51:18 - pico-train - INFO - โโโ Loss: 6.8446 | |
| 2025-08-29 03:51:18 - pico-train - INFO - โโโ Learning Rate: 4.77e-05 | |
| 2025-08-29 03:51:18 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:51:31 - pico-train - INFO - Step 8575 -- ๐ Training Metrics | |
| 2025-08-29 03:51:31 - pico-train - INFO - โโโ Loss: 6.8246 | |
| 2025-08-29 03:51:31 - pico-train - INFO - โโโ Learning Rate: 4.76e-05 | |
| 2025-08-29 03:51:31 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:51:43 - pico-train - INFO - Step 8600 -- ๐ Training Metrics | |
| 2025-08-29 03:51:43 - pico-train - INFO - โโโ Loss: 6.9637 | |
| 2025-08-29 03:51:43 - pico-train - INFO - โโโ Learning Rate: 4.75e-05 | |
| 2025-08-29 03:51:43 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:51:56 - pico-train - INFO - Step 8625 -- ๐ Training Metrics | |
| 2025-08-29 03:51:56 - pico-train - INFO - โโโ Loss: 6.8827 | |
| 2025-08-29 03:51:56 - pico-train - INFO - โโโ Learning Rate: 4.74e-05 | |
| 2025-08-29 03:51:56 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:52:09 - pico-train - INFO - Step 8650 -- ๐ Training Metrics | |
| 2025-08-29 03:52:09 - pico-train - INFO - โโโ Loss: 6.8234 | |
| 2025-08-29 03:52:09 - pico-train - INFO - โโโ Learning Rate: 4.73e-05 | |
| 2025-08-29 03:52:09 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:52:22 - pico-train - INFO - Step 8675 -- ๐ Training Metrics | |
| 2025-08-29 03:52:22 - pico-train - INFO - โโโ Loss: 6.8270 | |
| 2025-08-29 03:52:22 - pico-train - INFO - โโโ Learning Rate: 4.72e-05 | |
| 2025-08-29 03:52:22 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:52:35 - pico-train - INFO - Step 8700 -- ๐ Training Metrics | |
| 2025-08-29 03:52:35 - pico-train - INFO - โโโ Loss: 6.9554 | |
| 2025-08-29 03:52:35 - pico-train - INFO - โโโ Learning Rate: 4.71e-05 | |
| 2025-08-29 03:52:35 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:52:48 - pico-train - INFO - Step 8725 -- ๐ Training Metrics | |
| 2025-08-29 03:52:48 - pico-train - INFO - โโโ Loss: 6.8406 | |
| 2025-08-29 03:52:48 - pico-train - INFO - โโโ Learning Rate: 4.70e-05 | |
| 2025-08-29 03:52:48 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:53:00 - pico-train - INFO - Step 8750 -- ๐ Training Metrics | |
| 2025-08-29 03:53:00 - pico-train - INFO - โโโ Loss: 6.8328 | |
| 2025-08-29 03:53:00 - pico-train - INFO - โโโ Learning Rate: 4.69e-05 | |
| 2025-08-29 03:53:00 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:53:13 - pico-train - INFO - Step 8775 -- ๐ Training Metrics | |
| 2025-08-29 03:53:13 - pico-train - INFO - โโโ Loss: 6.8362 | |
| 2025-08-29 03:53:13 - pico-train - INFO - โโโ Learning Rate: 4.68e-05 | |
| 2025-08-29 03:53:13 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:53:26 - pico-train - INFO - Step 8800 -- ๐ Training Metrics | |
| 2025-08-29 03:53:26 - pico-train - INFO - โโโ Loss: 6.8417 | |
| 2025-08-29 03:53:26 - pico-train - INFO - โโโ Learning Rate: 4.67e-05 | |
| 2025-08-29 03:53:26 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:53:39 - pico-train - INFO - Step 8825 -- ๐ Training Metrics | |
| 2025-08-29 03:53:39 - pico-train - INFO - โโโ Loss: 6.8248 | |
| 2025-08-29 03:53:39 - pico-train - INFO - โโโ Learning Rate: 4.66e-05 | |
| 2025-08-29 03:53:39 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:53:52 - pico-train - INFO - Step 8850 -- ๐ Training Metrics | |
| 2025-08-29 03:53:52 - pico-train - INFO - โโโ Loss: 6.7996 | |
| 2025-08-29 03:53:52 - pico-train - INFO - โโโ Learning Rate: 4.65e-05 | |
| 2025-08-29 03:53:52 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:54:04 - pico-train - INFO - Step 8875 -- ๐ Training Metrics | |
| 2025-08-29 03:54:04 - pico-train - INFO - โโโ Loss: 6.7804 | |
| 2025-08-29 03:54:04 - pico-train - INFO - โโโ Learning Rate: 4.64e-05 | |
| 2025-08-29 03:54:04 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:54:17 - pico-train - INFO - Step 8900 -- ๐ Training Metrics | |
| 2025-08-29 03:54:17 - pico-train - INFO - โโโ Loss: 6.8802 | |
| 2025-08-29 03:54:17 - pico-train - INFO - โโโ Learning Rate: 4.63e-05 | |
| 2025-08-29 03:54:17 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:54:30 - pico-train - INFO - Step 8925 -- ๐ Training Metrics | |
| 2025-08-29 03:54:30 - pico-train - INFO - โโโ Loss: 6.8586 | |
| 2025-08-29 03:54:30 - pico-train - INFO - โโโ Learning Rate: 4.61e-05 | |
| 2025-08-29 03:54:30 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:54:43 - pico-train - INFO - Step 8950 -- ๐ Training Metrics | |
| 2025-08-29 03:54:43 - pico-train - INFO - โโโ Loss: 6.8489 | |
| 2025-08-29 03:54:43 - pico-train - INFO - โโโ Learning Rate: 4.60e-05 | |
| 2025-08-29 03:54:43 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:54:56 - pico-train - INFO - Step 8975 -- ๐ Training Metrics | |
| 2025-08-29 03:54:56 - pico-train - INFO - โโโ Loss: 6.8592 | |
| 2025-08-29 03:54:56 - pico-train - INFO - โโโ Learning Rate: 4.59e-05 | |
| 2025-08-29 03:54:56 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:55:08 - pico-train - INFO - Step 9000 -- ๐พ Saving Checkpoint | |
| 2025-08-29 03:57:09 - pico-train - INFO - Step 9000 -- ๐ Evaluation Results | |
| 2025-08-29 03:57:09 - pico-train - INFO - โโโ paloma: 3.573074534351724e+21 | |
| 2025-08-29 03:57:11 - pico-train - INFO - Step 9000 -- ๐ Training Metrics | |
| 2025-08-29 03:57:11 - pico-train - INFO - โโโ Loss: 6.8302 | |
| 2025-08-29 03:57:11 - pico-train - INFO - โโโ Learning Rate: 4.58e-05 | |
| 2025-08-29 03:57:11 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:57:11 - pico-train - INFO - Step 9000 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 03:57:27 - pico-train - INFO - Step 9025 -- ๐ Training Metrics | |
| 2025-08-29 03:57:27 - pico-train - INFO - โโโ Loss: 6.8310 | |
| 2025-08-29 03:57:27 - pico-train - INFO - โโโ Learning Rate: 4.57e-05 | |
| 2025-08-29 03:57:27 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:57:40 - pico-train - INFO - Step 9050 -- ๐ Training Metrics | |
| 2025-08-29 03:57:40 - pico-train - INFO - โโโ Loss: 6.7991 | |
| 2025-08-29 03:57:40 - pico-train - INFO - โโโ Learning Rate: 4.56e-05 | |
| 2025-08-29 03:57:40 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:57:53 - pico-train - INFO - Step 9075 -- ๐ Training Metrics | |
| 2025-08-29 03:57:53 - pico-train - INFO - โโโ Loss: 6.8311 | |
| 2025-08-29 03:57:53 - pico-train - INFO - โโโ Learning Rate: 4.55e-05 | |
| 2025-08-29 03:57:53 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:58:06 - pico-train - INFO - Step 9100 -- ๐ Training Metrics | |
| 2025-08-29 03:58:06 - pico-train - INFO - โโโ Loss: 6.7647 | |
| 2025-08-29 03:58:06 - pico-train - INFO - โโโ Learning Rate: 4.54e-05 | |
| 2025-08-29 03:58:06 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:58:19 - pico-train - INFO - Step 9125 -- ๐ Training Metrics | |
| 2025-08-29 03:58:19 - pico-train - INFO - โโโ Loss: 6.8225 | |
| 2025-08-29 03:58:19 - pico-train - INFO - โโโ Learning Rate: 4.53e-05 | |
| 2025-08-29 03:58:19 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:58:32 - pico-train - INFO - Step 9150 -- ๐ Training Metrics | |
| 2025-08-29 03:58:32 - pico-train - INFO - โโโ Loss: 6.7571 | |
| 2025-08-29 03:58:32 - pico-train - INFO - โโโ Learning Rate: 4.52e-05 | |
| 2025-08-29 03:58:32 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:58:44 - pico-train - INFO - Step 9175 -- ๐ Training Metrics | |
| 2025-08-29 03:58:44 - pico-train - INFO - โโโ Loss: 6.8060 | |
| 2025-08-29 03:58:44 - pico-train - INFO - โโโ Learning Rate: 4.51e-05 | |
| 2025-08-29 03:58:44 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:58:57 - pico-train - INFO - Step 9200 -- ๐ Training Metrics | |
| 2025-08-29 03:58:57 - pico-train - INFO - โโโ Loss: 6.8348 | |
| 2025-08-29 03:58:57 - pico-train - INFO - โโโ Learning Rate: 4.50e-05 | |
| 2025-08-29 03:58:57 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:59:10 - pico-train - INFO - Step 9225 -- ๐ Training Metrics | |
| 2025-08-29 03:59:10 - pico-train - INFO - โโโ Loss: 6.9131 | |
| 2025-08-29 03:59:10 - pico-train - INFO - โโโ Learning Rate: 4.49e-05 | |
| 2025-08-29 03:59:10 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:59:23 - pico-train - INFO - Step 9250 -- ๐ Training Metrics | |
| 2025-08-29 03:59:23 - pico-train - INFO - โโโ Loss: 6.7801 | |
| 2025-08-29 03:59:23 - pico-train - INFO - โโโ Learning Rate: 4.48e-05 | |
| 2025-08-29 03:59:23 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:59:36 - pico-train - INFO - Step 9275 -- ๐ Training Metrics | |
| 2025-08-29 03:59:36 - pico-train - INFO - โโโ Loss: 6.7776 | |
| 2025-08-29 03:59:36 - pico-train - INFO - โโโ Learning Rate: 4.47e-05 | |
| 2025-08-29 03:59:36 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 03:59:49 - pico-train - INFO - Step 9300 -- ๐ Training Metrics | |
| 2025-08-29 03:59:49 - pico-train - INFO - โโโ Loss: 6.7160 | |
| 2025-08-29 03:59:49 - pico-train - INFO - โโโ Learning Rate: 4.46e-05 | |
| 2025-08-29 03:59:49 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:00:01 - pico-train - INFO - Step 9325 -- ๐ Training Metrics | |
| 2025-08-29 04:00:01 - pico-train - INFO - โโโ Loss: 6.8958 | |
| 2025-08-29 04:00:01 - pico-train - INFO - โโโ Learning Rate: 4.45e-05 | |
| 2025-08-29 04:00:01 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:00:14 - pico-train - INFO - Step 9350 -- ๐ Training Metrics | |
| 2025-08-29 04:00:14 - pico-train - INFO - โโโ Loss: 6.8734 | |
| 2025-08-29 04:00:14 - pico-train - INFO - โโโ Learning Rate: 4.44e-05 | |
| 2025-08-29 04:00:14 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:00:27 - pico-train - INFO - Step 9375 -- ๐ Training Metrics | |
| 2025-08-29 04:00:27 - pico-train - INFO - โโโ Loss: 6.7203 | |
| 2025-08-29 04:00:27 - pico-train - INFO - โโโ Learning Rate: 4.43e-05 | |
| 2025-08-29 04:00:27 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:00:40 - pico-train - INFO - Step 9400 -- ๐ Training Metrics | |
| 2025-08-29 04:00:40 - pico-train - INFO - โโโ Loss: 6.7133 | |
| 2025-08-29 04:00:40 - pico-train - INFO - โโโ Learning Rate: 4.42e-05 | |
| 2025-08-29 04:00:40 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:00:52 - pico-train - INFO - Step 9425 -- ๐ Training Metrics | |
| 2025-08-29 04:00:52 - pico-train - INFO - โโโ Loss: 6.8392 | |
| 2025-08-29 04:00:52 - pico-train - INFO - โโโ Learning Rate: 4.41e-05 | |
| 2025-08-29 04:00:52 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:01:05 - pico-train - INFO - Step 9450 -- ๐ Training Metrics | |
| 2025-08-29 04:01:05 - pico-train - INFO - โโโ Loss: 6.7945 | |
| 2025-08-29 04:01:05 - pico-train - INFO - โโโ Learning Rate: 4.40e-05 | |
| 2025-08-29 04:01:05 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:01:18 - pico-train - INFO - Step 9475 -- ๐ Training Metrics | |
| 2025-08-29 04:01:18 - pico-train - INFO - โโโ Loss: 6.7831 | |
| 2025-08-29 04:01:18 - pico-train - INFO - โโโ Learning Rate: 4.39e-05 | |
| 2025-08-29 04:01:18 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:01:30 - pico-train - INFO - Step 9500 -- ๐พ Saving Checkpoint | |
| 2025-08-29 04:03:30 - pico-train - INFO - Step 9500 -- ๐ Evaluation Results | |
| 2025-08-29 04:03:30 - pico-train - INFO - โโโ paloma: 7.403721262078652e+21 | |
| 2025-08-29 04:03:32 - pico-train - INFO - Step 9500 -- ๐ Training Metrics | |
| 2025-08-29 04:03:32 - pico-train - INFO - โโโ Loss: 6.7336 | |
| 2025-08-29 04:03:32 - pico-train - INFO - โโโ Learning Rate: 4.37e-05 | |
| 2025-08-29 04:03:32 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:03:32 - pico-train - INFO - Step 9500 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 04:03:46 - pico-train - INFO - Step 9525 -- ๐ Training Metrics | |
| 2025-08-29 04:03:46 - pico-train - INFO - โโโ Loss: 6.7529 | |
| 2025-08-29 04:03:46 - pico-train - INFO - โโโ Learning Rate: 4.36e-05 | |
| 2025-08-29 04:03:46 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:03:59 - pico-train - INFO - Step 9550 -- ๐ Training Metrics | |
| 2025-08-29 04:03:59 - pico-train - INFO - โโโ Loss: 6.6838 | |
| 2025-08-29 04:03:59 - pico-train - INFO - โโโ Learning Rate: 4.35e-05 | |
| 2025-08-29 04:03:59 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:04:12 - pico-train - INFO - Step 9575 -- ๐ Training Metrics | |
| 2025-08-29 04:04:12 - pico-train - INFO - โโโ Loss: 6.7548 | |
| 2025-08-29 04:04:12 - pico-train - INFO - โโโ Learning Rate: 4.34e-05 | |
| 2025-08-29 04:04:12 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:04:25 - pico-train - INFO - Step 9600 -- ๐ Training Metrics | |
| 2025-08-29 04:04:25 - pico-train - INFO - โโโ Loss: 6.8837 | |
| 2025-08-29 04:04:25 - pico-train - INFO - โโโ Learning Rate: 4.33e-05 | |
| 2025-08-29 04:04:25 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:04:38 - pico-train - INFO - Step 9625 -- ๐ Training Metrics | |
| 2025-08-29 04:04:38 - pico-train - INFO - โโโ Loss: 6.8271 | |
| 2025-08-29 04:04:38 - pico-train - INFO - โโโ Learning Rate: 4.32e-05 | |
| 2025-08-29 04:04:38 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:04:51 - pico-train - INFO - Step 9650 -- ๐ Training Metrics | |
| 2025-08-29 04:04:51 - pico-train - INFO - โโโ Loss: 6.7446 | |
| 2025-08-29 04:04:51 - pico-train - INFO - โโโ Learning Rate: 4.31e-05 | |
| 2025-08-29 04:04:51 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:05:03 - pico-train - INFO - Step 9675 -- ๐ Training Metrics | |
| 2025-08-29 04:05:03 - pico-train - INFO - โโโ Loss: 6.6811 | |
| 2025-08-29 04:05:03 - pico-train - INFO - โโโ Learning Rate: 4.30e-05 | |
| 2025-08-29 04:05:03 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:05:16 - pico-train - INFO - Step 9700 -- ๐ Training Metrics | |
| 2025-08-29 04:05:16 - pico-train - INFO - โโโ Loss: 6.7641 | |
| 2025-08-29 04:05:16 - pico-train - INFO - โโโ Learning Rate: 4.29e-05 | |
| 2025-08-29 04:05:16 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:05:29 - pico-train - INFO - Step 9725 -- ๐ Training Metrics | |
| 2025-08-29 04:05:29 - pico-train - INFO - โโโ Loss: 6.6779 | |
| 2025-08-29 04:05:29 - pico-train - INFO - โโโ Learning Rate: 4.28e-05 | |
| 2025-08-29 04:05:29 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:05:42 - pico-train - INFO - Step 9750 -- ๐ Training Metrics | |
| 2025-08-29 04:05:42 - pico-train - INFO - โโโ Loss: 6.7428 | |
| 2025-08-29 04:05:42 - pico-train - INFO - โโโ Learning Rate: 4.27e-05 | |
| 2025-08-29 04:05:42 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:05:54 - pico-train - INFO - Step 9775 -- ๐ Training Metrics | |
| 2025-08-29 04:05:54 - pico-train - INFO - โโโ Loss: 6.7698 | |
| 2025-08-29 04:05:54 - pico-train - INFO - โโโ Learning Rate: 4.26e-05 | |
| 2025-08-29 04:05:54 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:06:07 - pico-train - INFO - Step 9800 -- ๐ Training Metrics | |
| 2025-08-29 04:06:07 - pico-train - INFO - โโโ Loss: 6.7282 | |
| 2025-08-29 04:06:07 - pico-train - INFO - โโโ Learning Rate: 4.25e-05 | |
| 2025-08-29 04:06:07 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:06:20 - pico-train - INFO - Step 9825 -- ๐ Training Metrics | |
| 2025-08-29 04:06:20 - pico-train - INFO - โโโ Loss: 6.7314 | |
| 2025-08-29 04:06:20 - pico-train - INFO - โโโ Learning Rate: 4.24e-05 | |
| 2025-08-29 04:06:20 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:06:33 - pico-train - INFO - Step 9850 -- ๐ Training Metrics | |
| 2025-08-29 04:06:33 - pico-train - INFO - โโโ Loss: 6.7281 | |
| 2025-08-29 04:06:33 - pico-train - INFO - โโโ Learning Rate: 4.23e-05 | |
| 2025-08-29 04:06:33 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:06:46 - pico-train - INFO - Step 9875 -- ๐ Training Metrics | |
| 2025-08-29 04:06:46 - pico-train - INFO - โโโ Loss: 6.8553 | |
| 2025-08-29 04:06:46 - pico-train - INFO - โโโ Learning Rate: 4.22e-05 | |
| 2025-08-29 04:06:46 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:06:58 - pico-train - INFO - Step 9900 -- ๐ Training Metrics | |
| 2025-08-29 04:06:58 - pico-train - INFO - โโโ Loss: 6.7912 | |
| 2025-08-29 04:06:58 - pico-train - INFO - โโโ Learning Rate: 4.21e-05 | |
| 2025-08-29 04:06:58 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:07:11 - pico-train - INFO - Step 9925 -- ๐ Training Metrics | |
| 2025-08-29 04:07:11 - pico-train - INFO - โโโ Loss: 6.7301 | |
| 2025-08-29 04:07:11 - pico-train - INFO - โโโ Learning Rate: 4.20e-05 | |
| 2025-08-29 04:07:11 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:07:24 - pico-train - INFO - Step 9950 -- ๐ Training Metrics | |
| 2025-08-29 04:07:24 - pico-train - INFO - โโโ Loss: 6.7467 | |
| 2025-08-29 04:07:24 - pico-train - INFO - โโโ Learning Rate: 4.19e-05 | |
| 2025-08-29 04:07:24 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:07:37 - pico-train - INFO - Step 9975 -- ๐ Training Metrics | |
| 2025-08-29 04:07:37 - pico-train - INFO - โโโ Loss: 6.6581 | |
| 2025-08-29 04:07:37 - pico-train - INFO - โโโ Learning Rate: 4.18e-05 | |
| 2025-08-29 04:07:37 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:07:49 - pico-train - INFO - Step 10000 -- ๐พ Saving Checkpoint | |
| 2025-08-29 04:09:47 - pico-train - INFO - Step 10000 -- ๐ Evaluation Results | |
| 2025-08-29 04:09:47 - pico-train - INFO - โโโ paloma: 1.0650515380055143e+22 | |
| 2025-08-29 04:09:48 - pico-train - INFO - Step 10000 -- ๐ Training Metrics | |
| 2025-08-29 04:09:48 - pico-train - INFO - โโโ Loss: 6.7114 | |
| 2025-08-29 04:09:48 - pico-train - INFO - โโโ Learning Rate: 4.17e-05 | |
| 2025-08-29 04:09:48 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:09:48 - pico-train - INFO - Step 10000 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 04:10:03 - pico-train - INFO - Step 10025 -- ๐ Training Metrics | |
| 2025-08-29 04:10:03 - pico-train - INFO - โโโ Loss: 6.7754 | |
| 2025-08-29 04:10:03 - pico-train - INFO - โโโ Learning Rate: 4.16e-05 | |
| 2025-08-29 04:10:03 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:10:16 - pico-train - INFO - Step 10050 -- ๐ Training Metrics | |
| 2025-08-29 04:10:16 - pico-train - INFO - โโโ Loss: 6.6950 | |
| 2025-08-29 04:10:16 - pico-train - INFO - โโโ Learning Rate: 4.15e-05 | |
| 2025-08-29 04:10:16 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:10:29 - pico-train - INFO - Step 10075 -- ๐ Training Metrics | |
| 2025-08-29 04:10:29 - pico-train - INFO - โโโ Loss: 6.6791 | |
| 2025-08-29 04:10:29 - pico-train - INFO - โโโ Learning Rate: 4.14e-05 | |
| 2025-08-29 04:10:29 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:10:42 - pico-train - INFO - Step 10100 -- ๐ Training Metrics | |
| 2025-08-29 04:10:42 - pico-train - INFO - โโโ Loss: 6.6957 | |
| 2025-08-29 04:10:42 - pico-train - INFO - โโโ Learning Rate: 4.12e-05 | |
| 2025-08-29 04:10:42 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:10:55 - pico-train - INFO - Step 10125 -- ๐ Training Metrics | |
| 2025-08-29 04:10:55 - pico-train - INFO - โโโ Loss: 6.7073 | |
| 2025-08-29 04:10:55 - pico-train - INFO - โโโ Learning Rate: 4.11e-05 | |
| 2025-08-29 04:10:55 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 04:11:07 - pico-train - INFO - Step 10150 -- ๐ Training Metrics | |
| 2025-08-29 04:11:07 - pico-train - INFO - โโโ Loss: 6.7740 | |
| 2025-08-29 04:11:07 - pico-train - INFO - ├── Learning Rate: 4.10e-05 | |
| 2025-08-29 04:11:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:11:20 - pico-train - INFO - Step 10175 -- Training Metrics | |
| 2025-08-29 04:11:20 - pico-train - INFO - ├── Loss: 6.8045 | |
| 2025-08-29 04:11:20 - pico-train - INFO - ├── Learning Rate: 4.09e-05 | |
| 2025-08-29 04:11:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:11:33 - pico-train - INFO - Step 10200 -- Training Metrics | |
| 2025-08-29 04:11:33 - pico-train - INFO - ├── Loss: 6.7610 | |
| 2025-08-29 04:11:33 - pico-train - INFO - ├── Learning Rate: 4.08e-05 | |
| 2025-08-29 04:11:33 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:11:46 - pico-train - INFO - Step 10225 -- Training Metrics | |
| 2025-08-29 04:11:46 - pico-train - INFO - ├── Loss: 6.6995 | |
| 2025-08-29 04:11:46 - pico-train - INFO - ├── Learning Rate: 4.07e-05 | |
| 2025-08-29 04:11:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:11:59 - pico-train - INFO - Step 10250 -- Training Metrics | |
| 2025-08-29 04:11:59 - pico-train - INFO - ├── Loss: 6.6779 | |
| 2025-08-29 04:11:59 - pico-train - INFO - ├── Learning Rate: 4.06e-05 | |
| 2025-08-29 04:11:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:12:12 - pico-train - INFO - Step 10275 -- Training Metrics | |
| 2025-08-29 04:12:12 - pico-train - INFO - ├── Loss: 6.7462 | |
| 2025-08-29 04:12:12 - pico-train - INFO - ├── Learning Rate: 4.05e-05 | |
| 2025-08-29 04:12:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:12:25 - pico-train - INFO - Step 10300 -- Training Metrics | |
| 2025-08-29 04:12:25 - pico-train - INFO - ├── Loss: 6.7099 | |
| 2025-08-29 04:12:25 - pico-train - INFO - ├── Learning Rate: 4.04e-05 | |
| 2025-08-29 04:12:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:12:37 - pico-train - INFO - Step 10325 -- Training Metrics | |
| 2025-08-29 04:12:37 - pico-train - INFO - ├── Loss: 6.7013 | |
| 2025-08-29 04:12:37 - pico-train - INFO - ├── Learning Rate: 4.03e-05 | |
| 2025-08-29 04:12:37 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:12:50 - pico-train - INFO - Step 10350 -- Training Metrics | |
| 2025-08-29 04:12:50 - pico-train - INFO - ├── Loss: 6.7173 | |
| 2025-08-29 04:12:50 - pico-train - INFO - ├── Learning Rate: 4.02e-05 | |
| 2025-08-29 04:12:50 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:13:03 - pico-train - INFO - Step 10375 -- Training Metrics | |
| 2025-08-29 04:13:03 - pico-train - INFO - ├── Loss: 6.6967 | |
| 2025-08-29 04:13:03 - pico-train - INFO - ├── Learning Rate: 4.01e-05 | |
| 2025-08-29 04:13:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:13:16 - pico-train - INFO - Step 10400 -- Training Metrics | |
| 2025-08-29 04:13:16 - pico-train - INFO - ├── Loss: 6.7565 | |
| 2025-08-29 04:13:16 - pico-train - INFO - ├── Learning Rate: 4.00e-05 | |
| 2025-08-29 04:13:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:13:28 - pico-train - INFO - Step 10425 -- Training Metrics | |
| 2025-08-29 04:13:28 - pico-train - INFO - ├── Loss: 6.7468 | |
| 2025-08-29 04:13:28 - pico-train - INFO - ├── Learning Rate: 3.99e-05 | |
| 2025-08-29 04:13:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:13:41 - pico-train - INFO - Step 10450 -- Training Metrics | |
| 2025-08-29 04:13:41 - pico-train - INFO - ├── Loss: 6.7132 | |
| 2025-08-29 04:13:41 - pico-train - INFO - ├── Learning Rate: 3.98e-05 | |
| 2025-08-29 04:13:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:13:54 - pico-train - INFO - Step 10475 -- Training Metrics | |
| 2025-08-29 04:13:54 - pico-train - INFO - ├── Loss: 6.6358 | |
| 2025-08-29 04:13:54 - pico-train - INFO - ├── Learning Rate: 3.97e-05 | |
| 2025-08-29 04:13:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:14:06 - pico-train - INFO - Step 10500 -- 💾 Saving Checkpoint | |
| 2025-08-29 04:16:00 - pico-train - INFO - Step 10500 -- Evaluation Results | |
| 2025-08-29 04:16:00 - pico-train - INFO - └── paloma: 2.1077589258137904e+22 | |
| 2025-08-29 04:16:02 - pico-train - INFO - Step 10500 -- Training Metrics | |
| 2025-08-29 04:16:02 - pico-train - INFO - ├── Loss: 6.6979 | |
| 2025-08-29 04:16:02 - pico-train - INFO - ├── Learning Rate: 3.96e-05 | |
| 2025-08-29 04:16:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:16:02 - pico-train - INFO - Step 10500 -- Saving Learning Dynamics | |
| 2025-08-29 04:16:17 - pico-train - INFO - Step 10525 -- Training Metrics | |
| 2025-08-29 04:16:17 - pico-train - INFO - ├── Loss: 6.6512 | |
| 2025-08-29 04:16:17 - pico-train - INFO - ├── Learning Rate: 3.95e-05 | |
| 2025-08-29 04:16:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:16:30 - pico-train - INFO - Step 10550 -- Training Metrics | |
| 2025-08-29 04:16:30 - pico-train - INFO - ├── Loss: 6.6045 | |
| 2025-08-29 04:16:30 - pico-train - INFO - ├── Learning Rate: 3.94e-05 | |
| 2025-08-29 04:16:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:16:43 - pico-train - INFO - Step 10575 -- Training Metrics | |
| 2025-08-29 04:16:43 - pico-train - INFO - ├── Loss: 6.6217 | |
| 2025-08-29 04:16:43 - pico-train - INFO - ├── Learning Rate: 3.93e-05 | |
| 2025-08-29 04:16:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:16:55 - pico-train - INFO - Step 10600 -- Training Metrics | |
| 2025-08-29 04:16:55 - pico-train - INFO - ├── Loss: 6.7091 | |
| 2025-08-29 04:16:55 - pico-train - INFO - ├── Learning Rate: 3.92e-05 | |
| 2025-08-29 04:16:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:17:08 - pico-train - INFO - Step 10625 -- Training Metrics | |
| 2025-08-29 04:17:08 - pico-train - INFO - ├── Loss: 6.6180 | |
| 2025-08-29 04:17:08 - pico-train - INFO - ├── Learning Rate: 3.91e-05 | |
| 2025-08-29 04:17:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:17:21 - pico-train - INFO - Step 10650 -- Training Metrics | |
| 2025-08-29 04:17:21 - pico-train - INFO - ├── Loss: 6.6743 | |
| 2025-08-29 04:17:21 - pico-train - INFO - ├── Learning Rate: 3.90e-05 | |
| 2025-08-29 04:17:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:17:34 - pico-train - INFO - Step 10675 -- Training Metrics | |
| 2025-08-29 04:17:34 - pico-train - INFO - ├── Loss: 6.6481 | |
| 2025-08-29 04:17:34 - pico-train - INFO - ├── Learning Rate: 3.89e-05 | |
| 2025-08-29 04:17:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:17:46 - pico-train - INFO - Step 10700 -- Training Metrics | |
| 2025-08-29 04:17:46 - pico-train - INFO - ├── Loss: 6.6888 | |
| 2025-08-29 04:17:46 - pico-train - INFO - ├── Learning Rate: 3.87e-05 | |
| 2025-08-29 04:17:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:17:59 - pico-train - INFO - Step 10725 -- Training Metrics | |
| 2025-08-29 04:17:59 - pico-train - INFO - ├── Loss: 6.5786 | |
| 2025-08-29 04:17:59 - pico-train - INFO - ├── Learning Rate: 3.86e-05 | |
| 2025-08-29 04:17:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:18:13 - pico-train - INFO - Step 10750 -- Training Metrics | |
| 2025-08-29 04:18:13 - pico-train - INFO - ├── Loss: 6.6917 | |
| 2025-08-29 04:18:13 - pico-train - INFO - ├── Learning Rate: 3.85e-05 | |
| 2025-08-29 04:18:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:18:26 - pico-train - INFO - Step 10775 -- Training Metrics | |
| 2025-08-29 04:18:26 - pico-train - INFO - ├── Loss: 6.6487 | |
| 2025-08-29 04:18:26 - pico-train - INFO - ├── Learning Rate: 3.84e-05 | |
| 2025-08-29 04:18:26 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:18:38 - pico-train - INFO - Step 10800 -- Training Metrics | |
| 2025-08-29 04:18:38 - pico-train - INFO - ├── Loss: 6.7293 | |
| 2025-08-29 04:18:38 - pico-train - INFO - ├── Learning Rate: 3.83e-05 | |
| 2025-08-29 04:18:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:18:51 - pico-train - INFO - Step 10825 -- Training Metrics | |
| 2025-08-29 04:18:51 - pico-train - INFO - ├── Loss: 6.6369 | |
| 2025-08-29 04:18:51 - pico-train - INFO - ├── Learning Rate: 3.82e-05 | |
| 2025-08-29 04:18:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:19:04 - pico-train - INFO - Step 10850 -- Training Metrics | |
| 2025-08-29 04:19:04 - pico-train - INFO - ├── Loss: 6.7118 | |
| 2025-08-29 04:19:04 - pico-train - INFO - ├── Learning Rate: 3.81e-05 | |
| 2025-08-29 04:19:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:19:17 - pico-train - INFO - Step 10875 -- Training Metrics | |
| 2025-08-29 04:19:17 - pico-train - INFO - ├── Loss: 6.7235 | |
| 2025-08-29 04:19:17 - pico-train - INFO - ├── Learning Rate: 3.80e-05 | |
| 2025-08-29 04:19:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:19:30 - pico-train - INFO - Step 10900 -- Training Metrics | |
| 2025-08-29 04:19:30 - pico-train - INFO - ├── Loss: 6.6963 | |
| 2025-08-29 04:19:30 - pico-train - INFO - ├── Learning Rate: 3.79e-05 | |
| 2025-08-29 04:19:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:19:42 - pico-train - INFO - Step 10925 -- Training Metrics | |
| 2025-08-29 04:19:42 - pico-train - INFO - ├── Loss: 6.6791 | |
| 2025-08-29 04:19:42 - pico-train - INFO - ├── Learning Rate: 3.78e-05 | |
| 2025-08-29 04:19:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:19:55 - pico-train - INFO - Step 10950 -- Training Metrics | |
| 2025-08-29 04:19:55 - pico-train - INFO - ├── Loss: 6.6773 | |
| 2025-08-29 04:19:55 - pico-train - INFO - ├── Learning Rate: 3.77e-05 | |
| 2025-08-29 04:19:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:20:08 - pico-train - INFO - Step 10975 -- Training Metrics | |
| 2025-08-29 04:20:08 - pico-train - INFO - ├── Loss: 6.6819 | |
| 2025-08-29 04:20:08 - pico-train - INFO - ├── Learning Rate: 3.76e-05 | |
| 2025-08-29 04:20:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:20:20 - pico-train - INFO - Step 11000 -- 💾 Saving Checkpoint | |
| 2025-08-29 04:22:23 - pico-train - INFO - Step 11000 -- Evaluation Results | |
| 2025-08-29 04:22:23 - pico-train - INFO - └── paloma: 2.712416409262884e+22 | |
| 2025-08-29 04:22:25 - pico-train - INFO - Step 11000 -- Training Metrics | |
| 2025-08-29 04:22:25 - pico-train - INFO - ├── Loss: 6.6167 | |
| 2025-08-29 04:22:25 - pico-train - INFO - ├── Learning Rate: 3.75e-05 | |
| 2025-08-29 04:22:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:22:25 - pico-train - INFO - Step 11000 -- Saving Learning Dynamics | |
| 2025-08-29 04:22:40 - pico-train - INFO - Step 11025 -- Training Metrics | |
| 2025-08-29 04:22:40 - pico-train - INFO - ├── Loss: 6.6727 | |
| 2025-08-29 04:22:40 - pico-train - INFO - ├── Learning Rate: 3.74e-05 | |
| 2025-08-29 04:22:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:22:52 - pico-train - INFO - Step 11050 -- Training Metrics | |
| 2025-08-29 04:22:52 - pico-train - INFO - ├── Loss: 6.6317 | |
| 2025-08-29 04:22:52 - pico-train - INFO - ├── Learning Rate: 3.73e-05 | |
| 2025-08-29 04:22:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:23:05 - pico-train - INFO - Step 11075 -- Training Metrics | |
| 2025-08-29 04:23:05 - pico-train - INFO - ├── Loss: 6.6432 | |
| 2025-08-29 04:23:05 - pico-train - INFO - ├── Learning Rate: 3.72e-05 | |
| 2025-08-29 04:23:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:23:18 - pico-train - INFO - Step 11100 -- Training Metrics | |
| 2025-08-29 04:23:18 - pico-train - INFO - ├── Loss: 6.6468 | |
| 2025-08-29 04:23:18 - pico-train - INFO - ├── Learning Rate: 3.71e-05 | |
| 2025-08-29 04:23:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:23:31 - pico-train - INFO - Step 11125 -- Training Metrics | |
| 2025-08-29 04:23:31 - pico-train - INFO - ├── Loss: 6.6460 | |
| 2025-08-29 04:23:31 - pico-train - INFO - ├── Learning Rate: 3.70e-05 | |
| 2025-08-29 04:23:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:23:44 - pico-train - INFO - Step 11150 -- Training Metrics | |
| 2025-08-29 04:23:44 - pico-train - INFO - ├── Loss: 6.6852 | |
| 2025-08-29 04:23:44 - pico-train - INFO - ├── Learning Rate: 3.69e-05 | |
| 2025-08-29 04:23:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:23:56 - pico-train - INFO - Step 11175 -- Training Metrics | |
| 2025-08-29 04:23:56 - pico-train - INFO - ├── Loss: 6.5716 | |
| 2025-08-29 04:23:56 - pico-train - INFO - ├── Learning Rate: 3.68e-05 | |
| 2025-08-29 04:23:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:24:09 - pico-train - INFO - Step 11200 -- Training Metrics | |
| 2025-08-29 04:24:09 - pico-train - INFO - ├── Loss: 6.6311 | |
| 2025-08-29 04:24:09 - pico-train - INFO - ├── Learning Rate: 3.67e-05 | |
| 2025-08-29 04:24:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:24:22 - pico-train - INFO - Step 11225 -- Training Metrics | |
| 2025-08-29 04:24:22 - pico-train - INFO - ├── Loss: 6.6480 | |
| 2025-08-29 04:24:22 - pico-train - INFO - ├── Learning Rate: 3.66e-05 | |
| 2025-08-29 04:24:22 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:24:35 - pico-train - INFO - Step 11250 -- Training Metrics | |
| 2025-08-29 04:24:35 - pico-train - INFO - ├── Loss: 6.6204 | |
| 2025-08-29 04:24:35 - pico-train - INFO - ├── Learning Rate: 3.65e-05 | |
| 2025-08-29 04:24:35 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:24:48 - pico-train - INFO - Step 11275 -- Training Metrics | |
| 2025-08-29 04:24:48 - pico-train - INFO - ├── Loss: 6.6551 | |
| 2025-08-29 04:24:48 - pico-train - INFO - ├── Learning Rate: 3.64e-05 | |
| 2025-08-29 04:24:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:25:01 - pico-train - INFO - Step 11300 -- Training Metrics | |
| 2025-08-29 04:25:01 - pico-train - INFO - ├── Loss: 6.6013 | |
| 2025-08-29 04:25:01 - pico-train - INFO - ├── Learning Rate: 3.63e-05 | |
| 2025-08-29 04:25:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:25:13 - pico-train - INFO - Step 11325 -- Training Metrics | |
| 2025-08-29 04:25:13 - pico-train - INFO - ├── Loss: 6.6478 | |
| 2025-08-29 04:25:13 - pico-train - INFO - ├── Learning Rate: 3.61e-05 | |
| 2025-08-29 04:25:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:25:26 - pico-train - INFO - Step 11350 -- Training Metrics | |
| 2025-08-29 04:25:26 - pico-train - INFO - ├── Loss: 6.6938 | |
| 2025-08-29 04:25:26 - pico-train - INFO - ├── Learning Rate: 3.60e-05 | |
| 2025-08-29 04:25:26 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:25:39 - pico-train - INFO - Step 11375 -- Training Metrics | |
| 2025-08-29 04:25:39 - pico-train - INFO - ├── Loss: 6.6124 | |
| 2025-08-29 04:25:39 - pico-train - INFO - ├── Learning Rate: 3.59e-05 | |
| 2025-08-29 04:25:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:25:52 - pico-train - INFO - Step 11400 -- Training Metrics | |
| 2025-08-29 04:25:52 - pico-train - INFO - ├── Loss: 6.6781 | |
| 2025-08-29 04:25:52 - pico-train - INFO - ├── Learning Rate: 3.58e-05 | |
| 2025-08-29 04:25:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:26:04 - pico-train - INFO - Step 11425 -- Training Metrics | |
| 2025-08-29 04:26:04 - pico-train - INFO - ├── Loss: 6.6317 | |
| 2025-08-29 04:26:04 - pico-train - INFO - ├── Learning Rate: 3.57e-05 | |
| 2025-08-29 04:26:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:26:17 - pico-train - INFO - Step 11450 -- Training Metrics | |
| 2025-08-29 04:26:17 - pico-train - INFO - ├── Loss: 6.6195 | |
| 2025-08-29 04:26:17 - pico-train - INFO - ├── Learning Rate: 3.56e-05 | |
| 2025-08-29 04:26:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:26:30 - pico-train - INFO - Step 11475 -- Training Metrics | |
| 2025-08-29 04:26:30 - pico-train - INFO - ├── Loss: 6.5941 | |
| 2025-08-29 04:26:30 - pico-train - INFO - ├── Learning Rate: 3.55e-05 | |
| 2025-08-29 04:26:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:26:42 - pico-train - INFO - Step 11500 -- 💾 Saving Checkpoint | |
| 2025-08-29 04:28:35 - pico-train - INFO - Step 11500 -- Evaluation Results | |
| 2025-08-29 04:28:35 - pico-train - INFO - └── paloma: 4.877238989481918e+22 | |
| 2025-08-29 04:28:36 - pico-train - INFO - Step 11500 -- Training Metrics | |
| 2025-08-29 04:28:36 - pico-train - INFO - ├── Loss: 6.5808 | |
| 2025-08-29 04:28:36 - pico-train - INFO - ├── Learning Rate: 3.54e-05 | |
| 2025-08-29 04:28:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:28:36 - pico-train - INFO - Step 11500 -- Saving Learning Dynamics | |
| 2025-08-29 04:28:52 - pico-train - INFO - Step 11525 -- Training Metrics | |
| 2025-08-29 04:28:52 - pico-train - INFO - ├── Loss: 6.6322 | |
| 2025-08-29 04:28:52 - pico-train - INFO - ├── Learning Rate: 3.53e-05 | |
| 2025-08-29 04:28:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:29:05 - pico-train - INFO - Step 11550 -- Training Metrics | |
| 2025-08-29 04:29:05 - pico-train - INFO - ├── Loss: 6.6172 | |
| 2025-08-29 04:29:05 - pico-train - INFO - ├── Learning Rate: 3.52e-05 | |
| 2025-08-29 04:29:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:29:18 - pico-train - INFO - Step 11575 -- Training Metrics | |
| 2025-08-29 04:29:18 - pico-train - INFO - ├── Loss: 6.6490 | |
| 2025-08-29 04:29:18 - pico-train - INFO - ├── Learning Rate: 3.51e-05 | |
| 2025-08-29 04:29:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:29:31 - pico-train - INFO - Step 11600 -- Training Metrics | |
| 2025-08-29 04:29:31 - pico-train - INFO - ├── Loss: 6.6050 | |
| 2025-08-29 04:29:31 - pico-train - INFO - ├── Learning Rate: 3.50e-05 | |
| 2025-08-29 04:29:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:29:43 - pico-train - INFO - Step 11625 -- Training Metrics | |
| 2025-08-29 04:29:43 - pico-train - INFO - ├── Loss: 6.6184 | |
| 2025-08-29 04:29:43 - pico-train - INFO - ├── Learning Rate: 3.49e-05 | |
| 2025-08-29 04:29:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:29:56 - pico-train - INFO - Step 11650 -- Training Metrics | |
| 2025-08-29 04:29:56 - pico-train - INFO - ├── Loss: 6.5597 | |
| 2025-08-29 04:29:56 - pico-train - INFO - ├── Learning Rate: 3.48e-05 | |
| 2025-08-29 04:29:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:30:09 - pico-train - INFO - Step 11675 -- Training Metrics | |
| 2025-08-29 04:30:09 - pico-train - INFO - ├── Loss: 6.6285 | |
| 2025-08-29 04:30:09 - pico-train - INFO - ├── Learning Rate: 3.47e-05 | |
| 2025-08-29 04:30:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:30:22 - pico-train - INFO - Step 11700 -- Training Metrics | |
| 2025-08-29 04:30:22 - pico-train - INFO - ├── Loss: 6.5209 | |
| 2025-08-29 04:30:22 - pico-train - INFO - ├── Learning Rate: 3.46e-05 | |
| 2025-08-29 04:30:22 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:30:34 - pico-train - INFO - Step 11725 -- Training Metrics | |
| 2025-08-29 04:30:34 - pico-train - INFO - ├── Loss: 6.5505 | |
| 2025-08-29 04:30:34 - pico-train - INFO - ├── Learning Rate: 3.45e-05 | |
| 2025-08-29 04:30:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:30:47 - pico-train - INFO - Step 11750 -- Training Metrics | |
| 2025-08-29 04:30:47 - pico-train - INFO - ├── Loss: 6.6710 | |
| 2025-08-29 04:30:47 - pico-train - INFO - ├── Learning Rate: 3.44e-05 | |
| 2025-08-29 04:30:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:31:00 - pico-train - INFO - Step 11775 -- Training Metrics | |
| 2025-08-29 04:31:00 - pico-train - INFO - ├── Loss: 6.6403 | |
| 2025-08-29 04:31:00 - pico-train - INFO - ├── Learning Rate: 3.43e-05 | |
| 2025-08-29 04:31:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:31:13 - pico-train - INFO - Step 11800 -- Training Metrics | |
| 2025-08-29 04:31:13 - pico-train - INFO - ├── Loss: 6.5738 | |
| 2025-08-29 04:31:13 - pico-train - INFO - ├── Learning Rate: 3.42e-05 | |
| 2025-08-29 04:31:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:31:26 - pico-train - INFO - Step 11825 -- Training Metrics | |
| 2025-08-29 04:31:26 - pico-train - INFO - ├── Loss: 6.6080 | |
| 2025-08-29 04:31:26 - pico-train - INFO - ├── Learning Rate: 3.41e-05 | |
| 2025-08-29 04:31:26 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:31:39 - pico-train - INFO - Step 11850 -- Training Metrics | |
| 2025-08-29 04:31:39 - pico-train - INFO - ├── Loss: 6.6406 | |
| 2025-08-29 04:31:39 - pico-train - INFO - ├── Learning Rate: 3.40e-05 | |
| 2025-08-29 04:31:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:31:51 - pico-train - INFO - Step 11875 -- Training Metrics | |
| 2025-08-29 04:31:51 - pico-train - INFO - ├── Loss: 6.6299 | |
| 2025-08-29 04:31:51 - pico-train - INFO - ├── Learning Rate: 3.39e-05 | |
| 2025-08-29 04:31:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:32:04 - pico-train - INFO - Step 11900 -- Training Metrics | |
| 2025-08-29 04:32:04 - pico-train - INFO - ├── Loss: 6.5781 | |
| 2025-08-29 04:32:04 - pico-train - INFO - ├── Learning Rate: 3.38e-05 | |
| 2025-08-29 04:32:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:32:17 - pico-train - INFO - Step 11925 -- Training Metrics | |
| 2025-08-29 04:32:17 - pico-train - INFO - ├── Loss: 6.5003 | |
| 2025-08-29 04:32:17 - pico-train - INFO - ├── Learning Rate: 3.36e-05 | |
| 2025-08-29 04:32:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:32:30 - pico-train - INFO - Step 11950 -- Training Metrics | |
| 2025-08-29 04:32:30 - pico-train - INFO - ├── Loss: 6.6350 | |
| 2025-08-29 04:32:30 - pico-train - INFO - ├── Learning Rate: 3.35e-05 | |
| 2025-08-29 04:32:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:32:43 - pico-train - INFO - Step 11975 -- Training Metrics | |
| 2025-08-29 04:32:43 - pico-train - INFO - ├── Loss: 6.6180 | |
| 2025-08-29 04:32:43 - pico-train - INFO - ├── Learning Rate: 3.34e-05 | |
| 2025-08-29 04:32:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:32:55 - pico-train - INFO - Step 12000 -- 💾 Saving Checkpoint | |
| 2025-08-29 04:34:49 - pico-train - INFO - Step 12000 -- Evaluation Results | |
| 2025-08-29 04:34:49 - pico-train - INFO - └── paloma: 7.219509956260661e+22 | |
| 2025-08-29 04:34:51 - pico-train - INFO - Step 12000 -- Training Metrics | |
| 2025-08-29 04:34:51 - pico-train - INFO - ├── Loss: 6.6603 | |
| 2025-08-29 04:34:51 - pico-train - INFO - ├── Learning Rate: 3.33e-05 | |
| 2025-08-29 04:34:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:34:51 - pico-train - INFO - Step 12000 -- Saving Learning Dynamics | |
| 2025-08-29 04:35:07 - pico-train - INFO - Step 12025 -- Training Metrics | |
| 2025-08-29 04:35:07 - pico-train - INFO - ├── Loss: 6.5507 | |
| 2025-08-29 04:35:07 - pico-train - INFO - ├── Learning Rate: 3.32e-05 | |
| 2025-08-29 04:35:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:35:20 - pico-train - INFO - Step 12050 -- Training Metrics | |
| 2025-08-29 04:35:20 - pico-train - INFO - ├── Loss: 6.5878 | |
| 2025-08-29 04:35:20 - pico-train - INFO - ├── Learning Rate: 3.31e-05 | |
| 2025-08-29 04:35:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:35:32 - pico-train - INFO - Step 12075 -- Training Metrics | |
| 2025-08-29 04:35:32 - pico-train - INFO - ├── Loss: 6.5245 | |
| 2025-08-29 04:35:32 - pico-train - INFO - ├── Learning Rate: 3.30e-05 | |
| 2025-08-29 04:35:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:35:45 - pico-train - INFO - Step 12100 -- Training Metrics | |
| 2025-08-29 04:35:45 - pico-train - INFO - ├── Loss: 6.5629 | |
| 2025-08-29 04:35:45 - pico-train - INFO - ├── Learning Rate: 3.29e-05 | |
| 2025-08-29 04:35:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:35:58 - pico-train - INFO - Step 12125 -- Training Metrics | |
| 2025-08-29 04:35:58 - pico-train - INFO - ├── Loss: 6.6181 | |
| 2025-08-29 04:35:58 - pico-train - INFO - ├── Learning Rate: 3.28e-05 | |
| 2025-08-29 04:35:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:36:11 - pico-train - INFO - Step 12150 -- Training Metrics | |
| 2025-08-29 04:36:11 - pico-train - INFO - ├── Loss: 6.5780 | |
| 2025-08-29 04:36:11 - pico-train - INFO - ├── Learning Rate: 3.27e-05 | |
| 2025-08-29 04:36:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:36:24 - pico-train - INFO - Step 12175 -- Training Metrics | |
| 2025-08-29 04:36:24 - pico-train - INFO - ├── Loss: 6.5753 | |
| 2025-08-29 04:36:24 - pico-train - INFO - ├── Learning Rate: 3.26e-05 | |
| 2025-08-29 04:36:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:36:36 - pico-train - INFO - Step 12200 -- Training Metrics | |
| 2025-08-29 04:36:36 - pico-train - INFO - ├── Loss: 6.6071 | |
| 2025-08-29 04:36:36 - pico-train - INFO - ├── Learning Rate: 3.25e-05 | |
| 2025-08-29 04:36:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:36:49 - pico-train - INFO - Step 12225 -- Training Metrics | |
| 2025-08-29 04:36:49 - pico-train - INFO - ├── Loss: 6.5885 | |
| 2025-08-29 04:36:49 - pico-train - INFO - ├── Learning Rate: 3.24e-05 | |
| 2025-08-29 04:36:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:37:02 - pico-train - INFO - Step 12250 -- Training Metrics | |
| 2025-08-29 04:37:02 - pico-train - INFO - ├── Loss: 6.5413 | |
| 2025-08-29 04:37:02 - pico-train - INFO - ├── Learning Rate: 3.23e-05 | |
| 2025-08-29 04:37:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:37:15 - pico-train - INFO - Step 12275 -- Training Metrics | |
| 2025-08-29 04:37:15 - pico-train - INFO - ├── Loss: 6.6635 | |
| 2025-08-29 04:37:15 - pico-train - INFO - ├── Learning Rate: 3.22e-05 | |
| 2025-08-29 04:37:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:37:28 - pico-train - INFO - Step 12300 -- Training Metrics | |
| 2025-08-29 04:37:28 - pico-train - INFO - ├── Loss: 6.6304 | |
| 2025-08-29 04:37:28 - pico-train - INFO - ├── Learning Rate: 3.21e-05 | |
| 2025-08-29 04:37:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:37:41 - pico-train - INFO - Step 12325 -- Training Metrics | |
| 2025-08-29 04:37:41 - pico-train - INFO - ├── Loss: 6.5078 | |
| 2025-08-29 04:37:41 - pico-train - INFO - ├── Learning Rate: 3.20e-05 | |
| 2025-08-29 04:37:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:37:53 - pico-train - INFO - Step 12350 -- Training Metrics | |
| 2025-08-29 04:37:53 - pico-train - INFO - ├── Loss: 6.5712 | |
| 2025-08-29 04:37:53 - pico-train - INFO - ├── Learning Rate: 3.19e-05 | |
| 2025-08-29 04:37:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:38:06 - pico-train - INFO - Step 12375 -- Training Metrics | |
| 2025-08-29 04:38:06 - pico-train - INFO - ├── Loss: 6.6284 | |
| 2025-08-29 04:38:06 - pico-train - INFO - ├── Learning Rate: 3.18e-05 | |
| 2025-08-29 04:38:06 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:38:19 - pico-train - INFO - Step 12400 -- Training Metrics | |
| 2025-08-29 04:38:19 - pico-train - INFO - ├── Loss: 6.5837 | |
| 2025-08-29 04:38:19 - pico-train - INFO - ├── Learning Rate: 3.17e-05 | |
| 2025-08-29 04:38:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:38:32 - pico-train - INFO - Step 12425 -- Training Metrics | |
| 2025-08-29 04:38:32 - pico-train - INFO - ├── Loss: 6.5354 | |
| 2025-08-29 04:38:32 - pico-train - INFO - ├── Learning Rate: 3.16e-05 | |
| 2025-08-29 04:38:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:38:45 - pico-train - INFO - Step 12450 -- Training Metrics | |
| 2025-08-29 04:38:45 - pico-train - INFO - ├── Loss: 6.6125 | |
| 2025-08-29 04:38:45 - pico-train - INFO - ├── Learning Rate: 3.15e-05 | |
| 2025-08-29 04:38:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:38:58 - pico-train - INFO - Step 12475 -- Training Metrics | |
| 2025-08-29 04:38:58 - pico-train - INFO - ├── Loss: 6.5477 | |
| 2025-08-29 04:38:58 - pico-train - INFO - ├── Learning Rate: 3.14e-05 | |
| 2025-08-29 04:38:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:39:10 - pico-train - INFO - Step 12500 -- 💾 Saving Checkpoint | |
| 2025-08-29 04:41:09 - pico-train - INFO - Step 12500 -- Evaluation Results | |
| 2025-08-29 04:41:09 - pico-train - INFO - └── paloma: 1.1729325953411656e+23 | |
| 2025-08-29 04:41:11 - pico-train - INFO - Step 12500 -- Training Metrics | |
| 2025-08-29 04:41:11 - pico-train - INFO - ├── Loss: 6.5827 | |
| 2025-08-29 04:41:11 - pico-train - INFO - ├── Learning Rate: 3.13e-05 | |
| 2025-08-29 04:41:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:41:11 - pico-train - INFO - Step 12500 -- Saving Learning Dynamics | |
| 2025-08-29 04:41:26 - pico-train - INFO - Step 12525 -- Training Metrics | |
| 2025-08-29 04:41:26 - pico-train - INFO - ├── Loss: 6.5874 | |
| 2025-08-29 04:41:26 - pico-train - INFO - ├── Learning Rate: 3.11e-05 | |
| 2025-08-29 04:41:26 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:41:39 - pico-train - INFO - Step 12550 -- Training Metrics | |
| 2025-08-29 04:41:39 - pico-train - INFO - ├── Loss: 6.5437 | |
| 2025-08-29 04:41:39 - pico-train - INFO - ├── Learning Rate: 3.10e-05 | |
| 2025-08-29 04:41:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:41:52 - pico-train - INFO - Step 12575 -- Training Metrics | |
| 2025-08-29 04:41:52 - pico-train - INFO - ├── Loss: 6.5820 | |
| 2025-08-29 04:41:52 - pico-train - INFO - ├── Learning Rate: 3.09e-05 | |
| 2025-08-29 04:41:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:42:05 - pico-train - INFO - Step 12600 -- Training Metrics | |
| 2025-08-29 04:42:05 - pico-train - INFO - ├── Loss: 6.5286 | |
| 2025-08-29 04:42:05 - pico-train - INFO - ├── Learning Rate: 3.08e-05 | |
| 2025-08-29 04:42:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:42:17 - pico-train - INFO - Step 12625 -- Training Metrics | |
| 2025-08-29 04:42:17 - pico-train - INFO - ├── Loss: 6.5144 | |
| 2025-08-29 04:42:17 - pico-train - INFO - ├── Learning Rate: 3.07e-05 | |
| 2025-08-29 04:42:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:42:30 - pico-train - INFO - Step 12650 -- Training Metrics | |
| 2025-08-29 04:42:30 - pico-train - INFO - ├── Loss: 6.5327 | |
| 2025-08-29 04:42:30 - pico-train - INFO - ├── Learning Rate: 3.06e-05 | |
| 2025-08-29 04:42:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:42:43 - pico-train - INFO - Step 12675 -- Training Metrics | |
| 2025-08-29 04:42:43 - pico-train - INFO - ├── Loss: 6.6058 | |
| 2025-08-29 04:42:43 - pico-train - INFO - ├── Learning Rate: 3.05e-05 | |
| 2025-08-29 04:42:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:42:56 - pico-train - INFO - Step 12700 -- Training Metrics | |
| 2025-08-29 04:42:56 - pico-train - INFO - ├── Loss: 6.5626 | |
| 2025-08-29 04:42:56 - pico-train - INFO - ├── Learning Rate: 3.04e-05 | |
| 2025-08-29 04:42:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:43:08 - pico-train - INFO - Step 12725 -- Training Metrics | |
| 2025-08-29 04:43:08 - pico-train - INFO - ├── Loss: 6.4589 | |
| 2025-08-29 04:43:08 - pico-train - INFO - ├── Learning Rate: 3.03e-05 | |
| 2025-08-29 04:43:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:43:21 - pico-train - INFO - Step 12750 -- Training Metrics | |
| 2025-08-29 04:43:21 - pico-train - INFO - ├── Loss: 6.5629 | |
| 2025-08-29 04:43:21 - pico-train - INFO - ├── Learning Rate: 3.02e-05 | |
| 2025-08-29 04:43:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:43:34 - pico-train - INFO - Step 12775 -- Training Metrics | |
| 2025-08-29 04:43:34 - pico-train - INFO - ├── Loss: 6.4815 | |
| 2025-08-29 04:43:34 - pico-train - INFO - ├── Learning Rate: 3.01e-05 | |
| 2025-08-29 04:43:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:43:47 - pico-train - INFO - Step 12800 -- Training Metrics | |
| 2025-08-29 04:43:47 - pico-train - INFO - ├── Loss: 6.5651 | |
| 2025-08-29 04:43:47 - pico-train - INFO - ├── Learning Rate: 3.00e-05 | |
| 2025-08-29 04:43:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:44:00 - pico-train - INFO - Step 12825 -- Training Metrics | |
| 2025-08-29 04:44:00 - pico-train - INFO - ├── Loss: 6.6164 | |
| 2025-08-29 04:44:00 - pico-train - INFO - ├── Learning Rate: 2.99e-05 | |
| 2025-08-29 04:44:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:44:13 - pico-train - INFO - Step 12850 -- Training Metrics | |
| 2025-08-29 04:44:13 - pico-train - INFO - ├── Loss: 6.6102 | |
| 2025-08-29 04:44:13 - pico-train - INFO - ├── Learning Rate: 2.98e-05 | |
| 2025-08-29 04:44:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:44:25 - pico-train - INFO - Step 12875 -- Training Metrics | |
| 2025-08-29 04:44:25 - pico-train - INFO - ├── Loss: 6.4871 | |
| 2025-08-29 04:44:25 - pico-train - INFO - ├── Learning Rate: 2.97e-05 | |
| 2025-08-29 04:44:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:44:38 - pico-train - INFO - Step 12900 -- Training Metrics | |
| 2025-08-29 04:44:38 - pico-train - INFO - ├── Loss: 6.4900 | |
| 2025-08-29 04:44:38 - pico-train - INFO - ├── Learning Rate: 2.96e-05 | |
| 2025-08-29 04:44:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:44:51 - pico-train - INFO - Step 12925 -- Training Metrics | |
| 2025-08-29 04:44:51 - pico-train - INFO - ├── Loss: 6.6028 | |
| 2025-08-29 04:44:51 - pico-train - INFO - ├── Learning Rate: 2.95e-05 | |
| 2025-08-29 04:44:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:45:03 - pico-train - INFO - Step 12950 -- Training Metrics | |
| 2025-08-29 04:45:03 - pico-train - INFO - ├── Loss: 6.5509 | |
| 2025-08-29 04:45:03 - pico-train - INFO - ├── Learning Rate: 2.94e-05 | |
| 2025-08-29 04:45:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:45:16 - pico-train - INFO - Step 12975 -- Training Metrics | |
| 2025-08-29 04:45:16 - pico-train - INFO - ├── Loss: 6.5454 | |
| 2025-08-29 04:45:16 - pico-train - INFO - ├── Learning Rate: 2.93e-05 | |
| 2025-08-29 04:45:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:45:28 - pico-train - INFO - Step 13000 -- 💾 Saving Checkpoint | |
| 2025-08-29 04:47:27 - pico-train - INFO - Step 13000 -- 📊 Evaluation Results | |
| 2025-08-29 04:47:27 - pico-train - INFO - └── paloma: 1.729306754923583e+23 | |
| 2025-08-29 04:47:29 - pico-train - INFO - Step 13000 -- 🔄 Training Metrics | |
| 2025-08-29 04:47:29 - pico-train - INFO - ├── Loss: 6.5587 | |
| 2025-08-29 04:47:29 - pico-train - INFO - ├── Learning Rate: 2.92e-05 | |
| 2025-08-29 04:47:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:47:29 - pico-train - INFO - Step 13000 -- 📈 Saving Learning Dynamics | |
| 2025-08-29 04:47:45 - pico-train - INFO - Step 13025 -- 🔄 Training Metrics | |
| 2025-08-29 04:47:45 - pico-train - INFO - ├── Loss: 6.5862 | |
| 2025-08-29 04:47:45 - pico-train - INFO - ├── Learning Rate: 2.91e-05 | |
| 2025-08-29 04:47:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:47:57 - pico-train - INFO - Step 13050 -- 🔄 Training Metrics | |
| 2025-08-29 04:47:57 - pico-train - INFO - ├── Loss: 6.5668 | |
| 2025-08-29 04:47:57 - pico-train - INFO - ├── Learning Rate: 2.90e-05 | |
| 2025-08-29 04:47:57 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:48:10 - pico-train - INFO - Step 13075 -- 🔄 Training Metrics | |
| 2025-08-29 04:48:10 - pico-train - INFO - ├── Loss: 6.5220 | |
| 2025-08-29 04:48:10 - pico-train - INFO - ├── Learning Rate: 2.89e-05 | |
| 2025-08-29 04:48:10 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:48:23 - pico-train - INFO - Step 13100 -- 🔄 Training Metrics | |
| 2025-08-29 04:48:23 - pico-train - INFO - ├── Loss: 6.5044 | |
| 2025-08-29 04:48:23 - pico-train - INFO - ├── Learning Rate: 2.87e-05 | |
| 2025-08-29 04:48:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:48:36 - pico-train - INFO - Step 13125 -- 🔄 Training Metrics | |
| 2025-08-29 04:48:36 - pico-train - INFO - ├── Loss: 6.6356 | |
| 2025-08-29 04:48:36 - pico-train - INFO - ├── Learning Rate: 2.86e-05 | |
| 2025-08-29 04:48:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:48:48 - pico-train - INFO - Step 13150 -- 🔄 Training Metrics | |
| 2025-08-29 04:48:48 - pico-train - INFO - ├── Loss: 6.4772 | |
| 2025-08-29 04:48:48 - pico-train - INFO - ├── Learning Rate: 2.85e-05 | |
| 2025-08-29 04:48:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:49:01 - pico-train - INFO - Step 13175 -- 🔄 Training Metrics | |
| 2025-08-29 04:49:01 - pico-train - INFO - ├── Loss: 6.5504 | |
| 2025-08-29 04:49:01 - pico-train - INFO - ├── Learning Rate: 2.84e-05 | |
| 2025-08-29 04:49:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:49:14 - pico-train - INFO - Step 13200 -- 🔄 Training Metrics | |
| 2025-08-29 04:49:14 - pico-train - INFO - ├── Loss: 6.5415 | |
| 2025-08-29 04:49:14 - pico-train - INFO - ├── Learning Rate: 2.83e-05 | |
| 2025-08-29 04:49:14 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:49:27 - pico-train - INFO - Step 13225 -- 🔄 Training Metrics | |
| 2025-08-29 04:49:27 - pico-train - INFO - ├── Loss: 6.4651 | |
| 2025-08-29 04:49:27 - pico-train - INFO - ├── Learning Rate: 2.82e-05 | |
| 2025-08-29 04:49:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:49:40 - pico-train - INFO - Step 13250 -- 🔄 Training Metrics | |
| 2025-08-29 04:49:40 - pico-train - INFO - ├── Loss: 6.5536 | |
| 2025-08-29 04:49:40 - pico-train - INFO - ├── Learning Rate: 2.81e-05 | |
| 2025-08-29 04:49:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:49:52 - pico-train - INFO - Step 13275 -- 🔄 Training Metrics | |
| 2025-08-29 04:49:52 - pico-train - INFO - ├── Loss: 6.4861 | |
| 2025-08-29 04:49:52 - pico-train - INFO - ├── Learning Rate: 2.80e-05 | |
| 2025-08-29 04:49:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:50:05 - pico-train - INFO - Step 13300 -- 🔄 Training Metrics | |
| 2025-08-29 04:50:05 - pico-train - INFO - ├── Loss: 6.4688 | |
| 2025-08-29 04:50:05 - pico-train - INFO - ├── Learning Rate: 2.79e-05 | |
| 2025-08-29 04:50:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:50:18 - pico-train - INFO - Step 13325 -- 🔄 Training Metrics | |
| 2025-08-29 04:50:18 - pico-train - INFO - ├── Loss: 6.5549 | |
| 2025-08-29 04:50:18 - pico-train - INFO - ├── Learning Rate: 2.78e-05 | |
| 2025-08-29 04:50:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:50:30 - pico-train - INFO - Step 13350 -- 🔄 Training Metrics | |
| 2025-08-29 04:50:30 - pico-train - INFO - ├── Loss: 6.4589 | |
| 2025-08-29 04:50:30 - pico-train - INFO - ├── Learning Rate: 2.77e-05 | |
| 2025-08-29 04:50:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:50:43 - pico-train - INFO - Step 13375 -- 🔄 Training Metrics | |
| 2025-08-29 04:50:43 - pico-train - INFO - ├── Loss: 6.4644 | |
| 2025-08-29 04:50:43 - pico-train - INFO - ├── Learning Rate: 2.76e-05 | |
| 2025-08-29 04:50:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:50:56 - pico-train - INFO - Step 13400 -- 🔄 Training Metrics | |
| 2025-08-29 04:50:56 - pico-train - INFO - ├── Loss: 6.5937 | |
| 2025-08-29 04:50:56 - pico-train - INFO - ├── Learning Rate: 2.75e-05 | |
| 2025-08-29 04:50:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:51:08 - pico-train - INFO - Step 13425 -- 🔄 Training Metrics | |
| 2025-08-29 04:51:08 - pico-train - INFO - ├── Loss: 6.5798 | |
| 2025-08-29 04:51:08 - pico-train - INFO - ├── Learning Rate: 2.74e-05 | |
| 2025-08-29 04:51:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:51:21 - pico-train - INFO - Step 13450 -- 🔄 Training Metrics | |
| 2025-08-29 04:51:21 - pico-train - INFO - ├── Loss: 6.4615 | |
| 2025-08-29 04:51:21 - pico-train - INFO - ├── Learning Rate: 2.73e-05 | |
| 2025-08-29 04:51:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:51:34 - pico-train - INFO - Step 13475 -- 🔄 Training Metrics | |
| 2025-08-29 04:51:34 - pico-train - INFO - ├── Loss: 6.5173 | |
| 2025-08-29 04:51:34 - pico-train - INFO - ├── Learning Rate: 2.72e-05 | |
| 2025-08-29 04:51:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:51:46 - pico-train - INFO - Step 13500 -- 💾 Saving Checkpoint | |
| 2025-08-29 04:53:45 - pico-train - INFO - Step 13500 -- 📊 Evaluation Results | |
| 2025-08-29 04:53:45 - pico-train - INFO - └── paloma: 2.4018454768029128e+23 | |
| 2025-08-29 04:53:47 - pico-train - INFO - Step 13500 -- 🔄 Training Metrics | |
| 2025-08-29 04:53:47 - pico-train - INFO - ├── Loss: 6.4795 | |
| 2025-08-29 04:53:47 - pico-train - INFO - ├── Learning Rate: 2.71e-05 | |
| 2025-08-29 04:53:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:53:47 - pico-train - INFO - Step 13500 -- 📈 Saving Learning Dynamics | |
| 2025-08-29 04:54:02 - pico-train - INFO - Step 13525 -- 🔄 Training Metrics | |
| 2025-08-29 04:54:02 - pico-train - INFO - ├── Loss: 6.4789 | |
| 2025-08-29 04:54:02 - pico-train - INFO - ├── Learning Rate: 2.70e-05 | |
| 2025-08-29 04:54:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:54:15 - pico-train - INFO - Step 13550 -- 🔄 Training Metrics | |
| 2025-08-29 04:54:15 - pico-train - INFO - ├── Loss: 6.4835 | |
| 2025-08-29 04:54:15 - pico-train - INFO - ├── Learning Rate: 2.69e-05 | |
| 2025-08-29 04:54:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:54:27 - pico-train - INFO - Step 13575 -- 🔄 Training Metrics | |
| 2025-08-29 04:54:27 - pico-train - INFO - ├── Loss: 6.5405 | |
| 2025-08-29 04:54:27 - pico-train - INFO - ├── Learning Rate: 2.68e-05 | |
| 2025-08-29 04:54:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:54:40 - pico-train - INFO - Step 13600 -- 🔄 Training Metrics | |
| 2025-08-29 04:54:40 - pico-train - INFO - ├── Loss: 6.4616 | |
| 2025-08-29 04:54:40 - pico-train - INFO - ├── Learning Rate: 2.67e-05 | |
| 2025-08-29 04:54:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:54:53 - pico-train - INFO - Step 13625 -- 🔄 Training Metrics | |
| 2025-08-29 04:54:53 - pico-train - INFO - ├── Loss: 6.4578 | |
| 2025-08-29 04:54:53 - pico-train - INFO - ├── Learning Rate: 2.66e-05 | |
| 2025-08-29 04:54:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:55:06 - pico-train - INFO - Step 13650 -- 🔄 Training Metrics | |
| 2025-08-29 04:55:06 - pico-train - INFO - ├── Loss: 6.4083 | |
| 2025-08-29 04:55:06 - pico-train - INFO - ├── Learning Rate: 2.65e-05 | |
| 2025-08-29 04:55:06 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:55:18 - pico-train - INFO - Step 13675 -- 🔄 Training Metrics | |
| 2025-08-29 04:55:18 - pico-train - INFO - ├── Loss: 6.5610 | |
| 2025-08-29 04:55:18 - pico-train - INFO - ├── Learning Rate: 2.64e-05 | |
| 2025-08-29 04:55:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:55:31 - pico-train - INFO - Step 13700 -- 🔄 Training Metrics | |
| 2025-08-29 04:55:31 - pico-train - INFO - ├── Loss: 6.5432 | |
| 2025-08-29 04:55:31 - pico-train - INFO - ├── Learning Rate: 2.63e-05 | |
| 2025-08-29 04:55:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:55:44 - pico-train - INFO - Step 13725 -- 🔄 Training Metrics | |
| 2025-08-29 04:55:44 - pico-train - INFO - ├── Loss: 6.5119 | |
| 2025-08-29 04:55:44 - pico-train - INFO - ├── Learning Rate: 2.61e-05 | |
| 2025-08-29 04:55:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:55:57 - pico-train - INFO - Step 13750 -- 🔄 Training Metrics | |
| 2025-08-29 04:55:57 - pico-train - INFO - ├── Loss: 6.4540 | |
| 2025-08-29 04:55:57 - pico-train - INFO - ├── Learning Rate: 2.60e-05 | |
| 2025-08-29 04:55:57 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:56:09 - pico-train - INFO - Step 13775 -- 🔄 Training Metrics | |
| 2025-08-29 04:56:09 - pico-train - INFO - ├── Loss: 6.4400 | |
| 2025-08-29 04:56:09 - pico-train - INFO - ├── Learning Rate: 2.59e-05 | |
| 2025-08-29 04:56:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:56:22 - pico-train - INFO - Step 13800 -- 🔄 Training Metrics | |
| 2025-08-29 04:56:22 - pico-train - INFO - ├── Loss: 6.4767 | |
| 2025-08-29 04:56:22 - pico-train - INFO - ├── Learning Rate: 2.58e-05 | |
| 2025-08-29 04:56:22 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:56:35 - pico-train - INFO - Step 13825 -- 🔄 Training Metrics | |
| 2025-08-29 04:56:35 - pico-train - INFO - ├── Loss: 6.4765 | |
| 2025-08-29 04:56:35 - pico-train - INFO - ├── Learning Rate: 2.57e-05 | |
| 2025-08-29 04:56:35 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:56:47 - pico-train - INFO - Step 13850 -- 🔄 Training Metrics | |
| 2025-08-29 04:56:47 - pico-train - INFO - ├── Loss: 6.5018 | |
| 2025-08-29 04:56:47 - pico-train - INFO - ├── Learning Rate: 2.56e-05 | |
| 2025-08-29 04:56:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:57:00 - pico-train - INFO - Step 13875 -- 🔄 Training Metrics | |
| 2025-08-29 04:57:00 - pico-train - INFO - ├── Loss: 6.5011 | |
| 2025-08-29 04:57:00 - pico-train - INFO - ├── Learning Rate: 2.55e-05 | |
| 2025-08-29 04:57:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:57:13 - pico-train - INFO - Step 13900 -- 🔄 Training Metrics | |
| 2025-08-29 04:57:13 - pico-train - INFO - ├── Loss: 6.4283 | |
| 2025-08-29 04:57:13 - pico-train - INFO - ├── Learning Rate: 2.54e-05 | |
| 2025-08-29 04:57:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:57:25 - pico-train - INFO - Step 13925 -- 🔄 Training Metrics | |
| 2025-08-29 04:57:25 - pico-train - INFO - ├── Loss: 6.5190 | |
| 2025-08-29 04:57:25 - pico-train - INFO - ├── Learning Rate: 2.53e-05 | |
| 2025-08-29 04:57:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:57:38 - pico-train - INFO - Step 13950 -- 🔄 Training Metrics | |
| 2025-08-29 04:57:38 - pico-train - INFO - ├── Loss: 6.4388 | |
| 2025-08-29 04:57:38 - pico-train - INFO - ├── Learning Rate: 2.52e-05 | |
| 2025-08-29 04:57:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:57:51 - pico-train - INFO - Step 13975 -- 🔄 Training Metrics | |
| 2025-08-29 04:57:51 - pico-train - INFO - ├── Loss: 6.4550 | |
| 2025-08-29 04:57:51 - pico-train - INFO - ├── Learning Rate: 2.51e-05 | |
| 2025-08-29 04:57:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 04:58:03 - pico-train - INFO - Step 14000 -- 💾 Saving Checkpoint | |
| 2025-08-29 05:00:03 - pico-train - INFO - Step 14000 -- 📊 Evaluation Results | |
| 2025-08-29 05:00:03 - pico-train - INFO - └── paloma: 3.247328955167052e+23 | |
| 2025-08-29 05:00:05 - pico-train - INFO - Step 14000 -- 🔄 Training Metrics | |
| 2025-08-29 05:00:05 - pico-train - INFO - ├── Loss: 6.3491 | |
| 2025-08-29 05:00:05 - pico-train - INFO - ├── Learning Rate: 2.50e-05 | |
| 2025-08-29 05:00:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:00:05 - pico-train - INFO - Step 14000 -- 📈 Saving Learning Dynamics | |
| 2025-08-29 05:00:21 - pico-train - INFO - Step 14025 -- 🔄 Training Metrics | |
| 2025-08-29 05:00:21 - pico-train - INFO - ├── Loss: 6.5285 | |
| 2025-08-29 05:00:21 - pico-train - INFO - ├── Learning Rate: 2.49e-05 | |
| 2025-08-29 05:00:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:00:33 - pico-train - INFO - Step 14050 -- 🔄 Training Metrics | |
| 2025-08-29 05:00:33 - pico-train - INFO - ├── Loss: 6.5082 | |
| 2025-08-29 05:00:33 - pico-train - INFO - ├── Learning Rate: 2.48e-05 | |
| 2025-08-29 05:00:33 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:00:46 - pico-train - INFO - Step 14075 -- 🔄 Training Metrics | |
| 2025-08-29 05:00:46 - pico-train - INFO - ├── Loss: 6.5451 | |
| 2025-08-29 05:00:46 - pico-train - INFO - ├── Learning Rate: 2.47e-05 | |
| 2025-08-29 05:00:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:00:59 - pico-train - INFO - Step 14100 -- 🔄 Training Metrics | |
| 2025-08-29 05:00:59 - pico-train - INFO - ├── Loss: 6.4753 | |
| 2025-08-29 05:00:59 - pico-train - INFO - ├── Learning Rate: 2.46e-05 | |
| 2025-08-29 05:00:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:01:12 - pico-train - INFO - Step 14125 -- 🔄 Training Metrics | |
| 2025-08-29 05:01:12 - pico-train - INFO - ├── Loss: 6.6011 | |
| 2025-08-29 05:01:12 - pico-train - INFO - ├── Learning Rate: 2.45e-05 | |
| 2025-08-29 05:01:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:01:25 - pico-train - INFO - Step 14150 -- 🔄 Training Metrics | |
| 2025-08-29 05:01:25 - pico-train - INFO - ├── Loss: 6.4885 | |
| 2025-08-29 05:01:25 - pico-train - INFO - ├── Learning Rate: 2.44e-05 | |
| 2025-08-29 05:01:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:01:38 - pico-train - INFO - Step 14175 -- 🔄 Training Metrics | |
| 2025-08-29 05:01:38 - pico-train - INFO - ├── Loss: 6.4635 | |
| 2025-08-29 05:01:38 - pico-train - INFO - ├── Learning Rate: 2.43e-05 | |
| 2025-08-29 05:01:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:01:50 - pico-train - INFO - Step 14200 -- 🔄 Training Metrics | |
| 2025-08-29 05:01:50 - pico-train - INFO - ├── Loss: 6.5519 | |
| 2025-08-29 05:01:50 - pico-train - INFO - ├── Learning Rate: 2.42e-05 | |
| 2025-08-29 05:01:50 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:02:03 - pico-train - INFO - Step 14225 -- 🔄 Training Metrics | |
| 2025-08-29 05:02:03 - pico-train - INFO - ├── Loss: 6.4356 | |
| 2025-08-29 05:02:03 - pico-train - INFO - ├── Learning Rate: 2.41e-05 | |
| 2025-08-29 05:02:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:02:16 - pico-train - INFO - Step 14250 -- 🔄 Training Metrics | |
| 2025-08-29 05:02:16 - pico-train - INFO - ├── Loss: 6.4552 | |
| 2025-08-29 05:02:16 - pico-train - INFO - ├── Learning Rate: 2.40e-05 | |
| 2025-08-29 05:02:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:02:29 - pico-train - INFO - Step 14275 -- 🔄 Training Metrics | |
| 2025-08-29 05:02:29 - pico-train - INFO - ├── Loss: 6.4613 | |
| 2025-08-29 05:02:29 - pico-train - INFO - ├── Learning Rate: 2.39e-05 | |
| 2025-08-29 05:02:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:02:42 - pico-train - INFO - Step 14300 -- 🔄 Training Metrics | |
| 2025-08-29 05:02:42 - pico-train - INFO - ├── Loss: 6.4411 | |
| 2025-08-29 05:02:42 - pico-train - INFO - ├── Learning Rate: 2.38e-05 | |
| 2025-08-29 05:02:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:02:54 - pico-train - INFO - Step 14325 -- 🔄 Training Metrics | |
| 2025-08-29 05:02:54 - pico-train - INFO - ├── Loss: 6.5570 | |
| 2025-08-29 05:02:54 - pico-train - INFO - ├── Learning Rate: 2.36e-05 | |
| 2025-08-29 05:02:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:03:07 - pico-train - INFO - Step 14350 -- 🔄 Training Metrics | |
| 2025-08-29 05:03:07 - pico-train - INFO - ├── Loss: 6.4476 | |
| 2025-08-29 05:03:07 - pico-train - INFO - ├── Learning Rate: 2.35e-05 | |
| 2025-08-29 05:03:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:03:20 - pico-train - INFO - Step 14375 -- 🔄 Training Metrics | |
| 2025-08-29 05:03:20 - pico-train - INFO - ├── Loss: 6.5895 | |
| 2025-08-29 05:03:20 - pico-train - INFO - ├── Learning Rate: 2.34e-05 | |
| 2025-08-29 05:03:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:03:33 - pico-train - INFO - Step 14400 -- 🔄 Training Metrics | |
| 2025-08-29 05:03:33 - pico-train - INFO - ├── Loss: 6.4836 | |
| 2025-08-29 05:03:33 - pico-train - INFO - ├── Learning Rate: 2.33e-05 | |
| 2025-08-29 05:03:33 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:03:46 - pico-train - INFO - Step 14425 -- 🔄 Training Metrics | |
| 2025-08-29 05:03:46 - pico-train - INFO - ├── Loss: 6.4175 | |
| 2025-08-29 05:03:46 - pico-train - INFO - ├── Learning Rate: 2.32e-05 | |
| 2025-08-29 05:03:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:03:58 - pico-train - INFO - Step 14450 -- 🔄 Training Metrics | |
| 2025-08-29 05:03:58 - pico-train - INFO - ├── Loss: 6.4971 | |
| 2025-08-29 05:03:58 - pico-train - INFO - ├── Learning Rate: 2.31e-05 | |
| 2025-08-29 05:03:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:04:11 - pico-train - INFO - Step 14475 -- 🔄 Training Metrics | |
| 2025-08-29 05:04:11 - pico-train - INFO - ├── Loss: 6.4897 | |
| 2025-08-29 05:04:11 - pico-train - INFO - ├── Learning Rate: 2.30e-05 | |
| 2025-08-29 05:04:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:04:23 - pico-train - INFO - Step 14500 -- 💾 Saving Checkpoint | |
| 2025-08-29 05:06:21 - pico-train - INFO - Step 14500 -- 📊 Evaluation Results | |
| 2025-08-29 05:06:21 - pico-train - INFO - └── paloma: 4.43239578722337e+23 | |
| 2025-08-29 05:06:25 - pico-train - INFO - Step 14500 -- 🔄 Training Metrics | |
| 2025-08-29 05:06:25 - pico-train - INFO - ├── Loss: 6.4550 | |
| 2025-08-29 05:06:25 - pico-train - INFO - ├── Learning Rate: 2.29e-05 | |
| 2025-08-29 05:06:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:06:25 - pico-train - INFO - Step 14500 -- 📈 Saving Learning Dynamics | |
| 2025-08-29 05:06:40 - pico-train - INFO - Step 14525 -- 🔄 Training Metrics | |
| 2025-08-29 05:06:40 - pico-train - INFO - ├── Loss: 6.4688 | |
| 2025-08-29 05:06:40 - pico-train - INFO - ├── Learning Rate: 2.28e-05 | |
| 2025-08-29 05:06:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:06:53 - pico-train - INFO - Step 14550 -- 🔄 Training Metrics | |
| 2025-08-29 05:06:53 - pico-train - INFO - ├── Loss: 6.5494 | |
| 2025-08-29 05:06:53 - pico-train - INFO - ├── Learning Rate: 2.27e-05 | |
| 2025-08-29 05:06:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:07:06 - pico-train - INFO - Step 14575 -- 🔄 Training Metrics | |
| 2025-08-29 05:07:06 - pico-train - INFO - ├── Loss: 6.4501 | |
| 2025-08-29 05:07:06 - pico-train - INFO - ├── Learning Rate: 2.26e-05 | |
| 2025-08-29 05:07:06 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:07:19 - pico-train - INFO - Step 14600 -- 🔄 Training Metrics | |
| 2025-08-29 05:07:19 - pico-train - INFO - ├── Loss: 6.5142 | |
| 2025-08-29 05:07:19 - pico-train - INFO - ├── Learning Rate: 2.25e-05 | |
| 2025-08-29 05:07:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:07:32 - pico-train - INFO - Step 14625 -- 🔄 Training Metrics | |
| 2025-08-29 05:07:32 - pico-train - INFO - ├── Loss: 6.4891 | |
| 2025-08-29 05:07:32 - pico-train - INFO - ├── Learning Rate: 2.24e-05 | |
| 2025-08-29 05:07:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:07:44 - pico-train - INFO - Step 14650 -- 🔄 Training Metrics | |
| 2025-08-29 05:07:44 - pico-train - INFO - ├── Loss: 6.4274 | |
| 2025-08-29 05:07:44 - pico-train - INFO - ├── Learning Rate: 2.23e-05 | |
| 2025-08-29 05:07:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:07:57 - pico-train - INFO - Step 14675 -- 🔄 Training Metrics | |
| 2025-08-29 05:07:57 - pico-train - INFO - ├── Loss: 6.5277 | |
| 2025-08-29 05:07:57 - pico-train - INFO - ├── Learning Rate: 2.22e-05 | |
| 2025-08-29 05:07:57 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:08:10 - pico-train - INFO - Step 14700 -- 🔄 Training Metrics | |
| 2025-08-29 05:08:10 - pico-train - INFO - ├── Loss: 6.4472 | |
| 2025-08-29 05:08:10 - pico-train - INFO - ├── Learning Rate: 2.21e-05 | |
| 2025-08-29 05:08:10 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:08:23 - pico-train - INFO - Step 14725 -- 🔄 Training Metrics | |
| 2025-08-29 05:08:23 - pico-train - INFO - ├── Loss: 6.4328 | |
| 2025-08-29 05:08:23 - pico-train - INFO - ├── Learning Rate: 2.20e-05 | |
| 2025-08-29 05:08:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:08:36 - pico-train - INFO - Step 14750 -- 🔄 Training Metrics | |
| 2025-08-29 05:08:36 - pico-train - INFO - ├── Loss: 6.4928 | |
| 2025-08-29 05:08:36 - pico-train - INFO - ├── Learning Rate: 2.19e-05 | |
| 2025-08-29 05:08:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:08:49 - pico-train - INFO - Step 14775 -- 🔄 Training Metrics | |
| 2025-08-29 05:08:49 - pico-train - INFO - ├── Loss: 6.5520 | |
| 2025-08-29 05:08:49 - pico-train - INFO - ├── Learning Rate: 2.18e-05 | |
| 2025-08-29 05:08:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:09:01 - pico-train - INFO - Step 14800 -- 🔄 Training Metrics | |
| 2025-08-29 05:09:01 - pico-train - INFO - ├── Loss: 6.5474 | |
| 2025-08-29 05:09:01 - pico-train - INFO - ├── Learning Rate: 2.17e-05 | |
| 2025-08-29 05:09:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:09:14 - pico-train - INFO - Step 14825 -- 🔄 Training Metrics | |
| 2025-08-29 05:09:14 - pico-train - INFO - ├── Loss: 6.4394 | |
| 2025-08-29 05:09:14 - pico-train - INFO - ├── Learning Rate: 2.16e-05 | |
| 2025-08-29 05:09:14 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:09:27 - pico-train - INFO - Step 14850 -- 🔄 Training Metrics | |
| 2025-08-29 05:09:27 - pico-train - INFO - ├── Loss: 6.5234 | |
| 2025-08-29 05:09:27 - pico-train - INFO - ├── Learning Rate: 2.15e-05 | |
| 2025-08-29 05:09:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:09:40 - pico-train - INFO - Step 14875 -- 🔄 Training Metrics | |
| 2025-08-29 05:09:40 - pico-train - INFO - ├── Loss: 6.4369 | |
| 2025-08-29 05:09:40 - pico-train - INFO - ├── Learning Rate: 2.14e-05 | |
| 2025-08-29 05:09:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:09:53 - pico-train - INFO - Step 14900 -- 🔄 Training Metrics | |
| 2025-08-29 05:09:53 - pico-train - INFO - ├── Loss: 6.4694 | |
| 2025-08-29 05:09:53 - pico-train - INFO - ├── Learning Rate: 2.13e-05 | |
| 2025-08-29 05:09:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:10:05 - pico-train - INFO - Step 14925 -- 🔄 Training Metrics | |
| 2025-08-29 05:10:05 - pico-train - INFO - ├── Loss: 6.5837 | |
| 2025-08-29 05:10:05 - pico-train - INFO - ├── Learning Rate: 2.11e-05 | |
| 2025-08-29 05:10:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:10:18 - pico-train - INFO - Step 14950 -- 🔄 Training Metrics | |
| 2025-08-29 05:10:18 - pico-train - INFO - ├── Loss: 6.4841 | |
| 2025-08-29 05:10:18 - pico-train - INFO - ├── Learning Rate: 2.10e-05 | |
| 2025-08-29 05:10:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:10:31 - pico-train - INFO - Step 14975 -- 🔄 Training Metrics | |
| 2025-08-29 05:10:31 - pico-train - INFO - ├── Loss: 6.4347 | |
| 2025-08-29 05:10:31 - pico-train - INFO - ├── Learning Rate: 2.09e-05 | |
| 2025-08-29 05:10:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:10:43 - pico-train - INFO - Step 15000 -- 💾 Saving Checkpoint | |
| 2025-08-29 05:12:54 - pico-train - INFO - Step 15000 -- 📊 Evaluation Results | |
| 2025-08-29 05:12:54 - pico-train - INFO - └── paloma: 5.215164570276226e+23 | |
| 2025-08-29 05:12:55 - pico-train - INFO - Step 15000 -- 🔄 Training Metrics | |
| 2025-08-29 05:12:55 - pico-train - INFO - ├── Loss: 6.5816 | |
| 2025-08-29 05:12:55 - pico-train - INFO - ├── Learning Rate: 2.08e-05 | |
| 2025-08-29 05:12:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:12:55 - pico-train - INFO - Step 15000 -- 📈 Saving Learning Dynamics | |
| 2025-08-29 05:13:10 - pico-train - INFO - Step 15025 -- 🔄 Training Metrics | |
| 2025-08-29 05:13:10 - pico-train - INFO - ├── Loss: 6.5337 | |
| 2025-08-29 05:13:10 - pico-train - INFO - ├── Learning Rate: 2.07e-05 | |
| 2025-08-29 05:13:10 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:13:23 - pico-train - INFO - Step 15050 -- 🔄 Training Metrics | |
| 2025-08-29 05:13:23 - pico-train - INFO - ├── Loss: 6.5131 | |
| 2025-08-29 05:13:23 - pico-train - INFO - ├── Learning Rate: 2.06e-05 | |
| 2025-08-29 05:13:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:13:36 - pico-train - INFO - Step 15075 -- 🔄 Training Metrics | |
| 2025-08-29 05:13:36 - pico-train - INFO - ├── Loss: 6.4669 | |
| 2025-08-29 05:13:36 - pico-train - INFO - ├── Learning Rate: 2.05e-05 | |
| 2025-08-29 05:13:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:13:49 - pico-train - INFO - Step 15100 -- 🔄 Training Metrics | |
| 2025-08-29 05:13:49 - pico-train - INFO - ├── Loss: 6.5141 | |
| 2025-08-29 05:13:49 - pico-train - INFO - ├── Learning Rate: 2.04e-05 | |
| 2025-08-29 05:13:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:14:02 - pico-train - INFO - Step 15125 -- 🔄 Training Metrics | |
| 2025-08-29 05:14:02 - pico-train - INFO - ├── Loss: 6.4380 | |
| 2025-08-29 05:14:02 - pico-train - INFO - ├── Learning Rate: 2.03e-05 | |
| 2025-08-29 05:14:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:14:15 - pico-train - INFO - Step 15150 -- 🔄 Training Metrics | |
| 2025-08-29 05:14:15 - pico-train - INFO - ├── Loss: 6.4036 | |
| 2025-08-29 05:14:15 - pico-train - INFO - ├── Learning Rate: 2.02e-05 | |
| 2025-08-29 05:14:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:14:27 - pico-train - INFO - Step 15175 -- 🔄 Training Metrics | |
| 2025-08-29 05:14:27 - pico-train - INFO - ├── Loss: 6.4517 | |
| 2025-08-29 05:14:27 - pico-train - INFO - ├── Learning Rate: 2.01e-05 | |
| 2025-08-29 05:14:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:14:40 - pico-train - INFO - Step 15200 -- 🔄 Training Metrics | |
| 2025-08-29 05:14:40 - pico-train - INFO - ├── Loss: 6.4770 | |
| 2025-08-29 05:14:40 - pico-train - INFO - ├── Learning Rate: 2.00e-05 | |
| 2025-08-29 05:14:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:14:53 - pico-train - INFO - Step 15225 -- 🔄 Training Metrics | |
| 2025-08-29 05:14:53 - pico-train - INFO - ├── Loss: 6.4317 | |
| 2025-08-29 05:14:53 - pico-train - INFO - ├── Learning Rate: 1.99e-05 | |
| 2025-08-29 05:14:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:15:06 - pico-train - INFO - Step 15250 -- 🔄 Training Metrics | |
| 2025-08-29 05:15:06 - pico-train - INFO - ├── Loss: 6.4880 | |
| 2025-08-29 05:15:06 - pico-train - INFO - ├── Learning Rate: 1.98e-05 | |
| 2025-08-29 05:15:06 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:15:19 - pico-train - INFO - Step 15275 -- 🔄 Training Metrics | |
| 2025-08-29 05:15:19 - pico-train - INFO - ├── Loss: 6.4466 | |
| 2025-08-29 05:15:19 - pico-train - INFO - ├── Learning Rate: 1.97e-05 | |
| 2025-08-29 05:15:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:15:32 - pico-train - INFO - Step 15300 -- 🔄 Training Metrics | |
| 2025-08-29 05:15:32 - pico-train - INFO - ├── Loss: 6.4248 | |
| 2025-08-29 05:15:32 - pico-train - INFO - ├── Learning Rate: 1.96e-05 | |
| 2025-08-29 05:15:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:15:45 - pico-train - INFO - Step 15325 -- 🔄 Training Metrics | |
| 2025-08-29 05:15:45 - pico-train - INFO - ├── Loss: 6.3834 | |
| 2025-08-29 05:15:45 - pico-train - INFO - ├── Learning Rate: 1.95e-05 | |
| 2025-08-29 05:15:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:15:58 - pico-train - INFO - Step 15350 -- 🔄 Training Metrics | |
| 2025-08-29 05:15:58 - pico-train - INFO - ├── Loss: 6.4272 | |
| 2025-08-29 05:15:58 - pico-train - INFO - ├── Learning Rate: 1.94e-05 | |
| 2025-08-29 05:15:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:16:11 - pico-train - INFO - Step 15375 -- 🔄 Training Metrics | |
| 2025-08-29 05:16:11 - pico-train - INFO - ├── Loss: 6.4834 | |
| 2025-08-29 05:16:11 - pico-train - INFO - ├── Learning Rate: 1.93e-05 | |
| 2025-08-29 05:16:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:16:24 - pico-train - INFO - Step 15400 -- 🔄 Training Metrics | |
| 2025-08-29 05:16:24 - pico-train - INFO - ├── Loss: 6.4050 | |
| 2025-08-29 05:16:24 - pico-train - INFO - ├── Learning Rate: 1.92e-05 | |
| 2025-08-29 05:16:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:16:36 - pico-train - INFO - Step 15425 -- 🔄 Training Metrics | |
| 2025-08-29 05:16:36 - pico-train - INFO - ├── Loss: 6.4264 | |
| 2025-08-29 05:16:36 - pico-train - INFO - ├── Learning Rate: 1.91e-05 | |
| 2025-08-29 05:16:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:16:49 - pico-train - INFO - Step 15450 -- 🔄 Training Metrics | |
| 2025-08-29 05:16:49 - pico-train - INFO - ├── Loss: 6.4941 | |
| 2025-08-29 05:16:49 - pico-train - INFO - ├── Learning Rate: 1.90e-05 | |
| 2025-08-29 05:16:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:17:02 - pico-train - INFO - Step 15475 -- 🔄 Training Metrics | |
| 2025-08-29 05:17:02 - pico-train - INFO - ├── Loss: 6.4755 | |
| 2025-08-29 05:17:02 - pico-train - INFO - ├── Learning Rate: 1.89e-05 | |
| 2025-08-29 05:17:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:17:14 - pico-train - INFO - Step 15500 -- 💾 Saving Checkpoint | |
| 2025-08-29 05:19:19 - pico-train - INFO - Step 15500 -- 📊 Evaluation Results | |
| 2025-08-29 05:19:19 - pico-train - INFO - └── paloma: 6.102665947946271e+23 | |
| 2025-08-29 05:19:20 - pico-train - INFO - Step 15500 -- 🔄 Training Metrics | |
| 2025-08-29 05:19:20 - pico-train - INFO - ├── Loss: 6.5459 | |
| 2025-08-29 05:19:20 - pico-train - INFO - ├── Learning Rate: 1.88e-05 | |
| 2025-08-29 05:19:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:19:20 - pico-train - INFO - Step 15500 -- 📈 Saving Learning Dynamics | |
| 2025-08-29 05:19:36 - pico-train - INFO - Step 15525 -- 🔄 Training Metrics | |
| 2025-08-29 05:19:36 - pico-train - INFO - ├── Loss: 6.3772 | |
| 2025-08-29 05:19:36 - pico-train - INFO - ├── Learning Rate: 1.86e-05 | |
| 2025-08-29 05:19:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:19:48 - pico-train - INFO - Step 15550 -- 🔄 Training Metrics | |
| 2025-08-29 05:19:48 - pico-train - INFO - ├── Loss: 6.4430 | |
| 2025-08-29 05:19:48 - pico-train - INFO - ├── Learning Rate: 1.85e-05 | |
| 2025-08-29 05:19:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:20:01 - pico-train - INFO - Step 15575 -- 🔄 Training Metrics | |
| 2025-08-29 05:20:01 - pico-train - INFO - ├── Loss: 6.3931 | |
| 2025-08-29 05:20:01 - pico-train - INFO - ├── Learning Rate: 1.84e-05 | |
| 2025-08-29 05:20:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:20:14 - pico-train - INFO - Step 15600 -- 🔄 Training Metrics | |
| 2025-08-29 05:20:14 - pico-train - INFO - ├── Loss: 6.4087 | |
| 2025-08-29 05:20:14 - pico-train - INFO - ├── Learning Rate: 1.83e-05 | |
| 2025-08-29 05:20:14 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:20:27 - pico-train - INFO - Step 15625 -- 🔄 Training Metrics | |
| 2025-08-29 05:20:27 - pico-train - INFO - ├── Loss: 6.4743 | |
| 2025-08-29 05:20:27 - pico-train - INFO - ├── Learning Rate: 1.82e-05 | |
| 2025-08-29 05:20:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:20:39 - pico-train - INFO - Step 15650 -- 🔄 Training Metrics | |
| 2025-08-29 05:20:39 - pico-train - INFO - ├── Loss: 6.4575 | |
| 2025-08-29 05:20:39 - pico-train - INFO - ├── Learning Rate: 1.81e-05 | |
| 2025-08-29 05:20:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:20:52 - pico-train - INFO - Step 15675 -- 🔄 Training Metrics | |
| 2025-08-29 05:20:52 - pico-train - INFO - ├── Loss: 6.4971 | |
| 2025-08-29 05:20:52 - pico-train - INFO - ├── Learning Rate: 1.80e-05 | |
| 2025-08-29 05:20:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:21:05 - pico-train - INFO - Step 15700 -- ๐ Training Metrics | |
| 2025-08-29 05:21:05 - pico-train - INFO - โโโ Loss: 6.4380 | |
| 2025-08-29 05:21:05 - pico-train - INFO - โโโ Learning Rate: 1.79e-05 | |
| 2025-08-29 05:21:05 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:21:18 - pico-train - INFO - Step 15725 -- ๐ Training Metrics | |
| 2025-08-29 05:21:18 - pico-train - INFO - โโโ Loss: 6.5071 | |
| 2025-08-29 05:21:18 - pico-train - INFO - โโโ Learning Rate: 1.78e-05 | |
| 2025-08-29 05:21:18 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:21:31 - pico-train - INFO - Step 15750 -- ๐ Training Metrics | |
| 2025-08-29 05:21:31 - pico-train - INFO - โโโ Loss: 6.3910 | |
| 2025-08-29 05:21:31 - pico-train - INFO - โโโ Learning Rate: 1.77e-05 | |
| 2025-08-29 05:21:31 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:21:43 - pico-train - INFO - Step 15775 -- ๐ Training Metrics | |
| 2025-08-29 05:21:43 - pico-train - INFO - โโโ Loss: 6.4386 | |
| 2025-08-29 05:21:43 - pico-train - INFO - โโโ Learning Rate: 1.76e-05 | |
| 2025-08-29 05:21:43 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:21:56 - pico-train - INFO - Step 15800 -- ๐ Training Metrics | |
| 2025-08-29 05:21:56 - pico-train - INFO - โโโ Loss: 6.4268 | |
| 2025-08-29 05:21:56 - pico-train - INFO - โโโ Learning Rate: 1.75e-05 | |
| 2025-08-29 05:21:56 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:22:09 - pico-train - INFO - Step 15825 -- ๐ Training Metrics | |
| 2025-08-29 05:22:09 - pico-train - INFO - โโโ Loss: 6.5534 | |
| 2025-08-29 05:22:09 - pico-train - INFO - โโโ Learning Rate: 1.74e-05 | |
| 2025-08-29 05:22:09 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:22:22 - pico-train - INFO - Step 15850 -- ๐ Training Metrics | |
| 2025-08-29 05:22:22 - pico-train - INFO - โโโ Loss: 6.4422 | |
| 2025-08-29 05:22:22 - pico-train - INFO - โโโ Learning Rate: 1.73e-05 | |
| 2025-08-29 05:22:22 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:22:35 - pico-train - INFO - Step 15875 -- ๐ Training Metrics | |
| 2025-08-29 05:22:35 - pico-train - INFO - โโโ Loss: 6.4075 | |
| 2025-08-29 05:22:35 - pico-train - INFO - โโโ Learning Rate: 1.72e-05 | |
| 2025-08-29 05:22:35 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:22:47 - pico-train - INFO - Step 15900 -- ๐ Training Metrics | |
| 2025-08-29 05:22:47 - pico-train - INFO - โโโ Loss: 6.4458 | |
| 2025-08-29 05:22:47 - pico-train - INFO - โโโ Learning Rate: 1.71e-05 | |
| 2025-08-29 05:22:47 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:23:00 - pico-train - INFO - Step 15925 -- ๐ Training Metrics | |
| 2025-08-29 05:23:00 - pico-train - INFO - โโโ Loss: 6.3855 | |
| 2025-08-29 05:23:00 - pico-train - INFO - โโโ Learning Rate: 1.70e-05 | |
| 2025-08-29 05:23:00 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:23:13 - pico-train - INFO - Step 15950 -- ๐ Training Metrics | |
| 2025-08-29 05:23:13 - pico-train - INFO - โโโ Loss: 6.3659 | |
| 2025-08-29 05:23:13 - pico-train - INFO - โโโ Learning Rate: 1.69e-05 | |
| 2025-08-29 05:23:13 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:23:26 - pico-train - INFO - Step 15975 -- ๐ Training Metrics | |
| 2025-08-29 05:23:26 - pico-train - INFO - โโโ Loss: 6.5396 | |
| 2025-08-29 05:23:26 - pico-train - INFO - โโโ Learning Rate: 1.68e-05 | |
| 2025-08-29 05:23:26 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:23:38 - pico-train - INFO - Step 16000 -- ๐พ Saving Checkpoint | |
| 2025-08-29 05:26:48 - pico-train - INFO - Step 16000 -- ๐ Evaluation Results | |
| 2025-08-29 05:26:48 - pico-train - INFO - โโโ paloma: 8.874629945146669e+23 | |
| 2025-08-29 05:26:49 - pico-train - INFO - Step 16000 -- ๐ Training Metrics | |
| 2025-08-29 05:26:49 - pico-train - INFO - โโโ Loss: 6.4974 | |
| 2025-08-29 05:26:49 - pico-train - INFO - โโโ Learning Rate: 1.67e-05 | |
| 2025-08-29 05:26:49 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:26:49 - pico-train - INFO - Step 16000 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 05:27:04 - pico-train - INFO - Step 16025 -- ๐ Training Metrics | |
| 2025-08-29 05:27:04 - pico-train - INFO - โโโ Loss: 6.4785 | |
| 2025-08-29 05:27:04 - pico-train - INFO - โโโ Learning Rate: 1.66e-05 | |
| 2025-08-29 05:27:04 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:27:17 - pico-train - INFO - Step 16050 -- ๐ Training Metrics | |
| 2025-08-29 05:27:17 - pico-train - INFO - โโโ Loss: 6.4341 | |
| 2025-08-29 05:27:17 - pico-train - INFO - โโโ Learning Rate: 1.65e-05 | |
| 2025-08-29 05:27:17 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:27:30 - pico-train - INFO - Step 16075 -- ๐ Training Metrics | |
| 2025-08-29 05:27:30 - pico-train - INFO - โโโ Loss: 6.3709 | |
| 2025-08-29 05:27:30 - pico-train - INFO - โโโ Learning Rate: 1.64e-05 | |
| 2025-08-29 05:27:30 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:27:43 - pico-train - INFO - Step 16100 -- ๐ Training Metrics | |
| 2025-08-29 05:27:43 - pico-train - INFO - โโโ Loss: 6.3707 | |
| 2025-08-29 05:27:43 - pico-train - INFO - โโโ Learning Rate: 1.63e-05 | |
| 2025-08-29 05:27:43 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:27:55 - pico-train - INFO - Step 16125 -- ๐ Training Metrics | |
| 2025-08-29 05:27:55 - pico-train - INFO - โโโ Loss: 6.4206 | |
| 2025-08-29 05:27:55 - pico-train - INFO - โโโ Learning Rate: 1.61e-05 | |
| 2025-08-29 05:27:55 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:28:08 - pico-train - INFO - Step 16150 -- ๐ Training Metrics | |
| 2025-08-29 05:28:08 - pico-train - INFO - โโโ Loss: 6.3970 | |
| 2025-08-29 05:28:08 - pico-train - INFO - โโโ Learning Rate: 1.60e-05 | |
| 2025-08-29 05:28:08 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:28:21 - pico-train - INFO - Step 16175 -- ๐ Training Metrics | |
| 2025-08-29 05:28:21 - pico-train - INFO - โโโ Loss: 6.4617 | |
| 2025-08-29 05:28:21 - pico-train - INFO - โโโ Learning Rate: 1.59e-05 | |
| 2025-08-29 05:28:21 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:28:34 - pico-train - INFO - Step 16200 -- ๐ Training Metrics | |
| 2025-08-29 05:28:34 - pico-train - INFO - โโโ Loss: 6.5586 | |
| 2025-08-29 05:28:34 - pico-train - INFO - โโโ Learning Rate: 1.58e-05 | |
| 2025-08-29 05:28:34 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:28:47 - pico-train - INFO - Step 16225 -- ๐ Training Metrics | |
| 2025-08-29 05:28:47 - pico-train - INFO - โโโ Loss: 6.4248 | |
| 2025-08-29 05:28:47 - pico-train - INFO - โโโ Learning Rate: 1.57e-05 | |
| 2025-08-29 05:28:47 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:28:59 - pico-train - INFO - Step 16250 -- ๐ Training Metrics | |
| 2025-08-29 05:28:59 - pico-train - INFO - โโโ Loss: 6.4204 | |
| 2025-08-29 05:28:59 - pico-train - INFO - โโโ Learning Rate: 1.56e-05 | |
| 2025-08-29 05:28:59 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:29:12 - pico-train - INFO - Step 16275 -- ๐ Training Metrics | |
| 2025-08-29 05:29:12 - pico-train - INFO - โโโ Loss: 6.4632 | |
| 2025-08-29 05:29:12 - pico-train - INFO - โโโ Learning Rate: 1.55e-05 | |
| 2025-08-29 05:29:12 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:29:25 - pico-train - INFO - Step 16300 -- ๐ Training Metrics | |
| 2025-08-29 05:29:25 - pico-train - INFO - โโโ Loss: 6.4491 | |
| 2025-08-29 05:29:25 - pico-train - INFO - โโโ Learning Rate: 1.54e-05 | |
| 2025-08-29 05:29:25 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:29:38 - pico-train - INFO - Step 16325 -- ๐ Training Metrics | |
| 2025-08-29 05:29:38 - pico-train - INFO - โโโ Loss: 6.4412 | |
| 2025-08-29 05:29:38 - pico-train - INFO - โโโ Learning Rate: 1.53e-05 | |
| 2025-08-29 05:29:38 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:29:51 - pico-train - INFO - Step 16350 -- ๐ Training Metrics | |
| 2025-08-29 05:29:51 - pico-train - INFO - โโโ Loss: 6.4144 | |
| 2025-08-29 05:29:51 - pico-train - INFO - โโโ Learning Rate: 1.52e-05 | |
| 2025-08-29 05:29:51 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:30:04 - pico-train - INFO - Step 16375 -- ๐ Training Metrics | |
| 2025-08-29 05:30:04 - pico-train - INFO - โโโ Loss: 6.4660 | |
| 2025-08-29 05:30:04 - pico-train - INFO - โโโ Learning Rate: 1.51e-05 | |
| 2025-08-29 05:30:04 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:30:17 - pico-train - INFO - Step 16400 -- ๐ Training Metrics | |
| 2025-08-29 05:30:17 - pico-train - INFO - โโโ Loss: 6.4246 | |
| 2025-08-29 05:30:17 - pico-train - INFO - โโโ Learning Rate: 1.50e-05 | |
| 2025-08-29 05:30:17 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:30:30 - pico-train - INFO - Step 16425 -- ๐ Training Metrics | |
| 2025-08-29 05:30:30 - pico-train - INFO - โโโ Loss: 6.4571 | |
| 2025-08-29 05:30:30 - pico-train - INFO - โโโ Learning Rate: 1.49e-05 | |
| 2025-08-29 05:30:30 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:30:42 - pico-train - INFO - Step 16450 -- ๐ Training Metrics | |
| 2025-08-29 05:30:42 - pico-train - INFO - โโโ Loss: 6.3903 | |
| 2025-08-29 05:30:42 - pico-train - INFO - โโโ Learning Rate: 1.48e-05 | |
| 2025-08-29 05:30:42 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:30:55 - pico-train - INFO - Step 16475 -- ๐ Training Metrics | |
| 2025-08-29 05:30:55 - pico-train - INFO - โโโ Loss: 6.4141 | |
| 2025-08-29 05:30:55 - pico-train - INFO - โโโ Learning Rate: 1.47e-05 | |
| 2025-08-29 05:30:55 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:31:07 - pico-train - INFO - Step 16500 -- ๐พ Saving Checkpoint | |
| 2025-08-29 05:33:05 - pico-train - INFO - Step 16500 -- ๐ Evaluation Results | |
| 2025-08-29 05:33:05 - pico-train - INFO - โโโ paloma: 9.981607121011733e+23 | |
| 2025-08-29 05:33:06 - pico-train - INFO - Step 16500 -- ๐ Training Metrics | |
| 2025-08-29 05:33:06 - pico-train - INFO - โโโ Loss: 6.4467 | |
| 2025-08-29 05:33:06 - pico-train - INFO - โโโ Learning Rate: 1.46e-05 | |
| 2025-08-29 05:33:06 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:33:06 - pico-train - INFO - Step 16500 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 05:33:21 - pico-train - INFO - Step 16525 -- ๐ Training Metrics | |
| 2025-08-29 05:33:21 - pico-train - INFO - โโโ Loss: 6.3560 | |
| 2025-08-29 05:33:21 - pico-train - INFO - โโโ Learning Rate: 1.45e-05 | |
| 2025-08-29 05:33:21 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:33:34 - pico-train - INFO - Step 16550 -- ๐ Training Metrics | |
| 2025-08-29 05:33:34 - pico-train - INFO - โโโ Loss: 6.4049 | |
| 2025-08-29 05:33:34 - pico-train - INFO - โโโ Learning Rate: 1.44e-05 | |
| 2025-08-29 05:33:34 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:33:47 - pico-train - INFO - Step 16575 -- ๐ Training Metrics | |
| 2025-08-29 05:33:47 - pico-train - INFO - โโโ Loss: 6.4103 | |
| 2025-08-29 05:33:47 - pico-train - INFO - โโโ Learning Rate: 1.43e-05 | |
| 2025-08-29 05:33:47 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:34:00 - pico-train - INFO - Step 16600 -- ๐ Training Metrics | |
| 2025-08-29 05:34:00 - pico-train - INFO - โโโ Loss: 6.4282 | |
| 2025-08-29 05:34:00 - pico-train - INFO - โโโ Learning Rate: 1.42e-05 | |
| 2025-08-29 05:34:00 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:34:12 - pico-train - INFO - Step 16625 -- ๐ Training Metrics | |
| 2025-08-29 05:34:12 - pico-train - INFO - โโโ Loss: 6.5397 | |
| 2025-08-29 05:34:12 - pico-train - INFO - โโโ Learning Rate: 1.41e-05 | |
| 2025-08-29 05:34:12 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:34:25 - pico-train - INFO - Step 16650 -- ๐ Training Metrics | |
| 2025-08-29 05:34:25 - pico-train - INFO - โโโ Loss: 6.3862 | |
| 2025-08-29 05:34:25 - pico-train - INFO - โโโ Learning Rate: 1.40e-05 | |
| 2025-08-29 05:34:25 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:34:38 - pico-train - INFO - Step 16675 -- ๐ Training Metrics | |
| 2025-08-29 05:34:38 - pico-train - INFO - โโโ Loss: 6.4291 | |
| 2025-08-29 05:34:38 - pico-train - INFO - โโโ Learning Rate: 1.39e-05 | |
| 2025-08-29 05:34:38 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:34:51 - pico-train - INFO - Step 16700 -- ๐ Training Metrics | |
| 2025-08-29 05:34:51 - pico-train - INFO - โโโ Loss: 6.4330 | |
| 2025-08-29 05:34:51 - pico-train - INFO - โโโ Learning Rate: 1.38e-05 | |
| 2025-08-29 05:34:51 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:35:04 - pico-train - INFO - Step 16725 -- ๐ Training Metrics | |
| 2025-08-29 05:35:04 - pico-train - INFO - โโโ Loss: 6.3934 | |
| 2025-08-29 05:35:04 - pico-train - INFO - โโโ Learning Rate: 1.36e-05 | |
| 2025-08-29 05:35:04 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:35:16 - pico-train - INFO - Step 16750 -- ๐ Training Metrics | |
| 2025-08-29 05:35:16 - pico-train - INFO - โโโ Loss: 6.4042 | |
| 2025-08-29 05:35:16 - pico-train - INFO - โโโ Learning Rate: 1.35e-05 | |
| 2025-08-29 05:35:16 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:35:29 - pico-train - INFO - Step 16775 -- ๐ Training Metrics | |
| 2025-08-29 05:35:29 - pico-train - INFO - โโโ Loss: 6.4187 | |
| 2025-08-29 05:35:29 - pico-train - INFO - โโโ Learning Rate: 1.34e-05 | |
| 2025-08-29 05:35:29 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:35:42 - pico-train - INFO - Step 16800 -- ๐ Training Metrics | |
| 2025-08-29 05:35:42 - pico-train - INFO - โโโ Loss: 6.4455 | |
| 2025-08-29 05:35:42 - pico-train - INFO - โโโ Learning Rate: 1.33e-05 | |
| 2025-08-29 05:35:42 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:35:55 - pico-train - INFO - Step 16825 -- ๐ Training Metrics | |
| 2025-08-29 05:35:55 - pico-train - INFO - โโโ Loss: 6.4240 | |
| 2025-08-29 05:35:55 - pico-train - INFO - โโโ Learning Rate: 1.32e-05 | |
| 2025-08-29 05:35:55 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:36:07 - pico-train - INFO - Step 16850 -- ๐ Training Metrics | |
| 2025-08-29 05:36:07 - pico-train - INFO - โโโ Loss: 6.4491 | |
| 2025-08-29 05:36:07 - pico-train - INFO - โโโ Learning Rate: 1.31e-05 | |
| 2025-08-29 05:36:07 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:36:20 - pico-train - INFO - Step 16875 -- ๐ Training Metrics | |
| 2025-08-29 05:36:20 - pico-train - INFO - โโโ Loss: 6.3993 | |
| 2025-08-29 05:36:20 - pico-train - INFO - โโโ Learning Rate: 1.30e-05 | |
| 2025-08-29 05:36:20 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:36:33 - pico-train - INFO - Step 16900 -- ๐ Training Metrics | |
| 2025-08-29 05:36:33 - pico-train - INFO - โโโ Loss: 6.4393 | |
| 2025-08-29 05:36:33 - pico-train - INFO - โโโ Learning Rate: 1.29e-05 | |
| 2025-08-29 05:36:33 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:36:46 - pico-train - INFO - Step 16925 -- ๐ Training Metrics | |
| 2025-08-29 05:36:46 - pico-train - INFO - โโโ Loss: 6.3705 | |
| 2025-08-29 05:36:46 - pico-train - INFO - โโโ Learning Rate: 1.28e-05 | |
| 2025-08-29 05:36:46 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:36:58 - pico-train - INFO - Step 16950 -- ๐ Training Metrics | |
| 2025-08-29 05:36:58 - pico-train - INFO - โโโ Loss: 6.4404 | |
| 2025-08-29 05:36:58 - pico-train - INFO - โโโ Learning Rate: 1.27e-05 | |
| 2025-08-29 05:36:58 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:37:11 - pico-train - INFO - Step 16975 -- ๐ Training Metrics | |
| 2025-08-29 05:37:11 - pico-train - INFO - โโโ Loss: 6.4507 | |
| 2025-08-29 05:37:11 - pico-train - INFO - โโโ Learning Rate: 1.26e-05 | |
| 2025-08-29 05:37:11 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:37:23 - pico-train - INFO - Step 17000 -- ๐พ Saving Checkpoint | |
| 2025-08-29 05:39:26 - pico-train - INFO - Step 17000 -- ๐ Evaluation Results | |
| 2025-08-29 05:39:26 - pico-train - INFO - โโโ paloma: 1.1075349421086151e+24 | |
| 2025-08-29 05:39:28 - pico-train - INFO - Step 17000 -- ๐ Training Metrics | |
| 2025-08-29 05:39:28 - pico-train - INFO - โโโ Loss: 6.3821 | |
| 2025-08-29 05:39:28 - pico-train - INFO - โโโ Learning Rate: 1.25e-05 | |
| 2025-08-29 05:39:28 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:39:28 - pico-train - INFO - Step 17000 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 05:39:43 - pico-train - INFO - Step 17025 -- ๐ Training Metrics | |
| 2025-08-29 05:39:43 - pico-train - INFO - โโโ Loss: 6.4234 | |
| 2025-08-29 05:39:43 - pico-train - INFO - โโโ Learning Rate: 1.24e-05 | |
| 2025-08-29 05:39:43 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:39:56 - pico-train - INFO - Step 17050 -- ๐ Training Metrics | |
| 2025-08-29 05:39:56 - pico-train - INFO - โโโ Loss: 6.4235 | |
| 2025-08-29 05:39:56 - pico-train - INFO - โโโ Learning Rate: 1.23e-05 | |
| 2025-08-29 05:39:56 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:40:09 - pico-train - INFO - Step 17075 -- ๐ Training Metrics | |
| 2025-08-29 05:40:09 - pico-train - INFO - โโโ Loss: 6.4856 | |
| 2025-08-29 05:40:09 - pico-train - INFO - โโโ Learning Rate: 1.22e-05 | |
| 2025-08-29 05:40:09 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:40:22 - pico-train - INFO - Step 17100 -- ๐ Training Metrics | |
| 2025-08-29 05:40:22 - pico-train - INFO - โโโ Loss: 6.4877 | |
| 2025-08-29 05:40:22 - pico-train - INFO - โโโ Learning Rate: 1.21e-05 | |
| 2025-08-29 05:40:22 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:40:34 - pico-train - INFO - Step 17125 -- ๐ Training Metrics | |
| 2025-08-29 05:40:34 - pico-train - INFO - โโโ Loss: 6.3683 | |
| 2025-08-29 05:40:34 - pico-train - INFO - โโโ Learning Rate: 1.20e-05 | |
| 2025-08-29 05:40:34 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:40:47 - pico-train - INFO - Step 17150 -- ๐ Training Metrics | |
| 2025-08-29 05:40:47 - pico-train - INFO - โโโ Loss: 6.4225 | |
| 2025-08-29 05:40:47 - pico-train - INFO - โโโ Learning Rate: 1.19e-05 | |
| 2025-08-29 05:40:47 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:41:00 - pico-train - INFO - Step 17175 -- ๐ Training Metrics | |
| 2025-08-29 05:41:00 - pico-train - INFO - โโโ Loss: 6.2573 | |
| 2025-08-29 05:41:00 - pico-train - INFO - โโโ Learning Rate: 1.18e-05 | |
| 2025-08-29 05:41:00 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:41:13 - pico-train - INFO - Step 17200 -- ๐ Training Metrics | |
| 2025-08-29 05:41:13 - pico-train - INFO - โโโ Loss: 6.3946 | |
| 2025-08-29 05:41:13 - pico-train - INFO - โโโ Learning Rate: 1.17e-05 | |
| 2025-08-29 05:41:13 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:41:25 - pico-train - INFO - Step 17225 -- ๐ Training Metrics | |
| 2025-08-29 05:41:25 - pico-train - INFO - โโโ Loss: 6.4607 | |
| 2025-08-29 05:41:25 - pico-train - INFO - โโโ Learning Rate: 1.16e-05 | |
| 2025-08-29 05:41:25 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:41:38 - pico-train - INFO - Step 17250 -- ๐ Training Metrics | |
| 2025-08-29 05:41:38 - pico-train - INFO - โโโ Loss: 6.4407 | |
| 2025-08-29 05:41:38 - pico-train - INFO - โโโ Learning Rate: 1.15e-05 | |
| 2025-08-29 05:41:38 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:41:51 - pico-train - INFO - Step 17275 -- ๐ Training Metrics | |
| 2025-08-29 05:41:51 - pico-train - INFO - โโโ Loss: 6.4333 | |
| 2025-08-29 05:41:51 - pico-train - INFO - โโโ Learning Rate: 1.14e-05 | |
| 2025-08-29 05:41:51 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:42:04 - pico-train - INFO - Step 17300 -- ๐ Training Metrics | |
| 2025-08-29 05:42:04 - pico-train - INFO - โโโ Loss: 6.3782 | |
| 2025-08-29 05:42:04 - pico-train - INFO - โโโ Learning Rate: 1.13e-05 | |
| 2025-08-29 05:42:04 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:42:17 - pico-train - INFO - Step 17325 -- ๐ Training Metrics | |
| 2025-08-29 05:42:17 - pico-train - INFO - โโโ Loss: 6.3665 | |
| 2025-08-29 05:42:17 - pico-train - INFO - โโโ Learning Rate: 1.11e-05 | |
| 2025-08-29 05:42:17 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:42:29 - pico-train - INFO - Step 17350 -- ๐ Training Metrics | |
| 2025-08-29 05:42:29 - pico-train - INFO - โโโ Loss: 6.4329 | |
| 2025-08-29 05:42:29 - pico-train - INFO - โโโ Learning Rate: 1.10e-05 | |
| 2025-08-29 05:42:29 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:42:43 - pico-train - INFO - Step 17375 -- ๐ Training Metrics | |
| 2025-08-29 05:42:43 - pico-train - INFO - โโโ Loss: 6.5107 | |
| 2025-08-29 05:42:43 - pico-train - INFO - โโโ Learning Rate: 1.09e-05 | |
| 2025-08-29 05:42:43 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:42:55 - pico-train - INFO - Step 17400 -- ๐ Training Metrics | |
| 2025-08-29 05:42:55 - pico-train - INFO - โโโ Loss: 6.5076 | |
| 2025-08-29 05:42:55 - pico-train - INFO - โโโ Learning Rate: 1.08e-05 | |
| 2025-08-29 05:42:55 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:43:08 - pico-train - INFO - Step 17425 -- ๐ Training Metrics | |
| 2025-08-29 05:43:08 - pico-train - INFO - โโโ Loss: 6.4936 | |
| 2025-08-29 05:43:08 - pico-train - INFO - โโโ Learning Rate: 1.07e-05 | |
| 2025-08-29 05:43:08 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:43:21 - pico-train - INFO - Step 17450 -- ๐ Training Metrics | |
| 2025-08-29 05:43:21 - pico-train - INFO - โโโ Loss: 6.4119 | |
| 2025-08-29 05:43:21 - pico-train - INFO - โโโ Learning Rate: 1.06e-05 | |
| 2025-08-29 05:43:21 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:43:34 - pico-train - INFO - Step 17475 -- ๐ Training Metrics | |
| 2025-08-29 05:43:34 - pico-train - INFO - โโโ Loss: 6.4032 | |
| 2025-08-29 05:43:34 - pico-train - INFO - โโโ Learning Rate: 1.05e-05 | |
| 2025-08-29 05:43:34 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:43:46 - pico-train - INFO - Step 17500 -- ๐พ Saving Checkpoint | |
| 2025-08-29 05:45:54 - pico-train - INFO - Step 17500 -- ๐ Evaluation Results | |
| 2025-08-29 05:45:54 - pico-train - INFO - โโโ paloma: 1.1064948792133394e+24 | |
| 2025-08-29 05:45:56 - pico-train - INFO - Step 17500 -- ๐ Training Metrics | |
| 2025-08-29 05:45:56 - pico-train - INFO - โโโ Loss: 6.3962 | |
| 2025-08-29 05:45:56 - pico-train - INFO - โโโ Learning Rate: 1.04e-05 | |
| 2025-08-29 05:45:56 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:45:56 - pico-train - INFO - Step 17500 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 05:46:11 - pico-train - INFO - Step 17525 -- ๐ Training Metrics | |
| 2025-08-29 05:46:11 - pico-train - INFO - โโโ Loss: 6.4288 | |
| 2025-08-29 05:46:11 - pico-train - INFO - โโโ Learning Rate: 1.03e-05 | |
| 2025-08-29 05:46:11 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:46:24 - pico-train - INFO - Step 17550 -- ๐ Training Metrics | |
| 2025-08-29 05:46:24 - pico-train - INFO - โโโ Loss: 6.4021 | |
| 2025-08-29 05:46:24 - pico-train - INFO - โโโ Learning Rate: 1.02e-05 | |
| 2025-08-29 05:46:24 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:46:37 - pico-train - INFO - Step 17575 -- ๐ Training Metrics | |
| 2025-08-29 05:46:37 - pico-train - INFO - โโโ Loss: 6.3670 | |
| 2025-08-29 05:46:37 - pico-train - INFO - โโโ Learning Rate: 1.01e-05 | |
| 2025-08-29 05:46:37 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:46:49 - pico-train - INFO - Step 17600 -- ๐ Training Metrics | |
| 2025-08-29 05:46:49 - pico-train - INFO - โโโ Loss: 6.3904 | |
| 2025-08-29 05:46:49 - pico-train - INFO - โโโ Learning Rate: 1.00e-05 | |
| 2025-08-29 05:46:49 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:47:02 - pico-train - INFO - Step 17625 -- ๐ Training Metrics | |
| 2025-08-29 05:47:02 - pico-train - INFO - โโโ Loss: 6.5059 | |
| 2025-08-29 05:47:02 - pico-train - INFO - โโโ Learning Rate: 9.90e-06 | |
| 2025-08-29 05:47:02 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:47:15 - pico-train - INFO - Step 17650 -- ๐ Training Metrics | |
| 2025-08-29 05:47:15 - pico-train - INFO - โโโ Loss: 6.4225 | |
| 2025-08-29 05:47:15 - pico-train - INFO - โโโ Learning Rate: 9.79e-06 | |
| 2025-08-29 05:47:15 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:47:28 - pico-train - INFO - Step 17675 -- ๐ Training Metrics | |
| 2025-08-29 05:47:28 - pico-train - INFO - โโโ Loss: 6.4422 | |
| 2025-08-29 05:47:28 - pico-train - INFO - โโโ Learning Rate: 9.69e-06 | |
| 2025-08-29 05:47:28 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:47:41 - pico-train - INFO - Step 17700 -- ๐ Training Metrics | |
| 2025-08-29 05:47:41 - pico-train - INFO - โโโ Loss: 6.4570 | |
| 2025-08-29 05:47:41 - pico-train - INFO - โโโ Learning Rate: 9.58e-06 | |
| 2025-08-29 05:47:41 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:47:54 - pico-train - INFO - Step 17725 -- ๐ Training Metrics | |
| 2025-08-29 05:47:54 - pico-train - INFO - โโโ Loss: 6.4475 | |
| 2025-08-29 05:47:54 - pico-train - INFO - โโโ Learning Rate: 9.48e-06 | |
| 2025-08-29 05:47:54 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:48:06 - pico-train - INFO - Step 17750 -- ๐ Training Metrics | |
| 2025-08-29 05:48:06 - pico-train - INFO - โโโ Loss: 6.3786 | |
| 2025-08-29 05:48:06 - pico-train - INFO - โโโ Learning Rate: 9.38e-06 | |
| 2025-08-29 05:48:06 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:48:19 - pico-train - INFO - Step 17775 -- ๐ Training Metrics | |
| 2025-08-29 05:48:19 - pico-train - INFO - โโโ Loss: 6.4145 | |
| 2025-08-29 05:48:19 - pico-train - INFO - โโโ Learning Rate: 9.27e-06 | |
| 2025-08-29 05:48:19 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:48:32 - pico-train - INFO - Step 17800 -- ๐ Training Metrics | |
| 2025-08-29 05:48:32 - pico-train - INFO - โโโ Loss: 6.3543 | |
| 2025-08-29 05:48:32 - pico-train - INFO - โโโ Learning Rate: 9.17e-06 | |
| 2025-08-29 05:48:32 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:48:45 - pico-train - INFO - Step 17825 -- ๐ Training Metrics | |
| 2025-08-29 05:48:45 - pico-train - INFO - โโโ Loss: 6.5116 | |
| 2025-08-29 05:48:45 - pico-train - INFO - โโโ Learning Rate: 9.06e-06 | |
| 2025-08-29 05:48:45 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:48:58 - pico-train - INFO - Step 17850 -- ๐ Training Metrics | |
| 2025-08-29 05:48:58 - pico-train - INFO - โโโ Loss: 6.4101 | |
| 2025-08-29 05:48:58 - pico-train - INFO - โโโ Learning Rate: 8.96e-06 | |
| 2025-08-29 05:48:58 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:49:11 - pico-train - INFO - Step 17875 -- ๐ Training Metrics | |
| 2025-08-29 05:49:11 - pico-train - INFO - โโโ Loss: 6.4014 | |
| 2025-08-29 05:49:11 - pico-train - INFO - โโโ Learning Rate: 8.85e-06 | |
| 2025-08-29 05:49:11 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:49:24 - pico-train - INFO - Step 17900 -- ๐ Training Metrics | |
| 2025-08-29 05:49:24 - pico-train - INFO - โโโ Loss: 6.4216 | |
| 2025-08-29 05:49:24 - pico-train - INFO - โโโ Learning Rate: 8.75e-06 | |
| 2025-08-29 05:49:24 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:49:37 - pico-train - INFO - Step 17925 -- ๐ Training Metrics | |
| 2025-08-29 05:49:37 - pico-train - INFO - โโโ Loss: 6.4539 | |
| 2025-08-29 05:49:37 - pico-train - INFO - โโโ Learning Rate: 8.65e-06 | |
| 2025-08-29 05:49:37 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:49:49 - pico-train - INFO - Step 17950 -- ๐ Training Metrics | |
| 2025-08-29 05:49:49 - pico-train - INFO - โโโ Loss: 6.4205 | |
| 2025-08-29 05:49:49 - pico-train - INFO - โโโ Learning Rate: 8.54e-06 | |
| 2025-08-29 05:49:49 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:50:02 - pico-train - INFO - Step 17975 -- ๐ Training Metrics | |
| 2025-08-29 05:50:02 - pico-train - INFO - โโโ Loss: 6.3865 | |
| 2025-08-29 05:50:02 - pico-train - INFO - โโโ Learning Rate: 8.44e-06 | |
| 2025-08-29 05:50:02 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:50:14 - pico-train - INFO - Step 18000 -- ๐พ Saving Checkpoint | |
| 2025-08-29 05:52:13 - pico-train - INFO - Step 18000 -- ๐ Evaluation Results | |
| 2025-08-29 05:52:13 - pico-train - INFO - โโโ paloma: 1.340918782615931e+24 | |
| 2025-08-29 05:52:15 - pico-train - INFO - Step 18000 -- ๐ Training Metrics | |
| 2025-08-29 05:52:15 - pico-train - INFO - โโโ Loss: 6.4347 | |
| 2025-08-29 05:52:15 - pico-train - INFO - โโโ Learning Rate: 8.33e-06 | |
| 2025-08-29 05:52:15 - pico-train - INFO - โโโ Inf/NaN count: 0 | |
| 2025-08-29 05:52:15 - pico-train - INFO - Step 18000 -- ๐ Saving Learning Dynamics | |
| 2025-08-29 05:52:30 - pico-train - INFO - Step 18025 -- Training Metrics | |
| 2025-08-29 05:52:30 - pico-train - INFO - ├── Loss: 6.4313 | |
| 2025-08-29 05:52:30 - pico-train - INFO - ├── Learning Rate: 8.23e-06 | |
| 2025-08-29 05:52:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:52:43 - pico-train - INFO - Step 18050 -- Training Metrics | |
| 2025-08-29 05:52:43 - pico-train - INFO - ├── Loss: 6.3868 | |
| 2025-08-29 05:52:43 - pico-train - INFO - ├── Learning Rate: 8.13e-06 | |
| 2025-08-29 05:52:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:52:56 - pico-train - INFO - Step 18075 -- Training Metrics | |
| 2025-08-29 05:52:56 - pico-train - INFO - ├── Loss: 6.3703 | |
| 2025-08-29 05:52:56 - pico-train - INFO - ├── Learning Rate: 8.02e-06 | |
| 2025-08-29 05:52:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:53:09 - pico-train - INFO - Step 18100 -- Training Metrics | |
| 2025-08-29 05:53:09 - pico-train - INFO - ├── Loss: 6.3747 | |
| 2025-08-29 05:53:09 - pico-train - INFO - ├── Learning Rate: 7.92e-06 | |
| 2025-08-29 05:53:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:53:22 - pico-train - INFO - Step 18125 -- Training Metrics | |
| 2025-08-29 05:53:22 - pico-train - INFO - ├── Loss: 6.4228 | |
| 2025-08-29 05:53:22 - pico-train - INFO - ├── Learning Rate: 7.81e-06 | |
| 2025-08-29 05:53:22 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:53:34 - pico-train - INFO - Step 18150 -- Training Metrics | |
| 2025-08-29 05:53:34 - pico-train - INFO - ├── Loss: 6.3490 | |
| 2025-08-29 05:53:34 - pico-train - INFO - ├── Learning Rate: 7.71e-06 | |
| 2025-08-29 05:53:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:53:47 - pico-train - INFO - Step 18175 -- Training Metrics | |
| 2025-08-29 05:53:47 - pico-train - INFO - ├── Loss: 6.4522 | |
| 2025-08-29 05:53:47 - pico-train - INFO - ├── Learning Rate: 7.60e-06 | |
| 2025-08-29 05:53:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:54:00 - pico-train - INFO - Step 18200 -- Training Metrics | |
| 2025-08-29 05:54:00 - pico-train - INFO - ├── Loss: 6.3354 | |
| 2025-08-29 05:54:00 - pico-train - INFO - ├── Learning Rate: 7.50e-06 | |
| 2025-08-29 05:54:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:54:12 - pico-train - INFO - Step 18225 -- Training Metrics | |
| 2025-08-29 05:54:12 - pico-train - INFO - ├── Loss: 6.4663 | |
| 2025-08-29 05:54:12 - pico-train - INFO - ├── Learning Rate: 7.40e-06 | |
| 2025-08-29 05:54:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:54:25 - pico-train - INFO - Step 18250 -- Training Metrics | |
| 2025-08-29 05:54:25 - pico-train - INFO - ├── Loss: 6.4155 | |
| 2025-08-29 05:54:25 - pico-train - INFO - ├── Learning Rate: 7.29e-06 | |
| 2025-08-29 05:54:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:54:38 - pico-train - INFO - Step 18275 -- Training Metrics | |
| 2025-08-29 05:54:38 - pico-train - INFO - ├── Loss: 6.4584 | |
| 2025-08-29 05:54:38 - pico-train - INFO - ├── Learning Rate: 7.19e-06 | |
| 2025-08-29 05:54:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:54:51 - pico-train - INFO - Step 18300 -- Training Metrics | |
| 2025-08-29 05:54:51 - pico-train - INFO - ├── Loss: 6.3637 | |
| 2025-08-29 05:54:51 - pico-train - INFO - ├── Learning Rate: 7.08e-06 | |
| 2025-08-29 05:54:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:55:03 - pico-train - INFO - Step 18325 -- Training Metrics | |
| 2025-08-29 05:55:03 - pico-train - INFO - ├── Loss: 6.3583 | |
| 2025-08-29 05:55:03 - pico-train - INFO - ├── Learning Rate: 6.98e-06 | |
| 2025-08-29 05:55:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:55:16 - pico-train - INFO - Step 18350 -- Training Metrics | |
| 2025-08-29 05:55:16 - pico-train - INFO - ├── Loss: 6.4469 | |
| 2025-08-29 05:55:16 - pico-train - INFO - ├── Learning Rate: 6.88e-06 | |
| 2025-08-29 05:55:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:55:29 - pico-train - INFO - Step 18375 -- Training Metrics | |
| 2025-08-29 05:55:29 - pico-train - INFO - ├── Loss: 6.3768 | |
| 2025-08-29 05:55:29 - pico-train - INFO - ├── Learning Rate: 6.77e-06 | |
| 2025-08-29 05:55:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:55:42 - pico-train - INFO - Step 18400 -- Training Metrics | |
| 2025-08-29 05:55:42 - pico-train - INFO - ├── Loss: 6.3179 | |
| 2025-08-29 05:55:42 - pico-train - INFO - ├── Learning Rate: 6.67e-06 | |
| 2025-08-29 05:55:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:55:54 - pico-train - INFO - Step 18425 -- Training Metrics | |
| 2025-08-29 05:55:54 - pico-train - INFO - ├── Loss: 6.4046 | |
| 2025-08-29 05:55:54 - pico-train - INFO - ├── Learning Rate: 6.56e-06 | |
| 2025-08-29 05:55:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:56:07 - pico-train - INFO - Step 18450 -- Training Metrics | |
| 2025-08-29 05:56:07 - pico-train - INFO - ├── Loss: 6.3435 | |
| 2025-08-29 05:56:07 - pico-train - INFO - ├── Learning Rate: 6.46e-06 | |
| 2025-08-29 05:56:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:56:20 - pico-train - INFO - Step 18475 -- Training Metrics | |
| 2025-08-29 05:56:20 - pico-train - INFO - ├── Loss: 6.3454 | |
| 2025-08-29 05:56:20 - pico-train - INFO - ├── Learning Rate: 6.35e-06 | |
| 2025-08-29 05:56:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:56:32 - pico-train - INFO - Step 18500 -- 💾 Saving Checkpoint | |
| 2025-08-29 05:58:25 - pico-train - INFO - Step 18500 -- Evaluation Results | |
| 2025-08-29 05:58:25 - pico-train - INFO - └── paloma: 1.4325241176004668e+24 | |
| 2025-08-29 05:58:27 - pico-train - INFO - Step 18500 -- Training Metrics | |
| 2025-08-29 05:58:27 - pico-train - INFO - ├── Loss: 6.3922 | |
| 2025-08-29 05:58:27 - pico-train - INFO - ├── Learning Rate: 6.25e-06 | |
| 2025-08-29 05:58:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:58:27 - pico-train - INFO - Step 18500 -- Saving Learning Dynamics | |
| 2025-08-29 05:58:41 - pico-train - INFO - Step 18525 -- Training Metrics | |
| 2025-08-29 05:58:41 - pico-train - INFO - ├── Loss: 6.3459 | |
| 2025-08-29 05:58:41 - pico-train - INFO - ├── Learning Rate: 6.15e-06 | |
| 2025-08-29 05:58:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:58:54 - pico-train - INFO - Step 18550 -- Training Metrics | |
| 2025-08-29 05:58:54 - pico-train - INFO - ├── Loss: 6.3591 | |
| 2025-08-29 05:58:54 - pico-train - INFO - ├── Learning Rate: 6.04e-06 | |
| 2025-08-29 05:58:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:59:07 - pico-train - INFO - Step 18575 -- Training Metrics | |
| 2025-08-29 05:59:07 - pico-train - INFO - ├── Loss: 6.4337 | |
| 2025-08-29 05:59:07 - pico-train - INFO - ├── Learning Rate: 5.94e-06 | |
| 2025-08-29 05:59:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:59:20 - pico-train - INFO - Step 18600 -- Training Metrics | |
| 2025-08-29 05:59:20 - pico-train - INFO - ├── Loss: 6.3962 | |
| 2025-08-29 05:59:20 - pico-train - INFO - ├── Learning Rate: 5.83e-06 | |
| 2025-08-29 05:59:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:59:32 - pico-train - INFO - Step 18625 -- Training Metrics | |
| 2025-08-29 05:59:32 - pico-train - INFO - ├── Loss: 6.3425 | |
| 2025-08-29 05:59:32 - pico-train - INFO - ├── Learning Rate: 5.73e-06 | |
| 2025-08-29 05:59:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:59:45 - pico-train - INFO - Step 18650 -- Training Metrics | |
| 2025-08-29 05:59:45 - pico-train - INFO - ├── Loss: 6.4022 | |
| 2025-08-29 05:59:45 - pico-train - INFO - ├── Learning Rate: 5.63e-06 | |
| 2025-08-29 05:59:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 05:59:58 - pico-train - INFO - Step 18675 -- Training Metrics | |
| 2025-08-29 05:59:58 - pico-train - INFO - ├── Loss: 6.4513 | |
| 2025-08-29 05:59:58 - pico-train - INFO - ├── Learning Rate: 5.52e-06 | |
| 2025-08-29 05:59:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:00:11 - pico-train - INFO - Step 18700 -- Training Metrics | |
| 2025-08-29 06:00:11 - pico-train - INFO - ├── Loss: 6.4284 | |
| 2025-08-29 06:00:11 - pico-train - INFO - ├── Learning Rate: 5.42e-06 | |
| 2025-08-29 06:00:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:00:23 - pico-train - INFO - Step 18725 -- Training Metrics | |
| 2025-08-29 06:00:23 - pico-train - INFO - ├── Loss: 6.3879 | |
| 2025-08-29 06:00:23 - pico-train - INFO - ├── Learning Rate: 5.31e-06 | |
| 2025-08-29 06:00:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:00:36 - pico-train - INFO - Step 18750 -- Training Metrics | |
| 2025-08-29 06:00:36 - pico-train - INFO - ├── Loss: 6.4009 | |
| 2025-08-29 06:00:36 - pico-train - INFO - ├── Learning Rate: 5.21e-06 | |
| 2025-08-29 06:00:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:00:49 - pico-train - INFO - Step 18775 -- Training Metrics | |
| 2025-08-29 06:00:49 - pico-train - INFO - ├── Loss: 6.3713 | |
| 2025-08-29 06:00:49 - pico-train - INFO - ├── Learning Rate: 5.10e-06 | |
| 2025-08-29 06:00:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:01:01 - pico-train - INFO - Step 18800 -- Training Metrics | |
| 2025-08-29 06:01:01 - pico-train - INFO - ├── Loss: 6.3752 | |
| 2025-08-29 06:01:01 - pico-train - INFO - ├── Learning Rate: 5.00e-06 | |
| 2025-08-29 06:01:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:01:14 - pico-train - INFO - Step 18825 -- Training Metrics | |
| 2025-08-29 06:01:14 - pico-train - INFO - ├── Loss: 6.4265 | |
| 2025-08-29 06:01:14 - pico-train - INFO - ├── Learning Rate: 4.90e-06 | |
| 2025-08-29 06:01:14 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:01:27 - pico-train - INFO - Step 18850 -- Training Metrics | |
| 2025-08-29 06:01:27 - pico-train - INFO - ├── Loss: 6.3709 | |
| 2025-08-29 06:01:27 - pico-train - INFO - ├── Learning Rate: 4.79e-06 | |
| 2025-08-29 06:01:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:01:40 - pico-train - INFO - Step 18875 -- Training Metrics | |
| 2025-08-29 06:01:40 - pico-train - INFO - ├── Loss: 6.3316 | |
| 2025-08-29 06:01:40 - pico-train - INFO - ├── Learning Rate: 4.69e-06 | |
| 2025-08-29 06:01:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:01:53 - pico-train - INFO - Step 18900 -- Training Metrics | |
| 2025-08-29 06:01:53 - pico-train - INFO - ├── Loss: 6.4479 | |
| 2025-08-29 06:01:53 - pico-train - INFO - ├── Learning Rate: 4.58e-06 | |
| 2025-08-29 06:01:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:02:05 - pico-train - INFO - Step 18925 -- Training Metrics | |
| 2025-08-29 06:02:05 - pico-train - INFO - ├── Loss: 6.4247 | |
| 2025-08-29 06:02:05 - pico-train - INFO - ├── Learning Rate: 4.48e-06 | |
| 2025-08-29 06:02:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:02:18 - pico-train - INFO - Step 18950 -- Training Metrics | |
| 2025-08-29 06:02:18 - pico-train - INFO - ├── Loss: 6.4126 | |
| 2025-08-29 06:02:18 - pico-train - INFO - ├── Learning Rate: 4.37e-06 | |
| 2025-08-29 06:02:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:02:31 - pico-train - INFO - Step 18975 -- Training Metrics | |
| 2025-08-29 06:02:31 - pico-train - INFO - ├── Loss: 6.3489 | |
| 2025-08-29 06:02:31 - pico-train - INFO - ├── Learning Rate: 4.27e-06 | |
| 2025-08-29 06:02:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:02:43 - pico-train - INFO - Step 19000 -- 💾 Saving Checkpoint | |
| 2025-08-29 06:04:38 - pico-train - INFO - Step 19000 -- Evaluation Results | |
| 2025-08-29 06:04:38 - pico-train - INFO - └── paloma: 1.5360601246943468e+24 | |
| 2025-08-29 06:04:40 - pico-train - INFO - Step 19000 -- Training Metrics | |
| 2025-08-29 06:04:40 - pico-train - INFO - ├── Loss: 6.3250 | |
| 2025-08-29 06:04:40 - pico-train - INFO - ├── Learning Rate: 4.17e-06 | |
| 2025-08-29 06:04:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:04:40 - pico-train - INFO - Step 19000 -- Saving Learning Dynamics | |
| 2025-08-29 06:04:56 - pico-train - INFO - Step 19025 -- Training Metrics | |
| 2025-08-29 06:04:56 - pico-train - INFO - ├── Loss: 6.3306 | |
| 2025-08-29 06:04:56 - pico-train - INFO - ├── Learning Rate: 4.06e-06 | |
| 2025-08-29 06:04:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:05:08 - pico-train - INFO - Step 19050 -- Training Metrics | |
| 2025-08-29 06:05:08 - pico-train - INFO - ├── Loss: 6.3870 | |
| 2025-08-29 06:05:08 - pico-train - INFO - ├── Learning Rate: 3.96e-06 | |
| 2025-08-29 06:05:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:05:21 - pico-train - INFO - Step 19075 -- Training Metrics | |
| 2025-08-29 06:05:21 - pico-train - INFO - ├── Loss: 6.4133 | |
| 2025-08-29 06:05:21 - pico-train - INFO - ├── Learning Rate: 3.85e-06 | |
| 2025-08-29 06:05:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:05:34 - pico-train - INFO - Step 19100 -- Training Metrics | |
| 2025-08-29 06:05:34 - pico-train - INFO - ├── Loss: 6.3340 | |
| 2025-08-29 06:05:34 - pico-train - INFO - ├── Learning Rate: 3.75e-06 | |
| 2025-08-29 06:05:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:05:47 - pico-train - INFO - Step 19125 -- Training Metrics | |
| 2025-08-29 06:05:47 - pico-train - INFO - ├── Loss: 6.3034 | |
| 2025-08-29 06:05:47 - pico-train - INFO - ├── Learning Rate: 3.65e-06 | |
| 2025-08-29 06:05:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:06:00 - pico-train - INFO - Step 19150 -- Training Metrics | |
| 2025-08-29 06:06:00 - pico-train - INFO - ├── Loss: 6.4097 | |
| 2025-08-29 06:06:00 - pico-train - INFO - ├── Learning Rate: 3.54e-06 | |
| 2025-08-29 06:06:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:06:13 - pico-train - INFO - Step 19175 -- Training Metrics | |
| 2025-08-29 06:06:13 - pico-train - INFO - ├── Loss: 6.4420 | |
| 2025-08-29 06:06:13 - pico-train - INFO - ├── Learning Rate: 3.44e-06 | |
| 2025-08-29 06:06:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:06:26 - pico-train - INFO - Step 19200 -- Training Metrics | |
| 2025-08-29 06:06:26 - pico-train - INFO - ├── Loss: 6.3756 | |
| 2025-08-29 06:06:26 - pico-train - INFO - ├── Learning Rate: 3.33e-06 | |
| 2025-08-29 06:06:26 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:06:39 - pico-train - INFO - Step 19225 -- Training Metrics | |
| 2025-08-29 06:06:39 - pico-train - INFO - ├── Loss: 6.4037 | |
| 2025-08-29 06:06:39 - pico-train - INFO - ├── Learning Rate: 3.23e-06 | |
| 2025-08-29 06:06:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:06:51 - pico-train - INFO - Step 19250 -- Training Metrics | |
| 2025-08-29 06:06:51 - pico-train - INFO - ├── Loss: 6.3974 | |
| 2025-08-29 06:06:51 - pico-train - INFO - ├── Learning Rate: 3.13e-06 | |
| 2025-08-29 06:06:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:07:04 - pico-train - INFO - Step 19275 -- Training Metrics | |
| 2025-08-29 06:07:04 - pico-train - INFO - ├── Loss: 6.3933 | |
| 2025-08-29 06:07:04 - pico-train - INFO - ├── Learning Rate: 3.02e-06 | |
| 2025-08-29 06:07:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:07:17 - pico-train - INFO - Step 19300 -- Training Metrics | |
| 2025-08-29 06:07:17 - pico-train - INFO - ├── Loss: 6.3269 | |
| 2025-08-29 06:07:17 - pico-train - INFO - ├── Learning Rate: 2.92e-06 | |
| 2025-08-29 06:07:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:07:30 - pico-train - INFO - Step 19325 -- Training Metrics | |
| 2025-08-29 06:07:30 - pico-train - INFO - ├── Loss: 6.3907 | |
| 2025-08-29 06:07:30 - pico-train - INFO - ├── Learning Rate: 2.81e-06 | |
| 2025-08-29 06:07:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:07:42 - pico-train - INFO - Step 19350 -- Training Metrics | |
| 2025-08-29 06:07:42 - pico-train - INFO - ├── Loss: 6.3955 | |
| 2025-08-29 06:07:42 - pico-train - INFO - ├── Learning Rate: 2.71e-06 | |
| 2025-08-29 06:07:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:07:55 - pico-train - INFO - Step 19375 -- Training Metrics | |
| 2025-08-29 06:07:55 - pico-train - INFO - ├── Loss: 6.3972 | |
| 2025-08-29 06:07:55 - pico-train - INFO - ├── Learning Rate: 2.60e-06 | |
| 2025-08-29 06:07:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:08:08 - pico-train - INFO - Step 19400 -- Training Metrics | |
| 2025-08-29 06:08:08 - pico-train - INFO - ├── Loss: 6.3896 | |
| 2025-08-29 06:08:08 - pico-train - INFO - ├── Learning Rate: 2.50e-06 | |
| 2025-08-29 06:08:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:08:21 - pico-train - INFO - Step 19425 -- Training Metrics | |
| 2025-08-29 06:08:21 - pico-train - INFO - ├── Loss: 6.3425 | |
| 2025-08-29 06:08:21 - pico-train - INFO - ├── Learning Rate: 2.40e-06 | |
| 2025-08-29 06:08:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:08:33 - pico-train - INFO - Step 19450 -- Training Metrics | |
| 2025-08-29 06:08:33 - pico-train - INFO - ├── Loss: 6.3587 | |
| 2025-08-29 06:08:33 - pico-train - INFO - ├── Learning Rate: 2.29e-06 | |
| 2025-08-29 06:08:33 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:08:46 - pico-train - INFO - Step 19475 -- Training Metrics | |
| 2025-08-29 06:08:46 - pico-train - INFO - ├── Loss: 6.4179 | |
| 2025-08-29 06:08:46 - pico-train - INFO - ├── Learning Rate: 2.19e-06 | |
| 2025-08-29 06:08:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:08:58 - pico-train - INFO - Step 19500 -- 💾 Saving Checkpoint | |
| 2025-08-29 06:12:49 - pico-train - INFO - Step 19500 -- Evaluation Results | |
| 2025-08-29 06:12:49 - pico-train - INFO - └── paloma: 1.6346615942991742e+24 | |
| 2025-08-29 06:12:50 - pico-train - INFO - Step 19500 -- Training Metrics | |
| 2025-08-29 06:12:50 - pico-train - INFO - ├── Loss: 6.4192 | |
| 2025-08-29 06:12:50 - pico-train - INFO - ├── Learning Rate: 2.08e-06 | |
| 2025-08-29 06:12:50 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:12:50 - pico-train - INFO - Step 19500 -- Saving Learning Dynamics | |
| 2025-08-29 06:13:05 - pico-train - INFO - Step 19525 -- Training Metrics | |
| 2025-08-29 06:13:05 - pico-train - INFO - ├── Loss: 6.4252 | |
| 2025-08-29 06:13:05 - pico-train - INFO - ├── Learning Rate: 1.98e-06 | |
| 2025-08-29 06:13:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:13:18 - pico-train - INFO - Step 19550 -- Training Metrics | |
| 2025-08-29 06:13:18 - pico-train - INFO - ├── Loss: 6.3349 | |
| 2025-08-29 06:13:18 - pico-train - INFO - ├── Learning Rate: 1.88e-06 | |
| 2025-08-29 06:13:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:13:31 - pico-train - INFO - Step 19575 -- Training Metrics | |
| 2025-08-29 06:13:31 - pico-train - INFO - ├── Loss: 6.4042 | |
| 2025-08-29 06:13:31 - pico-train - INFO - ├── Learning Rate: 1.77e-06 | |
| 2025-08-29 06:13:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:13:44 - pico-train - INFO - Step 19600 -- Training Metrics | |
| 2025-08-29 06:13:44 - pico-train - INFO - ├── Loss: 6.3567 | |
| 2025-08-29 06:13:44 - pico-train - INFO - ├── Learning Rate: 1.67e-06 | |
| 2025-08-29 06:13:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:13:57 - pico-train - INFO - Step 19625 -- Training Metrics | |
| 2025-08-29 06:13:57 - pico-train - INFO - ├── Loss: 6.3912 | |
| 2025-08-29 06:13:57 - pico-train - INFO - ├── Learning Rate: 1.56e-06 | |
| 2025-08-29 06:13:57 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:14:09 - pico-train - INFO - Step 19650 -- Training Metrics | |
| 2025-08-29 06:14:09 - pico-train - INFO - ├── Loss: 6.3113 | |
| 2025-08-29 06:14:09 - pico-train - INFO - ├── Learning Rate: 1.46e-06 | |
| 2025-08-29 06:14:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:14:22 - pico-train - INFO - Step 19675 -- Training Metrics | |
| 2025-08-29 06:14:22 - pico-train - INFO - ├── Loss: 6.3756 | |
| 2025-08-29 06:14:22 - pico-train - INFO - ├── Learning Rate: 1.35e-06 | |
| 2025-08-29 06:14:22 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:14:35 - pico-train - INFO - Step 19700 -- Training Metrics | |
| 2025-08-29 06:14:35 - pico-train - INFO - ├── Loss: 6.3850 | |
| 2025-08-29 06:14:35 - pico-train - INFO - ├── Learning Rate: 1.25e-06 | |
| 2025-08-29 06:14:35 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:14:47 - pico-train - INFO - Step 19725 -- Training Metrics | |
| 2025-08-29 06:14:47 - pico-train - INFO - ├── Loss: 6.3631 | |
| 2025-08-29 06:14:47 - pico-train - INFO - ├── Learning Rate: 1.15e-06 | |
| 2025-08-29 06:14:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:15:00 - pico-train - INFO - Step 19750 -- Training Metrics | |
| 2025-08-29 06:15:00 - pico-train - INFO - ├── Loss: 6.4564 | |
| 2025-08-29 06:15:00 - pico-train - INFO - ├── Learning Rate: 1.04e-06 | |
| 2025-08-29 06:15:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:15:13 - pico-train - INFO - Step 19775 -- Training Metrics | |
| 2025-08-29 06:15:13 - pico-train - INFO - ├── Loss: 6.3258 | |
| 2025-08-29 06:15:13 - pico-train - INFO - ├── Learning Rate: 9.38e-07 | |
| 2025-08-29 06:15:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:15:26 - pico-train - INFO - Step 19800 -- Training Metrics | |
| 2025-08-29 06:15:26 - pico-train - INFO - ├── Loss: 6.4682 | |
| 2025-08-29 06:15:26 - pico-train - INFO - ├── Learning Rate: 8.33e-07 | |
| 2025-08-29 06:15:26 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:15:39 - pico-train - INFO - Step 19825 -- Training Metrics | |
| 2025-08-29 06:15:39 - pico-train - INFO - ├── Loss: 6.4421 | |
| 2025-08-29 06:15:39 - pico-train - INFO - ├── Learning Rate: 7.29e-07 | |
| 2025-08-29 06:15:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:15:52 - pico-train - INFO - Step 19850 -- Training Metrics | |
| 2025-08-29 06:15:52 - pico-train - INFO - ├── Loss: 6.4342 | |
| 2025-08-29 06:15:52 - pico-train - INFO - ├── Learning Rate: 6.25e-07 | |
| 2025-08-29 06:15:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:16:06 - pico-train - INFO - Step 19875 -- Training Metrics | |
| 2025-08-29 06:16:06 - pico-train - INFO - ├── Loss: 6.4182 | |
| 2025-08-29 06:16:06 - pico-train - INFO - ├── Learning Rate: 5.21e-07 | |
| 2025-08-29 06:16:06 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:16:18 - pico-train - INFO - Step 19900 -- Training Metrics | |
| 2025-08-29 06:16:18 - pico-train - INFO - ├── Loss: 6.3203 | |
| 2025-08-29 06:16:18 - pico-train - INFO - ├── Learning Rate: 4.17e-07 | |
| 2025-08-29 06:16:18 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:16:31 - pico-train - INFO - Step 19925 -- Training Metrics | |
| 2025-08-29 06:16:31 - pico-train - INFO - ├── Loss: 6.4339 | |
| 2025-08-29 06:16:31 - pico-train - INFO - ├── Learning Rate: 3.13e-07 | |
| 2025-08-29 06:16:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:16:44 - pico-train - INFO - Step 19950 -- Training Metrics | |
| 2025-08-29 06:16:44 - pico-train - INFO - ├── Loss: 6.4095 | |
| 2025-08-29 06:16:44 - pico-train - INFO - ├── Learning Rate: 2.08e-07 | |
| 2025-08-29 06:16:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:16:57 - pico-train - INFO - Step 19975 -- Training Metrics | |
| 2025-08-29 06:16:57 - pico-train - INFO - ├── Loss: 6.4814 | |
| 2025-08-29 06:16:57 - pico-train - INFO - ├── Learning Rate: 1.04e-07 | |
| 2025-08-29 06:16:57 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-29 06:17:09 - pico-train - INFO - Step 20000 -- 💾 Saving Checkpoint | |
| 2025-08-29 06:19:05 - pico-train - INFO - Step 20000 -- Evaluation Results | |
| 2025-08-29 06:19:05 - pico-train - INFO - └── paloma: 1.645368302099182e+24 | |
| 2025-08-29 06:19:06 - pico-train - INFO - Training complete! Final step: 20000 | |