---
library_name: transformers
license: mit
base_model: roberta-base
metrics:
- accuracy
model-index:
- name: roberta-base-unified-mcqa-v2
  results: []
datasets:
- pszemraj/unified-mcqa
language:
- en
---

# roberta-base-unified-mcqa: 4-choice

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [unified-mcqa](https://huggingface.co/datasets/pszemraj/unified-mcqa) dataset (4-choice config). It achieves the following results on the evaluation set:

- Loss: 0.5534
- Accuracy: 0.8030
- Num Input Tokens Seen: 2785906024

## Intended uses & limitations

The goal is to see whether training on general MCQ data A) helps on GLUE evals and B) produces a better base model than the plain MLM checkpoint.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 69
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 3.0

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:-----------------:|
| 0.9531        | 0.1189 | 1000  | 0.8328          | 0.6370   | 111443072         |
| 0.8363        | 0.2377 | 2000  | 0.7918          | 0.6720   | 222788512         |
| 0.7689        | 0.3566 | 3000  | 0.7457          | 0.6940   | 334128480         |
| 0.8036        | 0.4754 | 4000  | 0.7429          | 0.6940   | 445377152         |
| 0.7349        | 0.5943 | 5000  | 0.7252          | 0.7050   | 556965376         |
| 0.7721        | 0.7131 | 6000  | 0.7102          | 0.7130   | 668132544         |
| 0.6532        | 0.8320 | 7000  | 0.6958          | 0.7230   | 779523488         |
| 0.6842        | 0.9509 | 8000  | 0.6609          | 0.7230   | 891149056         |
| 0.576         | 1.0696 | 9000  | 0.6887          | 0.7360   | 1002658088        |
| 0.6265        | 1.1885 | 10000 | 0.6730          | 0.7520   | 1114316936        |
| 0.5256        | 1.3074 | 11000 | 0.6860          | 0.7550   | 1225691432        |
| 0.5701        | 1.4262 | 12000 | 0.6487          | 0.7530   | 1337160232        |
| 0.4803        | 1.5451 | 13000 | 0.6306          | 0.7580   | 1448480392        |
| 0.5155        | 1.6639 | 14000 | 0.5834          | 0.7800   | 1560022824        |
| 0.5221        | 1.7828 | 15000 | 0.6005          | 0.7850   | 1671544872        |
| 0.4736        | 1.9016 | 16000 | 0.5796          | 0.7820   | 1782692648        |
| 0.3577        | 2.0204 | 17000 | 0.5753          | 0.7870   | 1893957800        |
| 0.3656        | 2.1393 | 18000 | 0.6014          | 0.7930   | 2005395624        |
| 0.3722        | 2.2582 | 19000 | 0.6108          | 0.7900   | 2117111816        |
| 0.3599        | 2.3770 | 20000 | 0.5826          | 0.8000   | 2228698440        |
| 0.2723        | 2.4959 | 21000 | 0.5845          | 0.7910   | 2340181736        |
| 0.2817        | 2.6147 | 22000 | 0.5732          | 0.7840   | 2451744808        |
| 0.2402        | 2.7336 | 23000 | 0.5544          | 0.7980   | 2563194408        |
| 0.3318        | 2.8524 | 24000 | 0.5542          | 0.8000   | 2674427656        |
| 0.272         | 2.9713 | 25000 | 0.5534          | 0.8030   | 2785906024        |

### Framework versions

- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
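### Training configuration sketch

For reference, the hyperparameters listed under "Training hyperparameters" map roughly onto a standard Transformers `TrainingArguments` object. The sketch below is a reconstruction under that assumption (a `Trainer`-based multiple-choice fine-tuning script); the actual training script is not included in this card, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Sketch only: assumes a Trainer-based multiple-choice fine-tuning script.
# output_dir is a placeholder; adamw_torch_fused uses the default
# betas=(0.9, 0.999) and eps=1e-8 reported above.
training_args = TrainingArguments(
    output_dir="roberta-base-unified-mcqa-v2",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=8,   # effective total_train_batch_size: 64
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    warmup_steps=300,
    optim="adamw_torch_fused",
    seed=69,
)
```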
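## Example usage

A minimal inference sketch. It assumes the checkpoint exposes a multiple-choice head loadable via `AutoModelForMultipleChoice` and pairs the question with each of the 4 choices in the standard Transformers multiple-choice layout; the exact input formatting used during training may differ, and the repo id below is an assumption.

```python
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

model_id = "pszemraj/roberta-base-unified-mcqa-v2"  # assumed repo id; adjust to the actual hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMultipleChoice.from_pretrained(model_id)
model.eval()

question = "What is the capital of France?"
choices = ["Berlin", "Madrid", "Paris", "Rome"]  # 4-choice config

# Encode (question, choice) pairs, then add a batch dimension:
# input_ids shape becomes (1, num_choices, seq_len)
enc = tokenizer(
    [question] * len(choices),
    choices,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, num_choices)

print("Predicted choice:", choices[logits.argmax(dim=-1).item()])
```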