---
library_name: transformers
license: mit
base_model: roberta-base
metrics:
- accuracy
model-index:
- name: roberta-base-unified-mcqa-v2
  results: []
datasets:
- pszemraj/unified-mcqa
language:
- en
---

# roberta-base-unified-mcqa: 4-choice

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the [unified-mcqa](https://huggingface.co/datasets/pszemraj/unified-mcqa) dataset (4-choice config). It achieves the following results on the evaluation set:

- Loss: 0.5534
- Accuracy: 0.8030
- Num Input Tokens Seen: 2785906024

## Intended uses & limitations

The goal is to see whether training on general MCQ data A) helps on GLUE evals and B) produces a better base model than the plain MLM checkpoint.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 69
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 3.0

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:-----------------:|
| 0.9531        | 0.1189 | 1000  | 0.8328          | 0.6370   | 111443072         |
| 0.8363        | 0.2377 | 2000  | 0.7918          | 0.6720   | 222788512         |
| 0.7689        | 0.3566 | 3000  | 0.7457          | 0.6940   | 334128480         |
| 0.8036        | 0.4754 | 4000  | 0.7429          | 0.6940   | 445377152         |
| 0.7349        | 0.5943 | 5000  | 0.7252          | 0.7050   | 556965376         |
| 0.7721        | 0.7131 | 6000  | 0.7102          | 0.7130   | 668132544         |
| 0.6532        | 0.8320 | 7000  | 0.6958          | 0.7230   | 779523488         |
| 0.6842        | 0.9509 | 8000  | 0.6609          | 0.7230   | 891149056         |
| 0.576         | 1.0696 | 9000  | 0.6887          | 0.7360   | 1002658088        |
| 0.6265        | 1.1885 | 10000 | 0.6730          | 0.7520   | 1114316936        |
| 0.5256        | 1.3074 | 11000 | 0.6860          | 0.7550   | 1225691432        |
| 0.5701        | 1.4262 | 12000 | 0.6487          | 0.7530   | 1337160232        |
| 0.4803        | 1.5451 | 13000 | 0.6306          | 0.7580   | 1448480392        |
| 0.5155        | 1.6639 | 14000 | 0.5834          | 0.7800   | 1560022824        |
| 0.5221        | 1.7828 | 15000 | 0.6005          | 0.7850   | 1671544872        |
| 0.4736        | 1.9016 | 16000 | 0.5796          | 0.7820   | 1782692648        |
| 0.3577        | 2.0204 | 17000 | 0.5753          | 0.7870   | 1893957800        |
| 0.3656        | 2.1393 | 18000 | 0.6014          | 0.7930   | 2005395624        |
| 0.3722        | 2.2582 | 19000 | 0.6108          | 0.7900   | 2117111816        |
| 0.3599        | 2.3770 | 20000 | 0.5826          | 0.8000   | 2228698440        |
| 0.2723        | 2.4959 | 21000 | 0.5845          | 0.7910   | 2340181736        |
| 0.2817        | 2.6147 | 22000 | 0.5732          | 0.7840   | 2451744808        |
| 0.2402        | 2.7336 | 23000 | 0.5544          | 0.7980   | 2563194408        |
| 0.3318        | 2.8524 | 24000 | 0.5542          | 0.8000   | 2674427656        |
| 0.272         | 2.9713 | 25000 | 0.5534          | 0.8030   | 2785906024        |

### Framework versions

- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
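### Training configuration sketch

For reference, the hyperparameters listed under "Training hyperparameters" map roughly onto a standard Transformers `TrainingArguments` object. The sketch below is a reconstruction under that assumption (a `Trainer`-based multiple-choice fine-tuning script); the actual training script is not included in this card, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Sketch only: assumes a Trainer-based multiple-choice fine-tuning script.
# output_dir is a placeholder; adamw_torch_fused uses the default
# betas=(0.9, 0.999) and eps=1e-8 reported above.
training_args = TrainingArguments(
    output_dir="roberta-base-unified-mcqa-v2",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=8,   # effective total_train_batch_size: 64
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    warmup_steps=300,
    optim="adamw_torch_fused",
    seed=69,
)
```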
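## Example usage

A minimal inference sketch. It assumes the checkpoint exposes a multiple-choice head loadable via `AutoModelForMultipleChoice` and pairs the question with each of the 4 choices in the standard Transformers multiple-choice layout; the exact input formatting used during training may differ, and the repo id below is an assumption.

```python
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

model_id = "pszemraj/roberta-base-unified-mcqa-v2"  # assumed repo id; adjust to the actual hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMultipleChoice.from_pretrained(model_id)
model.eval()

question = "What is the capital of France?"
choices = ["Berlin", "Madrid", "Paris", "Rome"]  # 4-choice config

# Encode (question, choice) pairs, then add a batch dimension:
# input_ids shape becomes (1, num_choices, seq_len)
enc = tokenizer(
    [question] * len(choices),
    choices,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, num_choices)

print("Predicted choice:", choices[logits.argmax(dim=-1).item()])
```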