Update README.md
README.md
@@ -6,6 +6,28 @@ license: apache-2.0
 
 # Rationalyst (with rationales extracted from reasoning datasets)
 
-This model is a
+This model is a fine-tuned version of [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct). It was
 introduced in RATIONALYST: Supervising Reasoning via Self-Supervised Rationale Extraction. The code for the rationale extraction, model training and
 inference can be found [here](https://github.com/JHU-CLSP/reasoning_world_model).
+
+## Model description
+Implicit rationales are often embedded in unlabelled text, reflecting the natural thought processes behind speech and writing.
+RATIONALYST is a self-supervised approach that extracts and filters these implicit rationales from unlabelled text and applies
+them to supervise reasoning.
+
+## How to use
+To use it, input a question and a partial reasoning trajectory; the model outputs a rationale to supervise the next reasoning step.
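
A minimal sketch of that call with the Hugging Face `transformers` library. The prompt layout in `build_prompt` and the `MODEL_ID` value are assumptions made for illustration, not the format used in training; the linked code repository defines the actual input format.

```python
from typing import List

# Hypothetical placeholder: substitute the actual repository id of this model.
MODEL_ID = "your-namespace/rationalyst"


def build_prompt(question: str, partial_steps: List[str]) -> str:
    """Join the question and the reasoning steps so far into one prompt.

    The layout is an assumption for illustration only.
    """
    lines = [f"Question: {question.strip()}"]
    for i, step in enumerate(partial_steps, start=1):
        lines.append(f"Step {i}: {step.strip()}")
    lines.append("Rationale:")
    return "\n".join(lines)


def generate_rationale(question: str, partial_steps: List[str],
                       max_new_tokens: int = 64) -> str:
    """Generate a rationale for the next reasoning step."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(question, partial_steps), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_rationale("What is 2 + 3?", ["Add the two numbers."])` would download the checkpoint and return the generated rationale as a string.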
+
+## Training data
+
+This Rationalyst model was trained on 17,566 rationales from GSM8K and 19,669 rationales from ECQA. The data can be found
+[here](https://huggingface.co/datasets/Dongwei/reasoning_world_model).
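
As a rough sketch, the stated counts imply the mix computed below, and the linked dataset can presumably be pulled with the `datasets` library. The repository id is taken from the link above; the helper names are illustrative, and the split and column names are not documented here, so the loader only returns whatever the hub exposes.

```python
# Repository id taken from the dataset link above.
DATASET_ID = "Dongwei/reasoning_world_model"

# Rationale counts as stated in this README.
RATIONALE_COUNTS = {"GSM8K": 17566, "ECQA": 19669}


def source_shares(counts):
    """Return each source's fraction of the total rationale count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def load_rationales():
    """Download the dataset; inspect the result for splits and columns."""
    # Lazy import: needs the `datasets` package and network access.
    from datasets import load_dataset

    # Assumption: the repository loads without an explicit configuration name.
    return load_dataset(DATASET_ID)


if __name__ == "__main__":
    for name, share in source_shares(RATIONALE_COUNTS).items():
        print(f"{name}: {share:.1%}")
```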
+
+
+## Evaluation results
+
+When evaluated on downstream tasks, this model achieves the following results:
+
+| Task | GSM8K | MATH | ECQA | HellaSwag | ProofWriter | ARC | MMLU-Pro |
+|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|
+| | 76.2 | 32.5 | 76.2 | 59.4 | 90.1 | 79.3 | 32.1 |