Update README.md

README.md
@@ -31,6 +31,16 @@ With these datasets, we achieve the following scores on the JGLUE benchmark:
| jnli-1.1-0.3 | 0.504 | 0.48 |
| marc_ja-1.1-0.3 | 0.936 | 0.959 |

+We achieved these scores by using the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness) from Stability AI.
+
+```bash
+MODEL_ARGS=pretrained=lightblue/openorca_stx,use_accelerate=True
+TASK="jsquad-1.1-0.3,jcommonsenseqa-1.1-0.3,jnli-1.1-0.3,marc_ja-1.1-0.3"
+export JGLUE_OUTPUT_DIR=../jglue_results/$MODEL_NAME/$DATASET_NAME/$DATASET_SIZE
+mkdir -p $JGLUE_OUTPUT_DIR
+python main.py --model hf-causal-experimental --model_args $MODEL_ARGS --tasks $TASK --num_fewshot "2,3,3,3" --device "cuda" --output_path $JGLUE_OUTPUT_DIR/result.json --batch_size 4 > $JGLUE_OUTPUT_DIR/harness.out 2> $JGLUE_OUTPUT_DIR/harness.err
+```
+
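The scores live in the `result.json` that the harness command writes. As a convenience, they can be pulled out with a short script; the `{"results": {"<task>": {"acc": ...}}}` layout below is an assumption about the harness's output schema, so check it against the `result.json` produced by your harness version:

```python
import json

def summarize_results(path):
    """Return {task_name: accuracy} from a harness result.json file.

    Assumes the file looks like {"results": {"<task>": {"acc": ...}}};
    verify against the output of your lm-evaluation-harness version.
    """
    with open(path) as f:
        data = json.load(f)
    return {task: metrics.get("acc") for task, metrics in data["results"].items()}
```

This makes it easy to diff scores across runs without opening each output file by hand.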
Our model achieves much better results on the question-answering benchmark (JSQuAD) than the base checkpoint, without severe degradation of performance on the multiple-choice benchmarks (JCommonSenseQA, JNLI, MARC-ja), purely through QLoRA training.
This shows the potential of applying minimal QLoRA fine-tuning with Japanese fine-tuning datasets to strong language models such as [Open-Orca/OpenOrcaxOpenChat-Preview2-13B](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B) to achieve better results on narrow NLP tasks.
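The README does not spell out the QLoRA configuration used for this model, but the usual shape of such a setup with `peft` and `bitsandbytes` is sketched below; every hyperparameter here is an illustrative assumption, not a value used to train lightblue/openorca_stx:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (illustrative values).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Low-rank adapters trained on top of the quantized weights (illustrative values).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

These two config objects would then be passed to `from_pretrained` and `get_peft_model` respectively in a standard QLoRA training loop.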