Phi-4-multimodal-instruct-ko-asr / README.md

junnei

metadata

library_name: transformers
datasets:
  - Bingsu/zeroth-korean
  - google/fleurs
language:
  - ko
metrics:
  - cer
  - wer
  - bleu
base_model:
  - microsoft/Phi-4-multimodal-instruct
model-index:
  - name: Phi-4-multimodal-instruct-ko-asr
    results:
      - task:
          type: automatic-speech-recognition
        dataset:
          type: Bingsu/zeroth-korean
          name: zeroth-korean-test
        metrics:
          - type: bleu
            name: zeroth-test-BLEU
            value: 94.837
          - type: cer
            name: zeroth-test-CER
            value: 1.316
          - type: wer
            name: zeroth-test-WER
            value: 2.951
      - task:
          type: automatic-speech-recognition
        dataset:
          type: google/fleurs
          name: fleurs-ko-test
        metrics:
          - type: bleu
            name: fleurs-test-BLEU
            value: 67.659
          - type: cer
            name: fleurs-test-CER
            value: 7.951
          - type: wer
            name: fleurs-test-WER
            value: 18.313
pipeline_tag: automatic-speech-recognition

This model is fine-tuned from microsoft/Phi-4-multimodal-instruct on the Bingsu/zeroth-korean and google/fleurs datasets for 5 epochs.

It was trained for 960 steps on these datasets for Korean automatic speech recognition (ASR) on an H100 GPU.

After that, training continued on the CoVoST2 dataset and CoVoST2-Ko for automatic speech translation (AST).

The AST fine-tuned model is available here: Phi-4-multimodal-instruct-ko-speech

Evaluation

Evaluation was done on the following datasets:

ASR (Automatic Speech Recognition): Evaluated with CER (Character Error Rate) on the zeroth-test set (457 samples).
AST (Automatic Speech Translation): Evaluated with BLEU score on fleurs ko <-> en speech translation results (270 samples).
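For readers unfamiliar with the ASR metrics, the following is a minimal sketch of how CER and WER can be computed from a reference and a hypothesis transcript using Levenshtein distance. It is not the evaluation script used for this model; the function names are hypothetical, and stripping spaces before computing CER is an assumption (conventions differ between toolkits).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces are ignored)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


# Illustrative strings only, not taken from the zeroth-test set.
reference = "안녕하세요 여러분"
hypothesis = "안녕하세요 여러븐"
print(f"CER: {cer(reference, hypothesis):.2f}")
print(f"WER: {wer(reference, hypothesis):.2f}")
```

Because insertions count as errors, both metrics can exceed 100 when the hypothesis is much longer than the reference.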

The evaluation script was taken from here.

Compared to seastar105/Phi-4-mm-inst-zeroth-kor and daekeun-ml/Phi-4-multimodal-finetune-ko-speech, this model achieves a lower zeroth CER (1.31 versus 7.02 and 1.61, respectively).

| Model | zeroth-CER | zeroth-WER | fleurs-ko_en-BLEU | fleurs-ko_en-cot-BLEU | fleurs-en_ko-BLEU | fleurs-en_ko-cot-BLEU |
|---|---|---|---|---|---|---|
| original | 198.32 | - | 5.63 | 2.42 | 6.86 | 4.17 |
| daekeun-ml/Phi-4-multimodal-finetune-ko-speech | 1.61 | 3.54 | 7.67 | 8.38 | 12.31 | 9.69 |
| seastar105/Phi-4-mm-inst-zeroth-kor | 7.02 | - | 7.07 | 9.19 | 13.08 | 9.35 |
| ASR finetune (this model) | 1.31 | 2.95 | 7.46 | 6.24 | 12.15 | 8.91 |
| + 1 epoch finetune with Covost-Ko | 3.88 | - | 8.07 | 10.09 | 18.82 | 15.41 |
| AST finetuned model | 1.77 | 2.99 | 8.01 | 9.09 | 17.09 | 11.82 |
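
To express the zeroth-CER differences in relative terms, the sketch below computes the relative error reduction from the values in the table above. The helper name `relative_reduction` is hypothetical and is not part of the evaluation script.

```python
def relative_reduction(baseline, new):
    """Fractional reduction of an error rate relative to a baseline (lower is better)."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - new) / baseline


# zeroth-CER values copied from the table above.
this_model_cer = 1.31
baseline_cers = {
    "daekeun-ml/Phi-4-multimodal-finetune-ko-speech": 1.61,
    "seastar105/Phi-4-mm-inst-zeroth-kor": 7.02,
}

for name, baseline in baseline_cers.items():
    pct = 100 * relative_reduction(baseline, this_model_cer)
    print(f"{name}: {pct:.1f}% lower CER")
```

With these numbers the reduction is about 19% and 81%, respectively.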