Speech-Emotion-Classification is a fine-tuned version of facebook/wav2vec2-base-960h for multi-class audio classification, specifically trained to detect emotions in speech. The model uses the Wav2Vec2ForSequenceClassification architecture to classify speaker emotions from raw audio signals.
Reference: wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (https://arxiv.org/pdf/2006.11477)
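Before wiring up a full demo, the model can be exercised with the transformers audio-classification pipeline. The snippet below is a minimal sketch: "clip.wav" is a placeholder path for your own recording, and the pipeline handles feature extraction, resampling, inference, and label mapping in one call.

from transformers import pipeline

# Minimal sketch: load the fine-tuned checkpoint through the generic
# audio-classification pipeline. "clip.wav" is a placeholder file path.
classifier = pipeline(
    "audio-classification",
    model="prithivMLmods/Speech-Emotion-Classification"
)

# top_k=8 returns a score for every emotion class.
for result in classifier("clip.wav", top_k=8):
    print(f'{result["label"]}: {result["score"]:.3f}')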
Classification Report:
              precision    recall  f1-score   test_support
       Anger       0.8314    0.9346    0.8800       306
        Calm       0.7949    0.8857    0.8378        35
     Disgust       0.8261    0.8287    0.8274       321
        Fear       0.8303    0.7377    0.7812       305
       Happy       0.8929    0.7764    0.8306       322
     Neutral       0.8423    0.9303    0.8841       287
         Sad       0.7749    0.7825    0.7787       308
  Surprised       0.9478    0.9478    0.9478       115
    accuracy                           0.8379      1999
   macro avg       0.8426    0.8530    0.8460      1999
weighted avg       0.8392    0.8379    0.8367      1999
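The report above follows scikit-learn's classification_report layout. A minimal sketch of how such a report can be generated, with placeholder arrays standing in for real ground-truth labels and model predictions on a held-out test split:

import numpy as np
from sklearn.metrics import classification_report

emotion_names = ["Anger", "Calm", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprised"]

# Placeholder data: in practice, y_true holds the test-set class ids and
# y_pred holds the argmax of the model's logits for each clip.
y_true = np.random.randint(0, 8, size=200)
y_pred = np.random.randint(0, 8, size=200)

print(classification_report(y_true, y_pred, labels=list(range(8)),
                            target_names=emotion_names, digits=4))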
Class 0: Anger  
Class 1: Calm  
Class 2: Disgust  
Class 3: Fear  
Class 4: Happy  
Class 5: Neutral  
Class 6: Sad  
Class 7: Surprised
pip install gradio transformers torch librosa hf_xet
import gradio as gr
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor
import torch
import librosa
# Load model and processor
model_name = "prithivMLmods/Speech-Emotion-Classification"
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
processor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
# Label mapping
id2label = {
    "0": "Anger",
    "1": "Calm",
    "2": "Disgust",
    "3": "Fear",
    "4": "Happy",
    "5": "Neutral",
    "6": "Sad",
    "7": "Surprised"
}
def classify_audio(audio_path):
    # Load and resample audio to 16kHz
    speech, sample_rate = librosa.load(audio_path, sr=16000)
    # Process audio
    inputs = processor(
        speech,
        sampling_rate=sample_rate,
        return_tensors="pt",
        padding=True
    )
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction
# Gradio Interface
iface = gr.Interface(
    fn=classify_audio,
    inputs=gr.Audio(type="filepath", label="Upload Audio (WAV, MP3, etc.)"),
    outputs=gr.Label(num_top_classes=8, label="Emotion Classification"),
    title="Speech Emotion Classification",
    description="Upload an audio clip to classify the speaker's emotion from voice signals."
)
if __name__ == "__main__":
    iface.launch()
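The classify_audio function defined above can also be called directly without launching the Gradio UI; the path below is a placeholder for a real audio file:

# Direct (non-UI) usage; replace the placeholder path with a real clip.
scores = classify_audio("path/to/clip.wav")
top_emotion = max(scores, key=scores.get)
print(top_emotion, scores[top_emotion])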
  "id2label": {
    "0": "ANG",
    "1": "CAL",
    "2": "DIS",
    "3": "FEA",
    "4": "HAP",
    "5": "NEU",
    "6": "SAD",
    "7": "SUR"
  },
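When the checkpoint is loaded directly, the same mapping is available from the model config. The sketch below reads it and expands the codes to readable names; the code_to_name dictionary is an assumption mirroring the class list above, not part of the released files:

from transformers import Wav2Vec2ForSequenceClassification

model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "prithivMLmods/Speech-Emotion-Classification"
)

# Assumed expansion of the abbreviated codes to the full emotion names.
code_to_name = {
    "ANG": "Anger", "CAL": "Calm", "DIS": "Disgust", "FEA": "Fear",
    "HAP": "Happy", "NEU": "Neutral", "SAD": "Sad", "SUR": "Surprised",
}

for idx, code in model.config.id2label.items():
    print(idx, code, code_to_name.get(code, code))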
Speech-Emotion-Classification is designed for: