If you are using this model as part of a horizontal voice agent platform, you agree not to set Vogent-Turn-80M as a default option, and to require users to select an option labeled 'Vogent Turn Detector' if they would like to use the model.
Vogent-Turn-80M
State-of-the-art multimodal turn detection model for voice AI systems, achieving 94.1% accuracy by combining acoustic and linguistic signals for real-time conversational applications.
Model Details
Model Description
Vogent-Turn-80M is a multimodal turn detection model that addresses the critical challenge of determining when a speaker has finished their turn in a conversation. Unlike traditional approaches that rely solely on audio or text, Vogent-Turn-80M processes both acoustic features (via a Whisper encoder) and semantic context to make accurate predictions in real time (~7 ms on a T4 GPU).
- Developed by: Vogent AI
- Model type: Multimodal Turn Detection (Binary Classification)
- Language(s) (NLP): English
- License: Vogent-Turn-80M is licensed under a modified Apache-2.0 license; horizontal voice agent platforms may not select Vogent-Turn-80M as the default turn-detection model, and any end users who wish to use the model must be required to select 'Vogent Turn Detector'. Otherwise, standard Apache 2.0 provisions apply.
- Finetuned from model: SmolLM2-135M (reduced to 80M parameters by using only first 12 layers)
Model Sources
- GitHub Repository: https://github.com/vogent/vogent-turn
- Blog post: https://blog.vogent.ai/posts/voturn-80m-state-of-the-art-turn-detection-for-voice-agents
Uses
Vogent-Turn-80M is designed for real-time turn detection in voice assistant applications, determining when a user has finished speaking to enable natural conversational flow without premature interruptions or awkward delays.
Bias, Risks, and Limitations
Technical Limitations:
- English-only support; turn-taking conventions vary across languages and cultures
- CPU inference may be too slow for some real-time applications
How to Get Started with the Model
For complete installation and usage instructions, visit: https://github.com/vogent/vogent-turn
Quick Install
# Clone the repository
git clone https://github.com/vogent/vogent-turn.git
cd vogent-turn
# Install in development mode
pip install -e .
Basic Usage
from vogent_turn import TurnDetector
import soundfile as sf
import urllib.request
# Initialize detector
detector = TurnDetector(compile_model=True, warmup=True)
# Download and load audio
audio_url = "https://storage.googleapis.com/voturn-sample-recordings/incomplete_number_sample.wav"
urllib.request.urlretrieve(audio_url, "sample.wav")
audio, sr = sf.read("sample.wav")
# Run turn detection with conversational context
result = detector.predict(
    audio,
    prev_line="What is your phone number",
    curr_line="My number is 804",
    sample_rate=sr,
    return_probs=True,
)
print(f"Turn complete: {result['is_endpoint']}")
print(f"Done speaking probability: {result['prob_endpoint']:.1%}")
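The returned dictionary can also drive a custom decision rule. The sketch below assumes only the `prob_endpoint` key shown in the example above; the `should_respond` helper and the threshold values are hypothetical and not part of the library.

```python
def should_respond(result, threshold=0.5):
    """Return True when the endpoint probability clears the threshold.

    `result` is assumed to be a dict shaped like the one returned by
    detector.predict(..., return_probs=True), with a float under
    'prob_endpoint'. The default threshold is an illustrative choice.
    """
    return result["prob_endpoint"] >= threshold


# Stand-in dict for illustration; real values come from detector.predict(...)
example = {"is_endpoint": False, "prob_endpoint": 0.12}

print(should_respond(example))                 # False
print(should_respond(example, threshold=0.1))  # True
```

A higher threshold makes the agent wait longer before replying, at the cost of more latency; a lower one replies faster but risks interrupting the user.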
Available Interfaces
- Python Library: Direct integration with the TurnDetector class
- CLI Tool: vogent-turn-predict speech.wav --prev "What is your phone number" --curr "My number is 804"
See the GitHub repository for detailed documentation, performance benchmarks, and advanced usage.
Training Details
Training Data
The model was trained on a diverse dataset combining human-collected and synthetic conversational data.
Training Procedure
Preprocessing
- Audio: Last 8 seconds extracted via Whisper-Tiny encoder → ~400 audio tokens
- Text: Full conversational context including assistant and user utterances
- Labels: Binary classification (turn complete/incomplete)
- Multimodal fusion: Audio embeddings projected into LLM's input space and concatenated with text
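The ~400-token figure is consistent with Whisper's front end, which computes log-mel frames every 10 ms at 16 kHz and then halves the frame rate with a strided convolution. The sketch below works through that arithmetic and shows one way to keep only the trailing 8 seconds of a buffer. The helper names are hypothetical, and how the model itself pads or crops audio is not stated here.

```python
SAMPLE_RATE = 16_000   # Hz, as stated for the audio encoder
MAX_SECONDS = 8        # only the last 8 seconds are used
MEL_HOP = 160          # Whisper log-mel hop length: 10 ms at 16 kHz
CONV_STRIDE = 2        # Whisper's encoder halves the frame rate once


def last_window(samples, sample_rate=SAMPLE_RATE, max_seconds=MAX_SECONDS):
    """Keep only the trailing `max_seconds` of a sequence of samples."""
    max_len = sample_rate * max_seconds
    return samples[-max_len:]


def approx_audio_tokens(num_samples):
    """Approximate number of encoder positions for `num_samples` of audio."""
    mel_frames = num_samples // MEL_HOP
    return mel_frames // CONV_STRIDE


window = last_window([0.0] * (SAMPLE_RATE * 12))  # 12 s of silence
print(len(window))                       # 128000 samples (8 s)
print(approx_audio_tokens(len(window)))  # 400
```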
Training Hyperparameters
- Training regime: fp16 mixed precision
- Base model initialization: SmolLM2-135M (first 12 layers)
- Architecture modifications: Reduced from 135M to ~80M parameters through layer ablation
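As a rough check on the layer-ablation numbers, the sketch below counts parameters for a Llama-style decoder. The configuration values (hidden size 576, MLP size 1536, 9 attention heads, 3 key/value heads, vocabulary 49,152, tied embeddings, 30 layers in the full model) are assumptions taken from the publicly listed SmolLM2-135M configuration and are not stated in this card.

```python
def llama_style_params(num_layers, hidden=576, intermediate=1536,
                       n_heads=9, n_kv_heads=3, vocab=49_152):
    """Count weights in a Llama-style decoder with tied embeddings.

    All default values are assumed from the SmolLM2-135M config.
    """
    head_dim = hidden // n_heads
    kv_dim = n_kv_heads * head_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o and k, v projections
    mlp = 3 * hidden * intermediate                   # gate, up, down projections
    norms = 2 * hidden                                # two RMSNorm weights per layer
    embed = vocab * hidden                            # shared input/output embedding
    final_norm = hidden
    return num_layers * (attn + mlp + norms) + embed + final_norm


print(f"{llama_style_params(30):,}")  # 134,515,008 (full 30-layer model)
print(f"{llama_style_params(12):,}")  # 70,793,280 (first 12 layers only)
```

Under these assumptions the truncated text model accounts for roughly 71M parameters. The rest of the ~80M total would then come from components this card does not break down, such as the audio encoder and projection layer.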
Speeds, Sizes, Times
- Model size: ~80M parameters
Evaluation
Testing Data, Factors & Metrics
Testing Data
Internal test set covering diverse conversational scenarios and edge cases where audio-only or text-only approaches fail.
- Accuracy: 94.1%
- AUPRC: 0.975
Technical Specifications
Model Architecture and Objective
Architecture:
- Audio Encoder: Whisper-Tiny (processes up to 8 seconds of 16kHz audio)
- Text Model: SmolLM2-135M (12 layers, ~80M parameters)
- Multimodal Fusion: Audio embeddings projected into LLM's input space
- Classifier: Binary classification head (turn complete/incomplete)
Processing Flow:
- Audio (16kHz PCM) → Whisper Encoder → Audio Embeddings (~400 tokens)
- Text Context → SmolLM Tokenizer → Text Embeddings
- Concatenate embeddings → SmolLM Transformer → Last token hidden state
- Linear Classifier → Softmax → [P(continue), P(endpoint)]
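The final step of the flow above, turning two classifier logits into [P(continue), P(endpoint)], can be illustrated with a plain softmax. This is a minimal sketch in pure Python; the logit values are made up for illustration and the real model computes this step on tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                           # subtract max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits in the order [continue, endpoint]
p_continue, p_endpoint = softmax([0.3, 2.1])
print(f"P(continue)={p_continue:.3f}  P(endpoint)={p_endpoint:.3f}")
```

Because there are only two classes, the two probabilities always sum to 1, so thresholding P(endpoint) alone is enough to make the decision.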
Compute Infrastructure
Hardware
Optimization Features:
- torch.compile with max-autotune mode
- Dynamic tensor shapes without recompilation
- Pre-warmed bucket sizes (64, 128, 256, 512, 1024)
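This card does not say how the bucket sizes are used internally. One common pattern with compiled models is to pad each input up to the nearest pre-warmed length so that only a handful of shapes are ever seen. The sketch below illustrates that pattern only; `pick_bucket`, `pad_to_bucket`, and the pad value are hypothetical names and choices, not the library's API.

```python
from bisect import bisect_left

BUCKETS = (64, 128, 256, 512, 1024)  # sizes listed above


def pick_bucket(seq_len, buckets=BUCKETS):
    """Return the smallest bucket that can hold `seq_len` tokens."""
    i = bisect_left(buckets, seq_len)
    if i == len(buckets):
        raise ValueError(f"sequence length {seq_len} exceeds largest bucket")
    return buckets[i]


def pad_to_bucket(token_ids, pad_id=0):
    """Right-pad a list of token ids up to its bucket length."""
    target = pick_bucket(len(token_ids))
    return token_ids + [pad_id] * (target - len(token_ids))


print(pick_bucket(100))                  # 128
print(len(pad_to_bucket([1, 2, 3])))     # 64
```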
Software
- Framework: PyTorch with torch.compile
- Audio processing: Whisper encoder (up to 8 seconds)
Citation
BibTeX:
@misc{voturn2025,
title={Vogent-Turn-80M: State-of-the-Art Turn Detection for Voice Agents},
author={Varadarajan, Vignesh and Vytheeswaran, Jagath},
year={2025},
publisher={Vogent AI},
howpublished={\url{https://huggingface.co/vogent/Vogent-Turn-80M}},
note={Blog: \url{https://blog.vogent.ai/posts/voturn-80m-state-of-the-art-turn-detection-for-voice-agents}}
}
More Information
Vogent-Turn-80M is part of Vogent's comprehensive voice AI platform.
Resources:
- Full documentation and code: https://github.com/vogent/vogent-turn
- Platform access: https://vogent.ai
- Enterprise solutions: Contact j@vogent.ai
Upcoming releases:
- Int8 quantized model for faster CPU deployment
- Multilingual versions
- Domain-specific adaptations
Model Card Authors
Vogent AI Team
Model Card Contact
- GitHub Repository: https://github.com/vogent/vogent-turn
- GitHub Issues: https://github.com/vogent/vogent-turn/issues
- Website: https://vogent.ai