Update app.py
app.py
CHANGED
|
@@ -237,33 +237,48 @@ def create_interface():
|
|
| 237 |
# """)
|
| 238 |
|
| 239 |
# Evaluation Criteria
|
| 240 |
-
with gr.Row():
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
-
|
| 250 |
-
|
| 251 |
-
|
| 252 |
-
|
| 253 |
-
|
| 254 |
-
|
| 255 |
-
|
| 256 |
-
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
|
| 261 |
-
|
| 262 |
-
|
| 263 |
-
|
| 264 |
-
|
| 265 |
|
| 266 |
-
gr.Markdown("---")
|
| 267 |
|
| 268 |
# Search and Filter Section
|
| 269 |
with gr.Row():
|
|
|
|
| 237 |
# """)
|
| 238 |
|
| 239 |
# Evaluation Criteria
|
| 240 |
+
# with gr.Row():
|
| 241 |
+
# with gr.Column():
|
| 242 |
+
# gr.HTML("""
|
| 243 |
+
# <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
|
| 244 |
+
# <div style="font-size: 2rem;">🎭</div>
|
| 245 |
+
# <strong>Naturalness</strong><br>
|
| 246 |
+
# <small>Human-like quality & emotional expression</small>
|
| 247 |
+
# </div>
|
| 248 |
+
# """)
|
| 249 |
+
# with gr.Column():
|
| 250 |
+
# gr.HTML("""
|
| 251 |
+
# <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
|
| 252 |
+
# <div style="font-size: 2rem;">🗣️</div>
|
| 253 |
+
# <strong>Intelligibility</strong><br>
|
| 254 |
+
# <small>Clarity & pronunciation accuracy</small>
|
| 255 |
+
# </div>
|
| 256 |
+
# """)
|
| 257 |
+
# with gr.Column():
|
| 258 |
+
# gr.HTML("""
|
| 259 |
+
# <div style="text-align: center; padding: 1rem; background: rgba(102, 126, 234, 0.1); border-radius: 8px;">
|
| 260 |
+
# <div style="font-size: 2rem;">🎛️</div>
|
| 261 |
+
# <strong>Controllability</strong><br>
|
| 262 |
+
# <small>Tone, pace & parameter flexibility</small>
|
| 263 |
+
# </div>
|
| 264 |
+
# """)
|
| 265 |
|
| 266 |
+
# gr.Markdown("---")
|
| 267 |
+
gr.Markdown("""
|
| 268 |
+
## 🔑 Key Findings
|
| 269 |
+
|
| 270 |
+
1. **Outstanding Speech Quality**
|
| 271 |
+
Several models—namely **Kokoro-82M**, **csm-1b**, **Spark-TTS-0.5B**, **Orpheus-3b-0.1-ft**, **F5-TTS**, and **Llasa-3B**—delivered exceptionally natural, clear, and realistic synthesized speech. Among these, **csm-1b** and **F5-TTS** stood out as the most well-rounded: they combined top-tier naturalness and intelligibility with solid controllability.
|
| 272 |
+
|
| 273 |
+
2. **Superior Controllability**
|
| 274 |
+
**Zonos-v0.1-transformer** emerged as the leader in fine-grained control: it offers detailed adjustments for prosody, emotion, and audio quality, making it ideal for use cases that demand precise voice modulation.
|
| 275 |
+
|
| 276 |
+
3. **Performance vs. Footprint Trade-off**
|
| 277 |
+
Smaller models (e.g., **Kokoro-82M** at 82 million parameters) can still achieve “Good” or “Excellent” ratings in many scenarios, especially when efficient inference or low VRAM usage is critical. Larger models (1 billion–3 billion+ parameters) generally offer more versatility—handling multilingual synthesis, zero-shot voice cloning, and multi-speaker generation—but require heavier compute resources.
|
| 278 |
+
|
| 279 |
+
4. **Special Notes on Multilingual & Cloning Capabilities**
|
| 280 |
+
**Spark-TTS-0.5B** and **XTTS-v2** excel at cross-lingual and zero-shot voice cloning, making them strong candidates for projects that need multi-language support or short-clip cloning. **Llama-OuteTTS-1.0-1B** and **MegaTTS3** also offer multilingual input handling, though they may require careful sampling parameter tuning to achieve optimal results.
|
| 281 |
+
""")
|
| 282 |
|
| 283 |
# Search and Filter Section
|
| 284 |
with gr.Row():
|
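Review note: the three cards commented out in this change share identical markup and differ only in icon, title, and subtitle. If the Evaluation Criteria row is ever restored, a minimal sketch of generating the cards from data might look like the following. The names `CARD_STYLE`, `CRITERIA`, and `criteria_card_html` are hypothetical and do not appear in app.py; the style string and card text are copied from the commented-out lines.

```python
from html import escape

# Hypothetical constant: the inline style repeated on every card in the diff.
CARD_STYLE = (
    "text-align: center; padding: 1rem; "
    "background: rgba(102, 126, 234, 0.1); border-radius: 8px;"
)

# (icon, title, subtitle) triples taken from the commented-out cards.
CRITERIA = [
    ("🎭", "Naturalness", "Human-like quality & emotional expression"),
    ("🗣️", "Intelligibility", "Clarity & pronunciation accuracy"),
    ("🎛️", "Controllability", "Tone, pace & parameter flexibility"),
]


def criteria_card_html(icon: str, title: str, subtitle: str) -> str:
    """Return the HTML for one card (hypothetical helper, not in app.py).

    Title and subtitle are escaped, so a literal '&' becomes '&amp;'.
    """
    return (
        f'<div style="{CARD_STYLE}">'
        f'<div style="font-size: 2rem;">{icon}</div>'
        f"<strong>{escape(title)}</strong><br>"
        f"<small>{escape(subtitle)}</small>"
        "</div>"
    )


# One HTML string per criterion, in display order.
cards = [criteria_card_html(*criterion) for criterion in CRITERIA]
```

Inside `create_interface()`, each string in `cards` could then be passed to `gr.HTML(...)` within its own `gr.Column()` under a single `gr.Row()`, mirroring the layout of the commented-out block.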