Sounding that Object: Interactive Object-Aware Image to Audio Generation Paper • 2506.04214 • Published Jun 4
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models Paper • 2505.16211 • Published May 22
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation Paper • 2506.00385 • Published May 31
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation Paper • 2502.03930 • Published Feb 6
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11
KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke Paper • 2110.09121 • Published Oct 18, 2021