xiaobin zhuang

xiaobinzhuang

https://scholar.google.com/citations?user=a-crUqgAAAAJ&hl=zh-CN

auzxb

AI & ML interests

multimodal; audio generation; post-training

Organizations

None yet

authored 3 papers 5 months ago

Sounding that Object: Interactive Object-Aware Image to Audio Generation

Paper • 2506.04214 • Published Jun 4 • 2

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

Paper • 2505.16211 • Published May 22 • 18

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation

Paper • 2506.00385 • Published May 31 • 3

authored 4 papers 7 months ago

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4, 2024 • 38

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

Paper • 2502.03930 • Published Feb 6

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published Apr 11 • 130

KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

Paper • 2110.09121 • Published Oct 18, 2021