MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published May 12 • 132
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates Paper • 2509.09550 • Published Sep 11 • 2
PP-StructureV3 Collection PP-StructureV3 is a SOTA document parsing solution on OmniDocBench, supporting the conversion of PDFs and do cument images to Markdown and JSON. • 17 items • Updated Sep 15 • 9
PP-OCRv5 Collection PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15 • 48
AFM-Datasets Collection Training datasets of the paper: Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL • 6 items • Updated Aug 6 • 5
AFM-Models Collection The models and training dataset of the paper: Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL • 12 items • Updated Aug 6 • 16
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6 • 127
view article Article Introducing ColQwen-Omni: Retrieve in every modality By manu and 4 others • Jul 17 • 75
MiniMax-M1 Collection MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. • 6 items • Updated 8 days ago • 111