Improve model card with paper link and pipeline tag
#80
by nielsr (HF Staff) - opened

README.md CHANGED
````diff
@@ -1,7 +1,9 @@
 ---
-license: mit
 library_name: transformers
+license: mit
+pipeline_tag: text-generation
 ---
+
 # DeepSeek-V3-0324
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
@@ -197,5 +199,15 @@ This repository and the model weights are licensed under the [MIT License](LICEN
 }
 ```
 
+## Paper title and link
+
+The model was presented in the paper [Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures](https://huggingface.co/papers/2505.09343).
+
+## Paper abstract
+
+The abstract of the paper is the following:
+
+The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconnection bandwidth. DeepSeek-V3, trained on 2,048 NVIDIA H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale. This paper presents an in-depth analysis of the DeepSeek-V3/R1 model architecture and its AI infrastructure, highlighting key innovations such as Multi-head Latent Attention (MLA) for enhanced memory efficiency, Mixture of Experts (MoE) architectures for optimized computation-communication trade-offs, FP8 mixed-precision training to unlock the full potential of hardware capabilities, and a Multi-Plane Network Topology to minimize cluster-level network overhead. Building on the hardware bottlenecks encountered during DeepSeek-V3's development, we engage in a broader discussion with academic and industry peers on potential future hardware directions, including precise low-precision computation units, scale-up and scale-out convergence, and innovations in low-latency communication fabrics. These insights underscore the critical role of hardware and model co-design in meeting the escalating demands of AI workloads, offering a practical blueprint for innovation in next-generation AI systems.
+
 ## Contact
-If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
+If you have any questions, please raise an issue or contact us at [service@deepseek.com](service@deepseek.com).
````
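For context, the `pipeline_tag: text-generation` metadata added here is what makes the Hub list the checkpoint under the text-generation task and lets `transformers` infer the right pipeline automatically. A minimal usage sketch of what the tag maps to, assuming the Hub repo id `deepseek-ai/DeepSeek-V3-0324` and illustrative generation settings (not taken from this PR):

```python
# Minimal sketch of the usage the new pipeline_tag advertises.
# The repo id and generation settings are assumptions for illustration;
# the full 671B-parameter model needs a multi-GPU node to actually run.
from transformers import pipeline

generator = pipeline(
    "text-generation",                     # the task declared by pipeline_tag
    model="deepseek-ai/DeepSeek-V3-0324",  # assumed Hub repo id
    trust_remote_code=True,                # in case the repo ships custom modeling code
    device_map="auto",                     # shard weights across available GPUs
)

out = generator(
    "Explain Multi-head Latent Attention in one sentence.",
    max_new_tokens=64,
)
print(out[0]["generated_text"])
```

Without the tag, the Hub would have to guess the task from the architecture; with it, task filtering and the inference widget both resolve to text-generation directly.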