Update ue8m0 config & description

- README.md +8 -1
- config.json +2 -1
    	
README.md (CHANGED):

```diff
@@ -52,7 +52,9 @@ DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinkin
 
 - **Higher thinking efficiency**: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
 
-DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens.
+DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens.
+
+Additionally, DeepSeek-V3.1 is trained using the **UE8M0 FP8 scale data format on both model weights and activations** to ensure compatibility with microscaling data formats. Please refer to [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) for more details.
 
 ## Model Downloads
 
```
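As a rough illustration of what the UE8M0 scale format in the added paragraph means, the sketch below rounds a per-block scale up to a power of two and stores only an unsigned 8-bit exponent. The 128x128 block size is taken from `weight_block_size` in config.json; the E4M3 element range, the exponent bias of 127, and the helper name `ue8m0_scale` are illustrative assumptions for this sketch, not DeepGEMM's actual API.

```python
# Illustrative sketch only: encode one block's FP8 scale in UE8M0
# (unsigned 8-bit exponent, no mantissa, so every scale is a power of two).
import math
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value (assumed element format)

def ue8m0_scale(block: np.ndarray) -> tuple[int, float]:
    """Return (encoded_byte, decoded_scale) for one 128x128 weight block."""
    amax = float(np.abs(block).max())
    raw = max(amax / FP8_E4M3_MAX, 2.0 ** -127)            # avoid log2(0)
    exp = min(max(math.ceil(math.log2(raw)), -127), 127)   # round up to a power of two
    return exp + 127, 2.0 ** exp                            # byte assumes a bias of 127

block = np.random.randn(128, 128).astype(np.float32)
byte, scale = ue8m0_scale(block)
scaled = np.clip(block / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)  # would then be cast to FP8
```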
The second README.md hunk adds usage recommendations after the note on running the model locally:

```diff
@@ -196,6 +198,11 @@ tokenizer.apply_chat_template(messages, tokenize=False, thinking=False, add_gene
 
 The model structure of DeepSeek-V3.1 is the same as DeepSeek-V3. Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running this model locally.
 
+**Usage Recommendations:**
+
+1. **The `mlp.gate.e_score_correction_bias` parameters should be loaded and computed in FP32 precision.**
+2. **Ensure that FP8 model weights and activations are formatted using the UE8M0 scale format.**
+
 ## License
 
 This repository and the model weights are licensed under the [MIT License](LICENSE).
```
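Recommendation 1 translates directly into loader code: upcast the MoE gate's correction bias right after the checkpoint is read so routing corrections are computed in full precision. The function name `upcast_gate_bias` and the plain state-dict interface below are placeholders for this sketch, not the reference loader.

```python
# Minimal sketch, not the reference loader: keep the gate correction bias in FP32.
import torch

def upcast_gate_bias(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Upcast every `mlp.gate.e_score_correction_bias` tensor to torch.float32."""
    for name, tensor in state_dict.items():
        if name.endswith("mlp.gate.e_score_correction_bias"):
            state_dict[name] = tensor.to(torch.float32)  # load and compute in FP32
    return state_dict
```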
    	
config.json (CHANGED):

```diff
@@ -41,7 +41,8 @@
     "weight_block_size": [
       128,
       128
-    ]
+    ],
+    "scale_fmt": "ue8m0"
   },
   "rms_norm_eps": 1e-06,
   "rope_scaling": {
```
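A runtime can branch on the new field and treat its absence as "no UE8M0 scales", since older checkpoints simply omit it. A small sketch, assuming the `quantization_config` parent key from the DeepSeek-V3 config layout:

```python
# Sketch: read the new "scale_fmt" field added by this commit.
import json

with open("config.json") as f:
    cfg = json.load(f)

quant = cfg.get("quantization_config", {})
block = quant.get("weight_block_size", [128, 128])  # [128, 128] per the diff above
use_ue8m0 = quant.get("scale_fmt") == "ue8m0"       # absent in older checkpoints
print(f"blocks: {block}, UE8M0 scales: {use_ue8m0}")
```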
