Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
 
 Today, we are officially open-sourcing Ring-mini-linear-2.0.
 
-This model continues to employ a hybrid architecture that combines linear attention and standard attention mechanisms, striking a balance between performance and efficiency. Inheriting the efficient MoE (Mixture-of-Experts) design from the Ling 2.0 series, and through architectural optimizations such as a 1/32 expert activation ratio and MTP layers, Ring-mini-linear achieves the performance of an ~8B dense model while activating only 1.4B of its 16B total parameters. This model was converted from [Ling-mini-base-2.0](https://huggingface.co/inclusionAI/Ling-mini-base-2.0-20T), continually trained on an additional 600B tokens. In terms of performance, the hybrid linear model is comparable in overall performance to standard attention models of a similar size (e.g., Ring-mini-2) and surpasses other open-source MoE and Dense models of the same class on several challenging benchmarks.
+This model continues to employ a hybrid architecture that combines linear attention and standard attention mechanisms, striking a balance between performance and efficiency. Inheriting the efficient MoE (Mixture-of-Experts) design from the Ling 2.0 series, and through architectural optimizations such as a 1/32 expert activation ratio and MTP layers, Ring-mini-linear achieves the performance of an ~8B dense model while activating only 1.4B of its 16B total parameters. This model was converted from [Ling-mini-base-2.0](https://huggingface.co/inclusionAI/Ling-mini-base-2.0-20T), continually trained on an additional 600B tokens. In terms of performance, the hybrid linear model is comparable in overall performance to standard attention models of a similar size (e.g., Ring-mini-2) and surpasses other open-source MoE and Dense models of the same class on several challenging benchmarks. Additionally, we support a 512k long context window, achieved by extrapolating the window 4x using YaRN. This provides superior speed, especially on tasks involving long inputs and outputs.
 
 <div style="display: flex; justify-content: center;">
   <div style="text-align: center;">
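The added sentence describes a 512k context window obtained by extrapolating the base window 4x with YaRN. As a rough illustration of what that implies, here is a minimal Python sketch assuming a pre-extrapolation window of 128k tokens (512k / 4) and the common Hugging Face `rope_scaling` convention; the field names, the `trust_remote_code` flag, and whether this model's config consumes them this way are assumptions, not confirmed by the diff.

```python
from transformers import AutoConfig

# Assumed pre-extrapolation window: 512k / 4 = 128k (131072 tokens).
base_window = 131072
yarn_factor = 4.0  # the 4x extrapolation mentioned in the README
extended_window = int(base_window * yarn_factor)  # 524288 tokens (~512k)

# Hypothetical override using the common Hugging Face rope_scaling schema;
# whether Ring-mini-linear-2.0 reads these exact fields is an assumption.
config = AutoConfig.from_pretrained(
    "inclusionAI/Ring-mini-linear-2.0",
    trust_remote_code=True,
    rope_scaling={
        "rope_type": "yarn",
        "factor": yarn_factor,
        "original_max_position_embeddings": base_window,
    },
    max_position_embeddings=extended_window,
)
print(config.max_position_embeddings)  # 524288
```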
@@ -182,7 +182,7 @@ from vllm import LLM, SamplingParams
 
 tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ring-mini-linear-2.0")
 
-sampling_params = SamplingParams(temperature=0.
+sampling_params = SamplingParams(temperature=0.6, top_p=1.0, max_tokens=16384)
 
 llm = LLM(model="inclusionAI/Ring-mini-linear-2.0", dtype='bfloat16', enable_prefix_caching=False, max_num_seqs=128)
 prompt = "Give me a short introduction to large language models."
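For context, the fragments surrounding the changed line assemble into a complete vLLM inference script. A minimal end-to-end sketch is below; the tokenizer, sampling, and engine calls are taken verbatim from the hunk, while the `apply_chat_template` step is an assumption about how the README builds the final prompt (the diff does not show that part).

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ring-mini-linear-2.0")
sampling_params = SamplingParams(temperature=0.6, top_p=1.0, max_tokens=16384)
llm = LLM(model="inclusionAI/Ring-mini-linear-2.0", dtype='bfloat16',
          enable_prefix_caching=False, max_num_seqs=128)

prompt = "Give me a short introduction to large language models."
# Assumed step: render the chat template to plain text before generation.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=False,
    add_generation_prompt=True,
)

outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```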