caizhi1 committed
Commit 35a80cb · 1 Parent(s): fa7be48

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
 
  Today, we are officially open-sourcing Ring-mini-linear-2.0.
 
- This model continues to employ a hybrid architecture that combines linear attention and standard attention mechanisms, striking a balance between performance and efficiency. Inheriting the efficient MoE (Mixture-of-Experts) design from the Ling 2.0 series, and through architectural optimizations such as a 1/32 expert activation ratio and MTP layers, Ring-mini-linear achieves the performance of an ~8B dense model while activating only 1.4B of its 16B total parameters. This model was converted from [Ling-mini-base-2.0](https://huggingface.co/inclusionAI/Ling-mini-base-2.0-20T), continually trained on an additional 600B tokens. In terms of performance, the hybrid linear model is comparable in overall performance to standard attention models of a similar size (e.g., Ring-mini-2) and surpasses other open-source MoE and Dense models of the same class on several challenging benchmarks. Furthermore, it natively supports a 128k long context window, demonstrating superior speed and accuracy, especially on tasks involving long inputs and outputs.
+ This model continues to employ a hybrid architecture that combines linear attention and standard attention mechanisms, striking a balance between performance and efficiency. Inheriting the efficient MoE (Mixture-of-Experts) design from the Ling 2.0 series, and through architectural optimizations such as a 1/32 expert activation ratio and MTP layers, Ring-mini-linear achieves the performance of an ~8B dense model while activating only 1.4B of its 16B total parameters. This model was converted from [Ling-mini-base-2.0](https://huggingface.co/inclusionAI/Ling-mini-base-2.0-20T), continually trained on an additional 600B tokens. In terms of performance, the hybrid linear model is comparable in overall performance to standard attention models of a similar size (e.g., Ring-mini-2) and surpasses other open-source MoE and Dense models of the same class on several challenging benchmarks. Additionally, we support a 512k long context window, achieved by extrapolating the window 4x using YaRN. This provides superior speed, especially on tasks involving long inputs and outputs.
 
  <div style="display: flex; justify-content: center;">
  <div style="text-align: center;">
@@ -182,7 +182,7 @@ from vllm import LLM, SamplingParams
 
  tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ring-mini-linear-2.0")
 
- sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=16384)
+ sampling_params = SamplingParams(temperature=0.6, top_p=1.0, max_tokens=16384)
 
  llm = LLM(model="inclusionAI/Ring-mini-linear-2.0", dtype='bfloat16', enable_prefix_caching=False, max_num_seqs=128)
  prompt = "Give me a short introduction to large language models."
 