readme: update info
README.md CHANGED
````diff
@@ -22,7 +22,7 @@ Quantizised from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://h
 
 Using llama.cpp [b3026](https://github.com/ggerganov/llama.cpp/releases/tag/b3026) for quantization. Given the rapid release of llama.cpp builds, this will likely change over time.
 
-**
+**Please set the metadata KV overrides below.**
 
 # Usage:
 
````
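For context on the build pinned in the hunk above, here is a minimal reproduction sketch of the quantization step. It assumes the b3026-era tool names (`convert-hf-to-gguf.py`, the `quantize` binary, and its `--outtype bf16` / `--imatrix` options); all file paths and output names are illustrative, not from this repo.

```bash
# Sketch only: tool names as of llama.cpp b3026; paths/filenames are made up.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && git checkout b3026 && make quantize
# Convert the HF checkpoint to GGUF (output name and --outtype assumed):
python convert-hf-to-gguf.py /path/to/DeepSeek-V2-Chat \
    --outtype bf16 --outfile deepseek-v2-chat-bf16.gguf
# Quantize, e.g. to Q4_K_M; add --imatrix <file> for the iMatrix quants:
./quantize deepseek-v2-chat-bf16.gguf deepseek-v2-chat-Q4_K_M.gguf Q4_K_M
```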
````diff
@@ -85,7 +85,8 @@ Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed w
 |----------|-------------|-----------|--------------------------------------------|-------------|----------|-------|
 | BF16     | Available   | 439 GB    | Lossless :)                                | Old         | No       | Q8_0 is sufficient for most cases |
 | Q8_0     | Available   | 233.27 GB | High quality *recommended*                 | Updated     | Yes      |       |
-|  |
+| Q8_0     | Available   | ~110 GB   | High quality *recommended*                 | Updated     | Yes      |       |
+| Q5_K_M   | Available   | 155 GB    | Medium-high quality *recommended*          | Updated     | Yes      |       |
 | Q4_K_M   | Available   | 132 GB    | Medium quality *recommended*               | Old         | No       |       |
 | Q3_K_M   | Available   | 104 GB    | Medium-low quality                         | Updated     | Yes      |       |
 | IQ3_XS   | Available   | 89.6 GB   | Better than Q3_K_M                         | Old         | Yes      |       |
````
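A quick sanity check on the sizes in this table: dividing file size by DeepSeek-V2's total parameter count gives the effective bits per weight. A throwaway sketch; the ~236B parameter count is an assumption taken from the upstream model card, and the sizes are the ones in the table above.

```bash
# Rough bits-per-weight check (236e9 params assumed, not stated in this repo)
python3 - <<'EOF'
params = 236e9  # total parameters (assumption)
for name, gb in [("BF16", 439), ("Q8_0", 233.27), ("Q4_K_M", 132)]:
    print(f"{name}: {gb * 8e9 / params:.1f} bits/weight")
EOF
```

BF16 comes out near 15 rather than 16 bits per weight; if the table's sizes are actually GiB, it lands at ~16.0, so the figures are plausibly GiB reported as GB.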
````diff
@@ -101,7 +102,6 @@ Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed w
 | Q5_K_S            |         |
 | Q4_K_S            |         |
 | Q3_K_S            |         |
-| Q6_K              |         |
 | IQ4_XS            |         |
 | IQ2_XS            |         |
 | IQ2_S             |         |
````
````diff
@@ -118,10 +118,6 @@ deepseek2.leading_dense_block_count=int:1
 deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
 ```
 
-Quants with "Updated" metadata contain these parameters, so as long as you're running a supported build of llama.cpp, no `--override-kv` parameters are required.
-
-A precompiled Windows AVX2 version is available at `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
-
 # License:
 - DeepSeek license for model weights, which can be found in the `LICENSE` file in the root of this repo
 - MIT license for any repo code
````
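Since the commit removes the sentence explaining when `--override-kv` is needed, it is worth spelling out how the README's KV overrides are passed for quants still marked "Old" in the table. A minimal sketch using the b3026-era `main` binary and its `KEY=TYPE:VALUE` override syntax; the `.gguf` filename is illustrative.

```bash
# Sketch: pass the README's KV overrides on the command line for "Old" quants.
# Binary name and flags as of llama.cpp b3026; the model filename is made up.
./main -m deepseek-v2-chat-Q4_K_M.gguf -ngl 99 -c 4096 \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  --override-kv deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707 \
  -p "Hello"
```

Per the removed sentence, quants marked "Updated" already embed these keys, so the flags only matter for the "Old" ones.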