The Unsloth BF16 GGUF is used as the source for the IQ1_M quantization.
Added IQ1_S_M Quantization
IQ1_S_M Quantization Details
--output-tensor-type Q6_K
--token-embedding-type Q6_K
--tensor-type ffn_down_exps=Q2_K
--tensor-type blk.[4-48].ffn_down_exps=IQ1_S
--tensor-type blk.[2|3].ffn_down_exps=MXFP4
--tensor-type blk.89.ffn_down_exps=Q8_0
--tensor-type blk.90.ffn_down_exps=BF16
--tensor-type blk.[91|92].ffn_down_exps=Q5_K
--tensor-type blk.93.ffn_down_exps=Q6_K
--tensor-type blk.[80-88].ffn_down_exps=IQ3_XXS
--tensor-type blk.[49-74].ffn_down_exps=IQ1_M
--tensor-type blk.[0|1|25|29|45|59|79].ffn_down_exps=Q2_K
--tensor-type ffn_gate_exps=IQ1_S
--tensor-type blk.3.ffn_gate_exps=MXFP4
--tensor-type blk.[76|77].ffn_gate_exps=IQ2_XS
--tensor-type blk.[85-88].ffn_gate_exps=IQ3_XXS
--tensor-type blk.[89-93].ffn_gate_exps=IQ4_XS
--tensor-type ffn_up_exps=IQ1_S
--tensor-type blk.3.ffn_up_exps=MXFP4
--tensor-type blk.[76|77].ffn_up_exps=IQ2_XS
--tensor-type blk.[85-88].ffn_up_exps=IQ3_XXS
--tensor-type blk.[89-93].ffn_up_exps=IQ4_XS
--tensor-type attn=IQ4_XS
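For reference, the flags above assemble into a single llama-quantize run along these lines. This is a sketch, not the exact command used: the input/output GGUF names are assumptions, while the imatrix path is taken from the quantize.imatrix.file metadata in the log below, and the final positional type IQ1_S matches the reported file type.

```
llama-quantize.exe --imatrix ..\imatrix_unsloth.gguf ^
  --output-tensor-type Q6_K --token-embedding-type Q6_K ^
  --tensor-type "ffn_down_exps=Q2_K" ^
  --tensor-type "blk.[4-48].ffn_down_exps=IQ1_S" ^
  --tensor-type "blk.[2|3].ffn_down_exps=MXFP4" ^
  --tensor-type "blk.89.ffn_down_exps=Q8_0" ^
  --tensor-type "blk.90.ffn_down_exps=BF16" ^
  --tensor-type "blk.[91|92].ffn_down_exps=Q5_K" ^
  --tensor-type "blk.93.ffn_down_exps=Q6_K" ^
  --tensor-type "blk.[80-88].ffn_down_exps=IQ3_XXS" ^
  --tensor-type "blk.[49-74].ffn_down_exps=IQ1_M" ^
  --tensor-type "blk.[0|1|25|29|45|59|79].ffn_down_exps=Q2_K" ^
  --tensor-type "ffn_gate_exps=IQ1_S" ^
  --tensor-type "blk.3.ffn_gate_exps=MXFP4" ^
  --tensor-type "blk.[76|77].ffn_gate_exps=IQ2_XS" ^
  --tensor-type "blk.[85-88].ffn_gate_exps=IQ3_XXS" ^
  --tensor-type "blk.[89-93].ffn_gate_exps=IQ4_XS" ^
  --tensor-type "ffn_up_exps=IQ1_S" ^
  --tensor-type "blk.3.ffn_up_exps=MXFP4" ^
  --tensor-type "blk.[76|77].ffn_up_exps=IQ2_XS" ^
  --tensor-type "blk.[85-88].ffn_up_exps=IQ3_XXS" ^
  --tensor-type "blk.[89-93].ffn_up_exps=IQ4_XS" ^
  --tensor-type "attn=IQ4_XS" ^
  Qwen3-235B-A22B-Thinking-2507-BF16.gguf ^
  Qwen3-235B-A22B-Thinking-2507-IQ1_S_M.gguf IQ1_S
```

The patterns are quoted so that cmd does not interpret the `|` characters; later `--tensor-type` overrides take precedence over the earlier per-group defaults.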
llama-cli test (reasoning) on a Radeon 780M integrated GPU (ROCm 6.4.2)
M:\llama_latest\build\bin>.\llama-cli.exe -m M:\IQ1_S_M\Qwen3-235B-A22B-Thinking-2507-IQ1_S_M-00001-of-00010.gguf -c 8192
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon 780M Graphics, gfx1103 (0x1103), VMM: no, Wave Size: 32
build: 6367 (2c8dac72) with clang version 20.0.0git (git@github.com:Compute-Mirrors/llvm-project 33ab2c2f7838239f1e2e5c06432bbb8d887e8cb2) for x86_64-pc-windows-msvc
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon 780M Graphics) - 59175 MiB free
llama_model_loader: additional 9 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 48 key-value pairs and 1131 tensors from M:\IQ1_S_M\Qwen3-235B-A22B-Thinking-2507-IQ1_S_M-00001-of-00010.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen3-235B-A22B-Thinking-2507
llama_model_loader: - kv 3: general.version str = 2507
llama_model_loader: - kv 4: general.finetune str = Thinking
llama_model_loader: - kv 5: general.basename str = Qwen3-235B-A22B-Thinking-2507
llama_model_loader: - kv 6: general.quantized_by str = Unsloth
llama_model_loader: - kv 7: general.size_label str = 235B-A22B
llama_model_loader: - kv 8: general.license str = apache-2.0
llama_model_loader: - kv 9: general.license.link str = https://huggingface.co/Qwen/Qwen3-235...
llama_model_loader: - kv 10: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 11: general.base_model.count u32 = 1
llama_model_loader: - kv 12: general.base_model.0.name str = Qwen3 235B A22B Thinking 2507
llama_model_loader: - kv 13: general.base_model.0.version str = 2507
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3-235...
llama_model_loader: - kv 16: general.tags arr[str,2] = ["unsloth", "text-generation"]
llama_model_loader: - kv 17: qwen3moe.block_count u32 = 94
llama_model_loader: - kv 18: qwen3moe.context_length u32 = 262144
llama_model_loader: - kv 19: qwen3moe.embedding_length u32 = 4096
llama_model_loader: - kv 20: qwen3moe.feed_forward_length u32 = 12288
llama_model_loader: - kv 21: qwen3moe.attention.head_count u32 = 64
llama_model_loader: - kv 22: qwen3moe.attention.head_count_kv u32 = 4
llama_model_loader: - kv 23: qwen3moe.rope.freq_base f32 = 5000000.000000
llama_model_loader: - kv 24: qwen3moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 25: qwen3moe.expert_used_count u32 = 8
llama_model_loader: - kv 26: qwen3moe.attention.key_length u32 = 128
llama_model_loader: - kv 27: qwen3moe.attention.value_length u32 = 128
llama_model_loader: - kv 28: qwen3moe.expert_count u32 = 128
llama_model_loader: - kv 29: qwen3moe.expert_feed_forward_length u32 = 1536
llama_model_loader: - kv 30: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 31: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 32: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 33: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 34: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 35: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 36: tokenizer.ggml.padding_token_id u32 = 151654
llama_model_loader: - kv 37: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 38: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 39: general.quantization_version u32 = 2
llama_model_loader: - kv 40: general.file_type u32 = 24
llama_model_loader: - kv 41: quantize.imatrix.file str = ..\imatrix_unsloth.gguf
llama_model_loader: - kv 42: quantize.imatrix.dataset str = unsloth_calibration_Qwen3-235B-A22B-T...
llama_model_loader: - kv 43: quantize.imatrix.entries_count u32 = 744
llama_model_loader: - kv 44: quantize.imatrix.chunks_count u32 = 684
llama_model_loader: - kv 45: split.no u16 = 0
llama_model_loader: - kv 46: split.count u16 = 10
llama_model_loader: - kv 47: split.tensors.count i32 = 1131
llama_model_loader: - type f32: 471 tensors
llama_model_loader: - type q8_0: 1 tensors
llama_model_loader: - type q2_K: 11 tensors
llama_model_loader: - type q5_K: 2 tensors
llama_model_loader: - type q6_K: 3 tensors
llama_model_loader: - type iq2_xs: 4 tensors
llama_model_loader: - type iq3_xxs: 17 tensors
llama_model_loader: - type iq1_s: 206 tensors
llama_model_loader: - type iq4_xs: 386 tensors
llama_model_loader: - type iq1_m: 25 tensors
llama_model_loader: - type bf16: 1 tensors
llama_model_loader: - type mxfp4: 4 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = IQ1_S - 1.5625 bpw
print_info: file size = 56.71 GiB (2.07 BPW)
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 26
load: token to piece cache size = 0.9311 MB
print_info: arch = qwen3moe
print_info: vocab_only = 0
print_info: n_ctx_train = 262144
print_info: n_embd = 4096
print_info: n_layer = 94
print_info: n_head = 64
print_info: n_head_kv = 4
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 16
print_info: n_embd_k_gqa = 512
print_info: n_embd_v_gqa = 512
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 12288
print_info: n_expert = 128
print_info: n_expert_used = 8
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 5000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 262144
print_info: rope_finetuned = unknown
print_info: model type = 235B.A22B
print_info: model params = 235.09 B
print_info: general.name = Qwen3-235B-A22B-Thinking-2507
print_info: n_ff_exp = 1536
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 11 ','
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151654 '<|vision_pad|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 94 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 95/95 layers to GPU
load_tensors: ROCm0 model buffer size = 57581.65 MiB
load_tensors: CPU_Mapped model buffer size = 486.86 MiB
.................................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 8192
llama_context: n_ctx_per_seq = 8192
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 5000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (8192) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
llama_context: ROCm_Host output buffer size = 0.58 MiB
llama_kv_cache: ROCm0 KV buffer size = 1504.00 MiB
llama_kv_cache: size = 1504.00 MiB ( 8192 cells, 94 layers, 1/1 seqs), K (f16): 752.00 MiB, V (f16): 752.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context: ROCm0 compute buffer size = 304.75 MiB
llama_context: ROCm_Host compute buffer size = 24.01 MiB
llama_context: graph nodes = 5835
llama_context: graph splits = 2
common_init_from_params: added <|endoftext|> logit bias = -inf
common_init_from_params: added <|im_end|> logit bias = -inf
common_init_from_params: added <|fim_pad|> logit bias = -inf
common_init_from_params: added <|repo_name|> logit bias = -inf
common_init_from_params: added <|file_sep|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 8
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
main: chat template example:
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: interactive mode on.
sampler seed: 2824890048
sampler params:
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 8192
	top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 8192, n_batch = 2048, n_predict = -1, n_keep = 0
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
- Not using system message. To change it, set a different value via -sys PROMPT
I have a vehicle of mass M kg and tires of radius R, in meters. Before the axle there is a transmission with ratio n, where n is defined as input speed divided by output speed. Find the vehicle inertia reflected to the transmission input.

Hmm, the user's question is about reflecting vehicle inertia to the transmission input side. Let me look at it carefully.
The user has a vehicle of mass M kg, a tire radius of R meters, and a gearbox with ratio n, where n is defined as input speed divided by output speed. The task is to find the vehicle inertia reflected to the gearbox input. First, understand the problem.
First, "inertia" here should mean moment of inertia, right? The vehicle as a whole is a translating mass, but the question mentions tires and a driveline, so rotational inertia is involved. The user's "vehicle inertia" most likely means the equivalent moment of inertia referred to a rotating shaft.
Key point: the ratio n = input speed / output speed. A gear ratio is usually defined as input speed over output speed, so for a reduction with n > 1 the output turns slower than the input, as in a differential or a gearbox.
Now, the vehicle has translating mass M and tire radius R. As the vehicle moves, the tires rotate and drive the transmission. The vehicle's translational inertia must be reflected onto the transmission input shaft.
Recall how an equivalent moment of inertia is computed: when converting between translation and rotation, inertia scales with the square of the speed ratio.
First, the tire rotation: the tire's linear speed is v = ω · R, where ω is the tire's angular velocity (rad/s). The tire's angular velocity equals the driveline output speed, since the tire is mounted directly on the axle.
And the ratio n = input speed / output speed, so input speed = n × output speed, i.e., output speed = input speed / n.
Assume the vehicle speed is v = ω_output · R, where ω_output is the output-side angular velocity (the tire's angular velocity).
The input-side angular velocity is ω_input = n · ω_output, since n = ω_input / ω_output.
Now, the vehicle's translational kinetic energy is (1/2) M v², while the rotational kinetic energy at the input would be (1/2) I_input ω_input².
"Equivalent" means that as the vehicle moves, its translational kinetic energy must equal the rotational kinetic energy of the inertia reflected to the input; in other words, the inertia must satisfy energy equivalence.
So, (1/2) M v² = (1/2) I_input ω_input²
Substitute v = ω_output · R with ω_output = ω_input / n,
so v = (ω_input / n) · R.
Substituting into the energy equation:
(1/2) M ((ω_input R)/n)² = (1/2) I_input ω_input²
Dividing both sides by (1/2) ω_input² gives:
M (R² / n²) = I_input
So I_input = M R² / n²
But let me double-check, because reflecting a moment of inertia generally involves the square of the gear ratio.
For instance, a translating mass reflected onto a rotating shaft has inertia M · R², but with a gear ratio it must be multiplied by (1/n)², because the speed is reduced by a factor of n, so the inertia is divided by n².
For example, the equivalent moment of inertia at the output (tire) side is MR² (this is the translating mass expressed as a moment of inertia, J = MR²; strictly speaking, for pure rolling the tire's own rotational inertia would also contribute, but the user presumably means only the vehicle's translating mass expressed as inertia).
Typically, a vehicle of translating mass M appears in the rotating system as an equivalent moment of inertia J = M · R², where R is the rolling radius; when the vehicle accelerates, it behaves like an equivalent inertia J = M · R² on the rotating shaft.
Now, this J sits at the output (the wheel axle). It must be reflected to the input, i.e., the gearbox input shaft.
Since the ratio n = input speed / output speed, the output speed = input speed / n.
So the output angular velocity is 1/n of the input's.
Reflecting a moment of inertia uses the square of the speed ratio, by energy conservation: J_output · ω_output² = J_input · ω_input²
The reflection formula is: J_input = J_output · (ω_output / ω_input)²
Since ω_output = ω_input / n, the ratio is 1/n.
So J_input = J_output · (1/n)²
And J_output is M · R² (the vehicle's translational inertia expressed as a moment of inertia),
so J_input = (M R²) · (1/n²) = M R² / n²
Does that make sense? Suppose n = 2, so the input spins twice as fast as the output; then the equivalent inertia at the input is smaller than at the output, because at the higher speed the same kinetic energy corresponds to a smaller inertia.
For example, with output inertia J_out, the input-side equivalent is J_in = J_out · (N_out / N_in)²; here the ratio n = N_in / N_out, so N_out / N_in = 1/n and therefore J_in = J_out · (1/n)²
Yes, that formula is correct.
So the vehicle's equivalent moment of inertia at the output is MR² (assuming the tires contribute no inertia of their own, only the translating mass), and reflected to the input it becomes MR² / n²
So the answer should be I = (M · R²) / n²
But I should confirm that this is what the user is asking.
The user says "vehicle inertia", presumably meaning the whole vehicle's translational inertia reflected into the rotating system. Typically the vehicle's translating mass appears in the driveline as an equivalent moment of inertia J = M · R², where R is the tire rolling radius.
So, with this equivalent inertia located at the output side (the wheel side), reflecting it to the input divides it by n², because the ratio n is input speed over output speed: the speed ratio is n:1, and inertia scales inversely with the square of speed.
For example, with n = 3 the input runs three times as fast as the output, so the input-side equivalent inertia is 1/9 of the output-side value.
Therefore, the final result is M · R² / n²
Now check the units: M is in kg and R in meters, so M · R² is kg·m²; n is dimensionless, so the result is in kg·m². Correct.
The user's "a transmission with ratio n before the axle" means the axle is the drive axle: the driveline output connects to the wheel axle, and the transmission's input-to-output ratio is n = input speed / output speed.
So the conclusion stands.
To reflect the vehicle's translational inertia to the transmission input, account for the mass, the tire radius, and the gear ratio. The detailed derivation and result follow.
Key concepts
- Vehicle translational inertia: the translational motion of the vehicle mass $ M $ (kg), expressed as a moment of inertia at the tire axle (output side), is $ J_{\text{output}} = M \cdot R^2 $ (units: kg·m²), where $ R $ is the tire radius (m).
- Gear ratio definition: $ n = \frac{\text{input speed}}{\text{output speed}} $, i.e., the input turns $ n $ times as fast as the output.
- Inertia reflection rule: the equivalent moment of inertia scales inversely with the square of the speed ratio. With output-side inertia $ J_{\text{output}} $, the input-side equivalent is: $$ J_{\text{input}} = J_{\text{output}} \cdot \left( \frac{\text{output speed}}{\text{input speed}} \right)^2 $$
Derivation
1. Output-side equivalent inertia. The vehicle's translating mass $ M $ reflected to the tire axle (output side) gives: $$ J_{\text{output}} = M \cdot R^2 $$
2. Speed relation. From the ratio definition $ n = \frac{\omega_{\text{input}}}{\omega_{\text{output}}} $: $$ \omega_{\text{output}} = \frac{\omega_{\text{input}}}{n} $$
3. Input-side equivalent inertia. By the reflection rule: $$ J_{\text{input}} = J_{\text{output}} \cdot \left( \frac{\omega_{\text{output}}}{\omega_{\text{input}}} \right)^2 = (M \cdot R^2) \cdot \left( \frac{1}{n} \right)^2 $$ which simplifies to: $$ J_{\text{input}} = \frac{M \cdot R^2}{n^2} $$
Final result
The vehicle's equivalent moment of inertia reflected to the transmission input is:
$$ J_{\text{input}} = \frac{M \cdot R^2}{n^2} $$
Unit check
- $ M $: kg
- $ R $: m
- $ n $: dimensionless
- Result $ J_{\text{input}} $: kg·m² (the correct units for a moment of inertia)
Physical interpretation
- Effect of the ratio $ n $: for $ n > 1 $ (a speed reduction), the input-side equivalent inertia shrinks markedly, because the input turns faster and the same kinetic energy implies a smaller inertia.
- Typical scenario: in an electric vehicle with $ n = 10 $, the input-side equivalent inertia is only $ 1/100 $ of the output-side value, greatly reducing the inertia load seen by the motor.
This result reflects the vehicle's translational inertia to the transmission input shaft and applies to driveline dynamics analysis and control design.
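As a quick plug-in check of the model's final formula (the numbers below are illustrative values chosen here, not taken from the prompt):

$$ M = 1500\ \text{kg},\quad R = 0.3\ \text{m},\quad n = 10 \;\;\Rightarrow\;\; J_{\text{input}} = \frac{1500 \cdot 0.3^2}{10^2} = \frac{135}{100} = 1.35\ \text{kg·m}^2 $$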
llama_perf_sampler_print: sampling time = 124.87 ms / 2306 runs ( 0.05 ms per token, 18467.06 tokens per second)
llama_perf_context_print: load time = 123893.22 ms
llama_perf_context_print: prompt eval time = 5562.49 ms / 67 tokens ( 83.02 ms per token, 12.04 tokens per second)
llama_perf_context_print: eval time = 418866.68 ms / 2238 runs ( 187.16 ms per token, 5.34 tokens per second)
llama_perf_context_print: total time = 584478.10 ms / 2305 tokens
llama_perf_context_print: graphs reused = 2228
Interrupted by user