ComfyUI will still cast all layers to fp8

#2 opened by silveroxides

With fp8_scaled models that have layers at higher precision, ComfyUI will still convert those layers to fp8 when loading by default. You need a node that uses custom ops in order to prevent this. That's how I got around it with my custom node and the fp8_scaled variant of your distill models that I made.
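The difference, as a rough standalone sketch (not ComfyUI's loader code and not my node's actual implementation; the function names are just for illustration):

```python
import torch

FP8_DTYPES = {torch.float8_e4m3fn, torch.float8_e5m2}

def cast_everything(state_dict, storage_dtype=torch.float8_e4m3fn):
    # Blanket cast: every floating-point tensor ends up in fp8, including
    # the biases/norms the checkpoint deliberately keeps at float32.
    return {k: (v.to(storage_dtype) if v.is_floating_point() else v)
            for k, v in state_dict.items()}

def cast_preserving(state_dict, storage_dtype=torch.float8_e4m3fn):
    # Dtype-preserving variant: only tensors already stored in fp8 get the
    # storage dtype; everything else stays at the precision it shipped with.
    return {k: (v.to(storage_dtype) if v.dtype in FP8_DTYPES else v)
            for k, v in state_dict.items()}
```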

Any more specifics on this? I'm not sure where the right place to check is, but I tried dumping to_load in BaseModel.load_model_weights during UNETLoader, and I see:

Loading weight: blocks.0.cross_attn.k.bias, shape: torch.Size([5120]), dtype: torch.float32
Loading weight: blocks.0.cross_attn.k.weight, shape: torch.Size([5120, 5120]), dtype: torch.float8_e4m3fn

Maybe something is happening later; it would be nice to have more info.
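One crude check is to tally the parameter dtypes on the model object after the loader has run; if something downstream casts the biases, it should show up here (the attribute path in the comment is a guess and may differ per loader):

```python
from collections import Counter
import torch

def dtype_report(module: torch.nn.Module):
    # Tally parameter dtypes of the loaded diffusion model so it's obvious
    # whether the float32 biases survive loading or get folded into fp8 later.
    counts = Counter(p.dtype for p in module.parameters())
    for dtype, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{dtype}: {n} tensors")

# e.g. right after UNETLoader returns a model patcher
# (the .model.diffusion_model path is an assumption):
# dtype_report(model.model.diffusion_model)
```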

That's a WVW workflow; all of my programmatic workflows are built around Comfy's built-in WAN nodes, and WVW is its own separate universe. It would be helpful to know where ComfyUI is actually forcing layers to FP8, if it is; I haven't found it yet.
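If anyone wants to hunt for where that cast happens, a generic PyTorch debugging sketch like this should catch it in the act (not ComfyUI-specific; it just reports any dtype-changing copy plus the Python stack that triggered it, and the wrapped call at the bottom is a placeholder):

```python
import traceback
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class CastLogger(TorchDispatchMode):
    # Logs every dtype-changing copy dispatched while the mode is active,
    # with a short Python stack so you can see which loader code did it.
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.ops.aten._to_copy.default:
            src, dst = args[0], kwargs.get("dtype")
            if dst is not None and dst != src.dtype:
                print(f"cast {tuple(src.shape)} {src.dtype} -> {dst}")
                traceback.print_stack(limit=6)
        return func(*args, **kwargs)

# usage: wrap whatever call actually loads the model, e.g.
# with CastLogger():
#     run_the_unet_loader()  # placeholder for the actual load call
```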

It would also be good to know whether it matters: are these layers just kept at higher precision because they're small and it can't hurt, or was it found to help? I'm a little surprised that quantized models don't always leave smaller layers alone (or maybe they do and I haven't noticed).
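For what it's worth, keeping them at higher precision should cost almost nothing: with the shapes from the log above, a 5120x5120 fp8 weight is about 25 MiB while a 5120-element fp32 bias is about 20 KiB. A quick way to see the per-dtype share of a checkpoint (the filename is hypothetical):

```python
from collections import Counter
from safetensors import safe_open

def size_by_dtype(path):
    # Sum bytes per dtype in a .safetensors checkpoint to see how much of
    # the file the higher-precision layers actually account for.
    totals = Counter()
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            t = f.get_tensor(key)
            totals[str(t.dtype)] += t.numel() * t.element_size()
    for dtype, nbytes in totals.most_common():
        print(f"{dtype}: {nbytes / 2**20:.1f} MiB")

# size_by_dtype("wan_distill_fp8_scaled.safetensors")  # hypothetical filename
```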
