Optimize memory usage

Improves memory management by releasing temporary tensors throughout the CADE2.5 easy and hard modules, reducing RAM/VRAM peaks. Adds optional auto-save functionality for final images with configurable compression. Updates README with memory tips and clarifies workflow steps. Adjusts presets for sharpen, guidance, NAG, and mid-frequency stabilizer parameters to further optimize performance and output quality.
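
The release pattern used throughout the diff can be illustrated without PyTorch. The sketch below is an illustrative stand-in, not code from this PR: `Buffer`, `run_step`, and `release_unbound` are hypothetical names. It assumes CPython, where dropping the last reference frees an object immediately, and it shows why each `del` in the diff is wrapped in `try/except`: deleting a local that was never bound raises `UnboundLocalError`.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a large temporary tensor."""

    def __init__(self, size):
        self.data = bytearray(size)


def run_step(size=1_000_000):
    """Use a temporary, release it early, and report whether it was freed."""
    tmp = Buffer(size)
    probe = weakref.ref(tmp)      # observes the object without keeping it alive
    result = len(tmp.data)
    try:
        del tmp                   # drop the only strong reference
    except Exception:
        pass
    gc.collect()                  # only matters if the object sat in a reference cycle
    return result, probe() is None


def release_unbound(take_branch=False):
    """Deleting a local that was never assigned raises UnboundLocalError."""
    if take_branch:
        maybe_tmp = Buffer(16)    # bound only on this branch
    try:
        del maybe_tmp
        return "released"
    except Exception as exc:
        return type(exc).__name__
```

Note that `del` only removes a name. The memory is returned only when no other reference (a list entry, a dict value, a closure) still points at the same object, so the savings depend on the temporaries not being held elsewhere.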

Files changed (5)

README.md +8 -3
mod/easy/mg_cade25_easy.py +225 -8
mod/hard/mg_cade25.py +208 -10
pressets/mg_cade25.cfg +52 -36
pressets/mg_controlfusion.cfg +5 -5

README.md CHANGED

@@ -66,7 +66,12 @@ Photo Dog
 - Notes
   - Lowering the starting latent (e.g., 512x768) or lower, reduces both VRAM and RAM.
   - Disabling hi-res depth/edges (ControlFusion) reduces peaks. (not recommended!)
-  - Depth weights add a bit of RAM on load; models live under `depth-anything/`.
 ## Install (ComfyUI 0.3.60, tested on this version)
@@ -89,7 +94,7 @@ Folder `workflows/` contains ready-to-use graphs:
 You can save this workflow to ComfyUI `ComfyUI\user\default\workflows`
 6. Restart ComfyUI. Nodes appear under the "MagicNodes" categories.
-💥 I strongly recommend use `mg_Easy-Workflow` workflow + default settings + your model and my negative LoRA `mg_7lambda_negative.safetensors`, for best result.
 ## 🚀 "One-Node" Quickstart (MG_SuperSimple)
@@ -104,6 +109,7 @@ Notes:
 - When "Custom" is off, presets fully drive parameters
 - When "Custom" is on, the visible CADE controls override the Step presets across all steps; Step 1 still enforces `denoise=1.0`
 - CLIP Vision (if connected) is applied from Step 2 onward; if no reference image is provided, SuperSimple uses the previous step image as reference
 ## ❗Tips
 (!) There are almost always artifacts in the first step, don't pay attention to them, they will be removed in the next steps. Keep your prompt clean and logical, don't duplicate details and be careful with symbols.
@@ -141,7 +147,6 @@ Notes:
 15) The 4th step sometimes saves the image for a long time, just wait for the end of the process, it depends on the initial resolution you set.
 ## Repository Layout
 ```
 MagicNodes/

 - Notes
   - Lowering the starting latent (e.g., 512x768) or lower, reduces both VRAM and RAM.
   - Disabling hi-res depth/edges (ControlFusion) reduces peaks. (not recommended!)
+  - Depth weights add a bit of RAM on load; models live under `depth-anything/`.
+## 💥 Memory and ComfyUI
+- (!) During VAE tiling at ultra-high resolutions (larger than what fits into your video card's memory), some processes stay stuck in RAM. This is a ComfyUI quirk; just restart ComfyUI. For my part, I clear everything that can be cleared from RAM and VRAM.
+- At each step, the image is upscaled from the previous step. Keep this in mind: the final image may not fit into your PC's memory if the starting latent resolution is high.
 ## Install (ComfyUI 0.3.60, tested on this version)
 You can save this workflow to ComfyUI `ComfyUI\user\default\workflows`
 6. Restart ComfyUI. Nodes appear under the "MagicNodes" categories.
+I strongly recommend use `mg_Easy-Workflow` workflow + default settings + your model and my negative LoRA `mg_7lambda_negative.safetensors`, for best result.
 ## 🚀 "One-Node" Quickstart (MG_SuperSimple)
 - When "Custom" is off, presets fully drive parameters
 - When "Custom" is on, the visible CADE controls override the Step presets across all steps; Step 1 still enforces `denoise=1.0`
 - CLIP Vision (if connected) is applied from Step 2 onward; if no reference image is provided, SuperSimple uses the previous step image as reference
+- Step 1 and Step 2 are prewarming steps.
 ## ❗Tips
 (!) There are almost always artifacts in the first step, don't pay attention to them, they will be removed in the next steps. Keep your prompt clean and logical, don't duplicate details and be careful with symbols.
 15) The 4th step sometimes saves the image for a long time, just wait for the end of the process, it depends on the initial resolution you set.
 ## Repository Layout
 ```
 MagicNodes/
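
The README note about per-step upscaling can be made concrete with a rough size estimate. This is a back-of-the-envelope sketch, not code from the repository: `image_bytes` and `step_sizes` are hypothetical helpers, the 1.5x per-step factor and the four steps are assumptions chosen only for illustration, and the figure covers a single float32 RGB image tensor, not the model, the VAE, or intermediate buffers.

```python
def image_bytes(height, width, channels=3, bytes_per_value=4, batch=1):
    """Bytes held by one dense image tensor (float32 by default)."""
    return batch * height * width * channels * bytes_per_value


def step_sizes(height, width, steps=4, factor=1.5):
    """Return (height, width, MiB) per step, assuming a fixed upscale factor."""
    sizes = []
    h, w = height, width
    for _ in range(steps):
        sizes.append((h, w, image_bytes(h, w) / 2**20))
        # Assumed growth per step; the real factor comes from the presets.
        h, w = int(round(h * factor)), int(round(w * factor))
    return sizes


for h, w, mib in step_sizes(768, 512):
    print(f"{h}x{w}: {mib:.1f} MiB per float32 RGB image")
```

Because both sides grow, memory per image scales with the square of the upscale factor, which is why a modest increase in the starting resolution can push the last step past what RAM or VRAM can hold.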

mod/easy/mg_cade25_easy.py CHANGED

@@ -295,7 +295,41 @@ def _clipseg_build_mask(image_bhwc: torch.Tensor,
             except Exception:
                 pass
         m = (m * float(max(0.0, gain))).clamp(0, 1)
-        return m.unsqueeze(0).unsqueeze(-1)  # BHWC with B=1,C=1
     except Exception as e:
         if not globals().get("_CLIPSEG_WARNED", False):
             print(f"[CADE2.5][CLIPSeg] mask failed: {e}")
@@ -1052,16 +1086,27 @@ def safe_decode(vae, lat, tile=512, ovlp=64):
             out = out.detach()
         except Exception:
             pass
         try:
-            out = out.to('cpu')
         except Exception:
             pass
         if torch.cuda.is_available():
-            torch.cuda.synchronize()
-            torch.cuda.empty_cache()
     except Exception:
-        pass
-    return out
 def safe_encode(vae, img, tile=512, ovlp=64):
@@ -1953,6 +1998,10 @@ class ComfyAdaptiveDetailEnhancer25:
                 "clipseg_blend": (["fuse", "replace", "intersect"], {"default": "fuse", "tooltip": "How to combine CLIPSeg with ONNX mask."}),
                 "clipseg_ref_gate": ("BOOLEAN", {"default": False, "tooltip": "If reference provided, boost mask when far from reference (CLIP-Vision)."}),
                 "clipseg_ref_threshold": ("FLOAT", {"default": 0.03, "min": 0.0, "max": 0.2, "step": 0.001}),
                 # Polish mode (final hi-res refinement)
                 "polish_enable": ("BOOLEAN", {"default": False, "tooltip": "Polish: keep low-frequency shape from reference while allowing high-frequency details to refine."}),
@@ -1993,6 +2042,7 @@ class ComfyAdaptiveDetailEnhancer25:
                      clipseg_gain=1.0, clipseg_blend="fuse", clipseg_ref_gate=False, clipseg_ref_threshold=0.03,
                     polish_enable=False, polish_keep_low=0.4, polish_edge_lock=0.2, polish_sigma=1.0,
                    polish_start_after=1, polish_keep_low_ramp=0.2,
                      preset_step="Step 1", custom_override=False):
         # Cooperative cancel before any heavy work
         model_management.throw_exception_if_processing_interrupted()
@@ -2063,7 +2113,7 @@ class ComfyAdaptiveDetailEnhancer25:
         aq_tile = int(pv("aq_tile", 32))
         aq_stride = int(pv("aq_stride", 16))
         aq_alpha = float(pv("aq_alpha", 2.0))
-        aq_ema_beta = float(pv("aq_ema_beta", 0.8))
         midfreq_enable = bool(pv("midfreq_enable", False))
         midfreq_gain = float(pv("midfreq_gain", 0.0))
         midfreq_sigma_lo = float(pv("midfreq_sigma_lo", 0.8))
@@ -2258,6 +2308,18 @@ class ComfyAdaptiveDetailEnhancer25:
                                 CURRENT_ONNX_MASK_BCHW = None
                         except Exception:
                             CURRENT_ONNX_MASK_BCHW = None
                     # One-time damping from area (disabled by default)
                     if False:
                         try:
@@ -2340,7 +2402,11 @@ class ComfyAdaptiveDetailEnhancer25:
                             device='cpu',
                             generator=gen,
                         ).to(current_latent["samples"].device)
-                        current_latent["samples"] += (noise_offset * fade) * eps
                     # Pre-sampling ONNX detectors: handled once below (kept compact)
@@ -2433,6 +2499,20 @@ class ComfyAdaptiveDetailEnhancer25:
                                         CURRENT_ONNX_MASK_BCHW = None
                     except Exception:
                         pass
                     # Sampler model prepared once above; reused across iterations (no-op here)
                     sampler_model = sampler_model
@@ -2525,6 +2605,26 @@ class ComfyAdaptiveDetailEnhancer25:
                                     current_latent = lat_b
                                 else:
                                     current_latent = lat_a
                     except Exception:
                         pass
@@ -2578,6 +2678,31 @@ class ComfyAdaptiveDetailEnhancer25:
                     # cooperative cancel immediately after sampling
                     model_management.throw_exception_if_processing_interrupted()
                     if bool(latent_compare):
                         _cur = current_latent["samples"]
@@ -2598,6 +2723,10 @@ class ComfyAdaptiveDetailEnhancer25:
                             current_denoise = max(0.20, current_denoise * damp)
                             cfg_damp = 0.997 if damp > 0.9 else 0.99
                             current_cfg = max(1.0, current_cfg * cfg_damp)
                     # AQClip-Lite: adaptive soft clipping in latent space (before decode)
                     try:
@@ -2619,6 +2748,14 @@ class ComfyAdaptiveDetailEnhancer25:
                                 H_override=H_override,
                             )
                             current_latent["samples"] = z_new
                     except Exception:
                         pass
@@ -2705,6 +2842,33 @@ class ComfyAdaptiveDetailEnhancer25:
                             # Feed back to latent for next steps
                             current_latent = {"samples": safe_encode(vae, img2)}
                             image = img2
                         except Exception:
                             pass
@@ -2850,6 +3014,31 @@ class ComfyAdaptiveDetailEnhancer25:
         except Exception:
             pass
         return current_latent, image, int(current_steps), float(current_cfg), float(current_denoise), onnx_mask_img
@@ -3107,6 +3296,34 @@ def _smart_seed_select(model,
                 if score > best_score:
                     best_score = score
                     best_seed = sd
             except Exception as e:
                 # do not swallow user interruption; also honour sentinel
                 if isinstance(e, model_management.InterruptProcessingException) or globals().get("_MG_CANCEL_REQUESTED", False):

             except Exception:
                 pass
         m = (m * float(max(0.0, gain))).clamp(0, 1)
+        out_mask = m.unsqueeze(0).unsqueeze(-1)  # BHWC with B=1,C=1
+        # Best-effort release of temporaries to reduce RAM peak
+        try:
+            del inputs
+        except Exception:
+            pass
+        try:
+            del outputs
+        except Exception:
+            pass
+        try:
+            del logits
+        except Exception:
+            pass
+        try:
+            del prob
+        except Exception:
+            pass
+        try:
+            del pil_img
+        except Exception:
+            pass
+        try:
+            del arr
+        except Exception:
+            pass
+        try:
+            del x
+        except Exception:
+            pass
+        try:
+            del img
+        except Exception:
+            pass
+        return out_mask
     except Exception as e:
         if not globals().get("_CLIPSEG_WARNED", False):
             print(f"[CADE2.5][CLIPSeg] mask failed: {e}")
             out = out.detach()
         except Exception:
             pass
+        out_cpu = out
+        try:
+            out_cpu = out_cpu.to('cpu')
+        except Exception:
+            pass
         try:
+            del out
         except Exception:
             pass
         if torch.cuda.is_available():
+            try:
+                torch.cuda.synchronize()
+            except Exception:
+                pass
+            try:
+                torch.cuda.empty_cache()
+            except Exception:
+                pass
+        return out_cpu
     except Exception:
+        return out
 def safe_encode(vae, img, tile=512, ovlp=64):
                 "clipseg_blend": (["fuse", "replace", "intersect"], {"default": "fuse", "tooltip": "How to combine CLIPSeg with ONNX mask."}),
                 "clipseg_ref_gate": ("BOOLEAN", {"default": False, "tooltip": "If reference provided, boost mask when far from reference (CLIP-Vision)."}),
                 "clipseg_ref_threshold": ("FLOAT", {"default": 0.03, "min": 0.0, "max": 0.2, "step": 0.001}),
+                # Under-the-hood saving (disabled by default to avoid duplicate saves)
+                "auto_save": ("BOOLEAN", {"default": False, "tooltip": "Save final IMAGE directly from CADE (uses low PNG compress to reduce RAM)."}),
+                "save_prefix": ("STRING", {"default": "ComfyUI", "multiline": False}),
+                "save_compress": ("INT", {"default": 1, "min": 0, "max": 9, "step": 1}),
                 # Polish mode (final hi-res refinement)
                 "polish_enable": ("BOOLEAN", {"default": False, "tooltip": "Polish: keep low-frequency shape from reference while allowing high-frequency details to refine."}),
                      clipseg_gain=1.0, clipseg_blend="fuse", clipseg_ref_gate=False, clipseg_ref_threshold=0.03,
                     polish_enable=False, polish_keep_low=0.4, polish_edge_lock=0.2, polish_sigma=1.0,
                    polish_start_after=1, polish_keep_low_ramp=0.2,
+                    auto_save=False, save_prefix="ComfyUI", save_compress=1,
                      preset_step="Step 1", custom_override=False):
         # Cooperative cancel before any heavy work
         model_management.throw_exception_if_processing_interrupted()
         aq_tile = int(pv("aq_tile", 32))
         aq_stride = int(pv("aq_stride", 16))
         aq_alpha = float(pv("aq_alpha", 2.0))
+        aq_ema_beta = float(pv("aq_ema_beta", 0.85))
         midfreq_enable = bool(pv("midfreq_enable", False))
         midfreq_gain = float(pv("midfreq_gain", 0.0))
         midfreq_sigma_lo = float(pv("midfreq_sigma_lo", 0.8))
                                 CURRENT_ONNX_MASK_BCHW = None
                         except Exception:
                             CURRENT_ONNX_MASK_BCHW = None
+                        try:
+                            del onnx_mask
+                        except Exception:
+                            pass
+                        try:
+                            del om
+                        except Exception:
+                            pass
+                        try:
+                            del img_preview
+                        except Exception:
+                            pass
                     # One-time damping from area (disabled by default)
                     if False:
                         try:
                             device='cpu',
                             generator=gen,
                         ).to(current_latent["samples"].device)
+                        current_latent["samples"] = current_latent["samples"] + (noise_offset * fade) * eps
+                        try:
+                            del eps
+                        except Exception:
+                            pass
                     # Pre-sampling ONNX detectors: handled once below (kept compact)
                                         CURRENT_ONNX_MASK_BCHW = None
                     except Exception:
                         pass
+                    try:
+                        del img_prev2
+                    except Exception:
+                        pass
+                    try:
+                        del em2
+                    except Exception:
+                        pass
+                    try:
+                        del cmask
+                        del fused
+                        del om
+                    except Exception:
+                        pass
                     # Sampler model prepared once above; reused across iterations (no-op here)
                     sampler_model = sampler_model
                                     current_latent = lat_b
                                 else:
                                     current_latent = lat_a
+                                try:
+                                    del img_roi
+                                except Exception:
+                                    pass
+                                try:
+                                    del roi
+                                except Exception:
+                                    pass
+                                try:
+                                    del lat_in_a
+                                    del lat_a
+                                    del img_a
+                                except Exception:
+                                    pass
+                                try:
+                                    del lat_in_b
+                                    del lat_b
+                                    del img_b
+                                except Exception:
+                                    pass
                     except Exception:
                         pass
                     # cooperative cancel immediately after sampling
                     model_management.throw_exception_if_processing_interrupted()
+                    # Release heavy temporaries from sampler path
+                    try:
+                        del lat_img
+                    except Exception:
+                        pass
+                    try:
+                        del noise
+                    except Exception:
+                        pass
+                    try:
+                        del noise_mask
+                    except Exception:
+                        pass
+                    try:
+                        del callback
+                    except Exception:
+                        pass
+                    try:
+                        del sampler_obj
+                    except Exception:
+                        pass
+                    try:
+                        del sigmas
+                    except Exception:
+                        pass
                     if bool(latent_compare):
                         _cur = current_latent["samples"]
                             current_denoise = max(0.20, current_denoise * damp)
                             cfg_damp = 0.997 if damp > 0.9 else 0.99
                             current_cfg = max(1.0, current_cfg * cfg_damp)
+                    try:
+                        del prev_samples
+                    except Exception:
+                        pass
                     # AQClip-Lite: adaptive soft clipping in latent space (before decode)
                     try:
                                 H_override=H_override,
                             )
                             current_latent["samples"] = z_new
+                            try:
+                                del H_override
+                            except Exception:
+                                pass
+                            try:
+                                del Hm
+                            except Exception:
+                                pass
                     except Exception:
                         pass
                             # Feed back to latent for next steps
                             current_latent = {"samples": safe_encode(vae, img2)}
                             image = img2
+                            try:
+                                del x
+                                del r
+                                del low_x
+                                del low_r
+                                del high_x
+                                del low_mix
+                                del new
+                                del micro
+                                del gray
+                                del sobel_x
+                                del sobel_y
+                                del gx
+                                del gy
+                                del mag
+                                del m_edge
+                                del g_depth
+                                del g
+                                del ref_n
+                                del ref
+                                del img
+                            except Exception:
+                                pass
+                            try:
+                                clear_gpu_and_ram_cache()
+                            except Exception:
+                                pass
                         except Exception:
                             pass
         except Exception:
             pass
+        # Under-the-hood preview downscale for UI/output IMAGE to cap RAM during save/preview
+        try:
+            B, H, W, C = image.shape
+            max_side = max(int(H), int(W))
+            cap = 4096
+            if max_side > cap:
+                scale = float(cap) / float(max_side)
+                nh = max(1, int(round(H * scale)))
+                nw = max(1, int(round(W * scale)))
+                x = image.movedim(-1, 1)
+                x = F.interpolate(x, size=(nh, nw), mode='bilinear', align_corners=False)
+                image = x.movedim(1, -1).clamp(0, 1).to(dtype=image.dtype)
+        except Exception:
+            pass
+        # Optional: save from node with low PNG compress to reduce RAM spike; ignore UI wiring
+        try:
+            if bool(auto_save):
+                from comfy_api.latest._ui import ImageSaveHelper, FolderType
+                _ = ImageSaveHelper.save_images(
+                    [image], filename_prefix=str(save_prefix), folder_type=FolderType.output,
+                    cls=CADEEasyUI, compress_level=int(save_compress))
+        except Exception:
+            pass
         return current_latent, image, int(current_steps), float(current_cfg), float(current_denoise), onnx_mask_img
                 if score > best_score:
                     best_score = score
                     best_seed = sd
+                try:
+                    del img
+                except Exception:
+                    pass
+                try:
+                    del lat_out
+                except Exception:
+                    pass
+                try:
+                    del lat_in
+                except Exception:
+                    pass
+                try:
+                    del lch_small
+                except Exception:
+                    pass
+                try:
+                    del lap
+                except Exception:
+                    pass
+                try:
+                    del cand_embed
+                except Exception:
+                    pass
+                try:
+                    del cmask
+                except Exception:
+                    pass
             except Exception as e:
                 # do not swallow user interruption; also honour sentinel
                 if isinstance(e, model_management.InterruptProcessingException) or globals().get("_MG_CANCEL_REQUESTED", False):
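
The preview downscale added before the final `return` reduces to a small size computation. The sketch below isolates that arithmetic in plain Python so it can be checked without PyTorch. `capped_size` is a hypothetical helper name; in the diff the actual resize is done with `F.interpolate` in bilinear mode on a BCHW view of the image.

```python
def capped_size(height, width, cap=4096):
    """Return (new_height, new_width) so the longer side does not exceed cap."""
    max_side = max(int(height), int(width))
    if max_side <= cap:
        return int(height), int(width)   # small enough: leave untouched
    scale = float(cap) / float(max_side)
    # Same rounding as the diff: round to nearest, never below 1 pixel.
    new_h = max(1, int(round(height * scale)))
    new_w = max(1, int(round(width * scale)))
    return new_h, new_w


print(capped_size(2048, 1024))
print(capped_size(8192, 4096))
```

As the diff reads, the downscaled tensor replaces `image` before both the optional auto-save and the `return`, so any output with a side above 4096 pixels appears to be saved and returned at the capped size.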

mod/hard/mg_cade25.py CHANGED

@@ -156,7 +156,41 @@ def _clipseg_build_mask(image_bhwc: torch.Tensor,
             except Exception:
                 pass
         m = (m * float(max(0.0, gain))).clamp(0, 1)
-        return m.unsqueeze(0).unsqueeze(-1)  # BHWC with B=1,C=1
     except Exception as e:
         if not globals().get("_CLIPSEG_WARNED", False):
             print(f"[CADE2.5][CLIPSeg] mask failed: {e}")
@@ -723,12 +757,42 @@ def _scheduler_names():
 def safe_decode(vae, lat, tile=512, ovlp=64):
-    h, w = lat["samples"].shape[-2:]
-    if min(h, w) > 1024:
-        # Increase overlap for ultra-hires to reduce seam artifacts
-        ov = 128 if max(h, w) > 2048 else ovlp
-        return vae.decode_tiled(lat["samples"], tile_x=tile, tile_y=tile, overlap=ov)
-    return vae.decode(lat["samples"])
 def safe_encode(vae, img, tile=512, ovlp=64):
@@ -1456,6 +1520,10 @@ class ComfyAdaptiveDetailEnhancer25:
                 "clipseg_blend": (["fuse", "replace", "intersect"], {"default": "fuse", "tooltip": "How to combine CLIPSeg with any pre-mask (if present)."}),
                 "clipseg_ref_gate": ("BOOLEAN", {"default": False, "tooltip": "If reference provided, boost mask when far from reference (CLIP-Vision)."}),
                 "clipseg_ref_threshold": ("FLOAT", {"default": 0.03, "min": 0.0, "max": 0.2, "step": 0.001}),
                 # Polish mode (final hi-res refinement)
                 "polish_enable": ("BOOLEAN", {"default": False, "tooltip": "Polish: keep low-frequency shape from reference while allowing high-frequency details to refine."}),
@@ -1493,8 +1561,9 @@ class ComfyAdaptiveDetailEnhancer25:
                      clipseg_threshold=0.40, clipseg_blur=7.0, clipseg_dilate=4,
                      clipseg_gain=1.0, clipseg_blend="fuse", clipseg_ref_gate=False, clipseg_ref_threshold=0.03,
                      polish_enable=False, polish_keep_low=0.4, polish_edge_lock=0.2, polish_sigma=1.0,
-                     polish_start_after=1, polish_keep_low_ramp=0.2,
-                     kv_prune_enable=False, kv_keep=0.85, kv_min_tokens=128):
         # Hard reset of any sticky globals from prior runs
         try:
             global CURRENT_ONNX_MASK_BCHW
@@ -1615,6 +1684,23 @@ class ComfyAdaptiveDetailEnhancer25:
                 clipseg_enable = False
                 # Depth gate cache for micro-detail injection (reuse per resolution)
                 depth_gate_cache = {"size": None, "mask": None}
                 # Prepare guided sampler once per node run to avoid cloning model each iteration
                 sampler_model = _wrap_model_with_guidance(
                       model, guidance_mode, rescale_multiplier, momentum_beta, cfg_curve, perp_damp,
@@ -1665,7 +1751,11 @@ class ComfyAdaptiveDetailEnhancer25:
                             device='cpu',
                             generator=gen,
                         ).to(current_latent["samples"].device)
-                        current_latent["samples"] += (noise_offset * fade) * eps
                     # ONNX pre-sampling detectors removed
@@ -1695,6 +1785,23 @@ class ComfyAdaptiveDetailEnhancer25:
                                 # No local guidance toggles here; keep optional mask hook clear
                     except Exception:
                         pass
                     # Sampler model prepared once above; reuse it here (no-op assignment)
                     sampler_model = sampler_model
@@ -1749,6 +1856,31 @@ class ComfyAdaptiveDetailEnhancer25:
                     # cooperative cancel right after sampling, before further heavy work
                     model_management.throw_exception_if_processing_interrupted()
                     if bool(latent_compare):
                         _cur = current_latent["samples"]
@@ -1769,6 +1901,10 @@ class ComfyAdaptiveDetailEnhancer25:
                             current_denoise = max(0.20, current_denoise * damp)
                             cfg_damp = 0.997 if damp > 0.9 else 0.99
                             current_cfg = max(1.0, current_cfg * cfg_damp)
                     # AQClip-Lite: adaptive soft clipping in latent space (before decode)
                     try:
@@ -1790,6 +1926,14 @@ class ComfyAdaptiveDetailEnhancer25:
                                 H_override=H_override,
                             )
                             current_latent["samples"] = z_new
                     except Exception:
                         pass
@@ -1880,6 +2024,35 @@ class ComfyAdaptiveDetailEnhancer25:
                             # Feed back to latent for next steps
                             current_latent = {"samples": safe_encode(vae, img2)}
                             image = img2
                         except Exception:
                             pass
@@ -1989,6 +2162,31 @@ class ComfyAdaptiveDetailEnhancer25:
         except Exception:
             pass
         # Cleanup KV pruning state to avoid leaking into other nodes
         try:
             if hasattr(sa_patch, "set_kv_prune"):

             except Exception:
                 pass
         m = (m * float(max(0.0, gain))).clamp(0, 1)
+        out_mask = m.unsqueeze(0).unsqueeze(-1)  # BHWC with B=1,C=1
+        # Best-effort release of temporaries to reduce RAM peak
+        try:
+            del inputs
+        except Exception:
+            pass
+        try:
+            del outputs
+        except Exception:
+            pass
+        try:
+            del logits
+        except Exception:
+            pass
+        try:
+            del prob
+        except Exception:
+            pass
+        try:
+            del pil_img
+        except Exception:
+            pass
+        try:
+            del arr
+        except Exception:
+            pass
+        try:
+            del x
+        except Exception:
+            pass
+        try:
+            del img
+        except Exception:
+            pass
+        return out_mask
     except Exception as e:
         if not globals().get("_CLIPSEG_WARNED", False):
             print(f"[CADE2.5][CLIPSeg] mask failed: {e}")
 def safe_decode(vae, lat, tile=512, ovlp=64):
+    # Avoid building autograd graphs and release GPU memory early
+    with torch.inference_mode():
+        h, w = lat["samples"].shape[-2:]
+        if min(h, w) > 1024:
+            # Increase overlap for ultra-hires to reduce seam artifacts
+            ov = 128 if max(h, w) > 2048 else ovlp
+            out = vae.decode_tiled(lat["samples"], tile_x=tile, tile_y=tile, overlap=ov)
+        else:
+            out = vae.decode(lat["samples"])
+    # Move to CPU and free VRAM ASAP
+    try:
+        try:
+            out = out.detach()
+        except Exception:
+            pass
+        out_cpu = out
+        try:
+            out_cpu = out_cpu.to('cpu')
+        except Exception:
+            pass
+        try:
+            del out
+        except Exception:
+            pass
+        if torch.cuda.is_available():
+            try:
+                torch.cuda.synchronize()
+            except Exception:
+                pass
+            try:
+                torch.cuda.empty_cache()
+            except Exception:
+                pass
+        return out_cpu
+    except Exception:
+        return out
 def safe_encode(vae, img, tile=512, ovlp=64):
                 "clipseg_blend": (["fuse", "replace", "intersect"], {"default": "fuse", "tooltip": "How to combine CLIPSeg with any pre-mask (if present)."}),
                 "clipseg_ref_gate": ("BOOLEAN", {"default": False, "tooltip": "If reference provided, boost mask when far from reference (CLIP-Vision)."}),
                 "clipseg_ref_threshold": ("FLOAT", {"default": 0.03, "min": 0.0, "max": 0.2, "step": 0.001}),
+                # Under-the-hood saving (disabled by default)
+                "auto_save": ("BOOLEAN", {"default": False, "tooltip": "Save final IMAGE directly from CADE (uses low PNG compress to reduce RAM)."}),
+                "save_prefix": ("STRING", {"default": "ComfyUI", "multiline": False}),
+                "save_compress": ("INT", {"default": 1, "min": 0, "max": 9, "step": 1}),
                 # Polish mode (final hi-res refinement)
                 "polish_enable": ("BOOLEAN", {"default": False, "tooltip": "Polish: keep low-frequency shape from reference while allowing high-frequency details to refine."}),
                      clipseg_threshold=0.40, clipseg_blur=7.0, clipseg_dilate=4,
                      clipseg_gain=1.0, clipseg_blend="fuse", clipseg_ref_gate=False, clipseg_ref_threshold=0.03,
                      polish_enable=False, polish_keep_low=0.4, polish_edge_lock=0.2, polish_sigma=1.0,
+                      polish_start_after=1, polish_keep_low_ramp=0.2,
+                      auto_save=False, save_prefix="ComfyUI", save_compress=1,
+                      kv_prune_enable=False, kv_keep=0.85, kv_min_tokens=128):
         # Hard reset of any sticky globals from prior runs
         try:
             global CURRENT_ONNX_MASK_BCHW
                 clipseg_enable = False
                 # Depth gate cache for micro-detail injection (reuse per resolution)
                 depth_gate_cache = {"size": None, "mask": None}
+                # Release preflight temporaries to avoid keeping big tensors alive
+                try:
+                    del cmask
+                except Exception:
+                    pass
+                try:
+                    del om
+                except Exception:
+                    pass
+                try:
+                    del pre_mask
+                except Exception:
+                    pass
+                try:
+                    del image
+                except Exception:
+                    pass
                 # Prepare guided sampler once per node run to avoid cloning model each iteration
                 sampler_model = _wrap_model_with_guidance(
                       model, guidance_mode, rescale_multiplier, momentum_beta, cfg_curve, perp_damp,
                             device='cpu',
                             generator=gen,
                         ).to(current_latent["samples"].device)
+                        current_latent["samples"] = current_latent["samples"] + (noise_offset * fade) * eps
+                        try:
+                            del eps
+                        except Exception:
+                            pass
                     # ONNX pre-sampling detectors removed
                                 # No local guidance toggles here; keep optional mask hook clear
                     except Exception:
                         pass
+                    # release heavy temporaries from CLIPSeg path
+                    try:
+                        del img_prev2
+                    except Exception:
+                        pass
+                    try:
+                        del cmask
+                    except Exception:
+                        pass
+                    try:
+                        del fused
+                    except Exception:
+                        pass
+                    try:
+                        del om
+                    except Exception:
+                        pass
                     # Sampler model prepared once above; reuse it here (no-op assignment)
                     sampler_model = sampler_model
                     # cooperative cancel right after sampling, before further heavy work
                     model_management.throw_exception_if_processing_interrupted()
+                    # release sampler temporaries (best-effort)
+                    try:
+                        del lat_img
+                    except Exception:
+                        pass
+                    try:
+                        del noise
+                    except Exception:
+                        pass
+                    try:
+                        del noise_mask
+                    except Exception:
+                        pass
+                    try:
+                        del callback
+                    except Exception:
+                        pass
+                    try:
+                        del sampler_obj
+                    except Exception:
+                        pass
+                    try:
+                        del sigmas
+                    except Exception:
+                        pass
                     if bool(latent_compare):
                         _cur = current_latent["samples"]
                             current_denoise = max(0.20, current_denoise * damp)
                             cfg_damp = 0.997 if damp > 0.9 else 0.99
                             current_cfg = max(1.0, current_cfg * cfg_damp)
+                    try:
+                        del prev_samples
+                    except Exception:
+                        pass
                     # AQClip-Lite: adaptive soft clipping in latent space (before decode)
                     try:
                                 H_override=H_override,
                             )
                             current_latent["samples"] = z_new
+                            try:
+                                del H_override
+                            except Exception:
+                                pass
+                            try:
+                                del Hm
+                            except Exception:
+                                pass
                     except Exception:
                         pass
                             # Feed back to latent for next steps
                             current_latent = {"samples": safe_encode(vae, img2)}
                             image = img2
+                            # best-effort release of large temporaries
+                            try:
+                                del x
+                                del r
+                                del low_x
+                                del low_r
+                                del high_x
+                                del low_mix
+                                del new
+                                del micro
+                                del gray
+                                del sobel_x
+                                del sobel_y
+                                del gx
+                                del gy
+                                del mag
+                                del m_edge
+                                del g_edge
+                                del g_depth
+                                del g
+                                del ref_n
+                                del ref
+                                del img
+                            except Exception:
+                                pass
+                            try:
+                                clear_gpu_and_ram_cache()
+                            except Exception:
+                                pass
                         except Exception:
                             pass
         except Exception:
             pass
+        # Under-the-hood preview downscale for UI/output IMAGE to cap RAM during save/preview
+        try:
+            B, H, W, C = image.shape
+            max_side = max(int(H), int(W))
+            cap = 4096
+            if max_side > cap:
+                scale = float(cap) / float(max_side)
+                nh = max(1, int(round(H * scale)))
+                nw = max(1, int(round(W * scale)))
+                x = image.movedim(-1, 1)
+                x = F.interpolate(x, size=(nh, nw), mode='bilinear', align_corners=False)
+                image = x.movedim(1, -1).clamp(0, 1).to(dtype=image.dtype)
+        except Exception:
+            pass
+        # Optional: save from node with low PNG compress to reduce RAM spike; ignore UI wiring
+        try:
+            if bool(auto_save):
+                from comfy_api.latest._ui import ImageSaveHelper, FolderType
+                _ = ImageSaveHelper.save_images(
+                    [image], filename_prefix=str(save_prefix), folder_type=FolderType.output,
+                    cls=ComfyAdaptiveDetailEnhancer25, compress_level=int(save_compress))
+        except Exception:
+            pass
         # Cleanup KV pruning state to avoid leaking into other nodes
         try:
             if hasattr(sa_patch, "set_kv_prune"):
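
The output-size cap added in the hunk above (longest side limited to 4096 before save/preview) comes down to a small piece of arithmetic. Below is a minimal sketch of only that size calculation, in plain Python without torch. The helper name `capped_size` is hypothetical and is not part of the module; the actual resize in the diff is done with `F.interpolate`.

```python
from typing import Tuple


def capped_size(height: int, width: int, cap: int = 4096) -> Tuple[int, int]:
    """Return (height, width) scaled so the longest side does not exceed cap.

    Follows the arithmetic shown in the diff: sizes at or below the cap are
    returned unchanged; larger ones are scaled uniformly, rounded, and kept
    at a minimum of 1 pixel per side.
    """
    max_side = max(int(height), int(width))
    if max_side <= cap:
        return int(height), int(width)
    scale = float(cap) / float(max_side)
    new_h = max(1, int(round(height * scale)))
    new_w = max(1, int(round(width * scale)))
    return new_h, new_w


if __name__ == "__main__":
    # An image under the cap keeps its size; an 8192x4096 image is halved.
    print(capped_size(2048, 3072))
    print(capped_size(8192, 4096))
```

Because the scale factor is derived from the longest side, the aspect ratio is preserved up to rounding, and the `max(1, ...)` guard keeps extremely thin images from collapsing to a zero-sized dimension.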

pressets/mg_cade25.cfg CHANGED

@@ -15,7 +15,7 @@ cfg_delta: 0.03
 denoise_delta: 0.28
 # toggles
-apply_sharpen: true
 apply_upscale: true
 apply_ids: true
 clip_clean: true
@@ -42,9 +42,9 @@ ref_cooldown: 2
 # guidance
 guidance_mode: ZeResFDG
-rescale_multiplier: 0.72
-momentum_beta: 0.12
-cfg_curve: 1.0
 perp_damp: 0.80
 # NAG
@@ -58,10 +58,12 @@ use_zero_init: false
 zero_init_steps: 0
 # FDG / ZE thresholds
-fdg_low: 0.25
 fdg_high: 0.7
 fdg_sigma: 1.20
-ze_res_zero_steps: 10
 ze_adaptive: true
 ze_r_switch_hi: 0.85
 ze_r_switch_lo: 0.25
@@ -108,7 +110,7 @@ midfreq_sigma_hi: 2.00
 # QSilk-AQClip-Lite (adaptive latent clipping)
 aqclip_enable: true
 aq_tile: 32
-aq_stride: 16
 aq_alpha: 2.0
 aq_attn: true
@@ -118,8 +120,8 @@ aq_attn: true
 seed: 0
 control_after_generate: randomize
 steps: 30
-cfg: 7.5
-denoise: 0.65
 sampler_name: ddim
 scheduler: MGHybrid
 iterations: 2
@@ -154,14 +156,14 @@ ref_cooldown: 2
 # guidance
 guidance_mode: ZeResFDG
 #rescale_multiplier: 0.75
-rescale_multiplier: 0.95
 momentum_beta: 0.15
 cfg_curve: 0.85
 perp_damp: 0.80
 # NAG
 use_nag: true
-nag_scale: 3.0
 nag_tau: 2.50
 nag_alpha: 0.25
@@ -170,7 +172,7 @@ use_zero_init: false
 zero_init_steps: 0
 # FDG / ZE thresholds
-fdg_low: 0.55
 fdg_high: 0.7
 fdg_sigma: 1.10
 ze_res_zero_steps: 12
@@ -220,8 +222,8 @@ midfreq_sigma_hi: 2.10
 # QSilk-AQClip-Lite (adaptive latent clipping)
 aqclip_enable: true
-aq_tile: 32
-aq_stride: 16
 aq_alpha: 2.0
 aq_attn: true
@@ -231,7 +233,7 @@ aq_attn: true
 seed: 0
 control_after_generate: randomize
 steps: 25
-cfg: 7.0
 denoise: 0.60
 sampler_name: ddim
 scheduler: MGHybrid
@@ -267,14 +269,15 @@ ref_cooldown: 2
 # guidance
 guidance_mode: ZeResFDG
-rescale_multiplier: 1.10
 momentum_beta: 0.37
 cfg_curve: 0.65
 perp_damp: 0.95
 # NAG
 use_nag: true
-nag_scale: 4.0
 nag_tau: 2.50
 nag_alpha: 0.25
@@ -283,10 +286,10 @@ use_zero_init: false
 zero_init_steps: 0
 # FDG / ZE thresholds
-fdg_low: 0.55
 fdg_high: 0.7
 fdg_sigma: 1.10
-ze_res_zero_steps: 12
 ze_adaptive: true
 ze_r_switch_hi: 0.85
 ze_r_switch_lo: 0.25
@@ -325,22 +328,32 @@ polish_start_after: 1
 polish_keep_low_ramp: 0.10
 # mid-frequency stabilizer (hands/objects scale)
-midfreq_enable: true
-#midfreq_gain: 0.20
-#midfreq_sigma_lo: 0.95
-#midfreq_sigma_hi: 2.20
-midfreq_gain: 0.62
-midfreq_sigma_lo: 0.50
-midfreq_sigma_hi: 1.2
 # QSilk-AQClip-Lite (adaptive latent clipping)
 aqclip_enable: true
-aq_tile: 48
-aq_stride: 32
 aq_alpha: 1.6
-aq_attn: false
 # KV pruning (self-attention speedup)
 kv_prune_enable: true
 kv_keep: 0.80
@@ -353,9 +366,10 @@ kv_min_tokens: 256
 seed: 0
 control_after_generate: randomize
 steps: 25
-cfg: 6.0
 #0.75
-denoise: 0.45
 sampler_name: ddim
 scheduler: MGHybrid
 iterations: 2
@@ -396,7 +410,7 @@ perp_damp: 0.75
 # NAG
 use_nag: true
-nag_scale: 4.0
 nag_tau: 2.50
 nag_alpha: 0.25
@@ -450,14 +464,16 @@ polish_keep_low_ramp: 0.10
 # mid-frequency stabilizer (hands/objects scale)
 midfreq_enable: true
 midfreq_gain: 0.72
-midfreq_sigma_lo: 0.50
-midfreq_sigma_hi: 1.2
 # QSilk-AQClip-Lite (adaptive latent clipping)
 aqclip_enable: true
 aq_tile: 64
-aq_stride: 8
-aq_alpha: 2.0
 aq_attn: true
 # KV pruning (self-attention speedup)

 denoise_delta: 0.28
 # toggles
+apply_sharpen: false
 apply_upscale: true
 apply_ids: true
 clip_clean: true
 # guidance
 guidance_mode: ZeResFDG
+rescale_multiplier: 0.82
+momentum_beta: 0.22
+cfg_curve: 0.85
 perp_damp: 0.80
 # NAG
 zero_init_steps: 0
 # FDG / ZE thresholds
+#fdg_low: 0.25
+fdg_low: 0.45
 fdg_high: 0.7
 fdg_sigma: 1.20
+#ze_res_zero_steps: 10
+ze_res_zero_steps: 11
 ze_adaptive: true
 ze_r_switch_hi: 0.85
 ze_r_switch_lo: 0.25
 # QSilk-AQClip-Lite (adaptive latent clipping)
 aqclip_enable: true
 aq_tile: 32
+aq_stride: 8
 aq_alpha: 2.0
 aq_attn: true
 seed: 0
 control_after_generate: randomize
 steps: 30
+cfg: 6.5
+denoise: 0.55
 sampler_name: ddim
 scheduler: MGHybrid
 iterations: 2
 # guidance
 guidance_mode: ZeResFDG
 #rescale_multiplier: 0.75
+rescale_multiplier: 0.75
 momentum_beta: 0.15
 cfg_curve: 0.85
 perp_damp: 0.80
 # NAG
 use_nag: true
+nag_scale: 4.0
 nag_tau: 2.50
 nag_alpha: 0.25
 zero_init_steps: 0
 # FDG / ZE thresholds
+fdg_low: 0.35
 fdg_high: 0.7
 fdg_sigma: 1.10
 ze_res_zero_steps: 12
 # QSilk-AQClip-Lite (adaptive latent clipping)
 aqclip_enable: true
+aq_tile: 64
+aq_stride: 8
 aq_alpha: 2.0
 aq_attn: true
 seed: 0
 control_after_generate: randomize
 steps: 25
+cfg: 4.0
 denoise: 0.60
 sampler_name: ddim
 scheduler: MGHybrid
 # guidance
 guidance_mode: ZeResFDG
+#rescale_multiplier: 1.10
+rescale_multiplier: 0.75
 momentum_beta: 0.37
 cfg_curve: 0.65
 perp_damp: 0.95
 # NAG
 use_nag: true
+nag_scale: 4.5
 nag_tau: 2.50
 nag_alpha: 0.25
 zero_init_steps: 0
 # FDG / ZE thresholds
+fdg_low: 0.40
 fdg_high: 0.7
 fdg_sigma: 1.10
+ze_res_zero_steps: 10
 ze_adaptive: true
 ze_r_switch_hi: 0.85
 ze_r_switch_lo: 0.25
 polish_keep_low_ramp: 0.10
 # mid-frequency stabilizer (hands/objects scale)
+#midfreq_enable: true
+#midfreq_gain: 0.62
+#midfreq_sigma_lo: 0.50
+#midfreq_sigma_hi: 1.2
+# QSilk-AQClip-Lite (adaptive latent clipping)
+#aqclip_enable: true
+#aq_tile: 128
+#aq_stride: 8
+#aq_alpha: 2.0
+# mid-frequency stabilizer (hands/objects scale)
+midfreq_enable: true
+midfreq_gain: 0.85
+midfreq_sigma_lo: 1.10
+midfreq_sigma_hi: 2.1
 # QSilk-AQClip-Lite (adaptive latent clipping)
 aqclip_enable: true
+aq_tile: 64
+aq_stride: 8
 aq_alpha: 1.6
+aq_ema_beta: 0.90
+aq_attn: true
 # KV pruning (self-attention speedup)
 kv_prune_enable: true
 kv_keep: 0.80
 seed: 0
 control_after_generate: randomize
 steps: 25
+#cfg: 6.0
+cfg: 5.0
 #0.75
+denoise: 0.55
 sampler_name: ddim
 scheduler: MGHybrid
 iterations: 2
 # NAG
 use_nag: true
+nag_scale: 4.5
 nag_tau: 2.50
 nag_alpha: 0.25
 # mid-frequency stabilizer (hands/objects scale)
 midfreq_enable: true
 midfreq_gain: 0.72
+midfreq_sigma_lo: 0.55
+midfreq_sigma_hi: 1.3
 # QSilk-AQClip-Lite (adaptive latent clipping)
 aqclip_enable: true
 aq_tile: 64
+aq_stride: 8
+aq_alpha: 1.8
+aq_ema_beta: 0.85
 aq_attn: true
 # KV pruning (self-attention speedup)
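
Several preset hunks above keep the old value as a commented-out line next to the new one (for example `#fdg_low: 0.25` followed by `fdg_low: 0.45`). The sketch below shows how such `key: value` lines could be read so that only the active value is picked up. It is an illustration under assumptions: the function `parse_preset_lines` is hypothetical, the project's real preset loader is not shown in this diff, and any grouping of presets into sections is not handled here.

```python
def parse_preset_lines(text: str) -> dict:
    """Parse `key: value` lines, skipping blanks and `#` comment lines.

    Values are kept as strings; converting to bool/int/float is left to
    the caller. If a key appears more than once, the last one wins.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out setting
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        values[key.strip()] = value.strip()
    return values


SAMPLE = """\
# FDG / ZE thresholds
#fdg_low: 0.25
fdg_low: 0.45
fdg_high: 0.7
#ze_res_zero_steps: 10
ze_res_zero_steps: 11
"""

if __name__ == "__main__":
    print(parse_preset_lines(SAMPLE))
```

Under this reading, the commented-out lines serve only as a record of the previous values and have no effect on the loaded preset.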

pressets/mg_controlfusion.cfg CHANGED

@@ -25,7 +25,7 @@ strength_neg: 1.10
 # schedule window
 start_percent: 0.000
-end_percent: 1.000
 preview_res: 1024
 mask_brightness: 1.00
 preview_show_strength: true
@@ -78,7 +78,7 @@ strength_neg: 1.10
 # schedule window
 start_percent: 0.000
-end_percent: 0.600
 preview_res: 1024
 mask_brightness: 1.00
 preview_show_strength: true
@@ -94,7 +94,7 @@ split_apply: true
 edge_start_percent: 0.000
 edge_end_percent: 0.400
 depth_start_percent: 0.000
-depth_end_percent: 0.800
 # multipliers & shape
 edge_strength_mul: 0.50
@@ -131,7 +131,7 @@ strength_neg: 1.00
 # schedule window
 start_percent: 0.000
-end_percent: 0.450
 preview_res: 1024
 mask_brightness: 1.00
 preview_show_strength: true
@@ -146,7 +146,7 @@ split_apply: true
 # split timings
 edge_start_percent: 0.000
 #0.350
-edge_end_percent: 0.450
 depth_start_percent: 0.000
 depth_end_percent: 1.000

 # schedule window
 start_percent: 0.000
+end_percent: 0.700
 preview_res: 1024
 mask_brightness: 1.00
 preview_show_strength: true
 # schedule window
 start_percent: 0.000
+end_percent: 0.800
 preview_res: 1024
 mask_brightness: 1.00
 preview_show_strength: true
 edge_start_percent: 0.000
 edge_end_percent: 0.400
 depth_start_percent: 0.000
+depth_end_percent: 0.900
 # multipliers & shape
 edge_strength_mul: 0.50
 # schedule window
 start_percent: 0.000
+end_percent: 0.750
 preview_res: 1024
 mask_brightness: 1.00
 preview_show_strength: true
 # split timings
 edge_start_percent: 0.000
 #0.350
+edge_end_percent: 0.550
 depth_start_percent: 0.000
 depth_end_percent: 1.000