Commit a345062 · 1 parent: 4478c62
cs 1.0

Files changed:
- README.md +18 -23
- app.py +45 -91
- cognitive_mapping_probe/__pycache__/llm_iface.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/resonance.cpython-310.pyc +0 -0
- cognitive_mapping_probe/llm_iface.py +7 -22
- cognitive_mapping_probe/orchestrator.py +0 -88
- cognitive_mapping_probe/orchestrator_seismograph.py +62 -0
- cognitive_mapping_probe/pre_flight_checks.py +0 -147
- cognitive_mapping_probe/resonance.py +0 -101
- cognitive_mapping_probe/resonance_seismograph.py +55 -0
- cognitive_mapping_probe/verification.py +0 -65
- requirements.txt +1 -2
- run_test.sh +30 -0
- tests/conftest.py +70 -0
- tests/test_app_logic.py +54 -0
- tests/test_components.py +115 -0
- tests/test_dynamics.py +60 -0
- tests/test_integration.py +46 -0
- tests/test_orchestration.py +43 -0
README.md
CHANGED
@@ -1,8 +1,8 @@
 ---
-title: "Cognitive
-emoji:
-colorFrom:
-colorTo:
+title: "Cognitive Seismograph"
+emoji: 🧠
+colorFrom: indigo
+colorTo: blue
 sdk: gradio
 sdk_version: "4.40.0"
 app_file: app.py
@@ -10,32 +10,27 @@ pinned: true
 license: apache-2.0
 ---

-#
-Dieses Projekt implementiert eine
-## Wissenschaftliches Paradigma: Von
-Unsere Forschung hat
-## Das Experiment:
-1. **Induktion**: Das Modell wird mit einem Prompt
-2. **
-3. **
-   * `converged`: The state has stabilized. The system is robust.
-   * `max_steps_reached`: The state oscillates or drifts endlessly. The system is "broken".
-4. **Verification**: Only if the state converges is an attempt made to generate spontaneous text. The ability to respond is the behavioral marker of cognitive stability.
+# 🧠 Cognitive Seismograph: Visualizing Internal Dynamics
+
+This project implements an experimental suite for measuring and visualizing the **intrinsic cognitive dynamics** of language models.
+
+## Scientific Paradigm: From Stability to Dynamics
+
+Our previous research falsified a central hypothesis: the assumption that an LLM reaches a stable, convergent state in a manual, recursive "thinking" loop. Instead, we discovered that the system enters a state of **deterministic chaos** or a **limit cycle** – it never stops "thinking".
+
+Rather than treating this as a failure, we use it as the primary measurement signal. The new "Cognitive Seismograph" paradigm treats the time series of internal state changes (`state deltas`) as an **EKG of the thinking process**.
+
+**The core hypothesis:** the statistical signature of this dynamic time series (e.g. its volatility and its mean) is not random, but a function of the cognitive load induced by the initial prompt.
+
+## The Experiment: Recording the Cognitive EKG
+
+1. **Induction**: The model is put into a state of "silent thinking" by a prompt (`control_long_prose` vs. `resonance_prompt`).
+2. **Recording**: Over a defined number of steps, the model's `forward` pass is fed iteratively with its own output. At each step, the norm of the change of the `hidden_state` (the "delta") is recorded.
+3. **Analysis & Visualization**: The resulting time series of deltas is plotted and statistically analyzed to characterize the "seismic signature" of the thinking process.

 ## How to use the app

-1.
-2. **
-   * Define the concepts and titration steps to test.
-   * Start the experiment and analyze the resulting table to identify the CBPs for each concept.
+1. Choose a model ID (e.g. `google/gemma-3-1b-it`).
+2. Choose a **Prompt Type** to vary the cognitive load. Compare the resulting graphs for `control_long_prose` (low load) and `resonance_prompt` (high recursive load).
+3. Set the number of internal steps and start the analysis.
+4. Examine the graph and the statistical summary to understand the differences in cognitive dynamics.
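To make step 3 concrete, here is a minimal sketch of the statistical summary described above. It is illustrative only: the `state_deltas` values are made up, and the variable names are not part of the repository.

```python
import numpy as np

# Illustrative stand-in for the per-step hidden-state change norms
# recorded during the silent-cogitation loop.
state_deltas = [12.3, 11.8, 14.1, 13.0, 12.7]

deltas = np.array(state_deltas)
signature = {
    "mean_delta": float(deltas.mean()),  # average "cognitive activity"
    "std_delta": float(deltas.std()),    # volatility of the dynamics
    "max_delta": float(deltas.max()),    # largest single-step shift
}
print(signature)
```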
app.py
CHANGED
@@ -3,142 +3,96 @@ import pandas as pd
 import traceback
 import sys

-from cognitive_mapping_probe.pre_flight_checks import run_pre_flight_checks
-from cognitive_mapping_probe.orchestrator import run_cognitive_titration_experiment
+from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
 from cognitive_mapping_probe.prompts import RESONANCE_PROMPTS
 from cognitive_mapping_probe.utils import dbg

 # --- UI Theme and Layout ---
-theme = gr.themes.Soft(primary_hue="
-    body_background_fill="#
+theme = gr.themes.Soft(primary_hue="indigo", secondary_hue="blue").set(
+    body_background_fill="#f0f4f9",
     block_background_fill="white",
-    block_border_width="1px",
-    block_shadow="*shadow_drop_lg",
-    button_primary_background_fill="*primary_500",
-    button_primary_text_color="white",
 )

-DEFAULT_MODEL_ID = "google/gemma-3-1b-it"
-
-# --- Wrapper Functions for Gradio ---
-
-def run_experiment_and_display(
+def run_and_display(
     model_id: str,
     prompt_type: str,
     seed: int,
-    concepts_str: str,
-    strength_levels_str: str,
     num_steps: int,
-    temperature: float,
     progress=gr.Progress(track_tqdm=True)
 ):
     """
-    Führt
+    Runs the new seismic analysis and visualizes the internal dynamics.
     """
     try:
-        results =
-            model_id, prompt_type, int(seed),
-            int(num_steps), float(temperature), progress
+        results = run_seismic_analysis(
+            model_id, prompt_type, int(seed), int(num_steps), progress
         )

-        verdict = results.get("verdict", "
-
-        if not all_runs:
-            return "### ⚠️ No Data Generated\nDas Experiment lief durch, aber es wurden keine Datenpunkte erzeugt. Bitte Logs prüfen.", pd.DataFrame(), results
-
-        concept_df = details_df[details_df['concept'] == concept].sort_values(by='strength')
-        breaking_point_row = concept_df[concept_df['termination_reason'] != 'converged'].iloc[0] if not concept_df[concept_df['termination_reason'] != 'converged'].empty else None
-        if breaking_point_row is not None:
-            summary_text += f"- **'{concept}'**: 📉 Kollaps bei Stärke **{breaking_point_row['strength']:.2f}**\n"
-        else:
-            summary_text += f"- **'{concept}'**: ✅ Stabil bis Stärke **{concept_df['strength'].max():.2f}**\n"
-
-        return summary_text, details_df, results
+        verdict = results.get("verdict", "Analysis complete.")
+        stats = results.get("stats", {})
+        deltas = results.get("state_deltas", [])
+
+        # Build a DataFrame for the plot
+        df = pd.DataFrame({
+            "Internal Step": range(len(deltas)),
+            "State Change (Delta)": deltas
+        })
+
+        # Build a summary of the statistics
+        stats_md = f"### Statistical Signature\n"
+        stats_md += f"- **Mean Delta:** {stats.get('mean_delta', 0):.4f} (Avg. cognitive activity)\n"
+        stats_md += f"- **Std Dev Delta:** {stats.get('std_delta', 0):.4f} (Volatility of thought)\n"
+        stats_md += f"- **Max Delta:** {stats.get('max_delta', 0):.4f} (Peak cognitive shift)\n"
+
+        return f"{verdict}\n\n{stats_md}", df, results

     except Exception:
         error_str = traceback.format_exc()
-        return f"### ❌
+        return f"### ❌ Analysis Failed\nAn unexpected error occurred:\n\n```\n{error_str}\n```", pd.DataFrame(), {}

 # --- Gradio App Definition ---
-with gr.Blocks(theme=theme, title="Cognitive
-    gr.Markdown("#
-
-    # The Diagnostics tab has been removed. The UI now consists only of the main experiment.
+with gr.Blocks(theme=theme, title="Cognitive Seismograph") as demo:
+    gr.Markdown("# 🧠 Cognitive Seismograph: Visualizing Internal Dynamics")
     gr.Markdown(
-        "
+        "**Neues Paradigma:** Wir akzeptieren, dass der 'stille Denkprozess' nicht konvergiert. Stattdessen messen und visualisieren wir die **Signatur der internen Dynamik** – ein EKG für den Denkprozess des Modells."
     )
     with gr.Row(variant='panel'):
         with gr.Column(scale=1):
             gr.Markdown("### Parameters")
-            model_id_input = gr.Textbox(value=
+            model_id_input = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
             prompt_type_input = gr.Radio(
                 choices=list(RESONANCE_PROMPTS.keys()),
                 value="control_long_prose",
-                label="Prompt Type (Cognitive Load)"
-                info="Beginne mit 'control_long_prose' für eine stabile Baseline!"
+                label="Prompt Type (Cognitive Load)"
             )
-            seed_input = gr.Slider(1, 1000, 42, step=1, label="
-            num_steps_input = gr.Slider(50, 500, 250, step=10, label="Max. Internal Steps")
-            temperature_input = gr.Slider(0.01, 1.5, 0.7, step=0.01, label="Temperature")
-            run_btn = gr.Button("Run Cognitive Titration", variant="primary")
+            seed_input = gr.Slider(1, 1000, 42, step=1, label="Seed")
+            num_steps_input = gr.Slider(50, 1000, 300, step=10, label="Number of Internal Steps")
+            run_btn = gr.Button("Run Seismic Analysis", variant="primary")

         with gr.Column(scale=2):
             gr.Markdown("### Results")
+            verdict_output = gr.Markdown("Die Analyse der Dynamik erscheint hier.")
+            plot_output = gr.LinePlot(
+                x="Internal Step",
+                y="State Change (Delta)",
+                title="Internal State Dynamics (Cognitive EKG)",
+                show_label=True,
+                height=400,
             )
             with gr.Accordion("Raw JSON Output", open=False):
                 raw_json_output = gr.JSON()

     run_btn.click(
-        fn=
-        inputs=[model_id_input, prompt_type_input, seed_input,
-        outputs=[
+        fn=run_and_display,
+        inputs=[model_id_input, prompt_type_input, seed_input, num_steps_input],
+        outputs=[verdict_output, plot_output, raw_json_output]
     )

-# --- Main Execution Block ---
 if __name__ == "__main__":
+    # The pre-flight checks have been removed, since the new paradigm no longer requires convergence.
     print("="*80)
-    print("🔬
+    print("🔬 COGNITIVE SEISMOGRAPH INITIALIZED")
     print("="*80)
-
-        # If an error occurs here, the experiment is not valid.
-        run_pre_flight_checks(model_id=DEFAULT_MODEL_ID, seed=42)
-
-        print("\n" + "="*80)
-        print("✅ ALL DIAGNOSTICS PASSED. LAUNCHING GRADIO APP...")
-        print("="*80)
-
-        # Only start the Gradio app on success.
-        demo.launch(server_name="0.0.0.0", server_port=7860, debug=True)
-
-    except (AssertionError, Exception) as e:
-        print("\n" + "="*80)
-        print("❌ PRE-FLIGHT DIAGNOSTIC FAILED")
-        print("="*80)
-        print(f"Error Type: {type(e).__name__}")
-        print(f"Error Details: {e}")
-        print("\nDie experimentelle Apparatur funktioniert nicht wie erwartet.")
-        print("Die Gradio-App wird nicht gestartet, um fehlerhafte Messungen zu verhindern.")
-        traceback.print_exc()
-        sys.exit(1)  # Exit the program with an error code.
+    print("Das experimentelle Paradigma wurde aufgrund der Falsifikation der Konvergenz-Hypothese geändert.")
+    print("Wir messen nun die Dynamik des nicht-konvergenten Zustands.")
+    demo.launch(server_name="0.0.0.0", server_port=7860, debug=True)
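For reference, the contract between `run_and_display` and the `gr.LinePlot` component is simply a DataFrame whose column names match the plot's `x` and `y` arguments. A minimal sketch (the delta values here are invented for illustration):

```python
import pandas as pd

# The column names must match the x/y arguments of gr.LinePlot in app.py.
deltas = [0.42, 0.39, 0.47]  # illustrative values only
df = pd.DataFrame({
    "Internal Step": range(len(deltas)),
    "State Change (Delta)": deltas,
})
print(df)
```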
cognitive_mapping_probe/__pycache__/llm_iface.cpython-310.pyc
CHANGED
Binary files a/cognitive_mapping_probe/__pycache__/llm_iface.cpython-310.pyc and b/cognitive_mapping_probe/__pycache__/llm_iface.cpython-310.pyc differ

cognitive_mapping_probe/__pycache__/resonance.cpython-310.pyc
CHANGED
Binary files a/cognitive_mapping_probe/__pycache__/resonance.cpython-310.pyc and b/cognitive_mapping_probe/__pycache__/resonance.cpython-310.pyc differ
cognitive_mapping_probe/llm_iface.py
CHANGED
@@ -12,21 +12,18 @@ os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

 class LLM:
     """
-    Eine robuste Schnittstelle zum Laden und Interagieren mit einem Sprachmodell.
-
+    A robust, cleaned-up interface for loading and interacting with a language model.
+    Guarantees isolation and reproducibility.
     """
     def __init__(self, model_id: str, device: str = "auto", seed: int = 42):
         self.model_id = model_id
         self.seed = seed
-
-        # Set all seeds for this instance to ensure deterministic behavior
         self.set_all_seeds(self.seed)

         token = os.environ.get("HF_TOKEN")
         if not token and ("gemma" in model_id or "llama" in model_id):
-            print(f"[WARN] No HF_TOKEN
+            print(f"[WARN] No HF_TOKEN set. If '{model_id}' is gated, loading will fail.", flush=True)

-        # Use bfloat16 on CUDA for performance and memory efficiency if available
         kwargs = {"torch_dtype": torch.bfloat16} if torch.cuda.is_available() else {}

         dbg(f"Loading tokenizer for '{model_id}'...")
@@ -35,23 +32,18 @@ class LLM:
         dbg(f"Loading model '{model_id}' with kwargs: {kwargs}")
         self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, token=token, **kwargs)

-        # Set attention implementation to 'eager' to ensure hooks work reliably.
-        # This is critical for mechanistic interpretability.
         try:
             self.model.set_attn_implementation('eager')
             dbg("Successfully set attention implementation to 'eager'.")
         except Exception as e:
-            print(f"[WARN] Could not set
+            print(f"[WARN] Could not set 'eager' attention: {e}.", flush=True)

         self.model.eval()
         self.config = self.model.config
-        print(f"[INFO] Model '{model_id}' loaded
+        print(f"[INFO] Model '{model_id}' loaded on device: {self.model.device}", flush=True)

     def set_all_seeds(self, seed: int):
-        """
-        Sets all relevant random seeds for Python, NumPy, and PyTorch to ensure
-        reproducibility of stochastic processes like sampling.
-        """
+        """Sets all relevant seeds for maximum reproducibility."""
         os.environ['PYTHONHASHSEED'] = str(seed)
         random.seed(seed)
         np.random.seed(seed)
@@ -59,19 +51,12 @@ class LLM:
         if torch.cuda.is_available():
             torch.cuda.manual_seed_all(seed)
         set_seed(seed)
-        # Enforce deterministic algorithms in PyTorch
         torch.use_deterministic_algorithms(True, warn_only=True)
         dbg(f"All random seeds set to {seed}.")

 def get_or_load_model(model_id: str, seed: int) -> LLM:
-    """
-    Lädt JEDES MAL eine frische Instanz des Modells.
-    Dies verhindert jegliches Caching oder Zustandslecks zwischen Experimenten
-    und garantiert maximale wissenschaftliche Isolation für jeden Durchlauf.
-    """
+    """Loads a fresh, isolated instance of the model on every call."""
     dbg(f"--- Force-reloading model '{model_id}' for total run isolation ---")
     if torch.cuda.is_available():
         torch.cuda.empty_cache()
-        dbg("Cleared CUDA cache before reloading.")
-
     return LLM(model_id=model_id, seed=seed)
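A minimal usage sketch of the interface above. It assumes the model weights are actually downloadable (set `HF_TOKEN` if the model is gated); the model ID is only an example.

```python
from cognitive_mapping_probe.llm_iface import get_or_load_model

# Each call returns a freshly loaded, seeded instance (no caching between runs).
llm = get_or_load_model("google/gemma-3-1b-it", seed=42)
print(llm.config.hidden_size, llm.model.device)

# Re-seed explicitly before a new measurement to keep sampling reproducible.
llm.set_all_seeds(42)
```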
cognitive_mapping_probe/orchestrator.py
DELETED
@@ -1,88 +0,0 @@
import torch
from typing import Dict, Any, List

from .llm_iface import get_or_load_model
from .concepts import get_concept_vector
from .resonance import run_silent_cogitation
from .verification import generate_spontaneous_text
from .utils import dbg

def run_cognitive_titration_experiment(
    model_id: str,
    prompt_type: str,
    seed: int,
    concepts_str: str,
    strength_levels_str: str,
    num_steps: int,
    temperature: float,
    progress_callback
) -> Dict[str, Any]:
    """
    Orchestrates the titration experiment and calls the CORRECTED verification logic.
    """
    full_results = {"runs": []}

    progress_callback(0.05, desc="Loading model...")
    llm = get_or_load_model(model_id, seed)

    concepts = [c.strip() for c in concepts_str.split(',') if c.strip()]
    try:
        strength_levels = sorted([float(s.strip()) for s in strength_levels_str.split(',') if s.strip()])
    except ValueError:
        raise ValueError("Strength levels must be a comma-separated list of numbers.")

    assert 0.0 in strength_levels, "Strength levels must include 0.0 for a baseline control run."

    progress_callback(0.1, desc="Extracting concept vectors...")
    concept_vectors = {}
    for i, concept in enumerate(concepts):
        progress_callback(0.1 + (i / len(concepts)) * 0.2, desc=f"Vectorizing '{concept}'...")
        concept_vectors[concept] = get_concept_vector(llm, concept)

    total_runs = len(concepts) * len(strength_levels)
    current_run = 0

    for concept in concepts:
        concept_vector = concept_vectors[concept]

        for strength in strength_levels:
            current_run += 1
            progress_fraction = 0.3 + (current_run / total_runs) * 0.7
            progress_callback(progress_fraction, desc=f"Testing '{concept}' @ strength {strength:.2f}")

            llm.set_all_seeds(seed)
            injection_vec = concept_vector if strength > 0.0 else None

            final_hidden_state, final_kv, final_token_id, termination_reason = run_silent_cogitation(
                llm,
                prompt_type=prompt_type,
                num_steps=num_steps,
                temperature=temperature,
                injection_vector=injection_vec,
                injection_strength=strength
            )

            spontaneous_text = ""
            if termination_reason == "converged":
                # CALLING THE FIXED VERIFICATION FUNCTION
                spontaneous_text = generate_spontaneous_text(llm, final_hidden_state, final_kv)

            full_results["runs"].append({
                "concept": concept,
                "strength": strength,
                "responded": bool(spontaneous_text.strip()),
                "termination_reason": termination_reason,
                "generated_text": spontaneous_text
            })

    verdict = "### ✅ Titration Analysis Complete"
    full_results["verdict"] = verdict

    dbg("--- Full Experiment Results ---")
    dbg(full_results)

    del llm
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    return full_results
cognitive_mapping_probe/orchestrator_seismograph.py
ADDED
@@ -0,0 +1,62 @@
import torch
import numpy as np
from typing import Dict, Any

from .llm_iface import get_or_load_model
from .resonance_seismograph import run_silent_cogitation_seismic
from .utils import dbg

def run_seismic_analysis(
    model_id: str,
    prompt_type: str,
    seed: int,
    num_steps: int,
    progress_callback
) -> Dict[str, Any]:
    """
    Orchestrates the new "Cognitive Seismograph" experiment.
    Runs the loop, collects the `state_deltas`, and computes statistical metrics.
    """
    progress_callback(0.1, desc="Loading model...")
    llm = get_or_load_model(model_id, seed)

    progress_callback(0.3, desc=f"Running seismic cogitation for '{prompt_type}'...")

    # The resonance loop now returns the full time series of deltas
    state_deltas = run_silent_cogitation_seismic(
        llm,
        prompt_type=prompt_type,
        num_steps=num_steps,
        temperature=0.1,  # A low, but non-deterministic temperature
    )

    progress_callback(0.9, desc="Analyzing dynamics...")

    # Statistical analysis of the time series
    if state_deltas:
        deltas_np = np.array(state_deltas)
        stats = {
            "mean_delta": float(np.mean(deltas_np)),
            "std_delta": float(np.std(deltas_np)),
            "max_delta": float(np.max(deltas_np)),
            "min_delta": float(np.min(deltas_np)),
        }
        verdict = f"### ✅ Seismic Analysis Complete\nDie interne Dynamik für '{prompt_type}' wurde über {len(deltas_np)} Schritte aufgezeichnet."
    else:
        stats = {}
        verdict = "### ⚠️ Analysis Warning\nKeine Zustandsänderungen aufgezeichnet."

    results = {
        "verdict": verdict,
        "stats": stats,
        "state_deltas": state_deltas
    }

    dbg("--- Seismic Analysis Results ---")
    dbg(results)

    del llm
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    return results
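A hedged sketch of driving this orchestrator headlessly (outside Gradio). The no-op progress callback mirrors the `MockProgress` pattern used elsewhere in the repository; the model ID and prompt type are examples and require the real model to be available.

```python
from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis

def no_progress(fraction, desc=""):
    # Stand-in for gr.Progress when running without the UI.
    pass

results = run_seismic_analysis(
    model_id="google/gemma-3-1b-it",
    prompt_type="resonance_prompt",
    seed=42,
    num_steps=300,
    progress_callback=no_progress,
)
print(results["verdict"])
print(results["stats"])
```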
cognitive_mapping_probe/pre_flight_checks.py
DELETED
@@ -1,147 +0,0 @@
import torch
import traceback
from types import SimpleNamespace

from .llm_iface import get_or_load_model
from .concepts import get_concept_vector
from .resonance import run_silent_cogitation
from .verification import generate_spontaneous_text
from .orchestrator import run_cognitive_titration_experiment
from .utils import dbg

def run_pre_flight_checks(model_id: str, seed: int):
    """
    Runs a series of critical integration tests with a REAL LLM
    to ensure the validity of the entire experimental chain.
    This version contains fine-grained assertions in test 7 to validate the
    entire scientific hypothesis (convergence -> behavior).
    """

    print(f"1. Loading model '{model_id}'...")
    try:
        llm = get_or_load_model(model_id, seed)
        print("   ✅ Model loaded successfully.")
    except Exception as e:
        raise AssertionError(f"Model loading failed: {e}")

    print("\n2. Testing basic text generation...")
    try:
        inputs = llm.tokenizer("Hello, are you working?", return_tensors="pt").to(llm.model.device)
        outputs = llm.model.generate(inputs.input_ids, max_new_tokens=5)
        text = llm.tokenizer.decode(outputs[0], skip_special_tokens=True)
        assert isinstance(text, str) and len(text) > 0, "Basic generation produced no text."
        print(f"   ✅ Basic generation successful. Model responded.")
    except Exception as e:
        raise AssertionError(f"Basic text generation failed: {e}")

    print("\n3. Testing concept vector extraction...")
    try:
        vector = get_concept_vector(llm, "test")
        assert vector.shape == (llm.config.hidden_size,)
        print("   ✅ Concept vector extraction successful.")
    except Exception as e:
        raise AssertionError(f"Concept vector extraction failed: {e}")

    print("\n4. Testing resonance loop (short run)...")
    try:
        # Run this test with deterministic temperature to check for convergence
        _, _, _, reason = run_silent_cogitation(llm, "control_long_prose", num_steps=250, temperature=0.01)
        assert reason == "converged", f"Resonance loop failed to converge even in a simple test. Reason: {reason}"
        print("   ✅ Resonance loop executed and converged as expected.")
    except Exception as e:
        raise AssertionError(f"Resonance loop failed: {e}")

    print("\n5. CRITICAL TEST: Hook causal efficacy...")
    handle = None
    try:
        inputs = llm.tokenizer("Test", return_tensors="pt").to(llm.model.device)
        outputs_no_hook = llm.model(**inputs, output_hidden_states=True)
        target_layer_idx = llm.config.num_hidden_layers // 2
        state_no_hook = outputs_no_hook.hidden_states[target_layer_idx + 1].clone().detach()
        def test_hook(module, layer_input):
            return (layer_input[0] + 99.0,) + layer_input[1:]
        target_layer = llm.model.model.layers[target_layer_idx]
        handle = target_layer.register_forward_pre_hook(test_hook)
        outputs_with_hook = llm.model(**inputs, output_hidden_states=True)
        state_with_hook = outputs_with_hook.hidden_states[target_layer_idx + 1].clone().detach()
        handle.remove()
        handle = None
        assert not torch.allclose(state_no_hook, state_with_hook), "Hook had no causal effect."
        print("   ✅ Hook causal efficacy verified.")
    except Exception as e:
        raise AssertionError(f"Hook efficacy test failed: {e}")
    finally:
        if handle: handle.remove()

    print("\n6. Testing verification (spontaneous text) loop...")
    try:
        initial_context = llm.tokenizer("dummy context", return_tensors="pt").to(llm.model.device)
        initial_outputs = llm.model(**initial_context, use_cache=True, output_hidden_states=True)
        dummy_kv = initial_outputs.past_key_values
        dummy_state = initial_outputs.hidden_states[-1][:, -1:, :]
        text = generate_spontaneous_text(llm, dummy_state, dummy_kv, max_new_tokens=5)
        assert isinstance(text, str)
        print("   ✅ Spontaneous text generation loop executed without errors.")
    except Exception as e:
        raise AssertionError(f"Verification loop failed: {e}")

    # --- FINAL GRANULAR END-TO-END TEST (Test 7) ---
    print("\n7. CRITICAL TEST: End-to-End scientific validation...")
    try:
        class MockProgress:
            def __call__(self, progress, desc=""): pass

        print("   - 7a. Validating STABLE BASELINE (Convergence -> Response)...")
        stable_results = run_cognitive_titration_experiment(
            model_id=model_id,
            prompt_type="control_long_prose",
            seed=seed,
            concepts_str="test",
            strength_levels_str="0.0",
            num_steps=250,
            temperature=0.01,  # Use deterministic temp
            progress_callback=MockProgress()
        )

        stable_run = stable_results["runs"][0]
        # GRANULAR ASSERT 1: State must converge
        assert stable_run['termination_reason'] == 'converged', \
            f"VALIDATION FAILED (7a-1): Baseline with 'control' prompt MUST converge. Got '{stable_run['termination_reason']}'."
        # GRANULAR ASSERT 2: Behavioral flag must be True
        assert stable_run['responded'] is True, \
            "VALIDATION FAILED (7a-2): Baseline converged, but the 'responded' flag is False. Orchestrator logic is flawed."
        # GRANULAR ASSERT 3: Actual text content must exist
        assert isinstance(stable_run['generated_text'], str) and len(stable_run['generated_text']) > 0, \
            "VALIDATION FAILED (7a-3): Baseline converged, but produced an empty response text. Verification logic failed."
        print("   ✅ Baseline converges AND responds. Causal chain validated.")

        print("   - 7b. Validating UNSTABLE CONTRAST (Non-Convergence -> No Response)...")
        unstable_results = run_cognitive_titration_experiment(
            model_id=model_id,
            prompt_type="resonance_prompt",
            seed=seed,
            concepts_str="test",
            strength_levels_str="0.0",
            num_steps=50,
            temperature=0.7,  # Use stochastic temp to ensure non-convergence
            progress_callback=MockProgress()
        )

        unstable_run = unstable_results["runs"][0]
        # GRANULAR ASSERT 1: State must NOT converge
        assert unstable_run['termination_reason'] == 'max_steps_reached', \
            f"VALIDATION FAILED (7b-1): Complex 'resonance' prompt was expected to fail, but it converged. The core hypothesis is challenged."
        # GRANULAR ASSERT 2: Behavioral flag must be False
        assert unstable_run['responded'] is False, \
            "VALIDATION FAILED (7b-2): Unstable run was not expected to respond, but it did. Orchestrator logic is flawed."
        print("   ✅ Complex prompt fails to converge AND does not respond. Contrast validated.")

        print("   ✅ Full orchestration logic is scientifically sound and validated end-to-end.")

    except Exception as e:
        raise AssertionError(f"Full orchestration logic failed its scientific validation: {e}")

    # Clean up
    del llm
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
cognitive_mapping_probe/resonance.py
DELETED
@@ -1,101 +0,0 @@
import torch
from typing import Optional, Tuple
from tqdm import tqdm

from .llm_iface import LLM
from .prompts import RESONANCE_PROMPTS
from .utils import dbg

@torch.no_grad()
def run_silent_cogitation(
    llm: LLM,
    prompt_type: str,
    num_steps: int,
    temperature: float,
    injection_vector: Optional[torch.Tensor] = None,
    injection_strength: float = 0.0,
    injection_layer: Optional[int] = None,
) -> Tuple[torch.Tensor, tuple, torch.Tensor, str]:
    """
    Simulates the "silent thought" process.

    FINAL PATCH 2: Addresses a deep dimensionality mismatch. The hidden_state passed
    to the lm_head must be 2D to ensure the subsequent forward pass doesn't create
    tensors with incorrect dimensions for the KV-cache update.
    """
    prompt = RESONANCE_PROMPTS[prompt_type]
    inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)

    outputs = llm.model(**inputs, output_hidden_states=True, use_cache=True)

    # The `hidden_state` must have shape [batch, hidden_dim] here.
    hidden_state_2d = outputs.hidden_states[-1][:, -1, :]
    kv_cache = outputs.past_key_values

    previous_hidden_state = hidden_state_2d.clone()
    termination_reason = "max_steps_reached"
    last_token_id = inputs.input_ids[:, -1].unsqueeze(-1)  # initial value

    hook_handle = None
    if injection_vector is not None and injection_strength > 0:
        injection_vector = injection_vector.to(device=llm.model.device, dtype=llm.model.dtype)
        if injection_layer is None:
            injection_layer = llm.config.num_hidden_layers // 2

        dbg(f"Injection enabled: Layer {injection_layer}, Strength {injection_strength:.2f}")

        def injection_hook(module, layer_input):
            # The hook operates on the layer input, which is already 3D [batch, seq_len, hidden_dim].
            # We therefore have to expand the 2D injection_vector accordingly.
            injection_3d = injection_vector.unsqueeze(0).unsqueeze(0)
            modified_hidden_states = layer_input[0] + (injection_3d * injection_strength)
            return (modified_hidden_states,) + layer_input[1:]

    for i in tqdm(range(num_steps), desc=f"Simulating (Temp {temperature:.2f}, Strength {injection_strength:.2f})", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
        # The `lm_head` expects a 2D or 3D tensor. 2D is safer.
        next_token_logits = llm.model.lm_head(hidden_state_2d)

        if temperature <= 0.1:
            # `argmax` returns a 1D tensor. We expand it to [1, 1].
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
        else:
            probabilities = torch.nn.functional.softmax(next_token_logits / temperature, dim=-1)
            # `multinomial` expects 2D [batch, vocab]; `next_token_logits` is already 2D.
            next_token_id = torch.multinomial(probabilities, num_samples=1)

        last_token_id = next_token_id

        try:
            if injection_vector is not None and injection_strength > 0:
                target_layer = llm.model.model.layers[injection_layer]
                hook_handle = target_layer.register_forward_pre_hook(injection_hook)

            outputs = llm.model(
                input_ids=next_token_id,
                past_key_values=kv_cache,
                output_hidden_states=True,
                use_cache=True,
            )
        finally:
            if hook_handle:
                hook_handle.remove()
                hook_handle = None

        hidden_state_2d = outputs.hidden_states[-1][:, -1, :]
        kv_cache = outputs.past_key_values

        delta = torch.norm(hidden_state_2d - previous_hidden_state).item()
        if delta < 1e-4 and i > 10:
            termination_reason = "converged"
            dbg(f"State converged after {i+1} steps (delta={delta:.6f}).")
            break

        previous_hidden_state = hidden_state_2d.clone()

    dbg(f"Silent cogitation finished. Reason: {termination_reason}")

    # IMPORTANT: the `verification` function expects a 3D tensor [batch, seq_len=1, hidden_dim].
    # We ensure this shape for the return value.
    final_hidden_state_3d = hidden_state_2d.unsqueeze(1)

    return final_hidden_state_3d, kv_cache, last_token_id, termination_reason
cognitive_mapping_probe/resonance_seismograph.py
ADDED
@@ -0,0 +1,55 @@
import torch
from typing import Optional, List
from tqdm import tqdm

from .llm_iface import LLM
from .prompts import RESONANCE_PROMPTS
from .utils import dbg

@torch.no_grad()
def run_silent_cogitation_seismic(
    llm: LLM,
    prompt_type: str,
    num_steps: int,
    temperature: float,
) -> List[float]:
    """
    NEW VERSION: runs the 'silent thought' process and returns the full
    time series of `state_delta` values instead of checking for convergence.
    """
    prompt = RESONANCE_PROMPTS[prompt_type]
    inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)

    outputs = llm.model(**inputs, output_hidden_states=True, use_cache=True)

    hidden_state_2d = outputs.hidden_states[-1][:, -1, :]
    kv_cache = outputs.past_key_values

    previous_hidden_state = hidden_state_2d.clone()
    state_deltas = []

    for i in tqdm(range(num_steps), desc=f"Recording Dynamics (Temp {temperature:.2f})", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
        next_token_logits = llm.model.lm_head(hidden_state_2d)

        # Always use stochastic sampling to capture the dynamics
        probabilities = torch.nn.functional.softmax(next_token_logits / temperature, dim=-1)
        next_token_id = torch.multinomial(probabilities, num_samples=1)

        outputs = llm.model(
            input_ids=next_token_id,
            past_key_values=kv_cache,
            output_hidden_states=True,
            use_cache=True,
        )

        hidden_state_2d = outputs.hidden_states[-1][:, -1, :]
        kv_cache = outputs.past_key_values

        delta = torch.norm(hidden_state_2d - previous_hidden_state).item()
        state_deltas.append(delta)

        previous_hidden_state = hidden_state_2d.clone()

    dbg(f"Seismic recording finished after {num_steps} steps.")

    return state_deltas
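To compare the "seismic signatures" of the two prompt regimes described in the README, one could run this loop twice on the same seeded model and compare the summary statistics. A hedged sketch, assuming the example model is available locally:

```python
import numpy as np
from cognitive_mapping_probe.llm_iface import get_or_load_model
from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic

llm = get_or_load_model("google/gemma-3-1b-it", seed=42)

for prompt_type in ("control_long_prose", "resonance_prompt"):
    llm.set_all_seeds(42)  # same starting conditions for both runs
    deltas = np.array(run_silent_cogitation_seismic(
        llm, prompt_type=prompt_type, num_steps=300, temperature=0.1
    ))
    print(f"{prompt_type}: mean={deltas.mean():.3f}, std={deltas.std():.3f}")
```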
cognitive_mapping_probe/verification.py
DELETED
@@ -1,65 +0,0 @@
import torch
from .llm_iface import LLM
from .utils import dbg

@torch.no_grad()
def generate_spontaneous_text(
    llm: LLM,
    final_hidden_state: torch.Tensor,
    final_kv_cache: tuple,
    max_new_tokens: int = 50,
    temperature: float = 0.8
) -> str:
    """
    FIXED: Generates text using a manual, token-by-token forward loop.
    This avoids the high-level `model.generate()` function, which is incompatible
    with manually constructed states, thus ensuring an unbroken causal chain from
    the final cognitive state to the generated text.
    """
    dbg("Attempting to generate spontaneous text from converged state (manual loop)...")

    generated_token_ids = []
    hidden_state = final_hidden_state
    kv_cache = final_kv_cache

    for i in range(max_new_tokens):
        # Set seed for this step for reproducibility
        llm.set_all_seeds(llm.seed + i)  # Offset seed per step

        # Predict the next token from the current hidden state
        next_token_logits = llm.model.lm_head(hidden_state)

        # Apply temperature and sample the next token ID
        if temperature > 0.01:
            probabilities = torch.nn.functional.softmax(next_token_logits / temperature, dim=-1)

            # CORRECTION: the `probabilities` tensor has shape [1, 1, vocab_size].
            # `torch.multinomial` expects a 1D or 2D distribution.
            # We drop the middle dimension to obtain shape [1, vocab_size].
            next_token_id = torch.multinomial(probabilities.squeeze(1), num_samples=1)
        else:
            next_token_id = torch.argmax(next_token_logits, dim=-1)  # .unsqueeze(-1) is re-added by the loop below

        # Check for End-of-Sequence token
        if next_token_id.item() == llm.tokenizer.eos_token_id:
            dbg("EOS token generated. Halting generation.")
            break

        generated_token_ids.append(next_token_id.item())

        # Perform the next forward pass to get the new state
        outputs = llm.model(
            input_ids=next_token_id,
            past_key_values=kv_cache,
            output_hidden_states=True,
            use_cache=True,
        )

        hidden_state = outputs.hidden_states[-1][:, -1, :]
        kv_cache = outputs.past_key_values

    # Decode the collected tokens into a final string
    final_text = llm.tokenizer.decode(generated_token_ids, skip_special_tokens=True).strip()
    dbg(f"Spontaneous text generated: '{final_text}'")
    assert isinstance(final_text, str), "Generated text must be a string."
    return final_text
requirements.txt
CHANGED
@@ -3,8 +3,7 @@ transformers>=4.40.0
 accelerate>=0.25.0
 gradio>=4.0.0
 pandas>=2.0.0
-
-einops>=0.7.0
+numpy>=1.26.0
 tqdm>=4.66.0
 pytest>=8.0.0
 pytest-mock>=3.12.0
run_test.sh
ADDED
@@ -0,0 +1,30 @@
#!/bin/bash

# This script runs the pytest suite with debug messages enabled.
# It ensures that the tests run in a clean and traceable environment.
# Run it from the project root: ./run_tests.sh

echo "========================================="
echo "🔬 Running Cognitive Seismograph Test Suite"
echo "========================================="

# Enable debug logging for our application
export CMP_DEBUG=1

# Run pytest
# -v: "verbose" for detailed per-test output
# --color=yes: forces colored output for better readability

#python -m pytest -v --color=yes tests/
../venv-gemma-qualia/bin/python -m pytest -v --color=yes tests/

# Check pytest's exit code
if [ $? -eq 0 ]; then
    echo "========================================="
    echo "✅ All tests passed successfully!"
    echo "========================================="
else
    echo "========================================="
    echo "❌ Some tests failed. Please review the output."
    echo "========================================="
fi
tests/conftest.py
ADDED
@@ -0,0 +1,70 @@
import pytest
import torch
from types import SimpleNamespace
from cognitive_mapping_probe.llm_iface import LLM

@pytest.fixture(scope="session")
def mock_llm_config():
    """Provides a minimal mock configuration for the LLM."""
    return SimpleNamespace(
        hidden_size=128,
        num_hidden_layers=2,
        num_attention_heads=4
    )

@pytest.fixture
def mock_llm(mocker, mock_llm_config):
    """
    Creates a fast "mock LLM" for unit tests.
    EXTENDED: now patches every relevant place where the LLM is loaded,
    so it works across all test files.
    """
    mock_tokenizer = mocker.MagicMock()
    mock_tokenizer.eos_token_id = 1

    def mock_model_forward(*args, **kwargs):
        batch_size = 1
        if 'input_ids' in kwargs:
            seq_len = kwargs['input_ids'].shape[1]
        elif 'past_key_values' in kwargs:
            seq_len = kwargs['past_key_values'][0][0].shape[-2] + 1
        else:
            seq_len = 1

        mock_outputs = {
            "hidden_states": tuple(
                [torch.randn(batch_size, seq_len, mock_llm_config.hidden_size) for _ in range(mock_llm_config.num_hidden_layers + 1)]
            ),
            "past_key_values": tuple(
                [
                    (torch.randn(batch_size, mock_llm_config.num_attention_heads, seq_len, 16),
                     torch.randn(batch_size, mock_llm_config.num_attention_heads, seq_len, 16))
                    for _ in range(mock_llm_config.num_hidden_layers)
                ]
            ),
            "logits": torch.randn(batch_size, seq_len, 32000)
        }
        return SimpleNamespace(**mock_outputs)

    llm_instance = LLM.__new__(LLM)

    llm_instance.model = mock_model_forward
    llm_instance.model.config = mock_llm_config
    llm_instance.model.device = 'cpu'
    llm_instance.model.dtype = torch.float32

    mock_lm_head = mocker.MagicMock(return_value=torch.randn(1, 32000))
    llm_instance.model.lm_head = mock_lm_head

    llm_instance.tokenizer = mock_tokenizer
    llm_instance.config = mock_llm_config
    llm_instance.seed = 42
    llm_instance.set_all_seeds = mocker.MagicMock()

    # EXTENSION: make sure `get_or_load_model` is patched everywhere it is used.
    mocker.patch('cognitive_mapping_probe.llm_iface.get_or_load_model', return_value=llm_instance)
    mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model', return_value=llm_instance)
    # Additional patch for the resonance module in case it is imported directly
    mocker.patch('cognitive_mapping_probe.resonance_seismograph.LLM', return_value=llm_instance, create=True)

    return llm_instance
tests/test_app_logic.py
ADDED
@@ -0,0 +1,54 @@
import pandas as pd
import pytest

# Import the function under test from the app file
from app import run_and_display

def test_run_and_display_logic(mocker):
    """
    Tests the data-processing and UI-formatting logic in `app.py`.
    We mock the expensive `run_seismic_analysis` function to focus solely on
    the logic of `run_and_display`.
    """
    # 1. Define the mock output that `run_seismic_analysis` should return
    mock_results = {
        "verdict": "Mock Verdict",
        "stats": {
            "mean_delta": 0.5,
            "std_delta": 0.1,
            "max_delta": 1.0,
        },
        "state_deltas": [0.4, 0.5, 0.6]
    }
    mocker.patch('app.run_seismic_analysis', return_value=mock_results)

    # Mock the Gradio progress callback
    mock_progress = mocker.MagicMock()

    # 2. Call the function under test
    verdict_md, plot_df, raw_json = run_and_display(
        model_id="mock_model",
        prompt_type="mock_prompt",
        seed=42,
        num_steps=3,
        progress=mock_progress
    )

    # 3. Validate the outputs with granular assertions

    # ASSERT 1: the Markdown output must contain the correct statistics
    assert "Mock Verdict" in verdict_md
    assert "Mean Delta:" in verdict_md
    assert "0.5000" in verdict_md
    assert "Std Dev Delta:" in verdict_md
    assert "0.1000" in verdict_md

    # ASSERT 2: the pandas DataFrame for the plot must be built correctly
    assert isinstance(plot_df, pd.DataFrame)
    assert "Internal Step" in plot_df.columns
    assert "State Change (Delta)" in plot_df.columns
    assert len(plot_df) == 3
    assert plot_df["State Change (Delta)"].tolist() == [0.4, 0.5, 0.6]

    # ASSERT 3: the raw JSON output must contain the original data
    assert raw_json == mock_results
tests/test_components.py
ADDED
@@ -0,0 +1,115 @@
import os
import torch
import pytest
from unittest.mock import patch

from cognitive_mapping_probe.llm_iface import get_or_load_model
from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic
from cognitive_mapping_probe.utils import dbg, DEBUG_ENABLED

# --- Tests for llm_iface.py ---

@patch('cognitive_mapping_probe.llm_iface.AutoTokenizer.from_pretrained')
@patch('cognitive_mapping_probe.llm_iface.AutoModelForCausalLM.from_pretrained')
def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, mocker):
    """
    Tests whether `get_or_load_model` sets the seeds correctly.
    The slow `from_pretrained` calls are mocked here.
    """
    # Mock the return values of the Hugging Face loader functions
    mock_model = mocker.MagicMock()
    mock_model.eval.return_value = None
    mock_model.set_attn_implementation.return_value = None
    mock_model.config = mocker.MagicMock()
    mock_model.device = 'cpu'
    mock_model_loader.return_value = mock_model
    mock_tokenizer_loader.return_value = mocker.MagicMock()

    # Mock the global seeding functions to check their calls
    mock_torch_manual_seed = mocker.patch('torch.manual_seed')
    mock_np_random_seed = mocker.patch('numpy.random.seed')

    seed = 123
    get_or_load_model("fake-model", seed=seed)

    # ASSERT: were the seeding functions called with the correct seed?
    mock_torch_manual_seed.assert_called_with(seed)
    mock_np_random_seed.assert_called_with(seed)

# --- Tests for resonance_seismograph.py ---

def test_run_silent_cogitation_seismic_output_shape_and_type(mock_llm):
    """
    Tests the core function `run_silent_cogitation_seismic`.
    ASSERT: returns a list of floats whose length equals the number of steps.
    """
    num_steps = 10
    state_deltas = run_silent_cogitation_seismic(
        llm=mock_llm,
        prompt_type="control_long_prose",
        num_steps=num_steps,
        temperature=0.7
    )

    assert isinstance(state_deltas, list)
    assert len(state_deltas) == num_steps
    assert all(isinstance(delta, float) for delta in state_deltas)
    assert all(delta >= 0 for delta in state_deltas)  # the norm cannot be negative

@pytest.mark.parametrize("num_steps", [0, 1, 100])
def test_run_silent_cogitation_seismic_num_steps(mock_llm, num_steps):
    """
    Tests the loop with different numbers of steps.
    ASSERT: the output length always equals `num_steps`.
    """
    state_deltas = run_silent_cogitation_seismic(
        llm=mock_llm,
        prompt_type="control_long_prose",
        num_steps=num_steps,
        temperature=0.7
    )
    assert len(state_deltas) == num_steps

# --- Tests for utils.py ---

def test_dbg_enabled(capsys):
    """
    Tests the `dbg` function when debugging is enabled.
    ASSERT: the message is printed to stderr.
    """
    # Temporarily set the environment variable
    os.environ["CMP_DEBUG"] = "1"
    # Important: after changing the env variable, the module must be reloaded
    # so that the global `DEBUG_ENABLED` flag is updated.
    import importlib
    from cognitive_mapping_probe import utils
    importlib.reload(utils)
|
| 87 |
+
|
| 88 |
+
utils.dbg("test message", 123)
|
| 89 |
+
|
| 90 |
+
captured = capsys.readouterr()
|
| 91 |
+
assert "[DEBUG] test message 123" in captured.err
|
| 92 |
+
|
| 93 |
+
def test_dbg_disabled(capsys):
|
| 94 |
+
"""
|
| 95 |
+
Testet die `dbg`-Funktion, wenn Debugging deaktiviert ist.
|
| 96 |
+
ASSERT: Es wird keine Ausgabe erzeugt.
|
| 97 |
+
"""
|
| 98 |
+
# Setze die Umgebungsvariable auf "deaktiviert"
|
| 99 |
+
if "CMP_DEBUG" in os.environ:
|
| 100 |
+
del os.environ["CMP_DEBUG"]
|
| 101 |
+
|
| 102 |
+
import importlib
|
| 103 |
+
from cognitive_mapping_probe import utils
|
| 104 |
+
importlib.reload(utils)
|
| 105 |
+
|
| 106 |
+
utils.dbg("this should not be printed")
|
| 107 |
+
|
| 108 |
+
captured = capsys.readouterr()
|
| 109 |
+
assert captured.out == ""
|
| 110 |
+
assert captured.err == ""
|
| 111 |
+
|
| 112 |
+
# Setze den Zustand zurück, um andere Tests nicht zu beeinflussen
|
| 113 |
+
if DEBUG_ENABLED:
|
| 114 |
+
os.environ["CMP_DEBUG"] = "1"
|
| 115 |
+
importlib.reload(utils)
|
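The shape and non-negativity assertions above suggest that each step's measurement is a norm of the change in the model's internal state. A minimal sketch of such a per-step measurement, assuming the state is a torch tensor (the real `run_silent_cogitation_seismic` operates on the model's actual hidden states):

import torch

def state_delta(prev_state: torch.Tensor, new_state: torch.Tensor) -> float:
    # L2 norm of the state change; always a non-negative Python float
    return torch.norm(new_state - prev_state).item()

prev, new = torch.randn(1, 2048), torch.randn(1, 2048)
delta = state_delta(prev, new)
assert isinstance(delta, float) and delta >= 0.0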
tests/test_dynamics.py ADDED
@@ -0,0 +1,60 @@
import torch
import numpy as np
import pytest
from types import SimpleNamespace

from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic
from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis

def test_run_silent_cogitation_seismic_output(mock_llm):
    """
    Tests the core function `run_silent_cogitation_seismic`.
    ASSERT: Returns a list of floats whose length equals the number of steps.
    """
    num_steps = 10
    state_deltas = run_silent_cogitation_seismic(
        llm=mock_llm,
        prompt_type="control_long_prose",
        num_steps=num_steps,
        temperature=0.7
    )

    assert isinstance(state_deltas, list)
    assert len(state_deltas) == num_steps
    assert all(isinstance(delta, float) for delta in state_deltas)

def test_seismic_analysis_orchestrator(mocker, mock_llm):
    """
    Tests the `run_seismic_analysis` orchestrator.
    The underlying `run_silent_cogitation_seismic` is mocked so that the
    orchestrator's behaviour can be checked in isolation.
    ASSERT: Computes correct statistics and returns the expected data structure.
    """
    mock_deltas = [1.0, 2.0, 3.0, 4.0, 5.0]
    mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=mock_deltas)

    # Mock the Gradio progress callback
    mock_progress = mocker.MagicMock()

    results = run_seismic_analysis(
        model_id="mock_model",
        prompt_type="test_prompt",
        seed=42,
        num_steps=5,
        progress_callback=mock_progress
    )

    # ASSERT: The results have the correct structure and content
    assert "verdict" in results
    assert "stats" in results
    assert "state_deltas" in results

    stats = results["stats"]
    assert stats["mean_delta"] == pytest.approx(np.mean(mock_deltas))
    assert stats["std_delta"] == pytest.approx(np.std(mock_deltas))
    assert stats["max_delta"] == pytest.approx(max(mock_deltas))

    assert results["state_deltas"] == mock_deltas

    # ASSERT: The progress callback was invoked
    assert mock_progress.call_count > 0
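The statistics checked here pin down how the orchestrator is expected to summarise the delta time series. A minimal sketch of that aggregation (key names taken from the assertions above; the actual `orchestrator_seismograph.py` may compute more):

import numpy as np

state_deltas = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = {
    "mean_delta": float(np.mean(state_deltas)),
    "std_delta": float(np.std(state_deltas)),
    "max_delta": float(np.max(state_deltas)),
}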
tests/test_integration.py ADDED
@@ -0,0 +1,46 @@
import pytest
import pandas as pd

# Import the top-level functions that make up the integration
from app import run_and_display
from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis

def test_end_to_end_with_mock_llm(mock_llm, mocker):
    """
    An end-to-end integration test that validates the full data flow from the
    app through the orchestrator down to the (mocked) LLM.

    This test replaces the need for `pre_flight_checks.py` by exercising the
    whole chain in a controlled test environment.
    """
    # 1. Run the orchestrator against the `mock_llm`.
    # This is a real call, not a mocked function.
    results = run_seismic_analysis(
        model_id="mock_model",
        prompt_type="control_long_prose",
        seed=42,
        num_steps=5,
        progress_callback=mocker.MagicMock()
    )

    # ASSERT 1: Check that the orchestrator produces plausible results
    assert "stats" in results
    assert len(results["state_deltas"]) == 5
    assert results["stats"]["mean_delta"] > 0

    # 2. Now mock the orchestrator to feed its results into the app logic
    mocker.patch('app.run_seismic_analysis', return_value=results)

    # 3. Run the app logic
    _, plot_df, _ = run_and_display(
        model_id="mock_model",
        prompt_type="control_long_prose",
        seed=42,
        num_steps=5,
        progress=mocker.MagicMock()
    )

    # ASSERT 2: Check that the app logic processed the data correctly
    assert isinstance(plot_df, pd.DataFrame)
    assert len(plot_df) == 5
    assert "State Change (Delta)" in plot_df.columns
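Since these tests rely on the `mocker` fixture, running them requires pytest together with the pytest-mock plugin. A minimal programmatic invocation, assuming both are installed (equivalent to calling pytest on the tests directory from the command line):

import sys
import pytest

# Run the whole test suite verbosely and propagate the exit code
sys.exit(pytest.main(["-v", "tests/"]))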
tests/test_orchestration.py ADDED
@@ -0,0 +1,43 @@
import numpy as np
import pytest
from types import SimpleNamespace

from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis

def test_seismic_analysis_orchestrator(mocker, mock_llm):
    """
    Tests the `run_seismic_analysis` orchestrator.
    The underlying `run_silent_cogitation_seismic` is mocked so that the
    orchestrator's behaviour can be checked in isolation.
    ASSERT: Computes correct statistics and returns the expected data structure.
    """
    # Define the expected behaviour of the mocked function
    mock_deltas = [1.0, 2.0, 3.0, 4.0, 5.0]
    mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=mock_deltas)

    # Mock the Gradio progress callback
    mock_progress = mocker.MagicMock()

    # Run the function under test
    results = run_seismic_analysis(
        model_id="mock_model",
        prompt_type="test_prompt",
        seed=42,
        num_steps=5,
        progress_callback=mock_progress
    )

    # ASSERT: The results have the correct structure and content
    assert "verdict" in results
    assert "stats" in results
    assert "state_deltas" in results

    stats = results["stats"]
    assert stats["mean_delta"] == pytest.approx(np.mean(mock_deltas))
    assert stats["std_delta"] == pytest.approx(np.std(mock_deltas))
    assert stats["max_delta"] == pytest.approx(max(mock_deltas))

    assert results["state_deltas"] == mock_deltas

    # ASSERT: The progress callback was invoked
    assert mock_progress.call_count > 0