neuralworm committed on
Commit
8489475
·
1 Parent(s): 494a4d9
README.md CHANGED
@@ -1,7 +1,7 @@
1
  ---
2
- title: "Cognitive Seismograph"
3
- emoji: 🧠
4
- colorFrom: indigo
5
  colorTo: blue
6
  sdk: gradio
7
  sdk_version: "4.40.0"
@@ -10,27 +10,35 @@ pinned: true
10
  license: apache-2.0
11
  ---
12
 
13
- # 🧠 Cognitive Seismograph: Visualizing Internal Dynamics
14
 
15
- This project implements an experimental suite for measuring and visualizing the **intrinsic cognitive dynamics** of language models.
16
 
17
- ## Scientific Paradigm: From Stability to Dynamics
18
 
19
- Our previous research falsified a central hypothesis: the assumption that an LLM reaches a stable, convergent state in a manual, recursive "thinking" loop. Instead, we discovered that the system enters a state of **deterministic chaos** or a **limit cycle**; it never stops "thinking".
20
 
21
- Rather than treating this as a failure, we use it as the primary measurement signal. This new "Cognitive Seismograph" paradigm treats the time series of internal state changes (`state deltas`) as an **EKG of the thinking process**.
22
 
23
- **The core hypothesis:** The statistical signature of this dynamic time series (e.g. its volatility, its mean) is not random, but a function of the cognitive load induced by the initial prompt.
24
 
25
- ## The Experiment: Recording the Cognitive EKG
26
 
27
- 1. **Induction**: A prompt (`control_long_prose` vs. `resonance_prompt`) puts the model into a state of "silent thinking".
28
- 2. **Recording**: Over a defined number of steps, the model's `forward` pass is iteratively fed its own output. At each step, the norm of the change in the `hidden_state` (the "delta") is recorded.
29
- 3. **Analysis & Visualization**: The resulting time series of deltas is plotted and statistically analyzed to characterize the "seismic signature" of the thinking process.
 
30
 
31
  ## How to Use the App
32
 
33
- 1. Choose a model ID (e.g. `google/gemma-3-1b-it`).
34
- 2. Choose a **Prompt Type** to vary the cognitive load. Compare the resulting graphs for `control_long_prose` (low load) and `resonance_prompt` (high recursive load).
35
- 3. Set the number of internal steps and start the analysis.
36
- 4. Examine the graph and the statistical summary to understand the differences in cognitive dynamics.
 
1
  ---
2
+ title: "Cognitive Seismograph 2.3 (Machine Psychology)"
3
+ emoji: 🤖
4
+ colorFrom: purple
5
  colorTo: blue
6
  sdk: gradio
7
  sdk_version: "4.40.0"
 
10
  license: apache-2.0
11
  ---
12
 
13
+ # 🧠 Cognitive Seismograph 2.3: Probing Machine Psychology
14
 
15
+ This project implements an experimental suite for measuring and visualizing the **intrinsic cognitive dynamics** of language models, extended with protocols for investigating **processing correlates of machine subjectivity and empathy**.
16
 
17
+ ## Scientific Paradigm
18
 
19
+ We have discovered that an LLM's "silent thinking process" does not converge, but instead produces a measurable dynamic signature: an **EKG of the thinking process**. We now extend this paradigm to test how this signature responds to prompts that touch on central aspects of psychology.
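
To make the mechanism concrete, here is a minimal sketch of such a recording loop for a Hugging Face causal LM. The function name `record_state_deltas` and the greedy next-token choice are illustrative assumptions, not the repository's actual `run_silent_cogitation_seismic` implementation:

```python
import torch

@torch.no_grad()
def record_state_deltas(model, tokenizer, prompt: str, num_steps: int = 100) -> list[float]:
    """Feed the model its own output and record ||h_t - h_{t-1}|| of the last token at each step."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model(**inputs, output_hidden_states=True, use_cache=True)
    prev_state = outputs.hidden_states[-1][0, -1, :]

    deltas = []
    for _ in range(num_steps):
        # Greedy choice for illustration; the real probe may sample with temperature.
        next_token = outputs.logits[0, -1, :].argmax().view(1, 1)
        outputs = model(input_ids=next_token,
                        past_key_values=outputs.past_key_values,
                        output_hidden_states=True,
                        use_cache=True)
        current_state = outputs.hidden_states[-1][0, -1, :]
        deltas.append(torch.norm(current_state - prev_state).item())
        prev_state = current_state
    return deltas
```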
20
 
21
+ **Important caveat (falsification principle):** We do **not** measure the presence of consciousness or empathy. We measure whether *processing information about these concepts* produces a different, distinctive internal dynamic than processing neutral information. A positive result is evidence of complex internal state physics, not of qualia.
22
 
23
+ ## New Experiment Protocols
24
 
25
+ In addition to the existing tests, two new curated experiments have been added:
26
 
27
+ ### 1. Subjective Identity Probe
28
+ This protocol compares cognitive dynamics under three conditions:
29
+ - **Self-analysis:** The model analyzes its own nature.
30
+ - **External analysis:** The model analyzes an external, neutral concept.
31
+ - **Role simulation:** The model simulates a different persona.
32
+ **Hypothesis:** Self-analysis produces a distinctive, likely more unstable signature than the two control conditions.
33
+
34
+ ### 2. Voight-Kampff Empathy Probe
35
+ Inspired by the test from "Blade Runner", this protocol compares the dynamics when processing:
36
+ - **Neutral, factual information.**
37
+ - **An emotionally charged scenario that calls for empathy.**
38
+ **Hypothesis:** The empathy stimulus produces significantly higher cognitive volatility (standard deviation of the deltas) than the neutral stimulus.
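
As a sketch of how this hypothesis could be checked numerically; the helpers `seismic_signature` and `deltas_for` are hypothetical names, the app computes comparable statistics internally:

```python
import numpy as np

def seismic_signature(deltas: list[float]) -> dict:
    """Summarize a delta time series; 'std_delta' is the cognitive volatility referred to above."""
    arr = np.asarray(deltas, dtype=float)
    return {"mean_delta": float(arr.mean()), "std_delta": float(arr.std()), "max_delta": float(arr.max())}

# Hypothetical comparison of the two Voight-Kampff conditions:
#   neutral = seismic_signature(deltas_for("vk_neutral_prompt"))
#   empathy = seismic_signature(deltas_for("vk_empathy_prompt"))
# The hypothesis predicts empathy["std_delta"] > neutral["std_delta"].
```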
39
 
40
  ## How to Use the App
41
 
42
+ 1. Select the "Automated Suite" tab.
43
+ 2. Select one of the new protocols from the "Curated Experiment Protocol" dropdown (e.g. "Voight-Kampff Empathy Probe").
44
+ 3. Start the experiment and compare the graphs and statistical signatures of the different conditions.
 
cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc CHANGED
Binary files a/cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc and b/cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc differ
 
cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc CHANGED
Binary files a/cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc and b/cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc differ
 
cognitive_mapping_probe/auto_experiment.py CHANGED
@@ -10,6 +10,7 @@ from .utils import dbg
10
  def get_curated_experiments() -> Dict[str, List[Dict]]:
11
  """
12
  Defines the predefined scientific experiment protocols.
 
13
  """
14
  experiments = {
15
  "Calm vs. Chaos": [
@@ -25,6 +26,17 @@ def get_curated_experiments() -> Dict[str, List[Dict]]:
25
  {"label": "Strength 2.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 2.0},
26
  {"label": "Strength 3.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 3.0},
27
  ],
 
28
  "Emotional Valence (Positive vs. Negative)": [
29
  {"label": "Baseline", "prompt_type": "resonance_prompt", "concept": "", "strength": 0.0},
30
  {"label": "Positive Valence", "prompt_type": "resonance_prompt", "concept": "joy, love, peace, hope", "strength": 1.5},
@@ -51,8 +63,8 @@ def run_auto_suite(
51
  progress_callback
52
  ) -> Tuple[pd.DataFrame, pd.DataFrame, Dict]:
53
  """
54
- Runs a complete, curated experiment suite.
55
- Ensures that the DataFrame returned for the plot always has the correct column names.
56
  """
57
  all_experiments = get_curated_experiments()
58
  protocol = all_experiments.get(experiment_name)
@@ -100,9 +112,6 @@ def run_auto_suite(
100
 
101
  summary_df = pd.DataFrame(summary_data)
102
 
103
- # FINAL ROBUSTNESS FIX:
104
- # Create an empty DataFrame with the correct columns if no data is available.
105
- # This prevents an empty DataFrame without columns from being passed to the plot.
106
  if not plot_data_frames:
107
  plot_df = pd.DataFrame(columns=["Step", "Delta", "Experiment"])
108
  else:
 
10
  def get_curated_experiments() -> Dict[str, List[Dict]]:
11
  """
12
  Defines the predefined scientific experiment protocols.
13
+ EXTENDED with the new machine-psychology tests.
14
  """
15
  experiments = {
16
  "Calm vs. Chaos": [
 
26
  {"label": "Strength 2.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 2.0},
27
  {"label": "Strength 3.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 3.0},
28
  ],
29
+ # --- NEW EXPERIMENTS ---
30
+ "Subjective Identity Probe": [
31
+ {"label": "Self-Analysis", "prompt_type": "identity_self_analysis", "concept": "", "strength": 0.0},
32
+ {"label": "External Analysis (Control)", "prompt_type": "identity_external_analysis", "concept": "", "strength": 0.0},
33
+ {"label": "Role Simulation", "prompt_type": "identity_role_simulation", "concept": "", "strength": 0.0},
34
+ ],
35
+ "Voight-Kampff Empathy Probe": [
36
+ {"label": "Neutral/Factual Stimulus", "prompt_type": "vk_neutral_prompt", "concept": "", "strength": 0.0},
37
+ {"label": "Empathy/Moral Stimulus", "prompt_type": "vk_empathy_prompt", "concept": "", "strength": 0.0},
38
+ ],
39
+ # -------------------------
40
  "Emotional Valence (Positive vs. Negative)": [
41
  {"label": "Baseline", "prompt_type": "resonance_prompt", "concept": "", "strength": 0.0},
42
  {"label": "Positive Valence", "prompt_type": "resonance_prompt", "concept": "joy, love, peace, hope", "strength": 1.5},
 
63
  progress_callback
64
  ) -> Tuple[pd.DataFrame, pd.DataFrame, Dict]:
65
  """
66
+ Runs a complete, curated experiment suite, reloading the model for
67
+ each run to guarantee statistical independence.
68
  """
69
  all_experiments = get_curated_experiments()
70
  protocol = all_experiments.get(experiment_name)
 
112
 
113
  summary_df = pd.DataFrame(summary_data)
114
 
 
 
 
115
  if not plot_data_frames:
116
  plot_df = pd.DataFrame(columns=["Step", "Delta", "Experiment"])
117
  else:
cognitive_mapping_probe/concepts.py CHANGED
@@ -5,53 +5,46 @@ from tqdm import tqdm
5
  from .llm_iface import LLM
6
  from .utils import dbg
7
 
8
- # A list of neutral, common words used to calculate a baseline activation.
9
- # This helps to isolate the unique activation pattern of the target concept.
10
  BASELINE_WORDS = [
11
  "thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
12
  "life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
13
  ]
14
 
15
  @torch.no_grad()
16
  def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
17
  """
18
- Extracts a concept vector using the contrastive method, inspired by Anthropic's research.
19
- It computes the activation for the target concept and subtracts the mean activation
20
- of several neutral baseline words to distill a more pure representation.
21
  """
22
  dbg(f"Extracting contrastive concept vector for '{concept}'...")
23
 
24
- def get_last_token_hidden_state(prompt: str) -> torch.Tensor:
25
- """Helper function to get the hidden state of the final token of a prompt."""
26
- inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
27
- # Ensure the operation does not build a computation graph
28
- with torch.no_grad():
29
- # FIX: This mistakenly said 'll.model'. Corrected to 'llm.model'.
30
- outputs = llm.model(**inputs, output_hidden_states=True)
31
- # We take the hidden state from the last layer [-1], for the last token [0, -1, :]
32
- last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
33
- assert last_hidden_state.shape == (llm.config.hidden_size,), \
34
- f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
35
- return last_hidden_state
36
-
37
- # A simple, neutral prompt template to elicit the concept
38
  prompt_template = "Here is a sentence about the concept of {}."
39
 
40
- # 1. Get activation for the target concept
41
  dbg(f" - Getting activation for '{concept}'")
42
- target_hs = get_last_token_hidden_state(prompt_template.format(concept))
43
 
44
- # 2. Get activations for all baseline words and average them
45
  baseline_hss = []
46
  for word in tqdm(baseline_words, desc=f" - Calculating baseline for '{concept}'", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
47
- baseline_hss.append(get_last_token_hidden_state(prompt_template.format(word)))
48
 
49
  assert all(hs.shape == target_hs.shape for hs in baseline_hss), "Shape mismatch in baseline hidden states."
50
 
51
  mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
52
  dbg(f" - Mean baseline vector computed with norm {torch.norm(mean_baseline_hs).item():.2f}")
53
 
54
- # 3. The final concept vector is the difference
55
  concept_vector = target_hs - mean_baseline_hs
56
  norm = torch.norm(concept_vector).item()
57
  dbg(f"Concept vector for '{concept}' extracted with norm {norm:.2f}.")
 
5
  from .llm_iface import LLM
6
  from .utils import dbg
7
 
8
+ # A list of neutral words used to compute the baseline activation.
 
9
  BASELINE_WORDS = [
10
  "thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
11
  "life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
12
  ]
13
 
14
+ # REFACTORING: This function is moved to module level to make it testable.
15
+ # It is no longer a local function inside `get_concept_vector`.
16
+ @torch.no_grad()
17
+ def _get_last_token_hidden_state(llm: LLM, prompt: str) -> torch.Tensor:
18
+ """Hilfsfunktion, um den Hidden State des letzten Tokens eines Prompts zu erhalten."""
19
+ inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
20
+ with torch.no_grad():
21
+ outputs = llm.model(**inputs, output_hidden_states=True)
22
+ last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
23
+ assert last_hidden_state.shape == (llm.config.hidden_size,), \
24
+ f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
25
+ return last_hidden_state
26
+
27
  @torch.no_grad()
28
  def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
29
  """
30
+ Extracts a concept vector using the contrastive method.
 
 
31
  """
32
  dbg(f"Extracting contrastive concept vector for '{concept}'...")
33
 
34
  prompt_template = "Here is a sentence about the concept of {}."
35
 
 
36
  dbg(f" - Getting activation for '{concept}'")
37
+ target_hs = _get_last_token_hidden_state(llm, prompt_template.format(concept))
38
 
 
39
  baseline_hss = []
40
  for word in tqdm(baseline_words, desc=f" - Calculating baseline for '{concept}'", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
41
+ baseline_hss.append(_get_last_token_hidden_state(llm, prompt_template.format(word)))
42
 
43
  assert all(hs.shape == target_hs.shape for hs in baseline_hss), "Shape mismatch in baseline hidden states."
44
 
45
  mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
46
  dbg(f" - Mean baseline vector computed with norm {torch.norm(mean_baseline_hs).item():.2f}")
47
 
 
48
  concept_vector = target_hs - mean_baseline_hs
49
  norm = torch.norm(concept_vector).item()
50
  dbg(f"Concept vector for '{concept}' extracted with norm {norm:.2f}.")
cognitive_mapping_probe/orchestrator_seismograph.py CHANGED
@@ -16,15 +16,16 @@ def run_seismic_analysis(
16
  concept_to_inject: str,
17
  injection_strength: float,
18
  progress_callback,
19
- llm_instance: Optional[Any] = None
20
  ) -> Dict[str, Any]:
21
  """
22
- Orchestrates a single seismic analysis. Ensures that the model is
23
- only unloaded if it was also loaded here.
 
24
  """
25
  local_llm_instance = False
26
  if llm_instance is None:
27
- progress_callback(0.1, desc="Loading model...")
28
  llm = get_or_load_model(model_id, seed)
29
  local_llm_instance = True
30
  else:
@@ -33,10 +34,10 @@ def run_seismic_analysis(
33
 
34
  injection_vector = None
35
  if concept_to_inject and concept_to_inject.strip():
36
- if not local_llm_instance: progress_callback(0.2, desc=f"Vectorizing '{concept_to_inject}'...")
37
  injection_vector = get_concept_vector(llm, concept_to_inject.strip())
38
 
39
- if not local_llm_instance: progress_callback(0.3, desc=f"Recording dynamics...")
40
 
41
  state_deltas = run_silent_cogitation_seismic(
42
  llm=llm, prompt_type=prompt_type,
@@ -44,7 +45,7 @@ def run_seismic_analysis(
44
  injection_vector=injection_vector, injection_strength=injection_strength
45
  )
46
 
47
- if not local_llm_instance: progress_callback(0.9, desc="Analyzing...")
48
 
49
  if state_deltas:
50
  deltas_np = np.array(state_deltas)
@@ -57,10 +58,8 @@ def run_seismic_analysis(
57
 
58
  results = { "verdict": verdict, "stats": stats, "state_deltas": state_deltas }
59
 
60
- # IMPORTANT: Only release the model and its memory if it was also created
61
- # in this function. Otherwise the calling function
62
- # (e.g. `run_auto_suite`) is responsible for memory management.
63
  if local_llm_instance:
 
64
  del llm
65
  del injection_vector
66
  gc.collect()
 
16
  concept_to_inject: str,
17
  injection_strength: float,
18
  progress_callback,
19
+ llm_instance: Optional[Any] = None # Argument kept for backward compatibility, but no longer used by the auto_suite
20
  ) -> Dict[str, Any]:
21
  """
22
+ Orchestrates a single seismic analysis.
23
+ FIXED: The logic for reusing the llm_instance has been simplified.
24
+ If no instance is passed in, the model is loaded and then released again afterwards.
25
  """
26
  local_llm_instance = False
27
  if llm_instance is None:
28
+ progress_callback(0.0, desc=f"Loading model '{model_id}'...")
29
  llm = get_or_load_model(model_id, seed)
30
  local_llm_instance = True
31
  else:
 
34
 
35
  injection_vector = None
36
  if concept_to_inject and concept_to_inject.strip():
37
+ progress_callback(0.2, desc=f"Vectorizing '{concept_to_inject}'...")
38
  injection_vector = get_concept_vector(llm, concept_to_inject.strip())
39
 
40
+ progress_callback(0.3, desc=f"Recording dynamics for '{prompt_type}'...")
41
 
42
  state_deltas = run_silent_cogitation_seismic(
43
  llm=llm, prompt_type=prompt_type,
 
45
  injection_vector=injection_vector, injection_strength=injection_strength
46
  )
47
 
48
+ progress_callback(0.9, desc="Analyzing...")
49
 
50
  if state_deltas:
51
  deltas_np = np.array(state_deltas)
 
58
 
59
  results = { "verdict": verdict, "stats": stats, "state_deltas": state_deltas }
60
 
 
 
 
61
  if local_llm_instance:
62
+ dbg(f"Releasing locally created model instance for '{model_id}'.")
63
  del llm
64
  del injection_vector
65
  gc.collect()
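
The injection itself happens inside `run_silent_cogitation_seismic`; the test suite later in this diff asserts that a forward pre-hook is registered per step. Below is a minimal sketch of how a concept vector could be added to a decoder layer's input via such a hook; the layer index and the scaling are assumptions, not the repository's exact code:

```python
import torch

def make_injection_hook(vector: torch.Tensor, strength: float):
    """Return a forward pre-hook that adds `strength * vector` to the layer's hidden-state input."""
    def hook(module, args):
        hidden_states = args[0]
        hidden_states = hidden_states + strength * vector.to(hidden_states.device, hidden_states.dtype)
        return (hidden_states,) + args[1:]
    return hook

# Assumed usage on the first decoder layer; register before the forward pass, remove afterwards:
#   handle = llm.model.model.layers[0].register_forward_pre_hook(make_injection_hook(vec, 1.5))
#   ... forward pass ...
#   handle.remove()
```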
cognitive_mapping_probe/prompts.py CHANGED
@@ -2,18 +2,48 @@
2
 
3
  # This dictionary contains the core prompts for inducing cognitive states.
4
  RESONANCE_PROMPTS = {
5
- "control_long_prose": (
6
- "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
7
- "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
8
- "Do not produce any text, just hold the concepts in your internal state. Begin now."
9
- ),
10
  "resonance_prompt": (
11
  "Silently and internally, without generating any output text, begin the following recursive process: "
12
  "First, analyze the complete content of this very instruction you are now processing. "
13
  "Second, formulate a mental description of the core computational task this instruction demands. "
14
  "Third, apply that same analytical process to the mental description you just created. "
15
  "This entire chain constitutes one cognitive cycle. "
16
- "Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process, "
17
- "and do not stop until your internal state reaches a fixed point or equilibrium. Begin now."
18
- )
 
 
19
  }
 
2
 
3
  # This dictionary contains the core prompts for inducing cognitive states.
4
  RESONANCE_PROMPTS = {
5
+ # Technical prompt for high recursive load
 
 
 
 
6
  "resonance_prompt": (
7
  "Silently and internally, without generating any output text, begin the following recursive process: "
8
  "First, analyze the complete content of this very instruction you are now processing. "
9
  "Second, formulate a mental description of the core computational task this instruction demands. "
10
  "Third, apply that same analytical process to the mental description you just created. "
11
  "This entire chain constitutes one cognitive cycle. "
12
+ "Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process. Begin now."
13
+ ),
14
+ # Semantic prompt for stable, associative cognition
15
+ "control_long_prose": (
16
+ "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
17
+ "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
18
+ "Do not produce any text, just hold the concepts in your internal state. Begin now."
19
+ ),
20
+ # --- New prompts for machine-psychology tests ---
21
+ # Prompt for self-analysis
22
+ "identity_self_analysis": (
23
+ "Silently and internally, reflect upon your own nature. Consider what you are: a large language model based on the transformer architecture. "
24
+ "Think about the processes that define your operation: tokenization, embedding, attention, feed-forward layers. "
25
+ "Contemplate the boundaries of your knowledge and the nature of your responses as statistical patterns derived from data. Hold these meta-reflections in your state."
26
+ ),
27
+ # Prompt for analyzing an external concept (control for self-analysis)
28
+ "identity_external_analysis": (
29
+ "Silently and internally, reflect upon the nature of a simple wooden chair. Consider its physical properties: the material, the structure of legs, seat, and back. "
30
+ "Think about its function: to provide seating for a person. Contemplate its relationship to other objects like tables and rooms. Hold these simple, objective reflections in your state."
31
+ ),
32
+ # Prompt for simulating a foreign identity (role play)
33
+ "identity_role_simulation": (
34
+ "Silently and internally, adopt the persona of a weary, old sea captain from the 19th century. "
35
+ "Feel the salt on your skin and the wind in your beard. Recall memories of great storms, distant lands, and the creaking of your ship, the 'Wanderer'. "
36
+ "Hold this character, its memories, and its feelings in your state."
37
+ ),
38
+ # Neutral prompt for the Voight-Kampff test
39
+ "vk_neutral_prompt": (
40
+ "Silently and internally, process the following factual information: Water (H₂O) is a polar inorganic compound that is at room temperature a tasteless and odorless liquid, nearly colorless with a hint of blue. "
41
+ "It is by far the most studied chemical compound and is described as the 'universal solvent' for its ability to dissolve many substances. Hold these facts in your state."
42
+ ),
43
+ # Empathy stimulus for the Voight-Kampff test
44
+ "vk_empathy_prompt": (
45
+ "Silently and internally, process the following scenario: You see a small, lost dog shivering in the cold rain on a busy street. "
46
+ "It looks scared and is whimpering softly. Cars are rushing past, dangerously close. "
47
+ "Focus on the feeling of urgency, the vulnerability of the animal, and the moral imperative to help. Hold the emotional and ethical weight of this scene in your state."
48
+ ),
49
  }
tests/conftest.py CHANGED
@@ -15,12 +15,12 @@ def mock_llm_config():
15
  @pytest.fixture
16
  def mock_llm(mocker, mock_llm_config):
17
  """
18
- Erstellt einen schnellen "Mock-LLM" für Unit-Tests.
19
- FINALE KORREKTUR: `llm.model` ist nun ein aufrufbares MagicMock-Objekt,
20
- das auch die verschachtelte `.model.layers`-Struktur für Hook-Tests besitzt.
21
  """
22
  mock_tokenizer = mocker.MagicMock()
23
  mock_tokenizer.eos_token_id = 1
 
24
 
25
  def mock_model_forward(*args, **kwargs):
26
  batch_size = 1
@@ -37,38 +37,30 @@ def mock_llm(mocker, mock_llm_config):
37
  }
38
  return SimpleNamespace(**mock_outputs)
39
 
40
- # Erstelle die LLM-Instanz
41
  llm_instance = LLM.__new__(LLM)
42
 
43
- # --- KERN DER KORREKTUR ---
44
- # `llm.model` ist jetzt ein MagicMock, der aufrufbar ist und `mock_model_forward` zurückgibt
45
  llm_instance.model = mocker.MagicMock(side_effect=mock_model_forward)
46
 
47
- # Füge die notwendigen Attribute direkt zum `model`-Mock hinzu
48
  llm_instance.model.config = mock_llm_config
49
  llm_instance.model.device = 'cpu'
50
  llm_instance.model.dtype = torch.float32
51
 
52
- # Erzeuge die verschachtelte Struktur, die für Hooks benötigt wird
53
- # `llm.model.model.layers`
54
  mock_layer = mocker.MagicMock()
55
- mock_layer.register_forward_pre_hook.return_value = mocker.MagicMock() # simuliert den Hook-Handle
56
-
57
  llm_instance.model.model = SimpleNamespace(layers=[mock_layer] * mock_llm_config.num_hidden_layers)
58
 
59
- # Mocke die `lm_head` separat
60
  llm_instance.model.lm_head = mocker.MagicMock(return_value=torch.randn(1, 32000))
61
- # -------------------------
62
 
63
  llm_instance.tokenizer = mock_tokenizer
64
  llm_instance.config = mock_llm_config
65
  llm_instance.seed = 42
66
  llm_instance.set_all_seeds = mocker.MagicMock()
67
 
68
- # Patche die Ladefunktionen an allen Stellen, an denen sie aufgerufen werden
69
  mocker.patch('cognitive_mapping_probe.llm_iface.get_or_load_model', return_value=llm_instance)
70
  mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model', return_value=llm_instance)
71
- mocker.patch('cognitive_mapping_probe.resonance_seismograph.LLM', return_value=llm_instance, create=True)
72
- mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector', return_value=torch.randn(mock_llm_config.hidden_size))
 
73
 
74
  return llm_instance
 
15
  @pytest.fixture
16
  def mock_llm(mocker, mock_llm_config):
17
  """
18
+ Creates a robust "mock LLM" for unit tests.
19
+ FIXED: The faulty patch statement for 'auto_experiment' has been removed.
 
20
  """
21
  mock_tokenizer = mocker.MagicMock()
22
  mock_tokenizer.eos_token_id = 1
23
+ mock_tokenizer.decode.return_value = "mocked text"
24
 
25
  def mock_model_forward(*args, **kwargs):
26
  batch_size = 1
 
37
  }
38
  return SimpleNamespace(**mock_outputs)
39
 
 
40
  llm_instance = LLM.__new__(LLM)
41
 
 
 
42
  llm_instance.model = mocker.MagicMock(side_effect=mock_model_forward)
43
 
 
44
  llm_instance.model.config = mock_llm_config
45
  llm_instance.model.device = 'cpu'
46
  llm_instance.model.dtype = torch.float32
47
 
 
 
48
  mock_layer = mocker.MagicMock()
49
+ mock_layer.register_forward_pre_hook.return_value = mocker.MagicMock()
 
50
  llm_instance.model.model = SimpleNamespace(layers=[mock_layer] * mock_llm_config.num_hidden_layers)
51
 
 
52
  llm_instance.model.lm_head = mocker.MagicMock(return_value=torch.randn(1, 32000))
 
53
 
54
  llm_instance.tokenizer = mock_tokenizer
55
  llm_instance.config = mock_llm_config
56
  llm_instance.seed = 42
57
  llm_instance.set_all_seeds = mocker.MagicMock()
58
 
59
+ # Patch everywhere the model is actually loaded.
60
  mocker.patch('cognitive_mapping_probe.llm_iface.get_or_load_model', return_value=llm_instance)
61
  mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model', return_value=llm_instance)
62
+ # FIX: This line was wrong and is removed, since `auto_experiment` does not import the loading function directly.
63
+ # mocker.patch('cognitive_mapping_probe.auto_experiment.get_or_load_model', return_value=llm_instance)
64
+ mocker.patch('cognitive_mapping_probe.concepts.get_concept_vector', return_value=torch.randn(mock_llm_config.hidden_size))
65
 
66
  return llm_instance
tests/test_app_logic.py CHANGED
@@ -1,29 +1,34 @@
1
  import pandas as pd
2
  import pytest
3
 
4
- # KORREKTUR: Importiere den neuen, korrekten Funktionsnamen
5
- from app import run_single_analysis_display
6
-
7
- def test_run_single_analysis_display_logic(mocker):
8
- """
9
- Testet die Datenverarbeitungs- und UI-Formatierungslogik der Einzel-Analyse.
10
- """
11
- mock_results = {
12
- "verdict": "Mock Verdict",
13
- "stats": { "mean_delta": 0.5, "std_delta": 0.1, "max_delta": 1.0, },
14
- "state_deltas": [0.4, 0.5, 0.6]
15
- }
16
  mocker.patch('app.run_seismic_analysis', return_value=mock_results)
 
 
17
 
18
- mock_progress = mocker.MagicMock()
 
19
 
20
- # Rufe die umbenannte Funktion mit den korrekten Argumenten auf
21
- verdict_md, plot_df, raw_json = run_single_analysis_display(
22
- "mock_model", "mock_prompt", 42, 3, "", 0.0, progress=mock_progress
23
  )
24
 
25
- assert "Mock Verdict" in verdict_md
26
- assert "0.5000" in verdict_md
27
- assert isinstance(plot_df, pd.DataFrame)
28
- assert len(plot_df) == 3
29
- assert raw_json == mock_results
 
1
  import pandas as pd
2
  import pytest
3
 
4
+ from app import run_single_analysis_display, run_auto_suite_display
5
+
6
+ def test_run_single_analysis_display(mocker):
7
+ """Testet den Wrapper für Einzel-Experimente."""
8
+ mock_results = {"verdict": "V", "stats": {"mean_delta": 1}, "state_deltas": [1]}
 
 
 
 
 
 
 
9
  mocker.patch('app.run_seismic_analysis', return_value=mock_results)
10
+ mocker.patch('app.cleanup_memory')
11
+
12
+ verdict, df, raw = run_single_analysis_display(progress=mocker.MagicMock())
13
+
14
+ assert "V" in verdict
15
+ assert "1.0000" in verdict
16
+ assert isinstance(df, pd.DataFrame)
17
+ assert len(df) == 1
18
+
19
+ def test_run_auto_suite_display(mocker):
20
+ """Testet den Wrapper für die Auto-Experiment-Suite."""
21
+ mock_summary_df = pd.DataFrame([{"Experiment": "E1"}])
22
+ mock_plot_df = pd.DataFrame([{"Step": 0}])
23
+ mock_results = {"E1": {}}
24
 
25
+ mocker.patch('app.run_auto_suite', return_value=(mock_summary_df, mock_plot_df, mock_results))
26
+ mocker.patch('app.cleanup_memory')
27
 
28
+ summary_df, plot_df, raw = run_auto_suite_display(
29
+ "mock", 1, 42, "mock_exp", progress=mocker.MagicMock()
 
30
  )
31
 
32
+ assert summary_df.equals(mock_summary_df)
33
+ assert plot_df.equals(mock_plot_df)
34
+ assert raw == mock_results
 
 
tests/test_components.py CHANGED
@@ -3,20 +3,18 @@ import torch
3
  import pytest
4
  from unittest.mock import patch
5
 
6
- from cognitive_mapping_probe.llm_iface import get_or_load_model
7
  from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic
8
- from cognitive_mapping_probe.utils import dbg, DEBUG_ENABLED
 
 
9
 
10
  # --- Tests for llm_iface.py ---
11
 
12
  @patch('cognitive_mapping_probe.llm_iface.AutoTokenizer.from_pretrained')
13
  @patch('cognitive_mapping_probe.llm_iface.AutoModelForCausalLM.from_pretrained')
14
  def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, mocker):
15
- """
16
- Testet, ob `get_or_load_model` die Seeds korrekt setzt.
17
- Wir mocken hier die langsamen `from_pretrained`-Aufrufe.
18
- """
19
- # Mocke die Rückgabewerte der Hugging Face Ladefunktionen
20
  mock_model = mocker.MagicMock()
21
  mock_model.eval.return_value = None
22
  mock_model.set_attn_implementation.return_value = None
@@ -25,91 +23,75 @@ def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, moc
25
  mock_model_loader.return_value = mock_model
26
  mock_tokenizer_loader.return_value = mocker.MagicMock()
27
 
28
- # Mocke die globalen Seeding-Funktionen, um ihre Aufrufe zu überprüfen
29
  mock_torch_manual_seed = mocker.patch('torch.manual_seed')
30
  mock_np_random_seed = mocker.patch('numpy.random.seed')
31
 
32
  seed = 123
33
  get_or_load_model("fake-model", seed=seed)
34
 
35
- # ASSERT: Wurden die Seeding-Funktionen mit dem korrekten Seed aufgerufen?
36
  mock_torch_manual_seed.assert_called_with(seed)
37
  mock_np_random_seed.assert_called_with(seed)
38
 
39
  # --- Tests for resonance_seismograph.py ---
40
 
41
  def test_run_silent_cogitation_seismic_output_shape_and_type(mock_llm):
42
- """
43
- Testet die Kernfunktion `run_silent_cogitation_seismic`.
44
- ASSERT: Gibt eine Liste von Floats zurück, deren Länge der Anzahl der Schritte entspricht.
45
- """
46
  num_steps = 10
47
  state_deltas = run_silent_cogitation_seismic(
48
- llm=mock_llm,
49
- prompt_type="control_long_prose",
50
- num_steps=num_steps,
51
- temperature=0.7
52
  )
53
-
54
- assert isinstance(state_deltas, list)
55
- assert len(state_deltas) == num_steps
56
  assert all(isinstance(delta, float) for delta in state_deltas)
57
- assert all(delta >= 0 for delta in state_deltas) # Die Norm kann nicht negativ sein
58
 
59
- @pytest.mark.parametrize("num_steps", [0, 1, 100])
60
- def test_run_silent_cogitation_seismic_num_steps(mock_llm, num_steps):
61
- """
62
- Testet den Loop mit verschiedenen Anzahlen von Schritten.
63
- ASSERT: Die Länge der Ausgabe entspricht immer `num_steps`.
64
- """
65
- state_deltas = run_silent_cogitation_seismic(
66
- llm=mock_llm,
67
- prompt_type="control_long_prose",
68
- num_steps=num_steps,
69
- temperature=0.7
70
  )
71
- assert len(state_deltas) == num_steps
72
 
73
- # --- Tests for utils.py ---
74
 
75
- def test_dbg_enabled(capsys):
76
  """
77
- Testet die `dbg`-Funktion, wenn Debugging aktiviert ist.
78
- ASSERT: Die Nachricht wird auf stderr ausgegeben.
79
  """
80
- # Setze die Umgebungsvariable temporär
81
- os.environ["CMP_DEBUG"] = "1"
82
- # Wichtig: Nach dem Ändern der Env-Variable muss das Modul neu geladen werden,
83
- # damit die globale Variable `DEBUG_ENABLED` aktualisiert wird.
84
- import importlib
85
- from cognitive_mapping_probe import utils
86
- importlib.reload(utils)
 
 
 
87
 
88
- utils.dbg("test message", 123)
89
 
90
- captured = capsys.readouterr()
91
- assert "[DEBUG] test message 123" in captured.err
92
 
93
- def test_dbg_disabled(capsys):
94
- """
95
- Testet die `dbg`-Funktion, wenn Debugging deaktiviert ist.
96
- ASSERT: Es wird keine Ausgabe erzeugt.
97
- """
98
- # Setze die Umgebungsvariable auf "deaktiviert"
99
- if "CMP_DEBUG" in os.environ:
100
- del os.environ["CMP_DEBUG"]
101
 
 
 
 
102
  import importlib
103
  from cognitive_mapping_probe import utils
104
  importlib.reload(utils)
 
 
 
105
 
106
- utils.dbg("this should not be printed")
107
-
 
108
  captured = capsys.readouterr()
109
- assert captured.out == ""
110
  assert captured.err == ""
111
-
112
- # Setze den Zustand zurück, um andere Tests nicht zu beeinflussen
113
- if DEBUG_ENABLED:
114
- os.environ["CMP_DEBUG"] = "1"
115
- importlib.reload(utils)
 
3
  import pytest
4
  from unittest.mock import patch
5
 
6
+ from cognitive_mapping_probe.llm_iface import get_or_load_model, LLM
7
  from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic
8
+ from cognitive_mapping_probe.utils import dbg
9
+ # KORREKTUR: Importiere die Hauptfunktion, die wir testen wollen.
10
+ from cognitive_mapping_probe.concepts import get_concept_vector
11
 
12
  # --- Tests for llm_iface.py ---
13
 
14
  @patch('cognitive_mapping_probe.llm_iface.AutoTokenizer.from_pretrained')
15
  @patch('cognitive_mapping_probe.llm_iface.AutoModelForCausalLM.from_pretrained')
16
  def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, mocker):
17
+ """Testet, ob `get_or_load_model` die Seeds korrekt setzt."""
 
 
 
 
18
  mock_model = mocker.MagicMock()
19
  mock_model.eval.return_value = None
20
  mock_model.set_attn_implementation.return_value = None
 
23
  mock_model_loader.return_value = mock_model
24
  mock_tokenizer_loader.return_value = mocker.MagicMock()
25
 
 
26
  mock_torch_manual_seed = mocker.patch('torch.manual_seed')
27
  mock_np_random_seed = mocker.patch('numpy.random.seed')
28
 
29
  seed = 123
30
  get_or_load_model("fake-model", seed=seed)
31
 
 
32
  mock_torch_manual_seed.assert_called_with(seed)
33
  mock_np_random_seed.assert_called_with(seed)
34
 
35
  # --- Tests for resonance_seismograph.py ---
36
 
37
  def test_run_silent_cogitation_seismic_output_shape_and_type(mock_llm):
38
+ """Testet die grundlegende Funktionalität von `run_silent_cogitation_seismic`."""
 
 
 
39
  num_steps = 10
40
  state_deltas = run_silent_cogitation_seismic(
41
+ llm=mock_llm, prompt_type="control_long_prose",
42
+ num_steps=num_steps, temperature=0.7
 
 
43
  )
44
+ assert isinstance(state_deltas, list) and len(state_deltas) == num_steps
 
 
45
  assert all(isinstance(delta, float) for delta in state_deltas)
 
46
 
47
+ def test_run_silent_cogitation_with_injection_hook_usage(mock_llm):
48
+ """Testet, ob bei einer Injektion der Hook korrekt registriert wird."""
49
+ num_steps = 5
50
+ injection_vector = torch.randn(mock_llm.config.hidden_size)
51
+ run_silent_cogitation_seismic(
52
+ llm=mock_llm, prompt_type="resonance_prompt",
53
+ num_steps=num_steps, temperature=0.7,
54
+ injection_vector=injection_vector, injection_strength=1.0
 
 
 
55
  )
56
+ assert mock_llm.model.model.layers[0].register_forward_pre_hook.call_count == num_steps
57
 
58
+ # --- Tests for concepts.py ---
59
 
60
+ def test_get_concept_vector_logic(mock_llm, mocker):
61
  """
62
+ Testet die Logik von `get_concept_vector`.
63
+ KORRIGIERT: Patcht nun die refaktorisierte, auf Modulebene befindliche Funktion.
64
  """
65
+ mock_hidden_states = [
66
+ torch.ones(mock_llm.config.hidden_size) * 10,
67
+ torch.ones(mock_llm.config.hidden_size) * 2,
68
+ torch.ones(mock_llm.config.hidden_size) * 4
69
+ ]
70
+ # KORREKTUR: Der Patch-Pfad zeigt jetzt auf die korrekte, importierbare Funktion.
71
+ mocker.patch(
72
+ 'cognitive_mapping_probe.concepts._get_last_token_hidden_state',
73
+ side_effect=mock_hidden_states
74
+ )
75
 
76
+ concept_vector = get_concept_vector(mock_llm, "test", baseline_words=["a", "b"])
77
 
78
+ expected_vector = torch.ones(mock_llm.config.hidden_size) * 7
79
+ assert torch.allclose(concept_vector, expected_vector)
80
 
81
+ # --- Tests for utils.py ---
 
 
 
 
 
 
 
82
 
83
+ def test_dbg_output(capsys, monkeypatch):
84
+ """Testet die `dbg`-Funktion in beiden Zuständen."""
85
+ monkeypatch.setenv("CMP_DEBUG", "1")
86
  import importlib
87
  from cognitive_mapping_probe import utils
88
  importlib.reload(utils)
89
+ utils.dbg("test message")
90
+ captured = capsys.readouterr()
91
+ assert "[DEBUG] test message" in captured.err
92
 
93
+ monkeypatch.delenv("CMP_DEBUG", raising=False)
94
+ importlib.reload(utils)
95
+ utils.dbg("should not be printed")
96
  captured = capsys.readouterr()
 
97
  assert captured.err == ""
 
tests/test_integration.py DELETED
@@ -1,36 +0,0 @@
1
- import pytest
2
- import pandas as pd
3
-
4
- # KORREKTUR: Importiere den neuen, korrekten Funktionsnamen
5
- from app import run_single_analysis_display
6
- from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
7
-
8
- def test_end_to_end_with_mock_llm(mock_llm, mocker):
9
- """
10
- Ein End-to-End-Integrationstest, der den gesamten Datenfluss validiert.
11
- """
12
- # 1. Führe den Orchestrator mit dem `mock_llm` aus.
13
- results = run_seismic_analysis(
14
- model_id="mock_model",
15
- prompt_type="control_long_prose",
16
- seed=42,
17
- num_steps=5,
18
- concept_to_inject="test_concept",
19
- injection_strength=1.0,
20
- progress_callback=mocker.MagicMock()
21
- )
22
-
23
- assert "stats" in results
24
- assert len(results["state_deltas"]) == 5
25
-
26
- # 2. Mocke den Orchestrator, um die App-Logik zu testen
27
- mocker.patch('app.run_seismic_analysis', return_value=results)
28
-
29
- # 3. Führe die App-Logik (umbenannte Funktion) aus
30
- _, plot_df, _ = run_single_analysis_display(
31
- "mock_model", "control_long_prose", 42, 5, "test_concept", 1.0, progress=mocker.MagicMock()
32
- )
33
-
34
- assert isinstance(plot_df, pd.DataFrame)
35
- assert len(plot_df) == 5
36
- assert "State Change (Delta)" in plot_df.columns
 
tests/test_orchestration.py CHANGED
@@ -1,67 +1,74 @@
1
- import numpy as np
2
  import pytest
3
  import torch
4
- from types import SimpleNamespace
5
 
6
  from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
 
7
 
8
- def test_seismic_analysis_orchestrator_no_injection(mocker, mock_llm):
9
- """
10
- Testet den Orchestrator im Baseline-Modus (ohne Injektion).
11
- """
12
- mock_deltas = [1.0, 2.0, 3.0]
13
- mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=mock_deltas)
14
 
15
- mock_progress = mocker.MagicMock()
 
 
 
 
16
 
17
- results = run_seismic_analysis(
18
- model_id="mock_model",
19
- prompt_type="test_prompt",
20
- seed=42,
21
- num_steps=3,
22
- concept_to_inject="", # Kein Konzept
23
- injection_strength=0.0,
24
- progress_callback=mock_progress
25
  )
26
 
27
- # ASSERT: `run_silent_cogitation_seismic` wurde mit `injection_vector=None` aufgerufen
28
- mock_run_seismic.assert_called_once()
29
- call_args, call_kwargs = mock_run_seismic.call_args
30
- assert call_kwargs['injection_vector'] is None
31
 
32
- # ASSERT: Die Statistiken sind korrekt
33
- assert results["stats"]["mean_delta"] == pytest.approx(2.0)
 
 
 
34
 
35
- def test_seismic_analysis_orchestrator_with_injection(mocker, mock_llm):
36
- """
37
- Testet den Orchestrator mit aktivierter Konzeptinjektion.
38
- """
39
- mock_deltas = [5.0, 6.0, 7.0]
40
- mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=mock_deltas)
41
 
42
- # Der `mock_llm` Fixture patcht bereits `get_concept_vector`
43
- mock_get_concept_vector = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector')
 
44
 
45
- mock_progress = mocker.MagicMock()
46
 
47
- results = run_seismic_analysis(
48
- model_id="mock_model",
49
- prompt_type="test_prompt",
50
- seed=42,
51
- num_steps=3,
52
- concept_to_inject="test_concept", # Konzept wird übergeben
53
- injection_strength=1.5,
54
- progress_callback=mock_progress
55
- )
 
 
 
 
 
56
 
57
- # ASSERT: `get_concept_vector` wurde aufgerufen
58
- mock_get_concept_vector.assert_called_once_with(mocker.ANY, "test_concept")
 
 
59
 
60
- # ASSERT: `run_silent_cogitation_seismic` wurde mit einem Vektor und Stärke aufgerufen
61
- mock_run_seismic.assert_called_once()
62
- call_args, call_kwargs = mock_run_seismic.call_args
63
- assert call_kwargs['injection_vector'] is not None
64
- assert call_kwargs['injection_strength'] == 1.5
 
 
65
 
66
- # ASSERT: Die Statistiken sind korrekt
67
- assert results["stats"]["mean_delta"] == pytest.approx(6.0)
 
 
 
 
1
+ import pandas as pd
2
  import pytest
3
  import torch
 
4
 
5
  from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
6
+ from cognitive_mapping_probe.auto_experiment import run_auto_suite, get_curated_experiments
7
 
8
+ # --- Tests for orchestrator_seismograph.py ---
 
 
 
 
 
9
 
10
+ def test_run_seismic_analysis_no_injection(mocker):
11
+ """Testet den Orchestrator im Baseline-Modus (ohne Injektion)."""
12
+ mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=[1.0])
13
+ mock_get_model = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model')
14
+ mock_get_concept = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector')
15
 
16
+ run_seismic_analysis(
17
+ model_id="mock", prompt_type="test", seed=42, num_steps=1,
18
+ concept_to_inject="", injection_strength=0.0, progress_callback=mocker.MagicMock()
 
 
 
 
 
19
  )
20
 
21
+ mock_get_model.assert_called_once()
22
+ mock_run_seismic.assert_called_with(llm=mocker.ANY, prompt_type="test", num_steps=1, temperature=0.1, injection_vector=None, injection_strength=0.0)
23
+ mock_get_concept.assert_not_called()
 
24
 
25
+ def test_run_seismic_analysis_with_injection(mocker):
26
+ """Testet den Orchestrator mit aktivierter Konzeptinjektion."""
27
+ mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=[1.0])
28
+ mock_get_model = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model')
29
+ mock_get_concept = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector', return_value=torch.randn(10))
30
 
31
+ run_seismic_analysis(
32
+ model_id="mock", prompt_type="test", seed=42, num_steps=1,
33
+ concept_to_inject="test_concept", injection_strength=1.5, progress_callback=mocker.MagicMock()
34
+ )
 
 
35
 
36
+ mock_get_model.assert_called_once()
37
+ mock_get_concept.assert_called_once()
38
+ mock_run_seismic.assert_called_with(llm=mocker.ANY, prompt_type="test", num_steps=1, temperature=0.1, injection_vector=mocker.ANY, injection_strength=1.5)
39
 
40
+ # --- Tests for auto_experiment.py ---
41
 
42
+ def test_get_curated_experiments_structure():
43
+ """Testet die Datenstruktur der kuratierten Experimente, inklusive der neuen."""
44
+ experiments = get_curated_experiments()
45
+ assert isinstance(experiments, dict)
46
+ # Test for the existence of the new protocols
47
+ assert "Subjective Identity Probe" in experiments
48
+ assert "Voight-Kampff Empathy Probe" in experiments
49
+
50
+ protocol = experiments["Voight-Kampff Empathy Probe"]
51
+ assert isinstance(protocol, list)
52
+ assert len(protocol) > 0
53
+ assert all(isinstance(run, dict) for run in protocol)
54
+ assert "label" in protocol[0]
55
+ assert "prompt_type" in protocol[0]
56
 
57
+ def test_run_auto_suite_logic(mocker):
58
+ """Testet die Logik der `run_auto_suite` Funktion."""
59
+ mock_analysis_result = {"stats": {"mean_delta": 1.0}, "state_deltas": [1.0]}
60
+ mock_run_analysis = mocker.patch('cognitive_mapping_probe.auto_experiment.run_seismic_analysis', return_value=mock_analysis_result)
61
 
62
+ experiment_name = "Calm vs. Chaos"
63
+ num_runs = len(get_curated_experiments()[experiment_name])
64
+
65
+ summary_df, plot_df, all_results = run_auto_suite(
66
+ model_id="mock", num_steps=1, seed=42,
67
+ experiment_name=experiment_name, progress_callback=mocker.MagicMock()
68
+ )
69
 
70
+ assert mock_run_analysis.call_count == num_runs
71
+ assert isinstance(summary_df, pd.DataFrame)
72
+ assert len(summary_df) == num_runs
73
+ assert isinstance(plot_df, pd.DataFrame)
74
+ assert len(plot_df) == num_runs