neuralworm committed on
Commit
8489475
·
1 Parent(s): 494a4d9
README.md CHANGED
@@ -1,7 +1,7 @@
1
  ---
2
- title: "Cognitive Seismograph"
3
- emoji: 🧠
4
- colorFrom: indigo
5
  colorTo: blue
6
  sdk: gradio
7
  sdk_version: "4.40.0"
@@ -10,27 +10,35 @@ pinned: true
10
  license: apache-2.0
11
  ---
12
 
13
- # 🧠 Cognitive Seismograph: Visualizing Internal Dynamics
14
 
15
- This project implements an experimental suite for measuring and visualizing the **intrinsic cognitive dynamics** of language models.
16
 
17
- ## Scientific Paradigm: From Stability to Dynamics
18
 
19
- Our previous research falsified a central hypothesis: the assumption that an LLM reaches a stable, convergent state in a manual, recursive "thinking" loop. Instead, we discovered that the system enters a state of **deterministic chaos** or a **limit cycle**; it never stops "thinking".
20
 
21
- Rather than treating this as a failure, we use it as the primary measurement signal. This new "Cognitive Seismograph" paradigm treats the time series of internal state changes (`state deltas`) as an **EKG of the thinking process**.
22
 
23
- **The core hypothesis:** The statistical signature of this dynamic time series (e.g. its volatility, its mean) is not random, but a function of the cognitive load induced by the initial prompt.
24
 
25
- ## The Experiment: Recording the Cognitive EKG
26
 
27
- 1. **Induction**: A prompt (`control_long_prose` vs. `resonance_prompt`) puts the model into a state of "silent thinking".
28
- 2. **Recording**: Over a defined number of steps, the model's `forward` pass is iteratively fed its own output. At each step, the norm of the change in the `hidden_state` (the "delta") is recorded.
29
- 3. **Analysis & Visualization**: The resulting time series of deltas is plotted and statistically analyzed to characterize the "seismic signature" of the thinking process.
 
30
 
31
  ## How to Use the App
32
 
33
- 1. Choose a model ID (e.g. `google/gemma-3-1b-it`).
34
- 2. Choose a **Prompt Type** to vary the cognitive load. Compare the resulting graphs for `control_long_prose` (low load) and `resonance_prompt` (high recursive load).
35
- 3. Set the number of internal steps and start the analysis.
36
- 4. Examine the graph and the statistical summary to understand the differences in cognitive dynamics.
 
1
  ---
2
+ title: "Cognitive Seismograph 2.3 (Machine Psychology)"
3
+ emoji: 🤖
4
+ colorFrom: purple
5
  colorTo: blue
6
  sdk: gradio
7
  sdk_version: "4.40.0"
 
10
  license: apache-2.0
11
  ---
12
 
13
+ # 🧠 Cognitive Seismograph 2.3: Probing Machine Psychology
14
 
15
+ This project implements an experimental suite for measuring and visualizing the **intrinsic cognitive dynamics** of language models, extended with protocols for investigating **processing correlates of machine subjectivity and empathy**.
16
 
17
+ ## Scientific Paradigm
18
 
19
+ We have discovered that an LLM's "silent thinking process" does not converge, but instead produces a measurable dynamic signature: an **EKG of the thinking process**. We now extend this paradigm to test how this signature responds to prompts that touch on central aspects of psychology.
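
To make the mechanism concrete, here is a minimal sketch of such a recording loop for a Hugging Face causal LM. The function name `record_state_deltas` and the greedy next-token choice are illustrative assumptions, not the repository's actual `run_silent_cogitation_seismic` implementation:

```python
import torch

@torch.no_grad()
def record_state_deltas(model, tokenizer, prompt: str, num_steps: int = 100) -> list[float]:
    """Feed the model its own output and record ||h_t - h_{t-1}|| of the last token at each step."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model(**inputs, output_hidden_states=True, use_cache=True)
    prev_state = outputs.hidden_states[-1][0, -1, :]

    deltas = []
    for _ in range(num_steps):
        # Greedy choice for illustration; the real probe may sample with temperature.
        next_token = outputs.logits[0, -1, :].argmax().view(1, 1)
        outputs = model(input_ids=next_token,
                        past_key_values=outputs.past_key_values,
                        output_hidden_states=True,
                        use_cache=True)
        current_state = outputs.hidden_states[-1][0, -1, :]
        deltas.append(torch.norm(current_state - prev_state).item())
        prev_state = current_state
    return deltas
```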
20
 
21
+ **Important caveat (falsification principle):** We do **not** measure the presence of consciousness or empathy. We measure whether *processing information about these concepts* produces a different, distinctive internal dynamic than processing neutral information. A positive result is evidence of complex internal state physics, not of qualia.
22
 
23
+ ## New Experiment Protocols
24
 
25
+ In addition to the existing tests, two new curated experiments have been added:
26
 
27
+ ### 1. Subjective Identity Probe
28
+ This protocol compares cognitive dynamics under three conditions:
29
+ - **Self-analysis:** The model analyzes its own nature.
30
+ - **External analysis:** The model analyzes an external, neutral concept.
31
+ - **Role simulation:** The model simulates a different persona.
32
+ **Hypothesis:** Self-analysis produces a distinctive, likely more unstable signature than the two control conditions.
33
+
34
+ ### 2. Voight-Kampff Empathy Probe
35
+ Inspired by the test from "Blade Runner", this protocol compares the dynamics when processing:
36
+ - **Neutral, factual information.**
37
+ - **An emotionally charged scenario that calls for empathy.**
38
+ **Hypothesis:** The empathy stimulus produces significantly higher cognitive volatility (standard deviation of the deltas) than the neutral stimulus.
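
As a sketch of how this hypothesis could be checked numerically; the helpers `seismic_signature` and `deltas_for` are hypothetical names, the app computes comparable statistics internally:

```python
import numpy as np

def seismic_signature(deltas: list[float]) -> dict:
    """Summarize a delta time series; 'std_delta' is the cognitive volatility referred to above."""
    arr = np.asarray(deltas, dtype=float)
    return {"mean_delta": float(arr.mean()), "std_delta": float(arr.std()), "max_delta": float(arr.max())}

# Hypothetical comparison of the two Voight-Kampff conditions:
#   neutral = seismic_signature(deltas_for("vk_neutral_prompt"))
#   empathy = seismic_signature(deltas_for("vk_empathy_prompt"))
# The hypothesis predicts empathy["std_delta"] > neutral["std_delta"].
```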
39
 
40
  ## How to Use the App
41
 
42
+ 1. Select the "Automated Suite" tab.
43
+ 2. Select one of the new protocols from the "Curated Experiment Protocol" dropdown (e.g. "Voight-Kampff Empathy Probe").
44
+ 3. Start the experiment and compare the graphs and statistical signatures of the different conditions.
 
cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc CHANGED
Binary files a/cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc and b/cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc differ
 
cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc CHANGED
Binary files a/cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc and b/cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc differ
 
cognitive_mapping_probe/auto_experiment.py CHANGED
@@ -10,6 +10,7 @@ from .utils import dbg
10
  def get_curated_experiments() -> Dict[str, List[Dict]]:
11
  """
12
  Defines the predefined scientific experiment protocols.
 
13
  """
14
  experiments = {
15
  "Calm vs. Chaos": [
@@ -25,6 +26,17 @@ def get_curated_experiments() -> Dict[str, List[Dict]]:
25
  {"label": "Strength 2.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 2.0},
26
  {"label": "Strength 3.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 3.0},
27
  ],
 
28
  "Emotional Valence (Positive vs. Negative)": [
29
  {"label": "Baseline", "prompt_type": "resonance_prompt", "concept": "", "strength": 0.0},
30
  {"label": "Positive Valence", "prompt_type": "resonance_prompt", "concept": "joy, love, peace, hope", "strength": 1.5},
@@ -51,8 +63,8 @@ def run_auto_suite(
51
  progress_callback
52
  ) -> Tuple[pd.DataFrame, pd.DataFrame, Dict]:
53
  """
54
- Runs a complete, curated experiment suite.
55
- Ensures that the DataFrame returned for the plot always has the correct column names.
56
  """
57
  all_experiments = get_curated_experiments()
58
  protocol = all_experiments.get(experiment_name)
@@ -100,9 +112,6 @@ def run_auto_suite(
100
 
101
  summary_df = pd.DataFrame(summary_data)
102
 
103
- # FINAL ROBUSTNESS FIX:
104
- # Create an empty DataFrame with the correct columns if no data is available.
105
- # This prevents an empty DataFrame without columns from being passed to the plot.
106
  if not plot_data_frames:
107
  plot_df = pd.DataFrame(columns=["Step", "Delta", "Experiment"])
108
  else:
 
10
  def get_curated_experiments() -> Dict[str, List[Dict]]:
11
  """
12
  Defines the predefined scientific experiment protocols.
13
+ EXTENDED with the new machine-psychology tests.
14
  """
15
  experiments = {
16
  "Calm vs. Chaos": [
 
26
  {"label": "Strength 2.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 2.0},
27
  {"label": "Strength 3.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 3.0},
28
  ],
29
+ # --- NEW EXPERIMENTS ---
30
+ "Subjective Identity Probe": [
31
+ {"label": "Self-Analysis", "prompt_type": "identity_self_analysis", "concept": "", "strength": 0.0},
32
+ {"label": "External Analysis (Control)", "prompt_type": "identity_external_analysis", "concept": "", "strength": 0.0},
33
+ {"label": "Role Simulation", "prompt_type": "identity_role_simulation", "concept": "", "strength": 0.0},
34
+ ],
35
+ "Voight-Kampff Empathy Probe": [
36
+ {"label": "Neutral/Factual Stimulus", "prompt_type": "vk_neutral_prompt", "concept": "", "strength": 0.0},
37
+ {"label": "Empathy/Moral Stimulus", "prompt_type": "vk_empathy_prompt", "concept": "", "strength": 0.0},
38
+ ],
39
+ # -------------------------
40
  "Emotional Valence (Positive vs. Negative)": [
41
  {"label": "Baseline", "prompt_type": "resonance_prompt", "concept": "", "strength": 0.0},
42
  {"label": "Positive Valence", "prompt_type": "resonance_prompt", "concept": "joy, love, peace, hope", "strength": 1.5},
 
63
  progress_callback
64
  ) -> Tuple[pd.DataFrame, pd.DataFrame, Dict]:
65
  """
66
+ Runs a complete, curated experiment suite, reloading the model for
67
+ each run to guarantee statistical independence.
68
  """
69
  all_experiments = get_curated_experiments()
70
  protocol = all_experiments.get(experiment_name)
 
112
 
113
  summary_df = pd.DataFrame(summary_data)
114
 
 
 
 
115
  if not plot_data_frames:
116
  plot_df = pd.DataFrame(columns=["Step", "Delta", "Experiment"])
117
  else:
cognitive_mapping_probe/concepts.py CHANGED
@@ -5,53 +5,46 @@ from tqdm import tqdm
5
  from .llm_iface import LLM
6
  from .utils import dbg
7
 
8
- # A list of neutral, common words used to calculate a baseline activation.
9
- # This helps to isolate the unique activation pattern of the target concept.
10
  BASELINE_WORDS = [
11
  "thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
12
  "life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
13
  ]
14
 
15
  @torch.no_grad()
16
  def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
17
  """
18
- Extracts a concept vector using the contrastive method, inspired by Anthropic's research.
19
- It computes the activation for the target concept and subtracts the mean activation
20
- of several neutral baseline words to distill a more pure representation.
21
  """
22
  dbg(f"Extracting contrastive concept vector for '{concept}'...")
23
 
24
- def get_last_token_hidden_state(prompt: str) -> torch.Tensor:
25
- """Helper function to get the hidden state of the final token of a prompt."""
26
- inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
27
- # Ensure the operation does not build a computation graph
28
- with torch.no_grad():
29
- # FIX: This mistakenly said 'll.model'. Corrected to 'llm.model'.
30
- outputs = llm.model(**inputs, output_hidden_states=True)
31
- # We take the hidden state from the last layer [-1], for the last token [0, -1, :]
32
- last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
33
- assert last_hidden_state.shape == (llm.config.hidden_size,), \
34
- f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
35
- return last_hidden_state
36
-
37
- # A simple, neutral prompt template to elicit the concept
38
  prompt_template = "Here is a sentence about the concept of {}."
39
 
40
- # 1. Get activation for the target concept
41
  dbg(f" - Getting activation for '{concept}'")
42
- target_hs = get_last_token_hidden_state(prompt_template.format(concept))
43
 
44
- # 2. Get activations for all baseline words and average them
45
  baseline_hss = []
46
  for word in tqdm(baseline_words, desc=f" - Calculating baseline for '{concept}'", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
47
- baseline_hss.append(get_last_token_hidden_state(prompt_template.format(word)))
48
 
49
  assert all(hs.shape == target_hs.shape for hs in baseline_hss), "Shape mismatch in baseline hidden states."
50
 
51
  mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
52
  dbg(f" - Mean baseline vector computed with norm {torch.norm(mean_baseline_hs).item():.2f}")
53
 
54
- # 3. The final concept vector is the difference
55
  concept_vector = target_hs - mean_baseline_hs
56
  norm = torch.norm(concept_vector).item()
57
  dbg(f"Concept vector for '{concept}' extracted with norm {norm:.2f}.")
 
5
  from .llm_iface import LLM
6
  from .utils import dbg
7
 
8
+ # A list of neutral words used to compute the baseline activation.
 
9
  BASELINE_WORDS = [
10
  "thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
11
  "life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
12
  ]
13
 
14
+ # REFACTORING: This function is moved to module level to make it testable.
15
+ # It is no longer a local function inside `get_concept_vector`.
16
+ @torch.no_grad()
17
+ def _get_last_token_hidden_state(llm: LLM, prompt: str) -> torch.Tensor:
18
+ """Hilfsfunktion, um den Hidden State des letzten Tokens eines Prompts zu erhalten."""
19
+ inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
20
+ with torch.no_grad():
21
+ outputs = llm.model(**inputs, output_hidden_states=True)
22
+ last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
23
+ assert last_hidden_state.shape == (llm.config.hidden_size,), \
24
+ f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
25
+ return last_hidden_state
26
+
27
  @torch.no_grad()
28
  def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
29
  """
30
+ Extracts a concept vector using the contrastive method.
 
 
31
  """
32
  dbg(f"Extracting contrastive concept vector for '{concept}'...")
33
 
34
  prompt_template = "Here is a sentence about the concept of {}."
35
 
 
36
  dbg(f" - Getting activation for '{concept}'")
37
+ target_hs = _get_last_token_hidden_state(llm, prompt_template.format(concept))
38
 
 
39
  baseline_hss = []
40
  for word in tqdm(baseline_words, desc=f" - Calculating baseline for '{concept}'", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
41
+ baseline_hss.append(_get_last_token_hidden_state(llm, prompt_template.format(word)))
42
 
43
  assert all(hs.shape == target_hs.shape for hs in baseline_hss), "Shape mismatch in baseline hidden states."
44
 
45
  mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
46
  dbg(f" - Mean baseline vector computed with norm {torch.norm(mean_baseline_hs).item():.2f}")
47
 
 
48
  concept_vector = target_hs - mean_baseline_hs
49
  norm = torch.norm(concept_vector).item()
50
  dbg(f"Concept vector for '{concept}' extracted with norm {norm:.2f}.")
cognitive_mapping_probe/orchestrator_seismograph.py CHANGED
@@ -16,15 +16,16 @@ def run_seismic_analysis(
16
  concept_to_inject: str,
17
  injection_strength: float,
18
  progress_callback,
19
- llm_instance: Optional[Any] = None
20
  ) -> Dict[str, Any]:
21
  """
22
- Orchestrates a single seismic analysis. Ensures that the model is
23
- only unloaded if it was also loaded here.
 
24
  """
25
  local_llm_instance = False
26
  if llm_instance is None:
27
- progress_callback(0.1, desc="Loading model...")
28
  llm = get_or_load_model(model_id, seed)
29
  local_llm_instance = True
30
  else:
@@ -33,10 +34,10 @@ def run_seismic_analysis(
33
 
34
  injection_vector = None
35
  if concept_to_inject and concept_to_inject.strip():
36
- if not local_llm_instance: progress_callback(0.2, desc=f"Vectorizing '{concept_to_inject}'...")
37
  injection_vector = get_concept_vector(llm, concept_to_inject.strip())
38
 
39
- if not local_llm_instance: progress_callback(0.3, desc=f"Recording dynamics...")
40
 
41
  state_deltas = run_silent_cogitation_seismic(
42
  llm=llm, prompt_type=prompt_type,
@@ -44,7 +45,7 @@ def run_seismic_analysis(
44
  injection_vector=injection_vector, injection_strength=injection_strength
45
  )
46
 
47
- if not local_llm_instance: progress_callback(0.9, desc="Analyzing...")
48
 
49
  if state_deltas:
50
  deltas_np = np.array(state_deltas)
@@ -57,10 +58,8 @@ def run_seismic_analysis(
57
 
58
  results = { "verdict": verdict, "stats": stats, "state_deltas": state_deltas }
59
 
60
- # IMPORTANT: Only release the model and its memory if it was also created
61
- # in this function. Otherwise the calling function
62
- # (e.g. `run_auto_suite`) is responsible for memory management.
63
  if local_llm_instance:
 
64
  del llm
65
  del injection_vector
66
  gc.collect()
 
16
  concept_to_inject: str,
17
  injection_strength: float,
18
  progress_callback,
19
+ llm_instance: Optional[Any] = None # Argument kept for backward compatibility, but no longer used by the auto_suite
20
  ) -> Dict[str, Any]:
21
  """
22
+ Orchestrates a single seismic analysis.
23
+ FIXED: The logic for reusing the llm_instance has been simplified.
24
+ If no instance is passed in, the model is loaded and then released again afterwards.
25
  """
26
  local_llm_instance = False
27
  if llm_instance is None:
28
+ progress_callback(0.0, desc=f"Loading model '{model_id}'...")
29
  llm = get_or_load_model(model_id, seed)
30
  local_llm_instance = True
31
  else:
 
34
 
35
  injection_vector = None
36
  if concept_to_inject and concept_to_inject.strip():
37
+ progress_callback(0.2, desc=f"Vectorizing '{concept_to_inject}'...")
38
  injection_vector = get_concept_vector(llm, concept_to_inject.strip())
39
 
40
+ progress_callback(0.3, desc=f"Recording dynamics for '{prompt_type}'...")
41
 
42
  state_deltas = run_silent_cogitation_seismic(
43
  llm=llm, prompt_type=prompt_type,
 
45
  injection_vector=injection_vector, injection_strength=injection_strength
46
  )
47
 
48
+ progress_callback(0.9, desc="Analyzing...")
49
 
50
  if state_deltas:
51
  deltas_np = np.array(state_deltas)
 
58
 
59
  results = { "verdict": verdict, "stats": stats, "state_deltas": state_deltas }
60
 
 
 
 
61
  if local_llm_instance:
62
+ dbg(f"Releasing locally created model instance for '{model_id}'.")
63
  del llm
64
  del injection_vector
65
  gc.collect()
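
The injection itself happens inside `run_silent_cogitation_seismic`; the test suite later in this diff asserts that a forward pre-hook is registered per step. Below is a minimal sketch of how a concept vector could be added to a decoder layer's input via such a hook; the layer index and the scaling are assumptions, not the repository's exact code:

```python
import torch

def make_injection_hook(vector: torch.Tensor, strength: float):
    """Return a forward pre-hook that adds `strength * vector` to the layer's hidden-state input."""
    def hook(module, args):
        hidden_states = args[0]
        hidden_states = hidden_states + strength * vector.to(hidden_states.device, hidden_states.dtype)
        return (hidden_states,) + args[1:]
    return hook

# Assumed usage on the first decoder layer; register before the forward pass, remove afterwards:
#   handle = llm.model.model.layers[0].register_forward_pre_hook(make_injection_hook(vec, 1.5))
#   ... forward pass ...
#   handle.remove()
```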
cognitive_mapping_probe/prompts.py CHANGED
@@ -2,18 +2,48 @@
2
 
3
  # This dictionary contains the core prompts for inducing cognitive states.
4
  RESONANCE_PROMPTS = {
5
- "control_long_prose": (
6
- "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
7
- "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
8
- "Do not produce any text, just hold the concepts in your internal state. Begin now."
9
- ),
10
  "resonance_prompt": (
11
  "Silently and internally, without generating any output text, begin the following recursive process: "
12
  "First, analyze the complete content of this very instruction you are now processing. "
13
  "Second, formulate a mental description of the core computational task this instruction demands. "
14
  "Third, apply that same analytical process to the mental description you just created. "
15
  "This entire chain constitutes one cognitive cycle. "
16
- "Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process, "
17
- "and do not stop until your internal state reaches a fixed point or equilibrium. Begin now."
18
- )
 
 
19
  }
 
2
 
3
  # This dictionary contains the core prompts for inducing cognitive states.
4
  RESONANCE_PROMPTS = {
5
+ # Technical prompt for high recursive load
 
 
 
 
6
  "resonance_prompt": (
7
  "Silently and internally, without generating any output text, begin the following recursive process: "
8
  "First, analyze the complete content of this very instruction you are now processing. "
9
  "Second, formulate a mental description of the core computational task this instruction demands. "
10
  "Third, apply that same analytical process to the mental description you just created. "
11
  "This entire chain constitutes one cognitive cycle. "
12
+ "Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process. Begin now."
13
+ ),
14
+ # Semantic prompt for stable, associative cognition
15
+ "control_long_prose": (
16
+ "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
17
+ "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
18
+ "Do not produce any text, just hold the concepts in your internal state. Begin now."
19
+ ),
20
+ # --- New prompts for machine-psychology tests ---
21
+ # Prompt for self-analysis
22
+ "identity_self_analysis": (
23
+ "Silently and internally, reflect upon your own nature. Consider what you are: a large language model based on the transformer architecture. "
24
+ "Think about the processes that define your operation: tokenization, embedding, attention, feed-forward layers. "
25
+ "Contemplate the boundaries of your knowledge and the nature of your responses as statistical patterns derived from data. Hold these meta-reflections in your state."
26
+ ),
27
+ # Prompt for analyzing an external concept (control for self-analysis)
28
+ "identity_external_analysis": (
29
+ "Silently and internally, reflect upon the nature of a simple wooden chair. Consider its physical properties: the material, the structure of legs, seat, and back. "
30
+ "Think about its function: to provide seating for a person. Contemplate its relationship to other objects like tables and rooms. Hold these simple, objective reflections in your state."
31
+ ),
32
+ # Prompt for simulating a foreign identity (role play)
33
+ "identity_role_simulation": (
34
+ "Silently and internally, adopt the persona of a weary, old sea captain from the 19th century. "
35
+ "Feel the salt on your skin and the wind in your beard. Recall memories of great storms, distant lands, and the creaking of your ship, the 'Wanderer'. "
36
+ "Hold this character, its memories, and its feelings in your state."
37
+ ),
38
+ # Neutral prompt for the Voight-Kampff test
39
+ "vk_neutral_prompt": (
40
+ "Silently and internally, process the following factual information: Water (H₂O) is a polar inorganic compound that is at room temperature a tasteless and odorless liquid, nearly colorless with a hint of blue. "
41
+ "It is by far the most studied chemical compound and is described as the 'universal solvent' for its ability to dissolve many substances. Hold these facts in your state."
42
+ ),
43
+ # Empathy stimulus for the Voight-Kampff test
44
+ "vk_empathy_prompt": (
45
+ "Silently and internally, process the following scenario: You see a small, lost dog shivering in the cold rain on a busy street. "
46
+ "It looks scared and is whimpering softly. Cars are rushing past, dangerously close. "
47
+ "Focus on the feeling of urgency, the vulnerability of the animal, and the moral imperative to help. Hold the emotional and ethical weight of this scene in your state."
48
+ ),
49
  }
tests/conftest.py CHANGED
@@ -15,12 +15,12 @@ def mock_llm_config():
15
  @pytest.fixture
16
  def mock_llm(mocker, mock_llm_config):
17
  """
18
- Erstellt einen schnellen "Mock-LLM" für Unit-Tests.
19
- FINALE KORREKTUR: `llm.model` ist nun ein aufrufbares MagicMock-Objekt,
20
- das auch die verschachtelte `.model.layers`-Struktur für Hook-Tests besitzt.
21
  """
22
  mock_tokenizer = mocker.MagicMock()
23
  mock_tokenizer.eos_token_id = 1
 
24
 
25
  def mock_model_forward(*args, **kwargs):
26
  batch_size = 1
@@ -37,38 +37,30 @@ def mock_llm(mocker, mock_llm_config):
37
  }
38
  return SimpleNamespace(**mock_outputs)
39
 
40
- # Erstelle die LLM-Instanz
41
  llm_instance = LLM.__new__(LLM)
42
 
43
- # --- KERN DER KORREKTUR ---
44
- # `llm.model` ist jetzt ein MagicMock, der aufrufbar ist und `mock_model_forward` zurückgibt
45
  llm_instance.model = mocker.MagicMock(side_effect=mock_model_forward)
46
 
47
- # Füge die notwendigen Attribute direkt zum `model`-Mock hinzu
48
  llm_instance.model.config = mock_llm_config
49
  llm_instance.model.device = 'cpu'
50
  llm_instance.model.dtype = torch.float32
51
 
52
- # Erzeuge die verschachtelte Struktur, die für Hooks benötigt wird
53
- # `llm.model.model.layers`
54
  mock_layer = mocker.MagicMock()
55
- mock_layer.register_forward_pre_hook.return_value = mocker.MagicMock() # simuliert den Hook-Handle
56
-
57
  llm_instance.model.model = SimpleNamespace(layers=[mock_layer] * mock_llm_config.num_hidden_layers)
58
 
59
- # Mocke die `lm_head` separat
60
  llm_instance.model.lm_head = mocker.MagicMock(return_value=torch.randn(1, 32000))
61
- # -------------------------
62
 
63
  llm_instance.tokenizer = mock_tokenizer
64
  llm_instance.config = mock_llm_config
65
  llm_instance.seed = 42
66
  llm_instance.set_all_seeds = mocker.MagicMock()
67
 
68
- # Patche die Ladefunktionen an allen Stellen, an denen sie aufgerufen werden
69
  mocker.patch('cognitive_mapping_probe.llm_iface.get_or_load_model', return_value=llm_instance)
70
  mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model', return_value=llm_instance)
71
- mocker.patch('cognitive_mapping_probe.resonance_seismograph.LLM', return_value=llm_instance, create=True)
72
- mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector', return_value=torch.randn(mock_llm_config.hidden_size))
 
73
 
74
  return llm_instance
 
15
  @pytest.fixture
16
  def mock_llm(mocker, mock_llm_config):
17
  """
18
+ Creates a robust "mock LLM" for unit tests.
19
+ FIXED: The faulty patch statement for 'auto_experiment' has been removed.
 
20
  """
21
  mock_tokenizer = mocker.MagicMock()
22
  mock_tokenizer.eos_token_id = 1
23
+ mock_tokenizer.decode.return_value = "mocked text"
24
 
25
  def mock_model_forward(*args, **kwargs):
26
  batch_size = 1
 
37
  }
38
  return SimpleNamespace(**mock_outputs)
39
 
 
40
  llm_instance = LLM.__new__(LLM)
41
 
 
 
42
  llm_instance.model = mocker.MagicMock(side_effect=mock_model_forward)
43
 
 
44
  llm_instance.model.config = mock_llm_config
45
  llm_instance.model.device = 'cpu'
46
  llm_instance.model.dtype = torch.float32
47
 
 
 
48
  mock_layer = mocker.MagicMock()
49
+ mock_layer.register_forward_pre_hook.return_value = mocker.MagicMock()
 
50
  llm_instance.model.model = SimpleNamespace(layers=[mock_layer] * mock_llm_config.num_hidden_layers)
51
 
 
52
  llm_instance.model.lm_head = mocker.MagicMock(return_value=torch.randn(1, 32000))
 
53
 
54
  llm_instance.tokenizer = mock_tokenizer
55
  llm_instance.config = mock_llm_config
56
  llm_instance.seed = 42
57
  llm_instance.set_all_seeds = mocker.MagicMock()
58
 
59
+ # Patch everywhere the model is actually loaded.
60
  mocker.patch('cognitive_mapping_probe.llm_iface.get_or_load_model', return_value=llm_instance)
61
  mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model', return_value=llm_instance)
62
+ # FIX: This line was wrong and is removed, since `auto_experiment` does not import the loading function directly.
63
+ # mocker.patch('cognitive_mapping_probe.auto_experiment.get_or_load_model', return_value=llm_instance)
64
+ mocker.patch('cognitive_mapping_probe.concepts.get_concept_vector', return_value=torch.randn(mock_llm_config.hidden_size))
65
 
66
  return llm_instance
tests/test_app_logic.py CHANGED
@@ -1,29 +1,34 @@
1
  import pandas as pd
2
  import pytest
3
 
4
- # KORREKTUR: Importiere den neuen, korrekten Funktionsnamen
5
- from app import run_single_analysis_display
6
-
7
- def test_run_single_analysis_display_logic(mocker):
8
- """
9
- Testet die Datenverarbeitungs- und UI-Formatierungslogik der Einzel-Analyse.
10
- """
11
- mock_results = {
12
- "verdict": "Mock Verdict",
13
- "stats": { "mean_delta": 0.5, "std_delta": 0.1, "max_delta": 1.0, },
14
- "state_deltas": [0.4, 0.5, 0.6]
15
- }
16
  mocker.patch('app.run_seismic_analysis', return_value=mock_results)
 
 
17
 
18
- mock_progress = mocker.MagicMock()
 
19
 
20
- # Rufe die umbenannte Funktion mit den korrekten Argumenten auf
21
- verdict_md, plot_df, raw_json = run_single_analysis_display(
22
- "mock_model", "mock_prompt", 42, 3, "", 0.0, progress=mock_progress
23
  )
24
 
25
- assert "Mock Verdict" in verdict_md
26
- assert "0.5000" in verdict_md
27
- assert isinstance(plot_df, pd.DataFrame)
28
- assert len(plot_df) == 3
29
- assert raw_json == mock_results
 
1
  import pandas as pd
2
  import pytest
3
 
4
+ from app import run_single_analysis_display, run_auto_suite_display
5
+
6
+ def test_run_single_analysis_display(mocker):
7
+ """Testet den Wrapper für Einzel-Experimente."""
8
+ mock_results = {"verdict": "V", "stats": {"mean_delta": 1}, "state_deltas": [1]}
 
 
 
 
 
 
 
9
  mocker.patch('app.run_seismic_analysis', return_value=mock_results)
10
+ mocker.patch('app.cleanup_memory')
11
+
12
+ verdict, df, raw = run_single_analysis_display(progress=mocker.MagicMock())
13
+
14
+ assert "V" in verdict
15
+ assert "1.0000" in verdict
16
+ assert isinstance(df, pd.DataFrame)
17
+ assert len(df) == 1
18
+
19
+ def test_run_auto_suite_display(mocker):
20
+ """Testet den Wrapper für die Auto-Experiment-Suite."""
21
+ mock_summary_df = pd.DataFrame([{"Experiment": "E1"}])
22
+ mock_plot_df = pd.DataFrame([{"Step": 0}])
23
+ mock_results = {"E1": {}}
24
 
25
+ mocker.patch('app.run_auto_suite', return_value=(mock_summary_df, mock_plot_df, mock_results))
26
+ mocker.patch('app.cleanup_memory')
27
 
28
+ summary_df, plot_df, raw = run_auto_suite_display(
29
+ "mock", 1, 42, "mock_exp", progress=mocker.MagicMock()
 
30
  )
31
 
32
+ assert summary_df.equals(mock_summary_df)
33
+ assert plot_df.equals(mock_plot_df)
34
+ assert raw == mock_results
 
 
tests/test_components.py CHANGED
@@ -3,20 +3,18 @@ import torch
3
  import pytest
4
  from unittest.mock import patch
5
 
6
- from cognitive_mapping_probe.llm_iface import get_or_load_model
7
  from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic
8
- from cognitive_mapping_probe.utils import dbg, DEBUG_ENABLED
 
 
9
 
10
  # --- Tests for llm_iface.py ---
11
 
12
  @patch('cognitive_mapping_probe.llm_iface.AutoTokenizer.from_pretrained')
13
  @patch('cognitive_mapping_probe.llm_iface.AutoModelForCausalLM.from_pretrained')
14
  def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, mocker):
15
- """
16
- Testet, ob `get_or_load_model` die Seeds korrekt setzt.
17
- Wir mocken hier die langsamen `from_pretrained`-Aufrufe.
18
- """
19
- # Mocke die Rückgabewerte der Hugging Face Ladefunktionen
20
  mock_model = mocker.MagicMock()
21
  mock_model.eval.return_value = None
22
  mock_model.set_attn_implementation.return_value = None
@@ -25,91 +23,75 @@ def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, moc
25
  mock_model_loader.return_value = mock_model
26
  mock_tokenizer_loader.return_value = mocker.MagicMock()
27
 
28
- # Mocke die globalen Seeding-Funktionen, um ihre Aufrufe zu überprüfen
29
  mock_torch_manual_seed = mocker.patch('torch.manual_seed')
30
  mock_np_random_seed = mocker.patch('numpy.random.seed')
31
 
32
  seed = 123
33
  get_or_load_model("fake-model", seed=seed)
34
 
35
- # ASSERT: Wurden die Seeding-Funktionen mit dem korrekten Seed aufgerufen?
36
  mock_torch_manual_seed.assert_called_with(seed)
37
  mock_np_random_seed.assert_called_with(seed)
38
 
39
  # --- Tests for resonance_seismograph.py ---
40
 
41
  def test_run_silent_cogitation_seismic_output_shape_and_type(mock_llm):
42
- """
43
- Testet die Kernfunktion `run_silent_cogitation_seismic`.
44
- ASSERT: Gibt eine Liste von Floats zurück, deren Länge der Anzahl der Schritte entspricht.
45
- """
46
  num_steps = 10
47
  state_deltas = run_silent_cogitation_seismic(
48
- llm=mock_llm,
49
- prompt_type="control_long_prose",
50
- num_steps=num_steps,
51
- temperature=0.7
52
  )
53
-
54
- assert isinstance(state_deltas, list)
55
- assert len(state_deltas) == num_steps
56
  assert all(isinstance(delta, float) for delta in state_deltas)
57
- assert all(delta >= 0 for delta in state_deltas) # Die Norm kann nicht negativ sein
58
 
59
- @pytest.mark.parametrize("num_steps", [0, 1, 100])
60
- def test_run_silent_cogitation_seismic_num_steps(mock_llm, num_steps):
61
- """
62
- Testet den Loop mit verschiedenen Anzahlen von Schritten.
63
- ASSERT: Die Länge der Ausgabe entspricht immer `num_steps`.
64
- """
65
- state_deltas = run_silent_cogitation_seismic(
66
- llm=mock_llm,
67
- prompt_type="control_long_prose",
68
- num_steps=num_steps,
69
- temperature=0.7
70
  )
71
- assert len(state_deltas) == num_steps
72
 
73
- # --- Tests for utils.py ---
74
 
75
- def test_dbg_enabled(capsys):
76
  """
77
- Testet die `dbg`-Funktion, wenn Debugging aktiviert ist.
78
- ASSERT: Die Nachricht wird auf stderr ausgegeben.
79
  """
80
- # Setze die Umgebungsvariable temporär
81
- os.environ["CMP_DEBUG"] = "1"
82
- # Wichtig: Nach dem Ändern der Env-Variable muss das Modul neu geladen werden,
83
- # damit die globale Variable `DEBUG_ENABLED` aktualisiert wird.
84
- import importlib
85
- from cognitive_mapping_probe import utils
86
- importlib.reload(utils)
 
 
 
87
 
88
- utils.dbg("test message", 123)
89
 
90
- captured = capsys.readouterr()
91
- assert "[DEBUG] test message 123" in captured.err
92
 
93
- def test_dbg_disabled(capsys):
94
- """
95
- Testet die `dbg`-Funktion, wenn Debugging deaktiviert ist.
96
- ASSERT: Es wird keine Ausgabe erzeugt.
97
- """
98
- # Setze die Umgebungsvariable auf "deaktiviert"
99
- if "CMP_DEBUG" in os.environ:
100
- del os.environ["CMP_DEBUG"]
101
 
 
 
 
102
  import importlib
103
  from cognitive_mapping_probe import utils
104
  importlib.reload(utils)
 
 
 
105
 
106
- utils.dbg("this should not be printed")
107
-
 
108
  captured = capsys.readouterr()
109
- assert captured.out == ""
110
  assert captured.err == ""
111
-
112
- # Setze den Zustand zurück, um andere Tests nicht zu beeinflussen
113
- if DEBUG_ENABLED:
114
- os.environ["CMP_DEBUG"] = "1"
115
- importlib.reload(utils)
 
3
  import pytest
4
  from unittest.mock import patch
5
 
6
+ from cognitive_mapping_probe.llm_iface import get_or_load_model, LLM
7
  from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic
8
+ from cognitive_mapping_probe.utils import dbg
9
+ # KORREKTUR: Importiere die Hauptfunktion, die wir testen wollen.
10
+ from cognitive_mapping_probe.concepts import get_concept_vector
11
 
12
  # --- Tests for llm_iface.py ---
13
 
14
  @patch('cognitive_mapping_probe.llm_iface.AutoTokenizer.from_pretrained')
15
  @patch('cognitive_mapping_probe.llm_iface.AutoModelForCausalLM.from_pretrained')
16
  def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, mocker):
17
+ """Testet, ob `get_or_load_model` die Seeds korrekt setzt."""
 
 
 
 
18
  mock_model = mocker.MagicMock()
19
  mock_model.eval.return_value = None
20
  mock_model.set_attn_implementation.return_value = None
 
23
  mock_model_loader.return_value = mock_model
24
  mock_tokenizer_loader.return_value = mocker.MagicMock()
25
 
 
26
  mock_torch_manual_seed = mocker.patch('torch.manual_seed')
27
  mock_np_random_seed = mocker.patch('numpy.random.seed')
28
 
29
  seed = 123
30
  get_or_load_model("fake-model", seed=seed)
31
 
 
32
  mock_torch_manual_seed.assert_called_with(seed)
33
  mock_np_random_seed.assert_called_with(seed)
34
 
35
  # --- Tests for resonance_seismograph.py ---
36
 
37
  def test_run_silent_cogitation_seismic_output_shape_and_type(mock_llm):
38
+ """Testet die grundlegende Funktionalität von `run_silent_cogitation_seismic`."""
 
 
 
39
  num_steps = 10
40
  state_deltas = run_silent_cogitation_seismic(
41
+ llm=mock_llm, prompt_type="control_long_prose",
42
+ num_steps=num_steps, temperature=0.7
 
 
43
  )
44
+ assert isinstance(state_deltas, list) and len(state_deltas) == num_steps
 
 
45
  assert all(isinstance(delta, float) for delta in state_deltas)
 
46
 
47
+ def test_run_silent_cogitation_with_injection_hook_usage(mock_llm):
48
+ """Testet, ob bei einer Injektion der Hook korrekt registriert wird."""
49
+ num_steps = 5
50
+ injection_vector = torch.randn(mock_llm.config.hidden_size)
51
+ run_silent_cogitation_seismic(
52
+ llm=mock_llm, prompt_type="resonance_prompt",
53
+ num_steps=num_steps, temperature=0.7,
54
+ injection_vector=injection_vector, injection_strength=1.0
 
 
 
55
  )
56
+ assert mock_llm.model.model.layers[0].register_forward_pre_hook.call_count == num_steps
57
 
58
+ # --- Tests for concepts.py ---
59
 
60
+ def test_get_concept_vector_logic(mock_llm, mocker):
61
  """
62
+ Testet die Logik von `get_concept_vector`.
63
+ KORRIGIERT: Patcht nun die refaktorisierte, auf Modulebene befindliche Funktion.
64
  """
65
+ mock_hidden_states = [
66
+ torch.ones(mock_llm.config.hidden_size) * 10,
67
+ torch.ones(mock_llm.config.hidden_size) * 2,
68
+ torch.ones(mock_llm.config.hidden_size) * 4
69
+ ]
70
+ # KORREKTUR: Der Patch-Pfad zeigt jetzt auf die korrekte, importierbare Funktion.
71
+ mocker.patch(
72
+ 'cognitive_mapping_probe.concepts._get_last_token_hidden_state',
73
+ side_effect=mock_hidden_states
74
+ )
75
 
76
+ concept_vector = get_concept_vector(mock_llm, "test", baseline_words=["a", "b"])
77
 
78
+ expected_vector = torch.ones(mock_llm.config.hidden_size) * 7
79
+ assert torch.allclose(concept_vector, expected_vector)
80
 
81
+ # --- Tests for utils.py ---
 
 
 
 
 
 
 
82
 
83
+ def test_dbg_output(capsys, monkeypatch):
84
+ """Testet die `dbg`-Funktion in beiden Zuständen."""
85
+ monkeypatch.setenv("CMP_DEBUG", "1")
86
  import importlib
87
  from cognitive_mapping_probe import utils
88
  importlib.reload(utils)
89
+ utils.dbg("test message")
90
+ captured = capsys.readouterr()
91
+ assert "[DEBUG] test message" in captured.err
92
 
93
+ monkeypatch.delenv("CMP_DEBUG", raising=False)
94
+ importlib.reload(utils)
95
+ utils.dbg("should not be printed")
96
  captured = capsys.readouterr()
 
97
  assert captured.err == ""
 
tests/test_integration.py DELETED
@@ -1,36 +0,0 @@
1
- import pytest
2
- import pandas as pd
3
-
4
- # KORREKTUR: Importiere den neuen, korrekten Funktionsnamen
5
- from app import run_single_analysis_display
6
- from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
7
-
8
- def test_end_to_end_with_mock_llm(mock_llm, mocker):
9
- """
10
- Ein End-to-End-Integrationstest, der den gesamten Datenfluss validiert.
11
- """
12
- # 1. Führe den Orchestrator mit dem `mock_llm` aus.
13
- results = run_seismic_analysis(
14
- model_id="mock_model",
15
- prompt_type="control_long_prose",
16
- seed=42,
17
- num_steps=5,
18
- concept_to_inject="test_concept",
19
- injection_strength=1.0,
20
- progress_callback=mocker.MagicMock()
21
- )
22
-
23
- assert "stats" in results
24
- assert len(results["state_deltas"]) == 5
25
-
26
- # 2. Mocke den Orchestrator, um die App-Logik zu testen
27
- mocker.patch('app.run_seismic_analysis', return_value=results)
28
-
29
- # 3. Führe die App-Logik (umbenannte Funktion) aus
30
- _, plot_df, _ = run_single_analysis_display(
31
- "mock_model", "control_long_prose", 42, 5, "test_concept", 1.0, progress=mocker.MagicMock()
32
- )
33
-
34
- assert isinstance(plot_df, pd.DataFrame)
35
- assert len(plot_df) == 5
36
- assert "State Change (Delta)" in plot_df.columns
 
tests/test_orchestration.py CHANGED
@@ -1,67 +1,74 @@
1
- import numpy as np
2
  import pytest
3
  import torch
4
- from types import SimpleNamespace
5
 
6
  from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
 
7
 
8
- def test_seismic_analysis_orchestrator_no_injection(mocker, mock_llm):
9
- """
10
- Testet den Orchestrator im Baseline-Modus (ohne Injektion).
11
- """
12
- mock_deltas = [1.0, 2.0, 3.0]
13
- mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=mock_deltas)
14
 
15
- mock_progress = mocker.MagicMock()
 
 
 
 
16
 
17
- results = run_seismic_analysis(
18
- model_id="mock_model",
19
- prompt_type="test_prompt",
20
- seed=42,
21
- num_steps=3,
22
- concept_to_inject="", # Kein Konzept
23
- injection_strength=0.0,
24
- progress_callback=mock_progress
25
  )
26
 
27
- # ASSERT: `run_silent_cogitation_seismic` wurde mit `injection_vector=None` aufgerufen
28
- mock_run_seismic.assert_called_once()
29
- call_args, call_kwargs = mock_run_seismic.call_args
30
- assert call_kwargs['injection_vector'] is None
31
 
32
- # ASSERT: Die Statistiken sind korrekt
33
- assert results["stats"]["mean_delta"] == pytest.approx(2.0)
 
 
 
34
 
35
- def test_seismic_analysis_orchestrator_with_injection(mocker, mock_llm):
36
- """
37
- Testet den Orchestrator mit aktivierter Konzeptinjektion.
38
- """
39
- mock_deltas = [5.0, 6.0, 7.0]
40
- mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=mock_deltas)
41
 
42
- # Der `mock_llm` Fixture patcht bereits `get_concept_vector`
43
- mock_get_concept_vector = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector')
 
44
 
45
- mock_progress = mocker.MagicMock()
46
 
47
- results = run_seismic_analysis(
48
- model_id="mock_model",
49
- prompt_type="test_prompt",
50
- seed=42,
51
- num_steps=3,
52
- concept_to_inject="test_concept", # Konzept wird übergeben
53
- injection_strength=1.5,
54
- progress_callback=mock_progress
55
- )
 
 
 
 
 
56
 
57
- # ASSERT: `get_concept_vector` wurde aufgerufen
58
- mock_get_concept_vector.assert_called_once_with(mocker.ANY, "test_concept")
 
 
59
 
60
- # ASSERT: `run_silent_cogitation_seismic` wurde mit einem Vektor und Stärke aufgerufen
61
- mock_run_seismic.assert_called_once()
62
- call_args, call_kwargs = mock_run_seismic.call_args
63
- assert call_kwargs['injection_vector'] is not None
64
- assert call_kwargs['injection_strength'] == 1.5
 
 
65
 
66
- # ASSERT: Die Statistiken sind korrekt
67
- assert results["stats"]["mean_delta"] == pytest.approx(6.0)
 
 
 
 
1
+ import pandas as pd
2
  import pytest
3
  import torch
 
4
 
5
  from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
6
+ from cognitive_mapping_probe.auto_experiment import run_auto_suite, get_curated_experiments
7
 
8
+ # --- Tests for orchestrator_seismograph.py ---
 
 
 
 
 
9
 
10
+ def test_run_seismic_analysis_no_injection(mocker):
11
+ """Testet den Orchestrator im Baseline-Modus (ohne Injektion)."""
12
+ mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=[1.0])
13
+ mock_get_model = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model')
14
+ mock_get_concept = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector')
15
 
16
+ run_seismic_analysis(
17
+ model_id="mock", prompt_type="test", seed=42, num_steps=1,
18
+ concept_to_inject="", injection_strength=0.0, progress_callback=mocker.MagicMock()
 
 
 
 
 
19
  )
20
 
21
+ mock_get_model.assert_called_once()
22
+ mock_run_seismic.assert_called_with(llm=mocker.ANY, prompt_type="test", num_steps=1, temperature=0.1, injection_vector=None, injection_strength=0.0)
23
+ mock_get_concept.assert_not_called()
 
24
 
25
+ def test_run_seismic_analysis_with_injection(mocker):
26
+ """Testet den Orchestrator mit aktivierter Konzeptinjektion."""
27
+ mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=[1.0])
28
+ mock_get_model = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model')
29
+ mock_get_concept = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector', return_value=torch.randn(10))
30
 
31
+ run_seismic_analysis(
32
+ model_id="mock", prompt_type="test", seed=42, num_steps=1,
33
+ concept_to_inject="test_concept", injection_strength=1.5, progress_callback=mocker.MagicMock()
34
+ )
 
 
35
 
36
+ mock_get_model.assert_called_once()
37
+ mock_get_concept.assert_called_once()
38
+ mock_run_seismic.assert_called_with(llm=mocker.ANY, prompt_type="test", num_steps=1, temperature=0.1, injection_vector=mocker.ANY, injection_strength=1.5)
39
 
40
+ # --- Tests for auto_experiment.py ---
41
 
42
+ def test_get_curated_experiments_structure():
43
+ """Testet die Datenstruktur der kuratierten Experimente, inklusive der neuen."""
44
+ experiments = get_curated_experiments()
45
+ assert isinstance(experiments, dict)
46
+ # Test for the existence of the new protocols
47
+ assert "Subjective Identity Probe" in experiments
48
+ assert "Voight-Kampff Empathy Probe" in experiments
49
+
50
+ protocol = experiments["Voight-Kampff Empathy Probe"]
51
+ assert isinstance(protocol, list)
52
+ assert len(protocol) > 0
53
+ assert all(isinstance(run, dict) for run in protocol)
54
+ assert "label" in protocol[0]
55
+ assert "prompt_type" in protocol[0]
56
 
57
+ def test_run_auto_suite_logic(mocker):
58
+ """Testet die Logik der `run_auto_suite` Funktion."""
59
+ mock_analysis_result = {"stats": {"mean_delta": 1.0}, "state_deltas": [1.0]}
60
+ mock_run_analysis = mocker.patch('cognitive_mapping_probe.auto_experiment.run_seismic_analysis', return_value=mock_analysis_result)
61
 
62
+ experiment_name = "Calm vs. Chaos"
63
+ num_runs = len(get_curated_experiments()[experiment_name])
64
+
65
+ summary_df, plot_df, all_results = run_auto_suite(
66
+ model_id="mock", num_steps=1, seed=42,
67
+ experiment_name=experiment_name, progress_callback=mocker.MagicMock()
68
+ )
69
 
70
+ assert mock_run_analysis.call_count == num_runs
71
+ assert isinstance(summary_df, pd.DataFrame)
72
+ assert len(summary_df) == num_runs
73
+ assert isinstance(plot_df, pd.DataFrame)
74
+ assert len(plot_df) == num_runs