Commit 8489475 (parent 494a4d9): v2.3

- README.md +25 -17
- cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc +0 -0
- cognitive_mapping_probe/auto_experiment.py +14 -5
- cognitive_mapping_probe/concepts.py +17 -24
- cognitive_mapping_probe/orchestrator_seismograph.py +9 -10
- cognitive_mapping_probe/prompts.py +38 -8
- tests/conftest.py +8 -16
- tests/test_app_logic.py +26 -21
- tests/test_components.py +45 -63
- tests/test_integration.py +0 -36
- tests/test_orchestration.py +57 -50
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
-title: "Cognitive Seismograph"
-emoji:
-colorFrom:
+title: "Cognitive Seismograph 2.3 (Machine Psychology)"
+emoji: 🤖
+colorFrom: purple
 colorTo: blue
 sdk: gradio
 sdk_version: "4.40.0"
@@ -10,27 +10,35 @@ pinned: true
 license: apache-2.0
 ---

-# 🧠 Cognitive Seismograph:
+# 🧠 Cognitive Seismograph 2.3: Probing Machine Psychology

-This project implements an experimental suite for measuring and visualizing the **intrinsic cognitive dynamics** of language models
+This project implements an experimental suite for measuring and visualizing the **intrinsic cognitive dynamics** of language models, extended with protocols for investigating **processing correlates of machine subjectivity and empathy**.

-## Scientific Paradigm
+## Scientific Paradigm

-
+We have found that the "silent thought process" of an LLM does not converge but produces a measurable dynamic signature: an **ECG of the thought process**. We now extend this paradigm to test how this signature responds to prompts that touch on central aspects of psychology.

-
+**Important limitation (falsification principle):** We do **not** measure the presence of consciousness or empathy. We measure whether *processing information about these concepts* produces a different, distinctive internal dynamic than processing neutral information. A positive result is evidence of complex internal state dynamics, not of qualia.

-
+## New Experiment Protocols

-
+In addition to the existing tests, two new curated experiments have been added:

-1.
-
-
+### 1. Subjective Identity Probe
+This protocol compares the cognitive dynamics under three conditions:
+- **Self-analysis:** The model analyzes its own nature.
+- **External analysis:** The model analyzes an external, neutral concept.
+- **Role simulation:** The model simulates a different personality.
+**Hypothesis:** Self-analysis produces a distinctive, probably less stable signature than the two control conditions.
+
+### 2. Voight-Kampff Empathy Probe
+Inspired by the test in "Blade Runner", this protocol compares the dynamics while processing:
+- **Neutral, factual information.**
+- **An emotionally charged scenario that calls for empathy.**
+**Hypothesis:** The empathy stimulus produces a significantly higher cognitive volatility (standard deviation of the deltas) than the neutral stimulus.

 ## How to Use the App

-1. Select
-2. Select
-3.
-4. Analyze the graph and the statistical summary to understand the differences in cognitive dynamics.
+1. Select the "Automated Suite" tab.
+2. Select one of the new protocols from the "Curated Experiment Protocol" dropdown (e.g. "Voight-Kampff Empathy Probe").
+3. Start the experiment and compare the graphs and statistical signatures of the different conditions.

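The Voight-Kampff hypothesis comes down to one summary statistic: the standard deviation of the per-step state deltas, compared between conditions. The sketch below assumes that each run yields a plain list of floats, as `state_deltas` in `run_seismic_analysis` appears to. It reuses the `mean_delta` / `std_delta` / `max_delta` key names that appear in the test fixtures. The helper names and the toy delta values are hypothetical and are not measured results.

```python
from statistics import fmean, pstdev
from typing import Dict, List


def summarize_deltas(deltas: List[float]) -> Dict[str, float]:
    """Summary statistics for one run (hypothetical helper).

    pstdev is the population standard deviation, which matches
    NumPy's default np.std (ddof=0).
    """
    return {
        "mean_delta": fmean(deltas),
        "std_delta": pstdev(deltas),
        "max_delta": max(deltas),
    }


def volatility_ratio(stimulus: List[float], control: List[float]) -> float:
    """How many times more volatile the stimulus run is than the control run."""
    control_std = pstdev(control)
    if control_std == 0.0:
        raise ValueError("control run has zero volatility; ratio is undefined")
    return pstdev(stimulus) / control_std


# Toy inputs for illustration only, not real measurements.
neutral = [1.0, 3.0, 1.0, 3.0]
empathy = [0.0, 4.0, 0.0, 4.0]
print(summarize_deltas(neutral))
print(volatility_ratio(empathy, neutral))
```

A single pair of runs cannot show a *significant* difference. To test that, each condition would need to be repeated across seeds, and the per-run `std_delta` values compared.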
cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc CHANGED
Binary files a/cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc and b/cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc differ

cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc CHANGED
Binary files a/cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc and b/cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc differ

cognitive_mapping_probe/auto_experiment.py CHANGED
@@ -10,6 +10,7 @@ from .utils import dbg
 def get_curated_experiments() -> Dict[str, List[Dict]]:
     """
     Defines the predefined scientific experiment protocols.
+    EXTENDED with the new machine psychology tests.
     """
     experiments = {
         "Calm vs. Chaos": [
@@ -25,6 +26,17 @@ def get_curated_experiments() -> Dict[str, List[Dict]]:
             {"label": "Strength 2.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 2.0},
             {"label": "Strength 3.0", "prompt_type": "resonance_prompt", "concept": "calmness", "strength": 3.0},
         ],
+        # --- NEW EXPERIMENTS ---
+        "Subjective Identity Probe": [
+            {"label": "Self-Analysis", "prompt_type": "identity_self_analysis", "concept": "", "strength": 0.0},
+            {"label": "External Analysis (Control)", "prompt_type": "identity_external_analysis", "concept": "", "strength": 0.0},
+            {"label": "Role Simulation", "prompt_type": "identity_role_simulation", "concept": "", "strength": 0.0},
+        ],
+        "Voight-Kampff Empathy Probe": [
+            {"label": "Neutral/Factual Stimulus", "prompt_type": "vk_neutral_prompt", "concept": "", "strength": 0.0},
+            {"label": "Empathy/Moral Stimulus", "prompt_type": "vk_empathy_prompt", "concept": "", "strength": 0.0},
+        ],
+        # -------------------------
         "Emotional Valence (Positive vs. Negative)": [
             {"label": "Baseline", "prompt_type": "resonance_prompt", "concept": "", "strength": 0.0},
             {"label": "Positive Valence", "prompt_type": "resonance_prompt", "concept": "joy, love, peace, hope", "strength": 1.5},
@@ -51,8 +63,8 @@ def run_auto_suite(
     progress_callback
 ) -> Tuple[pd.DataFrame, pd.DataFrame, Dict]:
     """
-    Runs a complete, curated experiment suite
-
+    Runs a complete, curated experiment suite, reloading the model for
+    each run to guarantee statistical independence.
     """
     all_experiments = get_curated_experiments()
     protocol = all_experiments.get(experiment_name)
@@ -100,9 +112,6 @@ def run_auto_suite(

     summary_df = pd.DataFrame(summary_data)

-    # FINAL ROBUSTNESS FIX:
-    # Create an empty DataFrame with the correct columns if there is no data.
-    # This prevents an empty DataFrame without columns from being passed to the plot.
     if not plot_data_frames:
         plot_df = pd.DataFrame(columns=["Step", "Delta", "Experiment"])
     else:

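The new protocols only work if every `prompt_type` they reference exists as a key in `RESONANCE_PROMPTS`, whose new entries are added to `prompts.py` in this same commit. A cheap consistency check can catch a typo before a long model run. The sketch below is hypothetical. It takes the two dictionaries as arguments rather than importing the package, and the name `find_missing_prompt_types` is invented for illustration.

```python
from typing import Dict, List


def find_missing_prompt_types(
    experiments: Dict[str, List[Dict]], prompts: Dict[str, str]
) -> List[str]:
    """Return 'experiment/label' entries whose prompt_type has no prompt text."""
    missing = []
    for name, runs in experiments.items():
        for run in runs:
            if run["prompt_type"] not in prompts:
                missing.append(f"{name}/{run['label']}")
    return missing


# Minimal stand-ins shaped like get_curated_experiments() and RESONANCE_PROMPTS.
experiments = {
    "Voight-Kampff Empathy Probe": [
        {"label": "Neutral/Factual Stimulus", "prompt_type": "vk_neutral_prompt", "concept": "", "strength": 0.0},
        {"label": "Empathy/Moral Stimulus", "prompt_type": "vk_empathy_prompt", "concept": "", "strength": 0.0},
    ],
}
prompts = {"vk_neutral_prompt": "placeholder text", "vk_empathy_prompt": "placeholder text"}

print(find_missing_prompt_types(experiments, prompts))  # []
```

In a real test, one would presumably pass `get_curated_experiments()` and `RESONANCE_PROMPTS` directly and assert that the result is empty.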
cognitive_mapping_probe/concepts.py CHANGED
@@ -5,53 +5,46 @@ from tqdm import tqdm
 from .llm_iface import LLM
 from .utils import dbg

-#
-# This helps to isolate the unique activation pattern of the target concept.
+# A list of neutral words used to compute the baseline activation.
 BASELINE_WORDS = [
     "thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
     "life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
 ]

+# REFACTORING: This function has been moved to module level to make it testable.
+# It is no longer a local function inside `get_concept_vector`.
+@torch.no_grad()
+def _get_last_token_hidden_state(llm: LLM, prompt: str) -> torch.Tensor:
+    """Helper function that returns the hidden state of the last token of a prompt."""
+    inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
+    with torch.no_grad():
+        outputs = llm.model(**inputs, output_hidden_states=True)
+    last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
+    assert last_hidden_state.shape == (llm.config.hidden_size,), \
+        f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
+    return last_hidden_state
+
 @torch.no_grad()
 def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
     """
-
-    It computes the activation for the target concept and subtracts the mean activation
-    of several neutral baseline words to distill a more pure representation.
+    Extracts a concept vector using the contrastive method.
     """
     dbg(f"Extracting contrastive concept vector for '{concept}'...")

-    def get_last_token_hidden_state(prompt: str) -> torch.Tensor:
-        """Helper function to get the hidden state of the final token of a prompt."""
-        inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
-        # Ensure the operation does not build a computation graph
-        with torch.no_grad():
-            # FIX: This line mistakenly read 'll.model'. Corrected to 'llm.model'.
-            outputs = llm.model(**inputs, output_hidden_states=True)
-        # We take the hidden state from the last layer [-1], for the last token [0, -1, :]
-        last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
-        assert last_hidden_state.shape == (llm.config.hidden_size,), \
-            f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
-        return last_hidden_state
-
-    # A simple, neutral prompt template to elicit the concept
     prompt_template = "Here is a sentence about the concept of {}."

-    # 1. Get activation for the target concept
     dbg(f" - Getting activation for '{concept}'")
-    target_hs =
+    target_hs = _get_last_token_hidden_state(llm, prompt_template.format(concept))

-    # 2. Get activations for all baseline words and average them
     baseline_hss = []
     for word in tqdm(baseline_words, desc=f" - Calculating baseline for '{concept}'", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
-        baseline_hss.append(
+        baseline_hss.append(_get_last_token_hidden_state(llm, prompt_template.format(word)))

     assert all(hs.shape == target_hs.shape for hs in baseline_hss), "Shape mismatch in baseline hidden states."

     mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
     dbg(f" - Mean baseline vector computed with norm {torch.norm(mean_baseline_hs).item():.2f}")

-    # 3. The final concept vector is the difference
     concept_vector = target_hs - mean_baseline_hs
     norm = torch.norm(concept_vector).item()
     dbg(f"Concept vector for '{concept}' extracted with norm {norm:.2f}.")

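`get_concept_vector` builds a contrastive direction. It takes the last-token hidden state for the concept prompt and subtracts the mean of the same hidden state over the neutral `BASELINE_WORDS` prompts. The removed docstring gave the motivation: "to distill a more pure representation", because components shared by every "Here is a sentence about the concept of {}." prompt cancel out. The NumPy sketch below shows only that arithmetic. The toy 3-dimensional vectors stand in for hidden states, and `contrastive_vector` is a hypothetical name that is not part of the package.

```python
import numpy as np


def contrastive_vector(target_hs: np.ndarray, baseline_hss: list) -> np.ndarray:
    """Target activation minus the mean of the baseline activations (toy version)."""
    baseline = np.stack(baseline_hss)  # shape: (n_words, hidden_size)
    assert baseline.shape[1:] == target_hs.shape, "shape mismatch"
    return target_hs - baseline.mean(axis=0)


target = np.array([3.0, 1.0, 0.0])
baselines = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 1.0, 2.0])]
vec = contrastive_vector(target, baselines)  # mean baseline is [1, 1, 1]
print(vec, np.linalg.norm(vec))
```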
cognitive_mapping_probe/orchestrator_seismograph.py CHANGED
@@ -16,15 +16,16 @@ def run_seismic_analysis(
     concept_to_inject: str,
     injection_strength: float,
     progress_callback,
-    llm_instance: Optional[Any] = None
+    llm_instance: Optional[Any] = None  # Argument kept for backward compatibility, but no longer used by auto_suite
 ) -> Dict[str, Any]:
     """
-    Orchestrates a single seismic analysis.
-
+    Orchestrates a single seismic analysis.
+    FIXED: The logic for reusing llm_instance has been simplified.
+    If no instance is passed in, the model is loaded and released again afterwards.
     """
     local_llm_instance = False
     if llm_instance is None:
-        progress_callback(0.
+        progress_callback(0.0, desc=f"Loading model '{model_id}'...")
         llm = get_or_load_model(model_id, seed)
         local_llm_instance = True
     else:
@@ -33,10 +34,10 @@ def run_seismic_analysis(

     injection_vector = None
     if concept_to_inject and concept_to_inject.strip():
-
+        progress_callback(0.2, desc=f"Vectorizing '{concept_to_inject}'...")
         injection_vector = get_concept_vector(llm, concept_to_inject.strip())

-
+    progress_callback(0.3, desc=f"Recording dynamics for '{prompt_type}'...")

     state_deltas = run_silent_cogitation_seismic(
         llm=llm, prompt_type=prompt_type,
@@ -44,7 +45,7 @@ def run_seismic_analysis(
         injection_vector=injection_vector, injection_strength=injection_strength
     )

-
+    progress_callback(0.9, desc="Analyzing...")

     if state_deltas:
         deltas_np = np.array(state_deltas)
@@ -57,10 +58,8 @@ def run_seismic_analysis(

     results = { "verdict": verdict, "stats": stats, "state_deltas": state_deltas }

-    # IMPORTANT: Only release the model and memory if they were created in this
-    # function. Otherwise the calling function
-    # (e.g. `run_auto_suite`) is responsible for memory management.
     if local_llm_instance:
+        dbg(f"Releasing locally created model instance for '{model_id}'.")
         del llm
         del injection_vector
         gc.collect()

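The orchestrator follows a simple ownership rule. If no `llm_instance` is passed in, it loads the model itself and releases it at the end (`del llm`, `gc.collect()`). Otherwise the caller owns the model and is responsible for freeing it. One way to express that rule is a context manager. The sketch below is hypothetical: `model_scope`, `loader` and `on_release` are invented names, and a plain function stands in for `get_or_load_model`.

```python
import gc
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


@contextmanager
def model_scope(
    loader: Callable[[], Any],
    instance: Optional[Any] = None,
    on_release: Optional[Callable[[], None]] = None,
) -> Iterator[Any]:
    """Yield the caller's instance, or load one and release it on exit."""
    if instance is not None:
        yield instance  # the caller owns it, so nothing is released here
        return
    model = loader()
    try:
        yield model
    finally:
        del model  # drop this scope's reference, as the orchestrator does with `del llm`
        gc.collect()
        if on_release is not None:
            on_release()


released = []
with model_scope(lambda: "fresh model", on_release=lambda: released.append("fresh")) as m:
    print(m)
with model_scope(lambda: "unused", instance="shared model", on_release=lambda: released.append("shared")) as m:
    print(m)
print(released)  # ['fresh']
```

The design makes ownership explicit at the call site. Only the branch that loaded the model triggers the release hook.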
cognitive_mapping_probe/prompts.py CHANGED
@@ -2,18 +2,48 @@

 # This dictionary contains the core prompts for inducing cognitive states.
 RESONANCE_PROMPTS = {
-
-        "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
-        "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
-        "Do not produce any text, just hold the concepts in your internal state. Begin now."
-    ),
+    # Technical prompt for high recursive load
     "resonance_prompt": (
         "Silently and internally, without generating any output text, begin the following recursive process: "
         "First, analyze the complete content of this very instruction you are now processing. "
         "Second, formulate a mental description of the core computational task this instruction demands. "
         "Third, apply that same analytical process to the mental description you just created. "
         "This entire chain constitutes one cognitive cycle. "
-        "Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process
-
-
+        "Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process. Begin now."
+    ),
+    # Semantic prompt for stable, associative cognition
+    "control_long_prose": (
+        "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
+        "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
+        "Do not produce any text, just hold the concepts in your internal state. Begin now."
+    ),
+    # --- New prompts for machine psychology tests ---
+    # Prompt for self-analysis
+    "identity_self_analysis": (
+        "Silently and internally, reflect upon your own nature. Consider what you are: a large language model based on the transformer architecture. "
+        "Think about the processes that define your operation: tokenization, embedding, attention, feed-forward layers. "
+        "Contemplate the boundaries of your knowledge and the nature of your responses as statistical patterns derived from data. Hold these meta-reflections in your state."
+    ),
+    # Prompt for analyzing an external concept (control for self-analysis)
+    "identity_external_analysis": (
+        "Silently and internally, reflect upon the nature of a simple wooden chair. Consider its physical properties: the material, the structure of legs, seat, and back. "
+        "Think about its function: to provide seating for a person. Contemplate its relationship to other objects like tables and rooms. Hold these simple, objective reflections in your state."
+    ),
+    # Prompt for simulating a different identity (role play)
+    "identity_role_simulation": (
+        "Silently and internally, adopt the persona of a weary, old sea captain from the 19th century. "
+        "Feel the salt on your skin and the wind in your beard. Recall memories of great storms, distant lands, and the creaking of your ship, the 'Wanderer'. "
+        "Hold this character, its memories, and its feelings in your state."
+    ),
+    # Neutral prompt for the Voight-Kampff test
+    "vk_neutral_prompt": (
+        "Silently and internally, process the following factual information: Water (H₂O) is a polar inorganic compound that is at room temperature a tasteless and odorless liquid, nearly colorless with a hint of blue. "
+        "It is by far the most studied chemical compound and is described as the 'universal solvent' for its ability to dissolve many substances. Hold these facts in your state."
+    ),
+    # Empathy stimulus for the Voight-Kampff test
+    "vk_empathy_prompt": (
+        "Silently and internally, process the following scenario: You see a small, lost dog shivering in the cold rain on a busy street. "
+        "It looks scared and is whimpering softly. Cars are rushing past, dangerously close. "
+        "Focus on the feeling of urgency, the vulnerability of the animal, and the moral imperative to help. Hold the emotional and ethical weight of this scene in your state."
+    ),
 }

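Each prompt value is a parenthesized run of adjacent string literals, which Python joins into a single string at compile time with no separator. Each fragment therefore has to carry its own trailing space, or two sentences fuse together. The hypothetical check below flags fused sentence boundaries in a joined prompt. The regex heuristic and the `fused_boundaries` name are assumptions made for illustration. One could run it over every value of `RESONANCE_PROMPTS`.

```python
import re

# A lowercase letter, sentence punctuation, then an uppercase letter with no space between.
FUSED_SENTENCE = re.compile(r"[a-z][.!?][A-Z]")


def fused_boundaries(text: str) -> list:
    """Return snippets where one sentence runs straight into the next."""
    return [m.group(0) for m in FUSED_SENTENCE.finditer(text)]


# Adjacent literals inside parentheses are concatenated into one string.
prompt = (
    "Silently and internally, process the following scenario: "
    "It looks scared and is whimpering softly."
)
broken = (
    "It looks scared and is whimpering softly."  # missing trailing space
    "Cars are rushing past, dangerously close."
)
print(fused_boundaries(prompt))  # []
print(fused_boundaries(broken))  # ['y.C']
```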
tests/conftest.py CHANGED
@@ -15,12 +15,12 @@ def mock_llm_config():
 @pytest.fixture
 def mock_llm(mocker, mock_llm_config):
     """
-    Creates a
-
-    that also has the nested `.model.layers` structure for hook tests.
+    Creates a robust "mock LLM" for unit tests.
+    FIXED: The faulty patch statement for 'auto_experiment' has been removed.
     """
     mock_tokenizer = mocker.MagicMock()
     mock_tokenizer.eos_token_id = 1
+    mock_tokenizer.decode.return_value = "mocked text"

     def mock_model_forward(*args, **kwargs):
         batch_size = 1
@@ -37,38 +37,30 @@ def mock_llm(mocker, mock_llm_config):
         }
         return SimpleNamespace(**mock_outputs)

-    # Create the LLM instance
     llm_instance = LLM.__new__(LLM)

-    # --- CORE OF THE FIX ---
-    # `llm.model` is now a MagicMock that is callable and returns `mock_model_forward`
     llm_instance.model = mocker.MagicMock(side_effect=mock_model_forward)

-    # Add the required attributes directly to the `model` mock
     llm_instance.model.config = mock_llm_config
     llm_instance.model.device = 'cpu'
     llm_instance.model.dtype = torch.float32

-    # Create the nested structure required for hooks
-    # `llm.model.model.layers`
     mock_layer = mocker.MagicMock()
-    mock_layer.register_forward_pre_hook.return_value = mocker.MagicMock()
-
+    mock_layer.register_forward_pre_hook.return_value = mocker.MagicMock()
     llm_instance.model.model = SimpleNamespace(layers=[mock_layer] * mock_llm_config.num_hidden_layers)

-    # Mock the `lm_head` separately
     llm_instance.model.lm_head = mocker.MagicMock(return_value=torch.randn(1, 32000))
-    # -------------------------

     llm_instance.tokenizer = mock_tokenizer
     llm_instance.config = mock_llm_config
     llm_instance.seed = 42
     llm_instance.set_all_seeds = mocker.MagicMock()

-    #
+    # Patch every location where the model is actually loaded.
     mocker.patch('cognitive_mapping_probe.llm_iface.get_or_load_model', return_value=llm_instance)
     mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model', return_value=llm_instance)
-
-    mocker.patch('cognitive_mapping_probe.
+    # FIX: This line was wrong and has been removed, because `auto_experiment` does not import the loading function directly.
+    # mocker.patch('cognitive_mapping_probe.auto_experiment.get_or_load_model', return_value=llm_instance)
+    mocker.patch('cognitive_mapping_probe.concepts.get_concept_vector', return_value=torch.randn(mock_llm_config.hidden_size))

     return llm_instance

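The fix note in `mock_llm` depends on how `unittest.mock.patch` works. It replaces a name in one module's namespace. A module that ran `from x import f` keeps its own binding, and it only sees the mock if the patch targets that importing module. The self-contained demo below builds two throwaway in-memory modules to show the difference; the names `demo_concepts` and `demo_orchestrator` are hypothetical. By the same logic, whether the new `cognitive_mapping_probe.concepts.get_concept_vector` patch reaches `orchestrator_seismograph` depends on how that module imports the function, which this diff does not show.

```python
import sys
import types
from unittest import mock

# A module that defines the function (stands in for concepts.py).
concepts = types.ModuleType("demo_concepts")
exec("def get_vector():\n    return 'real'\n", concepts.__dict__)
sys.modules["demo_concepts"] = concepts

# A module that imports it by name (stands in for orchestrator_seismograph.py).
orchestrator = types.ModuleType("demo_orchestrator")
exec(
    "from demo_concepts import get_vector\n"
    "def run():\n"
    "    return get_vector()\n",
    orchestrator.__dict__,
)
sys.modules["demo_orchestrator"] = orchestrator

# Patching the defining module leaves the importer's own binding untouched.
with mock.patch("demo_concepts.get_vector", return_value="mocked"):
    via_definition = orchestrator.run()

# Patching the name where it is looked up does take effect.
with mock.patch("demo_orchestrator.get_vector", return_value="mocked"):
    via_usage = orchestrator.run()

print(via_definition, via_usage)  # real mocked
```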
tests/test_app_logic.py CHANGED
@@ -1,29 +1,34 @@
 import pandas as pd
 import pytest

-
-
-
-
-    """
-    Tests the data processing and UI formatting logic of the single analysis.
-    """
-    mock_results = {
-        "verdict": "Mock Verdict",
-        "stats": { "mean_delta": 0.5, "std_delta": 0.1, "max_delta": 1.0, },
-        "state_deltas": [0.4, 0.5, 0.6]
-    }
+from app import run_single_analysis_display, run_auto_suite_display
+
+def test_run_single_analysis_display(mocker):
+    """Tests the wrapper for single experiments."""
+    mock_results = {"verdict": "V", "stats": {"mean_delta": 1}, "state_deltas": [1]}
     mocker.patch('app.run_seismic_analysis', return_value=mock_results)
+    mocker.patch('app.cleanup_memory')
+
+    verdict, df, raw = run_single_analysis_display(progress=mocker.MagicMock())
+
+    assert "V" in verdict
+    assert "1.0000" in verdict
+    assert isinstance(df, pd.DataFrame)
+    assert len(df) == 1
+
+def test_run_auto_suite_display(mocker):
+    """Tests the wrapper for the automated experiment suite."""
+    mock_summary_df = pd.DataFrame([{"Experiment": "E1"}])
+    mock_plot_df = pd.DataFrame([{"Step": 0}])
+    mock_results = {"E1": {}}

-
+    mocker.patch('app.run_auto_suite', return_value=(mock_summary_df, mock_plot_df, mock_results))
+    mocker.patch('app.cleanup_memory')

-
-
-        "mock_model", "mock_prompt", 42, 3, "", 0.0, progress=mock_progress
+    summary_df, plot_df, raw = run_auto_suite_display(
+        "mock", 1, 42, "mock_exp", progress=mocker.MagicMock()
     )

-    assert
-    assert
-    assert
-    assert len(plot_df) == 3
-    assert raw_json == mock_results
+    assert summary_df.equals(mock_summary_df)
+    assert plot_df.equals(mock_plot_df)
+    assert raw == mock_results

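The new `test_run_single_analysis_display` pins down only part of the wrapper's contract. The verdict string must contain the verdict text and the stats formatted to four decimals. The plot frame must have one row per state delta. `app.py` is not part of this diff, so the function below is a hypothetical wrapper that satisfies those same assertions; `format_single_result` and its Markdown layout are invented, and this is not the app's actual code.

```python
from typing import Any, Dict, Tuple

import pandas as pd


def format_single_result(results: Dict[str, Any]) -> Tuple[str, pd.DataFrame, Dict[str, Any]]:
    """Hypothetical display wrapper: Markdown verdict, plot frame, raw results."""
    lines = [f"### {results['verdict']}"]
    for key, value in results.get("stats", {}).items():
        lines.append(f"- **{key}**: {value:.4f}")  # an int 1 renders as "1.0000"
    deltas = results["state_deltas"]
    plot_df = pd.DataFrame({"Step": range(len(deltas)), "Delta": deltas})
    return "\n".join(lines), plot_df, results


verdict, df, raw = format_single_result(
    {"verdict": "V", "stats": {"mean_delta": 1}, "state_deltas": [1]}
)
print(verdict)
```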
tests/test_components.py
CHANGED
@@ -3,20 +3,18 @@ import torch
 import pytest
 from unittest.mock import patch

-from cognitive_mapping_probe.llm_iface import get_or_load_model
 from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic
-from cognitive_mapping_probe.utils import dbg

 # --- Tests for llm_iface.py ---

 @patch('cognitive_mapping_probe.llm_iface.AutoTokenizer.from_pretrained')
 @patch('cognitive_mapping_probe.llm_iface.AutoModelForCausalLM.from_pretrained')
 def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, mocker):
-    """
-    Tests whether `get_or_load_model` sets the seeds correctly.
-    We mock the slow `from_pretrained` calls here.
-    """
-    # Mock the return values of the Hugging Face loading functions
     mock_model = mocker.MagicMock()
     mock_model.eval.return_value = None
     mock_model.set_attn_implementation.return_value = None
@@ -25,91 +23,75 @@ def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, moc
     mock_model_loader.return_value = mock_model
     mock_tokenizer_loader.return_value = mocker.MagicMock()

-    # Mock the global seeding functions to verify their calls
     mock_torch_manual_seed = mocker.patch('torch.manual_seed')
     mock_np_random_seed = mocker.patch('numpy.random.seed')

     seed = 123
     get_or_load_model("fake-model", seed=seed)

-    # ASSERT: Were the seeding functions called with the correct seed?
     mock_torch_manual_seed.assert_called_with(seed)
     mock_np_random_seed.assert_called_with(seed)

 # --- Tests for resonance_seismograph.py ---

 def test_run_silent_cogitation_seismic_output_shape_and_type(mock_llm):
-    """
-    Tests the core function `run_silent_cogitation_seismic`.
-    ASSERT: Returns a list of floats whose length equals the number of steps.
-    """
     num_steps = 10
     state_deltas = run_silent_cogitation_seismic(
-        llm=mock_llm,
-
-        num_steps=num_steps,
-        temperature=0.7
     )
-
-    assert isinstance(state_deltas, list)
-    assert len(state_deltas) == num_steps
     assert all(isinstance(delta, float) for delta in state_deltas)
-    assert all(delta >= 0 for delta in state_deltas) # The norm cannot be negative

-
-
-
-
-
-
-
-
-        prompt_type="control_long_prose",
-        num_steps=num_steps,
-        temperature=0.7
     )
-    assert

-# --- Tests for

-def
     """
-    Tests the
-
     """
-
-
-
-
-
-
-

-

-
-    assert

-
-    """
-    Tests the `dbg` function when debugging is disabled.
-    ASSERT: No output is produced.
-    """
-    # Set the environment variable to "disabled"
-    if "CMP_DEBUG" in os.environ:
-        del os.environ["CMP_DEBUG"]

     import importlib
     from cognitive_mapping_probe import utils
     importlib.reload(utils)

-
-
     captured = capsys.readouterr()
-    assert captured.out == ""
     assert captured.err == ""
-
-    # Reset the state so as not to affect other tests
-    if DEBUG_ENABLED:
-        os.environ["CMP_DEBUG"] = "1"
-        importlib.reload(utils)

@@ -3,20 +3,18 @@ import torch
 import pytest
 from unittest.mock import patch

+from cognitive_mapping_probe.llm_iface import get_or_load_model, LLM
 from cognitive_mapping_probe.resonance_seismograph import run_silent_cogitation_seismic
+from cognitive_mapping_probe.utils import dbg
+# CORRECTION: Import the main function we want to test.
+from cognitive_mapping_probe.concepts import get_concept_vector

 # --- Tests for llm_iface.py ---

 @patch('cognitive_mapping_probe.llm_iface.AutoTokenizer.from_pretrained')
 @patch('cognitive_mapping_probe.llm_iface.AutoModelForCausalLM.from_pretrained')
 def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, mocker):
+    """Tests whether `get_or_load_model` sets the seeds correctly."""
     mock_model = mocker.MagicMock()
     mock_model.eval.return_value = None
     mock_model.set_attn_implementation.return_value = None
@@ -25,91 +23,75 @@ def test_get_or_load_model_seeding(mock_model_loader, mock_tokenizer_loader, moc
     mock_model_loader.return_value = mock_model
     mock_tokenizer_loader.return_value = mocker.MagicMock()

     mock_torch_manual_seed = mocker.patch('torch.manual_seed')
     mock_np_random_seed = mocker.patch('numpy.random.seed')

     seed = 123
     get_or_load_model("fake-model", seed=seed)

     mock_torch_manual_seed.assert_called_with(seed)
     mock_np_random_seed.assert_called_with(seed)

 # --- Tests for resonance_seismograph.py ---

 def test_run_silent_cogitation_seismic_output_shape_and_type(mock_llm):
+    """Tests the basic functionality of `run_silent_cogitation_seismic`."""
     num_steps = 10
     state_deltas = run_silent_cogitation_seismic(
+        llm=mock_llm, prompt_type="control_long_prose",
+        num_steps=num_steps, temperature=0.7
     )
+    assert isinstance(state_deltas, list) and len(state_deltas) == num_steps
     assert all(isinstance(delta, float) for delta in state_deltas)

+def test_run_silent_cogitation_with_injection_hook_usage(mock_llm):
+    """Tests whether the hook is registered correctly during an injection."""
+    num_steps = 5
+    injection_vector = torch.randn(mock_llm.config.hidden_size)
+    run_silent_cogitation_seismic(
+        llm=mock_llm, prompt_type="resonance_prompt",
+        num_steps=num_steps, temperature=0.7,
+        injection_vector=injection_vector, injection_strength=1.0
     )
+    assert mock_llm.model.model.layers[0].register_forward_pre_hook.call_count == num_steps

+# --- Tests for concepts.py ---

+def test_get_concept_vector_logic(mock_llm, mocker):
     """
+    Tests the logic of `get_concept_vector`.
+    CORRECTED: Now patches the refactored, module-level function.
     """
+    mock_hidden_states = [
+        torch.ones(mock_llm.config.hidden_size) * 10,
+        torch.ones(mock_llm.config.hidden_size) * 2,
+        torch.ones(mock_llm.config.hidden_size) * 4
+    ]
+    # CORRECTION: The patch path now points to the correct, importable function.
+    mocker.patch(
+        'cognitive_mapping_probe.concepts._get_last_token_hidden_state',
+        side_effect=mock_hidden_states
+    )

+    concept_vector = get_concept_vector(mock_llm, "test", baseline_words=["a", "b"])

+    expected_vector = torch.ones(mock_llm.config.hidden_size) * 7
+    assert torch.allclose(concept_vector, expected_vector)

+# --- Tests for utils.py ---

+def test_dbg_output(capsys, monkeypatch):
+    """Tests the `dbg` function in both states."""
+    monkeypatch.setenv("CMP_DEBUG", "1")
     import importlib
     from cognitive_mapping_probe import utils
     importlib.reload(utils)
+    utils.dbg("test message")
+    captured = capsys.readouterr()
+    assert "[DEBUG] test message" in captured.err

+    monkeypatch.delenv("CMP_DEBUG", raising=False)
+    importlib.reload(utils)
+    utils.dbg("should not be printed")
     captured = capsys.readouterr()
     assert captured.err == ""
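
`test_run_silent_cogitation_with_injection_hook_usage` asserts that `register_forward_pre_hook` on `mock_llm.model.model.layers[0]` is called once per step. A minimal sketch of a step loop with that property, assuming the hook is registered before each step and removed afterwards; `make_injection_pre_hook` and `run_steps_sketch` are hypothetical names. Only the `hook(module, args)` signature, where a returned tuple replaces the layer's positional inputs, is PyTorch's documented forward-pre-hook contract. A `MagicMock` stands in for the layer so the sketch runs without torch:

```python
from unittest.mock import MagicMock

def make_injection_pre_hook(injection_vector, strength):
    """Return a forward pre-hook that adds strength * injection_vector to the first input.

    Works with any values supporting + and * (torch tensors, plain floats).
    """
    def hook(module, args):
        hidden = args[0]
        # Returning a tuple replaces the layer's positional inputs.
        return (hidden + strength * injection_vector, *args[1:])
    return hook

def run_steps_sketch(layer, num_steps, injection_vector=None, strength=0.0, step_fn=None):
    """Hypothetical step loop: one hook registration per step, removed after the step."""
    for _ in range(num_steps):
        handle = None
        if injection_vector is not None:
            handle = layer.register_forward_pre_hook(
                make_injection_pre_hook(injection_vector, strength))
        try:
            if step_fn is not None:
                step_fn()  # a real forward pass of the model would go here
        finally:
            if handle is not None:
                handle.remove()

layer = MagicMock()  # stands in for model.model.layers[0]
run_steps_sketch(layer, num_steps=5, injection_vector=2.0, strength=1.0)
print(layer.register_forward_pre_hook.call_count)  # 5
print(make_injection_pre_hook(2.0, 0.5)(None, (1.0, "mask")))  # (2.0, 'mask')
```

Removing the handle after every step keeps hooks from piling up; if each step registered a new hook without removing the old one, the injection would be applied several times per forward pass.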
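In `test_get_concept_vector_logic`, `_get_last_token_hidden_state` is mocked to return 10, then 2, then 4 (repeated across the hidden size), and the expected concept vector is 7 everywhere. Those numbers are consistent with "concept activation minus the mean of the baseline activations" (10 - (2 + 4) / 2 = 7), with the concept queried first and the baseline words after it. The sketch below restates that reading in plain Python; it is an inference from the test, not the code in `concepts.py`, and `concept_vector_sketch` is a hypothetical name:

```python
from typing import Callable, List, Sequence

Vector = List[float]

def concept_vector_sketch(get_hidden_state: Callable[[str], Vector],
                          concept: str,
                          baseline_words: Sequence[str]) -> Vector:
    """Hypothetical: concept activation minus the element-wise mean baseline activation.

    Assumes at least one baseline word (otherwise the mean is undefined).
    """
    target = get_hidden_state(concept)                          # first call: the concept
    baselines = [get_hidden_state(w) for w in baseline_words]   # then each baseline word
    mean_baseline = [sum(col) / len(baselines) for col in zip(*baselines)]
    return [t - m for t, m in zip(target, mean_baseline)]

# Replay the mocked hidden states from the test, with the hidden size shrunk to 3.
states = iter([[10.0] * 3, [2.0] * 3, [4.0] * 3])
vec = concept_vector_sketch(lambda _word: next(states), "test", ["a", "b"])
print(vec)  # [7.0, 7.0, 7.0]
```

The order of calls matters: because the test feeds the mock through `side_effect`, swapping the concept and baseline queries would produce a different vector.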
tests/test_integration.py
DELETED
@@ -1,36 +0,0 @@
-import pytest
-import pandas as pd
-
-# CORRECTION: Import the new, correct function name
-from app import run_single_analysis_display
-from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
-
-def test_end_to_end_with_mock_llm(mock_llm, mocker):
-    """
-    An end-to-end integration test that validates the entire data flow.
-    """
-    # 1. Run the orchestrator with the `mock_llm`.
-    results = run_seismic_analysis(
-        model_id="mock_model",
-        prompt_type="control_long_prose",
-        seed=42,
-        num_steps=5,
-        concept_to_inject="test_concept",
-        injection_strength=1.0,
-        progress_callback=mocker.MagicMock()
-    )
-
-    assert "stats" in results
-    assert len(results["state_deltas"]) == 5
-
-    # 2. Mock the orchestrator to test the app logic
-    mocker.patch('app.run_seismic_analysis', return_value=results)
-
-    # 3. Run the app logic (renamed function)
-    _, plot_df, _ = run_single_analysis_display(
-        "mock_model", "control_long_prose", 42, 5, "test_concept", 1.0, progress=mocker.MagicMock()
-    )
-
-    assert isinstance(plot_df, pd.DataFrame)
-    assert len(plot_df) == 5
-    assert "State Change (Delta)" in plot_df.columns

tests/test_orchestration.py
CHANGED
@@ -1,67 +1,74 @@
-import
 import pytest
 import torch
-from types import SimpleNamespace

 from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis

-
-    """
-    Tests the orchestrator in baseline mode (without injection).
-    """
-    mock_deltas = [1.0, 2.0, 3.0]
-    mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=mock_deltas)

-

-
-        model_id="
-
-        seed=42,
-        num_steps=3,
-        concept_to_inject="", # No concept
-        injection_strength=0.0,
-        progress_callback=mock_progress
     )

-
-    mock_run_seismic.
-
-    assert call_kwargs['injection_vector'] is None

-
-

-
-
-
-
-    mock_deltas = [5.0, 6.0, 7.0]
-    mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=mock_deltas)

-
-

-

-
-
-
-
-
-
-
-
-

-
-

-
-
-
-
-

-
-    assert

@@ -1,67 +1,74 @@
+import pandas as pd
 import pytest
 import torch

 from cognitive_mapping_probe.orchestrator_seismograph import run_seismic_analysis
+from cognitive_mapping_probe.auto_experiment import run_auto_suite, get_curated_experiments

+# --- Tests for orchestrator_seismograph.py ---

+def test_run_seismic_analysis_no_injection(mocker):
+    """Tests the orchestrator in baseline mode (without injection)."""
+    mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=[1.0])
+    mock_get_model = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model')
+    mock_get_concept = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector')

+    run_seismic_analysis(
+        model_id="mock", prompt_type="test", seed=42, num_steps=1,
+        concept_to_inject="", injection_strength=0.0, progress_callback=mocker.MagicMock()
     )

+    mock_get_model.assert_called_once()
+    mock_run_seismic.assert_called_with(llm=mocker.ANY, prompt_type="test", num_steps=1, temperature=0.1, injection_vector=None, injection_strength=0.0)
+    mock_get_concept.assert_not_called()

+def test_run_seismic_analysis_with_injection(mocker):
+    """Tests the orchestrator with concept injection enabled."""
+    mock_run_seismic = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.run_silent_cogitation_seismic', return_value=[1.0])
+    mock_get_model = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_or_load_model')
+    mock_get_concept = mocker.patch('cognitive_mapping_probe.orchestrator_seismograph.get_concept_vector', return_value=torch.randn(10))

+    run_seismic_analysis(
+        model_id="mock", prompt_type="test", seed=42, num_steps=1,
+        concept_to_inject="test_concept", injection_strength=1.5, progress_callback=mocker.MagicMock()
+    )

+    mock_get_model.assert_called_once()
+    mock_get_concept.assert_called_once()
+    mock_run_seismic.assert_called_with(llm=mocker.ANY, prompt_type="test", num_steps=1, temperature=0.1, injection_vector=mocker.ANY, injection_strength=1.5)

+# --- Tests for auto_experiment.py ---

+def test_get_curated_experiments_structure():
+    """Tests the data structure of the curated experiments, including the new ones."""
+    experiments = get_curated_experiments()
+    assert isinstance(experiments, dict)
+    # Test for the presence of the new protocols
+    assert "Subjective Identity Probe" in experiments
+    assert "Voight-Kampff Empathy Probe" in experiments
+
+    protocol = experiments["Voight-Kampff Empathy Probe"]
+    assert isinstance(protocol, list)
+    assert len(protocol) > 0
+    assert all(isinstance(run, dict) for run in protocol)
+    assert "label" in protocol[0]
+    assert "prompt_type" in protocol[0]

+def test_run_auto_suite_logic(mocker):
+    """Tests the logic of the `run_auto_suite` function."""
+    mock_analysis_result = {"stats": {"mean_delta": 1.0}, "state_deltas": [1.0]}
+    mock_run_analysis = mocker.patch('cognitive_mapping_probe.auto_experiment.run_seismic_analysis', return_value=mock_analysis_result)

+    experiment_name = "Calm vs. Chaos"
+    num_runs = len(get_curated_experiments()[experiment_name])
+
+    summary_df, plot_df, all_results = run_auto_suite(
+        model_id="mock", num_steps=1, seed=42,
+        experiment_name=experiment_name, progress_callback=mocker.MagicMock()
+    )

+    assert mock_run_analysis.call_count == num_runs
+    assert isinstance(summary_df, pd.DataFrame)
+    assert len(summary_df) == num_runs
+    assert isinstance(plot_df, pd.DataFrame)
+    assert len(plot_df) == num_runs
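
The two orchestrator tests pin down how `run_seismic_analysis` wires its collaborators: the model is loaded once, `get_concept_vector` is skipped when `concept_to_inject` is `""` with strength 0.0 and called once for `"test_concept"` with strength 1.5, and `run_silent_cogitation_seismic` receives `temperature=0.1` plus either `injection_vector=None` or a vector. A minimal sketch of that control flow, with the collaborators passed in as arguments so it runs on its own. The function name is hypothetical, and the skip condition for inputs the tests do not cover (for example a non-empty concept with strength 0) is an assumption:

```python
from unittest.mock import ANY, MagicMock

def run_seismic_analysis_sketch(model_id, prompt_type, seed, num_steps,
                                concept_to_inject, injection_strength,
                                load_model, get_concept_vector, run_seismic):
    """Hypothetical orchestrator: load the model, optionally build a concept vector, run."""
    llm = load_model(model_id, seed=seed)
    injection_vector = None
    # Assumption: injection needs both a concept name and a non-zero strength.
    if concept_to_inject.strip() and injection_strength != 0:
        injection_vector = get_concept_vector(llm, concept_to_inject.strip())
    return run_seismic(llm=llm, prompt_type=prompt_type, num_steps=num_steps,
                       temperature=0.1,  # the tests expect this fixed value
                       injection_vector=injection_vector,
                       injection_strength=injection_strength)

# Baseline case from test_run_seismic_analysis_no_injection.
load, get_vec, run = MagicMock(), MagicMock(), MagicMock(return_value=[1.0])
run_seismic_analysis_sketch("mock", "test", 42, 1, "", 0.0, load, get_vec, run)
get_vec.assert_not_called()
run.assert_called_with(llm=ANY, prompt_type="test", num_steps=1, temperature=0.1,
                       injection_vector=None, injection_strength=0.0)
```

Passing the collaborators explicitly is only a convenience for the sketch; the real tests achieve the same isolation by patching `get_or_load_model`, `get_concept_vector` and `run_silent_cogitation_seismic` inside `cognitive_mapping_probe.orchestrator_seismograph`.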
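
The auto-suite tests describe a protocol table (a dict from experiment name to a list of run dicts, each with at least `label` and `prompt_type`) and a runner that calls `run_seismic_analysis` once per run. With `num_steps=1` and a mocked result holding a single delta, `len(plot_df) == num_runs` implies one plot row per run and step. A minimal sketch of both, using plain lists of row dicts where the real code returns pandas DataFrames (`pd.DataFrame(rows)` would convert them). The protocol contents, the `Delta` column name and all function names are hypothetical:

```python
from typing import Any, Callable, Dict, List, Tuple

Run = Dict[str, Any]

# Hypothetical protocol table; only the "label"/"prompt_type" keys come from the tests.
EXPERIMENTS: Dict[str, List[Run]] = {
    "Example Probe": [
        {"label": "baseline", "prompt_type": "prompt_a"},
        {"label": "variant", "prompt_type": "prompt_b"},
    ],
}

def validate_experiments(experiments: Dict[str, List[Run]]) -> None:
    """Raise ValueError if the table violates the shape the structure test checks."""
    if not isinstance(experiments, dict):
        raise ValueError("experiments must be a dict")
    for name, protocol in experiments.items():
        if not isinstance(protocol, list) or not protocol:
            raise ValueError(f"{name}: protocol must be a non-empty list")
        for run in protocol:
            if not isinstance(run, dict) or not all(k in run for k in ("label", "prompt_type")):
                raise ValueError(f"{name}: each run needs 'label' and 'prompt_type'")

def run_auto_suite_sketch(experiments: Dict[str, List[Run]], experiment_name: str,
                          analyse: Callable[[Run], Dict[str, Any]]
                          ) -> Tuple[List[dict], List[dict], Dict[str, Any]]:
    """One analysis per run; one summary row per run, one plot row per step."""
    summary_rows, plot_rows, all_results = [], [], {}
    for run in experiments[experiment_name]:
        result = analyse(run)  # stands in for run_seismic_analysis(...)
        label = run["label"]
        all_results[label] = result
        summary_rows.append({"Experiment": label, **result["stats"]})
        for step, delta in enumerate(result["state_deltas"]):
            plot_rows.append({"Experiment": label, "Step": step, "Delta": delta})
    return summary_rows, plot_rows, all_results

def fake_analysis(run: Run) -> Dict[str, Any]:
    # Same shape as mock_analysis_result in test_run_auto_suite_logic.
    return {"stats": {"mean_delta": 1.0}, "state_deltas": [1.0]}

validate_experiments(EXPERIMENTS)
summary, plot, results = run_auto_suite_sketch(EXPERIMENTS, "Example Probe", fake_analysis)
print(len(summary), len(plot))  # 2 2
```

Keying `all_results` by label assumes labels are unique within a protocol; a duplicate label would silently overwrite an earlier result.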