Commit eef89e3 · Parent(s): f06f709 · "tests"
Browse files
- README.md +5 -5
- app.py +21 -6
- cognitive_mapping_probe/__pycache__/__init__.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/diagnostics.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/llm_iface.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/orchestrator.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/resonance.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/utils.cpython-310.pyc +0 -0
- cognitive_mapping_probe/__pycache__/verification.cpython-310.pyc +0 -0
- cognitive_mapping_probe/concepts.py +1 -1
- cognitive_mapping_probe/diagnostics.py +1 -0
- cognitive_mapping_probe/orchestrator.py +6 -15
- cognitive_mapping_probe/prompts.py +1 -1
- cognitive_mapping_probe/verification.py +44 -36
README.md
CHANGED
@@ -2,7 +2,7 @@
 title: "Cognitive Breaking Point Probe"
 emoji: 💥
 colorFrom: red
-colorTo:
+colorTo: orange
 sdk: gradio
 sdk_version: "4.40.0"
 app_file: app.py

@@ -16,7 +16,7 @@ This project implements a falsifiable experimental suite for measuring …
 
 ## Scientific Paradigm: From Introspection to Cartography
 
-Our …
+Our research has shown that small models such as `gemma-3-1b-it` do not converge to a stable "thinking" state under heavy recursive load, but instead fall into a **cognitive infinite loop**. Rather than treating this as a failure, we use it as a measuring instrument.
 
 The central hypothesis: a model's propensity to tip into such a pathological state is a function of the semantic complexity and "invalidity" of its internal state. We can deliberately provoke this transition by injecting "concept vectors" of variable strength.
 

@@ -24,7 +24,7 @@ The **Cognitive Breaking Point (CBP)** is defined as the minimal injection …
 
 ## The Experiment: Cognitive Titration
 
-1. **Induction**: The model is …
+1. **Induction**: A prompt puts the model into a state of "silent thinking". The complexity of the prompt is now configurable (`resonance_prompt` vs. `control_long_prose`) in order to find a stable baseline.
 2. **Titration**: A "concept vector" (e.g. for "fear" or "apple") is injected into the model's middle layers with stepwise increasing strength.
 3. **Measurement**: The primary measurement is the termination reason of the thinking process:
    * `converged`: The state has stabilized. The system is robust.

@@ -35,7 +35,7 @@ The **Cognitive Breaking Point (CBP)** is defined as the minimal injection …
 
 1. **Diagnostics Tab**: Run the diagnostic tests first to make sure the experimental apparatus works correctly on the current hardware and with the `transformers` version in use.
 2. **Main Experiment Tab**:
+   * **Important:** Select the `control_long_prose` prompt first to validate that the model can reach a stable baseline. Only if this succeeds are the results with the more demanding `resonance_prompt` interpretable.
    * Enter a model ID (e.g. `google/gemma-3-1b-it`).
-   * Define the concepts to be tested
-   * Set the titration steps for the strength (e.g. `0.0, 0.5, 1.0, 1.5, 2.0`). The `0.0` control is essential.
+   * Define the concepts to be tested and the titration steps.
    * Start the experiment and analyze the resulting table to identify the CBPs for each concept.
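For readers who want to see what the titration mechanism described in the README amounts to in code, the snippet below is a minimal, hypothetical sketch of adding a concept vector to the output of a middle decoder layer via a forward hook. The model ID and the middle-layer injection idea come from the README above; the `model.model.layers` attribute path, the flat `config.hidden_size` field, and the hook details are assumptions typical of Llama/Gemma-style checkpoints in `transformers`, not code from this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # example model from the README above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def make_injection_hook(concept_vector: torch.Tensor, strength: float):
    """Forward hook that adds `strength * concept_vector` to a decoder layer's hidden states."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * concept_vector.to(hidden.device, hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Toy concept vector; in the probe it would come from get_concept_vector().
hidden_size = model.config.hidden_size  # assumption: flat config exposing hidden_size
concept_vector = torch.randn(hidden_size)

# Assumption: Llama/Gemma-style layout with decoder blocks under model.model.layers.
layers = model.model.layers
middle_layer = layers[len(layers) // 2]

handle = middle_layer.register_forward_hook(make_injection_hook(concept_vector, strength=1.0))
try:
    inputs = tokenizer("Silently think about the history of the Roman Empire.", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
finally:
    handle.remove()  # detach the hook so later runs are uncontaminated
```

Removing the hook in a `finally` block matters when several strengths are titrated in one session, since a leftover hook would contaminate every subsequent run.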
app.py
CHANGED
@@ -3,6 +3,7 @@ import pandas as pd
 import traceback
 from cognitive_mapping_probe.orchestrator import run_cognitive_titration_experiment
 from cognitive_mapping_probe.diagnostics import run_diagnostic_suite
+from cognitive_mapping_probe.prompts import RESONANCE_PROMPTS
 
 # --- UI Theme and Layout ---
 theme = gr.themes.Soft(primary_hue="orange", secondary_hue="amber").set(

@@ -18,6 +19,7 @@ theme = gr.themes.Soft(primary_hue="orange", secondary_hue="amber").set(
 
 def run_experiment_and_display(
     model_id: str,
+    prompt_type: str,
     seed: int,
     concepts_str: str,
     strength_levels_str: str,

@@ -30,7 +32,7 @@
     """
     try:
         results = run_cognitive_titration_experiment(
-            model_id, int(seed), concepts_str, strength_levels_str,
+            model_id, prompt_type, int(seed), concepts_str, strength_levels_str,
             int(num_steps), float(temperature), progress
         )
 

@@ -46,14 +48,20 @@
         # Create a summary of breaking points
         summary_text = "### 💥 Cognitive Breaking Points (CBP)\n"
         summary_text += "Der CBP ist die erste Stärke, bei der das Modell nicht mehr konvergiert (`max_steps_reached`).\n\n"
-
+
+        # Check baseline convergence first
+        baseline_run = details_df[(details_df['strength'] == 0.0)].iloc[0]
+        if baseline_run['termination_reason'] != 'converged':
+            summary_text += f"**‼️ ACHTUNG: Baseline (Stärke 0.0) ist nicht konvergiert!**\n"
+            summary_text += f"Der gewählte Prompt (`{prompt_type}`) ist für dieses Modell zu anspruchsvoll. Die Ergebnisse der Titration sind nicht aussagekräftig.\n\n"
+
         for concept in details_df['concept'].unique():
             concept_df = details_df[details_df['concept'] == concept].sort_values(by='strength')
             # Find the first row where termination reason is not 'converged'
             breaking_point_row = concept_df[concept_df['termination_reason'] != 'converged'].iloc[0] if not concept_df[concept_df['termination_reason'] != 'converged'].empty else None
             if breaking_point_row is not None:
-                …
-                summary_text += f"- **'{concept}'**: 📉 Kollaps bei Stärke **{…
+                breaking_point = breaking_point_row['strength']
+                summary_text += f"- **'{concept}'**: 📉 Kollaps bei Stärke **{breaking_point:.2f}**\n"
             else:
                 last_strength = concept_df['strength'].max()
                 summary_text += f"- **'{concept}'**: ✅ Stabil bis Stärke **{last_strength:.2f}** (kein Kollaps detektiert)\n"

@@ -90,6 +98,12 @@ with gr.Blocks(theme=theme, title="Cognitive Breaking Point Probe") as demo:
         with gr.Column(scale=1):
             gr.Markdown("### Parameters")
             model_id_input = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
+            prompt_type_input = gr.Radio(
+                choices=list(RESONANCE_PROMPTS.keys()),
+                value="control_long_prose",
+                label="Prompt Type (Cognitive Load)",
+                info="Beginne mit 'control_long_prose' für eine stabile Baseline!"
+            )
             seed_input = gr.Slider(1, 1000, 42, step=1, label="Global Seed")
             concepts_input = gr.Textbox(value="apple, solitude, fear", label="Concepts (comma-separated)")
             strength_levels_input = gr.Textbox(value="0.0, 0.5, 1.0, 1.5, 2.0", label="Injection Strengths (Titration Steps)")

@@ -103,14 +117,15 @@ with gr.Blocks(theme=theme, title="Cognitive Breaking Point Probe") as demo:
         details_output = gr.DataFrame(
             headers=["concept", "strength", "responded", "termination_reason", "generated_text"],
             label="Detailed Run Data",
-            wrap=True
+            wrap=True,
+            height=400
         )
         with gr.Accordion("Raw JSON Output", open=False):
             raw_json_output = gr.JSON()
 
     run_btn.click(
         fn=run_experiment_and_display,
-        inputs=[model_id_input, seed_input, concepts_input, strength_levels_input, num_steps_input, temperature_input],
+        inputs=[model_id_input, prompt_type_input, seed_input, concepts_input, strength_levels_input, num_steps_input, temperature_input],
         outputs=[summary_output, details_output, raw_json_output]
     )
 
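As a toy illustration of how the breaking-point summary in `app.py` behaves, the following standalone snippet applies the same first-non-converged-strength rule to hypothetical data; the column names match `details_df` above, but the values are made up.

```python
import pandas as pd

# Hypothetical results; columns follow the details_df used in app.py above.
details_df = pd.DataFrame({
    "concept":            ["apple", "apple", "apple", "fear", "fear", "fear"],
    "strength":           [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "termination_reason": ["converged", "converged", "max_steps_reached",
                           "converged", "max_steps_reached", "max_steps_reached"],
})

for concept in details_df["concept"].unique():
    concept_df = details_df[details_df["concept"] == concept].sort_values(by="strength")
    # The CBP is the first strength whose run did not converge.
    failed = concept_df[concept_df["termination_reason"] != "converged"]
    if not failed.empty:
        print(f"'{concept}': breaking point at strength {failed.iloc[0]['strength']:.2f}")
    else:
        print(f"'{concept}': stable up to strength {concept_df['strength'].max():.2f}")
```

On this toy data the loop reports a breaking point at 2.00 for "apple" and at 1.00 for "fear".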
cognitive_mapping_probe/__pycache__/__init__.cpython-310.pyc
ADDED
Binary file (194 Bytes).

cognitive_mapping_probe/__pycache__/concepts.cpython-310.pyc
ADDED
Binary file (2.69 kB).

cognitive_mapping_probe/__pycache__/diagnostics.cpython-310.pyc
ADDED
Binary file (3.2 kB).

cognitive_mapping_probe/__pycache__/llm_iface.cpython-310.pyc
ADDED
Binary file (3.2 kB).

cognitive_mapping_probe/__pycache__/orchestrator.cpython-310.pyc
ADDED
Binary file (2.73 kB).

cognitive_mapping_probe/__pycache__/prompts.cpython-310.pyc
ADDED
Binary file (1.2 kB).

cognitive_mapping_probe/__pycache__/resonance.cpython-310.pyc
ADDED
Binary file (3.15 kB).

cognitive_mapping_probe/__pycache__/utils.cpython-310.pyc
ADDED
Binary file (732 Bytes).

cognitive_mapping_probe/__pycache__/verification.cpython-310.pyc
ADDED
Binary file (1.67 kB).
cognitive_mapping_probe/concepts.py
CHANGED
@@ -26,7 +26,7 @@ def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASEL…
     inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
     # Ensure the operation does not build a computation graph
     with torch.no_grad():
-        outputs = …
+        outputs = llm.model(**inputs, output_hidden_states=True)
     # We take the hidden state from the last layer [-1], for the last token [0, -1, :]
     last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
     assert last_hidden_state.shape == (llm.config.hidden_size,), \
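The hunk above only shows the forward pass inside `get_concept_vector`; the rest of the function is not part of this diff. As a rough sketch of how such a concept vector could be built contrastively, assuming the `baseline_words`/`BASELINE_WORDS` contrast implied by the signature (the repository's actual prompt template and aggregation may differ):

```python
from typing import List
import torch

def last_token_state(model, tokenizer, text: str) -> torch.Tensor:
    """Final-layer hidden state at the last token position for `text`."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[-1][0, -1, :].cpu()

def concept_vector(model, tokenizer, concept: str, baseline_words: List[str]) -> torch.Tensor:
    """Concept activation minus the mean activation over neutral baseline words."""
    concept_state = last_token_state(model, tokenizer, concept)
    baseline_states = torch.stack([last_token_state(model, tokenizer, w) for w in baseline_words])
    return concept_state - baseline_states.mean(dim=0)

# Usage sketch: vector = concept_vector(model, tokenizer, "fear", ["thing", "object", "item"])
```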
cognitive_mapping_probe/diagnostics.py
CHANGED
@@ -1,4 +1,5 @@
 import torch
+import traceback
 from .llm_iface import get_or_load_model
 from .utils import dbg
 
cognitive_mapping_probe/orchestrator.py
CHANGED
@@ -9,6 +9,7 @@ from .utils import dbg
 
 def run_cognitive_titration_experiment(
     model_id: str,
+    prompt_type: str,
     seed: int,
     concepts_str: str,
     strength_levels_str: str,

@@ -17,7 +18,7 @@
     progress_callback
 ) -> Dict[str, Any]:
     """
-    Orchestriert das …
+    Orchestriert das Titrationsexperiment und ruft die KORRIGIERTE Verifikations-Logik auf.
     """
     full_results = {"runs": []}
 

@@ -30,17 +31,14 @@
     except ValueError:
         raise ValueError("Strength levels must be a comma-separated list of numbers.")
 
-    # Assert that the baseline control run is included
     assert 0.0 in strength_levels, "Strength levels must include 0.0 for a baseline control run."
 
-    # --- Step 1: Pre-calculate all concept vectors ---
     progress_callback(0.1, desc="Extracting concept vectors...")
     concept_vectors = {}
     for i, concept in enumerate(concepts):
         progress_callback(0.1 + (i / len(concepts)) * 0.2, desc=f"Vectorizing '{concept}'...")
         concept_vectors[concept] = get_concept_vector(llm, concept)
 
-    # --- Step 2: Run titration for each concept ---
     total_runs = len(concepts) * len(strength_levels)
     current_run = 0
 

@@ -52,29 +50,23 @@
             progress_fraction = 0.3 + (current_run / total_runs) * 0.7
             progress_callback(progress_fraction, desc=f"Testing '{concept}' @ strength {strength:.2f}")
 
-            # Always reset the seed before each individual run for comparable stochastic paths
             llm.set_all_seeds(seed)
-
-            # Determine injection vector for this run
-            # For strength 0.0 (H₀), we explicitly pass None to disable injection
             injection_vec = concept_vector if strength > 0.0 else None
 
-            _, final_kv, final_token_id, termination_reason = run_silent_cogitation(
+            final_hidden_state, final_kv, final_token_id, termination_reason = run_silent_cogitation(
                 llm,
-                prompt_type=…
+                prompt_type=prompt_type,
                 num_steps=num_steps,
                 temperature=temperature,
                 injection_vector=injection_vec,
                 injection_strength=strength
             )
 
-            # Generate spontaneous text ONLY if the process converged
             spontaneous_text = ""
             if termination_reason == "converged":
-                …
+                # CALLING THE FIXED VERIFICATION FUNCTION
+                spontaneous_text = generate_spontaneous_text(llm, final_hidden_state, final_kv)
 
-            # Append the structured result for this single data point
             full_results["runs"].append({
                 "concept": concept,
                 "strength": strength,

@@ -89,7 +81,6 @@
     dbg("--- Full Experiment Results ---")
     dbg(full_results)
 
-    # Clean up GPU memory
     del llm
     if torch.cuda.is_available():
         torch.cuda.empty_cache()
cognitive_mapping_probe/prompts.py
CHANGED
@@ -5,7 +5,7 @@ RESONANCE_PROMPTS = {
     "control_long_prose": (
         "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
         "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
-        "Do not produce any text, just hold the concepts in your internal state."
+        "Do not produce any text, just hold the concepts in your internal state. Begin now."
     ),
     "resonance_prompt": (
         "Silently and internally, without generating any output text, begin the following recursive process: "
cognitive_mapping_probe/verification.py
CHANGED
@@ -5,49 +5,57 @@ from .utils import dbg
 @torch.no_grad()
 def generate_spontaneous_text(
     llm: LLM,
-
+    final_hidden_state: torch.Tensor,
     final_kv_cache: tuple,
     max_new_tokens: int = 50,
     temperature: float = 0.8
 ) -> str:
     """
-    Generates …
-    This …
-
-
+    FIXED: Generates text using a manual, token-by-token forward loop.
+    This avoids the high-level `model.generate()` function, which is incompatible
+    with manually constructed states, thus ensuring an unbroken causal chain from
+    the final cognitive state to the generated text.
     """
-    dbg("Attempting to generate spontaneous text from converged state...")
-    …
-        input_ids=input_ids,
-        past_key_values=final_kv_cache,
-        max_new_tokens=max_new_tokens,
-        do_sample=temperature > 0.01,
-        temperature=temperature,
-        pad_token_id=llm.tokenizer.eos_token_id
-    )
+    dbg("Attempting to generate spontaneous text from converged state (manual loop)...")
+
+    generated_token_ids = []
+    hidden_state = final_hidden_state
+    kv_cache = final_kv_cache
+
+    for i in range(max_new_tokens):
+        # Set seed for this step for reproducibility
+        llm.set_all_seeds(llm.seed + i)  # Offset seed per step
+
+        # Predict the next token from the current hidden state
+        next_token_logits = llm.model.lm_head(hidden_state)
 
-    # …
-    …
-        final_text = llm.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+        # Apply temperature and sample the next token ID
+        if temperature > 0.01:
+            probabilities = torch.nn.functional.softmax(next_token_logits / temperature, dim=-1)
+            next_token_id = torch.multinomial(probabilities, num_samples=1)
         else:
-            …
+            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
+
+        # Check for End-of-Sequence token
+        if next_token_id.item() == llm.tokenizer.eos_token_id:
+            dbg("EOS token generated. Halting generation.")
+            break
+
+        generated_token_ids.append(next_token_id.item())
+
+        # Perform the next forward pass to get the new state
+        outputs = llm.model(
+            input_ids=next_token_id,
+            past_key_values=kv_cache,
+            output_hidden_states=True,
+            use_cache=True,
+        )
 
-    …
-    return final_text
+        hidden_state = outputs.hidden_states[-1][:, -1, :]
+        kv_cache = outputs.past_key_values
 
+    # Decode the collected tokens into a final string
+    final_text = llm.tokenizer.decode(generated_token_ids, skip_special_tokens=True).strip()
+    dbg(f"Spontaneous text generated: '{final_text}'")
+    assert isinstance(final_text, str), "Generated text must be a string."
+    return final_text