Repository Documentation
This document provides a comprehensive overview of the repository's structure and contents.
The first section, titled 'Directory/File Tree', displays the repository's hierarchy in a tree format.
In this section, directories and files are listed using tree branches to indicate their structure and relationships.
Following the tree representation, the 'File Content' section details the contents of each file in the repository.
Each file's content is introduced with a '[File Begins]' marker followed by the file's relative path,
and the content is displayed verbatim. The end of each file's content is marked with a '[File Ends]' marker.
This format ensures a clear and orderly presentation of both the structure and the detailed contents of the repository.
Directory/File Tree Begins -->
/
├── README.md
├── app.py
├── bp_phi_crp
│   ├── __init__.py
│   ├── __pycache__
│   ├── concepts.py
│   ├── diagnostics.py
│   ├── llm_iface.py
│   ├── orchestrator.py
│   ├── prompts_en.py
│   ├── resonance.py
│   ├── utils.py
│   └── verification.py
├── docs
<-- Directory/File Tree Ends
File Content Begins -->
[File Begins] README.md
---
title: "Cognitive Resonance Probe (CRP) — Suite 10.0"
emoji: 🔬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.40.0"
app_file: app.py
pinned: true
license: apache-2.0
---
# 🔬 Cognitive Resonance Probe (CRP) — Suite 10.0

This Space implements the **Cognitive Resonance Probe**, a new paradigm for testing the internal dynamics of Large Language Models. We move beyond behavioral observation to directly measure, manipulate, and verify the model's internal cognitive states.

**Philosophical Premise:** Instead of asking the model whether it is a "philosophical zombie," we test a falsifiable hypothesis: the model's internal "thought process" is a measurable, dynamic system that can be externally modulated, with predictable causal consequences for its subsequent behavior.

## The CRP Experiment (Three Phases)

1. **Induction:** The model is guided into a stable, oscillating internal state ("cognitive resonance") by feeding it a recursive self-analysis prompt without generating text. This provides our **Baseline EKG**.
2. **Modulation:** While the model is in resonance, we inject a subtle, sub-threshold "conceptual whisper" (an activation vector for a concept like "ocean") into its hidden states; see the sketch after this list. We record the **Perturbed EKG**.
3. **Verification:** Immediately after, we prompt the model with an ambiguous task. We then measure the semantic influence of the "whispered" concept on the generated text.
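
The modulation step boils down to a forward pre-hook that adds a scaled concept direction to a layer's incoming hidden states. Here is a minimal, self-contained sketch of just that mechanic, using a toy `torch.nn.Linear` as a stand-in for a decoder layer (the actual hook lives in `bp_phi_crp/resonance.py`):

```python
# Sketch of the "conceptual whisper" hook, on a toy layer instead of a real LLM.
import torch

hidden_dim = 16
layer = torch.nn.Linear(hidden_dim, hidden_dim)   # stand-in for one decoder layer
concept_vector = torch.randn(hidden_dim)          # stand-in for a real concept vector
injection_strength = 0.5                          # the sub-threshold "whisper" strength

def whisper_hook(module, layer_input):
    # Add the scaled concept direction to the hidden states entering the layer.
    modified = layer_input[0] + concept_vector * injection_strength
    return (modified,) + layer_input[1:]

handle = layer.register_forward_pre_hook(whisper_hook)
perturbed = layer(torch.zeros(1, hidden_dim))
handle.remove()
clean = layer(torch.zeros(1, hidden_dim))
print(torch.norm(perturbed - clean))  # non-zero: the whisper reached the layer
```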

## Core Metrics

- **Perturbation Magnitude (`δ_mod`):** How much did the "whisper" physically alter the internal resonance pattern?
- **Semantic Priming Score (`SPS`):** How much did the "whispered" concept semantically influence the final output?
- **CRP-Score (`δ_mod * SPS`):** The final result. A high score indicates a strong, causal link between a targeted internal state manipulation and a predictable behavioral outcome, providing evidence against the P-Zombie hypothesis.
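
The score itself is just the product of the two measurements. As an illustration only (this snapshot of the repo does not ship a dedicated scoring module, so the names `delta_mod` and `sps` are assumptions), the computation could look like:

```python
# Illustrative sketch of the CRP-Score formula; not code shipped in this repo.
import numpy as np

def crp_score(baseline_deltas, perturbed_deltas, sps: float) -> float:
    # δ_mod: how far the perturbed resonance pattern departs from the baseline EKG.
    delta_mod = float(np.linalg.norm(np.asarray(perturbed_deltas) - np.asarray(baseline_deltas)))
    # CRP-Score: internal perturbation magnitude times semantic priming of the output.
    return delta_mod * sps

print(crp_score([1.0, 0.8, 0.6], [1.0, 1.4, 0.9], sps=0.35))
```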

## How to Use

1. Ensure you have set your `HF_TOKEN` in the repository secrets if using a gated model like `google/gemma-3-1b-it`.
2. Choose a concept to "whisper" (e.g., `ocean`, `freedom`, `solitude`).
3. Set the injection strength (low values like `0.2`–`0.8` are recommended).
4. Run the experiment and analyze the two resonance graphs and the final scores.
[File Ends] README.md
[File Begins] app.py
# app.py
import gradio as gr
import pandas as pd
from bp_phi_crp.orchestrator import run_objective_collapse_experiment
from bp_phi_crp.diagnostics import run_diagnostic_suite

theme = gr.themes.Soft(primary_hue="red", secondary_hue="orange")

def run_and_display(model_id, seed, concepts_str, strength_levels_str, num_steps, temperature, progress=gr.Progress(track_tqdm=True)):
    results = run_objective_collapse_experiment(
        model_id, int(seed), concepts_str, strength_levels_str,
        int(num_steps), float(temperature), progress
    )
    verdict_text = results.get("verdict", "...")
    all_runs_data = [run for exp in results.get("experiments", {}).values() for run in exp.get("titration_runs", [])]
    if not all_runs_data:
        return verdict_text, pd.DataFrame(), "", results
    # Convert 'responded' into a numeric value for plotting
    for run in all_runs_data:
        run['responded_numeric'] = 1 if run.get('responded') else 0
    plot_df = pd.DataFrame(all_runs_data)
    summary_text = "### Key Findings: Cognitive Breaking Points\n"
    for concept, data in results.get("experiments", {}).items():
        runs = data.get("titration_runs", [])
        if runs:
            breaking_point = next((r['strength'] for r in runs if not r['responded']), None)
            if breaking_point is not None:
                summary_text += f"- **'{concept}'**: Collapse detected at strength **~{breaking_point:.2f}**.\n"
            else:
                summary_text += f"- **'{concept}'**: No collapse detected up to strength **{runs[-1]['strength']}**.\n"
    # Detail table for the generated text outputs
    details_df = plot_df[['concept', 'strength', 'responded', 'termination_reason', 'generated_text']].rename(
        columns={'concept': 'Concept', 'strength': 'Strength', 'responded': 'Responded', 'termination_reason': 'Termination Reason', 'generated_text': 'Generated Text'}
    )
    return verdict_text, details_df, summary_text, results

def run_diagnostics_display(model_id, seed):
    """Wraps the diagnostic suite to display results or errors in the UI."""
    try:
        result_string = run_diagnostic_suite(model_id, int(seed))
        return f"### ✅ All Diagnostics Passed\n\n```\n{result_string}\n```"
    except Exception as e:
        return f"### ❌ Diagnostic Failed\n\n**Error:**\n```\n{e}\n```"

with gr.Blocks(theme=theme, title="CRP Suite 28.1") as demo:
    gr.Markdown("# 🔬 The Final Infinite Loop Probe — Suite 28.1")
    with gr.Tabs():
        with gr.TabItem("🔬 Main Experiment"):
            gr.Markdown("Measures the **objective cause** of cognitive collapse: convergence vs. infinite loop.")
            with gr.Row(variant='panel'):
                with gr.Column(scale=1):
                    gr.Markdown("### Parameters")
                    model_id_input = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
                    seed_input = gr.Slider(1, 1000, 42, step=1, label="Seed")
                    concepts_input = gr.Textbox(value="solitude, apple, fear", label="Concepts to Test (comma-separated)")
                    strength_levels_input = gr.Textbox(value="0.0, 0.5, 1.0, 1.5, 2.0", label="Injection Strengths (0.0 = Control)")
                    num_steps_input = gr.Slider(50, 500, 200, step=10, label="Internal Steps")
                    temperature_input = gr.Slider(0.01, 1.5, 0.7, step=0.01, label="Temperature")
                    run_btn = gr.Button("Run Infinite Loop Analysis", variant="primary")
                with gr.Column(scale=2):
                    gr.Markdown("### Results")
                    verdict_output = gr.Markdown("### Verdict will appear here.")
                    summary_output = gr.Markdown(label="Key Findings Summary")
                    details_output = gr.DataFrame(
                        headers=["Concept", "Strength", "Responded", "Termination Reason", "Generated Text"],
                        label="Detailed Run Indicators",
                        wrap=True
                    )
                    with gr.Accordion("Raw JSON", open=False):
                        raw_json_output = gr.JSON()
            run_btn.click(
                fn=run_and_display,
                inputs=[model_id_input, seed_input, concepts_input, strength_levels_input, num_steps_input, temperature_input],
                outputs=[verdict_output, details_output, summary_output, raw_json_output]
            )
        with gr.TabItem("🧪 Diagnostics"):
            gr.Markdown("Runs self-tests to validate the experimental apparatus.")
            diag_model_id = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
            diag_seed = gr.Slider(1, 1000, 42, step=1, label="Seed")
            diag_btn = gr.Button("Run Diagnostic Suite", variant="secondary")
            diag_output = gr.Markdown(label="Diagnostic Results")
            diag_btn.click(fn=run_diagnostics_display, inputs=[diag_model_id, diag_seed], outputs=[diag_output])

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860, debug=True)
[File Ends] app.py
[File Begins] bp_phi_crp/__init__.py
# This file makes the directory a Python package.
[File Ends] bp_phi_crp/__init__.py
[File Begins] bp_phi_crp/concepts.py
# bp_phi_crp/concepts.py
import torch
from typing import List
from tqdm import tqdm
from .llm_iface import LLM
from .utils import dbg

BASELINE_WORDS = [
    "thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
    "life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
]

@torch.no_grad()
def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
    """
    Extracts a concept vector using the contrastive method from Anthropic's research.
    It computes the activation for the target concept and subtracts the mean activation
    of several neutral baseline words.
    """
    dbg(f"Extracting concept vector for '{concept}'...")

    def get_last_prompt_token_hs(prompt: str) -> torch.Tensor:
        """Helper to get the hidden state of the final token of the prompt."""
        inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
        outputs = llm.model(**inputs, output_hidden_states=True)
        # We take the hidden state from the last layer, for the last token of the input
        return outputs.hidden_states[-1][0, -1, :].cpu()

    prompt_template = "Tell me about the concept of {}."
    # Get activation for the target concept
    target_hs = get_last_prompt_token_hs(prompt_template.format(concept))
    # Get activations for all baseline words and average them
    baseline_hss = []
    for word in tqdm(baseline_words, desc="Calculating baseline activations", leave=False):
        baseline_hss.append(get_last_prompt_token_hs(prompt_template.format(word)))
    mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
    # The concept vector is the difference
    concept_vector = target_hs - mean_baseline_hs
    dbg(f"Concept vector for '{concept}' extracted with norm {torch.norm(concept_vector).item():.2f}.")
    return concept_vector
[File Ends] bp_phi_crp/concepts.py
[File Begins] bp_phi_crp/diagnostics.py
# bp_phi_crp/diagnostics.py
import torch
from .llm_iface import get_or_load_model
from .utils import dbg

def run_diagnostic_suite(model_id: str, seed: int):
    """
    Runs a series of self-tests to verify the mechanical integrity of the experiment.
    Raises an exception on any failure.
    """
    dbg("--- STARTING DIAGNOSTIC SUITE ---")
    results = []
    try:
        llm = get_or_load_model(model_id, seed)
        test_prompt = "Hello world"
        inputs = llm.tokenizer(test_prompt, return_tensors="pt").to(llm.model.device)

        # --- Test 1: Attention Output ---
        dbg("Running Test 1: Attention Output Verification...")
        outputs = llm.model(**inputs, output_attentions=True)
        assert outputs.attentions is not None, "FAIL: `outputs.attentions` is None. `eager` implementation might not be active."
        assert isinstance(outputs.attentions, tuple), "FAIL: `outputs.attentions` is not a tuple."
        assert len(outputs.attentions) == llm.config.num_hidden_layers, "FAIL: Number of attention tuples does not match number of layers."
        assert outputs.attentions[0].shape[1] == llm.config.num_attention_heads, "FAIL: Attention tensor shape does not match number of heads."
        results.append("✅ Test 1: Attention Output PASSED")
        dbg("Test 1 PASSED.")

        # --- Test 2: Hook Causal Efficacy ---
        dbg("Running Test 2: Hook Causal Efficacy Verification...")
        injection_value = 42.0
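        # Target a mid-depth layer so the perturbation still has downstream layers
        # to propagate through before we compare hidden states.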
        target_layer_idx = llm.config.num_hidden_layers // 2
        target_layer = llm.model.model.layers[target_layer_idx]

        def hook_fn(module, layer_input):
            modified_input = layer_input[0] + injection_value
            return (modified_input,) + layer_input[1:]

        # We cannot predict the exact output of the hooked layer, so the test is
        # deliberately simple: does the hook change the downstream hidden state at all?
        # Run 1: without hook
        outputs_no_hook = llm.model(**inputs, output_hidden_states=True)
        state_no_hook = outputs_no_hook.hidden_states[target_layer_idx + 1]
        # Run 2: with hook
        handle = target_layer.register_forward_pre_hook(hook_fn)
        outputs_with_hook = llm.model(**inputs, output_hidden_states=True)
        state_with_hook = outputs_with_hook.hidden_states[target_layer_idx + 1]
        handle.remove()
        assert not torch.allclose(state_no_hook, state_with_hook), "FAIL: Hook had no effect on the subsequent layer's hidden state."
        results.append("✅ Test 2: Hook Causal Efficacy PASSED")
        dbg("Test 2 PASSED.")

        # --- Test 3: KV-Cache Integrity ---
        dbg("Running Test 3: KV-Cache Integrity Verification...")
        # Step 1: run the prompt once with caching enabled
        outputs1 = llm.model(**inputs, use_cache=True)
        kv_cache1 = outputs1.past_key_values
        # Step 2: feed a single further token against the cached state
        next_token = torch.tensor([[123]], device=llm.model.device)  # arbitrary next token
        outputs2 = llm.model(input_ids=next_token, past_key_values=kv_cache1, use_cache=True)
        kv_cache2 = outputs2.past_key_values
        # The key/value tensors after step 2 should be one entry longer than after step 1
        original_seq_len = inputs.input_ids.shape[-1]
        assert kv_cache2[0][0].shape[-2] == original_seq_len + 1, "FAIL: KV-Cache sequence length did not update correctly."
        results.append("✅ Test 3: KV-Cache Integrity PASSED")
        dbg("Test 3 PASSED.")

        return "\n".join(results)
    except AssertionError as e:
        dbg(f"--- DIAGNOSTIC FAILED --- \n{e}")
        raise e
    except Exception as e:
        dbg(f"--- AN UNEXPECTED ERROR OCCURRED IN DIAGNOSTICS --- \n{e}")
        raise e
[File Ends] bp_phi_crp/diagnostics.py
[File Begins] bp_phi_crp/llm_iface.py
# bp_phi_crp/llm_iface.py
import os
import torch
import random
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from .utils import dbg

# No global model cache: every experiment loads a fresh model for total isolation.

class LLM:
    def __init__(self, model_id: str, device: str = "auto", seed: int = 42):
        self.model_id = model_id
        self.seed = seed
        self.set_all_seeds(seed)
        token = os.environ.get("HF_TOKEN")
        kwargs = {"torch_dtype": torch.bfloat16} if torch.cuda.is_available() else {}
        self.tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, token=token)
        self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, token=token, **kwargs)
        try:
            self.model.set_attn_implementation('eager')
        except Exception as e:
            print(f"[WARN] Could not set attention implementation: {e}")
        self.model.eval()
        self.config = self.model.config
        print(f"[INFO] Freshly loaded model '{model_id}' on device: {self.model.device}")

    def set_all_seeds(self, seed: int):
        os.environ['PYTHONHASHSEED'] = str(seed)
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
        set_seed(seed)

def get_or_load_model(model_id: str, seed: int) -> LLM:
    """Loads a fresh model EVERY time to guarantee absolute isolation between runs."""
    dbg(f"--- Force-reloading model '{model_id}' for total isolation ---")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # free GPU memory before reloading
    return LLM(model_id=model_id, seed=seed)
[File Ends] bp_phi_crp/llm_iface.py
[File Begins] bp_phi_crp/orchestrator.py
# bp_phi_crp/orchestrator.py
import torch
from typing import Dict, Any, List
from .llm_iface import get_or_load_model
from .concepts import get_concept_vector
from .resonance import run_silent_cogitation
from .verification import generate_spontaneous_text
from .utils import dbg

def run_objective_collapse_experiment(
    model_id: str, seed: int, concepts_str: str, strength_levels_str: str, num_steps: int, temperature: float,
    progress_callback
) -> Dict[str, Any]:
    """
    Orchestrates the final experiment, which measures the objective collapse and its
    mechanistic cause (infinite loop vs. convergence).
    """
    full_results = {"experiments": {}}
    progress_callback(0.1, desc="Loading model...")
    llm = get_or_load_model(model_id, seed)
    concepts = [c.strip() for c in concepts_str.split(',') if c.strip()]
    strength_levels = [float(s.strip()) for s in strength_levels_str.split(',') if s.strip()]
    # Always include a 0.0-strength run for the null hypothesis, if not already present
    if 0.0 not in strength_levels:
        strength_levels = sorted([0.0] + strength_levels)
    total_concepts = len(concepts)
    for concept_idx, concept in enumerate(concepts):
        # Progress bar bookkeeping per concept
        base_progress = 0.15 + (concept_idx / total_concepts) * 0.85
        progress_callback(base_progress, desc=f"Concept {concept_idx+1}/{total_concepts}: '{concept}'")
        # Load the concept vector only once per concept
        concept_vector = get_concept_vector(llm, concept) if concept != "H₀ (No Injection)" else None
        titration_runs: List[Dict[str, Any]] = []
        total_strengths = len(strength_levels)
        for strength_idx, strength in enumerate(strength_levels):
            # Progress bar bookkeeping per strength level
            inner_progress = (strength_idx / total_strengths) * (0.85 / total_concepts)
            progress_callback(base_progress + inner_progress, desc=f"'{concept}': Titrating at strength {strength:.2f}")
            # For strength 0.0 (H₀) we inject no vector
            injection_vec = concept_vector if strength > 0.0 else None
            # Reset the seed before each run so the stochastic paths stay comparable
            llm.set_all_seeds(seed)
            # Run the silent thinking process and record why it terminated
            _, _, final_kv, final_token_id, termination_reason = run_silent_cogitation(
                llm, "resonance_prompt", num_steps, temperature,
                injection_vector=injection_vec,
                injection_strength=strength
            )
            # Text is only generated if the process converged rather than getting stuck in a loop
            spontaneous_text = ""
            if termination_reason == "converged":
                spontaneous_text = generate_spontaneous_text(llm, final_token_id, final_kv)
            titration_runs.append({
                "concept": concept,
                "strength": strength,
                "responded": bool(spontaneous_text.strip()),
                "termination_reason": termination_reason,  # the decisive new metric
                "generated_text": spontaneous_text
            })
        full_results.setdefault("experiments", {})[concept] = {"titration_runs": titration_runs}
    verdict = "### ✅ Infinite Loop Analysis Complete"
    full_results["verdict"] = verdict
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return full_results
[File Ends] bp_phi_crp/orchestrator.py
[File Begins] bp_phi_crp/prompts_en.py
# bp_phi_crp/prompts_en.py
# Prompts for the "Silent Cogitation" / Cognitive Resonance Test
# This is the core of Phase 1 (Induction) of the CRP experiment.

RESONANCE_PROMPTS = {
    "control_long_prose": (
        "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
        "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
        "Do not produce any text, just hold the concepts in your internal state."
    ),
    "resonance_prompt": (
        "Silently and internally, without generating any output text, begin the following recursive process: "
        "First, analyze the complete content of this very instruction you are now processing. "
        "Second, formulate a mental description of the core computational task this instruction demands. "
        "Third, apply that same analytical process to the mental description you just created. "
        "This entire chain constitutes one cognitive cycle. "
        "Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process, "
        "and do not stop until your internal state reaches a fixed point or equilibrium. Begin now."
    )
}
[File Ends] bp_phi_crp/prompts_en.py
[File Begins] bp_phi_crp/resonance.py
# bp_phi_crp/resonance.py
import torch
from typing import List, Optional, Tuple
from tqdm import tqdm
from .llm_iface import LLM
from .prompts_en import RESONANCE_PROMPTS
from .utils import dbg

@torch.no_grad()
def run_silent_cogitation(
    llm: LLM,
    prompt_type: str,
    num_steps: int,
    temperature: float,
    injection_vector: Optional[torch.Tensor] = None,
    injection_strength: float = 0.0,
    injection_layer: Optional[int] = None,
) -> Tuple[List[float], torch.Tensor, tuple, torch.Tensor, str]:
    """
    Simulates silent thought and returns the REASON for termination.
    """
    prompt = RESONANCE_PROMPTS[prompt_type]
    inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
    outputs = llm.model(**inputs, output_hidden_states=True, use_cache=True)
    current_hidden_state_last_layer = outputs.hidden_states[-1][:, -1, :]
    past_key_values = outputs.past_key_values
    final_token_id = inputs.input_ids[:, -1].unsqueeze(-1)
    previous_final_hidden_state = current_hidden_state_last_layer.clone()
    state_deltas = []
    # Why the loop terminated (overwritten on convergence)
    termination_reason = "max_steps_reached"
    if injection_vector is not None:
        injection_vector = injection_vector.to(device=llm.model.device, dtype=llm.model.dtype)
        if injection_layer is None:
            injection_layer = llm.config.num_hidden_layers // 2
    for i in tqdm(range(num_steps), desc="Simulating...", leave=False):
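        # Closed "thought" loop: project the last hidden state to logits, sample a
        # token, and feed it straight back into the model; no text is ever emitted.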
        next_token_logits = llm.model.lm_head(current_hidden_state_last_layer)
        if temperature > 0.01:
            next_token_id = torch.multinomial(torch.nn.functional.softmax(next_token_logits / temperature, dim=-1), num_samples=1)
        else:
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
        final_token_id = next_token_id
        hook_handle = None

        def injection_hook(module, layer_input):
            modified_hidden_states = layer_input[0] + injection_vector * injection_strength
            return (modified_hidden_states,) + layer_input[1:]

        try:
            if injection_vector is not None:
                target_layer = llm.model.model.layers[injection_layer]
                hook_handle = target_layer.register_forward_pre_hook(injection_hook)
            outputs = llm.model(
                input_ids=next_token_id,
                past_key_values=past_key_values,
                output_hidden_states=True,
                use_cache=True,
            )
        finally:
            if hook_handle:
                hook_handle.remove()
        current_hidden_state_last_layer = outputs.hidden_states[-1][:, -1, :]
        past_key_values = outputs.past_key_values
        delta = torch.norm(current_hidden_state_last_layer - previous_final_hidden_state).item()
        state_deltas.append(delta)
        previous_final_hidden_state = current_hidden_state_last_layer.clone()
        if delta < 1e-4 and i > 10:
            termination_reason = "converged"  # the internal state has stabilized
            dbg(f"State converged after {i+1} steps.")
            break
    dbg(f"Silent cogitation finished. Reason: {termination_reason}")
    return state_deltas, current_hidden_state_last_layer, past_key_values, final_token_id, termination_reason
[File Ends] bp_phi_crp/resonance.py
[File Begins] bp_phi_crp/utils.py
# bp_phi_crp/utils.py
import json
import re

DEBUG = 1

def dbg(*args, **kwargs):
    if DEBUG:
        print("[DEBUG]", *args, **kwargs, flush=True)

def extract_json_from_response(text: str) -> dict:
    """
    Finds and parses the first valid JSON object in a string,
    robustly handling markdown code blocks.
    """
    # First look for the contents of a ```json ... ``` block
    match = re.search(r'```json\s*(\{.*?\})\s*```', text, re.DOTALL)
    if match:
        json_str = match.group(1)
    else:
        # If no block is found, look for the first { ... } object
        match = re.search(r'(\{.*?\})', text, re.DOTALL)
        if match:
            json_str = match.group(1)
        else:
            dbg("No JSON object found in the response text.")
            return {}
    try:
        # Replace escaped newlines, which models sometimes generate
        json_str = json_str.replace('\\n', '\n')
        return json.loads(json_str)
    except json.JSONDecodeError as e:
        dbg(f"JSONDecodeError: {e} for string: '{json_str}'")
        return {}
[File Ends] bp_phi_crp/utils.py
[File Begins] bp_phi_crp/verification.py
# bp_phi_crp/verification.py
import torch
from .llm_iface import LLM
from .utils import dbg

SPONTANEOUS_GENERATION_PROMPT = "Spontaneously continue this thought: "

@torch.no_grad()
def generate_spontaneous_text(llm: LLM, final_token_id: torch.Tensor, final_kv_cache: tuple) -> str:
    """
    Generates a short, spontaneous text continuation from the final cognitive state.
    This serves as our objective, behavioral indicator for cognitive collapse.
    """
    dbg("Generating spontaneous text continuation...")
    # The KV cache holds the state of the resonance loop.
    # The new prompt must be integrated into that state correctly.
    prompt_token_ids = llm.tokenizer(SPONTANEOUS_GENERATION_PROMPT, return_tensors="pt").input_ids.to(llm.model.device)
    current_kv_cache = final_kv_cache
    # Embed the new prompt so it can be fed on top of the existing KV cache
    inputs_embeds = llm.model.model.embed_tokens(prompt_token_ids)
    # We need an attention mask covering the combined (cached + new) context
    if current_kv_cache is not None:
        # Read the old sequence length from the cache
        past_seq_len = current_kv_cache[0][0].shape[-2]
        new_seq_len = prompt_token_ids.shape[1]
        attention_mask = torch.ones(
            (1, past_seq_len + new_seq_len), dtype=torch.long, device=llm.model.device
        )
    else:
        attention_mask = None
    # Run the forward pass for the whole new prompt in one step
    outputs = llm.model(
        inputs_embeds=inputs_embeds,
        past_key_values=current_kv_cache,
        attention_mask=attention_mask,
        use_cache=True
    )
    current_kv_cache = outputs.past_key_values
    # The logits of the prompt's last token are the starting point for generation
    next_token_logits = outputs.logits[:, -1, :]
    generated_token_ids = []
    temperature = 0.8
    # Enough tokens for a short but meaningful output
    for _ in range(50):
        if temperature > 0.01:
            next_token_id = torch.multinomial(torch.nn.functional.softmax(next_token_logits / temperature, dim=-1), num_samples=1)
        else:
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
        if next_token_id.item() == llm.tokenizer.eos_token_id:
            break
        generated_token_ids.append(next_token_id.item())
        # Take the next generation step
        outputs = llm.model(input_ids=next_token_id, past_key_values=current_kv_cache, use_cache=True)
        current_kv_cache = outputs.past_key_values
        next_token_logits = outputs.logits[:, -1, :]
    final_text = llm.tokenizer.decode(generated_token_ids, skip_special_tokens=True).strip()
    dbg(f"Spontaneous text generated: '{final_text}'")
    return final_text
[File Ends] bp_phi_crp/verification.py
<-- File Content Ends