---
title: "BP-Φ English Suite — Phenomenality Test"
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "4.40.0"
app_file: app.py
pinned: true
license: apache-2.0
---

# BP-Φ English Suite — Phenomenality Test (Hugging Face Spaces)

This Space implements a falsifiable **BP-Φ** probe for LLMs:

> Phenomenal-like processing requires (i) a limited-capacity global workspace with recurrence,
> (ii) metarepresentational loops with downstream causal roles, and
> (iii) no-report markers that predict later behavior.

**What it is:** a functional, testable bridge-principle harness that yields a **Phenomenal-Candidate Score (PCS)** together with strong ablation-based falsifiers.

**What it is NOT:** proof of qualia or moral status.

## Quickstart

- Hardware: T4 or A10 GPU recommended
- Model: `google/gemma-3-1b-it` (requires `HF_TOKEN`)
- Press **Run** to execute the baseline and the ablations

## Files

- `bp_phi/llm_iface.py` — model interface with deterministic seeding and HF token support
- `bp_phi/workspace.py` — global workspace and ablations
- `bp_phi/prompts_en.py` — English reasoning/memory tasks
- `bp_phi/metrics.py` — AUC_nrp, ECE, CK, DS
- `bp_phi/runner.py` — orchestrator with reproducible seeding
- `app.py` — Gradio interface
- `requirements.txt` — dependencies

## Metrics

- **AUC_nrp:** Predictive power (AUC) of hidden no-report markers for later self-corrections.
- **ECE:** Expected Calibration Error (lower is better).
- **CK:** Counterfactual-consistency proxy (higher is better).
- **DS:** Stability duration, i.e. the mean length of streaks without a change.
- **PCS:** Weighted aggregate of the metrics above. ΔΦ is excluded because it is computed post hoc, after the baseline and ablation runs.
- **ΔΦ:** Post-hoc drop from the baseline PCS to the average PCS across ablations.

## Notes

- Models are used in **frozen** mode (no training).
- This is a **behavioral** probe: functional compatibility with Φ is not proof of experience.
- Reproducibility: fix the seeds and the number of trials, and avoid data leakage by not fine-tuning on these prompts.
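
The **ECE** metric listed under Metrics is commonly computed by binning predictions by stated confidence and averaging the gap between accuracy and mean confidence, weighted by bin size: ECE = Σ_b (|B_b| / n) · |acc(B_b) − conf(B_b)|. The sketch below assumes ten equal-width bins and one confidence in [0, 1] per trial. The function name and binning scheme are illustrative assumptions and are not necessarily what `bp_phi/metrics.py` implements.

```python
from typing import List, Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width binned ECE (illustrative sketch, not the Space's own code).

    Sums, over non-empty bins, (bin size / n) * |bin accuracy - bin mean confidence|.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need equally long, non-empty inputs")
    n = len(confidences)
    # Each bin holds the indices of the trials whose confidence falls into it.
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # A confidence of exactly 1.0 goes into the last bin instead of overflowing.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        acc = sum(1.0 for i in members if correct[i]) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(acc - conf)
    return ece
```

For example, `expected_calibration_error([0.9, 0.8, 0.3, 0.6], [True, True, False, False])` returns approximately 0.3: each trial sits alone in its bin, so the result is the average of the gaps 0.1, 0.2, 0.3 and 0.6.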
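
The **AUC_nrp** entry under Metrics can be read as an ROC AUC between a per-trial no-report marker score and a binary label recording whether a self-correction happened later. Assuming that framing (the parameter names `marker_scores` and `later_corrected` are hypothetical), a dependency-free version uses the Mann-Whitney identity. AUC is the fraction of (positive, negative) trial pairs in which the positive trial has the higher marker score, and ties count as one half.

```python
from typing import Sequence


def auc_nrp(marker_scores: Sequence[float], later_corrected: Sequence[bool]) -> float:
    """Pairwise (Mann-Whitney) ROC AUC of marker scores vs. later self-corrections.

    Illustrative sketch: assumes one scalar marker score and one boolean label per trial.
    """
    if len(marker_scores) != len(later_corrected):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(marker_scores, later_corrected) if y]
    neg = [s for s, y in zip(marker_scores, later_corrected) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative trial")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive trial ranked above the negative one
            elif p == q:
                wins += 0.5  # ties split evenly
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of trials, which is fine at the scale of a few hundred trials. If scikit-learn is available, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.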
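
The **DS** metric is described under Metrics as the mean streak without change. If each trial is assumed to produce a sequence of discrete states (for instance, the answer held in the workspace at each step), that reduces to the mean length of maximal runs of identical consecutive states. The function name and this notion of "state" are illustrative assumptions, not the definition used in `bp_phi/metrics.py`.

```python
from typing import Hashable, List, Sequence


def mean_stable_streak(states: Sequence[Hashable]) -> float:
    """Mean length of maximal runs of consecutive identical states (illustrative sketch)."""
    if len(states) == 0:
        return 0.0
    runs: List[int] = []
    current = 1
    # Compare each state with its successor; a change closes the current run.
    for prev, nxt in zip(states, states[1:]):
        if nxt == prev:
            current += 1
        else:
            runs.append(current)
            current = 1
    runs.append(current)  # close the final run
    return sum(runs) / len(runs)
```

For example, `mean_stable_streak(["A", "A", "B", "B", "B", "A"])` returns 2.0, the mean of the runs of length 2, 3 and 1.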
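
**ΔΦ** is defined under Metrics as the drop from the baseline PCS to the average ablation PCS, so it only needs the PCS values of finished runs. The minimal sketch below follows that definition directly. The function name and the ablation names used as dictionary keys are hypothetical.

```python
from statistics import mean
from typing import Mapping


def delta_phi(baseline_pcs: float, ablation_pcs: Mapping[str, float]) -> float:
    """Post-hoc ΔΦ: baseline PCS minus the mean PCS over all ablation runs (sketch)."""
    if len(ablation_pcs) == 0:
        raise ValueError("need at least one ablation run")
    return baseline_pcs - mean(list(ablation_pcs.values()))


# Hypothetical ablation names and PCS values, for illustration only.
example = delta_phi(0.7, {"no_workspace": 0.4, "no_recurrence": 0.5})
```

Here `example` is approximately 0.25. A positive ΔΦ means the ablations lowered PCS on average. A value near zero or below means they did not.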