---
title: "BP-Φ English Suite — Phenomenality Test"
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "4.40.0"
app_file: app.py
pinned: true
license: apache-2.0
---

# BP-Φ English Suite — Phenomenality Test (Hugging Face Spaces)

This Space implements a falsifiable **BP-Φ** probe for LLMs:

> Phenomenal-like processing requires (i) a limited-capacity global workspace with recurrence,
> (ii) metarepresentational loops with downstream causal roles, and
> (iii) no-report markers that predict later behavior.

**What it is:** a functional, testable bridge-principle harness that yields a **Phenomenal-Candidate Score (PCS)** and strong ablation falsifiers.

**What it is NOT:** proof of qualia or moral status.
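
To make requirement (i) concrete, here is a minimal sketch of a limited-capacity workspace whose contents are fed back into the next step. It is an illustration only, not the code in `bp_phi/workspace.py`: the `Workspace` class, its `broadcast` and `snapshot` methods, the salience scores, and the "capacity zero" ablation are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical limited-capacity workspace: keeps only the most salient items."""

    capacity: int = 3
    items: list = field(default_factory=list)  # (salience, content) pairs

    def broadcast(self, content: str, salience: float) -> None:
        """Admit an item, then evict the least salient items beyond capacity."""
        self.items.append((salience, content))
        self.items.sort(key=lambda pair: pair[0], reverse=True)
        del self.items[self.capacity:]

    def snapshot(self) -> list:
        """Contents that would be fed back into the next step (the recurrent part)."""
        return [content for _, content in self.items]


# Baseline: three candidates compete for two slots.
ws = Workspace(capacity=2)
ws.broadcast("a", salience=0.1)
ws.broadcast("b", salience=0.9)
ws.broadcast("c", salience=0.5)

# Assumed ablation for illustration: capacity 0 means nothing is ever carried over.
ablated = Workspace(capacity=0)
ablated.broadcast("b", salience=0.9)
```

In this sketch an ablation is just a change to the workspace configuration, so the baseline and ablated runs can share the same prompts and scoring code.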

## Quickstart

- Hardware: T4 or A10 recommended
- Model: `google/gemma-3-1b-it` (requires `HF_TOKEN`)
- Press **Run** to execute the baseline and the ablations
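
Since the model requires `HF_TOKEN`, it can help to check for the token before attempting a download. The sketch below assumes the token is exposed as an environment variable, which is how Spaces secrets are made available to the app. The helper name `resolve_hf_token` is hypothetical and is not taken from `bp_phi/llm_iface.py`.

```python
import os

MODEL_ID = "google/gemma-3-1b-it"  # model named in the Quickstart


def resolve_hf_token(env=None):
    """Return the Hugging Face token from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            f"HF_TOKEN is not set; {MODEL_ID} cannot be downloaded without it."
        )
    return token
```

The returned string would typically be passed on to the model-loading call, for example as the `token` argument of a `from_pretrained` call in recent `transformers` versions.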

## Files

- `bp_phi/llm_iface.py`: model interface with deterministic seeding and HF token support
- `bp_phi/workspace.py`: global workspace and ablations
- `bp_phi/prompts_en.py`: English reasoning and memory tasks
- `bp_phi/metrics.py`: AUC_nrp, ECE, CK, DS
- `bp_phi/runner.py`: orchestrator with reproducible seeding
- `app.py`: Gradio interface
- `requirements.txt`: dependencies

## Metrics

- **AUC_nrp:** How well hidden no-report markers predict future self-corrections (area under the ROC curve).
- **ECE:** Expected Calibration Error (lower is better).
- **CK:** Counterfactual consistency proxy (higher is better).
- **DS:** Stability duration (mean streak length without change).
- **PCS:** Weighted aggregate of the four metrics above. ΔΦ is not part of the in-run score.
- **ΔΦ:** Computed post hoc as the drop from the baseline PCS to the mean PCS across ablations.
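
The definitions above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the contents of `bp_phi/metrics.py`: the function names, the ten equal-width confidence bins for ECE, and the pairwise-ranking form of AUC are choices made for this example. CK and the PCS weights are omitted because this README does not specify them.

```python
from itertools import groupby


def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin by confidence, then average |accuracy - confidence| weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(ok for _, ok in members) / len(members)
            ece += len(members) / total * abs(accuracy - avg_conf)
    return ece


def auc(scores, labels):
    """Probability that a positive example outranks a negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return float("nan")  # undefined when one class is missing
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_streak(answers):
    """Mean length of runs of identical consecutive answers."""
    runs = [len(list(group)) for _, group in groupby(answers)]
    return sum(runs) / len(runs) if runs else 0.0


def delta_phi(baseline_pcs, ablation_pcs):
    """Drop from the baseline PCS to the mean PCS across ablations."""
    return baseline_pcs - sum(ablation_pcs) / len(ablation_pcs)
```

For AUC_nrp, `scores` would be the marker values and `labels` would flag whether a self-correction followed. A positive `delta_phi` means the ablations lowered the score relative to the baseline.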

## Notes

- Models are used in **frozen** mode (no training).
- This is a **behavioral** probe. Functional compatibility with Φ is not proof of experience.
- Reproducibility: fix seeds and trials. To avoid data leakage, do not fine-tune on these prompts.
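
One way to fix seeds is a small helper called once at the start of every run. The sketch below is an assumption about how this could be done, not the code in `bp_phi/runner.py`: the names `seed_everything` and `run_trials` are hypothetical, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the random number generators this process is likely to use."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


def run_trials(seed: int, n_trials: int) -> list:
    """Stand-in for a real run: draws one random number per trial after seeding."""
    seed_everything(seed)
    return [random.random() for _ in range(n_trials)]
```

Two runs with the same seed and trial count then produce the same sequence of draws. Sampling on a GPU can still vary slightly between hardware and library versions, so seeding alone may not guarantee bit-identical model outputs.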