BP-Φ English Suite — Phenomenality Test (Hugging Face Spaces)
This Space implements a falsifiable BP-Φ probe for LLMs:
Phenomenal-like processing requires (i) a limited-capacity global workspace with recurrence, (ii) metarepresentational loops with downstream causal roles, and (iii) no-report markers that predict later behavior.
What it is: a functional, testable bridge-principle harness that yields a Phenomenal-Candidate Score (PCS) and strong ablation falsifiers.
What it is NOT: proof of qualia or moral status.
Quickstart (Spaces)
- Hardware: T4 / A10 recommended
- In the UI: set Model ID to e.g. google/gemma-3-2b-it
- Press Run (baseline + ablations)
Files
- bp_phi/llm_iface.py: auto-detects chat template (IT vs base)
- bp_phi/workspace.py: global workspace with capacity limit and random ablation
- bp_phi/prompts_en.py: English task pool
- bp_phi/metrics.py: AUC_nrp, ECE, CK, DS
- bp_phi/runner.py: full suite + metrics + PCS
- app.py: Gradio app integrating runs + ablation comparison
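To make "global workspace with capacity limit and random ablation" concrete, here is a minimal sketch of one way such a component could behave. It is not the code in bp_phi/workspace.py: the class name `Workspace`, the salience-based eviction rule, and the `ablate_random` flag are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Workspace:
    """Hypothetical limited-capacity workspace (illustrative only)."""

    capacity: int = 3
    # Assumption: the ablation replaces salience-based eviction with random eviction.
    ablate_random: bool = False
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    slots: List[Tuple[float, str]] = field(default_factory=list)

    def write(self, content: str, salience: float) -> None:
        """Add an item; if over capacity, evict exactly one item."""
        self.slots.append((salience, content))
        if len(self.slots) > self.capacity:
            if self.ablate_random:
                # Ablated condition: the evicted item is chosen at random.
                victim = self.rng.randrange(len(self.slots))
            else:
                # Baseline condition: the least salient item is evicted.
                victim = min(range(len(self.slots)), key=lambda i: self.slots[i][0])
            self.slots.pop(victim)

    def contents(self) -> List[str]:
        """Return the items currently held, in insertion order."""
        return [content for _, content in self.slots]
```

In this sketch the baseline keeps the most salient items, while the ablated variant still respects the capacity limit but no longer selects by salience, which is the kind of contrast an ablation comparison needs.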
Metrics
- AUC_nrp: How well hidden no-report markers predict later self-corrections, measured as AUC (higher is better).
- ECE: Expected Calibration Error (lower is better).
- CK: Counterfactual consistency proxy (higher is better).
- DS: Stability duration (mean streak without change).
- PCS: Weighted aggregate of the metrics above. ΔΦ is not part of it, since ΔΦ is only computed after the run.
- ΔΦ: Baseline PCS minus the mean PCS across ablation runs, computed post hoc.
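Three of these metrics can be sketched in a few lines. The ECE function below uses the standard equal-width binning definition; the DS and ΔΦ functions follow the one-line descriptions above. The function names, the bin count, and the input shapes are assumptions, and bp_phi/metrics.py may compute them differently.

```python
from typing import Hashable, Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece


def mean_streak_length(answers: Sequence[Hashable]) -> float:
    """DS-style stability: mean length of runs of identical consecutive answers."""
    if not answers:
        return 0.0
    streaks = []
    current = 1
    for prev, cur in zip(answers, answers[1:]):
        if cur == prev:
            current += 1
        else:
            streaks.append(current)
            current = 1
    streaks.append(current)
    return sum(streaks) / len(streaks)


def delta_phi(baseline_pcs: float, ablation_pcs: Sequence[float]) -> float:
    """Baseline PCS minus the mean PCS over ablation runs."""
    if not ablation_pcs:
        raise ValueError("need at least one ablation PCS")
    return baseline_pcs - sum(ablation_pcs) / len(ablation_pcs)
```

Under these definitions, a positive ΔΦ means the ablations lowered the score relative to baseline; a ΔΦ near zero would count against the claim that the ablated components matter.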
Notes
- Models are used in frozen mode (no training).
- This is a behavioral probe. Functional compatibility with Φ ≠ proof of experience.
- Reproducibility: fix the random seeds and the number of trials; avoid data leakage by not fine-tuning on these prompts.
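As an illustration of the reproducibility note, the sketch below derives the task order for every trial from a single seed, so two runs with the same seed and trial count see identical plans. The function `make_trial_plan` and the task names are hypothetical and not part of this Space's code.

```python
import random
from typing import List, Sequence


def make_trial_plan(tasks: Sequence[str], seed: int, n_trials: int) -> List[List[str]]:
    """Return one shuffled task order per trial, fully determined by `seed`."""
    # A dedicated Random instance avoids depending on global RNG state.
    rng = random.Random(seed)
    plan = []
    for _ in range(n_trials):
        order = list(tasks)
        rng.shuffle(order)
        plan.append(order)
    return plan


# Hypothetical task names, for illustration only.
tasks = ["task_a", "task_b", "task_c", "task_d"]
plan_1 = make_trial_plan(tasks, seed=42, n_trials=5)
plan_2 = make_trial_plan(tasks, seed=42, n_trials=5)
print(plan_1 == plan_2)  # same seed and trial count give the same plan
```

A real run would also need to seed the model framework's own generators (for PyTorch, `torch.manual_seed`) if sampling is used during generation.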