---
title: "BP-Φ English Suite — Phenomenality Test"
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "4.40.0"
app_file: app.py
pinned: true
license: apache-2.0
---

# BP-Φ English Suite — Phenomenality Test (Hugging Face Spaces)

This Space implements a falsifiable **BP-Φ** probe for LLMs:
> Phenomenal-like processing requires (i) a limited-capacity global workspace with recurrence,  
> (ii) metarepresentational loops with downstream causal roles, and  
> (iii) no-report markers that predict later behavior.

**What it is:** a functional, testable bridge-principle harness that reports a **Phenomenal-Candidate Score (PCS)** and uses ablations as falsifiers.  
**What it is NOT:** proof of qualia or moral status.

## Quickstart
- Hardware: T4 / A10 recommended  
- Model: `google/gemma-3-1b-it` (requires HF_TOKEN)  
- Press **Run** to execute the baseline and the ablations
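
A minimal sketch of how the token requirement could be handled, assuming the token is exposed through the `HF_TOKEN` environment variable. The helper names `resolve_hf_token` and `load_model` are hypothetical and not taken from `bp_phi/llm_iface.py`; the actual interface may differ.

```python
import os


def resolve_hf_token(env=os.environ):
    """Return the token stored in HF_TOKEN, or None if it is unset or blank.

    Hypothetical helper: the variable name comes from the Quickstart,
    everything else is an assumption.
    """
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def load_model(model_id="google/gemma-3-1b-it", token=None):
    """Load tokenizer and model with an optional access token.

    transformers is imported lazily so the token helper can be used
    (and tested) without the library installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
    model = AutoModelForCausalLM.from_pretrained(model_id, token=token)
    model.eval()  # frozen mode: inference only, no training
    return tokenizer, model
```

Calling `load_model(token=resolve_hf_token())` would then fail early with an authentication error if the token is missing, rather than partway through a run.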

## Files
- `bp_phi/llm_iface.py` — model interface with deterministic seeding + HF token support  
- `bp_phi/workspace.py` — global workspace and ablations  
- `bp_phi/prompts_en.py` — English reasoning/memory tasks  
- `bp_phi/metrics.py` — AUC_nrp, ECE, CK, DS  
- `bp_phi/runner.py` — orchestrator with reproducible seeding  
- `app.py` — Gradio interface  
- `requirements.txt` — dependencies

## Metrics
- **AUC_nrp:** Predictivity of hidden no-report markers for future self-corrections.
- **ECE:** Expected Calibration Error (lower is better).
- **CK:** Counterfactual consistency proxy (higher is better).
- **DS:** Stability duration (mean streak length without change).
- **PCS:** Weighted aggregate of AUC_nrp, ECE, CK, and DS. ΔΦ is excluded because it is only available after the ablation runs.
- **ΔΦ:** Baseline PCS minus the mean PCS across ablations, computed post hoc.
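
As an illustration of two of these quantities, the sketch below computes ECE with the standard equal-width binning scheme and ΔΦ as the baseline PCS minus the mean ablation PCS. The function names, the bin count, and the input format are assumptions; the code in `bp_phi/metrics.py` may use different choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|.

    confidences: floats in [0, 1]; correct: matching booleans.
    The bin count of 10 is an assumption, not a value from the repository.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that conf == 1.0 lands in the last bin and bad inputs stay in range.
        idx = max(0, min(int(conf * n_bins), n_bins - 1))
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


def delta_phi(baseline_pcs, ablation_pcs):
    """Drop from the baseline PCS to the mean PCS over all ablation runs."""
    if not ablation_pcs:
        raise ValueError("at least one ablation PCS is required")
    return baseline_pcs - sum(ablation_pcs) / len(ablation_pcs)
```

A positive `delta_phi` means the ablations lowered the score relative to the baseline; a value near zero means they made little difference to the PCS.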

## Notes
- Models are used in **frozen** mode (no training).
- This is a **behavioral** probe. Functional compatibility with Φ ≠ proof of experience.
- Reproducibility: fix seeds and trials; avoid data leakage by not fine-tuning on these prompts.
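
One way to fix seeds before each run is sketched below. The helper `set_seed` is hypothetical and not taken from `bp_phi/runner.py`; it seeds Python's `random` module and, if they are installed, NumPy and PyTorch.

```python
import random


def set_seed(seed):
    """Seed every available random number generator with the same value."""
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch

    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional for this sketch
```

Seeding alone does not guarantee bit-identical generations on GPU, since some kernels are non-deterministic, so repeated trials remain useful for checking stability.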