	---
	title: "BP-Φ English Suite — Phenomenality Test"
	emoji: 🧠
	colorFrom: indigo
	colorTo: blue
	sdk: gradio
	sdk_version: "4.40.0"
	app_file: app.py
	pinned: true
	license: apache-2.0
	---

	# BP-Φ English Suite — Phenomenality Test (Hugging Face Spaces)

	This Space implements a falsifiable BP-Φ probe for LLMs:
	> Phenomenal-like processing requires (i) a limited-capacity global workspace with recurrence,
	> (ii) metarepresentational loops with downstream causal roles, and
	> (iii) no-report markers that predict later behavior.

	- What it is: a functional, testable bridge-principle harness that yields a Phenomenal-Candidate Score (PCS) and strong ablation falsifiers.
	- What it is NOT: proof of qualia or moral status.

	## Quickstart
	- Hardware: T4 / A10 recommended
	- Model: `google/gemma-3-1b-it` (a gated model; requires an `HF_TOKEN` with access to it)
	- Press Run to execute the baseline and the ablation runs
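
For running outside the Space UI, the token requirement above could be handled roughly as follows. This is a minimal sketch, not the code in `bp_phi/llm_iface.py`: the helper names `resolve_hf_token` and `load_model` are hypothetical, and it assumes a `transformers` release recent enough to support Gemma 3 and the `token=` argument.

```python
import os

MODEL_ID = "google/gemma-3-1b-it"


def resolve_hf_token(env=None):
    """Return the Hugging Face token from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Space secret or export it in the shell."
        )
    return token


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model for inference only (hypothetical helper)."""
    # Imported lazily so the token check above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    token = resolve_hf_token()
    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
    model = AutoModelForCausalLM.from_pretrained(model_id, token=token)
    model.eval()  # inference mode; the weights are never updated
    return tokenizer, model
```

Failing early on a missing token gives a clearer error than the access error the Hub returns for a gated model.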

	## Files
	- `bp_phi/llm_iface.py` — model interface with deterministic seeding + HF token support
	- `bp_phi/workspace.py` — global workspace and ablations
	- `bp_phi/prompts_en.py` — English reasoning/memory tasks
	- `bp_phi/metrics.py` — AUC_nrp, ECE, CK, DS
	- `bp_phi/runner.py` — orchestrator with reproducible seeding
	- `app.py` — Gradio interface
	- `requirements.txt` — dependencies

	## Metrics
	- AUC_nrp: Predictivity of hidden no-report markers for future self-corrections.
	- ECE: Expected Calibration Error (lower is better).
	- CK: Counterfactual consistency proxy (higher is better).
	- DS: Stability duration (mean streak without change).
	- PCS: Weighted aggregate of the metrics above, computed within each run. ΔΦ is excluded because it is only available post hoc.
	- ΔΦ: Post-hoc drop from the baseline PCS to the mean PCS across the ablation runs.
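
As a rough illustration of how three of these quantities could be computed, the sketch below implements an equal-width-bin ECE, a pairwise ROC-style AUC, and ΔΦ. It is not the code in `bp_phi/metrics.py`: the function names, the 10-bin default, and the reading of AUC_nrp as an ROC AUC over (marker score, later self-correction) pairs are assumptions made for this example.

```python
from statistics import mean


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue  # empty bins contribute nothing
        avg_conf = mean(c for c, _ in items)
        accuracy = mean(1.0 if ok else 0.0 for _, ok in items)
        ece += (len(items) / total) * abs(accuracy - avg_conf)
    return ece


def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def delta_phi(baseline_pcs, ablation_pcs):
    """Drop from the baseline PCS to the mean PCS over the ablation runs."""
    return baseline_pcs - mean(ablation_pcs)
```

Under this reading, a positive ΔΦ means the ablations lowered the score relative to the baseline, and a ΔΦ near zero means they made no measurable difference.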

	## Notes
	- Models are used in frozen mode (no training).
	- This is a behavioral probe. Functional compatibility with Φ ≠ proof of experience.
	- Reproducibility: fix the random seeds and the number of trials; avoid data leakage by not fine-tuning on these prompts.
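
The reproducibility note can be sketched as a single seeding helper that is called once per run. This is an illustration under assumptions rather than the logic in `bp_phi/runner.py`: `seed_everything` and `run_trials` are hypothetical names, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator available in this environment."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional for this sketch


def run_trials(seed: int, n_trials: int) -> list:
    """Stand-in for a probe run: a fixed seed and trial count give identical draws."""
    seed_everything(seed)
    return [random.random() for _ in range(n_trials)]
```

Calling `run_trials` twice with the same seed and trial count returns identical lists, which is the property a baseline-versus-ablation comparison relies on. Seeding alone may not remove every source of nondeterminism on a GPU.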