title: BP-Φ English Suite — Phenomenality Test
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 4.40.0
app_file: app.py
pinned: true
license: apache-2.0

BP-Φ English Suite — Phenomenality Test (Hugging Face Spaces)

This Space implements a falsifiable BP-Φ probe for LLMs:

Phenomenal-like processing requires (i) a limited-capacity global workspace with recurrence,
(ii) metarepresentational loops with downstream causal roles, and
(iii) no-report markers that predict later behavior.

What it is: a functional, testable harness for the bridge principle above. It yields a Phenomenal-Candidate Score (PCS) and uses ablations as strong falsifiers.
What it is NOT: proof of qualia or moral status.

Quickstart

Hardware: a T4 or A10 GPU is recommended.
Model: google/gemma-3-1b-it (requires HF_TOKEN to be set).
Press Run to execute the baseline and the ablations.
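As a minimal sketch of the token requirement above, the helper below reads HF_TOKEN from the environment and fails early if it is missing. The function name `get_hf_token` is hypothetical and is not taken from `bp_phi/llm_iface.py`, which may handle the token differently.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from HF_TOKEN, or raise if it is unset.

    Hypothetical helper for illustration; the Space's own code may differ.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret or environment variable "
            "before loading google/gemma-3-1b-it."
        )
    return token
```

The returned string would typically be passed on to the model loader, for example through the `token=` argument of `from_pretrained` in recent versions of transformers.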

Files

bp_phi/llm_iface.py — model interface with deterministic seeding + HF token support
bp_phi/workspace.py — global workspace and ablations
bp_phi/prompts_en.py — English reasoning/memory tasks
bp_phi/metrics.py — AUC_nrp, ECE, CK, DS
bp_phi/runner.py — orchestrator with reproducible seeding
app.py — Gradio interface
requirements.txt — dependencies

Metrics

AUC_nrp: Area under the curve for predicting future self-corrections from hidden no-report markers.
ECE: Expected Calibration Error (lower is better).
CK: Counterfactual consistency proxy (higher is better).
DS: Stability duration (mean length of a streak without change).
PCS: Weighted aggregate of the metrics above. ΔΦ is not part of it, because ΔΦ is only available after the runs finish.
ΔΦ: Drop from the baseline PCS to the mean PCS across ablations, computed post hoc.
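The definitions above can be sketched in plain Python. This is an illustration using textbook formulas, not the code in `bp_phi/metrics.py`: the function names, the equal-width binning for ECE, and the reading of DS as the mean run length of identical consecutive answers are all assumptions. CK and the PCS weights are omitted because this README does not specify them.

```python
from itertools import groupby
from statistics import mean
from typing import Hashable, Sequence


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, 1.0 if ok else 0.0))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = mean(c for c, _ in members)
            accuracy = mean(a for _, a in members)
            ece += len(members) / total * abs(accuracy - avg_conf)
    return ece


def mean_streak(answers: Sequence[Hashable]) -> float:
    """Assumed reading of DS: mean length of runs of identical consecutive answers."""
    runs = [sum(1 for _ in group) for _, group in groupby(answers)]
    return float(mean(runs)) if runs else 0.0


def delta_phi(baseline_pcs: float, ablation_pcs: Sequence[float]) -> float:
    """Baseline PCS minus the mean PCS over all ablation runs."""
    return baseline_pcs - mean(ablation_pcs)
```

Under these assumptions, a positive `delta_phi` means the ablations lowered the score relative to baseline, which is the direction the bridge principle predicts.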

Notes

Models are used in frozen mode (no training).
This is a behavioral probe. Functional compatibility with Φ ≠ proof of experience.
Reproducibility: fix the seeds and the number of trials; avoid data leakage by not fine-tuning on these prompts.
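One common way to fix seeds is sketched below. The name `seed_everything` is hypothetical and the actual seeding in `bp_phi/runner.py` and `bp_phi/llm_iface.py` may differ. NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the random number generators a run is likely to touch.

    Hypothetical helper; the seed should fit in 32 bits for NumPy.
    """
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # ignored when CUDA is unavailable
    except ImportError:
        pass  # PyTorch is optional in this sketch
```

Calling this once per trial with a recorded seed makes sampled generations repeatable on the same hardware and library versions. Some GPU kernels can still be nondeterministic.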