---
title: Vector Embeddings as a Cognitive-Inspired Representational Framework
tags:
  - embeddings
  - cognitive-science
  - nlp
  - umap
  - data-visualization
  - tutorial
license: apache-2.0
emoji: 🚀
colorFrom: green
colorTo: indigo
sdk: docker
pinned: true
short_description: Demo for UMAP visualization of embeddings for cognition
app_port: 7860
---

## Why embeddings (semantic memory & distributed codes)?

High-dimensional embeddings give you a continuous geometry of meaning: nearby points are related, directions encode relations, and clusters map to human categories. This demo lets you see that geometry and work with it, without getting lost in prompt engineering.

What you’ll notice:

- Category “islands” (e.g., *bear*/*wolf* near *forest*/*woods*/*nature*).
- Prototype effects: some items sit closer to a category centroid (more “typical”).
- Model choice matters (smaller models are faster; larger ones give crisper clusters).

## What this Space does

- Embeds your words/phrases (select a Sentence-Transformer).
- Projects them to 2D with UMAP for visualization.
- Draws per-category centroids (⭐) and optional labels.
- Computes prototype distances (cosine distance to each category centroid).
- Lets you add/remove categories in the sidebar and download CSVs.
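The centroid and prototype-distance steps above can be sketched in a few lines. This is a minimal illustration, not the Space's actual code: the toy 3-d vectors stand in for real Sentence-Transformer outputs, and all vectors are assumed unit-normalized.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def centroid_of(vectors):
    # Mean of the category's vectors, re-normalized to the unit sphere.
    dim = len(vectors[0])
    c = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return normalize(c)

def cosine_distance(a, b):
    # Assumes a and b are unit-normalized; 0 means identical direction.
    return 1.0 - sum(x * y for x, y in zip(a, b))

# Toy "forest" category: two items pointing in similar directions.
forest = [normalize([0.9, 0.1, 0.0]), normalize([0.8, 0.2, 0.0])]
proto = centroid_of(forest)
dists = [cosine_distance(v, proto) for v in forest]
# Items with smaller distance to the centroid are more "typical".
```

Prototype tables in the Space report exactly this kind of per-item cosine distance, just computed over real embedding dimensions.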

**How to use:** Pick a model → edit categories (one term per line) → Run → explore the plot and prototype table. Use the downloads for offline analysis.

### Input section → edit terms and categories

*(Screenshot: Edit Categories and Members)*

### Output section → plot, centroid distances, downloads (CSV)

You can also save the graph image directly.

*(Screenshot: Results and Downloads)*

## Important note about 2D projections

UMAP (and t-SNE) compress hundreds of dimensions into 2D, which necessarily distorts some distances and neighborhoods. Treat the plots as intuition aids, not ground truth. For decisions, rely on original-space metrics (cosine similarity, centroid distance, k-NN overlap), not the 2D layout. Small changes to `n_neighbors`/`min_dist` can shift the picture without changing the underlying semantics.
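The k-NN overlap metric mentioned above can be sketched as follows. This is a toy example with hypothetical coordinates; in practice `high` would be the raw embeddings and `low` the UMAP output.

```python
def knn(points, idx, k):
    # Indices of the k nearest neighbors of points[idx] (excluding itself),
    # by squared Euclidean distance.
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    order = sorted(range(len(points)), key=lambda j: sqdist(points[idx], points[j]))
    return set(order[1:k + 1])

def knn_overlap(high, low, k=2):
    # Mean fraction of each point's high-dimensional neighbors that are
    # still among its neighbors in the low-dimensional layout (1.0 = perfect).
    scores = [len(knn(high, i, k) & knn(low, i, k)) / k for i in range(len(high))]
    return sum(scores) / len(scores)

high = [[0, 0, 0], [0.1, 0, 0], [0.2, 0, 0], [5, 5, 5]]
low = [[0, 0], [0.1, 0], [0.2, 0], [9, 9]]  # a faithful 2D layout
score = knn_overlap(high, low)  # 1.0 when neighborhoods are fully preserved
```

A low score flags regions of the plot where the 2D neighborhoods disagree with the original-space geometry.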

## Where I’m heading (high-level)

This demo is a thin slice of a broader effort toward embedding-native agents that:

- retrieve and prune context geometrically,
- emit compact semantic hints for downstream steps,
- route tasks through auditable procedures with watchers, doubt checks, and topic locks,
- learn from logs what actually helped.

Details are intentionally withheld for now; if this direction fits your roadmap, I’m open to discussing under NDA.

## Known limits & mitigations (brief)

- Polysemy/context mixing → context-conditioned representations; multi-view scoring
- Hubness/anisotropy → hubness-aware neighbor scoring; local normalization
- Projection artifacts → use 2D only for intuition; score in the original space
- Domain shift → lightweight adaptation; guarded fallbacks
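As a toy illustration of the hubness item above: a quick diagnostic is to count how often each point appears in other points' k-nearest-neighbor lists. The helper names and data below are hypothetical; the Space does not expose this check.

```python
from collections import Counter

def knn_indices(points, idx, k):
    # Indices of the k nearest neighbors of points[idx], excluding itself.
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    order = sorted(range(len(points)), key=lambda j: sqdist(points[idx], points[j]))
    return order[1:k + 1]

def hub_counts(points, k=2):
    # Map each point index to the number of k-NN lists it appears in.
    # In high dimensions a few "hub" points rack up disproportionate counts;
    # a heavily skewed distribution is a cue for hubness-aware scoring.
    counts = Counter()
    for i in range(len(points)):
        counts.update(knn_indices(points, i, k))
    return counts

pts = [[0, 0], [1, 0], [0, 1], [10, 10]]
counts = hub_counts(pts, k=1)
```

Points whose count far exceeds k are hubs; local normalization (e.g., scaling distances by each point's neighborhood radius) tends to flatten this distribution.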

I’ve explored practical remedies for these and related topics (e.g., geometry-aware retrieval pruning). Serious inquiries welcome.

## Collaboration

I published an AI-assisted mock-up of the theory as a public doc: 👉 *Vector Embeddings as a Cognitive-Inspired Representational Framework*

If you’d like to co-author a formal paper (theory, proofs, experiments, benchmarks), DM me with your background and interest area. I’m also open to sharing implementation details under NDA and discussing exclusive/shared-rights collaborations depending on scope.

## Responsible use

Embeddings reflect their training data. Treat prototype distances and plots as diagnostics, not verdicts. Validate with task-level metrics, prefer high-quality sources, and keep guardrails on any system with side effects.

## Citation

Jaired Hall (2025). Embedding-Native Cognition: Geometry as a Substrate for Retrieval, Planning, and Safety. Demo & preprint, Google Doc.

BibTeX:

```bibtex
@misc{hall2025embeddingnative,
  title        = {Embedding-Native Cognition: Geometry as a Substrate for Retrieval, Planning, and Safety},
  author       = {Jaired Hall},
  year         = {2025},
  howpublished = {\url{https://docs.google.com/document/d/e/2PACX-1vR2yfHEJYRxcS1Y756s1KiDKer1DkCHZj95KpYi340tyA8nO5hNVwYRwLkg0TpH_Q/pub}},
  note         = {Demo and AI-assisted mock-up preprint}
}
```

## References (core)

- Deerwester, Dumais, Furnas, Landauer, & Harshman. Indexing by Latent Semantic Analysis. *JASIS* (1990).
- Landauer & Dumais. A Solution to Plato’s Problem… *Psychological Review* (1997).
- Collins & Loftus. A Spreading-Activation Theory of Semantic Processing. *Psychological Review* (1975).
- Collins & Quillian. Retrieval Time from Semantic Memory. (1969).
- Gärdenfors. *Conceptual Spaces: The Geometry of Thought.* MIT Press (2000).
- Rosch. Principles of Categorization. (1978; 1988 reprint).
- Mikolov, Yih, & Zweig. Linguistic Regularities in Continuous-Space Word Representations. *NAACL* (2013).
- Nickel & Kiela. Poincaré Embeddings for Learning Hierarchical Representations. *NeurIPS* (2017).
- Kriegeskorte, Mur, & Bandettini. Representational Similarity Analysis. *Frontiers in Systems Neuroscience* (2008).
- Goldstein et al. Alignment of brain embeddings and artificial contextual embeddings… *Nature Communications* (2024).
- McInnes, Healy, & Melville. UMAP: Uniform Manifold Approximation and Projection. *JOSS* (2018).
- Günther, Rinaldi, & Marelli. Vector-Space Models… Common Misconceptions. *Perspectives on Psychological Science* (2019).
- Ethayarajh. How Contextual Are Contextualized Word Representations? *EMNLP* (2019).
- Radovanović, Nanopoulos, & Ivanović. Hubs in Space… *JMLR* (2010).
- van der Maaten & Hinton. Visualizing Data Using t-SNE. *JMLR* (2008).
- Speer, Chin, & Havasi. ConceptNet 5.5 / Numberbatch. *AAAI* (2017).
- Koll, Matthew B. Information retrieval theory and design based on a model of the user’s concept relations. (1980).