---
tags:
  - ontology-embedding
  - hyperbolic-space
  - hierarchical-reasoning
  - biomedical-ontology
  - generated_from_trainer
  - dataset_size:150000
  - loss:HierarchyTransformerLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
  - source_sentence: cellular response to stimulus
    sentences:
      - response to stimulus
      - medial transverse frontopolar gyrus
      - biological regulation
  - source_sentence: >-
      regulation of cell differentiation involved in embryonic placenta
      development
    sentences:
      - thoracic wall
      - ectoderm-derived structure
      - regulation of cell differentiation
  - source_sentence: regulation of hippocampal neuron apoptotic process
    sentences:
      - external genitalia morphogenesis
      - compact layer of ventricle
      - biological regulation
  - source_sentence: transitional myocyte of internodal tract
    sentences:
      - secretory epithelial cell
      - internodal tract myocyte
      - insect haltere disc
  - source_sentence: alveolar atrium
    sentences:
      - organ part
      - superior recess of lesser sac
      - foramen of skull
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# OnT: Language Models as Ontology Encoders

This is an OnT (Ontology Transformer) model trained on the GO dataset, based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). OnT is a language-model-based framework for ontology embedding that represents concepts as points in hyperbolic space and axioms as hierarchical relationships between those points.

## Model Details

### Model Description

- **Model Type:** Ontology Transformer (OnT)
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Training Dataset:** GO (Gene Ontology)
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Embedding Space:** hyperbolic space
- **Key Features:**
  - Hyperbolic embeddings for ontology concept encoding (see the sketch after this list)
  - Modeling of hierarchical relationships between concepts
  - Role embeddings as rotations over hyperbolic spaces
  - Representation of concept rotation, transition, and existential quantifiers
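Hyperbolic geometry suits ontologies because volume grows exponentially with radius, mirroring how tree-like hierarchies grow with depth. As a rough intuition only, here is a minimal, self-contained sketch of the Poincaré-ball distance in plain PyTorch; the formula and curvature are illustrative assumptions, not OnT's exact formulation (the model's actual manifold settings come from its configuration):

```python
import torch

def poincare_distance(x: torch.Tensor, y: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Geodesic distance on the Poincare ball with curvature parameter c."""
    sq_norm_x = (x * x).sum(-1)
    sq_norm_y = (y * y).sum(-1)
    sq_diff = ((x - y) ** 2).sum(-1)
    num = 2 * c * sq_diff
    denom = (1 - c * sq_norm_x) * (1 - c * sq_norm_y)
    return torch.acosh(1 + num / denom) / c**0.5

# Points near the origin behave like general concepts; points near the
# boundary behave like specific ones, so nested subtrees fit with low distortion.
root = torch.tensor([0.01, 0.0])   # e.g. a broad term like "biological regulation"
leaf = torch.tensor([0.85, 0.30])  # e.g. a deeply nested GO term
print(poincare_distance(root, leaf))
```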

### Model Sources

- **Paper:** [Language Models as Ontology Encoders](https://arxiv.org/abs/2507.14334)

### Available Versions

This model is available in four versions (Git branches) to suit different use cases:

| Branch | Training Type | Role Embedding | Use Case |
|--------|---------------|----------------|----------|
| `main` (default) | Prediction dataset | ✅ With role embedding | Default version: trained on the prediction dataset, with role embedding |
| `role-free` | Prediction dataset | ❌ Without role embedding | Trained on the prediction dataset, without role embedding |
| `inference-default` | Inference dataset | ✅ With role embedding | Trained on the inference dataset, with role embedding |
| `inference-role-free` | Inference dataset | ❌ Without role embedding | Trained on the inference dataset, without role embedding |

**How to use different versions:**

```python
from OnT import OntologyTransformer

# Default version (main branch - OnTr with role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go")

# Role-free version (without role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="role-free")

# Inference version with role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-default")

# Inference version without role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-role-free")
```

## Full Model Architecture

```
OntologyTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
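The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`): each input is reduced to a single 768-dimensional vector by averaging its token embeddings over non-padding positions. A minimal sketch of that step in plain PyTorch (illustrative only; sentence-transformers performs this internally):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()    # [batch, seq, 1]
    summed = (token_embeddings * mask).sum(dim=1)  # [batch, dim]
    counts = mask.sum(dim=1).clamp(min=1e-9)       # [batch, 1]
    return summed / counts

tokens = torch.randn(2, 384, 768)                  # e.g. a padded batch
mask = torch.ones(2, 384, dtype=torch.long)
print(mean_pool(tokens, mask).shape)               # torch.Size([2, 768])
```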

## Usage

### Installation

First, install the required dependencies:

```bash
pip install sentence-transformers==3.4.0.dev0
```

You also need to install [HierarchyTransformers](https://github.com/KRR-Oxford/HierarchyTransformers) following the instructions in their repository.
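A quick sanity check that the pinned dependency resolved correctly (this just reads installed package metadata):

```python
from importlib.metadata import version

# Confirm the installed sentence-transformers version matches the pin above.
print(version("sentence-transformers"))  # expect 3.4.0.dev0
```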

### Direct Usage

Load the model and use it for ontology concept encoding:

```python
import torch
from OnT import OntologyTransformer

# Load the OnT model
path = "Hui97/OnT-MPNet-go"
ont = OntologyTransformer.from_pretrained(path)

# Entity names to be encoded
entity_names = [
    'alveolar atrium',
    'organ part',
    'superior recess of lesser sac',
]

# Get the entity embeddings in hyperbolic space
entity_embeddings = ont.encode_concept(entity_names)
print(entity_embeddings.shape)
# [3, 768]

# Role sentences to be encoded
role_sentences = [
    "application attribute",
    "attribute",
    "chemical modifier",
]

# Get the role embeddings (rotations and scalings)
role_rotations, role_scalings = ont.encode_roles(role_sentences)
```
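Since HierarchyTransformers uses `geoopt` for its hyperbolic operations, one way to compare two concept embeddings is their geodesic distance on a Poincaré ball. The curvature below is an illustrative assumption; read the actual manifold settings from the loaded model rather than hard-coding them:

```python
import geoopt
import torch

# Assumption for illustration: a Poincare ball with curvature parameter c=1.0.
ball = geoopt.PoincareBall(c=1.0)

emb = torch.as_tensor(entity_embeddings)  # from ont.encode_concept above
child, parent = emb[0], emb[1]            # 'alveolar atrium' vs 'organ part'

# A smaller geodesic distance suggests a closer hierarchical relationship.
print(ball.dist(child, parent))
```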

## Citation

### BibTeX

If you use this model, please cite:

```bibtex
@article{yang2025language,
  title={Language Models as Ontology Encoders},
  author={Yang, Hui and Chen, Jiaoyan and He, Yuan and Gao, Yongsheng and Horrocks, Ian},
  journal={arXiv preprint arXiv:2507.14334},
  year={2025}
}
```