CodeModernBERT-Owl-v1 🦉

Model Details

  • Model type: Bi-encoder architecture based on ModernBERT
  • Architecture:
    • Hidden size: 768
    • Layers: 22
    • Attention heads: 12
    • Intermediate size: 1,152
    • Max position embeddings: 8,192
    • Local attention window size: 128
    • RoPE positional encoding: ΞΈ = 160,000
    • Local RoPE positional encoding: ΞΈ = 10,000
  • Sequence length: up to 2,048 tokens for code and docstring inputs during pretraining
  • Implementation: Python backend; integrated into OwlSpotLight, a Visual Studio Code extension.
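
A minimal usage sketch, assuming the checkpoint loads through transformers' AutoModel/AutoTokenizer as a plain encoder and that query–code similarity is computed over mean-pooled embeddings (the pooling choice is an assumption; the card does not state how OwlSpotLight pools hidden states):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "Shuu12121/CodeModernBERT-Owl-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(texts):
    # Truncate at the 2,048-token limit used during pretraining.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=2048, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # Mean-pool over non-padding tokens (assumed pooling strategy).
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (out.last_hidden_state * mask).sum(1) / mask.sum(1)

query = embed(["read a JSON file into a dict"])
code = embed(["def load_json(path):\n    import json\n"
              "    with open(path) as f:\n        return json.load(f)"])
print(F.cosine_similarity(query, code).item())
```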

Pretraining

  • Tokenizer: Custom BPE tokenizer trained on code and docstring pairs.

  • Data: Functions and natural language descriptions extracted from GitHub repositories.

  • Masking strategy: Two-phase pretraining.

    • Phase 1: Random Masked Language Modeling (MLM)
      30% of tokens in code functions are randomly masked and predicted with standard MLM (see the first sketch after this list).
    • Phase 2: Line-level Span Masking
      Inspired by SpanBERT, pretraining continues on the same data with span masking at line granularity (see the second sketch after this list):
      1. Convert input tokens back to strings.
      2. Detect newline tokens with regex and segment inputs by line.
      3. Exclude whitespace-only tokens from masking.
      4. Apply padding to align sequence lengths.
      5. Randomly mask 30% of tokens in each line segment and predict them.
  • Pretraining hyperparameters (see the TrainingArguments sketch at the end of this section):

    • Batch size: 20
    • Gradient accumulation steps: 6
    • Effective batch size: 120
    • Optimizer: AdamW
    • Learning rate: 5e-5
    • Scheduler: Cosine
    • Epochs: 2
    • Precision: Mixed precision (fp16) using transformers
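
Phase 1's random masking corresponds directly to the stock MLM collator in transformers with the masking probability raised to 0.3; a minimal sketch (the sample input is illustrative, not the authors' pipeline):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("Shuu12121/CodeModernBERT-Owl-v1")

# Select 30% of tokens for prediction, as in Phase 1. The collator's
# standard recipe replaces most selected tokens with the mask token.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.3
)

enc = tokenizer("def add(a, b):\n    return a + b")
batch = collator([{"input_ids": enc["input_ids"]}])
print(batch["input_ids"])  # masked inputs
print(batch["labels"])     # original ids at selected positions, -100 elsewhere
```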
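Phase 2's five steps could be reconstructed along the following lines; `line_span_mask` and every detail in it are illustrative assumptions, not the released training code:

```python
import random
import re
import torch

def line_span_mask(input_ids, tokenizer, mask_rate=0.3):
    # Hypothetical reconstruction of the Phase 2 steps described above.
    tokens = tokenizer.convert_ids_to_tokens(input_ids)  # step 1
    labels = [-100] * len(input_ids)                     # -100 = ignored by the loss
    masked = list(input_ids)

    # Step 2: segment token positions into lines at newline tokens.
    lines, line = [], []
    for i, tok in enumerate(tokens):
        line.append(i)
        if re.search(r"\n", tokenizer.convert_tokens_to_string([tok])):
            lines.append(line)
            line = []
    if line:
        lines.append(line)

    for seg in lines:
        # Step 3: exclude whitespace-only tokens from masking.
        cands = [i for i in seg
                 if tokenizer.convert_tokens_to_string([tokens[i]]).strip()]
        if not cands:
            continue
        # Step 5: mask 30% of the maskable tokens in this line segment.
        k = max(1, round(len(cands) * mask_rate))
        for i in random.sample(cands, k):
            labels[i] = masked[i]
            masked[i] = tokenizer.mask_token_id

    # Step 4 (padding to align sequence lengths) is left to the batch collator.
    return torch.tensor(masked), torch.tensor(labels)
```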
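The hyperparameter list maps onto transformers' TrainingArguments roughly as follows; output_dir and anything not stated above are assumptions:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="owl-pretrain",         # hypothetical path
    per_device_train_batch_size=20,
    gradient_accumulation_steps=6,     # effective batch: 20 x 6 = 120
    optim="adamw_torch",               # AdamW
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    fp16=True,                         # mixed precision as stated
)
```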