---
library_name: transformers
license: apache-2.0
language:
  - ja
---
# llm-jp-modernbert-base
This model is based on the ModernBERT-base architecture and uses the llm-jp-tokenizer. It was trained on the Japanese subset (3.4 TB) of the llm-jp-corpus v4 and supports a maximum sequence length of 8192 tokens.

For details on the training methods, evaluation, and analysis, please refer to the paper [llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length](https://arxiv.org/abs/2504.15544).
## Usage
Please install the transformers library:

```bash
pip install "transformers>=4.48.0"
```
If your GPU supports FlashAttention 2, installing flash-attn is recommended:

```bash
pip install flash-attn --no-build-isolation
```
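Once installed, FlashAttention 2 can be requested when loading the model through the `attn_implementation` argument of `from_pretrained`. The snippet below is a minimal sketch of this, with a fallback to the default attention when flash-attn cannot be used:

```python
import torch
from transformers import AutoModelForMaskedLM

model_id = "llm-jp/llm-jp-modernbert-base"

try:
    # Use FlashAttention 2 if flash-attn is installed and the GPU supports it.
    model = AutoModelForMaskedLM.from_pretrained(
        model_id,
        attn_implementation="flash_attention_2",
        torch_dtype=torch.bfloat16,
    )
except (ImportError, ValueError):
    # Fall back to the default attention implementation.
    model = AutoModelForMaskedLM.from_pretrained(model_id)
```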
Using `AutoModelForMaskedLM`:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "日本の首都は<MASK|LLM-jp>です。"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Get the prediction for the masked position.
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
# Predicted token:  東京
```
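The encoder can also be used for feature extraction via `AutoModel`. The following is an illustrative sketch only; mean pooling over the last hidden state is an assumption made here for the example, not a pooling strategy prescribed by this model card:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

texts = ["日本の首都は東京です。", "今日は良い天気です。"]
inputs = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden state over non-padding tokens to get one vector per sentence.
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```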
## Training
This model was trained with a max_seq_len of 1024 in stage 1, and then with a max_seq_len of 8192 in stage 2.
Training code can be found in the [llm-jp-modernbert repository](https://github.com/llm-jp/llm-jp-modernbert).
| Hyperparameter | Stage 1 | Stage 2 |
|---|---|---|
| max_seq_len | 1024 | 8192 |
| max_steps | 500,000 | 200,000 |
| Total batch size | 3328 | 384 |
| Peak LR | 5e-4 | 5e-5 |
| Warmup steps | 24,000 | |
| LR schedule | Linear decay | |
| Adam beta 1 | 0.9 | |
| Adam beta 2 | 0.98 | |
| Adam eps | 1e-6 | |
| MLM prob | 0.30 | |
| Gradient clipping | 1.0 | |
| Weight decay | 1e-5 | |
| line_by_line | True | |

Blank cells in the stage 2 column indicate the same value as in stage 1.
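For reference, the masking setup in the table corresponds to the standard `DataCollatorForLanguageModeling` in transformers. The snippet below is an illustrative sketch of that single piece only; the actual two-stage pipeline, including the line_by_line preprocessing, is in the repository linked above:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-modernbert-base")

# MLM probability of 0.30, matching the table above.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.30,
)

# Collate a toy batch of tokenized sentences into masked inputs and labels.
examples = [
    tokenizer("日本の首都は東京です。", truncation=True, max_length=1024),
    tokenizer("富士山は日本で一番高い山です。", truncation=True, max_length=1024),
]
batch = collator(examples)
print(batch["input_ids"].shape, batch["labels"].shape)
```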
## Evaluation
JSTS, JNLI, and JCoLA from JGLUE were used for evaluation. Evaluation code can be found in the [llm-jp-modernbert repository](https://github.com/llm-jp/llm-jp-modernbert).
| Model | JSTS (Pearson) | JNLI (accuracy) | JCoLA (accuracy) | Avg |
|---|---|---|---|---|
| tohoku-nlp/bert-base-japanese-v3 | 0.920 | 0.912 | 0.880 | 0.904 | 
| sbintuitions/modernbert-ja-130m | 0.916 | 0.927 | 0.868 | 0.904 | 
| sbintuitions/modernbert-ja-310m | 0.932 | 0.933 | 0.883 | 0.916 | 
| llm-jp/llm-jp-modernbert-base | 0.918 | 0.913 | 0.844 | 0.892 | 
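The scores above are obtained by fine-tuning the encoder with a task-specific classification head; the exact setup is in the evaluation code linked above. As a rough sketch of the general pattern only (the label count and the commented-out training arguments are placeholders, not the authors' settings):

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "llm-jp/llm-jp-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 3-way classification head, e.g. for an NLI-style task such as JNLI (illustrative only).
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

# `train_dataset` and `eval_dataset` are assumed to be tokenized JGLUE splits prepared separately.
# args = TrainingArguments(output_dir="out", per_device_train_batch_size=32, num_train_epochs=3)
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```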
## License

Apache License, Version 2.0
## Citation
```bibtex
@misc{sugiura2025llmjpmodernbertmodernbertmodeltrained,
      title={llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length},
      author={Issa Sugiura and Kouta Nakayama and Yusuke Oda},
      year={2025},
      eprint={2504.15544},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.15544},
}
```

