Make compatible with newer transformers
Issue
The model fails to load with new Transformers versions due to removed classes:
ImportError: cannot import name 'LlamaFlashAttention2' from 'transformers.models.llama.modeling_llama'
Root Cause
In modeling_deepseekv2.py (lines 37-39), the code imports:
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaFlashAttention2
)
These classes were removed in Transformers 4.47+ as part of the attention refactoring.
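To confirm you are hitting this, a quick check against a stock Transformers install (nothing here is specific to DeepSeek-OCR):
import transformers

print(transformers.__version__)
try:
    from transformers.models.llama.modeling_llama import LlamaFlashAttention2  # noqa: F401
except ImportError as e:
    print(e)  # reproduces the error above on newer releases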
Proposed Fix
Since DeepSeek-OCR uses MLA (Multi-head Latent Attention) by default (config.use_mla = True), the Llama attention classes are only used as fallbacks for MHA mode. 
Option 1: Remove MHA support (simplest)
- Remove the imports (lines 37-39)
- Update the ATTENTION_CLASSES dict (lines 1022-1029):
ATTENTION_CLASSES = {
    "eager": DeepseekV2Attention,
    "flash_attention_2": DeepseekV2FlashAttention2,
    "mla_eager": DeepseekV2Attention,
    "mla_flash_attention_2": DeepseekV2FlashAttention2,
    # Removed mha_eager and mha_flash_attention_2
}
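If you go this route, any code path that still requests an MHA key will fail with a bare KeyError; a small guard makes that failure explicit. This is only a sketch, and the helper name get_attention_class is illustrative, not something that exists in modeling_deepseekv2.py:
def get_attention_class(key):
    # Fail loudly if an mha_* key is requested after those entries are removed.
    try:
        return ATTENTION_CLASSES[key]
    except KeyError:
        raise ValueError(
            f"Attention implementation '{key}' is no longer supported; "
            "set config.use_mla = True or use Option 2/3 below."
        )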
Option 2: Use DeepSeek attention for MHA mode (backward compatible)
Keep the same keys but map to DeepSeek classes:
ATTENTION_CLASSES = {
    "eager": DeepseekV2Attention,
    "flash_attention_2": DeepseekV2FlashAttention2,
    "mla_eager": DeepseekV2Attention,
    "mla_flash_attention_2": DeepseekV2FlashAttention2,
    "mha_eager": DeepseekV2Attention,  # Changed
    "mha_flash_attention_2": DeepseekV2FlashAttention2,  # Changed
}
Option 3: Conditional import (most flexible)
try:
    from transformers.models.llama.modeling_llama import (
        LlamaAttention,
        LlamaFlashAttention2
    )
    HAS_LLAMA_ATTENTION = True
except ImportError:
    HAS_LLAMA_ATTENTION = False
ATTENTION_CLASSES = {
    "eager": DeepseekV2Attention,
    "flash_attention_2": DeepseekV2FlashAttention2,
    "mla_eager": DeepseekV2Attention,
    "mla_flash_attention_2": DeepseekV2FlashAttention2,
}
if HAS_LLAMA_ATTENTION:
    ATTENTION_CLASSES.update({
        "mha_eager": LlamaAttention,
        "mha_flash_attention_2": LlamaFlashAttention2
    })
else:
    ATTENTION_CLASSES.update({
        "mha_eager": DeepseekV2Attention,
        "mha_flash_attention_2": DeepseekV2FlashAttention2
    })
This works because DeepSeek-OCR uses MLA by default anyway.
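To see which attention path your local checkpoint would actually take, here is a minimal sketch; the mla_/mha_ key construction mirrors the dictionary keys above and is an assumption about the modeling code, not copied from it:
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-OCR", trust_remote_code=True)
use_mla = getattr(config, "use_mla", True)             # the flag may also live on a nested language-model sub-config
attn_impl = getattr(config, "_attn_implementation", "eager")
print(("mla_" if use_mla else "mha_") + attn_impl)     # e.g. "mla_flash_attention_2"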
All the same issues are still there: nothing will open and nothing works. This is a useless app, fix it.
@harpreetsahota Does it really use MLA by default? Over here it says "use_mla": false, and mapping "mha_flash_attention_2" to DeepseekV2FlashAttention2 still does not work for me, but I am not sure if it is an unrelated issue.
@bigpappic @mingyi456 @laxmareddyp @yayoimizuha
Hey guys! Hope this post helps with the compatibility issues.
Post: https://huggingface.co/posts/prithivMLmods/374605520852651
Demo: https://huggingface.co/spaces/prithivMLmods/DeepSeek-OCR-experimental
transformers==4.57.1
torch
einops
addict
easydict
matplotlib
import os
import torch
import requests
from transformers import AutoModel, AutoTokenizer
from typing import Iterable
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "prithivMLmods/DeepSeek-OCR-Latest-BF16.I64" # - (https://huggingface.co/deepseek-ai/DeepSeek-OCR)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    use_safetensors=True,
).to(device).eval()
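A minimal inference sketch on top of this, assuming the repack keeps the infer() helper from the upstream DeepSeek-OCR model card (the prompt string and argument names follow that card and may differ here; the image path is a placeholder):
prompt = "<image>\n<|grounding|>Convert the document to markdown."
image_file = "your_image.png"   # placeholder path

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path="./output",
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
)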