NV-Reason-CXR-3B Overview

Description:

NV-Reason-CXR-3B is a specialized vision-language model designed for medical reasoning and interpretation of chest X-ray images, with detailed explanations. The model combines visual understanding with medical reasoning capabilities, enabling healthcare professionals to access comprehensive analyses and engage in follow-up discussions about radiological findings. NV-Reason-CXR-3B provides step-by-step reasoning that mirrors clinical thinking patterns, making it valuable for educational and research applications in medical imaging.

This model is for research and development only.

💻 [Github code]
🩻 [Web Demo]

Quick start

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image


# Load the model 
model_name = "nvidia/NV-Reason-CXR-3B"
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
).eval().to("cuda")

processor = AutoProcessor.from_pretrained(model_name)

# Load chest x-ray image
image = Image.open("chest_xray.png")

# Prepare input with clinical context
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image,
            },
            {
                "type": "text",
                "text": "Find abnormalities and support devices."
            }
        ]
    }
]


# Create prompt using chat template
text = processor.apply_chat_template(messages, add_generation_prompt=True)

# Process inputs
inputs = processor(text=text, images=[image], return_tensors="pt")
inputs = inputs.to(model.device)

# Generate 
generated_ids = model.generate(**inputs,  max_new_tokens=2048)

# Trim and decode
trimmed_generated_ids = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
generated_text = processor.batch_decode(
    trimmed_generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]


print("Output:")
print(generated_text)

Framework versions

Transformers: 4.56.1
Pytorch: 2.7.1
Tokenizers: 0.22.0

License/Terms of Use:

NVIDIA OneWay Non-Commercial License for academic research purposes

Deployment Geography:

Global

Use Case:

Radiologists, medical students, and medical researchers would be expected to use this system for chest X-ray interpretation with detailed reasoning, educational training with AI-generated explanations, and research applications requiring explainable medical AI analyses.

Important Medical AI Considerations: This model is designed for research and educational purposes only and should not be used for clinical diagnosis or treatment decisions. All outputs should be reviewed by qualified medical professionals. The model's reasoning capabilities are intended to support medical education and research, not replace clinical judgment.

Release Date:

Huggingface: 10/27/2025 via https://huggingface.co/NVIDIA

Model Architecture:

Architecture Type: Transformer
Network Architecture: Vision-Language Model based on Qwen2.5-VL architecture with medical reasoning capabilities

This model was developed by fine-tuning Qwen2.5-VL-3B using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) for enhanced medical reasoning. Number of model parameters: 3B

Input:

Input Type(s): Image, Text
Input Format(s): Medical images (JPEG, PNG), Text prompts (string)
Input Parameters: Two-Dimensional (2D) images with accompanying text queries (1D)
Other Properties Related to Input: Supports frontal chest X-ray images with flexible scaling. Accepts natural language prompts for medical queries, follow-up questions, and reasoning requests. Input images are automatically processed without specific size constraints.

Input Specifications:

Medical Images: Chest X-ray images in standard medical imaging formats
Text Prompts: Natural language queries about radiological findings, diagnostic questions, or requests for detailed explanations
Interactive Dialogue: Support for follow-up questions and clarification requests

Output:

Output Type(s): Text
Output Format: Structured reasoning with XML-like tags
Output Parameters: One-Dimensional (1D) Natural language reasoning and analysis
Other Properties Related to Output: Outputs contain structured thinking processes enclosed in <thinking> tags showing step-by-step medical reasoning, followed by concise answers in <answer> tags. This format enables transparency in the model's diagnostic reasoning process and supports educational use cases.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engine(s):

PyTorch
Transformers library

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Hopper
NVIDIA Lovelace

Supported Operating System(s):

Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):

0.1 - Initial release version for chest X-ray reasoning and interpretation with structured thinking output

Training, Testing, and Evaluation Datasets:

Dataset Overview:

Large-scale chest X-ray datasets including MIMIC-CXR, ChestXRay14, and CheXpert.

Training Dataset:

Data Modality:

Image
Text

Inference:

Acceleration Engine: PyTorch, Transformers Test Hardware:

A100
H100
L40S

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.

Please report model quality, risk, security vulnerabilities or concerns here.

Downloads last month: 168

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for nvidia/NV-Reason-CXR-3B

Base model

Qwen/Qwen2.5-VL-3B-Instruct

Finetuned

(525)

this model

Space using nvidia/NV-Reason-CXR-3B 1

Collection including nvidia/NV-Reason-CXR-3B

Clara-Medical

Collection

NVIDIA Clara Open Models for medical imaging AI: segment, generate, and reason across CT, MRI, and X-ray. Built on MONAI by NVIDIA. • 7 items • Updated 6 days ago • 3