NV-Reason-CXR-3B Overview
Description:
NV-Reason-CXR-3B is a specialized vision-language model designed for medical reasoning and interpretation of chest X-ray images, with detailed explanations. The model combines visual understanding with medical reasoning capabilities, enabling healthcare professionals to access comprehensive analyses and engage in follow-up discussions about radiological findings. NV-Reason-CXR-3B provides step-by-step reasoning that mirrors clinical thinking patterns, making it valuable for educational and research applications in medical imaging.
This model is for research and development only.
💻 [Github code]
🩻 [Web Demo]
Quick start
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
# Load the model
model_name = "nvidia/NV-Reason-CXR-3B"
model = AutoModelForImageTextToText.from_pretrained(
model_name,
torch_dtype=torch.float16,
).eval().to("cuda")
processor = AutoProcessor.from_pretrained(model_name)
# Load chest x-ray image
image = Image.open("chest_xray.png")
# Prepare input with clinical context
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"image": image,
},
{
"type": "text",
"text": "Find abnormalities and support devices."
}
]
}
]
# Create prompt using chat template
text = processor.apply_chat_template(messages, add_generation_prompt=True)
# Process inputs
inputs = processor(text=text, images=[image], return_tensors="pt")
inputs = inputs.to(model.device)
# Generate
generated_ids = model.generate(**inputs, max_new_tokens=2048)
# Trim and decode
trimmed_generated_ids = [
out_ids[len(in_ids):]
for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
generated_text = processor.batch_decode(
trimmed_generated_ids,
skip_special_tokens=True,
clean_up_tokenization_spaces=False
)[0]
print("Output:")
print(generated_text)
Framework versions
- Transformers: 4.56.1
- Pytorch: 2.7.1
- Tokenizers: 0.22.0
License/Terms of Use:
NVIDIA OneWay Non-Commercial License for academic research purposes
Deployment Geography:
Global
Use Case:
Radiologists, medical students, and medical researchers would be expected to use this system for chest X-ray interpretation with detailed reasoning, educational training with AI-generated explanations, and research applications requiring explainable medical AI analyses.
Important Medical AI Considerations: This model is designed for research and educational purposes only and should not be used for clinical diagnosis or treatment decisions. All outputs should be reviewed by qualified medical professionals. The model's reasoning capabilities are intended to support medical education and research, not replace clinical judgment.
Release Date:
Huggingface: 10/27/2025 via https://huggingface.co/NVIDIA
Model Architecture:
- Architecture Type: Transformer
- Network Architecture: Vision-Language Model based on Qwen2.5-VL architecture with medical reasoning capabilities
This model was developed by fine-tuning Qwen2.5-VL-3B using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) for enhanced medical reasoning. Number of model parameters: 3B
Input:
- Input Type(s): Image, Text
- Input Format(s): Medical images (JPEG, PNG), Text prompts (string)
- Input Parameters: Two-Dimensional (2D) images with accompanying text queries (1D)
- Other Properties Related to Input: Supports frontal chest X-ray images with flexible scaling. Accepts natural language prompts for medical queries, follow-up questions, and reasoning requests. Input images are automatically processed without specific size constraints.
Input Specifications:
- Medical Images: Chest X-ray images in standard medical imaging formats
- Text Prompts: Natural language queries about radiological findings, diagnostic questions, or requests for detailed explanations
- Interactive Dialogue: Support for follow-up questions and clarification requests
Output:
- Output Type(s): Text
- Output Format: Structured reasoning with XML-like tags
- Output Parameters: One-Dimensional (1D) Natural language reasoning and analysis
- Other Properties Related to Output: Outputs contain structured thinking processes enclosed in
<thinking>tags showing step-by-step medical reasoning, followed by concise answers in<answer>tags. This format enables transparency in the model's diagnostic reasoning process and supports educational use cases.
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s):
- PyTorch
- Transformers library
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Hopper
- NVIDIA Lovelace
Supported Operating System(s):
- Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s):
0.1 - Initial release version for chest X-ray reasoning and interpretation with structured thinking output
Training, Testing, and Evaluation Datasets:
Dataset Overview:
Large-scale chest X-ray datasets including MIMIC-CXR, ChestXRay14, and CheXpert.
Training Dataset:
Data Modality:
- Image
- Text
Inference:
Acceleration Engine: PyTorch, Transformers Test Hardware:
- A100
- H100
- L40S
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
Please report model quality, risk, security vulnerabilities or concerns here.
- Downloads last month
- 168
Model tree for nvidia/NV-Reason-CXR-3B
Base model
Qwen/Qwen2.5-VL-3B-Instruct