	---
	license: apache-2.0
	language:
	- en
	base_model:
	- Qwen/Qwen3-8B
	pipeline_tag: image-text-to-text
	tags:
	- Bee-8B
	- Fully-Open-MLLMs
	datasets:
	- Open-Bee/Honey-Data-15M
	library_name: transformers
	---
	# Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

	[[🏠 Homepage](https://open-bee.github.io/)] [[📖 arXiv Paper](https://arxiv.org/pdf/2510.13795)] [[🤗 Models](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995)] [[🤗 Datasets (coming soon)](https://huggingface.co/datasets/Open-Bee/Honey-Data-15M)] [[💻 Code (coming soon)](https://github.com/Open-Bee)]

	## Introduction

	We introduce Bee-8B, a new state-of-the-art, fully open 8B Multimodal Large Language Model (MLLM) designed to close the performance gap with proprietary models by focusing on data quality.

	Bee-8B is trained on our new Honey-Data-15M corpus, a high-quality supervised fine-tuning (SFT) dataset of approximately 15 million samples. This dataset was meticulously created with our transparent, adaptable, and open-source data curation pipeline, HoneyPipe, which systematically cleans noisy data and enriches it with a novel dual-level (short and long) Chain-of-Thought (CoT) strategy.

	This dataset enables Bee-8B to achieve exceptional performance, particularly in complex reasoning, establishing a new standard for fully open MLLMs.

	## Key Features

	- High-Quality, Large-Scale Dataset: We release Honey-Data-15M, a new 15M-sample SFT corpus. It has undergone extensive cleaning to remove widespread noise and has been enriched with dual-level CoT reasoning to enhance advanced problem-solving capabilities.
	- Fully Open-Source Data Curation Suite: We provide not just the data, but the entire methodology. HoneyPipe and its underlying framework DataStudio offer the community a transparent and reproducible pipeline, moving beyond static dataset releases.
	- State-of-the-Art Open Model: Our model, Bee-8B, achieves state-of-the-art performance among fully open MLLMs and is highly competitive with recent semi-open models like InternVL3.5-8B, demonstrating the power of high-quality data.

	## News
	- [2025.10.20] 🚀 vLLM Support is Here! Bee-8B now supports high-performance inference with [vLLM](https://github.com/vllm-project/vllm), enabling faster and more efficient deployment for production use cases.

	- [2025.10.13] 🐝 Bee-8B is Released! Our model is now publicly available. You can download it from [Hugging Face](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995).

	## Quickstart

	> [!NOTE]
	> Below, we provide simple examples to show how to use Bee-8B with 🤗 Transformers.
	> You can dynamically control the model's response by selecting one of two modes: set `enable_thinking=True` for `thinking` mode, or `enable_thinking=False` for `non-thinking` mode. The default is `thinking` mode.


	### Using 🤗 Transformers to Chat

	```python
	import requests
	import torch
	from PIL import Image
	from transformers import AutoModel, AutoProcessor

	model_path = "Open-Bee/Bee-8B-RL"

	# Load model
	model = AutoModel.from_pretrained(
	    model_path,
	    torch_dtype=torch.bfloat16,
	    trust_remote_code=True,
	).to("cuda")

	# Load processor
	processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

	# Define conversation messages
	messages = [{
	    "role": "user",
	    "content": [
	        {
	            "type": "image",
	            "image": "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png",
	        },
	        {
	            "type": "text",
	            "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model).",
	        },
	    ],
	}]

	# Apply chat template
	text = processor.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	    enable_thinking=True,
	)

	# Load image
	image_url = "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
	image = Image.open(requests.get(image_url, stream=True).raw)

	# Process inputs
	inputs = processor(images=image, text=text, return_tensors="pt").to("cuda")

	# Generate output
	generated_ids = model.generate(**inputs, max_new_tokens=16384, temperature=0.6)
	output_ids = generated_ids[0][len(inputs.input_ids[0]):]

	# Decode output
	output_text = processor.decode(output_ids, skip_special_tokens=True)

	# Print result
	print(output_text)
	```
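
In `thinking` mode the decoded output contains the reasoning trace as well as the final answer. The sketch below separates the two. It assumes that Bee-8B follows the Qwen3 convention of wrapping reasoning in `<think>` ... `</think>` tags and that these tags survive `skip_special_tokens=True`. Neither point is confirmed by this README, so check the model's chat template first. `split_thinking` is a hypothetical helper, not part of the Bee-8B code.

```python
def split_thinking(
    output_text: str,
    open_tag: str = "<think>",    # assumed tag, following the Qwen3 convention
    close_tag: str = "</think>",  # assumed tag, following the Qwen3 convention
) -> tuple[str, str]:
    """Return (reasoning, answer) from a decoded model output."""
    reasoning, sep, answer = output_text.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", output_text.strip()
    # Drop the opening tag if the chat template left it in the output.
    reasoning = reasoning.replace(open_tag, "", 1).strip()
    return reasoning, answer.strip()
```

With the example above, `reasoning, answer = split_thinking(output_text)` would leave the slogan in `answer`. In `non-thinking` mode the function returns an empty reasoning string and the full output as the answer.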

	### Using vLLM for High-Performance Inference

	#### Install vLLM

	> [!IMPORTANT]
	> Bee-8B support will be officially available in vLLM v0.11.1. Until then, please install vLLM from source:

	```bash
	git clone https://github.com/vllm-project/vllm.git
	cd vllm
	VLLM_USE_PRECOMPILED=1 uv pip install --editable .
	```

	Once vLLM v0.11.1 is released, you will be able to install it directly via pip:
	```bash
	pip install "vllm>=0.11.1"
	```
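
To confirm that the installed vLLM is recent enough, a small sanity check along the following lines may help. It is only a sketch: `release_tuple` and `meets_minimum` are hypothetical helpers, and the check compares the numeric release alone. A source build that reports something like `0.11.1.devN` therefore passes, even though a checkout made before Bee-8B support was merged could report the same number.

```python
import re
from importlib import metadata

# Minimum release named in this README; the helpers below are illustrative only.
MIN_RELEASE = (0, 11, 1)


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '0.11.1rc2.dev5+g1a2b' -> (0, 11, 1)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: str, minimum: tuple[int, ...] = MIN_RELEASE) -> bool:
    """Compare only the numeric release; pre-release and dev suffixes are ignored."""
    return release_tuple(version) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        print("vLLM is not installed")
    else:
        status = "ok" if meets_minimum(installed) else "too old for Bee-8B"
        print(f"vLLM {installed}: {status}")
```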


	#### Offline Inference
	```python
	from transformers import AutoProcessor
	from vllm import LLM, SamplingParams
	from PIL import Image
	import requests


	def main():
	    model_path = "Open-Bee/Bee-8B-RL"

	    llm = LLM(
	        model=model_path,
	        limit_mm_per_prompt={"image": 5},
	        trust_remote_code=True,
	        tensor_parallel_size=1,
	        gpu_memory_utilization=0.8,
	    )

	    sampling_params = SamplingParams(
	        temperature=0.6,
	        max_tokens=16384,
	    )

	    image_url = "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png"
	    image = Image.open(requests.get(image_url, stream=True).raw)

	    messages = [
	        {
	            "role": "user",
	            "content": [
	                {
	                    "type": "image",
	                    "image": image,
	                },
	                {
	                    "type": "text",
	                    "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model).",
	                },
	            ],
	        },
	    ]

	    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
	    prompt = processor.apply_chat_template(
	        messages,
	        tokenize=False,
	        add_generation_prompt=True,
	        enable_thinking=True,
	    )

	    mm_data = {"image": image}
	    llm_inputs = {
	        "prompt": prompt,
	        "multi_modal_data": mm_data,
	    }

	    outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
	    generated_text = outputs[0].outputs[0].text

	    print(generated_text)


	if __name__ == '__main__':
	    main()
	```

	#### Online Serving
	- Start the server
	```bash
	vllm serve \
	    Open-Bee/Bee-8B-RL \
	    --served-model-name bee-8b-rl \
	    --tensor-parallel-size 8 \
	    --gpu-memory-utilization 0.8 \
	    --host 0.0.0.0 \
	    --port 8000 \
	    --trust-remote-code
	```

	- Query the server with the OpenAI Python client
	```python
	from openai import OpenAI

	# Set OpenAI's API key and API base to use vLLM's API server.
	openai_api_key = "EMPTY"
	openai_api_base = "http://localhost:8000/v1"

	client = OpenAI(
	    api_key=openai_api_key,
	    base_url=openai_api_base,
	)

	# image url
	image_messages = [
	    {
	        "role": "user",
	        "content": [
	            {
	                "type": "image_url",
	                "image_url": {
	                    "url": "https://huggingface.co/Open-Bee/Bee-8B-RL/resolve/main/assets/logo.png",
	                },
	            },
	            {
	                "type": "text",
	                "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model).",
	            },
	        ],
	    },
	]

	chat_response = client.chat.completions.create(
	    model="bee-8b-rl",
	    messages=image_messages,
	    max_tokens=16384,
	    extra_body={
	        "chat_template_kwargs": {
	            "enable_thinking": True,
	        },
	    },
	)
	print("Chat response:", chat_response.choices[0].message.content)
	```
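
The same request can carry a local image and switch to `non-thinking` mode. The sketch below builds the keyword arguments for `client.chat.completions.create()`, sending the image as a base64 data URL, which vLLM's OpenAI-compatible server accepts in the `image_url` field. The helper names (`bytes_to_data_url`, `file_to_data_url`, `build_chat_request`) are hypothetical, and the model name assumes the `--served-model-name bee-8b-rl` setting used when starting the server.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read an image file and encode it, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {path!r}")
    return bytes_to_data_url(Path(path).read_bytes(), mime)


def build_chat_request(image_url: str, question: str, enable_thinking: bool = True) -> dict:
    """Assemble keyword arguments for client.chat.completions.create()."""
    return {
        "model": "bee-8b-rl",  # assumed to match --served-model-name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 16384,
        # Forwarded to the chat template, as in the request above.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
    }
```

For example, `client.chat.completions.create(**build_chat_request(file_to_data_url("photo.png"), "Describe this image.", enable_thinking=False))` would query the server in `non-thinking` mode, where `photo.png` is a placeholder path.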

	## Experimental Results

	<figure align="center">
	<img src="assets/results.png" alt="Bee-8B benchmark results"/>
	<figcaption>Evaluation of Bee-8B against other MLLMs. We distinguish between fully open (*) and semi-open (†) models. The <strong>top</strong> and <strong>second-best</strong> scores for each benchmark are highlighted.</figcaption>
	</figure>

	1. New State-of-the-Art: Bee-8B establishes a new performance standard for fully open MLLMs, proving highly competitive with recent semi-open models across a wide array of benchmarks.
	2. Excellence in Complex Reasoning: Thanks to the CoT-enriched Honey-Data-15M, Bee-8B shows its most significant advancements in complex math and reasoning. It achieves top scores on challenging benchmarks like MathVerse, LogicVista, and DynaMath.
	3. Superior Document and Chart Understanding: The model demonstrates powerful capabilities in analyzing structured visual data, securing the top rank on the CharXiv benchmark for both descriptive and reasoning questions.

	## Acknowledgements

	Bee-8B builds on the architectures and codebases of the following projects: [R-4B](https://huggingface.co/YannQi/R-4B), [LLaVA-OneVision](https://github.com/LLaVA-VL/LLaVA-NeXT), [SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384), and [Qwen3](https://github.com/QwenLM/Qwen3). It was evaluated using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding contributions to the open-source community.