Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

👀 About ThinkMorph

We present ThinkMorph, a unified model fine-tuned on ∼24K high-quality interleaved reasoning traces across tasks, learning to generate progressive text–image reasoning steps that concretely manipulate visual content while maintaining coherent verbal logic.
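
To make the idea of an interleaved trace concrete, here is a minimal sketch of how a sequence of alternating text and image steps could be represented. The `Step` class, its field names, the helper functions, and the example contents are all hypothetical and chosen for illustration; they are not ThinkMorph's actual data format or API.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional


@dataclass
class Step:
    """One step of a hypothetical interleaved reasoning trace."""
    modality: Literal["text", "image"]
    text: Optional[str] = None        # verbal reasoning for a text step
    image_path: Optional[str] = None  # placeholder path for an image step


def modality_pattern(trace: List[Step]) -> str:
    """Summarize a trace as a string such as 'TIT' (T = text, I = image)."""
    return "".join("T" if s.modality == "text" else "I" for s in trace)


def is_interleaved(trace: List[Step]) -> bool:
    """True when the trace contains both text and image steps."""
    return {s.modality for s in trace} == {"text", "image"}


# Invented example: reason in text, manipulate the image, then conclude in text.
example_trace = [
    Step("text", text="Locate the start and goal cells in the maze."),
    Step("image", image_path="step1_route_overlay.png"),  # hypothetical file
    Step("text", text="The drawn route avoids all obstacles, so it is valid."),
]

print(modality_pattern(example_trace), is_interleaved(example_trace))
```

A flat list of typed steps keeps the ordering of text and image steps explicit, which is the property the description above emphasizes.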

Beyond strong vision-benchmark performance and robust out-of-domain generalization, ThinkMorph demonstrates emergent multimodal intelligence, such as novel visual manipulation skills. These findings suggest promising directions for characterizing the emergent capabilities of unified models for multimodal reasoning.

📊 Benchmarks

Model	Size	VSP	VisPuzzle	ChartQA	VStar	BLINK-J	MMVP	SAT	BLINK	CV-Bench
GPT-4o	–	33.50	43.75	76.34	61.78	72.67	84.67	28.00	60.28	75.61
GPT-5	–	57.33	78.00	80.85	71.73	77.33	86.33	73.30	69.86	85.46
Gemini 2.5 Flash	–	59.33	47.00	83.79	70.68	66.00	80.33	56.00	67.49	85.07
InternVL3.5	8B	8.17	34.75	76.26	68.59	71.33	76.33	45.33	59.60	81.99
InternVL3.5	38B	20.16	36.50	80.44	76.96	80.67	80.33	49.33	62.65	85.96
Qwen2.5-VL	7B	2.16	34.75	78.12	76.44	59.33	77.33	51.33	55.92	75.20
Qwen2.5-VL	72B	41.83	40.00	82.03	85.86	61.33	82.00	64.67	61.91	82.54
Janus-pro	7B	0.00	33.50	43.08	38.22	50.67	63.33	22.00	38.51	67.83
Chameleon	7B	0.83	30.50	5.74	28.27	0.67	47.67	10.67	16.52	36.52
Bagel	7B	0.83*	35.00*	61.82	55.49	67.33	70.33	44.67	47.66	76.03
ThinkMorph	7B	75.83	79.00	78.10	67.02	72.00	80.33	52.67	60.07	80.82
Δ (vs Bagel)		+75.00	+44.00	+16.28	+11.53	+4.67	+10.00	+8.00	+12.41	+4.79
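
The Δ row is the per-benchmark difference between the ThinkMorph and Bagel rows. The sketch below recomputes it from the scores in the table; the variable and function names are illustrative choices, and the numbers are copied from the two rows above.

```python
# Column order matches the benchmark table.
BENCHMARKS = ["VSP", "VisPuzzle", "ChartQA", "VStar", "BLINK-J",
              "MMVP", "SAT", "BLINK", "CV-Bench"]

# Scores copied from the Bagel and ThinkMorph rows of the table.
BAGEL = [0.83, 35.00, 61.82, 55.49, 67.33, 70.33, 44.67, 47.66, 76.03]
THINKMORPH = [75.83, 79.00, 78.10, 67.02, 72.00, 80.33, 52.67, 60.07, 80.82]


def deltas(model, baseline):
    """Per-benchmark difference, rounded to two decimals like the table."""
    return [round(m - b, 2) for m, b in zip(model, baseline)]


DELTAS = dict(zip(BENCHMARKS, deltas(THINKMORPH, BAGEL)))

for name, delta in DELTAS.items():
    print(f"{name}: {delta:+.2f}")
```

Rounding after subtraction avoids floating-point noise such as 16.279999 appearing in place of 16.28.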

✍️ Citation

@misc{gu2025thinkmorphemergentpropertiesmultimodal,
      title={ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning}, 
      author={Jiawei Gu and Yunzhuo Hao and Huichen Will Wang and Linjie Li and Michael Qizhe Shieh and Yejin Choi and Ranjay Krishna and Yu Cheng},
      year={2025},
      eprint={2510.27492},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.27492}, 
}

Model tree for ThinkMorph/ThinkMorph-7B

Base model: Qwen/Qwen2.5-7B
Finetuned: Qwen/Qwen2.5-7B-Instruct
Finetuned: ByteDance-Seed/BAGEL-7B-MoT
Finetuned: this model