- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions • Paper • 2406.04325 • Published • 75
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models • Paper • 2401.15947 • Published • 53
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection • Paper • 2311.10122 • Published • 28
- Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models • Paper • 2311.16103 • Published • 1

Mikhail Dremin

AI & ML interests: None yet
Organizations: None yet
Collections: VLM
Models: 0 (none public yet)
Datasets: 0 (none public yet)