Inui
Norm
		AI & ML interests
Video Diffusion; Large Language Model; Object Detection; OCR
		Recent Activity
		- upvoted a paper 19 days ago: Less is More: Recursive Reasoning with Tiny Networks
		- liked a model about 1 month ago: rednote-hilab/dots.ocr
		- liked a model 2 months ago: meituan-longcat/LongCat-Flash-Chat
						Organizations
TI2V Research
			
			
	
	- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer (Paper • 2408.06072 • Published • 39)
	- AtomoVideo: High Fidelity Image-to-Video Generation (Paper • 2403.01800 • Published • 23)
	- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion (Paper • 2411.04928 • Published • 57)
	- AnimateAnything: Consistent and Controllable Animation for Video Generation (Paper • 2411.10836 • Published • 24)
Multimodal Language Model
			What matters besides the data recipe when training a multimodal language model?
			
	
	Language Model
			
			
	
	- STaR: Bootstrapping Reasoning With Reasoning (Paper • 2203.14465 • Published • 9)
	- Scaling Laws for Neural Language Models (Paper • 2001.08361 • Published • 9)
	- Byte Latent Transformer: Patches Scale Better Than Tokens (Paper • 2412.09871 • Published • 108)
	- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Paper • 2501.12948 • Published • 420)
Open Datasets
			Thank you for sharing your datasets. I’ve fed them to my model, and it has benefited from them.
			
	
	Video2Video
			
			
	
	Image / Video Gen
			Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion
			
	
	- Understanding Diffusion Models: A Unified Perspective (Paper • 2208.11970 • Published)
	- Tutorial on Diffusion Models for Imaging and Vision (Paper • 2403.18103 • Published • 2)
	- Denoising Diffusion Probabilistic Models (Paper • 2006.11239 • Published • 6)
	- Denoising Diffusion Implicit Models (Paper • 2010.02502 • Published • 4)
Fundamental Research
			
			
	
	- Scaling Law with Learning Rate Annealing (Paper • 2408.11029 • Published • 4)
	- Token Turing Machines (Paper • 2211.09119 • Published • 1)
	- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training (Paper • 2203.12602 • Published)
	- Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design (Paper • 2305.13035 • Published)
Computer Vision
			Do we still need a dedicated network for specific computer vision tasks today?
			
	
	VAE
			
			
	
	- WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model (Paper • 2411.17459 • Published • 12)
	- MAGVIT: Masked Generative Video Transformer (Paper • 2212.05199 • Published)
	- Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation (Paper • 2310.05737 • Published • 6)
	- Finite Scalar Quantization: VQ-VAE Made Simple (Paper • 2309.15505 • Published • 23)