stablegravity's Collections
			 
		
			
		aigc
		
			
 
				
				
	
	
	
			
			VideoBooth: Diffusion-based Video Generation with Image Prompts
		
			Paper
			
•
			2312.00777
			
•
			Published
				
			•
				
				24
			
 
	
	 
	
	
	
			
			MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
		
			Paper
			
•
			2312.03641
			
•
			Published
				
			•
				
				22
			
 
	
	 
	
	
	
			
			GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
		
			Paper
			
•
			2312.04557
			
•
			Published
				
			•
				
				13
			
 
	
	 
	
	
	
			
			DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
		
			Paper
			
•
			2312.04433
			
•
			Published
				
			•
				
				10
			
 
	
	 
	
	
	
			
			AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
		
			Paper
			
•
			2402.00769
			
•
			Published
				
			•
				
				22
			
 
	
	 
	
	
	
			
			Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
		
			Paper
			
•
			2401.15977
			
•
			Published
				
			•
				
				39
			
 
	
	 
	
	
	
			
			Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
		
			Paper
			
•
			2401.15708
			
•
			Published
				
			•
				
				12
			
 
	
	 
	
	
	
			
			Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
		
			Paper
			
•
			2401.13795
			
•
			Published
				
			•
				
				68
			
 
	
	 
	
	
	
			
			Deconstructing Denoising Diffusion Models for Self-Supervised Learning
		
			Paper
			
•
			2401.14404
			
•
			Published
				
			•
				
				18
			
 
	
	 
	
	
	
			
			BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
		
			Paper
			
•
			2401.13974
			
•
			Published
				
			•
				
				14
			
 
	
	 
	
	
	
			
			Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
		
			Paper
			
•
			2401.13627
			
•
			Published
				
			•
				
				77
			
 
	
	 
	
	
	
			
			Lumiere: A Space-Time Diffusion Model for Video Generation
		
			Paper
			
•
			2401.12945
			
•
			Published
				
			•
				
				86
			
 
	
	 
	
	
	
			
			Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
		
			Paper
			
•
			2401.12070
			
•
			Published
				
			•
				
				45
			
 
	
	 
	
	
	
			
			StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
		
			Paper
			
•
			2401.11053
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
		
			Paper
			
•
			2401.11605
			
•
			Published
				
			•
				
				22
			
 
	
	 
	
	
	
			
			Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
		
			Paper
			
•
			2401.10891
			
•
			Published
				
			•
				
				62
			
 
	
	 
	
	
	
			
			Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
		
			Paper
			
•
			2401.10774
			
•
			Published
				
			•
				
				59
			
 
	
	 
	
	
	
			
			Synthesizing Moving People with 3D Control
		
			Paper
			
•
			2401.10889
			
•
			Published
				
			•
				
				12
			
 
	
	 
	
	
	
			
			WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
		
			Paper
			
•
			2401.09985
			
•
			Published
				
			•
				
				18
			
 
	
	 
	
	
	
			
			ActAnywhere: Subject-Aware Video Background Generation
		
			Paper
			
•
			2401.10822
			
•
			Published
				
			•
				
				13
			
 
	
	 
	
	
	
			
			VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
		
			Paper
			
•
			2401.09047
			
•
			Published
				
			•
				
				14
			
 
	
	 
	
	
	
			
			InstantID: Zero-shot Identity-Preserving Generation in Seconds
		
			Paper
			
•
			2401.07519
			
•
			Published
				
			•
				
				57
			
 
	
	 
	
	
	
			
			Chain-of-Thought Reasoning Without Prompting
		
			Paper
			
•
			2402.10200
			
•
			Published
				
			•
				
				109
			
 
	
	 
	
	
	
			
			Design2Code: How Far Are We From Automating Front-End Engineering?
		
			Paper
			
•
			2403.03163
			
•
			Published
				
			•
				
				97
			
 
	
	 
	
	
	
			
			FlashFace: Human Image Personalization with High-fidelity Identity Preservation
		
			Paper
			
•
			2403.17008
			
•
			Published
				
			•
				
				21
			
 
	
	 
	
	
	
			
			KAN: Kolmogorov-Arnold Networks
		
			Paper
			
•
			2404.19756
			
•
			Published
				
			•
				
				115
			
 
	
	 
	
	
	
			
			InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
		
			Paper
			
•
			2404.19427
			
•
			Published
				
			•
				
				74
			
 
	
	 
	
	
	
			
			Octopus v4: Graph of language models
		
			Paper
			
•
			2404.19296
			
•
			Published
				
			•
				
				118
			
 
	
	 
	
	
	
			
			Make Your LLM Fully Utilize the Context
		
			Paper
			
•
			2404.16811
			
•
			Published
				
			•
				
				55
			
 
	
	 
	
	
	
			
			ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
		
			Paper
			
•
			2404.16771
			
•
			Published
				
			•
				
				19
			
 
	
	 
	
	
	
			
			PuLID: Pure and Lightning ID Customization via Contrastive Alignment
		
			Paper
			
•
			2404.16022
			
•
			Published
				
			•
				
				25
			
 
	
	 
	
	
	
			
			FlowMind: Automatic Workflow Generation with LLMs
		
			Paper
			
•
			2404.13050
			
•
			Published
				
			•
				
				34
			
 
	
	 
	
	
	
			
			Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
		
			Paper
			
•
			2404.13686
			
•
			Published
				
			•
				
				28
			
 
	
	 
	
	
	
			
			Dynamic Typography: Bringing Words to Life
		
			Paper
			
•
			2404.11614
			
•
			Published
				
			•
				
				46
			
 
	
	 
	
	
	
			
			Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
		
			Paper
			
•
			2404.12253
			
•
			Published
				
			•
				
				55
			
 
	
	 
	
	
	
			
			ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
		
			Paper
			
•
			2404.07987
			
•
			Published
				
			•
				
				48
			
 
	
	 
	
	
	
			
			Rho-1: Not All Tokens Are What You Need
		
			Paper
			
•
			2404.07965
			
•
			Published
				
			•
				
				93
			
 
	
	 
	
	
	
			
			RULER: What's the Real Context Size of Your Long-Context Language Models?
		
			Paper
			
•
			2404.06654
			
•
			Published
				
			•
				
				39
			
 
	
	 
	
	
	
			
			ByteEdit: Boost, Comply and Accelerate Generative Image Editing
		
			Paper
			
•
			2404.04860
			
•
			Published
				
			•
				
				26
			
 
	
	 
	
	
	
			
			SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
		
			Paper
			
•
			2404.05717
			
•
			Published
				
			•
				
				26
			
 
	
	 
	
	
	
			
			MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
		
			Paper
			
•
			2404.05014
			
•
			Published
				
			•
				
				34
			
 
	
	 
	
	
	
			
			SpatialTracker: Tracking Any 2D Pixels in 3D Space
		
			Paper
			
•
			2404.04319
			
•
			Published
				
			•
				
				25
			
 
	
	 
	
	
	
			
			Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
		
			Paper
			
•
			2404.03715
			
•
			Published
				
			•
				
				62
			
 
	
	 
	
	
	
			
			Stream of Search (SoS): Learning to Search in Language
		
			Paper
			
•
			2404.03683
			
•
			Published
				
			•
				
				31
			
 
	
	 
	
	
	
			
			Social Skill Training with Large Language Models
		
			Paper
			
•
			2404.04204
			
•
			Published
				
			•
				
				16
			
 
	
	 
	
	
	
			
			Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
		
			Paper
			
•
			2404.02905
			
•
			Published
				
			•
				
				74
			
 
	
	 
	
	
	
			
			Advancing LLM Reasoning Generalists with Preference Trees
		
			Paper
			
•
			2404.02078
			
•
			Published
				
			•
				
				46
			
 
	
	 
	
	
	
			
			StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
		
			Paper
			
•
			2405.01434
			
•
			Published
				
			•
				
				56
			
 
	
	 
	
	
	
			
			MLCM: Multistep Consistency Distillation of Latent Diffusion Model
		
			Paper
			
•
			2406.05768
			
•
			Published
				
			•
				
				13
			
 
	
	 
	
	
	
		
			Paper
			
•
			2406.09414
			
•
			Published
				
			•
				
				103
			
 
	
	 
	
	
	
			
			RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network
		
			Paper
			
•
			2406.18284
			
•
			Published
				
			•
				
				20
			
 
	
	 
	
	
	
			
			GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars
		
			Paper
			
•
			2408.13674
			
•
			Published
				
			•
				
				18
			
 
	
	 
	
	
	
			
			Click2Mask: Local Editing with Dynamic Mask Generation
		
			Paper
			
•
			2409.08272
			
•
			Published
				
			•
				
				6
			
 
	
	 
	
	
	
			
			MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
		
			Paper
			
•
			2412.05355
			
•
			Published
				
			•
				
				9
			
 
	
	 
	
	
	
			
			Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
		
			Paper
			
•
			2412.06781
			
•
			Published
				
			•
				
				24
			
 
	
	 
	
	
	
			
			PanoDreamer: 3D Panorama Synthesis from a Single Image
		
			Paper
			
•
			2412.04827
			
•
			Published
				
			•
				
				11