matlok's Collections
Papers - Custom Layers
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Paper • 2310.00535 • Published • 2

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Paper • 2307.09458 • Published • 11

The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 10

Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10

Hash Layers For Large Sparse Models
Paper • 2106.04426 • Published • 2

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Paper • 2311.10642 • Published • 26

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Paper • 2402.02622 • Published • 3

The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82

Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86

RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 19

Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 13

Locating and Editing Factual Associations in GPT
Paper • 2202.05262 • Published • 1

MLP Can Be A Good Transformer Learner
Paper • 2404.05657 • Published • 1

Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
Paper • 2404.07200 • Published • 2

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38

Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 16

GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4

All you need is a good init
Paper • 1511.06422 • Published • 1

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper • 2404.16710 • Published • 80

Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
Paper • 2405.00664 • Published • 20

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Paper • 2403.07809 • Published • 1

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper • 2410.23168 • Published • 24

Augmenting Self-attention with Persistent Memory
Paper • 1907.01470 • Published • 1
		
			Paper
			
•
			2412.09764
			
•
			Published
				
			•
				
				5