Tempo14's Collections

quantization

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper • 2402.04291 • Published • 50

OneBit: Towards Extremely Low-bit Large Language Models
Paper • 2402.11295 • Published • 24

A Survey on Transformer Compression
Paper • 2402.05964 • Published • 1

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Paper • 2402.08958 • Published • 6

BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper • 2402.10193 • Published • 22

GPTVQ: The Blessing of Dimensionality for LLM Quantization
Paper • 2402.15319 • Published • 22

EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Paper • 2403.02775 • Published • 13

4-bit Shampoo for Memory-Efficient Network Training
Paper • 2405.18144 • Published • 12

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
Paper • 2410.05265 • Published • 33

BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 69

"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
Paper • 2411.02355 • Published • 51

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Paper • 2410.20650 • Published • 17

BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments
Paper • 2410.23918 • Published • 21