mdouglas's Collections
Papers: Quantization (updated)

- FP8-LM: Training FP8 Large Language Models (Paper • 2310.18313 • Published • 33)
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers (Paper • 2310.16836 • Published • 14)
- TEQ: Trainable Equivalent Transformation for Quantization of LLMs (Paper • 2310.10944 • Published • 10)
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers (Paper • 2309.16119 • Published • 1)
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Paper • 2306.00978 • Published • 11)
- LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning (Paper • 2305.18403 • Published • 2)
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (Paper • 2211.10438 • Published • 6)
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Paper • 2210.17323 • Published • 8)
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Paper • 2208.07339 • Published • 5)
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (Paper • 2309.05516 • Published • 10)
- (Paper • 2502.06786 • Published • 32)
- MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design (Paper • 2412.14590 • Published • 14)
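Several of the papers in this collection (LLM.int8(), GPTQ, AWQ, SmoothQuant) start from, and then improve on, plain round-to-nearest integer quantization. As a point of reference only, here is a minimal sketch of symmetric per-tensor absmax int8 quantization; all function names and the sample weights are illustrative, not taken from any of the listed papers:

```python
def quantize_absmax_int8(w):
    """Symmetric per-tensor absmax quantization to int8 codes.

    The largest-magnitude weight is mapped to +/-127 and everything else
    is rounded to the nearest integer on that grid. This is the simple
    round-to-nearest baseline that methods like GPTQ and AWQ refine.
    """
    scale = max(abs(x) for x in w) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in w]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from the int8 codes.
    return [v * scale for v in q]

# Toy example: a handful of "weights".
w = [0.42, -1.37, 0.05, 2.54, -0.91]
q, s = quantize_absmax_int8(w)
w_hat = dequantize(q, s)
# Round-to-nearest keeps the per-weight error within scale / 2.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The per-weight reconstruction error of this baseline is bounded by half the quantization step (`scale / 2`); the papers above reduce the *end-to-end* error instead, e.g. by reordering rounding decisions (GPTQ) or rescaling salient channels before rounding (AWQ, SmoothQuant).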