mdouglas's Collections
Papers: Quantization (updated)

- FP8-LM: Training FP8 Large Language Models (Paper • 2310.18313 • Published • 33)
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers (Paper • 2310.16836 • Published • 14)
- TEQ: Trainable Equivalent Transformation for Quantization of LLMs (Paper • 2310.10944 • Published • 10)
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers (Paper • 2309.16119 • Published • 1)
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Paper • 2306.00978 • Published • 11)
- LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning (Paper • 2305.18403 • Published • 2)
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (Paper • 2211.10438 • Published • 6)
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Paper • 2210.17323 • Published • 8)
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Paper • 2208.07339 • Published • 5)
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (Paper • 2309.05516 • Published • 10)
- (Paper • 2502.06786 • Published • 32)
- MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design (Paper • 2412.14590 • Published • 14)
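Several of the papers in this collection (LLM.int8(), GPTQ, AWQ, SmoothQuant) start from, and then improve on, plain round-to-nearest integer quantization. As a point of reference only, here is a minimal sketch of symmetric per-tensor absmax int8 quantization; all function names and the sample weights are illustrative, not taken from any of the listed papers:

```python
def quantize_absmax_int8(w):
    """Symmetric per-tensor absmax quantization to int8 codes.

    The largest-magnitude weight is mapped to +/-127 and everything else
    is rounded to the nearest integer on that grid. This is the simple
    round-to-nearest baseline that methods like GPTQ and AWQ refine.
    """
    scale = max(abs(x) for x in w) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in w]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from the int8 codes.
    return [v * scale for v in q]

# Toy example: a handful of "weights".
w = [0.42, -1.37, 0.05, 2.54, -0.91]
q, s = quantize_absmax_int8(w)
w_hat = dequantize(q, s)
# Round-to-nearest keeps the per-weight error within scale / 2.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The per-weight reconstruction error of this baseline is bounded by half the quantization step (`scale / 2`); the papers above reduce the *end-to-end* error instead, e.g. by reordering rounding decisions (GPTQ) or rescaling salient channels before rounding (AWQ, SmoothQuant).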