Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference Paper • 2405.18628 • Published May 28, 2024
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization Paper • 2503.12649 • Published Mar 16