We propose a general framework for assessing sparsity robustness in modern LLMs and conduct a systematic study of activation sparsity in such models. Our study reveals universal patterns of sparsity across LLMs and provides practical guidelines for model acceleration and design.
Dec 6, 2025
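The study above is about measuring activation sparsity; a minimal sketch of one common way to do that (not the paper's own code) is to hook an MLP activation and record the fraction of entries whose magnitude falls below a small threshold. The threshold value and the module path in the usage comment are illustrative assumptions.

```python
import torch

def sparsity_fraction(x: torch.Tensor, threshold: float = 1e-3) -> float:
    """Fraction of activation entries with magnitude below `threshold`."""
    return (x.abs() < threshold).float().mean().item()

def attach_sparsity_hook(module: torch.nn.Module, stats: list, threshold: float = 1e-3):
    """Register a forward hook that logs the output sparsity of `module`."""
    def hook(_module, _inputs, output):
        stats.append(sparsity_fraction(output.detach(), threshold))
    return module.register_forward_hook(hook)

# Hypothetical usage with a Hugging Face-style decoder (module path is an assumption):
# stats = []
# handle = attach_sparsity_hook(model.model.layers[0].mlp.act_fn, stats)
# model(**inputs)
# print(sum(stats) / len(stats))
# handle.remove()
```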
We propose a method to convert dense transformers into dynamic Mixture-of-Experts models, leveraging the natural activation sparsity of the network. Crucially, we propose to enforce activation sparsity during a short (continual) training phase via an additional sparsity regularization term, and argue for the use of dynamic-k expert routing in MoEfied models. Finally, we show that, with an efficient implementation, our method achieves computational savings while maintaining performance.
Dec 1, 2024
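As a rough illustration of the dynamic-k routing idea mentioned above, here is a minimal sketch under one plausible reading: each token activates however many experts clear a router-probability threshold, instead of a fixed top-k. The threshold `tau`, the guarantee of at least one expert per token, and the renormalization of weights are illustrative assumptions, not the paper's exact rule.

```python
import torch
import torch.nn.functional as F

def dynamic_k_route(router_logits: torch.Tensor, tau: float = 0.1):
    """Return a (tokens, experts) boolean mask and routing weights,
    selecting every expert whose router probability exceeds `tau`."""
    probs = F.softmax(router_logits, dim=-1)                     # (tokens, experts)
    # Always keep the top-1 expert so no token is left unrouted.
    top1 = F.one_hot(probs.argmax(dim=-1), num_classes=probs.size(-1)).bool()
    mask = (probs > tau) | top1
    weights = probs * mask                                       # zero out unselected experts
    weights = weights / weights.sum(dim=-1, keepdim=True)        # renormalize over selected experts
    return mask, weights
```

With this scheme, tokens whose router distribution is peaked use few experts while ambiguous tokens use more, which is the kind of input-dependent compute that activation sparsity is meant to enable.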