We develop an efficient approach LLM input safety moderation using latent prototypes and demonstrate that safe and unsafe inputs are separable in the model's latent space.
Jul 1, 2026
We propose a general framework for assessing sparsity robustness in modern LLMs and conduct a systematic study of activation sparsity such models. Our study reveals universal patterns of sparsity in LLMs and provides practical guidelines for model acceleration and design.
Apr 24, 2026