We propose a generative mixture-of-GANs approach for accelerating particle detector simulations that preserves high simulation fidelity while delivering substantial computational speedups over traditional methods.
Aug 30, 2025
We propose a method for converting dense transformers into dynamic Mixture-of-Experts (MoE) models that leverages the natural activation sparsity of neural networks. Crucially, we enforce activation sparsity during a short continual-training phase via an additional sparsity regularization term, and argue for dynamic-k expert routing in MoEfied models. Finally, we show that, with an efficient implementation, our method achieves computational savings while maintaining model performance.
Dec 1, 2024
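The dynamic-k routing mentioned in the abstract above can be illustrated with a minimal sketch: instead of always picking a fixed top-k set of experts, each token activates every expert whose gate probability exceeds a threshold, so the expert count varies per token. The function name `dynamic_k_route` and the threshold value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dynamic_k_route(gate_logits, threshold=0.1):
    """Per-token dynamic-k expert selection (illustrative sketch):
    activate every expert whose softmax gate probability exceeds
    `threshold`, guaranteeing at least the top-1 expert."""
    # numerically stable softmax over the expert dimension
    probs = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    mask = probs >= threshold
    # always keep the highest-scoring expert for each token
    top1 = probs.argmax(axis=-1)
    mask[np.arange(len(top1)), top1] = True
    # zero out unselected experts and renormalize the routing weights
    weights = np.where(mask, probs, 0.0)
    weights /= weights.sum(axis=-1, keepdims=True)
    return mask, weights

# 3 tokens, 4 experts: a confident token uses few experts,
# an uncertain (flat-gate) token uses many.
logits = np.array([[2.0, 0.1, 0.0, -1.0],
                   [1.0, 1.0, 1.0, 1.0],
                   [3.0, -2.0, -2.0, -2.0]])
mask, weights = dynamic_k_route(logits)
print(mask.sum(axis=1))  # per-token expert counts: [3 4 1]
```

The per-token expert count adapts to the gate distribution: the second token's flat gate activates all four experts, while the third token's peaked gate activates only one, which is the mechanism that lets compute scale with how "hard" each token is.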