Papers

Filip Szatkowski, Patryk Będkowski, Alessio Devoto, Jan Dubiński, Pasquale Minervini, Mikołaj Piórczyński, Simone Scardapane, Bartosz Wójcik (2025). Universal Properties of Activation Sparsity in Modern Large Language Models. In UniReps, NeurIPS 2025.

We propose a general framework for assessing sparsity robustness in modern LLMs and conduct a systematic study of activation sparsity in such models. Our study reveals universal patterns of sparsity in LLMs and provides practical guidelines for model acceleration and design.

#Large Language Models #Activation Sparsity #Efficiency #Deep Learning

PDF Preprint

Piotr Kubaty, Filip Szatkowski, Metod Jazbec, Bartosz Wójcik (2025). Failure Prediction Is a Better Performance Proxy for Early-Exit Networks Than Calibration. In SPIGM, NeurIPS 2025.

We challenge the use of calibration metrics in early-exit models and show cases where calibration fails to accurately reflect network performance. We argue for failure prediction as a more reliable performance proxy that better correlates with efficiency gains in early-exit networks.

#Early-Exits #Adaptive Computation #Calibration #Failure Prediction #Efficiency #Deep Learning

PDF Preprint

Patryk Będkowski, Jan Dubiński, Filip Szatkowski, Kamil Deja, Przemysław Rokita, Tomasz Trzciński (2025). ExpertSim: Fast Particle Detector Simulation Using Mixture-of-Generative-Experts. In ECAI 2025.

We propose a novel generative mixture-of-GANs approach for accelerating particle detector simulations that maintains high fidelity while achieving significant computational speedups compared to traditional methods.

#Particle Physics Simulation #Generative Models #Mixture of Experts #Adaptive Computation #Deep Learning

PDF Code Preprint

Filip Szatkowski, Yaoyue Zheng, Fei Yang, Bartłomiej Twardowski, Tomasz Trzciński, Joost van de Weijer (2025). Improving Continual Learning Performance and Efficiency with Auxiliary Classifiers. In ICML 2025.

We investigate intermediate representations in neural networks during class-incremental learning and propose to leverage them via auxiliary early-exit classifiers. Interestingly, we find that in continual learning scenarios, networks enhanced with such classifiers are not only more efficient but also show improved performance and reduced forgetting across task sequences.

#Continual Learning #Adaptive Computation #Early Exits #Efficiency #Deep Learning

PDF Code Preprint

Maciej Chrabąszcz, Filip Szatkowski, Bartosz Wójcik, Jan Dubiński, Tomasz Trzciński, Sebastian Cygert (2025). Do LLMs Understand the Safety of Their Inputs? Training-Free Moderation via Latent Prototypes. In arXiv.

We develop an efficient approach to LLM input safety moderation using latent prototypes and demonstrate that safe and unsafe inputs are separable in the model’s latent space.

#Large Language Models #LLM Safety #Moderation #Deep Learning

PDF Preprint

Wojciech Łapacz, Daniel Marczak, Filip Szatkowski, Tomasz Trzciński (2025). Exploring the Stability Gap in Continual Learning: The Role of the Classification Head. In WACV 2025.

We investigate the stability gap in continual learning and identify the critical role of the classification head. We then suggest a nearest-mean classifier as a potential solution for improved model stability.

#Continual Learning #Stability Gap #Deep Learning

PDF Code Preprint

Filip Szatkowski, Bartosz Wójcik, Mikołaj Piórczyński, Simone Scardapane (2024). Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion. In NeurIPS 2024.

We propose a method to convert dense transformers to dynamic Mixture-of-Experts models, which leverages the natural activation sparsity of neural networks. Crucially, we propose to enforce activation sparsity during a short (continual) training process via additional sparsity regularization, and argue for the use of dynamic-k expert routing in MoEfied models. Finally, we show that, with an efficient implementation, our method achieves computational efficiency while maintaining performance.

#Mixture of Experts #Adaptive Computation #Activation Sparsity #Efficiency #Deep Learning

PDF Code Preprint

Aleksandra Nowak, Łukasz Gniecki, Filip Szatkowski, Jacek Tabor (2024). Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization. In ICML 2024.

We develop an exact orthogonal initialization technique for static sparse training that enables more robust sparse neural network training.

#Sparse Training #Initialization #Sparsity #Deep Learning

PDF Code Preprint

Filip Szatkowski, Mateusz Pyla, Marcin Przewięźlikowski, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński (2024). Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-Free Continual Learning. In WACV 2024.

We examine knowledge distillation in exemplar-free continual learning and find that allowing the teacher network to adapt during the learning process through batch normalization updates improves knowledge transfer across several continual learning methods.

#Continual Learning #Knowledge Distillation #Deep Learning #Computer Vision

PDF Code Preprint

Bartosz Wójcik, Marcin Przewięźlikowski, Filip Szatkowski, Maciej Wołczyk, Klaudia Bałazy, Igor Podolak, Jacek Tabor, Marek Śmieja, Tomasz Trzciński (2023). Zero Time Waste in Pre-trained Early Exit Neural Networks. In Neural Networks.

We propose Zero-Time Waste (ZTW), an early-exit network architecture that reduces computational waste via cascading connections between early-exit classifiers and an ensembling mechanism. ZTW achieves better efficiency-accuracy trade-offs in pre-trained models and offers a practical architectural solution for deploying early-exit neural networks.

#Early Exits #Adaptive Computation #Efficiency #Deep Learning

PDF Code

Filip Szatkowski, Karol J. Piczak, Przemysław Spurek, Jacek Tabor, Tomasz Trzciński (2023). Hypernetworks Build Implicit Neural Representations of Sounds. In ECML-PKDD 2023.

We propose to use hypernetworks to generate implicit neural representations of sound signals, enabling efficient audio compression and high-quality reconstruction.

#Hypernetworks #Implicit Neural Representations #Audio Processing #Deep Learning

PDF Code Preprint

Stanisław Pawlak, Filip Szatkowski, Michał Bortkiewicz, Jan Dubiński, Tomasz Trzciński (2022). Progressive Latent Replay for Efficient Generative Rehearsal. In ICONIP 2022.

We propose a progressive latent replay mechanism that enhances generative rehearsal in continual learning by efficiently managing memory and computational resources while maintaining model performance.

#Continual Learning #Generative Replay #Latent Replay #Deep Learning

PDF Preprint