MIT researchers unveiled a training-time compression technique that trims state-space AI models on the fly, promising faster and cheaper training without meaningfully sacrificing accuracy. The approach, dubbed CompreSSM, uses control-theory tools, specifically Hankel singular values, to rank the importance of model states early in training and discard low-value components for the remaining epochs. In tests, compressed models matched the accuracy of full-size counterparts while training up to 1.5x faster on image tasks; applied to Mamba architectures, speedups approached 4x by shrinking a 128-dimensional state to roughly 12 dimensions. The team argues the method is cheaper than conventional pruning and knowledge distillation because it avoids first training a large “teacher” model or running expensive spectral regularization at every step. The researchers provide theoretical backing that state importance stabilizes early in training, and a checkpointed “safety net” lets practitioners roll back if accuracy dips. While best suited to multi-input, multi-output SSMs, the technique could extend to linear attention and other architectures. The work, accepted to ICLR 2026, was supported by academic and industry partners including Boeing and the U.S. Office of Naval Research.
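The article does not include code, but the core mechanic it describes, ranking states by their Hankel singular values and discarding the low-value ones, follows standard balanced truncation from control theory. The sketch below illustrates that step on a fixed discrete-time linear SSM with matrices (A, B, C); the function names, toy dimensions, and use of NumPy/SciPy are illustrative assumptions, not the CompreSSM implementation, which applies this ranking during training rather than to a trained model.

```python
# Sketch: rank SSM states by Hankel singular values and truncate via
# balanced truncation. Generic control-theory illustration, not the
# authors' code; names and dimensions are hypothetical.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def psd_sqrt(M):
    """Return a factor F with F @ F.T == M for a symmetric PSD matrix M."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return V * np.sqrt(np.clip(w, 0.0, None))


def hankel_truncate(A, B, C, keep):
    """Reduce a stable discrete-time linear SSM (A, B, C) to `keep` states.

    Hankel singular values measure how much each balanced state contributes
    to the input-output map; states with small values are discarded.
    """
    # Controllability Gramian P solves  A P A^T - P + B B^T = 0
    P = solve_discrete_lyapunov(A, B @ B.T)
    # Observability Gramian Q solves    A^T Q A - Q + C^T C = 0
    Q = solve_discrete_lyapunov(A.T, C.T @ C)

    # Square-root balanced truncation: factor the Gramians, then SVD.
    Lp = psd_sqrt(P)
    Lq = psd_sqrt(Q)
    U, s, Vt = np.linalg.svd(Lq.T @ Lp)   # s holds the Hankel singular values

    # Balancing transform restricted to the top `keep` states.
    s_k = s[:keep]
    T = Lp @ Vt[:keep].T / np.sqrt(s_k)            # reduced -> full state
    Ti = (U[:, :keep] / np.sqrt(s_k)).T @ Lq.T     # full -> reduced state

    return Ti @ A @ T, Ti @ B, C @ T, s


# Toy usage: shrink a stable random 128-state, 4-input/4-output SSM to 12 states.
rng = np.random.default_rng(0)
n, m, p = 128, 4, 4
A = rng.standard_normal((n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))    # make A stable
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
Ar, Br, Cr, hsv = hankel_truncate(A, B, C, keep=12)
print(Ar.shape, hsv[:5])
```

The decay of the printed Hankel singular values is what justifies truncation: states whose values are small contribute little to the model's input-output behavior, which is why discarding them costs little accuracy while shrinking the state dimension.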
Related articles:
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Efficiently Modeling Long Sequences with Structured State Spaces (S4)
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Distilling the Knowledge in a Neural Network