Posts tagged #KV-cache
- Phase-Collapse Defragmentation: Why MoE KV-Cache Resists 1-bit Quantization
Attention-head activations in Mixture-of-Experts models cluster around expert routing patterns. Quantizing the KV-cache destroys this signal. The MoEGauge framework derives provable bounds on exactly how much is lost.