Compressing MoE Without Lying To Yourself

Phase-collapse defragmentation, moment-ratio bounds, FFN pivots, and zero-sorry Lean 4 theorems — a research arc through 1-bit KV-cache quantization on OLMoE-1B-7B.

Phase-Collapse Defragmentation: Why MoE KV-Cache Resists 1-bit Quantization

Apr 23, 2026

Attention head activations in Mixture-of-Experts models cluster around expert routing patterns. Quantizing the KV-cache destroys this signal. The MoEGauge framework gives provable bounds on exactly how much of it is lost.

#ml-research #quantization #MoE #compression #KV-cache

Part E Pivot: FFN Rotation and the Narrow-d Falsification

Apr 25, 2026

After the KV-cache gauge, the obvious next move was applying β-lift to FFN weights. We tested it. It failed. Here is what the RAdam convergence probe and the 1-bit generation test actually showed.

#ml-research #quantization #MoE #compression #MLP

Zero-Sorry Discipline: What a Lean 4 Appendix Actually Costs

May 6, 2026

Two theorems in this paper, MoEGauge and JensenFloor, had to reach zero sorries before the paper shipped. This post covers what that process looks like, why sorry is dangerous, and what JensenFloor actually says.

#ml-research #lean4 #formal-verification #methodology

How to Honestly Test if a Neural Network Can Be Compressed

Apr 28, 2026

Pre-registration, trap cells, τ-hardened baselines, and kill-fast protocols: a field methodology for compression research that tries to kill its own ideas. With actual results from OLMoE-1B-7B.

#ml-research #compression #methodology #quantization #MoE