Posts tagged #quantization
-
Dense Activation-Fit Recovery: Healing Quantized Layers
How to recover dense performance from quantized layers using activation-fit artifacts and the recovery scripts in lean-mining.
-
The β-lift and FFN Transfer: MoE Compression Part E
Why β transfer in FFNs matters for quantization and the formal 'structure bonus' theorems in MoE compression.
-
How to Honestly Test if a Neural Network Can Be Compressed
Pre-registration, trap cells, τ-hardened baselines, and kill-fast protocols: a field methodology for compression research that tries to kill its own ideas. With actual results from OLMoE-1B-7B.
-
A Catalogue of Symmetries Compression Must Respect
Compression schemes regularly violate algebraic invariants of weight structure, producing models that pass perplexity checks but fail downstream. Here are the five core symmetry types that a formally verified survey is cataloguing.
-
Part E Pivot: FFN Rotation and the Narrow-d Falsification
After the KV-cache gauge, the obvious next move was applying β-lift to FFN weights. We tested it. It failed. Here is what the RAdam convergence probe and the 1-bit generation test actually showed.
-
Phase-Collapse Defragmentation: Why MoE KV-Cache Resists 1-bit Quantization
Attention head activations in Mixture-of-Experts models cluster around expert routing patterns. Quantizing the KV-cache destroys this signal. The MoEGauge framework derives provable bounds on exactly how much is lost.