Posts tagged #MLP
Part E Pivot: FFN Rotation and the Narrow-d Falsification
After the KV-cache gauge, the obvious next move was to apply β-lift to the FFN weights. We tested it. It failed. Here is what the RAdam convergence probe and the 1-bit generation test actually showed.