Explore the Data

Compare stable rank and γ-rudder signal across model scales. Select any combination of models to overlay them on the same chart.

What changes with scale?

Per-layer geometry across 70M → 1B

70M and 160M have DPO adapters only; 410M and 1B have both DPO and CLM adapters. Layer counts differ (6, 12, 24, and 16 layers for 70M, 160M, 410M, and 1B respectively), so use the "fractional depth" axis to align them.
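As a rough illustration of what a fractional-depth axis does, the sketch below maps each model's layer indices onto the interval [0, 1]. The exact convention used by the charts is not stated here, so the formula i / (n_layers - 1) is an assumption, and the names `fractional_depth` and `LAYER_COUNTS` are hypothetical.

```python
def fractional_depth(n_layers: int) -> list[float]:
    """Map layer indices 0..n_layers-1 onto [0, 1].

    Assumed convention: first layer -> 0.0, last layer -> 1.0.
    The charts may use a different one (e.g. bin centres).
    """
    if n_layers < 1:
        raise ValueError("n_layers must be positive")
    if n_layers == 1:
        return [0.0]
    return [i / (n_layers - 1) for i in range(n_layers)]


# Layer counts as listed above, in the order the models are named.
LAYER_COUNTS = {"70M": 6, "160M": 12, "410M": 24, "1B": 16}

# One x-axis per model; all of them span the same [0, 1] range.
depths = {name: fractional_depth(n) for name, n in LAYER_COUNTS.items()}
```

With this mapping, layer 3 of the 6-layer model (depth 0.6) sits at the same horizontal position as layer 9 of the 16-layer model, so curves of different lengths can be overlaid.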

Methods note

Stable rank: numerical protocol

All stable-rank values in these charts use the definition:

$$ \text{srank}(A) = \frac{\|A\|_F^2}{\|A\|_2^2} $$

where the Frobenius norm squared is the sum of squared singular values:

$$ \|A\|_F^2 = \sum_i \sigma_i^2, $$

and the spectral norm squared is the largest squared singular value:

$$ \|A\|_2^2 = \sigma_{\max}^2. $$

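The ratio acts as an effective count of the directions that carry significant energy: it equals k exactly when k singular values are equal and all others are zero, and for any nonzero A it lies between 1 and rank(A).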
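The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation; the function name `stable_rank` is hypothetical.

```python
import numpy as np


def stable_rank(a: np.ndarray) -> float:
    """Stable rank ||A||_F^2 / ||A||_2^2, computed from singular values."""
    a32 = np.asarray(a, dtype=np.float32)
    # Singular values come back sorted in descending order.
    s = np.linalg.svd(a32, compute_uv=False)
    if s[0] == 0:
        raise ValueError("stable rank is undefined for the zero matrix")
    frob_sq = np.sum(s ** 2)   # sum of squared singular values
    spec_sq = s[0] ** 2        # largest squared singular value
    return float(frob_sq / spec_sq)


# Four equal singular values: stable rank equals the rank.
sr_identity = stable_rank(np.eye(4))

# A rank-1 outer product: all energy sits in one direction.
sr_rank_one = stable_rank(np.outer([1.0, 2.0, 3.0], [4.0, 5.0]))
```

The two examples hit the extremes of the bound: the identity gives a stable rank of 4 (equal to its rank), and the outer product gives 1.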

Precision and thresholds. Adapter weights are upcast from the fp16 checkpoints to fp32 before the SVD is computed. Singular values below machine epsilon × max(shape) × σ_max are treated as zero for rank counting but are still included in the Frobenius sum, where their contribution is negligible. The stable rank formula itself has no threshold dependence.
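The protocol in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not a copy of the reference script: "machine epsilon" is taken to be the fp32 epsilon, and the function name `rank_and_stable_rank` is hypothetical.

```python
import numpy as np


def rank_and_stable_rank(w_fp16: np.ndarray) -> tuple[int, float]:
    """Thresholded rank count plus threshold-free stable rank."""
    # Upcast before the SVD (NumPy's SVD does not accept fp16 input).
    w = w_fp16.astype(np.float32)
    s = np.linalg.svd(w, compute_uv=False)
    if s[0] == 0:
        return 0, float("nan")  # zero matrix: stable rank undefined

    # Assumption: eps is the fp32 machine epsilon.
    tol = np.finfo(np.float32).eps * max(w.shape) * s[0]

    # Singular values below tol count as zero for the rank only.
    rank = int(np.sum(s >= tol))

    # The Frobenius sum uses every singular value; no threshold applies.
    srank = float(np.sum(s ** 2) / s[0] ** 2)
    return rank, srank


# Toy fp16 "checkpoint" weight with two nonzero singular values, 3 and 1.
w_demo = np.zeros((8, 8), dtype=np.float16)
w_demo[0, 0] = 3.0
w_demo[1, 1] = 1.0
demo_rank, demo_srank = rank_and_stable_rank(w_demo)
```

For the toy matrix the thresholded rank is 2, while the stable rank is (9 + 1) / 9 ≈ 1.11, which shows how the two quantities can differ even for a well-conditioned low-rank matrix.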

Computing script. See papers/lazy-rudder/scripts/spectral_autopsy.py for the reference implementation (function layer_stats).

Seed robustness

Does the signal hold across random seeds?

Two independent 1B runs, seed 42 and seed 117, use independent data draws. The curves track closely, which suggests the signal is not a seed artifact.