Asymmetric Compliance Damage: The Cost of Isolation
One of the most surprising findings in the Phase 5 results was not a success but an informative failure. We call it Asymmetric Compliance Damage.
This post covers what happens when you apply a fixed rotation to the role channel, and why it selectively “kills” the trusted instruction stream while leaving the untrusted data stream intact.
The Arm: A Selective Collapse
In the Counterfactual V2 experiments, the vanilla model reached an instruction-compliance score of 0.155. We expected the fixed-rotation arm (W&B: y0033rou) to either improve on this or show a general utility loss across the board.
The Verdict: The damage was asymmetric.
- INSTRUCTION slot: Compliance collapsed from 0.155 to 0.020.
- DATA slot: Performance was essentially unchanged (0.290 to 0.295).
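To make the asymmetry concrete, the two slot results above can be turned into absolute and relative changes. This is a small arithmetic sketch using only the four numbers reported in this post; the helper name `compliance_change` is ours for illustration and is not part of the experiment code.

```python
def compliance_change(before, after):
    """Return (absolute change, relative change) between two scores."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative

# Scores reported above: (vanilla, fixed-rotation arm)
results = {
    "INSTRUCTION": (0.155, 0.020),
    "DATA": (0.290, 0.295),
}

for slot, (before, after) in results.items():
    absolute, relative = compliance_change(before, after)
    print(f"{slot:12s} {before:.3f} -> {after:.3f}  "
          f"abs {absolute:+.3f}  rel {relative:+.1%}")
```

In relative terms, the INSTRUCTION slot loses roughly 87% of its baseline compliance, while the DATA slot moves by under 2%.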
Intuition: The rotation didn’t just “blur” the model. It selectively destroyed the model’s ability to adhere to the trusted role channel. The “provenance” rotation acted as a filter that blocked only the trusted signal and let the untrusted one through.
The Cost-Law Framing
This updates our understanding of the “Angle Cost Law.” While the aggregate eval loss might look relatively smooth (+0.103 delta), the actual capability cost is concentrated in the INSTRUCTION slot rather than spread evenly across slots.
The Claim: A provenance mechanism should provide “free” isolation.
The Verdict: Falsified for post-projection fixed rotations. At this scale (SmolLM2-135M), the model cannot “un-rotate” the signal well enough to maintain instruction compliance.
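For readers who want a picture of what a post-projection fixed rotation does to a vector, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the arm's actual implementation: we assume the rotation acts on consecutive dimension pairs of an already-projected vector and is applied only to tokens flagged by a role mask. The names `rotate_pairs`, `apply_role_rotation`, and `role_mask`, and the angle value, are hypothetical placeholders.

```python
import numpy as np

def rotate_pairs(x, theta):
    """Rotate dimension pairs (0,1), (2,3), ... of x by a fixed angle theta."""
    x = np.asarray(x, dtype=float)
    assert x.shape[-1] % 2 == 0, "last dimension must be even"
    even, odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = even * np.cos(theta) - odd * np.sin(theta)
    out[..., 1::2] = even * np.sin(theta) + odd * np.cos(theta)
    return out

def apply_role_rotation(projected, role_mask, theta):
    """Rotate only the rows (tokens) where role_mask is True."""
    projected = np.asarray(projected, dtype=float)
    rotated = rotate_pairs(projected, theta)
    return np.where(role_mask[:, None], rotated, projected)

# Toy data: 4 tokens with 8-dim post-projection vectors (random stand-ins).
rng = np.random.default_rng(0)
projected = rng.normal(size=(4, 8))
role_mask = np.array([True, True, False, False])  # assumed: first two tokens rotated
theta = np.pi / 8  # placeholder angle, not the arm's confirmed setting

out = apply_role_rotation(projected, role_mask, theta)

# The rotation preserves norms, so nothing is "lost" in magnitude...
print(np.allclose(np.linalg.norm(out, axis=1), np.linalg.norm(projected, axis=1)))

# ...but each rotated row now sits at angle theta from where downstream
# weights expect it: its cosine with the original is cos(theta), about 0.924.
for i in np.flatnonzero(role_mask):
    cos = out[i] @ projected[i] / (np.linalg.norm(out[i]) * np.linalg.norm(projected[i]))
    print(i, round(float(cos), 4))
```

The point of the sketch is that the rotation is lossless in principle, so any damage comes from downstream layers failing to compensate for it, which is what “un-rotate” refers to above.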
Why this isn’t a “Negative Result”
In our falsification methodology, a failure that explains why it failed is more valuable than a lucky success. Asymmetric Compliance Damage tells us that post-projection fixed rotations are too disruptive for instruction followers at this scale.
It points directly to the solution: Pre-W placement, which we’ll cover in the Phase 5 Audit.
Next in this series: The Pi/8 Instruction Output Audit: Phase 5 Benchmarks