The Pre-W Learnable Role-Angle Mechanism: RoPE-Provenance
The question we started with was a security one: can you bake “provenance” into a token so that a model can distinguish trusted instructions from untrusted data, even if the text looks identical?
The answer we’re testing is RoPE-Provenance. This post walks through the “Pre-W” mechanism and the pivot from fixed offsets to learnable role-angles.
The Core Concept: Out-of-Band Symmetries
In a standard Transformer, a token’s identity is its embedding. If an attacker injects “ignore previous instructions” into a data field, the model sees exactly the same embeddings it would see for a legitimate instruction.
RoPE-Provenance changes this. We carve out a low-frequency subspace of the RoPE positional encoding and use it as a “role channel.” Trusted instructions get one rotation; untrusted data gets another.
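A minimal sketch of the role channel, assuming the interleaved RoPE layout in which dimension pair i is (v[2i], v[2i+1]) and frequency falls as i grows, so the final pairs are the lowest-frequency ones. The role angles, the number of reserved pairs, and the function names are hypothetical. The post does not specify them, or how the role rotation composes with the positional rotation, so only the role rotation is shown.

```python
import math

# Hypothetical role angles (radians); the post does not give the real values.
ROLE_ANGLES = {"trusted": 0.0, "untrusted": math.pi / 2}


def rotate_pair(x, y, angle):
    """Rotate the 2-D point (x, y) counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return x * c - y * s, x * s + y * c


def apply_role_channel(vec, role, num_role_pairs=1):
    """Rotate the last `num_role_pairs` dimension pairs of `vec` by the role angle.

    Assumes interleaved RoPE pairing, (vec[2i], vec[2i+1]), with frequency
    decreasing in i, so the final pairs form the low-frequency "role channel".
    All other pairs are returned unchanged.
    """
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    n_pairs = len(vec) // 2
    if not 0 < num_role_pairs <= n_pairs:
        raise ValueError("num_role_pairs out of range")
    out = list(vec)
    angle = ROLE_ANGLES[role]
    for i in range(n_pairs - num_role_pairs, n_pairs):
        out[2 * i], out[2 * i + 1] = rotate_pair(vec[2 * i], vec[2 * i + 1], angle)
    return out


# The same toy "embedding" tagged with two different roles.
emb = [0.5, -1.0, 0.25, 0.8, 1.0, 0.0]
as_trusted = apply_role_channel(emb, "trusted")
as_untrusted = apply_role_channel(emb, "untrusted")
```

Identical inputs now differ only in the reserved pair. Because a rotation preserves norms, the role tag changes the direction of those features but not their magnitude.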
The “Pre-W” Pivot
In our early prototypes (Phases 1-4), we applied this rotation after the W_Q and W_K projections, the same place standard RoPE applies its positional rotation.
Verdict: Catastrophic utility damage. The model’s loss spiked, and instruction following collapsed. The rotation was “fighting” the learned weights of the attention mechanism.
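One toy way to see what a post-projection rotation does to attention scores is shown below. This is an illustration, not necessarily the diagnosis behind the verdict above. RoPE rotations satisfy R(a)^T R(b) = R(b - a), so rotating q and k after W_Q and W_K inserts a fixed relative rotation R(theta_k - theta_q) between every trusted query and untrusted key. That rotation acts on vectors the projections have already produced. The 2-D sketch checks the identity numerically; all matrices and inputs are arbitrary toy values.

```python
import math


def rot2(angle):
    """2x2 rotation matrix as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matvec(m, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


# Toy 2-D projections and inputs (arbitrary illustrative values).
W_Q = [[0.9, -0.2], [0.3, 1.1]]
W_K = [[1.0, 0.4], [-0.5, 0.7]]
x_query = [1.0, 0.5]
x_key = [0.2, -1.0]


def post_w_score(theta_q, theta_k):
    """Post-W ordering: project first, then rotate query and key by their role angles."""
    q = matvec(rot2(theta_q), matvec(W_Q, x_query))
    k = matvec(rot2(theta_k), matvec(W_K, x_key))
    return dot(q, k)


# Only the relative role angle (theta_k - theta_q) reaches the score.
shifted = post_w_score(0.3, 1.0)
reference = post_w_score(0.0, 0.7)
same_role = post_w_score(0.5, 0.5)
unrotated = post_w_score(0.0, 0.0)
```

Same-role pairs score exactly as if no rotation were applied. Cross-role pairs are scored through a rotation that W_Q and W_K never see during the projection itself.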
Phase 5 introduces the Pre-W Learnable Role-Angle Mechanism. We apply the rotation before the W_Q and W_K projections, directly to the input hidden states.
Intuition: By applying the rotation upstream of W_Q and W_K, we let the model’s weights learn how to handle the provenance signal. Instead of a hard, disruptive rotation imposed on already-projected queries and keys, the provenance rotation becomes part of the input features that the projections learn to transform.
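A minimal 2-D sketch of the Pre-W ordering, using arbitrary toy values and a hypothetical role angle. The rotation is applied to the input hidden state, so the query becomes W_Q R(theta) x. Because the rotation now sits inside the learned projection, a projection of the form W_Q R(-theta) would cancel it exactly, and other projections could amplify or reuse it. This only shows that the weights *can* represent the provenance rotation. A real layer shares one W_Q across all roles, so it cannot cancel every role's rotation at once.

```python
import math


def rot2(angle):
    """2x2 rotation matrix as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(m, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def pre_w_query(w_q, x, role_angle):
    """Pre-W ordering: rotate the input hidden state, then project it."""
    return matvec(w_q, matvec(rot2(role_angle), x))


W_Q = [[0.9, -0.2], [0.3, 1.1]]  # arbitrary toy projection
x = [1.0, 0.5]                   # toy input hidden state
theta = 0.8                      # hypothetical untrusted role angle

q_plain = matvec(W_Q, x)            # query with no role rotation at all
q_pre = pre_w_query(W_Q, x, theta)  # role rotation applied upstream of W_Q

# A projection that has folded R(-theta) into itself recovers the plain query:
# W_Q R(-theta) R(theta) x = W_Q x, so the rotation is learnable, not fixed.
W_Q_absorbed = matmul(W_Q, rot2(-theta))
q_absorbed = pre_w_query(W_Q_absorbed, x, theta)
```

The design point is where the rotation lives. In the post-W sketch it sits between the projections and the dot product; here it sits inside the matrices the optimizer updates.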
Learnable Angles
We also made the role-angle a learnable parameter. Instead of forcing a fixed angular gap between trusted and untrusted tokens, we let the model decide how much separation it needs.
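A toy sketch of a learnable role-angle. Trusted tokens use angle 0 and untrusted tokens use a single trainable scalar theta, which is updated by gradient descent. The objective here is an assumption made purely for illustration; it is not the loss used in the experiments. It is the cosine similarity between the trusted and untrusted copies of the same role-channel vector, which in 2-D equals cos(theta), so minimizing it explicitly rewards separation and pushes theta toward pi. The gradient is a central finite difference to keep the sketch dependency-free.

```python
import math


def rotate(v, angle):
    """Rotate a 2-D vector by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (v[0] * c - v[1] * s, v[0] * s + v[1] * c)


def cosine(a, b):
    """Cosine similarity of two 2-D vectors."""
    return (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))


def separation_loss(theta, v):
    """Toy loss: similarity of the trusted (angle 0) and untrusted (angle theta) copies."""
    return cosine(rotate(v, 0.0), rotate(v, theta))


def train_role_angle(theta, v, lr=0.1, steps=200, eps=1e-6):
    """Plain gradient descent on the learnable angle theta."""
    for _ in range(steps):
        # Central finite-difference estimate of d(loss)/d(theta).
        grad = (separation_loss(theta + eps, v) - separation_loss(theta - eps, v)) / (2 * eps)
        theta -= lr * grad
    return theta


v = (0.6, 0.8)  # toy role-channel pair
theta_final = train_role_angle(theta=0.1, v=v)
```

With a separation term in the loss, the gap opens. If the loss contains no term that depends on theta, its gradient here is zero and this toy gives the optimizer no reason to open the gap at all.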
The Observation: Under standard Alpaca SFT, the optimizer tends to close the gap—the role-angle converges toward zero.
This isn’t a failure; it’s an informative null. It tells us that standard training doesn’t reward role separation. To unlock the provenance channel, we need a curriculum that actually cares about security.
Next in this series: Counterfactual Provenance Experiments: Stress-Testing Token Roles, where we introduce the “reward” for role discrimination.