The Pre-W Learnable Role-Angle Mechanism: RoPE-Provenance


The question we started with was a security one: can you bake “provenance” into a token so that a model can distinguish trusted instructions from untrusted data, even if the text looks identical?

The answer we’re testing is RoPE-Provenance. This post walks through the “Pre-W” mechanism and the pivot from fixed offsets to learnable role-angles.


The Core Concept: Out-of-Band Symmetries

In a standard Transformer, a token’s identity is its embedding. If an attacker injects “ignore previous instructions” into a data field, the model sees the same embedding as if it were a legitimate instruction.

RoPE-Provenance changes this. We carve out a low-frequency subspace of the RoPE positional encoding and use it as a “role channel.” Trusted instructions get one rotation; untrusted data gets another.
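
A minimal sketch of the role channel, in plain Python. It assumes RoPE dimension pairs are laid out as (0,1), (2,3), ... with frequency decreasing along the vector, so the last pair is the lowest-frequency one. The helper names, the single reserved pair, and the two angle values are illustrative assumptions, not the actual implementation.

```python
import math

def rotate_pair(x, y, angle):
    """Rotate the 2-D point (x, y) counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def apply_role(vec, role_angle, n_role_pairs=1):
    """Rotate the last `n_role_pairs` dimension pairs of `vec` by `role_angle`.

    Assumption: the last pairs are the low-frequency ones reserved for roles.
    All other dimensions are left untouched.
    """
    out = list(vec)
    start = len(vec) - 2 * n_role_pairs
    for i in range(start, len(vec), 2):
        out[i], out[i + 1] = rotate_pair(vec[i], vec[i + 1], role_angle)
    return out

# Hypothetical role angles; the text only requires that the two roles differ.
ROLE_ANGLE = {"trusted": 0.0, "untrusted": math.pi / 2}

embedding = [0.5, -1.0, 2.0, 1.0]  # same token text in both cases
as_instruction = apply_role(embedding, ROLE_ANGLE["trusted"])
as_data = apply_role(embedding, ROLE_ANGLE["untrusted"])
```

The two vectors agree on every dimension outside the role channel and differ only in the reserved pair, which is what lets identical text carry different provenance.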

The “Pre-W” Pivot

In our early prototypes (Phases 1-4), we applied this rotation after the $W_q$ and $W_k$ projections.

Verdict: Catastrophic utility damage. The model’s loss spiked, and instruction following collapsed. The rotation was “fighting” the learned weights of the attention mechanism.

Phase 5 introduces the Pre-W Learnable Role-Angle Mechanism. We apply the rotation before the projections.

Intuition: By applying the rotation upstream of $W_q$ and $W_k$, we allow the model’s weights to “learn” how to handle the provenance signal. Instead of a hard, disruptive pivot, the provenance rotation becomes part of the feature-transport physics.
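
One way to read this intuition is as a difference in operator order. The toy below uses a 2x2 matrix as a stand-in for a learned projection (the values are made up) and compares rotating after the projection with rotating before it. It is a sketch of the algebra only and does not reproduce the training behavior described above.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def rotation(angle):
    """2x2 counterclockwise rotation matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

W_q = [[2.0, 0.5], [-1.0, 1.5]]  # hypothetical stand-in for a learned projection
x = [1.0, 3.0]                   # toy input feature
R = rotation(math.pi / 3)        # arbitrary role rotation

post_w = matvec(R, matvec(W_q, x))  # rotation after projection: R W_q x
pre_w = matvec(W_q, matvec(R, x))   # rotation before projection: W_q R x

# Pre-W, the rotation can be folded into the weights: (W_q R) x == W_q (R x).
# The projection therefore sees the role signal as part of its input.
W_eff = matmul(W_q, R)
folded = matvec(W_eff, x)
```

Here `pre_w` and `post_w` differ because this `W_q` does not commute with the rotation, while `folded` matches `pre_w`: in the Pre-W arrangement the role rotation is just another input transformation that the projection weights can adapt to.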

Learnable Angles

We also made the role-angle $\theta$ a learnable parameter. Instead of forcing a $\pi/2$ gap, we let the model decide how much separation it needs.
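
To see what the gap controls, consider a single 2-D slice of the role channel. If the same feature is tagged with two role angles, the dot product of the two results equals the squared norm times the cosine of the angle gap. The sketch below checks this numerically; the function names and the feature vector are illustrative assumptions.

```python
import math

def rotate(vec, angle):
    """Rotate a 2-D vector counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def role_overlap(vec, theta_trusted, theta_untrusted):
    """Dot product of the same 2-D feature tagged with two role angles."""
    return dot(rotate(vec, theta_trusted), rotate(vec, theta_untrusted))

feature = (3.0, 4.0)  # squared norm is 25

fixed_gap = role_overlap(feature, 0.0, math.pi / 2)    # gap of pi/2
partial_gap = role_overlap(feature, 0.0, math.pi / 3)  # gap of pi/3
closed_gap = role_overlap(feature, 0.2, 0.2)           # gap of zero
```

A gap of $\pi/2$ makes the two role-tagged copies orthogonal in this slice, and a gap of zero makes them identical. A learnable $\theta$ lets training choose any point between those extremes.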

The Observation: Under standard Alpaca SFT, the optimizer tends to close the gap: the role-angle converges toward zero.

This isn’t a failure; it’s an informative null. It tells us that standard training doesn’t reward role separation. To unlock the provenance channel, we need a curriculum that actually cares about security.

Next: Counterfactual Provenance Experiments, where we introduce the “reward” for role discrimination.

Next in this series: Counterfactual Provenance Experiments: Stress-Testing Token Roles