Token Savings

Butterflow reduces LLM token spend by clustering flows that share a common system-prompt prefix and scheduling them back-to-back so the prefix stays hot in the provider’s prompt cache.

How cache clustering works

  1. Fingerprinting — Every flow is converted into a RequestFingerprint from its intent, input, and expectations.
  2. Prefix detection — Flows are sorted and grouped by longest shared text prefix.
  3. Clustering — Each group whose shared prefix exceeds the similarity threshold (default 50%) becomes a CacheCluster.
  4. Scheduling — The token-aware scheduler reorders execution so cluster members run consecutively, maximizing cache hits.
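
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Butterflow's actual implementation: the Flow class, the fingerprint layout (intent first, then input and expectations), the character-based prefix measure, and the greedy grouping are all assumptions made for the example. Butterflow reports prefixes in tokens, so real numbers will differ.

```python
from dataclasses import dataclass
from os.path import commonprefix


@dataclass
class Flow:
    # Hypothetical stand-in for a Butterflow flow definition.
    name: str
    intent: str
    input: str
    expectations: str


def fingerprint(flow: Flow) -> str:
    # Assumed layout: intent comes first so shared instructions form the prefix.
    return "\n".join([flow.intent, flow.input, flow.expectations])


def cluster(flows, threshold=0.5):
    """Greedily group flows whose shared prefix exceeds `threshold`
    of the shortest fingerprint in the group (measured in characters)."""
    # Lexicographic sort puts fingerprints with a common prefix next to each other.
    ordered = sorted(flows, key=fingerprint)
    clusters = []
    for flow in ordered:
        if clusters:
            texts = [fingerprint(f) for f in clusters[-1]] + [fingerprint(flow)]
            shared = commonprefix(texts)
            if len(shared) / min(len(t) for t in texts) > threshold:
                clusters[-1].append(flow)
                continue
        clusters.append([flow])
    return clusters


def schedule(flows, threshold=0.5):
    # Flatten the clusters so that members of a cluster run consecutively.
    return [f.name for group in cluster(flows, threshold) for f in group]


SYSTEM = "You are a billing assistant. Follow the refund policy and answer concisely."

flows = [
    Flow("issue refund", SYSTEM, "order 17", "refund issued"),
    Flow("summarize ticket", "Summarize the support ticket in two sentences.",
         "ticket 9", "short summary"),
    Flow("refund lookup", SYSTEM, "order 42", "refund status returned"),
]

print(schedule(flows))
```

In this sketch the two refund flows share the whole SYSTEM text, so they land in one group and are scheduled next to each other, while the unrelated summarization flow forms a group of its own.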

View clusters

butterflow plan examples/ --show-cache-clusters

Example output:

Cache Clusters:
  cluster-4:
    flows: ['token plan: partial credit', 'token plan: issue refund', 'token plan: refund lookup']
    shared_prefix_tokens: 46
    cache_variables: ['expectations', 'input', 'intent']

Maximizing savings

  1. Share a system prompt — Put common instructions in every flow’s intent() so the fingerprint prefix is long and stable.
  2. Group by intent, not by input — Keep input() short and variable; the cache engine treats varying input as a dynamic variable while keeping the intent prefix cacheable.
  3. Use subsets wisely — Run all flows in a cache cluster in the same invocation so the scheduler can reorder them together.
  4. Avoid over-fragmentation — Too many unique intents reduce prefix overlap. Prefer reusable intent templates.
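
To see why a reusable intent template matters, the following sketch compares the shared prefix of two prompts built from one template with two prompts that each use a unique intent. The prompt layout, the SHARED text, and the character-based ratio are assumptions for illustration only; they do not reflect how Butterflow builds or measures prompts internally.

```python
from os.path import commonprefix

# Hypothetical shared instructions placed at the start of every intent.
SHARED = "You are a billing assistant. Follow the refund policy. Answer in JSON."


def prompt(intent: str, user_input: str) -> str:
    # Assumed layout: intent first, variable input last.
    return f"{intent}\n{user_input}"


def shared_ratio(prompts) -> float:
    # Fraction of the shortest prompt covered by the common prefix.
    shared = commonprefix(prompts)
    return len(shared) / min(len(p) for p in prompts)


# Same intent template, short varying input: long, stable prefix.
templated = [prompt(SHARED, "order 17"), prompt(SHARED, "order 42")]

# A unique intent per flow: the prompts diverge at the first character.
fragmented = [
    prompt("Refund order 17 per policy and reply in JSON.", ""),
    prompt("Look up order 42 and report its status as JSON.", ""),
]

print(f"templated:  {shared_ratio(templated):.2f}")
print(f"fragmented: {shared_ratio(fragmented):.2f}")
```

The templated pair differs only in the trailing order number, so nearly the whole prompt is shareable. The fragmented pair has no common prefix at all and would fall below any similarity threshold.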

Estimates

The plan JSON includes token estimates:

{
  "estimates": {
    "total_input_tokens": 1200,
    "cached_tokens": 420,
    "uncached_tokens": 780
  }
}

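cached_tokens approximates the input tokens expected to be served from the prompt cache across the clustered flows. In the example above, that is 420 of 1,200 input tokens (35%), with the remaining 780 counted as uncached_tokens.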
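
As a quick check on these numbers, the estimated cache share can be computed from the estimates object. The sketch below assumes the plan JSON has already been captured as a string; the cache_hit_ratio helper is hypothetical and not part of Butterflow.

```python
import json

# The estimates block from the example above, captured as a string.
plan_json = """
{
  "estimates": {
    "total_input_tokens": 1200,
    "cached_tokens": 420,
    "uncached_tokens": 780
  }
}
"""


def cache_hit_ratio(plan: dict) -> float:
    # Share of input tokens estimated to come from the prompt cache.
    est = plan["estimates"]
    total = est["total_input_tokens"]
    return est["cached_tokens"] / total if total else 0.0


plan = json.loads(plan_json)
est = plan["estimates"]
ratio = cache_hit_ratio(plan)

# Sanity check: in this example, cached and uncached tokens add up to the total.
consistent = est["cached_tokens"] + est["uncached_tokens"] == est["total_input_tokens"]

print(f"{ratio:.0%} of input tokens estimated to come from cache")
```

How much money this saves depends on how the provider prices cached input tokens, so the ratio is best read as an upper bound on the input-side discount rather than a direct cost figure.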