Token Savings
Butterflow reduces LLM token spend by clustering flows that share a common system-prompt prefix and scheduling them back-to-back so the prefix stays hot in the provider’s prompt cache.
How cache clustering works
- Fingerprinting: Every flow is converted into a RequestFingerprint from its intent, input, and expectations.
- Prefix detection: Flows are sorted and grouped by longest shared text prefix.
- Clustering: Groups whose shared prefix exceeds the similarity threshold (default 50%) become a CacheCluster.
- Scheduling: The token-aware scheduler reorders execution so cluster members run consecutively, maximizing cache hits.
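The four steps above can be sketched in plain Python. This is an illustration only, not Butterflow's implementation: the names `Cluster`, `cluster_flows`, and `schedule` are hypothetical, whitespace splitting stands in for a real tokenizer, and the threshold is assumed to mean "shared prefix length divided by the shorter prompt's length", which the text above does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading tokens that two token lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Cluster:
    prefix_tokens: int
    flows: list[str] = field(default_factory=list)


def cluster_flows(prompts: dict[str, str], threshold: float = 0.5) -> list[Cluster]:
    """Group flows whose shared prefix covers at least `threshold` of the shorter prompt."""
    # Whitespace split is an assumption standing in for a real tokenizer.
    tokenized = [(name, text.split()) for name, text in prompts.items()]
    # Sorting by token list places prompts with a common prefix next to each other.
    tokenized.sort(key=lambda item: item[1])

    clusters = []
    current = None
    prev_name, prev_tokens = "", None
    for name, tokens in tokenized:
        if prev_tokens is not None:
            shared = shared_prefix_len(prev_tokens, tokens)
            shorter = min(len(prev_tokens), len(tokens))
            if shorter and shared / shorter >= threshold:
                if current is None:
                    current = Cluster(shared, [prev_name])
                    clusters.append(current)
                current.flows.append(name)
                # In sorted order, the cluster-wide prefix is the minimum
                # of the prefixes shared by adjacent members.
                current.prefix_tokens = min(current.prefix_tokens, shared)
            else:
                current = None
        prev_name, prev_tokens = name, tokens
    return clusters


def schedule(prompts: dict[str, str], clusters: list[Cluster]) -> list[str]:
    """Return an execution order in which cluster members run consecutively."""
    clustered = [name for c in clusters for name in c.flows]
    seen = set(clustered)
    return clustered + [name for name in prompts if name not in seen]


prompts = {
    "refund lookup": "You are a support agent. Follow policy. Look up the refund",
    "issue refund": "You are a support agent. Follow policy. Issue the refund",
    "weather": "Report the weather for today",
}
clusters = cluster_flows(prompts)
order = schedule(prompts, clusters)
```

In this sketch the two refund flows share a seven-token prefix and end up in one cluster, while the weather flow stays unclustered and is scheduled after them.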
View clusters
butterflow plan examples/ --show-cache-clusters
Example output:
Cache Clusters:
cluster-4:
flows: ['token plan: partial credit', 'token plan: issue refund', 'token plan: refund lookup']
shared_prefix_tokens: 46
cache_variables: ['expectations', 'input', 'intent']
Maximizing savings
- Share a system prompt: Put common instructions in every flow’s intent() so the fingerprint prefix is long and stable.
- Group by intent, not by input: Keep input() short and variable; the cache engine treats varying input as a dynamic variable while keeping the intent prefix cacheable.
- Use subsets wisely: Run all flows in a cache cluster in the same invocation so the scheduler can reorder them together.
- Avoid over-fragmentation: Too many unique intents reduce prefix overlap. Prefer reusable intent templates.
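To see why shared instructions should come first, the following sketch compares the common prefix of two sets of prompt strings. It does not use any Butterflow API: the prompt text is made up for illustration, and character counts from the standard library's `os.path.commonprefix` stand in for token counts.

```python
import os

# Hypothetical shared instructions reused by every flow.
SHARED = "You are a billing support agent. Follow the refund policy. "

# Shared instructions first, variable task last: the prompts share a long prefix.
templated = [SHARED + "Task: issue refund.", SHARED + "Task: refund lookup."]

# Variable text first: the prompts diverge at the first character.
fragmented = ["Issue refund. " + SHARED, "Refund lookup. " + SHARED]

templated_prefix = os.path.commonprefix(templated)
fragmented_prefix = os.path.commonprefix(fragmented)

print(len(templated_prefix), len(fragmented_prefix))
```

The templated prompts share everything up to the task name, while the fragmented ones share nothing, even though both sets contain the same instructions.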
Estimates
The plan JSON includes token estimates:
{
"estimates": {
"total_input_tokens": 1200,
"cached_tokens": 420,
"uncached_tokens": 780
}
}
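cached_tokens approximates how many input tokens are served from the prompt cache across the clustered flows. In the example above, that is 420 of 1200 input tokens, with the remaining 780 counted as uncached_tokens.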
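A short script can turn these estimates into a cached share. This is a sketch that assumes the plan JSON has exactly the shape shown above; `cached_share` is a hypothetical helper, not part of Butterflow.

```python
import json

# Sample plan JSON, copied from the example above.
plan_json = """
{
  "estimates": {
    "total_input_tokens": 1200,
    "cached_tokens": 420,
    "uncached_tokens": 780
  }
}
"""


def cached_share(plan: dict) -> float:
    """Return the fraction of input tokens estimated to come from the cache."""
    estimates = plan["estimates"]
    total = estimates["total_input_tokens"]
    # Guard against an empty plan to avoid dividing by zero.
    return estimates["cached_tokens"] / total if total else 0.0


plan = json.loads(plan_json)
share = cached_share(plan)
summary = f"{share:.0%} of input tokens estimated as cached"
print(summary)
```

For the sample numbers, 420 of 1200 tokens gives a cached share of 35%.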