Spec Authoring Guide
This guide covers the full Butterflow spec authoring API.
Flow block
A flow is the smallest unit of testable behavior. Declare one with the
flow(...) context manager:
```python
from butterflow import flow

with flow("name", subset="happy", depends_on="other_flow") as f:
    ...
```
| Parameter | Type | Description |
|---|---|---|
| name | str | Human-readable flow name (must be unique in a suite). |
| subset | str \| None | Tag for filtering (e.g. happy, known-fail, smoke). |
| depends_on | str \| list[str] \| None | Flow(s) that must run before this one. |
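The parameter table implies two rules that are easy to get wrong: depends_on accepts three shapes, and names must be unique within a suite. The sketch below models both, assuming a suite is a simple name-keyed registry. FlowSpec, Suite and normalize_depends_on are hypothetical names used for illustration. They are not part of Butterflow's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model of the flow(...) parameters; not Butterflow's internals.
@dataclass
class FlowSpec:
    name: str
    subset: str | None = None
    depends_on: list[str] = field(default_factory=list)


def normalize_depends_on(depends_on: str | list[str] | None) -> list[str]:
    """Accept the three documented shapes and always return a list."""
    if depends_on is None:
        return []
    if isinstance(depends_on, str):
        return [depends_on]
    return list(depends_on)


class Suite:
    """Hypothetical registry that rejects duplicate flow names."""

    def __init__(self) -> None:
        self.flows: dict[str, FlowSpec] = {}

    def add(
        self,
        name: str,
        subset: str | None = None,
        depends_on: str | list[str] | None = None,
    ) -> FlowSpec:
        if name in self.flows:
            raise ValueError(f"duplicate flow name: {name!r}")
        spec = FlowSpec(name, subset, normalize_depends_on(depends_on))
        self.flows[name] = spec
        return spec
```

Normalizing to a list up front lets later steps, such as ordering, treat every flow the same way.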
Intent and input
```python
f.intent("What this flow is supposed to do.")
f.input("The user message that triggers it.")
```
Both are optional but strongly recommended. They appear in reports, docs, and cache-cluster analysis.
Expectations
Expectations are deterministic assertions checked against the normalized event stream.
Agent selection
```python
from butterflow import expect

f.expect(expect.agent("router").selects("billing"))
```
Matches an AgentSelected event where agent_name == "billing".
Tool called (any args)
```python
f.expect(expect.tool("issue_refund").called())
```
Matches any ToolCalled event where tool_name == "issue_refund".
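The agent-selection check and the any-args tool check can be pictured as predicates over the normalized event list. This is a minimal sketch. The AgentSelected and ToolCalled classes are hypothetical stand-ins built from the field names given above (agent_name, tool_name), and Butterflow's real event types may differ. Following the description, the agent check looks only at agent_name. The sketch does not model the "router" argument.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Union


# Hypothetical event shapes; field names follow the prose above.
@dataclass
class AgentSelected:
    agent_name: str


@dataclass
class ToolCalled:
    tool_name: str
    args: dict[str, Any] = field(default_factory=dict)


Event = Union[AgentSelected, ToolCalled]


def agent_selected(events: list[Event], agent_name: str) -> bool:
    """True if any AgentSelected event names the given agent."""
    return any(
        isinstance(e, AgentSelected) and e.agent_name == agent_name for e in events
    )


def tool_called(events: list[Event], tool_name: str) -> bool:
    """True if any ToolCalled event used the given tool, regardless of args."""
    return any(isinstance(e, ToolCalled) and e.tool_name == tool_name for e in events)
```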
Tool called with specific arguments
```python
f.expect(expect.tool("lookup_invoice").called_with(invoice_id="123"))
```
Matches a ToolCalled event whose args dict exactly equals the given keyword arguments. On a mismatch, the failure shows a readable key-by-key diff.
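A key-by-key diff can be built by walking the union of expected and actual keys. The sketch below assumes arguments are compared with plain `==` and that a call passes only on an exact match. args_diff and check_called_with are hypothetical helpers, and Butterflow's real diff format may look different.

```python
from __future__ import annotations

from typing import Any

_MISSING = object()  # sentinel: distinguishes "key absent" from a None value


def args_diff(expected: dict[str, Any], actual: dict[str, Any]) -> list[str]:
    """Return one line per differing key; an empty list means an exact match."""
    lines = []
    for key in sorted(expected.keys() | actual.keys()):
        want = expected.get(key, _MISSING)
        got = actual.get(key, _MISSING)
        if want is _MISSING:
            lines.append(f"  unexpected {key}={got!r}")
        elif got is _MISSING:
            lines.append(f"  missing    {key}={want!r}")
        elif want != got:
            lines.append(f"  {key}: expected {want!r}, got {got!r}")
    return lines


def check_called_with(
    calls: list[tuple[str, dict[str, Any]]], tool_name: str, /, **expected: Any
) -> tuple[bool, str]:
    """Pass if any call to tool_name has exactly `expected` as its args."""
    candidates = [args for name, args in calls if name == tool_name]
    if any(args == expected for args in candidates):
        return True, ""
    if not candidates:
        return False, f"{tool_name} was never called"
    report = [f"{tool_name} was called, but the args differ:"]
    for i, args in enumerate(candidates, 1):
        report.append(f" call {i}:")
        report.extend(args_diff(expected, args))
    return False, "\n".join(report)
```

The positional-only `/` stops a tool argument that happens to be named `tool_name` from colliding with the helper's own parameter.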
Final response contains text
```python
f.expect(expect.final_response().contains("refund has been issued"))
```
Checks that the content of the last assistant MessageSent event contains the given text.
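Here is a sketch of the final-response check. It assumes the check inspects only the single last assistant message, so an earlier assistant message containing the text would not count. It also assumes a case-sensitive substring test. The MessageSent class and its role field are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MessageSent:
    role: str  # hypothetical field, e.g. "user" or "assistant"
    content: str


def final_response(events: list[object]) -> MessageSent | None:
    """Return the last assistant MessageSent event, or None if there is none."""
    for event in reversed(events):
        if isinstance(event, MessageSent) and event.role == "assistant":
            return event
    return None


def final_response_contains(events: list[object], text: str) -> bool:
    """Case-sensitive substring check against the final assistant message only."""
    last = final_response(events)
    return last is not None and text in last.content
```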
Complete example
```python
from butterflow import expect, flow

with flow("refund happy path", subset="happy") as f:
    f.intent("A valid invoice refund is routed to billing and completed.")
    f.input("I need a refund for invoice 123")
    f.expect(expect.agent("router").selects("billing"))
    f.expect(expect.tool("lookup_invoice").called_with(invoice_id="123"))
    f.expect(expect.tool("issue_refund").called())
    f.expect(expect.final_response().contains("refund has been issued"))
```
Subsets and filtering
Run only happy flows:
```shell
butterflow run specs/ --subset happy
```
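Subset filtering can be pictured as a simple tag comparison. This sketch assumes exact string equality on subset and treats a missing filter as "run everything". select_subset and FlowSpec are hypothetical names, and the actual CLI behavior may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlowSpec:
    name: str
    subset: str | None = None


def select_subset(flows: list[FlowSpec], subset: str | None) -> list[FlowSpec]:
    """Keep flows tagged exactly `subset`; None means no filtering."""
    if subset is None:
        return list(flows)
    return [f for f in flows if f.subset == subset]
```

Under these assumptions, untagged flows are skipped whenever a filter is given.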
Known-failure flows are useful for pinning regressions:
```python
from butterflow import expect, flow

with flow("refund denied known-fail", subset="known-fail") as f:
    f.intent("This flow is expected to fail until BUG-42 is fixed.")
    f.input("I need a refund for invoice 999")
    f.expect(expect.final_response().contains("cannot be refunded"))
```
Dependencies
When flows must run in a specific order, declare the dependency with depends_on:
```python
from butterflow import flow

with flow("setup") as f:
    f.intent("Create a test account.")

with flow("teardown", depends_on="setup") as f:
    f.intent("Delete the test account.")
```
Dependencies affect execution order and cache-batch grouping.
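The guide does not say how execution order is computed. One common approach is a topological sort, sketched here with Python's standard graphlib module (3.9+). The sketch also groups flows that become ready at the same time. That is one plausible reading of "cache-batch grouping", offered only as an assumption. execution_batches is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def execution_batches(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group flows into batches so every flow runs after all its dependencies.

    Keys are flow names; values list the flows that must run first.
    Flows in the same batch have no ordering constraint between them.
    Raises graphlib.CycleError (a ValueError subclass) on circular dependencies.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # detects cycles before any batch is produced
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For the setup/teardown example above, `execution_batches({"teardown": ["setup"]})` returns `[["setup"], ["teardown"]]`. A flow that appears only as a dependency, like setup here, is added to the graph automatically.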