Spec Authoring Guide

This guide covers the full Butterflow spec authoring API.

Flow block

A flow is the smallest unit of testable behavior. Declare one with the flow(...) context manager:

from butterflow import flow

with flow("name", subset="happy", depends_on="other_flow") as f:
    ...
| Parameter | Type | Description |
| --- | --- | --- |
| name | str | Human-readable flow name (must be unique within a suite). |
| subset | str \| None | Tag for filtering (e.g. happy, known-fail, smoke). |
| depends_on | str \| list[str] \| None | Flow(s) that must run before this one. |
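
Because depends_on accepts three shapes (a single name, a list of names, or None), code that consumes it usually wants one uniform shape. The sketch below shows one way to normalize it. The helper name normalize_depends_on is hypothetical and is not part of Butterflow.

```python
from typing import List, Optional, Union

# The three shapes the table lists for depends_on.
DependsOn = Union[str, List[str], None]


def normalize_depends_on(depends_on: DependsOn) -> List[str]:
    """Hypothetical helper: coerce any accepted shape into a list of names."""
    if depends_on is None:
        return []
    if isinstance(depends_on, str):
        # A bare string is one flow name, not a sequence of characters.
        return [depends_on]
    return list(depends_on)
```

Checking for str before treating the value as a list matters here, since iterating a string would otherwise split a flow name into single characters.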

Intent and input

f.intent("What this flow is supposed to do.")
f.input("The user message that triggers it.")

Both are optional but strongly recommended. They appear in reports, docs, and cache-cluster analysis.

Expectations

Expectations are deterministic assertions checked against the normalized event stream.

Agent selection

from butterflow import expect

f.expect(expect.agent("router").selects("billing"))

Matches an AgentSelected event where agent_name == "billing".

Tool called (any args)

f.expect(expect.tool("issue_refund").called())

Matches any ToolCalled event where tool_name == "issue_refund".
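
Both checks above can be read as "at least one event in the stream satisfies a predicate". The following is a minimal sketch of that idea, not Butterflow's implementation. The dataclasses are stand-ins that reuse the event and field names from this guide, and any_event_matches is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Stand-in event types; the real Butterflow classes may look different.
@dataclass
class AgentSelected:
    agent_name: str


@dataclass
class ToolCalled:
    tool_name: str
    args: Dict[str, Any] = field(default_factory=dict)


def any_event_matches(stream: List[Any], predicate: Callable[[Any], bool]) -> bool:
    """Hypothetical helper: True if at least one event satisfies the predicate."""
    return any(predicate(event) for event in stream)


def selects_billing(event: Any) -> bool:
    return isinstance(event, AgentSelected) and event.agent_name == "billing"


def calls_issue_refund(event: Any) -> bool:
    # The args are deliberately ignored, mirroring called().
    return isinstance(event, ToolCalled) and event.tool_name == "issue_refund"


stream = [
    AgentSelected(agent_name="billing"),
    ToolCalled(tool_name="lookup_invoice", args={"invoice_id": "123"}),
    ToolCalled(tool_name="issue_refund", args={"invoice_id": "123"}),
]
```

With this stream, any_event_matches(stream, selects_billing) and any_event_matches(stream, calls_issue_refund) are both true, and the result depends only on the recorded events, which is what makes such checks deterministic.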

Tool called with specific arguments

f.expect(expect.tool("lookup_invoice").called_with(invoice_id="123"))

Matches a ToolCalled event with the exact args dict. Mismatches produce a readable key-by-key diff.
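
To illustrate what a key-by-key diff can look like, here is a small sketch that compares an expected args dict with the one actually recorded. The function name diff_args and the output format are assumptions made for illustration; Butterflow's real diff output may differ.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel so that a stored None still counts as a value


def diff_args(expected: Dict[str, Any], actual: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: one line per key that differs, empty list if equal."""
    lines = []
    for key in sorted(set(expected) | set(actual)):
        want = expected.get(key, _MISSING)
        got = actual.get(key, _MISSING)
        if want is _MISSING:
            lines.append(f"+ {key}: unexpected {got!r}")
        elif got is _MISSING:
            lines.append(f"- {key}: missing, expected {want!r}")
        elif want != got:
            lines.append(f"~ {key}: expected {want!r}, got {got!r}")
    return lines
```

Under exact matching, diff_args({"invoice_id": "123"}, {"invoice_id": 123}) reports a mismatch, because the string "123" and the integer 123 are different values.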

Final response contains text

f.expect(expect.final_response().contains("refund has been issued"))

Matches the last assistant MessageSent whose content contains the text.
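
One plausible reading of this check is sketched below: find the last message sent by the assistant, then test its content for the substring. The MessageSent stand-in, its role field, and both helper names are assumptions for illustration and are not taken from Butterflow.

```python
from dataclasses import dataclass
from typing import List, Optional


# Stand-in event type; the role field is an assumption.
@dataclass
class MessageSent:
    role: str
    content: str


def last_assistant_message(stream: List[MessageSent]) -> Optional[MessageSent]:
    """Hypothetical helper: the most recent assistant message, or None."""
    for event in reversed(stream):
        if event.role == "assistant":
            return event
    return None


def final_response_contains(stream: List[MessageSent], text: str) -> bool:
    """True only if the last assistant message contains the text."""
    last = last_assistant_message(stream)
    return last is not None and text in last.content
```

Under this reading, text that appears only in an earlier assistant message does not satisfy the expectation.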

Complete example

from butterflow import expect, flow

with flow("refund happy path", subset="happy") as f:
    f.intent("A valid invoice refund is routed to billing and completed.")
    f.input("I need a refund for invoice 123")
    f.expect(expect.agent("router").selects("billing"))
    f.expect(expect.tool("lookup_invoice").called_with(invoice_id="123"))
    f.expect(expect.tool("issue_refund").called())
    f.expect(expect.final_response().contains("refund has been issued"))

Subsets and filtering

Run only the flows tagged happy:

butterflow run specs/ --subset happy
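
Conceptually, subset filtering keeps the flows whose tag equals the requested one. The sketch below models that with a stand-in FlowSpec record and a hypothetical filter_by_subset helper. It assumes that untagged flows are excluded whenever a subset is requested, which this guide does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


# Stand-in for a declared flow; only the fields needed for filtering.
@dataclass
class FlowSpec:
    name: str
    subset: Optional[str] = None


def filter_by_subset(flows: List[FlowSpec], subset: Optional[str]) -> List[FlowSpec]:
    """Hypothetical helper: keep flows tagged with the requested subset."""
    if subset is None:
        # No filter requested: run everything.
        return list(flows)
    # Assumption: flows without a tag never match a requested subset.
    return [f for f in flows if f.subset == subset]


flows = [
    FlowSpec("refund happy path", subset="happy"),
    FlowSpec("refund denied known-fail", subset="known-fail"),
    FlowSpec("setup"),
]
```

Here filter_by_subset(flows, "happy") keeps only "refund happy path", while passing None keeps all three flows.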

Tagging a flow known-fail pins a regression until the underlying bug is fixed:

with flow("refund denied known-fail", subset="known-fail") as f:
    f.intent("This flow is expected to fail until BUG-42 is fixed.")
    f.input("I need a refund for invoice 999")
    f.expect(expect.final_response().contains("cannot be refunded"))

Dependencies

When flows must run in a fixed order, declare the dependency with depends_on:

with flow("setup") as f:
    f.intent("Create a test account.")

with flow("teardown", depends_on="setup") as f:
    f.intent("Delete the test account.")

Dependencies affect execution order and cache-batch grouping.
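
Ordering by dependency is a topological sort. As a rough illustration of how an execution order could be derived from depends_on declarations, the sketch below uses graphlib from the Python standard library (Python 3.9+). The execution_order helper is hypothetical, and Butterflow's actual scheduler and its cache-batch grouping are not modeled here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: order flow names so dependencies come first.

    Maps each flow name to the names it depends on.
    Raises graphlib.CycleError if the declarations form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


# The two flows from the example above: teardown depends on setup.
order = execution_order({"setup": [], "teardown": ["setup"]})
```

For these two flows, order is ["setup", "teardown"]. A cycle such as two flows depending on each other raises CycleError instead of producing an order.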