Butterflow Normalized Event Model

This document describes the canonical event schema that all framework adapters map into. The schema is intentionally small and stable; framework-specific details belong in the raw escape-hatch field.

Design Principles

Small surface area — only fields that are meaningful across all supported frameworks are first-class citizens.
Stable identifiers — event_type, run_id, and adapter_id are required on every event.
Extensibility — the raw dict carries adapter-specific payloads. Expectations MUST NOT depend on raw; it exists for debugging and future adapters only.
JSON-native — every event round-trips through json.dumps / json.loads losslessly for core fields.
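The rule that expectations must not depend on raw can be enforced by comparing events on their normalized fields only. The following is a minimal sketch that assumes events are handled as plain dicts; `matches_expectation` is a hypothetical helper for illustration, not part of Butterflow.

```python
from typing import Any


def matches_expectation(event: dict[str, Any], expected: dict[str, Any]) -> bool:
    """Return True if every expected field equals the event's field.

    The "raw" key is never consulted, even if the caller passes it.
    (Hypothetical helper; not a Butterflow API.)
    """
    return all(
        event.get(key) == value
        for key, value in expected.items()
        if key != "raw"
    )


event = {
    "event_type": "ToolCalled",
    "run_id": "r-123",
    "adapter_id": "langchain",
    "tool_name": "search",
    "args": {"q": "weather"},
    "raw": {"_adapter_version": "1.2.3"},
}

# The verdict is the same whatever the adapter stored in raw.
ok = matches_expectation(event, {"event_type": "ToolCalled", "tool_name": "search"})
```

Because raw is skipped during comparison, an adapter can change its debug payload without breaking existing expectations.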

Event Types

Lifecycle

| Event | When emitted | Key fields |
| --- | --- | --- |
| `RunStarted` | Run begins | (none) |
| `RunCompleted` | Run ends successfully | (none) |
| `RunFailed` | Run ends with an error | `error: str` |
| `InputReceived` | User input arrives | `content: str` |
| `MessageSent` | Message dispatched (e.g. to LLM) | `content: str`, `role: str` |
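The lifecycle events suggest a simple well-formedness check on a run's event stream. The sketch below assumes that `RunStarted` is the first event of a run and that exactly one terminal event (`RunCompleted` or `RunFailed`) closes it; the schema above does not state this ordering explicitly, and `lifecycle_problems` is a hypothetical helper.

```python
TERMINAL_EVENTS = {"RunCompleted", "RunFailed"}


def lifecycle_problems(event_types: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the stream looks well formed.

    Assumes (not confirmed by the schema) that a run starts with RunStarted
    and ends with exactly one terminal event.
    """
    problems = []
    if not event_types or event_types[0] != "RunStarted":
        problems.append("stream does not begin with RunStarted")
    if not event_types or event_types[-1] not in TERMINAL_EVENTS:
        problems.append("stream does not end with RunCompleted or RunFailed")
    terminal_count = sum(1 for t in event_types if t in TERMINAL_EVENTS)
    if terminal_count > 1:
        problems.append("more than one terminal event")
    return problems


good = ["RunStarted", "InputReceived", "MessageSent", "RunCompleted"]
bad = ["InputReceived", "RunFailed", "RunCompleted"]
```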

Agent orchestration

| Event | When emitted | Key fields |
| --- | --- | --- |
| `AgentSelected` | An agent is chosen for the next step | `agent_name: str` |
| `AgentDelegated` | Work is handed to another agent | `agent_name: str` |

Tool use

| Event | When emitted | Key fields |
| --- | --- | --- |
| `ToolCalled` | A tool is invoked | `tool_name: str`, `args: dict` |
| `ToolReturned` | A tool returns | `tool_name: str`, `result: Any` |
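Since the two tool events share only `tool_name`, a consumer that wants call/result pairs has to match them itself. The sketch below assumes dict-shaped events and pairs each `ToolReturned` with the oldest unmatched `ToolCalled` of the same name; this first-in-first-out policy and the `pair_tool_events` helper are illustrative assumptions, not documented Butterflow behavior.

```python
from collections import defaultdict, deque
from typing import Any


def pair_tool_events(
    events: list[dict[str, Any]],
) -> tuple[list[tuple[dict, dict]], list[dict]]:
    """Pair each ToolReturned with the oldest unmatched ToolCalled of the same tool_name.

    Returns (pairs, unmatched_calls). A ToolReturned with no pending call is skipped.
    (Hypothetical helper; the FIFO matching policy is an assumption.)
    """
    pending: dict[str, deque] = defaultdict(deque)
    pairs = []
    for event in events:
        if event["event_type"] == "ToolCalled":
            pending[event["tool_name"]].append(event)
        elif event["event_type"] == "ToolReturned":
            queue = pending[event["tool_name"]]
            if queue:
                pairs.append((queue.popleft(), event))
    unmatched = [call for queue in pending.values() for call in queue]
    return pairs, unmatched


events = [
    {"event_type": "ToolCalled", "tool_name": "search", "args": {"q": "weather"}},
    {"event_type": "ToolCalled", "tool_name": "echo", "args": {"msg": "hi"}},
    {"event_type": "ToolReturned", "tool_name": "search", "result": "sunny"},
]
pairs, unmatched = pair_tool_events(events)
```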

State & artifacts

| Event | When emitted | Key fields |
| --- | --- | --- |
| `StateUpdated` | Internal state changes | `key: str`, `value: Any` |
| `ArtifactCreated` | A file/image/etc. is produced | `name: str`, `content: str`, `mime_type: str` |
| `JudgeEvaluated` | An evaluator produces a score | `score: float \| int \| None`, `reason: str` |

Model telemetry

| Event | When emitted | Key fields |
| --- | --- | --- |
| `ModelRequestCaptured` | An LLM request is captured | `model: str`, `messages: list[dict]`, `token_count: int \| None` |
| `ModelResponseReceived` | An LLM response arrives | `model: str`, `content: str`, `token_count: int \| None` |

token_count is optional metadata intended for future planner use (cost estimation, context-window tracking, etc.).
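Any consumer of `token_count` has to tolerate `None`. As a rough illustration of the kind of aggregation a planner might do, the sketch below sums the known counts and reports how many telemetry events had none; it assumes dict-shaped events, and `total_tokens` is a hypothetical helper.

```python
TELEMETRY_EVENTS = {"ModelRequestCaptured", "ModelResponseReceived"}


def total_tokens(events: list[dict]) -> tuple[int, int]:
    """Return (sum of known token counts, number of telemetry events with no count).

    (Hypothetical helper; not a Butterflow API.)
    """
    total = 0
    unknown = 0
    for event in events:
        if event.get("event_type") not in TELEMETRY_EVENTS:
            continue  # only model telemetry carries token_count
        count = event.get("token_count")
        if count is None:
            unknown += 1
        else:
            total += count
    return total, unknown


events = [
    {"event_type": "ModelRequestCaptured", "model": "m", "messages": [], "token_count": 120},
    {"event_type": "ModelResponseReceived", "model": "m", "content": "hi", "token_count": None},
    {"event_type": "ToolCalled", "tool_name": "echo", "args": {}},
]
known, missing = total_tokens(events)
```

Reporting the number of unknown counts alongside the total keeps a partial sum from being mistaken for a complete one.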

Universal vs. Framework-Specific vs. Adapter Metadata

Universal fields (guaranteed on every event)

event_type — canonical class name (e.g. ToolCalled)
run_id — stable identifier for the run under test
adapter_id — identifier of the adapter that emitted the event
timestamp — ISO-8601 string or None
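A check for the universal fields might look like the following sketch. It assumes dict-shaped events and treats an absent timestamp the same as `None`; `universal_field_problems` is a hypothetical helper. Note that `datetime.fromisoformat` accepts only a subset of ISO-8601 on Python versions before 3.11, so a stricter or more lenient parser may be needed in practice.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("event_type", "run_id", "adapter_id")


def universal_field_problems(event: dict) -> list[str]:
    """Return a list of problems with the universal fields; empty means OK.

    (Hypothetical helper; not a Butterflow API.)
    """
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not event.get(name)]
    timestamp = event.get("timestamp")  # None is allowed by the schema
    if timestamp is not None:
        try:
            datetime.fromisoformat(timestamp)
        except (TypeError, ValueError):
            problems.append("timestamp is not ISO-8601")
    return problems


event = {
    "event_type": "ToolCalled",
    "run_id": "r-123",
    "adapter_id": "langchain",
    "timestamp": datetime.now(timezone.utc).isoformat(),
}
problems = universal_field_problems(event)
```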

Framework-specific data

Anything that does not fit the normalized schema (e.g. LangChain RunnableConfig, OpenAI response_format, CrewAI task_output) is placed in the raw dict. Adapters are free to populate raw as needed, but expectations must ignore it.

Adapter metadata

Adapters may add their own metadata to raw for debugging:

```json
{
  "event_type": "ToolCalled",
  "run_id": "r-123",
  "adapter_id": "langchain",
  "tool_name": "search",
  "args": {"q": "weather"},
  "raw": {
    "_adapter_version": "1.2.3",
    "_source_file": "agent.py:42"
  }
}
```
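An adapter could attach this kind of metadata with a small helper. The sketch below assumes dict-shaped events and an underscore prefix for metadata keys, a convention inferred from the example above rather than a stated rule; `with_debug_metadata` is hypothetical.

```python
import copy
from typing import Any


def with_debug_metadata(event: dict[str, Any], **metadata: Any) -> dict[str, Any]:
    """Return a copy of the event with metadata stored under underscore-prefixed raw keys.

    (Hypothetical helper; the underscore prefix is an assumed convention.)
    """
    updated = copy.deepcopy(event)  # leave the caller's event untouched
    raw = updated.setdefault("raw", {})
    for key, value in metadata.items():
        raw[f"_{key}"] = value
    return updated


event = {
    "event_type": "ToolCalled",
    "run_id": "r-123",
    "adapter_id": "langchain",
    "tool_name": "search",
    "args": {"q": "weather"},
}
tagged = with_debug_metadata(event, adapter_version="1.2.3", source_file="agent.py:42")
```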

Serialization

All events are dataclass instances. Use the helper functions in butterflow.event for round-tripping:

```python
from butterflow.event import event_to_dict, event_from_dict, ToolCalled

event = ToolCalled(run_id="r1", adapter_id="py", tool_name="echo", args={"msg": "hi"})
d = event_to_dict(event)
restored = event_from_dict(d)
assert restored.tool_name == "echo"
```

Core fields are guaranteed to survive serialization. The result and value fields (typed as Any) may lose type fidelity through JSON: values that json.dumps accepts keep their structure but can come back as a different Python type.
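The loss of type fidelity can be seen with the standard library alone. This sketch uses a made-up `result` payload and plain `json`, without any Butterflow helper:

```python
import json

# A payload an adapter might place in ToolReturned.result (illustrative values).
result = {"coords": (1, 2), "scores": {1: 0.5}}

restored = json.loads(json.dumps(result))

# The nesting is preserved, but the tuple comes back as a list
# and the integer dict key comes back as a string.
coords_type = type(restored["coords"])
score_keys = list(restored["scores"])
```

Expectations on `result` or `value` are therefore safer when written against the JSON-shaped form (lists, string keys) than against the original Python objects.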

Known Framework Mapping Constraints

| Framework | Mapping notes |
| --- | --- |
| LangChain / LangGraph | `AgentSelected` maps to graph node entry; `ToolCalled` maps to `ToolMessage` with `name` and `args` extracted from `additional_kwargs`. |
| CrewAI | Tasks map to `AgentDelegated`; tools map directly to `ToolCalled`. CrewAI does not expose per-request token counts natively, so `ModelRequestCaptured.token_count` may be `None`. |
| AutoGen | Agent hand-offs emit `AgentSelected`; tool invocations emit `ToolCalled`. Group-chat broadcasts may produce multiple `MessageSent` events. |
| OpenAI Agents SDK | `agent.run` spans map to `RunStarted` … `RunCompleted`. Tool calls are extracted from the `required_action` payload. |
| Python harness | Synthetic; emits all event types directly. Used for unit tests and deterministic scenarios. |
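The general shape of such a mapping can be sketched without reference to any particular framework. In the example below, the payload keys (`name`, `arguments`, `trace_id`) and the `normalize_tool_call` function are invented for illustration and do not correspond to any real framework's API or to Butterflow's adapters; the point is only that known fields are lifted into the normalized schema and everything else is kept in raw.

```python
from typing import Any

# Payload keys that have a slot in the normalized schema (hypothetical names).
MAPPED_KEYS = {"name", "arguments"}


def normalize_tool_call(
    payload: dict[str, Any], run_id: str, adapter_id: str
) -> dict[str, Any]:
    """Map a hypothetical framework payload to a ToolCalled-shaped dict."""
    return {
        "event_type": "ToolCalled",
        "run_id": run_id,
        "adapter_id": adapter_id,
        "timestamp": None,  # this made-up payload carries no timestamp
        "tool_name": payload["name"],
        "args": dict(payload.get("arguments") or {}),
        # Anything the schema has no slot for is preserved in raw.
        "raw": {k: v for k, v in payload.items() if k not in MAPPED_KEYS},
    }


payload = {"name": "search", "arguments": {"q": "weather"}, "trace_id": "t-9"}
normalized = normalize_tool_call(payload, run_id="r-123", adapter_id="example")
```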