Butterflow Normalized Event Model

This document describes the canonical event schema that all framework adapters map into. The schema is intentionally small and stable; framework-specific details belong in the raw escape-hatch field.

Design Principles

  1. Small surface area — only fields that are meaningful across all supported frameworks are first-class citizens.
  2. Stable identifiers — event_type, run_id, and adapter_id are required on every event.
  3. Extensibility — the raw dict carries adapter-specific payloads. Expectations MUST NOT depend on raw; it exists for debugging and future adapters only.
  4. JSON-native — every event round-trips through json.dumps / json.loads losslessly for core fields.
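
To make these principles concrete, here is a minimal sketch of what one normalized event could look like. ToolCalledSketch is a hypothetical stand-in defined only for this example, not the class shipped in butterflow.event; its field order and defaults are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class ToolCalledSketch:
    """Hypothetical stand-in for a normalized event; not the butterflow class."""

    # Stable identifiers: required on every event.
    run_id: str
    adapter_id: str
    # First-class fields for this event type.
    tool_name: str
    args: dict[str, Any]
    # Escape hatch for adapter-specific payloads; expectations must not read it.
    raw: dict[str, Any] = field(default_factory=dict)
    event_type: str = "ToolCalled"


event = ToolCalledSketch(
    run_id="r-123", adapter_id="langchain", tool_name="search", args={"q": "weather"}
)

# JSON-native: the core fields survive a dumps/loads round trip unchanged.
payload = json.loads(json.dumps(asdict(event)))
restored = ToolCalledSketch(**payload)
assert restored == event
```

The sketch keeps the surface small: three identifiers, the fields specific to the event type, and one raw dict for everything else.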

Event Types

Lifecycle

| Event | When emitted | Key fields |
| --- | --- | --- |
| RunStarted | Run begins | |
| RunCompleted | Run ends successfully | |
| RunFailed | Run ends with an error | error: str |
| InputReceived | User input arrives | content: str |
| MessageSent | Message dispatched (e.g. to LLM) | content: str, role: str |
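
As an illustration of how the lifecycle events bracket a run, the sketch below checks an ordered list of event dicts. It assumes one run per list and exactly one terminal event; neither rule is stated by the schema itself, and lifecycle_ok is a hypothetical helper.

```python
# Assumption: a run ends with exactly one of these event types.
TERMINAL = {"RunCompleted", "RunFailed"}


def lifecycle_ok(events: list[dict]) -> bool:
    """Return True if the list opens with RunStarted and closes with one terminal event."""
    if len(events) < 2:
        return False
    types = [e["event_type"] for e in events]
    if types[0] != "RunStarted" or types[-1] not in TERMINAL:
        return False
    # No second start and no early terminal event in the middle of the run.
    return not any(t == "RunStarted" or t in TERMINAL for t in types[1:-1])


good = [
    {"event_type": "RunStarted", "run_id": "r1", "adapter_id": "py"},
    {"event_type": "InputReceived", "run_id": "r1", "adapter_id": "py", "content": "hi"},
    {"event_type": "RunCompleted", "run_id": "r1", "adapter_id": "py"},
]
bad = good[:2]  # the run never terminates

assert lifecycle_ok(good)
assert not lifecycle_ok(bad)
```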

Agent orchestration

| Event | When emitted | Key fields |
| --- | --- | --- |
| AgentSelected | An agent is chosen for the next step | agent_name: str |
| AgentDelegated | Work is handed to another agent | agent_name: str |

Tool use

| Event | When emitted | Key fields |
| --- | --- | --- |
| ToolCalled | A tool is invoked | tool_name: str, args: dict |
| ToolReturned | A tool returns | tool_name: str, result: Any |

State & artifacts

| Event | When emitted | Key fields |
| --- | --- | --- |
| StateUpdated | Internal state changes | key: str, value: Any |
| ArtifactCreated | A file/image/etc. is produced | name: str, content: str, mime_type: str |
| JudgeEvaluated | An evaluator produces a score | score: float \| int \| None, reason: str |

Model telemetry

| Event | When emitted | Key fields |
| --- | --- | --- |
| ModelRequestCaptured | An LLM request is captured | model: str, messages: list[dict], token_count: int \| None |
| ModelResponseReceived | An LLM response arrives | model: str, content: str, token_count: int \| None |

token_count is optional metadata intended for future planner use (cost estimation, context-window tracking, etc.).
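
Because token_count may be None, any consumer has to treat a missing count as unknown rather than zero. The sketch below shows one way to do that over plain event dicts; summarize_tokens is a hypothetical helper, not part of butterflow.

```python
MODEL_EVENTS = {"ModelRequestCaptured", "ModelResponseReceived"}


def summarize_tokens(events: list[dict]) -> tuple[int, int]:
    """Return (sum of known token counts, number of model events with no count)."""
    known = 0
    missing = 0
    for e in events:
        if e.get("event_type") not in MODEL_EVENTS:
            continue  # only model telemetry events carry token_count
        count = e.get("token_count")
        if count is None:
            missing += 1  # unknown, deliberately not counted as zero
        else:
            known += count
    return known, missing


events = [
    {"event_type": "ModelRequestCaptured", "model": "m", "messages": [], "token_count": 120},
    {"event_type": "ModelResponseReceived", "model": "m", "content": "ok", "token_count": None},
    {"event_type": "ToolCalled", "tool_name": "search", "args": {}},
]
known, missing = summarize_tokens(events)
assert (known, missing) == (120, 1)
```

Reporting the missing count separately lets a cost estimate state how much of the run it actually covers.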

Universal vs. Framework-Specific vs. Adapter Metadata

Universal fields (guaranteed on every event)

As stated in the design principles, event_type, run_id, and adapter_id are present on every event.

Framework-specific data

Anything that does not fit the normalized schema (e.g. LangChain RunnableConfig, OpenAI response_format, CrewAI task_output) is placed in the raw dict. Adapters are free to populate raw as needed, but expectations must ignore it.

Adapter metadata

Adapters may add their own metadata to raw for debugging:

{
  "event_type": "ToolCalled",
  "run_id": "r-123",
  "adapter_id": "langchain",
  "tool_name": "search",
  "args": {"q": "weather"},
  "raw": {
    "_adapter_version": "1.2.3",
    "_source_file": "agent.py:42"
  }
}
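
Since expectations must ignore raw, a comparison can drop it before checking equality. The following is a minimal sketch over plain dicts; core_fields is a hypothetical helper name, not a butterflow function.

```python
def core_fields(event: dict) -> dict:
    """Return a copy of the event without the adapter-specific raw payload."""
    return {k: v for k, v in event.items() if k != "raw"}


observed = {
    "event_type": "ToolCalled",
    "run_id": "r-123",
    "adapter_id": "langchain",
    "tool_name": "search",
    "args": {"q": "weather"},
    "raw": {"_adapter_version": "1.2.3", "_source_file": "agent.py:42"},
}
expected = {
    "event_type": "ToolCalled",
    "run_id": "r-123",
    "adapter_id": "langchain",
    "tool_name": "search",
    "args": {"q": "weather"},
}

# The debugging metadata in raw does not affect the comparison.
assert core_fields(observed) == core_fields(expected)
```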

Serialization

All events are dataclass instances. Use the helper functions in butterflow.event for round-tripping:

from butterflow.event import event_to_dict, event_from_dict, ToolCalled

# Build an event, serialize it to a plain dict, then rebuild it.
event = ToolCalled(run_id="r1", adapter_id="py", tool_name="echo", args={"msg": "hi"})
d = event_to_dict(event)
restored = event_from_dict(d)
# Core fields come back unchanged.
assert restored.tool_name == "echo"

Core fields are guaranteed to survive serialization. The result and value fields (typed as Any) may lose type fidelity through JSON (a tuple, for instance, comes back as a list) but retain their structure as long as the value is JSON-serializable.
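
The loss of type fidelity comes from the standard json module, not from anything butterflow-specific. A short demonstration, using a made-up value of the kind that might end up in result or value:

```python
import json

# A value that is JSON-serializable but uses Python-only types.
value = {"point": (1, 2), 3: "three"}

restored = json.loads(json.dumps(value))

# The tuple becomes a list and the integer key becomes a string;
# the nesting itself is preserved.
assert restored == {"point": [1, 2], "3": "three"}
assert restored != value
```

Expectations that inspect result or value should therefore compare against the JSON-shaped form, not the original Python object.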

Known Framework Mapping Constraints

| Framework | Mapping notes |
| --- | --- |
| LangChain / LangGraph | AgentSelected maps to graph node entry; ToolCalled maps to ToolMessage with name and args extracted from additional_kwargs. |
| CrewAI | Tasks map to AgentDelegated; tools map directly to ToolCalled. CrewAI does not expose per-request token counts natively, so ModelRequestCaptured.token_count may be None. |
| AutoGen | Agent hand-offs emit AgentSelected; tool invocations emit ToolCalled. Group-chat broadcasts may produce multiple MessageSent events. |
| OpenAI Agents SDK | agent.run spans map to RunStarted and RunCompleted. Tool calls are extracted from the required_action payload. |
| Python harness | Synthetic; emits all event types directly. Used for unit tests and deterministic scenarios. |
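
The general shape of an adapter mapping can be sketched as follows. The framework_event payload and its "name" and "arguments" keys are invented for illustration and do not correspond to any real framework's API; to_tool_called is likewise a hypothetical helper.

```python
from typing import Any


def to_tool_called(
    run_id: str, adapter_id: str, framework_event: dict[str, Any]
) -> dict[str, Any]:
    """Map a hypothetical framework tool-call payload to a ToolCalled dict."""
    known = {"name", "arguments"}  # keys with a first-class home in the schema
    return {
        "event_type": "ToolCalled",
        "run_id": run_id,
        "adapter_id": adapter_id,
        "tool_name": framework_event["name"],
        "args": dict(framework_event.get("arguments", {})),
        # Anything the schema has no field for goes into raw.
        "raw": {k: v for k, v in framework_event.items() if k not in known},
    }


normalized = to_tool_called(
    "r-123",
    "example-adapter",
    {"name": "search", "arguments": {"q": "weather"}, "trace_id": "t-9"},
)
assert normalized["tool_name"] == "search"
assert normalized["raw"] == {"trace_id": "t-9"}
```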