Butterflow Normalized Event Model
This document describes the canonical event schema that all framework adapters
map into. The schema is intentionally small and stable; framework-specific
details belong in the raw escape-hatch field.
Design Principles
- Small surface area — only fields that are meaningful across all supported frameworks are first-class citizens.
- Stable identifiers —
event_type,run_id, andadapter_idare required on every event. - Extensibility — the
rawdict carries adapter-specific payloads. Expectations MUST NOT depend onraw; it exists for debugging and future adapters only. - JSON-native — every event round-trips through
json.dumps/json.loadslosslessly for core fields.
Event Types
Lifecycle
| Event | When emitted | Key fields |
|---|---|---|
RunStarted |
Run begins | — |
RunCompleted |
Run ends successfully | — |
RunFailed |
Run ends with an error | error: str |
InputReceived |
User input arrives | content: str |
MessageSent |
Message dispatched (e.g. to LLM) | content: str, role: str |
Agent orchestration
| Event | When emitted | Key fields |
|---|---|---|
AgentSelected |
An agent is chosen for the next step | agent_name: str |
AgentDelegated |
Work is handed to another agent | agent_name: str |
Tool use
| Event | When emitted | Key fields |
|---|---|---|
ToolCalled |
A tool is invoked | tool_name: str, args: dict |
ToolReturned |
A tool returns | tool_name: str, result: Any |
State & artifacts
| Event | When emitted | Key fields |
|---|---|---|
StateUpdated |
Internal state changes | key: str, value: Any |
ArtifactCreated |
A file/image/etc. is produced | name: str, content: str, mime_type: str |
JudgeEvaluated |
An evaluator produces a score | score: float | int | None, reason: str |
Model telemetry
| Event | When emitted | Key fields |
|---|---|---|
ModelRequestCaptured |
An LLM request is captured | model: str, messages: list[dict], token_count: int | None |
ModelResponseReceived |
An LLM response arrives | model: str, content: str, token_count: int | None |
token_count is optional metadata intended for future planner use (cost
estimation, context-window tracking, etc.).
Universal vs. Framework-Specific vs. Adapter Metadata
Universal fields (guaranteed on every event)
event_type— canonical class name (e.g.ToolCalled)run_id— stable identifier for the run under testadapter_id— identifier of the adapter that emitted the eventtimestamp— ISO-8601 string orNone
Framework-specific data
Anything that does not fit the normalized schema (e.g. LangChain
RunnableConfig, OpenAI response_format, CrewAI task_output) is
placed in the raw dict. Adapters are free to populate raw as needed,
but expectations must ignore it.
Adapter metadata
Adapters may add their own metadata to raw for debugging:
{
"event_type": "ToolCalled",
"run_id": "r-123",
"adapter_id": "langchain",
"tool_name": "search",
"args": {"q": "weather"},
"raw": {
"_adapter_version": "1.2.3",
"_source_file": "agent.py:42"
}
}
Serialization
All events are dataclass instances. Use the helper functions in
butterflow.event for round-tripping:
from butterflow.event import event_to_dict, event_from_dict, ToolCalled
event = ToolCalled(run_id="r1", adapter_id="py", tool_name="echo", args={"msg": "hi"})
d = event_to_dict(event)
restored = event_from_dict(d)
assert restored.tool_name == "echo"
Core fields are guaranteed to survive serialization. The result and
value fields (typed as Any) may lose type fidelity through JSON but
will retain structure for JSON-serializable values.
Known Framework Mapping Constraints
| Framework | Mapping notes |
|---|---|
| LangChain / LangGraph | AgentSelected maps to graph node entry; ToolCalled maps to ToolMessage with name and args extracted from additional_kwargs. |
| CrewAI | Tasks map to AgentDelegated; tools map directly to ToolCalled. CrewAI does not expose per-request token counts natively, so ModelRequestCaptured.token_count may be None. |
| AutoGen | Agent hand-offs emit AgentSelected; tool invocations emit ToolCalled. Group-chat broadcasts may produce multiple MessageSent events. |
| OpenAI Agents SDK | agent.run spans map to RunStarted … RunCompleted. Tool calls are extracted from the required_action payload. |
| Python harness | Synthetic; emits all event types directly. Used for unit tests and deterministic scenarios. |