Engineering

Production systems and LLM agent infrastructure: distributed pipelines (Dask, Celery), agent observability (Arize Phoenix), Python, Django, task queues, security gates, and Linux workflow. Experiment artifacts and engineering notes below.

Projects

butterflow

CLI framework for declaratively defining agent flows, running evals, and caching tokens between test runs. Combines user-flow testing and token-cost optimization in one tool: a single execution trace serves both aggressive caching and quality measurement.

falcon

Custom type stubs that track payload annotations by source to gate unsafe ML deserialization (pickle, HDF5) at the type-checker level. Based on security research from huntr.com CVE work on serialization-route vulnerabilities in GenAI platforms. Includes Lean 4 soundness proofs.
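To give a feel for gating deserialization at the type-checker level, here is a minimal sketch. It is not falcon's API: it swaps in `typing.NewType` for the stub-based source tracking, and `TrustedBytes`, `SECRET`, `sign`, `verify`, and `load_trusted` are hypothetical names chosen for illustration. The idea is that `pickle.loads` is only reachable through a function whose parameter type can be produced solely by a verification step.

```python
import hashlib
import hmac
import pickle
from typing import NewType

# Hypothetical names throughout; an illustration, not falcon's implementation.
# A distinct static type for bytes whose origin has been verified.
TrustedBytes = NewType("TrustedBytes", bytes)

SECRET = b"demo-key"  # assumption: a signing key shared with the producer


def sign(payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the payload."""
    return hmac.new(SECRET, payload, hashlib.sha256).digest()


def verify(payload: bytes, signature: bytes) -> TrustedBytes:
    """The only intended way to obtain TrustedBytes: check the tag first."""
    if not hmac.compare_digest(sign(payload), signature):
        raise ValueError("untrusted payload")
    return TrustedBytes(payload)


def load_trusted(payload: TrustedBytes) -> object:
    """Deserialize only payloads that carry the TrustedBytes type."""
    return pickle.loads(payload)
```

A type checker such as mypy should reject `load_trusted(raw_bytes)` because plain `bytes` is not `TrustedBytes`, while `load_trusted(verify(blob, tag))` passes. `NewType` has no runtime effect, so the gate in this sketch is enforced by the checker rather than by the interpreter.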

django-simple-queue

A ~300 LOC task queue on the Django ORM, taken from prototype to production. Covers memory leaks from fork, pessimistic locking for exactly-once delivery, and security hardening. No Redis, no Celery: just the database you already have.

Research

Typed Policy Rails

Architectural approach for making source, operation, risk, and policy state explicit through typed side channels rather than text-only prompts. Improves model alignment without prompt bloat by encoding context in the type system.
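One way to picture a typed side channel is shown below. Every name here (`Source`, `Operation`, `Risk`, `Rail`, `Message`, `allowed`) and the policy rule itself are assumptions made for illustration, not the design described in the research. The sketch only shows the general shape: metadata travels next to the text as typed fields, and policy is checked against those fields instead of being restated in the prompt.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"
    TOOL = "tool"
    WEB = "web"


class Operation(Enum):
    READ = "read"
    WRITE = "write"


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Rail:
    """Typed metadata carried alongside the text (hypothetical shape)."""
    source: Source
    operation: Operation
    risk: Risk


@dataclass(frozen=True)
class Message:
    text: str
    rail: Rail


def allowed(msg: Message) -> bool:
    """Illustrative policy: only the user may request writes,
    and high-risk content is never acted on."""
    if msg.rail.risk is Risk.HIGH:
        return False
    if msg.rail.operation is Operation.WRITE:
        return msg.rail.source is Source.USER
    return True
```

Because the decision reads only `msg.rail`, text injected into `msg.text` (for example from a web page) cannot change the source or operation the policy sees.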

Series

Infrastructure for Frontier Research

Systems for high-performance ML research: HF-streaming for large artifacts and the dual-emit data-driven paper pattern.

View series →

Type-Stub Security Gates for ML Deserialization

Pickle is a CVE factory. falcon-secure uses Python type stubs and Lean 4 soundness proofs to gate unsafe deserialization at the type-checker level.

View series →

Production Django Task Queue

Building a ~300 LOC task queue on Django ORM from prototype to production — memory leaks, fork pitfalls, pessimistic locking, and security hardening.

View series →

Terminal Power User

Kitty terminal, kittens, shell integration, Starship prompt, and turning the terminal into a complete development environment.

View series →

Standalone notes

Butterflow: Pinning Agent Behavior with a Spec DSL

Agent evals that actually catch regressions: a Python flow/expect DSL for deterministic assertions, Arize Phoenix for fuzzy semantic evals, and cache-cluster grouping for token savings.

Two Claude Code Power Features You Should Be Using

Custom status lines for ambient awareness and git worktrees for parallel AI-assisted development sessions.

SSH in 33 Seconds? Optimizing for India-to-Bulgaria VPN Connections

Diagnosing and fixing SSH latency over high-latency VPN links, with focus on the TCP-over-TCP problem and connection multiplexing.