Engineering

Production systems and LLM agent infrastructure: distributed pipelines (Dask, Celery), agent observability (Arize Phoenix), Python, Django, task queues, security gates, and Linux workflow. Experiment artifacts and engineering notes below.

Projects

plan-a-way

Browser-only event planner: you say how to rearrange your day in plain language and it rewrites the calendar. The model never touches state — it returns a small set of operations, and deterministic code resolves what you meant, validates the result and shows a before/after diff you approve. No backend, bring your own key, and it ships its own eval harness across Claude, GPT and Gemini with per-case pass/fail, latency, tokens and cost.

Live app GitHub
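
The operations-then-validate split described above can be sketched in a few lines. This is an illustrative Python sketch, not the app's actual code (the app runs in the browser): the `Event` shape, the operation names `move` and `delete`, and every function name here are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    title: str
    start: int  # minutes since midnight
    end: int


def apply_ops(events, ops):
    """Apply model-proposed operations to a copy; the input is never mutated."""
    draft = dict(events)  # Event is frozen, so a shallow copy is enough
    for op in ops:
        kind = op.get("op")
        if kind == "move":
            ev = draft[op["id"]]  # unknown ids raise KeyError
            length = ev.end - ev.start
            draft[op["id"]] = Event(ev.title, op["start"], op["start"] + length)
        elif kind == "delete":
            del draft[op["id"]]
        else:
            raise ValueError(f"unsupported operation: {kind!r}")
    return draft


def validate(events):
    """Reject calendars with out-of-range or overlapping events."""
    for ev in events.values():
        if not 0 <= ev.start < ev.end <= 24 * 60:
            raise ValueError(f"bad time range for {ev.title}")
    ordered = sorted(events.values(), key=lambda e: e.start)
    for a, b in zip(ordered, ordered[1:]):
        if b.start < a.end:
            raise ValueError(f"{a.title} overlaps {b.title}")


def _fmt(ev):
    if ev is None:
        return "(none)"
    return f"{ev.start // 60:02d}:{ev.start % 60:02d}-{ev.end // 60:02d}:{ev.end % 60:02d}"


def diff(before, after):
    """Human-readable before/after lines for the user to approve."""
    lines = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            lines.append(f"{key}: {_fmt(old)} -> {_fmt(new)}")
    return lines
```

In this sketch the model's only output is the `ops` list. Everything that changes state, checks it, or presents it to the user is ordinary deterministic code, so a malformed or unsafe proposal fails in `apply_ops` or `validate` before anything is committed.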

butterflow

CLI framework for declaratively defining agent flows, running evals, and caching tokens between test runs. Combines user-flow testing and token-cost optimization in one tool: one execution trace, cached aggressively, with quality measured on the same run.

GitHub Blog post

falcon

Custom type stubs that track payload annotations by source to gate unsafe ML deserialization (pickle, HDF5) at the type-checker level. Based on security research from huntr.com CVE work on serialization-route vulnerabilities in GenAI platforms, with Lean 4 soundness proofs.

GitHub Blog series
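
To give a feel for gating deserialization at the type-checker level, here is a minimal sketch that uses `typing.NewType` as a stand-in for falcon's stub-based approach. It is not falcon's API: `TrustedBytes`, `read_local_artifact`, and `load_trusted` are hypothetical names, and the real project works through custom type stubs rather than a wrapper function.

```python
import pickle
from typing import NewType

# A distinct static type for bytes whose source has been vetted.
# NewType has no runtime effect; the gate exists only for the type checker.
TrustedBytes = NewType("TrustedBytes", bytes)


def read_local_artifact(payload: bytes) -> TrustedBytes:
    """Hypothetical trust boundary: the only place bytes become TrustedBytes."""
    return TrustedBytes(payload)


def load_trusted(payload: TrustedBytes) -> object:
    """Deserialize only payloads that carry the trusted annotation."""
    return pickle.loads(payload)


blob = read_local_artifact(pickle.dumps({"weights": [1, 2, 3]}))
model = load_trusted(blob)

# downloaded: bytes = fetch_from_network()   # hypothetical untrusted source
# load_trusted(downloaded)  # a checker such as mypy rejects this call:
#                           # plain bytes is not TrustedBytes
```

The point of the sketch is that the unsafe call is rejected before the program runs. Nothing here stops a caller at runtime, which is why soundness of the annotations matters.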

django-simple-queue

~300 LOC task queue on Django ORM — prototype to production. Covers memory leaks from fork, pessimistic locking for exactly-once delivery, and security hardening. No Redis, no Celery: just the database you already have.

GitHub Blog series
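
The pessimistic-locking idea can be shown without Django. The sketch below uses the standard-library `sqlite3` module and `BEGIN IMMEDIATE`, which takes the write lock before the row is read, so a second worker has to wait rather than claim the same task. The table layout and function names are assumptions for illustration. In a Django project the comparable tool is `select_for_update()` inside `transaction.atomic()`, though whether django-simple-queue does exactly that is not stated here.

```python
import sqlite3


def make_queue():
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are opened and closed explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE task ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " worker TEXT)"
    )
    return conn


def enqueue(conn, name):
    cur = conn.execute("INSERT INTO task (name) VALUES (?)", (name,))
    return cur.lastrowid


def claim_next(conn, worker):
    """Lock first, then read and mark the oldest queued task as running."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
    try:
        row = conn.execute(
            "SELECT id, name FROM task WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE task SET status = 'running', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

With a single in-memory connection this only demonstrates the claim protocol, not real contention. The lock matters once several worker processes share one database file.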

Research

Typed Policy Rails

Architectural approach for making source, operation, risk, and policy state explicit through typed side channels rather than text-only prompts. Improves model alignment without prompt bloat by encoding context in the type system.

Microsite GitHub

Series

Infrastructure for Frontier Research

Systems for high-performance ML research: HF-streaming for large artifacts and the dual-emit data-driven paper pattern.

HF-Streaming for Large Artifacts: Scaling ML Research Apr 22, 2026
The Dual-Emit Paper Pattern: Data-Driven Manuscripts May 2, 2026
Verified Security Gates: Safe ML Deserialization May 17, 2026

View series →

Type-Stub Security Gates for ML Deserialization

Pickle is a CVE factory. falcon-secure uses Python type stubs and Lean 4 soundness proofs to gate unsafe deserialization at the type-checker level.

Pickle Is a CVE Factory; Type Stubs Are the Gate May 1, 2026
Lean 4 as a Soundness Oracle for Security Properties May 2, 2026

View series →

Production Django Task Queue

Building a ~300 LOC task queue on Django ORM from prototype to production — memory leaks, fork pitfalls, pessimistic locking, and security hardening.

django-simple-queue: 300 Lines to Replace Celery Dec 21, 2024
Hunting Memory Leaks in a Django Task Worker Mar 29, 2025
The Fork Bug and the Deploy Bug Apr 12, 2025
Pessimistic Locking: One Task, One Worker Sep 6, 2025
From Bug Fix to Production Hardening: A Refactoring Marathon Feb 1, 2026

View series →

Terminal Power User

Kitty terminal, kittens, shell integration, Starship prompt, and turning the terminal into a complete development environment.

Kitty Terminal: Why GPU Acceleration Changes Everything Oct 5, 2025
Kitty Superpowers: Kittens, Hints, and Remote Control Oct 19, 2025
Completing the Terminal Stack: Bash, Starship, and History Tools Nov 2, 2025
From Terminal to Development Environment: Putting It All Together Nov 16, 2025

View series →

Standalone notes

Butterflow: Pinning Agent Behavior with a Spec DSL

May 4, 2026

Agent evals that actually catch regressions: a Python flow/expect DSL for deterministic assertions, Arize Phoenix for fuzzy semantic evals, and cache-cluster grouping for token savings.

Human-led research + AI-written, Human-edited

Two Claude Code Power Features You Should Be Using

Jan 11, 2026

Custom status lines for ambient awareness and git worktrees for parallel AI-assisted development sessions.

Human-led research + AI-written, Human-edited

SSH in 33 Seconds? Optimizing for India-to-Bulgaria VPN Connections

Nov 30, 2025

Diagnosing and fixing SSH latency over high-latency VPN links, with a focus on the TCP-over-TCP problem and connection multiplexing.

Human-led research + AI-written, Human-edited