Notes
Field notes on cloud infrastructure, AI agents, and the boring engineering that decides whether a system runs for a quarter or five years.
- 01
Troubleshooting application failures with Logfire
A repeatable workflow for using Logfire span trees, SQL-over-traces, and OpenTelemetry semantic conventions to turn opaque application failures into one-line diagnoses, walked through on a real production bug.
tags: logfire · opentelemetry · observability · debugging · pydantic-ai
- 02
Smoke-testing an LLM agent with a Claude Code skill
A Claude Code skill that smoke-tests an LLM agent end-to-end against real Gmail, real Drive, and a real model. It verifies that the agent does the right thing on real inputs, which ruff and pytest cannot.
tags: testing · claude-code · ai-coding · smoke-test
- 03
Measuring math-glyph token compression
I ran 30 SPEC.md rows through Claude's tokenizer to measure how much math-glyph notation compresses token count. Two numbers — ~30% encoding, ~90% reviewer-facing.
tags: spec-driven-development · claude-code · ai-coding · benchmarks
- 04
Compressed spec-driven development
I built pilot-spec to keep AI coding agents on one thread, with one spec, so I can actually track what changed, what was tested, and what broke.
tags: spec-driven-development · claude-code · ai-coding