Notes

Field notes on cloud infrastructure, AI agents, and the boring engineering that decides whether a system runs for a quarter or for five years.

  • 01

    Troubleshooting application failures with Logfire

    A repeatable workflow for using Logfire span trees, SQL-over-traces, and OpenTelemetry semantic conventions to turn opaque application failures into one-line diagnoses, walked through on a real production bug.

    tags: logfire · opentelemetry · observability · debugging · pydantic-ai

    read →

  • 02

    Smoke-testing an LLM agent with a Claude Code skill

    A Claude Code skill that smoke-tests an LLM agent end-to-end against real Gmail, real Drive, and a real model. It verifies that the agent does the right thing on real inputs, which ruff and pytest cannot check.

    tags: testing · claude-code · ai-coding · smoke-test

    read →

  • 03

    Measuring math-glyph token compression

    I ran 30 SPEC.md rows through Claude's tokenizer to measure how much math-glyph notation reduces token count. Two numbers came out: ~30% encoding, ~90% reviewer-facing.

    tags: spec-driven-development · claude-code · ai-coding · benchmarks

    read →

  • 04

    Compressed spec-driven development

    I built pilot-spec to keep AI coding agents on one thread, with one spec, so I can actually track what changed, what was tested, and what broke.

    tags: spec-driven-development · claude-code · ai-coding

    read →