Latest Posts

One Watcher, Any Pipeline: Label-Based Dispatch

As the system grew, every new pipeline type — bug fixes, documentation, features — needed its own Python script and GitHub Actions workflow. This post covers how all of them were unified into a single watcher process where a GitHub label determines which pipeline runs. Adding a new pipeline now means writing one YAML file.
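A minimal sketch of the dispatch idea, assuming each pipeline is registered under one label and points at its own YAML file. The label names, file paths, and `select_pipeline` helper are hypothetical and are not taken from the post.

```python
# Hypothetical registry: label name -> pipeline config file.
# The real labels and file layout are not given in the excerpt.
PIPELINES = {
    "bug": "pipelines/bugfix.yaml",
    "docs": "pipelines/docs.yaml",
    "feature": "pipelines/feature.yaml",
}


def select_pipeline(labels, registry=PIPELINES):
    """Return the config path for the first label that names a known pipeline.

    Returns None when no label on the issue matches a registered pipeline.
    """
    for label in labels:
        if label in registry:
            return registry[label]
    return None


if __name__ == "__main__":
    print(select_pipeline(["good first issue", "docs"]))
```

Under this assumption, adding a pipeline is one new registry entry plus one YAML file, and the watcher itself does not change.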

Blocky Pipeline: Build Your Own Stage Sequence

Standard mode and TDD mode cover most use cases, but sometimes you want a custom sequence — run two review loops in a row, skip deployment tests, or run a domain-specific agent you added yourself. This post covers pipeline.yaml: a separate config file that lets you define any stage sequence with explicit loop blocks, and a drag-and-drop GUI that builds it without hand-editing YAML.

Write the Test First: TDD Pipeline Mode

The standard pipeline writes code first, then tests. This post covers a TDD mode that flips the order: QA writes tests before the engineers see the problem, then engineers implement against those tests, then a fix loop runs until the suite is green. It also covers how this forced a proper stage registry — replacing hardcoded stage sequences with a configurable system.

The LLM Relay: Pluggable Backends and Auto-Failover

The original pipeline was hardwired to GitHub Models. This post covers how I extracted every LLM backend into its own class with a shared interface and added a relay that automatically falls back to the next backend on connection failure. It also covers what I learned about building resilient AI infrastructure around unreliable upstream APIs.
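The relay pattern can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the post: the `Backend` interface, the `complete` method name, and the two stub backends are all hypothetical, and "connection failure" is modeled with Python's built-in `ConnectionError`.

```python
class Backend:
    """Hypothetical shared interface that every LLM backend implements."""

    name = "base"

    def complete(self, prompt: str) -> str:
        raise NotImplementedError


class FlakyBackend(Backend):
    """Stub backend that always fails, standing in for an unreachable API."""

    name = "flaky"

    def complete(self, prompt: str) -> str:
        raise ConnectionError("upstream unavailable")


class EchoBackend(Backend):
    """Stub backend that always succeeds."""

    name = "echo"

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class Relay:
    """Try each backend in order; move on only when the connection fails."""

    def __init__(self, backends):
        self.backends = list(backends)

    def complete(self, prompt: str) -> str:
        errors = []
        for backend in self.backends:
            try:
                return backend.complete(prompt)
            except ConnectionError as exc:
                # Record the failure and fall through to the next backend.
                errors.append(f"{backend.name}: {exc}")
        raise ConnectionError("all backends failed: " + "; ".join(errors))


if __name__ == "__main__":
    relay = Relay([FlakyBackend(), EchoBackend()])
    print(relay.complete("hello"))
```

Only `ConnectionError` triggers a fallback in this sketch, so other errors, such as a malformed request, still surface immediately instead of being retried on every backend.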

Iterating Toward Quality: Revision Loops for PM and Architect

The first draft is never the best draft — not for requirements, not for system designs. This post covers how the pipeline runs structured review-and-revise loops before any code is written: the PM rewrites the requirements based on critique, the Architect rewrites the design based on critique, up to three times each. Better inputs at the top produce dramatically better code at the bottom.
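A bounded review-and-revise loop might look like the sketch below. The function name and the `critique` and `revise` callables are hypothetical stand-ins for the reviewer and the PM or Architect agent, and the early exit when the critic has no objections is an assumption; the excerpt only states the cap of three revisions.

```python
def revise_until_approved(draft, critique, revise, max_rounds=3):
    """Run up to max_rounds critique/revise cycles on a draft.

    critique(draft) returns feedback text, or None when it has no objections
    (assumed early-exit condition). revise(draft, feedback) returns a new draft.
    Returns the final draft and the number of revisions performed.
    """
    rounds = 0
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:
            break
        draft = revise(draft, feedback)
        rounds += 1
    return draft, rounds


if __name__ == "__main__":
    # Toy stand-ins: the critic objects until the draft mentions a deadline.
    def critic(text):
        return None if "deadline" in text else "missing deadline"

    def writer(text, feedback):
        return text + " (deadline: TBD)"

    print(revise_until_approved("Build a login page", critic, writer))
```

The same helper could be called once for the requirements and once for the design, each with its own critic and writer.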

Tools, MCP, and RAG: Giving Agents Eyes into the Codebase

Agents that can only read what you put in their prompt are flying blind. This post covers how agents get access to external tools — searching the web, querying GitHub, reading the codebase — and how the system automatically switches strategy based on repo size: smaller projects get the full code in the prompt, larger ones use semantic search to find what’s relevant.

Closing the Loop: PR Feedback, Human Review, and Clarification Q&A

A one-way pipeline that can’t respond to feedback is just a code generator. This post covers two features that make the system a real collaborator: a feedback loop where review comments on a pull request trigger automatic code revisions, and a Q&A mechanism where the AI pauses mid-run to ask clarifying questions before building the wrong thing.