Latest Posts
https://www.youtube.com/watch?v=LKHf54jqwvA
Because of AI, I could finally complete this song. I had written the lyrics some years ago, but I couldn't complete the whole song for lack of a melody...
Read more →
Code: github.com/wanleung/ai-dev-team
In short: The pipeline’s own AI agents opened a pull request with 7 critical bugs: code that couldn’t run at all. This post dissects exactly why it happened and...
Read more →
As the system grew, every new pipeline type — bug fixes, documentation, features — needed its own Python script and GitHub Actions workflow. This post covers how all of them were unified into a single watcher process where a GitHub label determines which pipeline runs. Adding a new pipeline now means writing one YAML file.
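As a rough illustration of label-driven dispatch, the sketch below maps an issue's labels to a pipeline config file. The label names, file names, and the `select_pipeline` helper are all hypothetical and are not taken from the project; the real watcher may resolve labels quite differently.

```python
# Hypothetical label -> pipeline config mapping (names are illustrative only).
PIPELINES = {
    "ai-bugfix": "bugfix.yaml",
    "ai-docs": "docs.yaml",
    "ai-feature": "feature.yaml",
}


def select_pipeline(labels):
    """Return the config file for the first label that names a pipeline, else None."""
    for label in labels:
        if label in PIPELINES:
            return PIPELINES[label]
    return None
```

Under this assumption, supporting a new pipeline type would amount to adding one YAML file and one mapping entry, with no new script or workflow.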
Read more →
Standard mode and TDD mode cover most use cases, but sometimes you want a custom sequence — run two review loops in a row, skip deployment tests, or run a domain-specific agent you added yourself. This post covers pipeline.yaml: a separate config file that lets you define any stage sequence with explicit loop blocks, and a drag-and-drop GUI that builds it without hand-editing YAML.
Read more →
The standard pipeline writes code first, then tests. This post covers a TDD mode that flips the order: QA writes tests before the engineers see the problem, then engineers implement against those tests, then a fix loop runs until the suite is green. It also covers how this forced a proper stage registry — replacing hardcoded stage sequences with a configurable system.
Read more →
The original pipeline was hardwired to GitHub Models. This post covers how I extracted every LLM backend into its own class with a shared interface, added a relay that automatically falls back to the next backend on connection failure, and what I learned about building resilient AI infrastructure around unreliable upstream APIs.
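The fallback idea can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Backend`, `Relay`, `FlakyBackend`, and `EchoBackend` names and the `complete` method are assumptions made for the example, and the two concrete backends are stubs that make no network calls.

```python
class Backend:
    """Assumed shared interface: every backend exposes complete(prompt) -> str."""

    name = "base"

    def complete(self, prompt: str) -> str:
        raise NotImplementedError


class FlakyBackend(Backend):
    """Stub that simulates an unreachable upstream API."""

    name = "flaky"

    def complete(self, prompt: str) -> str:
        raise ConnectionError("upstream unreachable")


class EchoBackend(Backend):
    """Stub that always answers, standing in for a healthy backend."""

    name = "echo"

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class Relay:
    """Try each backend in order; move to the next one on connection failure."""

    def __init__(self, backends):
        self.backends = list(backends)

    def complete(self, prompt: str) -> str:
        errors = []
        for backend in self.backends:
            try:
                return backend.complete(prompt)
            except ConnectionError as exc:
                # Only connection failures trigger fallback; other errors propagate.
                errors.append(f"{backend.name}: {exc}")
        raise ConnectionError("all backends failed: " + "; ".join(errors))
```

With these stubs, `Relay([FlakyBackend(), EchoBackend()]).complete("hi")` skips the failing backend and returns the second backend's answer. Catching only `ConnectionError` is a deliberate choice in this sketch, so that a malformed request fails immediately instead of being retried against every backend.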
Read more →
Seven posts of technical depth, but what does it all mean? This closing post steps back to reflect on the lessons learned, how this project vindicates what Builder.ai was attempting, where the remaining hard problems are, and what I’m building next.
Read more →
The first draft is never the best draft — not for requirements, not for system designs. This post covers how the pipeline runs structured review-and-revise loops before any code is written: the PM rewrites the requirements based on critique, the Architect rewrites the design based on critique, up to three times each. Better inputs at the top produce dramatically better code at the bottom.
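A bounded review-and-revise loop of this kind might look like the sketch below. The `review_and_revise` function and its `critique` and `revise` callables are hypothetical stand-ins for the reviewer and author agents; stopping early when the reviewer has no feedback is also an assumption, since the excerpt only states the cap of three rounds.

```python
MAX_ROUNDS = 3  # the excerpt says "up to three times each"


def review_and_revise(draft, critique, revise, max_rounds=MAX_ROUNDS):
    """Run critique -> revise rounds until the reviewer is satisfied or the cap is hit.

    critique(draft) returns feedback text, or None when there are no objections
    (assumed stop condition). revise(draft, feedback) returns the rewritten draft.
    """
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:
            break
        draft = revise(draft, feedback)
    return draft
```

The same function could serve both the requirements and the design stage by passing different `critique` and `revise` callables, and the cap keeps a reviewer that never approves from looping forever.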
Read more →
Agents that can only read what you put in their prompt are flying blind. This post covers how agents get access to external tools — searching the web, querying GitHub, reading the codebase — and how the system automatically switches strategy based on repo size: smaller projects get the full code in the prompt, larger ones use semantic search to find what’s relevant.
Read more →
A one-way pipeline that can’t respond to feedback is just a code generator. This post covers two features that make the system a real collaborator: a feedback loop where review comments on a pull request trigger automatic code revisions, and a Q&A mechanism where the AI pauses mid-run to ask clarifying questions before building the wrong thing.
Read more →