Two LLMs to Rule Them All

AI coding is still very much the wild west, at least in part because it is improving so rapidly. Not the AI itself. The models (Opus, ChatGPT, et al.) actually tend to fluctuate; Opus, by many accounts, has been on a stark downward trajectory for the past couple of months. It is the tools that are improving rapidly.

Until Anthropic lobotomized Opus, I had a delightful workflow that kept me in Claude all day long, depending on agent-os for the heavy lifting (spec-shaping, task creation and implementation). But the best tooling in the world can't make up for an LLM that just had its head bashed in.

I didn't really want to pay for two models, but that's what I did: I cut my Claude subscription in half and started paying for Codex. It was a zero-sum change in cost, although it is worth noting that, even compared with a 20x Max subscription on Claude, Codex evidently gives me considerably more usage at half the price.

Somewhat as a corollary to what I just said though, the best LLM in the world isn't going to make up for lousy tooling. The Codex CLI is fine, but if you need to orchestrate a large change, you need a "management layer" between yourself and the CLI. That's agent-os. But it just doesn't work particularly well in Codex.

So, my (successful) workflow for the past few weeks has been to employ Claude + agent-os for all of the /shape-spec, /write-spec, /create-tasks fun, and stop there. Then I turn all of that over to Codex, starting with a prompt to (I'm simplifying here) fix all of the oversights. There are always oversights. With that out of the way, I tell Codex to implement it, one task group at a time, with one worker at a time. It has been remarkably successful.
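For the curious, here is a rough sketch of that hand-off in Python. It is not what agent-os or Codex do internally, just the shape of the sequence: one "fix the oversights" prompt first, then one prompt per task group, strictly in order. The `TaskGroup` class, the prompt wording, and the `send` callable are all made up for illustration; how you actually deliver each prompt to Codex is left out on purpose.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskGroup:
    """Hypothetical stand-in for one task group from the task list."""
    name: str
    tasks: List[str] = field(default_factory=list)


def build_prompts(groups: List[TaskGroup]) -> List[str]:
    """Return the review prompt followed by one prompt per task group."""
    # Illustrative wording only; the real prompts are longer.
    prompts = [
        "Review the spec and task list, and fix any oversights "
        "before implementing anything."
    ]
    for group in groups:
        task_lines = "\n".join(f"- {task}" for task in group.tasks)
        prompts.append(
            f"Implement task group '{group.name}' only, "
            f"using a single worker:\n{task_lines}"
        )
    return prompts


def run_sequentially(prompts: List[str], send: Callable[[str], str]) -> List[str]:
    """Send prompts one at a time; each call finishes before the next starts."""
    # `send` is an assumption: any function that delivers a prompt
    # and returns the reply as text.
    return [send(prompt) for prompt in prompts]


if __name__ == "__main__":
    groups = [
        TaskGroup("database", ["add migration", "update models"]),
        TaskGroup("api", ["add endpoint"]),
    ]
    # Stand-in for a real session: echo the first line of each prompt.
    for reply in run_sequentially(build_prompts(groups), lambda p: p.splitlines()[0]):
        print(reply)
```

The point of keeping it sequential is the same as in the workflow itself: no group starts until the previous one has come back.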

I trust agent-os with Claude to implement task groups in parallel. I haven't taken a stab at that with Codex, and I'm not in a hurry to break what works. A particularly large spec took just over four hours to implement, and then there was perhaps another hour or so running a QA pass to fix obvious issues. Once that was complete, I discovered three show-stopping bugs that I had to call out and resolve before the entire feature performed flawlessly.

Meanwhile, Nvidia has a whole host of LLMs that are free to use. Slow, but free. I've been messing around with them a bit. They look pretty great, particularly considering the price....
