How To Make Your Coding Agent Look Like An Idiot
Since I got into vibecoding (or AI-assisted coding, since I do look over the results), it's been like magic, until it wasn't.
On my first somewhat larger project, one with a bit of interaction, it all started the merry old way: you tell the little man behind the terminal what you want and he goes ahead and gets things done like the Elves of Cologne. What a time to live in.
Like a good steward of code, I was proud of having the elves write tests as well. This isn't just some one-shot project, after all! It's still serious engineering. I wanted to make my mark.
After a while, progress got slower, and slower. Every little change would take half a million tokens. The GLM agent was talking less and less intelligently- and mind you, GLM has some amazing things to say if you let him. But all that came out of him was "Now I see the problem clearly! But then again, it could be something entirely different."
So I took advantage of a ChatGPT special offer and all that came out of Codex was "I have tightened the input area, but this is not the complete task as the core portion of the task was left for later consideration".
I'd have tried Opus but I was beginning to suspect that this problem was mine.
Sure enough. After a lot of refactoring and deep talk I realized that- no- it's not that the coding agents collectively lost their minds, it was how I'd defined the problem.
Here's what I did. I'd say that's fairly typical software engineering.
- Create a basic version. Spaghetti code, mixed concerns, you name it- if it works, it works.
- Add some features. Still works okay.
- Add one feature too many. Breakage.
- Refactor, look at the program, and put everything into neat little packages.
Rinse and repeat. If it grows, it eventually needs to look different.
But that fourth step, the refactor, was what I thought the agents would just handle, and they didn't. It looks more like this:
- Fixing bugs takes an inordinate amount of time and breaks other things. One step forward, two steps back.
- Attempts to refactor just don't go anywhere. Explicit instructions to change things get ignored. The code stubbornly sticks to its broken state no matter how many tokens you throw at it.
I had arrived in test hell.
You know what your agents are trained on? Surgical edits. They read the codebase. They grasp the style. They are told very strongly, trained very strongly, not to mess with the codebase style. They change the functionality. They treat every test as the golden truth. And they deliver the difference. Speedily.
That's how it works if you have a good architecture, a good kind of test coverage, and you just want a few small changes.
But if you are doing exploratory coding, that stinks. The temporary stuff you just tossed out because you could? That's the next session's gospel now. The brittle tests that never should have seen the light of day? Those count! You want to get something done? Nope! The agents' training is now working against you.
The lengths I had to go to to get Codex to comply.
I was like: okay, Codex, we are doing an architectural refactor. PUT ALL THE OLD TESTS AWAY, WE WILL REINSTATE THEM LATER.
And sure enough, Codex was back to his old self again. GLM stopped the funny dance. They knew what was going on.
My previous problem definition had been:
Make sweeping changes to my codebase!
Keep the tests running!
And implied: Don't change the tests!
Keep the style!
That doesn't work.
So here's something a human programmer does. When it's time to make large changes to your codebase, you write up a design document with what is supposed to change, and you declare the tests, or at least some of them, fair game for removal or change. You're testing the result and not how to get there, right? Right? :)
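To make "testing the result and not how to get there" concrete, here's a small sketch. It is a made-up example, not code from my project: `CartV1` stands in for the exploratory version, `CartV2` for the refactored one, and the two `check_*` functions are hypothetical tests. One only uses the public methods, the other pins the private storage layout.

```python
class CartV1:
    """Exploratory version: a plain list of (name, unit_price, quantity) rows."""

    def __init__(self):
        self._rows = []

    def add(self, name, unit_price, quantity=1):
        self._rows.append((name, unit_price, quantity))

    def total(self):
        return sum(price * qty for _, price, qty in self._rows)


class CartV2:
    """Refactored version: prices and quantities kept per item in dicts."""

    def __init__(self):
        self._prices = {}
        self._quantities = {}

    def add(self, name, unit_price, quantity=1):
        self._prices[name] = unit_price
        self._quantities[name] = self._quantities.get(name, 0) + quantity

    def total(self):
        return sum(self._prices[n] * q for n, q in self._quantities.items())


def check_behavior(cart_cls):
    """Tests the result: public methods only, no peeking inside."""
    cart = cart_cls()
    cart.add("apple", 3, quantity=2)
    cart.add("pear", 5)
    assert cart.total() == 11


def check_internals(cart_cls):
    """Tests how we got there: pins the private list that V1 happens to use."""
    cart = cart_cls()
    cart.add("apple", 3, quantity=2)
    assert cart._rows == [("apple", 3, 2)]


def survives(check, cart_cls):
    """Return True if the check passes for the given implementation."""
    try:
        check(cart_cls)
    except (AssertionError, AttributeError):
        return False
    return True


if __name__ == "__main__":
    for check in (check_behavior, check_internals):
        for cart_cls in (CartV1, CartV2):
            print(check.__name__, cart_cls.__name__, survives(check, cart_cls))
```

The behavior check passes for both versions. The internals check passes for `CartV1` and fails for `CartV2`, even though nothing a user cares about has changed. That second kind of test is the one to declare open for removal in the design document, because an agent that treats it as golden truth can never get rid of `_rows`.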
That's the lesson. Tests aren't some universal good. Tests are there for nailing down what you already have. And they're really good at that. But if you want change, you gotta ask for it.