An agent without feedback is just a fast guesser. It writes code, sounds confident, and misses the bug right in front of it.
The fix is back pressure. Force reality into the loop. Tests fail. Types fail. Lint fails. Screenshots fail. The agent adapts or it stops.
Confidence is cheap
LLMs are trained to continue text, not to prove truth. So if your setup only asks for a diff, you get polished nonsense surprisingly often.
I stopped asking agents for "clean code." I ask for passing checks. Same task, better output.
Four pressure points that work
- Types: static checks catch structural mistakes before runtime.
- Tests: behavioral checks catch wrong assumptions.
- Linters: style and risky patterns get flagged early.
- UI evidence: screenshots show if the interface is actually right.
This is just engineering hygiene. In an agent flow, it becomes non-negotiable.
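Wired together, those four checks become a single gate the diff has to clear. Here is a minimal sketch in Python, assuming a mypy/pytest/ruff stack; the tool names are placeholders for whatever your project actually runs, and a screenshot step would slot in the same way.

```python
import subprocess
import sys

# Assumed toolchain: mypy for types, pytest for tests, ruff for lint.
# Swap in tsc/vitest/eslint or your project's equivalents.
CHECKS = [
    ("types", ["mypy", "src"]),
    ("tests", ["pytest", "-q"]),
    ("lint",  ["ruff", "check", "src"]),
]

def run_checks() -> bool:
    """Run each check in order and stop at the first failure."""
    for name, cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface the real output, not a summary: this is the
            # back pressure the agent has to respond to.
            print(f"[{name}] failed\n{result.stdout}{result.stderr}")
            return False
        print(f"[{name}] ok")
    return True

if __name__ == "__main__":
    sys.exit(0 if run_checks() else 1)
```

The non-zero exit code is the whole point: the agent's turn ends in a verifiable failure instead of a plausible-sounding summary.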
Back pressure beats longer prompts
When output drifts, most people add more prompt text. I do the opposite: keep the instructions short and strengthen the checks.
Prompt text can be ignored in a crowded context window. A failing test cannot be ignored. It blocks progress immediately.
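Concretely, the kind of check that blocks is just a behavioral test that encodes the assumption. A tiny example; `parse_price` and its module are hypothetical, standing in for whatever the agent is supposed to implement:

```python
import pytest

from myproject.pricing import parse_price  # hypothetical module under test

def test_parse_price_strips_currency_symbol():
    # The agent either handles the "$" and "," or this stays red.
    assert parse_price("$1,299.00") == 1299.00

def test_parse_price_rejects_garbage():
    with pytest.raises(ValueError):
        parse_price("not a price")
```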
The practical loop
My default run looks like this: task, diff, tests, patch, repeat. If the agent keeps failing, I /clear, reload the spec, and run again with freshly engineered context.
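A sketch of that loop, assuming a `run_agent(prompt)` callable that applies its own patch and a `run_checks()` that returns a pass flag plus the raw failure output (both stand-ins for whatever agent CLI and check script you use):

```python
MAX_ATTEMPTS = 4  # budget before clearing and starting over

def agent_loop(spec: str, run_agent, run_checks) -> bool:
    """Task -> diff -> checks -> patch, until green or out of budget."""
    feedback = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # Keep the prompt short: the spec plus the latest real failure output.
        prompt = f"{spec}\n\nFix these check failures:\n{feedback}" if feedback else spec
        run_agent(prompt)
        ok, feedback = run_checks()
        if ok:
            return True
        print(f"attempt {attempt} failed; feeding checker output back")
    # Out of budget: time to /clear, reload the spec, and start fresh.
    return False
```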
That one change made agent sessions calmer and cheaper. Less debate with the model. More verifiable output.