Lead the LLM, Don't Let It Lead You
There is a misconception taking hold among developers using LLMs: that the code the model produces is the artifact worth keeping. It is not. The code is disposable. The prompt is the source.
The correct workflow is not prompt, accept, build on top. It is:
prompt
→ evaluate
→ reset to HEAD
→ revise prompt
→ evaluate
→ repeat
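Concretely, one pass through the loop might look like this in a git working tree. This is a minimal sketch: `src.txt` stands in for your codebase, and the `echo` stands in for the model writing its output.

```shell
# One pass through the prompt → evaluate → reset loop, in a throwaway repo.
set -e
cd "$(mktemp -d)"
git init -q
echo "known-good baseline" > src.txt
git add src.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "baseline"

echo "generated attempt" > src.txt   # prompt: the model writes into the tree
git diff --stat                      # evaluate: read the diff on a clean read
git reset -q --hard HEAD             # not good enough: discard the generation
cat src.txt                          # back to the baseline; now revise the prompt
```

The key move is `git reset --hard HEAD`: the generated code is thrown away wholesale, so the next generation starts from a clean tree rather than a patch on top of a patch.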
Every time you accept generated code and start layering on top of it, you are accumulating debt against a foundation you did not write, do not fully understand, and cannot efficiently modify. You have traded authorship for speed and lost both.
Why the code has no value
An LLM will produce working code on the first try often enough to be dangerous. The problem is that “working” is the lowest bar. The generated code might be correct but:
- Structured in a way that fights the rest of your codebase
- Full of unnecessary abstractions “just in case”
- Using patterns the model was trained on rather than patterns that fit your problem
- Subtly wrong in ways that only surface later
None of this matters if you treat the code as a disposable proof of concept. All of it matters if you commit it and move on.
The prompt is the artifact
When you reset to HEAD and revise your prompt instead of patching the output, you are doing something that looks wasteful but is actually efficient. You are:
Keeping your specification clean. The prompt is a declarative description of what you want. The code is one possible implementation. When you fix the code directly, your specification and implementation diverge immediately.
Getting a fresh generation every time. LLMs do not carry the baggage of their previous attempts unless you let them. A revised prompt produces revised code — not a patch on top of a patch.
Staying in control. You are editing a document you wrote (the prompt) instead of editing a document the machine wrote (the code). One of these you understand completely. The other you are guessing at.
When to stop iterating
You stop when the generated code meets your standards on a clean read. When you can look at the output, understand every line, and judge it as something you would have written yourself given enough time. That is the commit point — not before.
This means the prompt has to get specific. But not too specific.
The prompt balance
Vague prompts produce vague code. That much is obvious. But the instinct to fix this by writing maximally detailed prompts creates its own problems.
An overly concise prompt leaves too much to the model’s discretion, and you get output shaped by its training data rather than your intent. But an overly elaborate prompt introduces a subtler risk: you start encoding your own assumptions as constraints, and some of those assumptions are wrong. This is human tech debt — knowledge that is outdated, incomplete, or simply incorrect — leaking into the specification. The model would have made a better choice if you had not told it otherwise.
Worse, long and detailed prompts tend to accumulate conflicting reasoning. You specify one thing in paragraph two and contradict it in paragraph six. You may not notice, but the model will be pulled in both directions. LLMs are not deterministic to begin with — the same prompt will always produce some variance in output. But a prompt full of internal contradictions amplifies this dramatically. Instead of reasonable variation, you get wildly divergent outputs from the same input. The generation becomes unpredictable in ways that make evaluation harder and iteration slower.
The sweet spot is a prompt that is precise about what matters and silent about what does not. State the constraints that actually constrain. Describe the behaviour you need. Leave the implementation decisions you do not care about to the model — it may know more current patterns than you do. Each iteration should make the prompt more precise, not longer. If your prompt is growing but your output quality is not improving, you are adding noise, not signal.
Scaling up: the two-phase approach
Everything above works well for small, self-contained projects — a CLI tool, a single module, a script. But once a project has real scope, ad hoc prompts stop scaling. You need to formalise them.
The approach is two phases. In the first phase, you are not writing code at all. You are iterating on documents: a SPEC.md that defines what the system does, and an ARCHITECTURE.md that defines how it is structured. You use the same reset-and-revise loop, but what you are evaluating is the documents themselves, not code. You iterate until these documents are precise, consistent, and complete enough to serve as the source of truth for the entire project.
In the second phase, when you generate code, the LLM works under these documents. Every prompt references them. Every evaluation checks the output against them. The documents are the constitution; the code is legislation that must comply with it.
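In practice, “every prompt references them” can be as mechanical as prepending the documents to each task before generation. A hypothetical sketch, where the file names `SPEC.md` and `ARCHITECTURE.md` follow the article and `task.md` and its contents are placeholders:

```shell
# Assemble a phase-two prompt: the governing documents first, the task last,
# so every generation starts from the same source of truth.
set -e
cd "$(mktemp -d)"
printf 'The system stores notes as plain text files.\n' > SPEC.md
printf 'One module per storage backend.\n'              > ARCHITECTURE.md
printf 'Add a search command.\n'                        > task.md

cat SPEC.md ARCHITECTURE.md task.md > prompt.txt
wc -l prompt.txt
```

However it is wired up, the point is the ordering: the constitution is always in front of the model before the request is.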
These documents will evolve as new features are added — they are not frozen after the first phase. But they must always be held to the highest standard, because they are what keep the LLM from steering off course. Without them, each generation drifts further from your intent. With them, you have guardrails that compound in value over time.
This only works if you protect the documents. When you iterate on features or changes, the LLM will sometimes want to modify the spec or the architecture to accommodate its implementation. This is where you need to be most critical. Changes to these documents should either be made without an LLM — by you, deliberately — or subjected to a much higher degree of scrutiny than changes to code. A bad line of code the LLM can rewrite by refining a prompt. A bad line in your spec is a policy error that propagates into every future generation.
The spec drifts, the project drifts. Hold the model accountable to the documents, not the other way around.
The implication
Whether it is a one-off prompt or a SPEC.md that governs an entire project, the artifact that matters is the one you wrote — not the one the model generated. The skill is not “using an LLM.” It is writing precise specifications under constraints. That is the same skill it has always been. The tool changed. The job did not.