When examples make it worse

Artifact: one example you'd remove from a current prompt because it's misleading

1. Examples have a cost

Examples aren't always net positive. There are three ways they can make the output worse, not better.

One: cherry-picked examples. If your examples don't represent the actual range of your real data, the model overfits to the slice you showed. You ask for code review comments and show three examples — all about naming. The model thinks you only care about naming and ignores the bigger bugs.

Two: surface patterns dominate. The model latches onto incidental features of your examples (capitalization, length, punctuation) and reproduces those instead of the rule. If all your examples start with a capital letter, the next output probably will too, even when it shouldn't.

Three: over-specification freezes the model. Too many examples narrow the answer space until the model produces only variations on what you showed. New shapes that would be correct stop appearing.
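
The second failure mode, shared surface patterns, can be checked mechanically before a prompt goes out. Here is a minimal Python sketch, assuming a small hand-picked set of traits. The helper name `shared_surface_features` and the trait list are illustrative, not from any library, and the sample comments are shortened stand-ins.

```python
def shared_surface_features(examples):
    """Return the names of surface traits that every example shares.

    The traits below are illustrative assumptions; swap in whatever
    incidental features matter for your own prompt.
    """
    traits = {
        "starts_with_capital": lambda s: s[:1].isupper(),
        "ends_with_period": lambda s: s.rstrip().endswith("."),
        "single_sentence": lambda s: sum(s.count(c) for c in ".!?") <= 1,
        "under_12_words": lambda s: len(s.split()) < 12,
    }
    shared = [name for name, check in traits.items()
              if all(check(e) for e in examples)]
    # A shared first word is a strong hint that the examples are one template.
    first_words = {e.split()[0].lower() for e in examples if e.split()}
    if len(first_words) == 1:
        shared.append("same_first_word")
    return shared


examples = [
    "Rename 'data' to 'parsedPayload'.",
    "Rename 'flag' to 'isInitialized'.",
    "Rename 'tmp' to 'retryCount'.",
]
print(shared_surface_features(examples))
```

Any trait this reports is one the model may copy as if it were the rule. If the trait is not something you actually want, vary it across the examples.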

2. The cherry-pick failure

A realistic case: a code review prompt with three examples, all of them about naming.

Vibe — cherry-picked

Review the function. Style of comments to use:

> Rename 'data' to 'parsedPayload' — 'data' is too generic.
> Rename 'do()' to 'recalculateTotals()' — function names should describe behavior.
> Rename 'flag' to 'isInitialized' — booleans should read as questions.

[function with a race condition and a naming issue]

The output will likely be a beautifully named function with the race condition intact. The model learned that comments are about names. It missed the race condition because nothing in the examples said "also look for bugs."

Spec — diverse examples

Review the function. Examples of the kinds of comments to make:

> Rename 'flag' to 'isInitialized' — booleans should read as questions.  (naming)
> Missing 'await' on line 42 — the function returns a Promise that is never awaited.  (correctness)
> Two threads can both pass the 'if (cache.has(key))' check before either sets it — race condition.  (concurrency)

[function with a race condition and a naming issue]

Now the model sees three categories — naming, correctness, concurrency — and applies all three lenses to the new code.
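
One way to catch a cherry-picked set before sending the prompt is to label each example with its category and count the labels. Below is a minimal Python sketch. The `category_coverage` helper is hypothetical, and it assumes you have written down which categories you expect the review to cover.

```python
from collections import Counter


def category_coverage(labeled_examples, expected_categories):
    """Count examples per category and list expected categories with none."""
    counts = Counter(category for _, category in labeled_examples)
    missing = sorted(set(expected_categories) - set(counts))
    return counts, missing


# The cherry-picked set from above: every example is a naming comment.
cherry_picked = [
    ("Rename 'data' to 'parsedPayload'", "naming"),
    ("Rename 'do()' to 'recalculateTotals()'", "naming"),
    ("Rename 'flag' to 'isInitialized'", "naming"),
]
# Assumption: these are the lenses the author wants applied.
expected = ["naming", "correctness", "concurrency"]

counts, missing = category_coverage(cherry_picked, expected)
print(dict(counts))  # {'naming': 3}
print(missing)       # ['concurrency', 'correctness']
```

A non-empty `missing` list means the examples are silent on something you care about, which is the gap the diverse version closes.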

3. When to remove examples

Signs you have too many or the wrong ones:

The output is repetitive and mirrors the shape of the examples.
The model is making the same surface mistake your examples all share (capitalization, length, etc.).
New, correct shapes don't appear in the output — only variations on what you showed.

The fix is rarely to add more examples. It is usually to remove or replace the misleading ones. Two diverse examples often beat five similar ones.
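
The first sign, output that mirrors the examples, can be roughly quantified by comparing each output to its closest example. This is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. The `closest_example` helper is hypothetical, the sample strings are illustrative, and character-level similarity is only a crude proxy for "same shape".

```python
from difflib import SequenceMatcher


def closest_example(output, examples):
    """Return (similarity, example) for the example most similar to output.

    Similarity is SequenceMatcher's ratio in [0, 1]; 1.0 means identical
    text after lowercasing.
    """
    scored = [
        (SequenceMatcher(None, output.lower(), e.lower()).ratio(), e)
        for e in examples
    ]
    return max(scored)


examples = [
    "Rename 'data' to 'parsedPayload': 'data' is too generic.",
    "Rename 'flag' to 'isInitialized': booleans should read as questions.",
]
near_copy = "Rename 'tmp' to 'retryCount': 'tmp' is too generic."
new_shape = "Two requests can both pass the cache check before either writes."

for text in (near_copy, new_shape):
    score, example = closest_example(text, examples)
    print(f"{score:.2f}  {text}")
```

If nearly every output scores high against some example, the examples are probably steering shape more than content, and one of them is a candidate for removal.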

4. Try it

Look at a prompt where you give examples. What surface pattern do your examples share that isn't really the rule? What happens if you swap one out for something different?
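
To run the swap experiment systematically, you can generate one prompt variant per example slot and compare the outputs by hand. Here is a minimal Python sketch. `swap_variants` and `build_prompt` are hypothetical helpers, and the call to the model is deliberately left out because it depends on your setup.

```python
def swap_variants(examples, replacement):
    """Yield (index, new_examples) with exactly one example swapped out."""
    for i in range(len(examples)):
        yield i, examples[:i] + [replacement] + examples[i + 1:]


def build_prompt(instruction, examples):
    """Format the instruction followed by '> '-quoted example lines."""
    return "\n".join([instruction, ""] + [f"> {e}" for e in examples])


examples = [
    "Rename 'data' to 'parsedPayload'.",
    "Rename 'do()' to 'recalculateTotals()'.",
    "Rename 'flag' to 'isInitialized'.",
]
replacement = "Two threads can both pass the cache check: race condition."

for index, variant in swap_variants(examples, replacement):
    print(f"--- variant {index}: example {index} swapped ---")
    print(build_prompt("Review the function. Style of comments to use:", variant))
```

Send each variant to the model with the same input and see which swap changes the output most. That example was carrying the most weight, for better or worse.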