How many examples, in what order

1. How many

A starting heuristic, then the nuance.

Zero examples works when the task is well-known and the format is well-known. Write a sort function in Python.
One example works when there's a clear pattern and you need the model to match it. Translate to Spanish. Example: "hello" → "hola".
Two to five examples is the sweet spot for almost everything that's style-shaped or judgment-shaped. Commit messages. PR titles. Error messages. Test names.
More than five examples is usually noise. The model overweights surface patterns from the examples (lowercase first letter? trailing period?) and loses the underlying intent.

The heuristic: just enough to make the pattern legible, not enough to make the pattern over-fit.

2. Order matters

Three things to know.

One. The last example tends to weigh slightly more. The model gets a strong this is the kind of output to produce signal from what came right before its turn. Put your strongest, most representative example last.

Two. Diverse examples beat similar ones. Three examples that all hit the same point teach the model less than three examples that hit slightly different points. Easy case, medium case, edge case beats three easy cases.

Three. Negative examples — do this, not this — are powerful but require labels. If you include an example you don't want copied, label it as bad explicitly. Otherwise the model treats every example as a target.

3. Same task, two example sets

A prompt for naming tests.

textVibe — three similar examples

Generate test names like:

test_returns_404_when_user_not_found
test_returns_404_when_post_not_found
test_returns_404_when_comment_not_found

[function signature]

The model will mostly produce more returns_404_when_X_not_found tests, because that's the dominant surface pattern. You'll miss the other branches.

textSpec — three diverse examples, strongest last

Generate test names in the style of these three (each captures a different shape of test):

test_returns_404_when_user_not_found            (missing entity)
test_strips_whitespace_from_email_on_signup     (transformation)
test_rate_limits_after_3_failed_logins          (security / edge case)

[function signature]

Now the model has three shapes — missing entity, transformation, edge case — and the labels in parentheses are explicit cues. The generated names will cover more of the actual test surface.

4. Try it

Pick a prompt where you give examples. How many do you give? Could you cut one? Could you make one of them more different from the others?