3.1free~5 min

How many examples, in what order

Artifact: your prompt's example count and ordering, decided on purpose

1. How many

A starting heuristic, then the nuance.

  • Zero examples works when the task is well-known and the format is well-known. Write a sort function in Python.
  • One example works when there's a clear pattern and you need the model to match it. Translate to Spanish. Example: "hello" → "hola".
  • Two to five examples is the sweet spot for almost everything that's style-shaped or judgment-shaped. Commit messages. PR titles. Error messages. Test names.
  • More than five examples is usually noise. The model overweights surface patterns from the examples (lowercase first letter? trailing period?) and loses the underlying intent.

The heuristic: just enough to make the pattern legible, not enough to make the pattern over-fit.

2. Order matters

Three things to know.

One. The last example tends to weigh slightly more. The model gets a strong this is the kind of output to produce signal from what came right before its turn. Put your strongest, most representative example last.

Two. Diverse examples beat similar ones. Three examples that all hit the same point teach the model less than three examples that hit slightly different points. Easy case, medium case, edge case beats three easy cases.

Three. Negative examples — do this, not this — are powerful but require labels. If you include an example you don't want copied, label it as bad explicitly. Otherwise the model treats every example as a target.

3. Same task, two example sets

A prompt for naming tests.

textVibe — three similar examples
Generate test names like:

test_returns_404_when_user_not_found
test_returns_404_when_post_not_found
test_returns_404_when_comment_not_found

[function signature]

The model will mostly produce more returns_404_when_X_not_found tests, because that's the dominant surface pattern. You'll miss the other branches.

textSpec — three diverse examples, strongest last
Generate test names in the style of these three (each captures a different shape of test):

test_returns_404_when_user_not_found            (missing entity)
test_strips_whitespace_from_email_on_signup     (transformation)
test_rate_limits_after_3_failed_logins          (security / edge case)

[function signature]

Now the model has three shapes — missing entity, transformation, edge case — and the labels in parentheses are explicit cues. The generated names will cover more of the actual test surface.

4. Try it