Foundations · May 25, 2026 · 10 min read

RAG vs fine-tuning vs prompt engineering: when to use which

Three techniques, three different jobs. A clear decision tree, side-by-side cost + timeline, and the honest answer for most companies starting out.

By Xwits Editorial · Reviewed by Deep Parmar, Founder · Last reviewed May 25, 2026

TL;DR

Three techniques, three different jobs. Prompt engineering shapes how the model behaves. RAG gives the model your data at query time. Fine-tuning permanently shifts the model's behaviour.
Default to prompt engineering first. Add RAG when knowledge matters. Fine-tune only after RAG has hit a wall.
Most production systems combine at least two of the three.
Cost ranges: prompt engineering is free; RAG is dollars per million queries; fine-tuning is hundreds-to-thousands per model + ongoing inference cost.

Quick answer

Should I use RAG, fine-tune, or prompt engineering?

Start with prompt engineering — it costs nothing and answers most questions. Add RAG when the AI needs to know specific facts from your data that change over time. Fine-tune only when prompting + RAG cannot achieve the quality, tone, or format you need, and you have enough labelled examples to train on. Most production systems use prompt engineering and RAG together; fine-tuning is the last resort.

Three techniques get conflated constantly. They do different jobs. Picking the wrong one wastes weeks. Below is the working frame we use at Xwits — and the decision tree we walk through with every customer.

What each technique does

Prompt engineering

You change how you ask the model. You provide examples in the prompt (few-shot), specify the output format, supply context inline, set role and tone. The model itself is unchanged.

What it changes: the model's behaviour for this specific prompt.
What it does not change: the model's underlying knowledge.
Cost: effectively free. Token cost per call.
Time: minutes to hours.

RAG (Retrieval-Augmented Generation)

You retrieve relevant pieces of your data at query time and paste them into the prompt as context. The model uses the retrieved data to ground its answer. See our deep-dive on what RAG is.

What it changes: the model's working knowledge for each query.
What it does not change: the model itself.
Cost: embedding (one-time per chunk), vector DB hosting, retrieval inference per query. Dollars per million queries at our scale.
Time: days to weeks for the first production version.

Fine-tuning

You continue training the model on your specific data. The trained model has new weights — it now answers differently. Examples include instruction tuning, LoRA adapters, full fine-tunes.

What it changes: the model's weights — its underlying patterns.
What it does not change: the model's general world knowledge.
Cost: training compute (hundreds to thousands of dollars per run), labelled training data preparation (the bulk of the cost), ongoing inference cost on the fine-tuned model.
Time: weeks to months for the first production fine-tune.

The decision tree

Can prompt engineering alone solve your problem?
Try a careful prompt with 3-5 examples. Run 20 test cases. If quality is acceptable: stop here. You are done.
Does the AI need to know specific facts from your data?
If yes (customer histories, product catalogues, internal documentation), add RAG. Most production AI lives here.
Does prompt + RAG produce output in the wrong tone, format, or style?
If you have ≥1,000 labelled examples of the desired output, consider fine-tuning. If you have fewer, try better prompts first.
Is the model too slow or expensive at the volume you need?
A smaller fine-tuned model can match a larger general model on a narrow task. Fine-tune for cost reduction.
Is the model failing because it does not understand a domain-specific concept?
If RAG cannot teach the concept (because it lives across many sources), fine-tuning can encode it in the weights.

The side-by-side

Dimension	Prompt engineering	RAG	Fine-tuning
Best for	Behaviour, format, tone for this query	Knowledge that changes	Tone, format, or style baked into the model permanently
Time to first version	Hours	Days-weeks	Weeks-months
Upfront cost	~$0	~$1-10k (vector DB, embedding, plumbing)	~$5-50k (data prep + training)
Ongoing cost	Token cost per call	Embedding + retrieval + token cost	Inference + retraining each time the model drifts
Handles changing data	No	Yes (update the vector DB)	No (must retrain)
Citation-friendly	No	Yes (sources are retrievable)	No (knowledge is baked in)
Privacy	Data sent in prompt	Data retrieved into prompt	Data baked into model weights

Common hybrid patterns

Prompt + RAG (the default)

Most production AI we build at Xwits uses this combination. A careful system prompt sets behaviour. RAG supplies the specific data. The two together cover ~80% of use cases without fine-tuning.

Prompt + RAG + Fine-tune (the high-volume specialist)

For a single narrow task at high volume (say, classifying 10,000 support tickets a day in a specific company's voice), fine-tune a smaller model on labelled examples + add RAG for changing knowledge + use prompts for per-request shaping. Cost per query drops 5-10× vs the equivalent frontier-model prompt.

Multi-model + RAG (the router)

Use a small fast model to classify the request. Route to a frontier model with RAG for hard cases, or a fine-tuned smaller model for known patterns. See our multi-model strategies.

Common mistakes

Fine-tuning when you should have improved the prompt

Teams jump to fine-tuning because it sounds sophisticated. Most quality problems are solved by adding examples in the prompt. Try the prompt fix for half a day before going down the training path.

RAG when prompt engineering would do

For knowledge that fits in 5,000 tokens and rarely changes (your pricing page, your standard terms), just paste it in the prompt. RAG infrastructure is overkill for small static contexts.

Fine-tuning a frontier model

Fine-tuning the largest frontier models (when even available) is rarely worth the cost. The pre-training already captured most patterns. Fine-tuning shines on smaller models that need narrow specialisation.

What this means for you

Default to prompt engineering. Most use cases stop here.
Add RAG when knowledge matters. Read our RAG deep-dive first.
Fine-tune only after prompts + RAG have hit a wall. Need ≥1,000 labelled examples and a clear quality ceiling RAG cannot break.
Build evals before any of these — measure baseline, measure each technique, only ship the winner.

Book a 30-minute call if you want a second pair of eyes on which technique fits your specific use case.

Talk to engineering→

Keep reading