    Foundations · May 25, 2026 · 10 min read

    RAG vs fine-tuning vs prompt engineering: when to use which

    Three techniques, three different jobs. A clear decision tree, side-by-side cost + timeline, and the honest answer for most companies starting out.

    TL;DR
    • Three techniques, three different jobs. Prompt engineering shapes how the model behaves. RAG gives the model your data at query time. Fine-tuning permanently shifts the model's behaviour.
    • Default to prompt engineering first. Add RAG when knowledge matters. Fine-tune only after RAG has hit a wall.
    • Most production systems combine at least two of the three.
    • Cost ranges: prompt engineering has no upfront cost, only token cost per call; RAG runs to dollars per million queries; fine-tuning costs hundreds to thousands of dollars in training compute per run, plus data preparation and ongoing inference.
    Quick answer
    Should I use RAG, fine-tune, or prompt engineering?
    Start with prompt engineering: it has no upfront cost and is enough for most use cases. Add RAG when the AI needs to know specific facts from your data that change over time. Fine-tune only when prompting + RAG cannot achieve the quality, tone, or format you need, and you have enough labelled examples to train on. Most production systems use prompt engineering and RAG together; fine-tuning is the last resort.

    Three techniques get conflated constantly. They do different jobs. Picking the wrong one wastes weeks. Below is the working frame we use at Xwits — and the decision tree we walk through with every customer.

    What each technique does

    Prompt engineering

    You change how you ask the model. You provide examples in the prompt (few-shot), specify the output format, supply context inline, set role and tone. The model itself is unchanged.

    • What it changes: the model's behaviour for this specific prompt.
    • What it does not change: the model's underlying knowledge.
    • Cost: no upfront cost. You pay only the token cost per call.
    • Time: minutes to hours.
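
    As a rough illustration of the pieces listed above (role, output format, few-shot examples, inline input), here is a minimal sketch of a prompt builder. The `Example` class, the `build_few_shot_prompt` function, and the support-triage labels are hypothetical names chosen for this example, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example shown to the model inside the prompt."""
    text: str
    label: str


def build_few_shot_prompt(role: str, output_format: str,
                          examples: list[Example], query: str) -> str:
    """Assemble role, format instruction, few-shot examples, and the new input."""
    lines = [role, f"Respond with {output_format}.", ""]
    for ex in examples:
        lines.append(f"Input: {ex.text}")
        lines.append(f"Output: {ex.label}")
        lines.append("")
    # The prompt ends on an open "Output:" so the model completes the pattern.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    role="You are a support triage assistant.",
    output_format="exactly one word: billing, bug, or other",
    examples=[
        Example("I was charged twice this month.", "billing"),
        Example("The export button crashes the app.", "bug"),
        Example("Do you have an office in Berlin?", "other"),
    ],
    query="My invoice shows the wrong VAT number.",
)
print(prompt)
```

    The model is untouched here: everything that shapes the answer lives in the string sent with the request.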

    RAG (Retrieval-Augmented Generation)

    You retrieve relevant pieces of your data at query time and paste them into the prompt as context. The model uses the retrieved data to ground its answer. See our deep-dive on what RAG is.

    • What it changes: the model's working knowledge for each query.
    • What it does not change: the model itself.
    • Cost: embedding (one-time per chunk), vector DB hosting, and retrieval plus inference per query. Dollars per million queries at our scale.
    • Time: days to weeks for the first production version.
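
    To make the retrieve-then-paste flow concrete, here is a toy sketch. It assumes a word-count vector as a stand-in for a real embedding model and a plain list as a stand-in for a vector DB; the function names and sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_rag_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Paste the retrieved chunks into the prompt as numbered context."""
    hits = retrieve(query, chunks, k)
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(hits))
    return (
        "Answer using only the context below and cite the source number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


chunks = [
    "Refunds are issued within 14 days of a return request.",
    "The Pro plan includes priority support and SSO.",
    "Our office is closed on public holidays.",
]
print(build_rag_prompt("How many days do refunds take?", chunks, k=1))
```

    Updating what the model knows means editing `chunks`, not the model. A production version would swap `embed` for a real embedding model and the list for a vector index.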

    Fine-tuning

    You continue training the model on your specific data. The trained model has new weights — it now answers differently. Examples include instruction tuning, LoRA adapters, full fine-tunes.

    • What it changes: the model's weights — its underlying patterns.
    • What it does not change: the model's general world knowledge.
    • Cost: training compute (hundreds to thousands of dollars per run), labelled training data preparation (the bulk of the cost), ongoing inference cost on the fine-tuned model.
    • Time: weeks to months for the first production fine-tune.
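
    One reason LoRA adapters are cheaper than full fine-tunes is the number of weights being trained. A LoRA adapter leaves a weight matrix frozen and trains two low-rank factors next to it, so the trainable count per matrix is rank × (inputs + outputs) instead of inputs × outputs. The sketch below does that arithmetic; the 4096 × 4096 layer size and rank 8 are illustrative assumptions, not figures for any particular model.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA adapter: factors of shape
    (d_out x rank) and (rank x d_in) added beside the frozen matrix."""
    return rank * (d_in + d_out)


# Illustrative layer size and rank (assumptions for this example only).
d_in, d_out, rank = 4096, 4096, 8
full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  share: {lora / full:.2%}")
```

    Under these assumed sizes the adapter trains well under 1% of the weights in that layer, which is where most of the training-compute saving comes from. Data preparation cost is unaffected.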

    The decision tree

    1. Can prompt engineering alone solve your problem?
      Try a careful prompt with 3-5 examples. Run 20 test cases. If quality is acceptable: stop here. You are done.
    2. Does the AI need to know specific facts from your data?
      If yes (customer histories, product catalogues, internal documentation), add RAG. Most production AI lives here.
    3. Does prompt + RAG produce output in the wrong tone, format, or style?
      If you have ≥1,000 labelled examples of the desired output, consider fine-tuning. If you have fewer, try better prompts first.
    4. Is the model too slow or expensive at the volume you need?
      A smaller fine-tuned model can match a larger general model on a narrow task. Fine-tune for cost reduction.
    5. Is the model failing because it does not understand a domain-specific concept?
      If RAG cannot teach the concept (because it lives across many sources), fine-tuning can encode it in the weights.
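
    The five questions above can be written down as a small function, which is handy when you want to apply the same checklist to many candidate use cases. This is a sketch of one reading of the tree, not a definitive rule set; the `UseCase` fields and `recommend` function are hypothetical names, and only the 1,000-example threshold comes from the text.

```python
from dataclasses import dataclass

MIN_LABELLED_EXAMPLES = 1000  # threshold quoted in step 3


@dataclass
class UseCase:
    prompt_alone_sufficient: bool = False      # step 1
    needs_facts_from_your_data: bool = False   # step 2
    wrong_tone_format_or_style: bool = False   # step 3
    labelled_examples: int = 0                 # step 3
    too_slow_or_expensive: bool = False        # step 4
    concept_rag_cannot_teach: bool = False     # step 5


def recommend(uc: UseCase) -> list[str]:
    """Walk the decision tree and return the techniques to combine."""
    plan = ["prompt engineering"]
    if uc.prompt_alone_sufficient:
        return plan  # step 1: stop here
    if uc.needs_facts_from_your_data:
        plan.append("RAG")  # step 2
    enough_data = uc.labelled_examples >= MIN_LABELLED_EXAMPLES
    fine_tune = (
        (uc.wrong_tone_format_or_style and enough_data)  # step 3
        or uc.too_slow_or_expensive                      # step 4
        or uc.concept_rag_cannot_teach                   # step 5
    )
    if fine_tune:
        plan.append("fine-tuning")
    elif uc.wrong_tone_format_or_style:
        plan.append("better prompts (too few labelled examples to fine-tune)")
    return plan


print(recommend(UseCase(needs_facts_from_your_data=True)))
print(recommend(UseCase(needs_facts_from_your_data=True,
                        wrong_tone_format_or_style=True,
                        labelled_examples=1500)))
```

    Prompt engineering is always the first entry because every later technique still needs a prompt around it.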

    The side-by-side

    | Dimension | Prompt engineering | RAG | Fine-tuning |
    |---|---|---|---|
    | Best for | Behaviour, format, tone for this query | Knowledge that changes | Tone, format, or style baked into the model permanently |
    | Time to first version | Hours | Days-weeks | Weeks-months |
    | Upfront cost | ~$0 | ~$1-10k (vector DB, embedding, plumbing) | ~$5-50k (data prep + training) |
    | Ongoing cost | Token cost per call | Embedding + retrieval + token cost | Inference + retraining each time the model drifts |
    | Handles changing data | No | Yes (update the vector DB) | No (must retrain) |
    | Citation-friendly | No | Yes (sources are retrievable) | No (knowledge is baked in) |
    | Privacy | Data sent in prompt | Data retrieved into prompt | Data baked into model weights |

    Common hybrid patterns

    Prompt + RAG (the default)

    Most production AI we build at Xwits uses this combination. A careful system prompt sets behaviour. RAG supplies the specific data. The two together cover ~80% of use cases without fine-tuning.

    Prompt + RAG + Fine-tune (the high-volume specialist)

    For a single narrow task at high volume (say, classifying 10,000 support tickets a day in a specific company's voice), fine-tune a smaller model on labelled examples + add RAG for changing knowledge + use prompts for per-request shaping. Cost per query drops 5-10× vs the equivalent frontier-model prompt.
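
    A per-query saving only matters once it has paid back the upfront fine-tuning cost, so it is worth doing the break-even arithmetic before committing. The sketch below uses the 10,000 tickets a day and the 5× reduction from the paragraph above, and the low end of the upfront range from the side-by-side table. The $0.01 frontier cost per query is a placeholder assumption, not a measured or quoted price; substitute your own numbers.

```python
def daily_cost(queries_per_day: int, cost_per_query: float) -> float:
    """Spend per day at a given volume and per-query cost."""
    return queries_per_day * cost_per_query


def break_even_days(upfront_cost: float, queries_per_day: int,
                    frontier_cost_per_query: float,
                    reduction_factor: float) -> float:
    """Days until the per-query saving repays the upfront fine-tuning cost."""
    frontier = daily_cost(queries_per_day, frontier_cost_per_query)
    tuned = frontier / reduction_factor
    saving = frontier - tuned
    if saving <= 0:
        raise ValueError("fine-tuned model is not cheaper; no break-even point")
    return upfront_cost / saving


days = break_even_days(
    upfront_cost=5_000,            # low end of the table's upfront range
    queries_per_day=10_000,        # volume from the example above
    frontier_cost_per_query=0.01,  # placeholder assumption, not a real price
    reduction_factor=5,            # low end of the 5-10x range
)
print(f"break-even after about {days:.0f} days")
```

    With these placeholder inputs the fine-tune pays for itself in roughly two months. At a tenth of the volume it would take ten times as long, which is why this pattern is reserved for high-volume tasks.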

    Multi-model + RAG (the router)

    Use a small fast model to classify the request. Route to a frontier model with RAG for hard cases, or a fine-tuned smaller model for known patterns. See our multi-model strategies.
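
    A minimal sketch of the routing step, assuming keyword rules as a stand-in for the small classifier model. The pattern labels and backend names are hypothetical; in practice `classify` would call the small model and each backend name would map to a real client.

```python
from typing import Callable

# Hypothetical labels for request types the fine-tuned model handles well.
KNOWN_PATTERNS = {"password_reset", "invoice_copy"}


def classify(request: str) -> str:
    """Stand-in for the small fast classifier: simple keyword rules."""
    text = request.lower()
    if "password" in text:
        return "password_reset"
    if "invoice" in text:
        return "invoice_copy"
    return "unknown"


def route(request: str, classifier: Callable[[str], str] = classify) -> str:
    """Send known patterns to the small fine-tuned model, the rest to the
    frontier model with RAG."""
    label = classifier(request)
    if label in KNOWN_PATTERNS:
        return "fine_tuned_small_model"
    return "frontier_model_with_rag"


for req in ["I forgot my password", "Why did my integration stop syncing?"]:
    print(req, "->", route(req))
```

    Defaulting unknown requests to the frontier model keeps a classifier miss from degrading quality; it only costs more for that request.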

    Common mistakes

    Fine-tuning when you should have improved the prompt

    Teams jump to fine-tuning because it sounds sophisticated. Most quality problems are solved by adding examples in the prompt. Try the prompt fix for half a day before going down the training path.

    RAG when prompt engineering would do

    For knowledge that fits in 5,000 tokens and rarely changes (your pricing page, your standard terms), just paste it in the prompt. RAG infrastructure is overkill for small static contexts.
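
    A quick check along these lines can settle the question before any infrastructure is built. The sketch assumes roughly four characters per token, which is a coarse heuristic for English text; use your model's tokenizer for a real count. The function names are made up for this example, and the 5,000-token budget is the figure from the paragraph above.

```python
import math

INLINE_TOKEN_BUDGET = 5_000  # figure from the text
CHARS_PER_TOKEN = 4          # coarse heuristic; an assumption, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def should_inline(text: str, changes_often: bool) -> bool:
    """True when the knowledge is small and static enough to paste in the prompt."""
    return not changes_often and estimate_tokens(text) <= INLINE_TOKEN_BUDGET


pricing_page = "Starter plan: 10 seats. Pro plan: 50 seats. " * 100
print(estimate_tokens(pricing_page), should_inline(pricing_page, changes_often=False))
```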

    Fine-tuning a frontier model

    Fine-tuning the largest frontier models (when it is even available) is rarely worth the cost. Pre-training has already captured most patterns. Fine-tuning shines on smaller models that need narrow specialisation.

    What this means for you

    • Default to prompt engineering. Most use cases stop here.
    • Add RAG when knowledge matters. Read our RAG deep-dive first.
    • Fine-tune only after prompts + RAG have hit a wall. Need ≥1,000 labelled examples and a clear quality ceiling RAG cannot break.
    • Build evals before any of these — measure baseline, measure each technique, only ship the winner.
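
    The last point is the easiest to skip, so here is a minimal sketch of what "measure baseline, measure each technique" can look like. It assumes exact-match scoring on a classification task; the test cases and the two stub systems are invented for illustration, and a real harness would call your actual prompt, RAG, or fine-tuned pipelines.

```python
from typing import Callable

TestCase = tuple[str, str]  # (input, expected output)


def accuracy(system: Callable[[str], str], cases: list[TestCase]) -> float:
    """Share of test cases where the system's answer matches exactly."""
    correct = sum(
        1 for question, expected in cases
        if system(question).strip().lower() == expected.lower()
    )
    return correct / len(cases)


def pick_winner(systems: dict[str, Callable[[str], str]],
                cases: list[TestCase]) -> tuple[str, dict[str, float]]:
    """Score every candidate on the same cases and return the best one."""
    scores = {name: accuracy(fn, cases) for name, fn in systems.items()}
    return max(scores, key=scores.get), scores


cases = [
    ("I was charged twice", "billing"),
    ("App crashes on export", "bug"),
    ("Where is your office?", "other"),
    ("Refund my last invoice", "billing"),
]


def baseline(question: str) -> str:
    """Stub baseline: always answers 'other'."""
    return "other"


def candidate(question: str) -> str:
    """Stub candidate standing in for an improved prompt."""
    text = question.lower()
    if "charged" in text or "invoice" in text:
        return "billing"
    if "crash" in text:
        return "bug"
    return "other"


winner, scores = pick_winner({"baseline": baseline, "candidate": candidate}, cases)
print(winner, scores)
```

    Keeping the test cases fixed across techniques is the point: a change only ships if it beats the baseline on the same inputs.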

    Book a 30-minute call if you want a second pair of eyes on which technique fits your specific use case.
