wits
    Foundations · May 26, 2026 · Updated May 25, 2026 · 9 min read

    The economics of AI agents: cost per task in 2026

    Why an AI agent that costs 5 cents per task can be a great deal — or a disaster. The math of token spend, retries, latency, and the human-fallback budget.

    TL;DR
    • Per-task cost of an AI agent is the wrong headline number. The all-in cost includes retries, human fallback, latency cost, and review time.
    • A "5-cent" AI task often costs $1-3 once you include everything. Sometimes that is still a great deal. Sometimes it is not.
    • The economics turn on three ratios: cost-per-success, human-fallback rate, and value-of-a-success.
    • What looks expensive (Claude Opus, GPT-5 Pro) is often cheap; what looks cheap (a tiny model with many retries) is often expensive.
    Quick answer
    How much does an AI agent cost per task in 2026?
    Raw token cost for a single AI agent task in 2026 typically runs 0.5-15 cents depending on model and complexity. But the real cost per completed task, including retries on failures, human review of uncertain outputs, infrastructure, and the time of the operator who maintains the agent, is usually 5-50× the raw token cost. The right question is not "what does each call cost" but "what does each successfully completed task cost, and how does that compare to doing the task without AI?"

    Founders ask us "how expensive is AI?" — and the honest answer is "expensive in ways the API price hides." Below is the math we use when sizing an AI workflow.

    The three layers of cost

    Layer 1: Direct token cost

    The visible number on the API bill. In mid-2026, rough ranges for a single moderately complex task:

    • Small model (Haiku, GPT-4.1 mini, Gemini Flash): 0.1-1 cent per task.
    • Mid model (Sonnet, GPT-5, Gemini Pro): 1-5 cents per task.
    • Large reasoning model (Opus, o-series, Gemini Ultra): 5-50 cents per task.

    Multiply by the number of LLM calls per workflow; most agentic workflows make 3-15.
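
    The multiplication above can be sketched as a small Python helper. The tier names and the `workflow_token_cost_range` function are hypothetical, and treating the listed ranges as per-call figures is an assumption of this sketch, not something the API bill tells you directly.

```python
# Per-call cost ranges in cents, copied from the tiers listed above.
# Treating them as per-call figures is an assumption of this sketch.
TIER_CENTS_PER_CALL = {
    "small": (0.1, 1.0),
    "mid": (1.0, 5.0),
    "large": (5.0, 50.0),
}


def workflow_token_cost_range(
    tier: str, min_calls: int = 3, max_calls: int = 15
) -> tuple[float, float]:
    """Return the (low, high) direct token cost in cents for one workflow run."""
    if not 1 <= min_calls <= max_calls:
        raise ValueError("need 1 <= min_calls <= max_calls")
    low, high = TIER_CENTS_PER_CALL[tier]
    # Cheapest case: low per-call cost with the fewest calls.
    # Most expensive case: high per-call cost with the most calls.
    return low * min_calls, high * max_calls


low, high = workflow_token_cost_range("mid")
print(f"mid model, 3-15 calls: {low:.1f} to {high:.1f} cents per workflow")
```

    The spread between the low and high end is wide, which is why counting the calls in your own workflow matters more than the per-call price.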

    Layer 2: Operational cost

    What the API bill does not show:

    • Retries on failure. If the agent fails 20% of the time and retries until it succeeds, it averages 1.25 attempts per task (1 / 0.8), so your effective cost is 1.25× the visible cost.
    • Vector DB queries, embedding costs, search API calls. RAG isn't free. Embedding 100k documents is a one-time cost; retrieving against them is charged per query.
    • Inference infrastructure. If you self-host, GPU time. If you use a managed API, this is rolled into token cost.
    • Observability + logging. Storing every prompt + response for audit is gigabytes per month.

    Operational cost typically adds 30-100% on top of raw token cost.
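
    Here is a minimal Python sketch of the retry multiplier and the overhead range. It assumes each attempt fails independently with the same probability; the function names and the optional `max_attempts` cap are illustrative, not part of any library.

```python
from typing import Optional


def expected_attempts(failure_rate: float, max_attempts: Optional[int] = None) -> float:
    """Mean number of attempts per task when each attempt fails independently.

    With no cap this is the geometric-series result 1 / (1 - p).
    With a cap of k attempts it is (1 - p**k) / (1 - p).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    if max_attempts is None:
        return 1.0 / (1.0 - failure_rate)
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return (1.0 - failure_rate ** max_attempts) / (1.0 - failure_rate)


def with_ops_overhead(token_cost_cents: float, overhead: float) -> float:
    """Token cost plus operational overhead, where overhead=0.3 means +30%."""
    return token_cost_cents * (1.0 + overhead)


print(f"retry until success at 20% failure: {expected_attempts(0.20):.2f}x")
print(f"at most one retry at 20% failure:   {expected_attempts(0.20, max_attempts=2):.2f}x")
print(f"5-cent task with 30-100% overhead:  "
      f"{with_ops_overhead(5.0, 0.30):.1f} to {with_ops_overhead(5.0, 1.00):.1f} cents")
```

    The two functions are kept separate on purpose: retries are one of the items inside the 30-100% overhead figure, so multiplying both together would count them twice.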

    Layer 3: Human cost

    The biggest cost most teams forget:

    • Human-in-the-loop review. If 15% of agent outputs need a human reviewer and each review takes 90 seconds, that averages 13.5 seconds of human time per task, about 11 cents at $30/hr loaded.
    • Escalation handling. Suppose the agent escalates 5% of cases to a human. Those cases cost what they would have cost without the agent, plus the agent's wasted effort.
    • Maintenance. Someone is the AI agent operator. They tune prompts, manage evals, watch for regressions. See the AI agent operator role.
    • Edge case investigation. Production AI produces surprising outputs. Each surprise costs investigation time.

    Human cost typically dwarfs token cost. A 5-cent token task that needs 30 seconds of human review at $30/hr loaded is really a 30-cent task.
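
    The review arithmetic can be checked with a few lines of Python. This is a rough sketch: the function name is hypothetical, and it only models review time, not escalation, maintenance, or investigation.

```python
def human_review_cost_cents(
    review_rate: float, review_seconds: float, hourly_rate_usd: float
) -> float:
    """Average human review cost per task, in cents.

    review_rate is the fraction of tasks a human looks at (0.0 to 1.0).
    """
    if not 0.0 <= review_rate <= 1.0:
        raise ValueError("review_rate must be between 0 and 1")
    cents_per_second = hourly_rate_usd * 100.0 / 3600.0
    return review_rate * review_seconds * cents_per_second


# Every task reviewed for 30 seconds at $30/hr, on top of 5 cents of tokens.
token_cents = 5.0
review_cents = human_review_cost_cents(1.0, 30, 30.0)
print(f"tokens {token_cents:.2f} + review {review_cents:.2f} "
      f"= {token_cents + review_cents:.2f} cents per task")

# Only 15% of tasks reviewed, but for 90 seconds each.
print(f"sampled review: {human_review_cost_cents(0.15, 90, 30.0):.2f} cents per task")
```

    Even sampled review at 15% costs more per task here than the tokens do, which is the point of treating human time as its own layer.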

    The three ratios that decide if it works

    Ratio 1: Cost-per-success

    Sum the three layers. Divide by the number of tasks that actually completed correctly, not the number attempted.

    Formula: (token_cost + ops_cost + human_cost) / successful_completions.

    This is your real per-task cost. Compare it to the cost of doing the task without AI.
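
    As a sketch, the formula translates directly into Python. The `CostLayers` class is a hypothetical container, and the monthly figures in the example are invented purely to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class CostLayers:
    """One period's spend across the three layers, all in the same currency."""

    token_cost: float
    ops_cost: float
    human_cost: float
    successful_completions: int

    def cost_per_success(self) -> float:
        """(token_cost + ops_cost + human_cost) / successful_completions."""
        if self.successful_completions <= 0:
            raise ValueError("no successful completions; cost-per-success is undefined")
        total = self.token_cost + self.ops_cost + self.human_cost
        return total / self.successful_completions


# Invented example: $50 tokens, $25 ops, $180 human time, 850 successes.
month = CostLayers(token_cost=50.0, ops_cost=25.0, human_cost=180.0,
                   successful_completions=850)
print(f"cost per success: ${month.cost_per_success():.2f}")
```

    Dividing by successes rather than attempts is what makes a low success rate show up as a higher number instead of hiding in the average.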

    Ratio 2: Human-fallback rate

    What fraction of tasks ended up needing a human? If that fraction is 80%, you have a copilot that mildly accelerates humans — not an agent. If it is 5%, you have automation.

    Both can be good products. The economics are different. A copilot saves time per task; an agent removes tasks from the human queue.

    Ratio 3: Value-of-a-success

    How much is one correctly completed task worth?

    • A customer support ticket resolved without human intervention: $5-20 in saved agent time.
    • A B2B lead correctly qualified: $50-500 depending on deal size.
    • A correctly drafted contract: $200-2,000 depending on what a lawyer would charge.
    • A correctly filed GST return: $30-100 in CA time.

    The agent is economic when (cost-per-success) < (value-of-a-success) × (1 - human-fallback-rate).
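
    The inequality can be written as a one-line check. This is a minimal sketch with a hypothetical function name; the example inputs borrow the $5 low end of the support-ticket range and are otherwise made up.

```python
def is_economic(
    cost_per_success: float, value_of_success: float, human_fallback_rate: float
) -> bool:
    """True when cost-per-success < value-of-a-success * (1 - human-fallback-rate)."""
    if not 0.0 <= human_fallback_rate <= 1.0:
        raise ValueError("human_fallback_rate must be between 0 and 1")
    return cost_per_success < value_of_success * (1.0 - human_fallback_rate)


# A support ticket worth $5, costing $0.30 per success, 15% fallback.
print(is_economic(0.30, 5.0, 0.15))

# Same ticket value, but $3.00 per success and 80% fallback.
print(is_economic(3.00, 5.0, 0.80))
```

    In the second case the discounted value is only $1.00, so a $3.00 cost-per-success fails the test even though it is below the $5 headline value.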

    Why "expensive" models are often cheaper

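    A large reasoning model that succeeds 95% of the time at 10 cents per call usually beats a small model that succeeds 60% of the time at 1 cent per call, once you count what failures cost.

    The expensive model's cost-per-success is about 10.5 cents (10 / 0.95). On tokens alone the cheap model looks better: it needs about 1.7 attempts on average to succeed (1 / 0.60), or roughly 1.7 cents per success. But each failed attempt either triggers a retry or escalates to a human, and with a 40% failure rate that human time dominates. Once it is counted, the cheap model's true cost is often 25-50 cents per success.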

    The right model for the task is the one with the best cost-per-success, not the lowest sticker price.
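
    To make the comparison concrete, here is a rough Python sketch. It assumes a simplified model that is not from any benchmark: each task gets one attempt, and every failure escalates to a human at a fixed cost. The escalation costs of 0, 25, and 100 cents are illustrative values only.

```python
def cost_per_success_cents(
    call_cost_cents: float, success_rate: float, escalation_cost_cents: float
) -> float:
    """Expected cost per agent success under a one-attempt, escalate-on-failure model."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    # Every task pays for the call; failed tasks also pay for a human.
    expected_cost_per_task = call_cost_cents + (1.0 - success_rate) * escalation_cost_cents
    return expected_cost_per_task / success_rate


for escalation in (0.0, 25.0, 100.0):  # assumed human cost per escalated task
    small = cost_per_success_cents(1.0, 0.60, escalation)
    large = cost_per_success_cents(10.0, 0.95, escalation)
    winner = "small" if small < large else "large"
    print(f"escalation {escalation:>5.0f}c: small {small:5.1f}c, "
          f"large {large:5.1f}c -> {winner} model is cheaper")
```

    With no escalation cost the small model wins on tokens alone. Under these assumptions the large model pulls ahead once a human handoff costs more than a few tens of cents, so the cost of a failure decides the comparison, not the sticker price.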

    What this looks like in production

    Three patterns we see:

    1. The over-engineered cheap-model pipeline

    Team picks the cheapest model. Adds retries, validators, fallback chains, second-opinion models, complex prompts. The combined system has 8 LLM calls per task and a maintenance burden that grows weekly. Total cost: higher than just using the better model.

    2. The under-engineered expensive-model demo

    Team picks Opus, ships in a week. Demo works great. Production traffic 10× the demo, costs 10× too — but the team budgeted for demo numbers. Cash burn surprises everyone.

    3. The economically sensible production agent

    Team picks the mid-model that hits the success-rate threshold. Caps cost per request with a circuit breaker. Logs every call. Reviews cost-per-success monthly. Adjusts model and prompt when the economics shift. This is what production looks like.
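
    One way to cap cost per request is a small in-process circuit breaker. This is a sketch under stated assumptions: `RequestCostBreaker` and `BudgetExceeded` are hypothetical names, and the per-call costs are estimates you would have to supply yourself, for example from token counts and your provider's price list.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push a request past its cost cap."""


class RequestCostBreaker:
    """Tracks spend for one request and trips before the cap is exceeded."""

    def __init__(self, max_cents: float) -> None:
        if max_cents <= 0:
            raise ValueError("max_cents must be positive")
        self.max_cents = max_cents
        self.spent_cents = 0.0

    def charge(self, cents: float) -> None:
        """Record one LLM call's cost, or raise if it would exceed the cap."""
        if self.spent_cents + cents > self.max_cents:
            raise BudgetExceeded(
                f"{self.spent_cents + cents:.1f}c would exceed cap of {self.max_cents:.1f}c"
            )
        self.spent_cents += cents


breaker = RequestCostBreaker(max_cents=20.0)
try:
    for estimated_call_cost in [8.0, 8.0, 8.0]:  # hypothetical per-call estimates
        breaker.charge(estimated_call_cost)
        # ... make the LLM call here ...
except BudgetExceeded as exc:
    print(f"stopped after {breaker.spent_cents:.1f}c: {exc}")
```

    Checking before the call rather than after means a runaway retry loop stops at the cap instead of one call past it. What happens next (escalate to a human, return a partial result) is a product decision this sketch leaves open.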

    What to measure monthly

    • Tasks attempted.
    • Tasks successfully completed.
    • Human-fallback rate.
    • All-in cost per success (token + ops + human review time).
    • Value generated per success.
    • Ratio (value / cost) — your AI margin. Target ≥ 5×.
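
    The monthly checklist above can be rolled into a single report. The sketch below is hypothetical: the `monthly_report` function, its field names, and the example figures are all illustrative, and the 5× default simply mirrors the target in the list.

```python
def monthly_report(
    attempted: int,
    succeeded: int,
    human_fallbacks: int,
    all_in_cost_usd: float,
    value_per_success_usd: float,
    target_margin: float = 5.0,
) -> dict:
    """Compute the monthly metrics from raw counts and all-in spend."""
    if attempted <= 0 or succeeded <= 0 or all_in_cost_usd <= 0:
        raise ValueError("attempted, succeeded and all_in_cost_usd must be positive")
    cost_per_success = all_in_cost_usd / succeeded
    margin = value_per_success_usd / cost_per_success
    return {
        "tasks_attempted": attempted,
        "tasks_succeeded": succeeded,
        "human_fallback_rate": human_fallbacks / attempted,
        "cost_per_success_usd": cost_per_success,
        "value_per_success_usd": value_per_success_usd,
        "ai_margin": margin,
        "meets_target": margin >= target_margin,
    }


# Invented month: 1,000 attempts, 850 successes, 150 fallbacks, $255 all-in.
report = monthly_report(1000, 850, 150, 255.0, 5.0)
for name, value in report.items():
    print(f"{name}: {value}")
```

    Tracking the same fields every month makes a drift in fallback rate or margin visible before it shows up as a surprise on the bill.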

    See AI ROI measurement for the broader frame.

    What this means for you

    • The sticker price of a model is the smallest part of the real cost.
    • Cost-per-success is the only number that matters. Optimise for it.
    • "Expensive" reasoning models often have the best cost-per-success.
    • Human fallback is a real cost — sometimes the dominant one. Measure it.
    • Set monthly cost ceilings. AI spend without a ceiling will surprise you.
    • Read our 2026 foundation models guide before picking a model.

    Sizing an AI workflow? Book a 30-minute call and we will help you do the math on your specific case.
