The 5 properties of production AI (vs demo AI)
Demo AI wows you in fifteen minutes. Production AI runs for a year without breaking. Five properties that separate them — and a checklist before you ship.
- Demo AI works in a controlled fifteen-minute slot. Production AI runs reliably for a year.
- Five properties separate them: observability, guardrails, graceful degradation, cost ceilings, human-in-the-loop.
- In our experience, most failed AI projects skip three or more of these.
- A checklist for any AI feature before it ships to production.
Most AI projects we see started life as an impressive demo. The demo worked. The team shipped it. Three weeks later, support tickets piled up. The system was hallucinating. Costs were unbounded. Nobody knew what the AI had done last Tuesday at 3pm.
Production AI is a different discipline. Below are the five properties that separate it from demo AI, with the failure mode each one prevents.
Property 1 — Observability
Every AI action gets logged. Every input. Every retrieval. Every model version. Every output. Every human reviewer. Every cost. Without this, debugging is impossible and trust evaporates.
The minimum log fields for any production AI action:
- request_id — unique trace ID across the entire pipeline
- tenant_id — for multi-tenant systems, who this was for
- model + version — exact model used (Claude Sonnet 4.5, GPT-4 Turbo 2024-04-09, etc.)
- input — the prompt, the retrieval context, the user message
- output — the model's response, with citations if any
- reviewer — human who approved this output (if HITL applied)
- token_count + cost — input tokens, output tokens, dollar cost
- latency — total round-trip time
When a customer reports a wrong answer, these fields are what let you reproduce it. Observability is non-negotiable.
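As a rough illustration, the minimum fields could be captured in one structured record per AI action. This is a minimal sketch, not the XWorks implementation: the `AIActionLog` class, its field names, the `new_request_id` helper, and the sample values are all assumptions made for the example.

```python
import json
import uuid
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AIActionLog:
    """One record per AI action (hypothetical schema mirroring the field list)."""
    request_id: str
    tenant_id: str
    model: str
    model_version: str
    input: dict               # prompt, retrieval context, user message
    output: str
    reviewer: Optional[str]   # None when no human review applied
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float

    def to_json(self) -> str:
        # One JSON object per line is easy to ship to most log pipelines.
        return json.dumps(asdict(self), sort_keys=True)


def new_request_id() -> str:
    """Generate a trace ID to pass through every stage of the pipeline."""
    return uuid.uuid4().hex


record = AIActionLog(
    request_id=new_request_id(),
    tenant_id="tenant-123",            # illustrative value
    model="example-model",             # illustrative value
    model_version="2024-01-01",        # illustrative value
    input={"prompt": "Summarise the ticket", "retrieval": [], "user_message": "Hi"},
    output="Summary text",
    reviewer=None,
    input_tokens=120,
    output_tokens=40,
    cost_usd=0.0021,
    latency_ms=830.0,
)
print(record.to_json())
```

Keeping the record flat and serialisable means the same object can be written to a log file, a queue, or a database row without translation.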
Property 2 — Guardrails
Inputs and outputs are bounded. Inputs are filtered for known injection patterns. Outputs are validated against expected shape, length, and content policies.
Examples of guardrails we ship in XWorks Suite:
- Reject prompts that look like prompt injection attempts
- Reject prompts containing PII patterns the user did not opt into sharing
- Force output to be valid JSON when the downstream system expects JSON
- Reject outputs that exceed a maximum word count
- Block outputs that contain known unsafe content patterns
- Filter out hallucinated citations (URLs that do not exist)
Guardrails do not replace the model's own safety. They add the deterministic boundary the model cannot guarantee.
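A few of these checks can be sketched as plain deterministic functions that run before and after the model call. The pattern list, the `MAX_OUTPUT_WORDS` limit, and the `check_input` / `check_output` names below are assumptions for illustration only; a real injection filter needs far broader coverage than two regular expressions.

```python
import json
import re

# Hypothetical and deliberately tiny; not a complete injection filter.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]
MAX_OUTPUT_WORDS = 300  # assumed limit for the example


class GuardrailViolation(Exception):
    """Raised when an input or output falls outside the allowed boundary."""


def check_input(prompt: str) -> str:
    """Reject prompts that match a known injection pattern."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            raise GuardrailViolation("possible prompt injection")
    return prompt


def check_output(raw: str, required_keys: set) -> dict:
    """Enforce the word limit, valid JSON, and the expected top-level keys."""
    if len(raw.split()) > MAX_OUTPUT_WORDS:
        raise GuardrailViolation("output exceeds word limit")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise GuardrailViolation("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise GuardrailViolation("output is not a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise GuardrailViolation(f"missing keys: {sorted(missing)}")
    return data


check_input("What is the status of my order?")
print(check_output('{"answer": "Shipped", "citations": []}', {"answer", "citations"}))
```

Raising a dedicated exception keeps the guardrail decision separate from the fallback decision: the caller chooses whether a violation means a retry, a refusal, or a hand-off to a human.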
Property 3 — Graceful degradation
Production AI must keep working when the AI does not. The foundation model API is down. The vector database is slow. The retrieval returns garbage. What happens?
Three patterns that work:
- Fall back to a smaller / cheaper model when the primary is unavailable. Quality drops; service stays up.
- Fall back to a non-AI path for critical workflows. The booking still works without the AI assistant; the user just types the request the old way.
- Fail loudly to the human queue for high-stakes tasks. If the AI cannot confidently triage a support ticket, the ticket goes to a human queue marked "AI uncertain."
Demo AI assumes the model always works. Production AI assumes it will fail in the worst week of the year.
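The three patterns can be combined into a single ordered fallback chain. The sketch below assumes each model client is a plain callable that raises a hypothetical `ModelUnavailable` error when it cannot serve the request; the function name and the returned labels are illustrative, not part of any specific SDK.

```python
from typing import Callable, Optional, Sequence, Tuple


class ModelUnavailable(Exception):
    """Assumed error a model client raises when the provider cannot respond."""


def answer_with_fallback(
    prompt: str,
    models: Sequence[Callable[[str], str]],
    non_ai_path: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Try each model in order, then the non-AI path, then the human queue.

    Returns (source, result) so the caller can log which path served the request.
    """
    for index, call_model in enumerate(models):
        try:
            return f"model_{index}", call_model(prompt)
        except ModelUnavailable:
            continue  # degrade to the next, usually smaller, model
    if non_ai_path is not None:
        return "non_ai", non_ai_path(prompt)
    # Nothing automated worked: fail loudly to a human instead of guessing.
    return "human_queue", "AI uncertain"


# Stub clients standing in for real model calls.
def primary(prompt: str) -> str:
    raise ModelUnavailable("primary model is down")


def smaller(prompt: str) -> str:
    return f"[smaller model] {prompt}"


print(answer_with_fallback("Reschedule my booking", [primary, smaller]))
```

Returning the source alongside the result matters for observability: a spike in `model_1` or `human_queue` responses is often the first visible sign of an upstream outage.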
Property 4 — Cost ceiling
AI is a metered resource. Every prompt costs money. Unbounded AI usage can burn a quarter's budget in a runaway weekend.
Production AI has cost ceilings at multiple layers:
- Per-request — maximum tokens in + tokens out per call
- Per-tenant per-day — daily cost ceiling per customer
- Per-feature — total cost ceiling per feature
- Global — daily AI spend across the entire system
Each ceiling, when hit, triggers a fallback path (smaller model, queue for tomorrow, non-AI alternative) — not a hard failure. Demo AI has no ceilings. Production AI has four.
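One way to picture the four layers is a small in-memory tracker that is consulted before every call. This is a sketch under stated assumptions: the `CostCeilings` and `BudgetTracker` names and every dollar amount are placeholders, the daily reset is omitted, and a real system would keep the running totals in shared storage rather than in process memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CostCeilings:
    # All dollar amounts are placeholder assumptions, not recommendations.
    per_request_usd: float = 0.05
    per_tenant_daily_usd: float = 5.0
    per_feature_usd: float = 50.0
    global_daily_usd: float = 200.0


@dataclass
class BudgetTracker:
    ceilings: CostCeilings = field(default_factory=CostCeilings)
    tenant_spend: dict = field(default_factory=dict)
    feature_spend: dict = field(default_factory=dict)
    global_spend: float = 0.0

    def breached(self, tenant: str, feature: str, estimated_usd: float) -> Optional[str]:
        """Return the first ceiling this call would exceed, or None if it fits."""
        c = self.ceilings
        if estimated_usd > c.per_request_usd:
            return "request"
        if self.tenant_spend.get(tenant, 0.0) + estimated_usd > c.per_tenant_daily_usd:
            return "tenant"
        if self.feature_spend.get(feature, 0.0) + estimated_usd > c.per_feature_usd:
            return "feature"
        if self.global_spend + estimated_usd > c.global_daily_usd:
            return "global"
        return None

    def record(self, tenant: str, feature: str, actual_usd: float) -> None:
        """Add the actual cost of a completed call to every running total."""
        self.tenant_spend[tenant] = self.tenant_spend.get(tenant, 0.0) + actual_usd
        self.feature_spend[feature] = self.feature_spend.get(feature, 0.0) + actual_usd
        self.global_spend += actual_usd


tracker = BudgetTracker()
layer = tracker.breached("tenant-123", "ticket-summary", estimated_usd=0.02)
if layer is None:
    tracker.record("tenant-123", "ticket-summary", actual_usd=0.02)
else:
    print(f"{layer} ceiling hit: route to the fallback path")
```

Because `breached` returns the name of the layer rather than raising, the caller can choose a different fallback per layer, for example a smaller model for a tenant ceiling and a queue-for-tomorrow path for the global one.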
Property 5 — Human-in-the-loop on high-stakes actions
For decisions whose cost-of-being-wrong is high — financial filings, medical communications, customer-facing actions, deletions, payments — a human approves before the AI ships the action.
HITL is not "we will manually check sometimes." It is an explicit approval queue with:
- A specific human role responsible for approval
- A maximum queue age (otherwise approvals back up)
- A clear escalation path when the queue stalls
- An override mechanism with audit logging
For the deeper how-to, read our HITL guide when it ships.
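Until then, the four elements of the approval queue can be sketched in a few lines. Everything named here (`PendingAction`, `ApprovalQueue`, the role string, the one-hour maximum age) is a hypothetical illustration; in particular, a real override path would need its own authorization check rather than a bare flag.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PendingAction:
    action_id: str
    description: str
    submitted_at: datetime
    approved_by: Optional[str] = None


@dataclass
class ApprovalQueue:
    approver_role: str        # the specific role responsible for approval
    max_age: timedelta        # older items should be escalated
    pending: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def submit(self, action: PendingAction) -> None:
        self.pending[action.action_id] = action

    def approve(self, action_id: str, reviewer: str, role: str,
                override: bool = False) -> PendingAction:
        """Approve an action; every approval, including overrides, is audited."""
        if role != self.approver_role and not override:
            raise PermissionError(f"role {role!r} cannot approve this queue")
        action = self.pending.pop(action_id)
        action.approved_by = reviewer
        self.audit_log.append({"action_id": action_id, "reviewer": reviewer,
                               "role": role, "override": override})
        return action

    def stale(self, now: datetime) -> list:
        """Return actions older than max_age so the caller can escalate them."""
        return [a for a in self.pending.values()
                if now - a.submitted_at > self.max_age]


queue = ApprovalQueue(approver_role="finance_lead", max_age=timedelta(hours=1))
queue.submit(PendingAction("a1", "Refund order 1001", datetime.now(timezone.utc)))
approved = queue.approve("a1", reviewer="dana", role="finance_lead")
print(approved.approved_by, queue.audit_log)
```

A scheduled job calling `stale` is one simple way to implement the escalation path: anything it returns has exceeded the maximum queue age and should be routed to the next approver.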
The 5-point production-AI checklist
Before shipping any AI feature to production, every line below must be a clear yes:
- Every AI action is logged with the 8 minimum fields.
- Inputs + outputs pass through guardrails. The guardrails are tested.
- A graceful fallback exists for primary-model failure. The fallback is tested.
- Cost ceilings are configured at request, tenant, feature, and global levels.
- High-stakes actions go through an approval queue with a named owner.
If two or more are missing, this is not production AI. It is a demo running in production. The failure clock is ticking.
How we ship production AI at Xwits
The XWorks Core platform implements all five properties as defaults. Partners do not opt into observability — every AI action is automatically logged. They do not configure guardrails — they tune existing ones. They do not write fallback code — the platform routes around outages. Cost ceilings ship with sensible defaults; partners override them.
This is what we mean when we say "pure AI engineering" on the home page. The unglamorous engineering is what separates the products that survive year two from the demos that quietly disappear.
What this means for you
- Before any AI feature goes to production, walk through the 5-point checklist. Honest answers only.
- If you cannot answer yes on every line, do not ship yet. Add the missing properties first.
- Read build vs buy AI if you are deciding whether to engineer this layer yourself or use a platform that already ships it.
- For a deeper look at the engine: Xwits Engineering.
Building production AI for your business? Book a 30-minute call. We will walk through your specific feature against the 5-point checklist.