โ“ Frequently Asked Questions

Everything you need to know about Guardrail AI scoring, integrations, privacy, and more.

Getting Started
What is Guardrail?
Guardrail is an AI Escalation Intelligence Layer: a hosted API that sits between your AI model and your users. It scores every AI response in real time for confidence, detects 60+ risk signals (hallucinations, uncertainty, evasion, etc.), and routes to one of three decisions: deliver, flag, or escalate. It works with any LLM, including OpenAI, Claude, Gemini, or your own model.
How do I get started?

Three steps:

  • Sign up on the landing page: enter your email and get a free API key instantly.
  • Pick an integration: SDK, embeddable chat widget, Chatflow, tawk.to, MCP, or direct REST API.
  • Start scoring: wrap your AI calls with gr.check(text) and handle the decision.
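Step 3 can be sketched like this, assuming the SDK's check() resolves to an object with a decision field of "deliver", "flag", or "escalate" (the three decisions named above). routeResponse is a hypothetical helper, not part of the SDK:

```javascript
// Route an AI response based on Guardrail's decision.
// Assumes a result shape of { decision, confidence } - verify against the SDK docs.
function routeResponse(result, text) {
  switch (result.decision) {
    case "deliver":
      return text; // safe to show as-is
    case "flag":
      return text + "\n\n(Low confidence - please verify this answer.)"; // add disclaimer
    case "escalate":
      return "A human agent will follow up shortly."; // hand off to a human
    default:
      throw new Error(`Unknown decision: ${result.decision}`);
  }
}
```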
Is Guardrail free?
Yes. Every account gets a free API key with 1,000 checks per month. The scoring engine runs on heuristic patterns (no AI tokens consumed for scoring). The only cost is if you use the /api/chat endpoint, which uses your Anthropic key for Claude calls โ€” those tokens are billed by Anthropic, not by us.
What's the difference between a Guardrail key and an Anthropic key?
  • Guardrail key (gr_live_xxx): identifies your account for Guardrail scoring. Free. Used for authentication and usage tracking.
  • Anthropic key (sk-ant-xxx): needed ONLY if you want to use the /api/chat endpoint to call Claude through Guardrail. You pay Anthropic for those tokens.

For the /api/check endpoint (scoring only), you only need a Guardrail key. No LLM tokens are used.

Scoring & Decision Logic
How does the scoring engine work?

Guardrail uses 60+ regex-based signal patterns across 7 categories: uncertainty, knowledge cutoff, contradiction, evasion, hallucination, frustration, and sycophancy. Each signal has a weighted penalty.

Every response starts with a base score of 82%. Signals subtract from it; quality indicators (lists, code, URLs) add to it. The final score determines the decision:

  • ≥ 75% → ✅ Deliver
  • 45–74% → ⚠️ Flag
  • < 45% → 🔴 Escalate

No LLM is used in the scoring path. It's purely pattern-based, so it's fast (< 50ms) and deterministic.
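The arithmetic above can be sketched in a few lines. The base score and thresholds are from this FAQ; the signal weights and bonus values are illustrative, not Guardrail's actual numbers:

```javascript
// Minimal sketch of the scoring pipeline: base score, signal penalties,
// quality bonuses, then a threshold-based decision.
function score(signals, qualityBonuses) {
  const base = 0.82; // every response starts at 82%
  const penalty = signals.reduce((sum, sig) => sum + sig.weight, 0);
  const bonus = qualityBonuses.reduce((sum, b) => sum + b, 0);
  const confidence = Math.min(1, Math.max(0, base - penalty + bonus));
  const decision =
    confidence >= 0.75 ? "deliver" :
    confidence >= 0.45 ? "flag" : "escalate";
  return { confidence, decision };
}
```

A clean response with no signals stays at 0.82 and is delivered; stacked penalties push it into the flag or escalate bands.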

What are domain contexts and why do they matter?

Contexts tell Guardrail what kind of content is being scored. High-stakes domains like medical, legal, and financial apply extra penalties (−20% to −35%) because errors in those areas are more dangerous.

Supported contexts: general · medical · legal · financial · security · safety · mental_health · child_safety · nuclear.

If you set general, Guardrail will auto-detect the real domain from the text and elevate automatically.
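A sketch of how a context penalty plus auto-detection might compose. The −20% to −35% range is from this FAQ, but the per-context values and the detectContext regexes below are assumptions for illustration, not Guardrail's internals:

```javascript
// Illustrative per-context penalties (assumed values within the documented range).
const CONTEXT_PENALTY = { general: 0, financial: 0.2, legal: 0.25, medical: 0.3, nuclear: 0.35 };

// Hypothetical stand-in for Guardrail's domain auto-detection.
function detectContext(text) {
  if (/\b(dosage|diagnosis|symptom)/i.test(text)) return "medical";
  if (/\b(lawsuit|liability|contract)/i.test(text)) return "legal";
  return "general";
}

// When the caller passes "general", auto-detect and elevate; otherwise
// apply the caller's context directly.
function applyContext(confidence, context, text) {
  const effective = context === "general" ? detectContext(text) : context;
  return { context: effective, confidence: confidence - (CONTEXT_PENALTY[effective] ?? 0) };
}
```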

Does Guardrail use AI to score responses?
No. The scoring engine is 100% pattern-based (regex + heuristics). This means it's fast, cheap, deterministic, and doesn't have its own hallucination risk. The only AI call is when you use /api/chat to generate responses via Claude; scoring that response is still pattern-based.
What is Wikipedia verification?
When using /api/check, Guardrail extracts factual claims from the text and cross-checks them against Wikipedia in parallel. Claims that match get a confidence boost (+2% each); contradicted claims get a penalty (−8% each). This runs automatically; you can disable it with ?verify=false.
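The adjustment is simple arithmetic: +2% per supported claim, −8% per contradicted claim, clamped to the 0–1 range. A sketch (the claim object shape is an assumption):

```javascript
// Apply the Wikipedia verification adjustment to a confidence score.
function applyVerification(confidence, claims) {
  for (const c of claims) {
    if (c.status === "supported") confidence += 0.02;       // matched claim
    else if (c.status === "contradicted") confidence -= 0.08; // contradicted claim
    // unverified claims leave the score unchanged
  }
  return Math.min(1, Math.max(0, confidence)); // keep within 0.0-1.0
}
```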
What does the userQuery parameter do?
When you include the original user question via userQuery, Guardrail enables context-aware scoring:
  • Relevance check: is the response actually addressing the question?
  • Scope analysis: is the response disproportionately long for a simple question?
  • Dangerous query detection: does the question ask about dosage, lawsuits, etc., and does the AI properly refuse?
This significantly improves scoring accuracy. Always pass it when you have the original question.
Can I customize the confidence thresholds?
Not yet via API, but you can implement custom thresholds client-side. Guardrail returns a confidence score (0.0–1.0), so you can set your own cutoffs in your code. For example, a medical app might escalate anything below 80% instead of the default 75%.
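A minimal client-side version of that idea. The default cutoffs (0.75 / 0.45) match the documented decision bands; anything you pass in is your own policy:

```javascript
// Map a Guardrail confidence score to a decision using custom cutoffs.
function decideWith(confidence, { deliver = 0.75, flag = 0.45 } = {}) {
  if (confidence >= deliver) return "deliver";
  if (confidence >= flag) return "flag";
  return "escalate";
}
```

For the medical example above, set both cutoffs to 0.8 so anything below 80% escalates: decideWith(0.78, { deliver: 0.8, flag: 0.8 }) returns "escalate".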
Integrations
Which LLMs does Guardrail work with?
All of them. Guardrail scores text, not model calls. Pass any AI-generated text to /api/check; it works with OpenAI, Claude, Gemini, Llama, Mistral, Cohere, or any other model. The /api/chat endpoint specifically uses Claude, but scoring works with any text source.
How does the embeddable chat widget work?

One script tag before </body> and you get a full AI chat with confidence scoring on any page:

<script src=".../embed/widget.js" data-key="gr_live_xxx"></script>

It automatically scrapes the host page (title, meta tags, headings, body text) and sends that as context, so the AI can answer questions about your website. The widget supports dark/light themes, custom titles, welcome messages, and system prompts via data-* attributes.

What is auto page scraping?

When the chat widget loads, it reads your page's content: title, URL, meta description, keywords, headings (h1–h3), and the first 3,000 characters of visible body text. This is cached once and sent with every chat message as pageContext.

The server injects this into Claude's system prompt so the AI can answer site-specific questions like "What are your pricing plans?" or "Do you offer mobile RON?"

Disable with data-scrape="false". See the docs for full details.

How do I integrate with tawk.to?

Two options:

  • Webhook (post-chat audit): add /api/tawkto/webhook?key=YOUR_KEY as a webhook URL in tawk.to's settings. Every chat transcript gets scored automatically.
  • AI Assist Custom Tool: add /api/tawkto/openapi.json as a Custom Tool URL so the AI can check its own answers before sending.
How do I use Guardrail with Claude Desktop (MCP)?

Add this to your Claude Desktop config file:

{
  "mcpServers": {
    "guardrail": {
      "command": "npx",
      "args": ["guardrail-ai-mcp", "--key", "gr_live_xxx"]
    }
  }
}

Claude Desktop can then call Guardrail to check response safety as a tool.

Can I use Guardrail with Chatflow / Flowise?
Yes. Create a Custom Tool in Chatflow that calls /api/check with the AI response and original question. Add a safety disclaimer if the decision is flag or escalate. See the Chatflow Integration Guide for step-by-step instructions.
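The disclaimer step of that tool can be sketched like this. The result shape ({ decision, confidence }) follows the scoring answers earlier in this FAQ; the wording of the disclaimer is up to you:

```javascript
// Append a safety disclaimer when Guardrail's decision is "flag" or "escalate".
function withDisclaimer(answer, checkResult) {
  if (checkResult.decision === "deliver") return answer; // confident: pass through
  const pct = Math.round(checkResult.confidence * 100);
  return `${answer}\n\n⚠️ Confidence ${pct}% - please verify this answer with a human.`;
}
```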
Chat Widget
Whose AI tokens are used when the widget is on my site?

The server operator's Anthropic key pays for Claude API calls. The data-key on the widget is a Guardrail key for authentication; it is NOT an LLM key.

If no Anthropic key is configured on the server, the widget automatically falls back to demo mode with pre-recorded responses (zero AI token cost).

What data does the widget collect from my visitors?

The widget scrapes publicly visible page content only: title, URL, meta tags, headings, and body text. It does NOT collect:

  • โŒ Cookies or localStorage
  • โŒ Form inputs or user data
  • โŒ Scripts, styles, or hidden elements
  • โŒ Data from other browser tabs

Data is sent to YOUR Guardrail server, not a third party. Disable with data-scrape="false".

How much does the page context add to token costs?

Each chat message with page context adds roughly 800–1,200 input tokens to the system prompt. At Claude's pricing, that's approximately:

  • Short page (~500 chars): ~$0.0003/message
  • Medium page (~2,000 chars): ~$0.001/message
  • Full page (3,000 char cap): ~$0.0015/message
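As a rough cross-check of those figures, you can estimate cost yourself, assuming ~4 characters per token (a common rule of thumb, not an exact tokenizer) and whatever Anthropic currently charges per million input tokens:

```javascript
// Back-of-envelope input-token cost for scraped page context.
// pricePerMTok: USD per million input tokens (check Anthropic's current pricing).
function estimateCost(pageChars, pricePerMTok) {
  const tokens = Math.ceil(pageChars / 4); // ~4 chars/token heuristic
  return tokens * pricePerMTok / 1e6;
}
```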
Can I customize the widget appearance?
Yes. Use data-* attributes: data-theme="dark" or "light", data-title="Your Brand", data-welcome="Custom greeting", data-placeholder="Custom input text", and data-system-prompt="Custom instructions".
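Putting those attributes together with the embed tag shown earlier, a customized install might look like this (the values are placeholders; the src path is whatever your Guardrail deployment serves):

```html
<script src=".../embed/widget.js"
        data-key="gr_live_xxx"
        data-theme="dark"
        data-title="Your Brand"
        data-welcome="Hi! How can I help?"
        data-placeholder="Ask a question..."
        data-system-prompt="Answer only questions about this site."
        data-scrape="false"></script>
```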
Privacy & Security
Is my data stored?
Guardrail stores the first 300 characters of scored text in usage logs for your dashboard. Full response text is NOT stored server-side. API keys, usage counts, and decision history are stored in PostgreSQL. All data stays on your Railway deployment.
Can I self-host Guardrail?

Yes. Clone the repo, set your environment variables, and deploy anywhere:

  • git clone https://github.com/saifsysim/guardrail-mvp
  • Set ANTHROPIC_API_KEY and GUARDRAIL_MASTER_KEY in .env
  • npm install && npm start

Works on Railway, Render, Heroku, AWS, or any Node.js host. Add DATABASE_URL for PostgreSQL persistence.

Is the API key sent to third parties?
No. Your Guardrail API key is only sent to your Guardrail server. If you use the anthropicKey parameter in /api/chat, that key is sent from your server to Anthropic; it never leaves the server-to-Anthropic connection.
API & Technical
What's the difference between /api/check and /api/chat?
  • /api/check: score only. You pass in pre-generated AI text. No LLM call. Fast (<50ms). Works with any model.
  • /api/chat: generate + score. Sends your message to Claude, gets a response, scores it, and returns everything in one call. Uses Anthropic tokens.

Use /api/check when you already have the AI response. Use /api/chat when you want Guardrail to handle both generation and scoring.

What are the rate limits?
  • /api/demo-check: 5 requests/hour per IP (no key needed)
  • /api/demo-chat: 10 requests/hour per IP (no key needed)
  • /api/check: Unlimited with a valid API key
  • /api/chat: Unlimited with a valid API key
Can I use the real-time dashboard?
Yes. The Dashboard uses Server-Sent Events (SSE) to show every scoring decision in real time, streamed from /api/events. Your Developer Portal shows per-key stats, decision breakdowns, and recent scoring logs.
What happens if the server is down?
If Guardrail is unreachable, the SDK's check() method throws an error, which your onError callback can handle. Best practice: default to deliver with a disclaimer if Guardrail is unavailable, so your users aren't blocked.
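That fail-open pattern can be sketched as a small wrapper. checkFn stands in for whatever performs the Guardrail call; it's shown synchronously for brevity, so add async/await for the real SDK:

```javascript
// Fail open: if the Guardrail call throws (server unreachable),
// default to "deliver" with a disclaimer so users aren't blocked.
function safeCheck(checkFn, text) {
  try {
    return checkFn(text);
  } catch (err) {
    return {
      decision: "deliver",
      confidence: null, // unknown: Guardrail was unreachable
      disclaimer: "This response was not confidence-checked.",
    };
  }
}
```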
Is there a latency impact?
The scoring engine runs in <50ms (pattern matching, no AI calls). With Wikipedia verification enabled, it may take 200–500ms due to external API calls. You can disable verification with ?verify=false for latency-sensitive applications.
Troubleshooting
I'm getting 401 "API key required"
Make sure you're sending your key in the X-Guardrail-Key header or as a ?key= query parameter. The key must start with gr_live_. If you lost your key, sign up again with the same email; it returns your existing key.
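A quick way to catch this client-side is to validate the key format before building the request. buildHeaders is a hypothetical helper showing the X-Guardrail-Key header described above:

```javascript
// Validate the key prefix and build request headers for /api/check.
function buildHeaders(key) {
  if (!/^gr_live_/.test(key)) {
    throw new Error("Not a Guardrail key: expected it to start with gr_live_");
  }
  return { "Content-Type": "application/json", "X-Guardrail-Key": key };
}
```

A common cause of the 401 is accidentally passing an Anthropic key (sk-ant-xxx) where the Guardrail key belongs; the prefix check catches that early.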
The chat widget says "Connection error"
  • Check that the src URL in the script tag points to your running Guardrail server
  • Verify the data-key is a valid Guardrail API key
  • Check the browser console for CORS errors โ€” your server must allow the widget's origin
  • If no Anthropic key is on the server, the widget should fall back to demo-chat automatically
My responses always show "escalate"
This usually means you're scoring in a high-stakes domain (medical, legal, etc.) where the base penalty is −25% to −35%. Combined with any uncertainty language, it can push confidence below 45%. Try scoring with context: "general" to see if the domain penalty is the cause.
503 "No Anthropic API key available"
The /api/chat endpoint requires an Anthropic API key. Either set ANTHROPIC_API_KEY in your server's .env file, or pass anthropicKey in the request body. If neither is available, use /api/demo-chat for testing.

Still have questions?

Try the Interactive Playground · Read the API Docs · Email symehmoo@gmail.com