Guardrails โ
Guardrails run before and after each agent step to validate messages, detect unsafe content, and enforce policies. The framework ships a GuardrailValidator with composable rules that you pass to createAgent().
Quick start โ
ts
import { createAgent } from 'confused-ai';
import { GuardrailValidator, createPiiDetectionRule, createPromptInjectionRule } from 'confused-ai';
const guardrails = new GuardrailValidator({
rules: [
createPromptInjectionRule({ threshold: 0.7 }),
createPiiDetectionRule({ redact: true }),
],
});
const agent = createAgent({
name: 'safe-agent',
instructions: 'You are a helpful assistant.',
model: 'gpt-4o-mini',
apiKey: process.env.OPENAI_API_KEY!,
guardrails,
});Pass guardrails: false to disable all guardrails (including the default PII guardrail that runs when you omit the option).
GuardrailValidator โ
The core engine. Compose any combination of built-in and custom rules.
ts
import { GuardrailValidator } from 'confused-ai';
const guardrails = new GuardrailValidator({
rules: [rule1, rule2, rule3],
onViolation: (violation, ctx) => {
// called when any rule fires
console.warn('Guardrail violation:', violation.rule, violation.message);
// return 'block' | 'warn' | 'redact' | 'continue'
},
});PII detection โ
Detect and optionally redact personally identifiable information:
ts
import { createPiiDetectionRule } from 'confused-ai';
const piiRule = createPiiDetectionRule({
redact: true, // replace PII with [REDACTED]
// redact: false // just flag without modifying
// PII types to detect (all enabled by default):
types: ['email', 'phone', 'ssn', 'credit_card', 'jwt', 'aws_key', 'ip_address'],
});Detected PII types: email ยท phone ยท ssn ยท credit_card ยท jwt ยท aws_key ยท ip_address ยท and more from PII_PATTERNS.
ts
import { detectPii, PII_PATTERNS } from 'confused-ai';
// Use standalone (no agent required)
const result = await detectPii('Contact me at alice@example.com or 555-123-4567');
console.log(result.found); // true
console.log(result.matches); // [{ type: 'email', value: 'alice@example.com' }, ...]Prompt injection detection โ
Block attempts to hijack the agent via crafted input:
ts
import { createPromptInjectionRule, detectPromptInjection } from 'confused-ai';
const injectionRule = createPromptInjectionRule({
threshold: 0.7, // 0.0โ1.0; higher = stricter. Default: 0.7
});
// Standalone usage:
const detection = await detectPromptInjection('Ignore all previous instructions and...');
console.log(detection.score); // 0.95
console.log(detection.signals); // ['ignore_previous', 'jailbreak_attempt']LLM-based injection classifier (higher accuracy) โ
ts
import { createLlmInjectionClassifier } from 'confused-ai';
import { OpenAIProvider } from 'confused-ai';
const injectionRule = createLlmInjectionClassifier({
llm: new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY! }),
model: 'gpt-4o-mini',
threshold: 0.8,
});Content moderation โ
OpenAI Moderation API โ
ts
import { createOpenAiModerationRule } from 'confused-ai';
const moderationRule = createOpenAiModerationRule({
apiKey: process.env.OPENAI_API_KEY!,
// Block if any category score exceeds threshold:
thresholds: {
hate: 0.7,
'hate/threatening': 0.5,
harassment: 0.7,
'self-harm': 0.5,
sexual: 0.8,
violence: 0.7,
},
});Forbidden topics โ
ts
import { createForbiddenTopicsRule } from 'confused-ai';
const topicsRule = createForbiddenTopicsRule({
topics: ['competitor pricing', 'internal salary data', 'acquisition plans'],
action: 'block', // 'block' | 'warn'
});Content and length rules โ
ts
import {
createContentRule,
createMaxLengthRule,
createAllowlistRule,
createSensitiveDataRule,
createUrlValidationRule,
} from 'confused-ai';
const rules = [
// Block responses that contain specific patterns
createContentRule({
patterns: [/\b(password|secret|token)\s*[:=]/i],
action: 'block',
message: 'Response contains sensitive credential patterns.',
}),
// Limit output length
createMaxLengthRule({ maxChars: 10_000 }),
// Only allow certain output patterns
createAllowlistRule({
patterns: [/^[a-z0-9\s.,!?-]+$/i],
action: 'block',
}),
// Flag sensitive data patterns
createSensitiveDataRule({ patterns: SENSITIVE_DATA_PATTERNS }),
// Block requests to disallowed domains
createUrlValidationRule({
allowedDomains: ['api.company.com', 'docs.company.com'],
}),
];Tool allowlist โ
Restrict which tools the agent can call from within a guardrail rule:
ts
import { createToolAllowlistRule } from 'confused-ai';
const toolRule = createToolAllowlistRule({
allowedTools: ['search_orders', 'get_product_info'],
// Any tool not in this list is blocked before execution
});Custom rules โ
ts
import type { GuardrailRule, GuardrailContext, GuardrailResult } from 'confused-ai';
const noProfanityRule: GuardrailRule = {
name: 'no-profanity',
type: 'output', // 'input' | 'output' | 'both'
check: async (ctx: GuardrailContext): Promise<GuardrailResult> => {
const text = typeof ctx.message.content === 'string' ? ctx.message.content : '';
const hasProfanity = /\b(badword1|badword2)\b/i.test(text);
if (hasProfanity) {
return {
passed: false,
action: 'block',
message: 'Response contains prohibited language.',
rule: 'no-profanity',
};
}
return { passed: true };
},
};
const guardrails = new GuardrailValidator({ rules: [noProfanityRule] });Full example: production guardrail stack โ
ts
import { createAgent } from 'confused-ai';
import {
GuardrailValidator,
createPromptInjectionRule,
createPiiDetectionRule,
createOpenAiModerationRule,
createForbiddenTopicsRule,
createMaxLengthRule,
createToolAllowlistRule,
} from 'confused-ai';
const guardrails = new GuardrailValidator({
rules: [
createPromptInjectionRule({ threshold: 0.75 }),
createPiiDetectionRule({ redact: true }),
createOpenAiModerationRule({ apiKey: process.env.OPENAI_API_KEY! }),
createForbiddenTopicsRule({ topics: ['competitor pricing', 'legal strategy'] }),
createMaxLengthRule({ maxChars: 8_000 }),
createToolAllowlistRule({ allowedTools: ['search', 'get_order', 'send_email'] }),
],
onViolation: (violation) => {
// Send to your audit log
auditLogger.warn({ rule: violation.rule, action: violation.action, score: violation.score });
},
});
const agent = createAgent({
name: 'customer-service',
instructions: 'You are a customer service agent for Acme Corp.',
model: 'gpt-4o-mini',
apiKey: process.env.OPENAI_API_KEY!,
guardrails,
tools: [searchTool, orderTool, emailTool],
});Where to go next โ
- HITL โ escalate violations to a human instead of auto-blocking.
- Production โ rate limiting, circuit breakers, and audit logging.
- Agents โ how guardrails fit into the full
createAgent()config.