Production
Production-grade resilience: circuit breakers, rate limiting, retries, and graceful degradation.
withResilience()
Wraps any agent with automatic retries, circuit breaking, rate limiting, and health checks:
import { withResilience } from 'confused-ai/guard';
import { createAgent } from 'confused-ai';
const myAgent = createAgent({ name: 'assistant', llm, instructions: '...' });
const resilient = withResilience(myAgent, {
circuitBreaker: {
failureThreshold: 5, // open circuit after 5 consecutive failures
resetTimeoutMs: 30_000, // try again after 30s
callTimeoutMs: 60_000, // abort individual calls after 60s
},
rateLimit: {
maxRpm: 60, // max runs per minute
},
retry: {
maxRetries: 2, // retry failed runs up to 2 times
backoffMs: 500, // initial retry delay
maxBackoffMs: 5_000, // cap on exponential backoff
},
healthCheck: true, // enable .health() reporting
gracefulShutdown: true, // flush in-flight runs on SIGTERM
});
// Drop-in replacement — identical run() interface
const result = await resilient.run('Process this data');Health report
withResilience() returns a ResilientAgent with an extra .health() method:
const report = resilient.health();
// {
// status: 'healthy' | 'degraded' | 'unhealthy',
// circuitState: 'closed' | 'open' | 'half-open' | 'disabled',
// totalRuns: 142,
// totalFailures: 3,
// averageLatencyMs: 823,
// uptime: 3600,
// lastError?: 'Rate limit exceeded',
// lastRunAt?: Date,
// }
// Expose via HTTP
app.get('/agent/health', (req, res) => {
const h = resilient.health();
res.status(h.status === 'unhealthy' ? 503 : 200).json(h);
});All defaults
All ResilienceConfig fields are optional — withResilience(agent) with no config is valid and uses these defaults:
| Option | Default |
|---|---|
circuitBreaker.failureThreshold | 5 |
circuitBreaker.resetTimeoutMs | 30_000 |
circuitBreaker.callTimeoutMs | 60_000 |
rateLimit.maxRpm | 60 |
retry.maxRetries | 2 |
retry.backoffMs | 500 |
retry.maxBackoffMs | 5_000 |
healthCheck | true |
gracefulShutdown | true |
Pass circuitBreaker: false or rateLimit: false to disable those subsystems entirely.
Guardrails
Control what agents can say and do:
import { createGuardrails } from 'confused-ai/guardrails';
const guardrails = createGuardrails({
// Input allowlist — only allow topics in this list
allowlist: ['billing', 'account', 'subscription', 'pricing'],
// Block topics entirely
blocklist: ['competitors', 'pricing of other vendors'],
// Content safety (requires additional configuration)
contentSafety: {
enabled: true,
thresholds: { hate: 0.5, violence: 0.3, sexual: 0.1 },
},
// Custom validator
validate: async (input, output) => {
if (output.includes('confidential')) {
return { blocked: true, reason: 'Contains confidential information' };
}
return { blocked: false };
},
});
const myAgent = agent({
model: 'gpt-4o',
instructions: 'You are a billing assistant.',
guardrails,
});Fallback chain
Automatically fail over to backup models:
import { createFallbackChain } from 'confused-ai/model';
const llm = createFallbackChain([
{ model: 'gpt-4o', weight: 1 },
{ model: 'claude-3-5-sonnet-latest', weight: 1 },
{ model: 'gemini-2.0-flash-exp', weight: 1 },
]);
const myAgent = agent({ model: llm, instructions: '...' });Rate limiting (plugin)
import { rateLimitPlugin } from 'confused-ai/plugins';
const myAgent = defineAgent({ model: 'gpt-4o', instructions: '...' })
.use(rateLimitPlugin({
requestsPerMinute: 60,
tokensPerMinute: 100_000,
perUser: true, // rate limit per sessionId
}));Redis rate limiter (distributed)
RedisRateLimiter — fixed-window rate limiting across multiple server instances. Requires ioredis.
import Redis from 'ioredis';
import { RedisRateLimiter } from 'confused-ai/guard';
const redis = new Redis(process.env.REDIS_URL!);
const limiter = new RedisRateLimiter({
client: redis,
windowMs: 60_000, // 1 minute window
maxRequests: 100, // per key per window
});
// Use in your route handler or middleware:
const key = `user:${userId}`;
const result = await limiter.check(key);
if (!result.allowed) {
res.status(429).json({ error: 'Rate limit exceeded', retryAfter: result.retryAfterMs });
return;
}Health checks
Monitor agent health and readiness:
import { HealthMonitor } from 'confused-ai/guard';
const health = new HealthMonitor({
agents: [myAgent, teamAgent],
checkInterval: 30_000, // ms
onUnhealthy: (agentName, error) => {
alerting.notify(`Agent ${agentName} unhealthy: ${error.message}`);
},
});
health.start();
// Express health endpoint
app.get('/health', (req, res) => {
const status = health.getStatus();
res.status(status.healthy ? 200 : 503).json(status);
});Context window management
Automatic truncation when approaching token limits:
import { ContextWindowManager } from 'confused-ai/model';
const myAgent = agent({
model: 'gpt-4o', // 128k context
instructions: '...',
contextManager: new ContextWindowManager({
maxTokens: 100_000, // stay under limit
strategy: 'sliding-window', // keep most recent messages
}),
});Cost tracking
Track and budget LLM costs:
import { CostTracker } from 'confused-ai/model';
const tracker = new CostTracker({
budget: 10.00, // USD
onBudgetExceeded: (cost) => {
throw new Error(`Budget exceeded: $${cost.toFixed(4)}`);
},
});
const myAgent = agent({
model: 'gpt-4o',
instructions: '...',
costTracker: tracker,
});
// After runs
console.log(tracker.getTotalCost()); // $0.0023
console.log(tracker.getBreakdown());
// { 'gpt-4o': { input: 1000, output: 500, cost: 0.0023 } }Production adapter stack
The recommended way to wire all production infrastructure is createProductionSetup(). It connects sessions, memory, guardrails, rate-limiting, audit logs, observability, and more with sensible in-memory defaults that you replace progressively:
import { createAgent } from 'confused-ai';
import { createProductionSetup } from 'confused-ai/adapters';
const setup = createProductionSetup({
// Replace in-memory defaults with real drivers:
// cache: new RedisAdapter({ url: process.env.REDIS_URL! }),
// sessionStore: new RedisSessionAdapter({ url: process.env.REDIS_URL! }),
// rateLimit: new RedisRateLimitAdapter({ url: process.env.REDIS_URL! }),
// database: new PostgresAdapter({ connectionString: process.env.DATABASE_URL! }),
// auditLog: new PgAuditLogAdapter({ connectionString: process.env.DATABASE_URL! }),
// guardrail: new ContentSafetyAdapter({ apiKey: process.env.AZURE_CS_KEY! }),
// auth: new JwtAuthAdapter({ secret: process.env.JWT_SECRET! }),
// observability:new OtelAdapter({ endpoint: process.env.OTEL_ENDPOINT! }),
});
await setup.connect();
const agent = createAgent({
name: 'assistant',
model: 'gpt-4o',
instructions: '...',
adapters: setup.bindings,
});
// Health endpoint
app.get('/health', async (_req, res) => {
const health = await setup.healthCheck();
const ok = Object.values(health).every((h) => h.ok);
res.status(ok ? 200 : 503).json(health);
});
// Graceful shutdown
process.on('SIGTERM', () => setup.disconnect().then(() => process.exit(0)));See the Adapters guide for the full reference.
Budget enforcement
Hard-stop LLM spend per run, per user (daily), or globally (monthly). Unlike CostTracker (which measures), BudgetEnforcer stops execution when a cap is crossed.
import { createAgent } from 'confused-ai';
const agent = createAgent({
name: 'Safe',
model: 'gpt-4o',
instructions: '...',
budget: {
maxUsdPerRun: 0.50, // hard cap per single run
maxUsdPerUser: 10.00, // daily cap per userId
maxUsdPerMonth: 500.00, // monthly cap (all users combined)
onExceeded: 'throw', // 'throw' | 'warn' | 'truncate'
},
});import { defineAgent } from 'confused-ai';
const agent = defineAgent()
.instructions('...')
.model('gpt-4o')
.budget({
maxUsdPerRun: 0.50,
maxUsdPerUser: 10.00,
maxUsdPerMonth: 500.00,
onExceeded: 'throw',
})
.build();Persistent budget store
The default InMemoryBudgetStore resets on restart. For persistence, implement BudgetStore or use the SQLite default:
import { InMemoryBudgetStore, BudgetEnforcer } from 'confused-ai/guard';
// Custom Postgres-backed store:
import type { BudgetStore } from 'confused-ai/guard';
class PostgresBudgetStore implements BudgetStore {
async getUserDailySpend(userId: string) { /* SELECT SUM(usd) WHERE user_id = $1 AND date = today */ }
async incrementUserDailySpend(userId: string, usd: number) { /* UPSERT */ }
async getMonthlySpend() { /* SELECT SUM(usd) WHERE month = current_month */ }
async incrementMonthlySpend(usd: number) { /* UPDATE */ }
}
const agent = createAgent({
name: 'Safe',
budget: {
maxUsdPerUser: 10.00,
store: new PostgresBudgetStore(),
},
});Handling BudgetExceededError
import { BudgetExceededError } from 'confused-ai/guard';
try {
await agent.run('Analyse 500 documents', { userId: 'user-42' });
} catch (err) {
if (err instanceof BudgetExceededError) {
console.log(`Cap: ${err.cap}`); // 'run' | 'user_daily' | 'monthly'
console.log(`Limit: $${err.limitUsd}`);
console.log(`Spent: $${err.spentUsd}`);
}
}Agent checkpointing
For long-running tasks, save execution state after each step. If the process restarts, resume from the last saved step.
import { createAgent } from 'confused-ai';
import { createSqliteCheckpointStore } from 'confused-ai/production';
const agent = createAgent({
name: 'BatchProcessor',
instructions: 'Process a large dataset...',
checkpointStore: createSqliteCheckpointStore('./agent.db'),
});
// Provide a stable runId — if the process restarts, execution resumes
const result = await agent.run('Process all 500 records', { runId: 'batch-job-001' });import { defineAgent } from 'confused-ai';
import { createSqliteCheckpointStore } from 'confused-ai/production';
const agent = defineAgent()
.instructions('Process a large dataset...')
.checkpoint(createSqliteCheckpointStore('./agent.db'))
.build();
const result = await agent.run('Process all 500 records', { runId: 'batch-job-001' });Checkpoint stores
| Store | Import | Notes |
|---|---|---|
InMemoryCheckpointStore | confused-ai/production | Dev/test — does not survive restarts |
SqliteCheckpointStore | confused-ai/production | Durable default |
createSqliteCheckpointStore | confused-ai/production | Factory shorthand |
Custom checkpoint store
import type { AgentCheckpointStore, AgentRunState } from 'confused-ai/guard';
class RedisCheckpointStore implements AgentCheckpointStore {
async save(runId: string, step: number, state: AgentRunState) {
await redis.set(`checkpoint:${runId}`, JSON.stringify({ step, state }), 'EX', 86400);
}
async load(runId: string) {
const raw = await redis.get(`checkpoint:${runId}`);
return raw ? JSON.parse(raw) : null;
}
async delete(runId: string) {
await redis.del(`checkpoint:${runId}`);
}
}HTTP runtime authentication
createHttpService supports built-in authentication strategies via the auth option. When omitted, the server runs without auth (dev mode only).
import { createHttpService, listenService } from 'confused-ai/serve';
import { apiKeyAuth, bearerAuth } from 'confused-ai/serve';
// API key (header: x-api-key)
const service = createHttpService({
agents: { assistant },
auth: apiKeyAuth(['sk-prod-abc', 'sk-staging-xyz']),
// or shorthand:
// auth: { strategy: 'api-key', keys: ['sk-prod-abc'] },
maxBodyBytes: 512_000, // 512 KB request body limit (default: 1 MB)
});
await listenService(service, 8080);// Bearer JWT / custom token validation
import { bearerAuth } from 'confused-ai/serve';
const service = createHttpService({
agents: { assistant },
auth: bearerAuth(async (token) => {
const user = await verifyJwt(token);
return user ? { userId: user.sub, tenantId: user.org } : null;
}),
});// Basic auth (username:password)
import { createHttpService } from 'confused-ai/serve';
const service = createHttpService({
agents: { assistant },
auth: {
strategy: 'basic',
users: { admin: process.env.ADMIN_PASSWORD! },
},
});JWT RBAC — for role-based access control using HS256 JWTs:
import { jwtAuth, hasRole } from 'confused-ai/serve';
const service = createHttpService({
agents: { assistant },
auth: jwtAuth({
secret: process.env.JWT_SECRET!,
required: true,
}),
});
// In a hook or custom middleware, check roles:
const auth = ctx.auth; // { userId, roles: ['admin', 'user'] }
if (!hasRole(auth, 'admin')) throw new Error('Forbidden');CreateHttpServiceOptions reference
| Option | Type | Default | Description |
|---|---|---|---|
agents | Record<string, Agent> | required | Named agents to expose |
auth | AuthMiddlewareOptions | — | Auth strategy; omit for dev/no-auth |
maxBodyBytes | number | 1_048_576 | Max request body size (bytes); returns 413 on exceed |
cors | string | — | Access-Control-Allow-Origin header |
tracing | boolean | false | In-memory request audit log |
Idempotency
Prevent duplicate side-effects when clients retry failed HTTP requests. Pass an X-Idempotency-Key header and the same response is returned on replay — the agent does not re-execute.
import { createHttpService } from 'confused-ai/serve';
import { createSqliteIdempotencyStore } from 'confused-ai/guard';
const service = createHttpService({
agents: { assistant },
idempotency: {
store: createSqliteIdempotencyStore('./agent.db'),
ttlMs: 24 * 60 * 60 * 1000, // cache for 24 hours
},
});Client usage:
POST /v1/chat/assistant
X-Idempotency-Key: order-123-send-email
Content-Type: application/json
{ "message": "Send a confirmation email for order 123" }If the request is retried with the same key within 24 hours, the original response is returned without re-running the agent.
Custom idempotency store
import type { IdempotencyStore } from 'confused-ai/guard';
class RedisIdempotencyStore implements IdempotencyStore {
async get(key: string) { /* fetch from Redis */ }
async set(key: string, status: number, body: string, ttlMs: number) { /* store in Redis with TTL */ }
}Audit log
Persistent, queryable audit trail for every agent run. Satisfies SOC 2 and HIPAA requirements for tamper-evident logging.
import { createHttpService } from 'confused-ai/serve';
import { createSqliteAuditStore } from 'confused-ai/guard';
const service = createHttpService({
agents: { assistant },
auditStore: createSqliteAuditStore('./agent.db'),
});Query audit logs
const entries = await auditStore.query({
agentName: 'assistant',
userId: 'user-42',
since: new Date('2025-01-01'),
limit: 100,
});
entries.forEach((e) => {
console.log(`${e.timestamp} ${e.method} ${e.path} → ${e.status} (${e.durationMs}ms)`);
});AuditEntry fields
| Field | Type | Description |
|---|---|---|
id | string | UUID |
timestamp | string | ISO 8601 |
method | string | HTTP method |
path | string | Request path |
status | number | HTTP status code |
agentName | string? | Agent that handled the request |
sessionId | string? | Session ID |
userId | string? | User ID from auth context |
tenantId | string? | Tenant ID |
promptHash | string? | SHA-256 hash of prompt (never plaintext) |
toolsCalled | string[]? | Tool names called |
finishReason | string? | How the run ended |
durationMs | number? | Total run duration |
costUsd | number? | Estimated cost |
idempotencyKey | string? | Idempotency key, if any |
idempotencyHit | boolean? | Whether this was a cache replay |
Redis rate limiter
For distributed deployments where multiple processes share the same rate limits:
import Redis from 'ioredis';
import { RedisRateLimiter } from 'confused-ai/guard';
const redis = new Redis(process.env.REDIS_URL!);
const limiter = new RedisRateLimiter({
redis,
name: 'api', // logical limiter name (part of Redis key)
maxRequests: 100,
windowSeconds: 60, // fixed window — 100 req/min
});
// Wrap logic in execute() — throws RateLimitError when limit is exceeded
await limiter.execute(async () => {
const result = await myAgent.run(prompt);
res.json({ text: result.text });
});Use RedisRateLimiter instead of the in-process RateLimiter whenever you run multiple server instances.
Human-in-the-Loop (HITL)
Pause agent execution at high-risk tool calls and require a human decision before proceeding. Build a gate tool using waitForApproval:
import { createSqliteApprovalStore, waitForApproval, ApprovalRejectedError } from 'confused-ai/guard';
const approvalStore = createSqliteApprovalStore('./agent.db');
const requestApproval = defineTool()
.name('requestApproval')
.description('Request human approval before a risky action')
.parameters(z.object({
toolName: z.string(),
description: z.string(),
riskLevel: z.enum(['low', 'medium', 'high', 'critical']),
}))
.execute(async ({ toolName, description, riskLevel }, ctx) => {
const req = await approvalStore.create({
runId: ctx.runId ?? 'run', agentName: 'Agent',
toolName, toolArguments: {}, riskLevel, description,
});
await waitForApproval(approvalStore, req.id, { timeoutMs: 30 * 60 * 1000 });
return { approved: true };
})
.build();See the dedicated HITL guide.
Multi-tenancy
Scope sessions, rate limits, and cost tracking per tenant without separate databases.
See the dedicated Multi-Tenancy guide.