05 · RAG Knowledge Base 🟡

RAG = Retrieval-Augmented Generation. Instead of relying only on what the model was trained on, you load your own documents and let the agent search them at query time. Perfect for company wikis, product docs, research papers, and FAQs.

What you'll learn

  • How to load documents (text, JSON, URL)
  • How to build a vector index
  • How to query the index inside an agent

How RAG works

Your documents
    ↓  (split into chunks)
Text chunks
    ↓  (embed with OpenAI)
Vector store (in-memory or external)
    ↓
User asks: "What is the refund policy?"
    ↓  (embed question → find similar chunks)
Top 3 relevant chunks
    ↓  (inject into prompt)
Agent answers using YOUR content
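The "find similar chunks" step is a nearest-neighbour search over the stored embeddings. Below is a minimal sketch of that search. It assumes cosine similarity as the score; confused-ai's actual scoring is not specified on this page. It also uses tiny hand-written 3-dimensional vectors in place of real 1536-dimension embeddings. StoredChunk, cosineSimilarity and topK are illustrative names, not confused-ai APIs. The k and minScore arguments play the same role as the knowledgeTopK and knowledgeMinScore options in the agent config below.

```typescript
// Illustrative top-K retrieval with a score threshold; not confused-ai's implementation.
interface StoredChunk {
  content: string;
  embedding: number[];
}

interface ScoredChunk extends StoredChunk {
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query, drop weak matches, keep the best k.
function topK(query: number[], chunks: StoredChunk[], k: number, minScore: number): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .filter((c) => c.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional "embeddings" standing in for real model output.
const chunks: StoredChunk[] = [
  { content: 'Items can be returned within 30 days of purchase.', embedding: [0.9, 0.1, 0.0] },
  { content: 'Email returns@example.com with your order number.', embedding: [0.7, 0.6, 0.1] },
  { content: 'Our office is closed on public holidays.', embedding: [0.0, 0.2, 0.9] },
];
const question = [1, 0.2, 0]; // pretend embedding of "What is the refund policy?"

const results = topK(question, chunks, 3, 0.7);
for (const r of results) console.log(r.score.toFixed(3), r.content);
```

With a 0.7 threshold the unrelated holiday chunk is dropped, even though k allows three results. This is how an off-topic question can end up with no retrieved context at all.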

Code

```ts
// rag-agent.ts (uses top-level await, so run it as an ES module)
import { createAgent } from 'confused-ai';
import { KnowledgeEngine, TextLoader, URLLoader, JSONLoader } from 'confused-ai/knowledge';
import { OpenAIEmbeddingProvider, InMemoryVectorStore } from 'confused-ai/memory';

// ── 1. Set up the embedding provider ─────────────────────────
const embeddings = new OpenAIEmbeddingProvider({
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'text-embedding-3-small',  // low cost, 1536-dimension vectors
});

// ── 2. Set up the vector store ───────────────────────────────
const vectorStore = new InMemoryVectorStore();

// ── 3. Create the knowledge engine ───────────────────────────
const knowledge = new KnowledgeEngine({
  embeddingProvider: embeddings,
  vectorStore,
  chunkSize: 500,     // characters per chunk
  chunkOverlap: 50,   // characters shared by adjacent chunks (preserves context at boundaries)
});

// ── 4. Load your documents ───────────────────────────────────

// Load from a plain text file
await knowledge.load(new TextLoader('./docs/refund-policy.txt'));

// Load from a URL (fetches HTML, strips tags)
await knowledge.load(new URLLoader('https://your-site.com/help/shipping'));

// Load from a JSON file (each item becomes a chunk)
await knowledge.load(new JSONLoader('./data/faq.json', {
  contentField: 'answer',          // field to embed
  metadataFields: ['id', 'topic'], // fields stored alongside each chunk
}));

// Load a string directly (handy for testing)
await knowledge.loadText(`
  Return Policy: Items can be returned within 30 days of purchase.
  Electronics must be unopened. Digital downloads are non-refundable.
  To initiate a return, email returns@example.com with your order number.
`);

console.log(`Loaded ${await vectorStore.count()} chunks`);

// ── 5. Create the agent with RAG ─────────────────────────────
const agent = createAgent({
  name: 'support-agent',
  model: 'gpt-4o-mini',
  instructions: `
    You are a customer support agent.
    Answer questions using the provided knowledge base.
    If the answer is not in the knowledge base, say so clearly.
    Always cite the source when available.
  `,
  knowledge,               // ← attach the knowledge engine
  knowledgeTopK: 3,        // retrieve the 3 most relevant chunks
  knowledgeMinScore: 0.7,  // ignore chunks with a similarity score below 0.7
});

// ── 6. Ask questions (example outputs; exact wording varies) ─
const r1 = await agent.run('What is your return policy?');
console.log(r1.text);
// → "Items can be returned within 30 days of purchase. Electronics must be unopened..."

const r2 = await agent.run('How do I initiate a return?');
console.log(r2.text);
// → "To initiate a return, email returns@example.com with your order number."

const r3 = await agent.run('What is the weather like today?');
console.log(r3.text);
// → "I don't have information about current weather in my knowledge base."
```
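When knowledge is attached, confused-ai handles the "inject into prompt" step itself. The sketch below illustrates only the general pattern; the prompt wording confused-ai actually uses is not shown on this page. It numbers each retrieved chunk and labels it with its source so the model can cite it. When nothing passed the score threshold, it says so explicitly, which is what lets the agent answer that something is not in the knowledge base. RetrievedChunk and buildRagPrompt are hypothetical names.

```typescript
// Illustrative generic RAG prompt builder; not confused-ai's internal prompt.
interface RetrievedChunk {
  content: string;
  source?: string;
  score: number;
}

function buildRagPrompt(question: string, chunks: RetrievedChunk[]): string {
  // No chunk passed the threshold: state it, so the model can decline instead of guessing.
  if (chunks.length === 0) {
    return `No relevant documents were found.\n\nQuestion: ${question}`;
  }
  // Number each chunk and keep its source so the answer can cite it.
  const context = chunks
    .map((c, i) => `[${i + 1}] (source: ${c.source ?? 'unknown'})\n${c.content.trim()}`)
    .join('\n\n');
  return [
    'Answer using only the context below. Cite sources by their [number].',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

const ragPrompt = buildRagPrompt('How do I initiate a return?', [
  {
    content: 'To initiate a return, email returns@example.com with your order number.',
    source: 'refund-policy.txt',
    score: 0.88,
  },
]);
console.log(ragPrompt);
```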

Load from a directory

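```ts
// Load every Markdown file under ./docs (requires the glob package: npm install glob).
// Assumes `knowledge` is the KnowledgeEngine created in rag-agent.ts.
import { glob } from 'glob';
import { readFile } from 'node:fs/promises';

const mdFiles = await glob('./docs/**/*.md');
for (const file of mdFiles) {
  const content = await readFile(file, 'utf-8');
  await knowledge.loadText(content, { source: file });  // record the file path as the chunk source
}
```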
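If you would rather not depend on the glob package, Node's built-in fs module can walk the directory tree instead. This is a minimal sketch, assuming Node.js with @types/node installed. findMarkdownFiles is a hypothetical helper. The demo builds a throwaway directory so the example runs on its own. The helper's return value can stand in for the glob('./docs/**/*.md') call above.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Recursively collect every .md file under `dir` (hypothetical helper).
function findMarkdownFiles(dir: string): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findMarkdownFiles(fullPath)); // descend into subdirectories
    } else if (entry.isFile() && entry.name.endsWith('.md')) {
      found.push(fullPath);
    }
  }
  return found.sort(); // readdir order is not guaranteed, so sort for stable output
}

// Demo on a throwaway directory: a.md, sub/b.md, and notes.txt (ignored).
const demoDir = mkdtempSync(join(tmpdir(), 'docs-'));
writeFileSync(join(demoDir, 'a.md'), '# A');
mkdirSync(join(demoDir, 'sub'));
writeFileSync(join(demoDir, 'sub', 'b.md'), '# B');
writeFileSync(join(demoDir, 'notes.txt'), 'not markdown');

const mdFiles = findMarkdownFiles(demoDir);
console.log(mdFiles);
```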

What chunks look like

Each chunk stored in the vector store has:

  • content: the text (e.g., "Items can be returned within 30 days...")
  • embedding: a 1536-dimension float array (the default output size of text-embedding-3-small)
  • metadata.source: where it came from
  • metadata.chunkIndex: the chunk's position within the original document
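One simple way to produce chunks of this shape is a fixed-size character window. The window advances by chunkSize minus chunkOverlap, so neighbouring chunks share chunkOverlap characters. This is a sketch under that assumption only. confused-ai's splitter may also respect sentence or paragraph boundaries. The embedding field is left out because it comes from the embedding provider. Chunk and chunkText are illustrative names.

```typescript
// Illustrative fixed-size character chunker; not confused-ai's implementation.
interface Chunk {
  content: string;
  metadata: { source: string; chunkIndex: number };
}

function chunkText(text: string, source: string, chunkSize: number, chunkOverlap: number): Chunk[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      content: text.slice(start, start + chunkSize),
      metadata: { source, chunkIndex: chunks.length },
    });
    if (start + chunkSize >= text.length) break; // this window already reached the end
  }
  return chunks;
}

// 1,200 characters with chunkSize 500 and chunkOverlap 50 → windows start at 0, 450, 900.
const doc = Array.from({ length: 1200 }, (_, i) => String.fromCharCode(97 + (i % 26))).join('');
const docChunks = chunkText(doc, 'refund-policy.txt', 500, 50);
console.log(docChunks.map((c) => [c.metadata.chunkIndex, c.content.length]));
```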

Persist to disk (no re-embedding on restart)

```ts
// Assumes `vectorStore` and `knowledge` from rag-agent.ts.
import { createStorage } from 'confused-ai/storage';

const storage = createStorage({ type: 'file', path: './data/vectors.json' });

// On startup: restore saved vectors if they exist, otherwise embed and save.
const saved = await storage.get('vectors');
if (saved) {
  await vectorStore.restore(saved);
  console.log('Restored vector store from disk');
} else {
  // First run: load your documents with knowledge.load(...) as in step 4, then save.
  await storage.set('vectors', await vectorStore.dump());
}
```
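Persisting works because a vector store's contents are plain data: each chunk's text, its embedding and its metadata. The sketch below shows the round trip with a hypothetical TinyVectorStore whose dump() and restore() methods mirror the names above. The file format confused-ai itself writes is not documented on this page, so treat this shape as an assumption. The sketch uses synchronous node:fs calls to stay short.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

interface StoredChunk {
  content: string;
  embedding: number[];
  metadata: Record<string, unknown>;
}

// Hypothetical store: dump() returns plain data, restore() loads it back.
class TinyVectorStore {
  private chunks: StoredChunk[] = [];

  add(chunk: StoredChunk): void {
    this.chunks.push(chunk);
  }

  count(): number {
    return this.chunks.length;
  }

  dump(): StoredChunk[] {
    // Copy so callers cannot mutate the store's internal arrays.
    return this.chunks.map((c) => ({ ...c, embedding: [...c.embedding], metadata: { ...c.metadata } }));
  }

  restore(saved: StoredChunk[]): void {
    this.chunks = saved.map((c) => ({ ...c }));
  }
}

const store = new TinyVectorStore();
store.add({
  content: 'Items can be returned within 30 days of purchase.',
  embedding: [0.1, 0.2, 0.3],
  metadata: { source: 'refund-policy.txt', chunkIndex: 0 },
});

// Save: write the dumped chunks as JSON.
const file = join(mkdtempSync(join(tmpdir(), 'vectors-')), 'vectors.json');
writeFileSync(file, JSON.stringify(store.dump()));

// Restore (e.g. in a new process): no embedding calls are needed.
const restored = new TinyVectorStore();
restored.restore(JSON.parse(readFileSync(file, 'utf-8')) as StoredChunk[]);
console.log(restored.count(), restored.dump()[0].embedding);
```

Finite floating-point numbers survive JSON.stringify and JSON.parse unchanged, so restored embeddings score exactly as the originals did. If the source documents change, however, the saved file is stale and has to be rebuilt.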


Released under the MIT License.