Build With Athar
CASE STUDY · /rag-chatbot

Multi-Tenant RAG Platform

A retrieval-augmented chat backend: each tenant indexes its own knowledge base, and end users get grounded answers through an embeddable widget.

ROLE
Solo Lead Engineer · Pipeline + Inference
TIMELINE
2025
TEAM
1 (solo)
STATUS
Production

A production RAG (Retrieval-Augmented Generation) service. Any tenant can point it at their own documentation, knowledge base, or website, and it serves an embeddable chat widget that answers user questions grounded in that content.

The system runs two cooperating pipelines. The indexing pipeline (offline, scheduled) crawls → chunks → embeds → upserts into MongoDB Atlas. The query pipeline (per message, <2s) embeds the question → vector-searches → assembles context → streams a GPT-4o-mini answer back through the widget.
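To make the split concrete, here is a minimal in-memory sketch. It assumes embeddings have already been computed. `InMemoryVectorIndex`, `upsert`, `search` and the `id` key are hypothetical names. Brute-force cosine similarity stands in for Atlas `$vectorSearch`, which by default runs an approximate nearest-neighbour search. The indexing side only writes and the query side only reads, so the index is the only state the two pipelines share.

```typescript
// One stored chunk: a trimmed-down version of the Lane A record.
interface ChunkRecord {
  id: string; // hypothetical stable key, e.g. page URL + chunk number
  text: string;
  url: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy stand-in for the vector index: the only state the two pipelines share.
class InMemoryVectorIndex {
  private records = new Map<string, ChunkRecord>();

  // Indexing pipeline (offline): insert, or replace an existing record with the same id.
  upsert(record: ChunkRecord): void {
    this.records.set(record.id, record);
  }

  // Query pipeline (per message): exact top-k by cosine similarity.
  search(query: number[], k: number): { record: ChunkRecord; score: number }[] {
    const scored: { record: ChunkRecord; score: number }[] = [];
    this.records.forEach((record) => scored.push({ record, score: cosine(query, record.embedding) }));
    return scored.sort((x, y) => y.score - x.score).slice(0, k);
  }

  get size(): number {
    return this.records.size;
  }
}

const index = new InMemoryVectorIndex();
// Offline: written ahead of time by the indexing pipeline.
index.upsert({ id: "pricing#0", text: "Pricing plans overview", url: "https://abc.com/pricing", embedding: [1, 0, 0] });
index.upsert({ id: "docs#0", text: "Widget installation guide", url: "https://abc.com/docs", embedding: [0, 1, 0] });
// A re-crawl of the same chunk replaces it instead of adding a duplicate.
index.upsert({ id: "pricing#0", text: "Pricing plans overview", url: "https://abc.com/pricing", embedding: [1, 0, 0] });

// Per message: the query pipeline only reads.
const hits = index.search([0.9, 0.1, 0], 1);
console.log(hits[0].record.url, index.size); // → https://abc.com/pricing 2
```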

Off-the-shelf chatbots (Intercom AI, generic GPT bubbles) are not grounded in private data and hallucinate confidently. Clients needed a chatbot that answers strictly from their own docs, cites its sources, and respects per-tenant isolation.

RAG chatbot widget · flow wireframe

How a crawled corpus becomes vectors in MongoDB Atlas, and how a widget question becomes a grounded OpenAI answer.

Lane A · Indexing / Crawl (run: offline / scheduled)

1 Seed URLs
abc.com sitemap + manual list of help / pricing / docs pages.
sitemap.xml · config.json
2 Crawler
Fetch pages, follow internal links, dedupe by URL hash.
Puppeteer · Cheerio · node-fetch
3 Clean & chunk
Strip nav/footer, extract main text, split into ~500-token chunks with a 50-token overlap.
readability · tiktoken
4 Embed chunks
Generate a 1536-dim vector per chunk; batch & retry on 429.
OpenAI text-embedding-3-small
5 Upsert to vector DB
{ text, embedding, url, title, crawledAt } → collection with a vector index.
MongoDB Atlas · $vectorSearch index
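Step 2's crawl-and-dedupe loop can be sketched as a breadth-first traversal. This is a sketch under assumptions. `crawl`, `normalizeUrl` and the injected `fetchLinks` are hypothetical names; in practice `fetchLinks` would wrap Puppeteer or Cheerio. The sketch dedupes on the normalized URL string itself. Hashing that string, as the case study describes, gives a fixed-size key with the same effect. The normalization rules (drop the fragment, drop a trailing slash) are also an assumption. Only links on the seed URLs' hosts are followed, and `maxPages` bounds a runaway crawl.

```typescript
// Hypothetical fetcher: returns the href values found on a page.
type FetchLinks = (url: string) => Promise<string[]>;

// Canonical form used as the dedupe key: no #fragment, no trailing slash on the path.
function normalizeUrl(raw: string, base?: string): string {
  const u = new URL(raw, base);
  u.hash = "";
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) u.pathname = u.pathname.slice(0, -1);
  return u.toString();
}

// Breadth-first crawl from the seeds, following same-host links and visiting each URL once.
async function crawl(seeds: string[], fetchLinks: FetchLinks, maxPages = 1000): Promise<string[]> {
  const seen = new Set<string>();
  const queue = seeds.map((s) => normalizeUrl(s));
  const hosts = new Set(queue.map((u) => new URL(u).host));
  const visited: string[] = [];
  while (queue.length > 0 && visited.length < maxPages) {
    const url = queue.shift()!;
    if (seen.has(url)) continue;
    seen.add(url);
    visited.push(url);
    for (const href of await fetchLinks(url)) {
      let next: string;
      try {
        next = normalizeUrl(href, url); // resolve relative links against the current page
      } catch {
        continue; // skip malformed hrefs
      }
      // mailto: and external links fail the host check and are dropped.
      if (hosts.has(new URL(next).host) && !seen.has(next)) queue.push(next);
    }
  }
  return visited;
}

// Usage with a fake three-page site (hypothetical link graph).
const site: Record<string, string[]> = {
  "https://abc.com/": ["/docs", "/pricing#plans", "https://other.com/x"],
  "https://abc.com/docs": ["/docs/", "/"],
  "https://abc.com/pricing": ["/docs", "mailto:help@abc.com"],
};
// Visits /, /docs and /pricing once each.
crawl(["https://abc.com/"], async (u) => site[u] ?? []).then((pages) => console.log(pages));
```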
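Step 3's "~500-token chunks with 50 overlap" is a sliding window over tokens. This minimal sketch assumes the page text has already been encoded to token IDs, for example with tiktoken as step 3 lists, and that each window is decoded back to text afterwards. `chunkTokens` is a hypothetical helper. Each window starts `size - overlap` tokens after the previous one, so neighbouring chunks share their boundary tokens and text near a cut appears in both.

```typescript
// Split a token sequence into fixed-size windows that overlap by `overlap` tokens.
function chunkTokens<T>(tokens: T[], size = 500, overlap = 50): T[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap; // how far each window advances
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    // Stop once a window reaches the end, so the tail is not emitted twice.
    if (start + size >= tokens.length) break;
  }
  return chunks;
}

// Usage with fake token IDs: 1,000 tokens give windows starting at 0, 450 and 900.
const ids = Array.from({ length: 1000 }, (_, i) => i);
const chunks = chunkTokens(ids);
console.log(chunks.map((c) => [c[0], c[c.length - 1]]));
```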
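Step 4's "batch & retry on 429" can be sketched as a generic helper. The sketch assumes the embedding call is wrapped in a function that throws an error carrying an HTTP `status`. `embedInBatches`, `withRetry`, the batch size and the backoff constants are hypothetical choices, not taken from the project. The OpenAI embeddings endpoint accepts an array of inputs, so batching cuts the request count. Exponential backoff with random jitter then spaces out retries after a rate-limit response. `sleep` is injected so the logic can run without real timers.

```typescript
type EmbedFn = (texts: string[]) => Promise<number[][]>;

interface RetryOptions {
  retries: number; // extra attempts after the first
  baseMs: number; // upper bound of the first backoff delay
  capMs: number; // maximum backoff delay
  sleep: (ms: number) => Promise<void>;
}

// "Full jitter" backoff: random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.random() * Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry only on HTTP 429 (rate limited); any other error is rethrown immediately.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = typeof err === "object" && err !== null ? (err as { status?: number }).status : undefined;
      if (status !== 429 || attempt >= opts.retries) throw err;
      await opts.sleep(backoffMs(attempt, opts.baseMs, opts.capMs));
    }
  }
}

// Embed texts batch by batch, keeping output vectors in the same order as the input.
async function embedInBatches(texts: string[], embed: EmbedFn, batchSize: number, opts: RetryOptions): Promise<number[][]> {
  const out: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    out.push(...(await withRetry(() => embed(batch), opts)));
  }
  return out;
}

// Usage with a fake embedder that is rate-limited on its first call.
// In production, sleep could be (ms) => new Promise<void>((r) => setTimeout(r, ms)).
let calls = 0;
const fakeEmbed: EmbedFn = async (texts) => {
  calls++;
  if (calls === 1) throw Object.assign(new Error("rate limited"), { status: 429 });
  return texts.map((t) => [t.length]);
};
embedInBatches(["a", "bb", "ccc"], fakeEmbed, 2, { retries: 3, baseMs: 500, capMs: 8000, sleep: async () => {} })
  .then((vectors) => console.log(vectors.length, calls)); // → 3 3
```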
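For step 5, an idempotent upsert keeps re-crawls from duplicating chunks. This minimal sketch assumes each chunk gets a deterministic key from its page and position. `chunkIndex` is hypothetical, as is `tenantId`: the `{ text, embedding, url, title, crawledAt }` shape in step 5 doesn't include it, but per-tenant isolation needs some such field. `toUpsertOps` only builds operation objects in the shape the MongoDB Node.js driver's `bulkWrite` accepts, so it runs without a database. The collection name in the comment is a placeholder. Chunks from a page that got shorter would still need separate cleanup.

```typescript
// Assumed document shape: the case study's fields plus tenantId and chunkIndex (both hypothetical).
interface ChunkDoc {
  tenantId: string; // assumption: the case study doesn't say how tenants are separated
  url: string;
  chunkIndex: number; // position of the chunk within its page
  title: string;
  text: string;
  embedding: number[]; // 1536 floats from text-embedding-3-small
  crawledAt: Date;
}

// One bulkWrite operation per chunk, keyed on (tenantId, url, chunkIndex), so re-crawling
// a page overwrites its chunks instead of inserting duplicates. On insert, MongoDB copies
// the filter's equality fields into the new document, so they need not be repeated in $set.
function toUpsertOps(docs: ChunkDoc[]) {
  return docs.map((d) => ({
    updateOne: {
      filter: { tenantId: d.tenantId, url: d.url, chunkIndex: d.chunkIndex },
      update: { $set: { title: d.title, text: d.text, embedding: d.embedding, crawledAt: d.crawledAt } },
      upsert: true,
    },
  }));
}

// With the official driver this would be passed to something like:
//   await db.collection("chunks").bulkWrite(toUpsertOps(docs), { ordered: false });
const ops = toUpsertOps([
  {
    tenantId: "abc",
    url: "https://abc.com/pricing",
    chunkIndex: 0,
    title: "Pricing",
    text: "Pricing plans overview",
    embedding: [0.1, 0.2],
    crawledAt: new Date(0),
  },
]);
console.log(JSON.stringify(ops[0].updateOne.filter)); // → {"tenantId":"abc","url":"https://abc.com/pricing","chunkIndex":0}
```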

Lane B · Query / Chat (run: per message, <2s)

1 Widget
User types a question in the floating iframe widget on abc.com.
iframe · postMessage · sessionStorage
2 POST /api/ask
Express endpoint; rate-limit, sanitize, attach userId & session.
Express · helmet · express-rate-limit
3 Embed question
Same model as ingest. 1536-dim query vector.
OpenAI text-embedding-3-small
4 Vector search
$vectorSearch → top-K (k=5) chunks by cosine similarity, score filter >0.75.
Atlas $vectorSearch · k=5
5 LLM answer
Build prompt = system rules + retrieved chunks + user question. Stream to widget.
OpenAI gpt-4o-mini · SSE
How to read it: two timelines that meet only at the Atlas vector index. The top lane is offline indexing; the bottom lane is per-message retrieval. The slow, batchy work is done before any user is waiting.
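Lane B's vector-search step maps onto an aggregation pipeline. Several parts of this sketch are assumptions: the index name `vector_index`, the `numCandidates` ratio, and the `tenantId` pre-filter, which must also be declared as a `filter` field in the index definition. The `k = 5` limit and the 0.75 cutoff come from the case study. Per the Atlas documentation, cosine results are reported as a `vectorSearchScore` normalized to (1 + cosine) / 2. A 0.75 threshold on that score therefore corresponds to a raw cosine of 0.5, and the case study doesn't say which of the two its threshold refers to.

```typescript
// Atlas Vector Search index definition (created in the Atlas UI or via the driver's createSearchIndex).
const vectorIndexDefinition = {
  fields: [
    { type: "vector", path: "embedding", numDimensions: 1536, similarity: "cosine" },
    { type: "filter", path: "tenantId" }, // assumption: enables per-tenant pre-filtering
  ],
};

// Per-message pipeline: top-k chunks for one tenant, then drop anything under the score threshold.
function buildSearchPipeline(queryVector: number[], tenantId: string, k = 5, minScore = 0.75) {
  return [
    {
      $vectorSearch: {
        index: "vector_index", // hypothetical index name
        path: "embedding",
        queryVector,
        numCandidates: 20 * k, // candidates the approximate search considers; hypothetical ratio
        limit: k,
        filter: { tenantId: { $eq: tenantId } },
      },
    },
    { $project: { _id: 0, text: 1, url: 1, title: 1, score: { $meta: "vectorSearchScore" } } },
    { $match: { score: { $gt: minScore } } },
  ];
}

// With the driver: await db.collection("chunks").aggregate(buildSearchPipeline(vec, "abc")).toArray();
const pipeline = buildSearchPipeline(Array.from({ length: 1536 }, () => 0.01), "abc");
console.log(pipeline.map((stage) => Object.keys(stage)[0])); // stage names: $vectorSearch, $project, $match
```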
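The LLM step's "system rules + retrieved chunks + user question" can be sketched as a function that returns chat messages in the role/content shape the OpenAI Chat Completions API takes. The case study doesn't publish its prompt. The rule wording, the numbered `[n]` source labels used for citations, and `buildMessages` itself are all hypothetical.

```typescript
interface RetrievedChunk {
  text: string;
  url: string;
  title: string;
  score: number;
}

type ChatMessage = { role: "system" | "user"; content: string };

// Hypothetical grounding rules.
const SYSTEM_RULES = [
  "Answer only from the numbered sources provided.",
  "Cite sources inline as [n].",
  "If the sources do not contain the answer, say you don't know.",
].join("\n");

// Assemble system rules, numbered context and the user's question into chat messages.
function buildMessages(question: string, chunks: RetrievedChunk[]): ChatMessage[] {
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.title} (${c.url})\n${c.text}`)
    .join("\n\n");
  return [
    { role: "system", content: SYSTEM_RULES },
    { role: "user", content: `Sources:\n${context}\n\nQuestion: ${question}` },
  ];
}

const messages = buildMessages("How do I install the widget?", [
  { text: "Add the embed snippet to every page.", url: "https://abc.com/docs", title: "Installation", score: 0.91 },
]);
console.log(messages[1].content);
// With the official SDK these would be sent with streaming enabled, e.g.
//   openai.chat.completions.create({ model: "gpt-4o-mini", messages, stream: true })
```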
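Streaming the answer to the widget over SSE means writing each token delta as an event frame. The sketch below covers only the framing, following the `text/event-stream` format: each payload line gets a `data: ` prefix, and a blank line ends the event. `toSseFrame` and the `[DONE]`/`end` end-of-stream marker are hypothetical. In an Express handler, these strings would be passed to `res.write` after setting `Content-Type: text/event-stream`. JSON-encoding each delta keeps newlines inside a token from splitting the frame. Because the question arrives via `POST /api/ask`, the widget would read the stream with `fetch` rather than `EventSource`, which only issues GET requests.

```typescript
// Format one Server-Sent Events frame: optional event name, "data: " per payload line, blank line to end.
function toSseFrame(data: string, event?: string): string {
  const lines = data.split(/\r\n|\r|\n/).map((line) => `data: ${line}`);
  if (event) lines.unshift(`event: ${event}`);
  return lines.join("\n") + "\n\n";
}

// Hypothetical token deltas as they might arrive from the model.
const deltas = ["Add the", " embed snippet", "\nto every page."];
const frames = deltas.map((d) => toSseFrame(JSON.stringify({ delta: d })));
frames.push(toSseFrame("[DONE]", "end")); // hypothetical end-of-stream marker
console.log(frames.join(""));
```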
Node.js · TypeScript · Express · OpenAI GPT-4o-mini · OpenAI text-embedding-3 · MongoDB Atlas Vector Search · Puppeteer · Cheerio · SSE · Web Components
Sub-2-second grounded answers across tested corpora. Zero hallucination on questions where the answer exists in the source data; clean refusal pattern when it doesn't. Indexing pipeline scales to tens of thousands of pages per tenant.
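One way to get a clean refusal is to skip the model entirely when no retrieved chunk clears the score threshold, and let the system rules handle weaker partial matches. The sketch assumes that approach. `answerOrRefuse`, the refusal wording and the injected `generate` function are hypothetical. The `> 0.75` cutoff mirrors the case study's score filter. Returning the surviving chunks' URLs gives the widget something to show as citations. Short-circuiting also saves an LLM round-trip on questions the corpus can't answer.

```typescript
interface ScoredChunk {
  text: string;
  url: string;
  score: number;
}

type Answer =
  | { kind: "answer"; text: string; sources: string[] }
  | { kind: "refusal"; text: string };

// Hypothetical wording shown to end users when nothing relevant is retrieved.
const REFUSAL = "I couldn't find that in this site's documentation.";

// Refuse without calling the LLM when no chunk clears the threshold; otherwise
// generate from the surviving chunks and return their unique URLs as citations.
async function answerOrRefuse(
  question: string,
  chunks: ScoredChunk[],
  generate: (question: string, context: ScoredChunk[]) => Promise<string>,
  minScore = 0.75,
): Promise<Answer> {
  const grounded = chunks.filter((c) => c.score > minScore);
  if (grounded.length === 0) return { kind: "refusal", text: REFUSAL };
  const text = await generate(question, grounded);
  const sources = Array.from(new Set(grounded.map((c) => c.url)));
  return { kind: "answer", text, sources };
}

// Usage with a fake generator: the only chunk scores below the threshold, so this refuses.
const fakeGenerate = async (_q: string, ctx: ScoredChunk[]) => `Based on ${ctx.length} source(s).`;
answerOrRefuse("What is the refund window?", [{ text: "Shipping options overview", url: "https://abc.com/faq", score: 0.62 }], fakeGenerate)
  .then((a) => console.log(a.kind)); // → refusal
```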
