Project Overview
NCERTGPT is an in-progress RAG-based Q&A system for Class 12 NCERT textbooks. It is meant for students who want to query textbook content directly, revise concepts faster, and avoid answers that drift away from the actual source.
I am building the RAG pipeline, retrieval flow, and query layer.
A RAG-based study assistant for Class 12 NCERT material, built around textbook extraction, chunking, embeddings, retrieval, and source-grounded answers.
Project Type
RAG-based textbook Q&A system
Stack
Python, RAG, vector database, embeddings, NLP
Pipeline
PDF extraction, chunking, vector retrieval, query layer
Timeline
2026 (in progress)
Case Study
02
Students usually move between PDFs, notes, random search results, and AI chat windows. That workflow is slow and unreliable, because an answer can sound confident while missing the textbook context. Confidence without retrieval is just acting.
03
The planned flow starts with PDF ingestion and text extraction, then splits the text into manageable chunks. Those chunks are converted into vector embeddings and stored for semantic retrieval. A query layer searches the relevant chunks first, then passes the retrieved context into the answer-generation step.
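A minimal sketch of that flow, assuming plain Python with no external dependencies. The hashed bag-of-words `embed` function and the `InMemoryIndex` class are hypothetical stand-ins for a real embedding model and vector database, and the chunk size and overlap values are illustrative rather than tuned.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # hypothetical vector size for the toy hashed embedding


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryIndex:
    """Stand-in for a vector database: keeps (vector, chunk) pairs in a list."""

    def __init__(self) -> None:
        self.items: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        self.items.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), chunk) for v, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_context(hits: list[tuple[float, str]]) -> str:
    """Join retrieved chunks into the context handed to answer generation."""
    return "\n\n".join(chunk for _, chunk in hits)


if __name__ == "__main__":
    index = InMemoryIndex()
    for chunk in [
        "Ohm's law relates current, voltage and resistance in a conductor.",
        "Photosynthesis converts light energy into chemical energy in plants.",
    ]:
        index.add(chunk)
    print(build_context(index.search("What does Ohm's law say about current?", k=1)))
```

Because retrieval only touches `embed` and `search`, a real embedding model and vector store could replace the stand-ins without changing how context reaches the answer-generation step.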
The system is being designed around retrieval quality, chunk boundaries, and prompt discipline. For educational use, the model should answer from the selected context and make uncertainty clear instead of inventing a polished answer.
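One way to enforce that discipline, sketched under assumptions: retrieved chunks arrive as scored `Hit` records, `generate` is a placeholder for whatever model call the project ends up using, and the `min_score` threshold of 0.35 is an arbitrary value that would need tuning against real queries. When no chunk clears the threshold, the sketch returns a fixed "not found" message instead of calling the model at all.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NOT_FOUND = "I could not find this in the provided textbook excerpts."


@dataclass
class Hit:
    score: float  # retriever similarity; assumed to be in [0, 1], higher is better
    text: str     # chunk text
    source: str   # hypothetical label, e.g. "Physics Part I, Ch. 3"


def build_grounded_prompt(question: str, hits: list[Hit],
                          min_score: float = 0.35, max_chunks: int = 4) -> Optional[str]:
    """Build a context-only prompt, or return None when no chunk clears min_score."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    kept = [h for h in ranked if h.score >= min_score][:max_chunks]
    if not kept:
        return None
    excerpts = "\n\n".join(
        f"[{i}] ({h.source})\n{h.text}" for i, h in enumerate(kept, start=1)
    )
    return (
        "Answer using ONLY the numbered textbook excerpts below and cite them like [1].\n"
        f'If the excerpts do not contain the answer, reply exactly: "{NOT_FOUND}"\n\n'
        f"Excerpts:\n{excerpts}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, hits: list[Hit], generate: Callable[[str], str]) -> str:
    """`generate` is a placeholder for the model call."""
    prompt = build_grounded_prompt(question, hits)
    if prompt is None:
        return NOT_FOUND  # weak retrieval: admit it instead of letting the model improvise
    return generate(prompt)
```

Checking the threshold before the model call means a weak retrieval never reaches generation, so the uncertainty message does not depend on the model choosing to follow the instruction.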
04
The project focuses on study usefulness rather than chatbot theatrics.
05
The main challenge is keeping retrieved chunks relevant enough for accurate answers. Textbook PDFs can produce noisy extraction, bad chunk boundaries, and context gaps. Once bad context enters the prompt, the model starts improvising makeshift answers to cover the gaps, and that is exactly what the system should avoid.
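A hedged sketch of cleanup before chunking, assuming the PDF text has already been extracted as plain lines. The rules are illustrative: dropping bare page numbers and known running headers, rejoining words hyphenated across line breaks, then packing whole sentences into chunks so no chunk ends mid-sentence. The hyphen rule also merges genuine compounds split at a line end, and the simple sentence splitter will break on abbreviations such as "e.g.".

```python
import re
from collections.abc import Collection

PAGE_NUMBER = re.compile(r"^\d{1,4}$")       # a line that is only a page number
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # whitespace following ., ! or ?


def clean_extracted_text(raw: str, running_headers: Collection[str] = ()) -> str:
    """Remove common PDF-extraction noise and return one continuous string."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, bare page numbers, and repeated running headers.
        if not line or PAGE_NUMBER.match(line) or line in running_headers:
            continue
        kept.append(line)
    text = "\n".join(kept)
    # Rejoin words hyphenated across a line break ("cur-\nrent" -> "current").
    # Trade-off: this also merges real compounds that happen to break at a hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Remaining line breaks are layout, not meaning, so collapse them to spaces.
    return re.sub(r"\s+", " ", text).strip()


def chunk_by_sentence(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars becomes its own chunk rather
    than being cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for sentence in SENTENCE_END.split(text):
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```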
06
I am treating retrieval as the core system, not a side feature. Chunk sizing, metadata, and query routing matter more than making the chat screen look impressive. The model layer is useful only after the context layer is reliable.
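A small sketch of how metadata and query routing could sit in front of ranking, assuming each chunk carries subject, chapter, and page fields. The `SUBJECT_KEYWORDS` routing table is hypothetical, and word overlap stands in for vector similarity. The idea it illustrates is that filtering by metadata narrows the candidate pool before any scoring happens, and the same fields can produce a source label for the answer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    text: str
    subject: str  # e.g. "physics"
    chapter: str
    page: int


# Hypothetical routing table; a real router might use a classifier or embeddings instead.
SUBJECT_KEYWORDS = {
    "physics": {"current", "voltage", "resistance", "ohm", "lens", "magnetic"},
    "chemistry": {"electrolyte", "isomer", "polymer", "reaction"},
    "biology": {"photosynthesis", "dna", "gene", "inheritance", "enzyme"},
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def route(query: str) -> Optional[str]:
    """Pick the subject whose keywords overlap the query most; None if none match."""
    q = tokens(query)
    best, best_hits = None, 0
    for subject, keywords in SUBJECT_KEYWORDS.items():
        hits = len(q & keywords)
        if hits > best_hits:
            best, best_hits = subject, hits
    return best


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Filter by routed subject, then rank by word overlap (stand-in for vector similarity)."""
    subject = route(query)
    pool = [c for c in chunks if subject is None or c.subject == subject]
    q = tokens(query)
    return sorted(pool, key=lambda c: len(q & tokens(c.text)), reverse=True)[:k]


def cite(chunk: Chunk) -> str:
    """Format chunk metadata as a source label for the final answer."""
    return f"{chunk.subject.title()}, {chunk.chapter}, p. {chunk.page}"
```

When routing finds no match, the sketch falls back to searching every chunk rather than returning nothing, which keeps a weak router from hiding relevant material.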
07
The current direction is a source-aware study assistant that can answer from textbook material and support focused revision. The project is still evolving, but the architecture is grounded in RAG fundamentals instead of vague AI claims.
Key Capabilities
Built around Class 12 NCERT textbook Q&A instead of generic chatbot responses.
Uses a pipeline that runs from PDF to text, text to chunks, chunks to embeddings, and embeddings to retrieval.
Focuses on RAG fundamentals, vector search, NLP query handling, and source-grounded answers.