AI + Data + RAG Engineer

Building Reliable
AI Systems with
RAG, Data & LLMs

From raw data to intelligent insights — using production-grade pipelines, semantic retrieval, and thoughtfully integrated language models.

View Projects Ask Me Anything ✦
Python RAG Pipelines LLM Integration MERN Stack Vector Databases Data Engineering
68%
Hallucination Reduced
91%
Recall@10 Score
60%
LLM Cost Saved
4+
Production Systems
Live System

Interactive Terminal

About
Harshal
Shilwant
AI Systems Engineer
Experience: 2 Years
Domain: AI/ML Engineering
Previously: TechnoNexis
Focus: RAG · LLMs · Data Pipelines

I don't just integrate APIs — I architect systems that make AI actually work in the real world. That means obsessing over data quality before a single prompt is written, understanding retrieval semantics before vector embeddings are configured, and thinking about failure modes before deployment.

At TechnoNexis, I built end-to-end RAG pipelines — ingesting messy, real-world data from PDFs and spreadsheets, cleaning it, chunking it intelligently, embedding it, and serving it through LLMs that returned accurate, grounded answers.

My philosophy: garbage in, garbage out. Before any LLM touches data, it must be clean, structured, and semantically meaningful.

🧬
Data-First Thinking
Every AI system starts with data quality — cleaning and schema design before any model work.
🎯
Grounded Outputs
Reducing hallucination through retrieval design, not prompt hacks.
⚙️
System Design
APIs, pipelines, and services built for scale — not just demos.
📐
Eval-Driven Dev
Output quality measured and improved systematically, not by feel.
Featured Work

Projects That Ship

Real systems solving real problems. Each project reflects a full engineering loop — from data to deployment.

01 / 04 · RAG · NLP
AI Market Research RAG System
End-to-end pipeline ingesting PDF & Excel research reports. Chunks, embeds, and retrieves context for LLM-powered Q&A with drastically reduced hallucination.
Python · LangChain · Pinecone · OpenAI API · FastAPI · PyMuPDF
68%
Hallucination Reduction
~200ms
Query Latency
System Highlights
  • Semantic chunking with overlap strategy to preserve context across section boundaries
  • Hybrid retrieval: BM25 sparse + dense vector search, reranked via Cohere
  • Per-query citation extraction — every answer grounded to source document & page
  • Async FastAPI backend with request queue and rate-limiting for multi-user load
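The hybrid-retrieval highlight above can be sketched in pure Python. This is a toy: `bm25_scores` stands in for a real sparse index, and reciprocal-rank fusion stands in for the dense search plus Cohere reranking the actual system uses (all function names and parameters here are illustrative, not project code):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Toy BM25 over whitespace tokens (the sparse side of hybrid retrieval)."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    N = len(docs)
    q_terms = query.lower().split()
    # document frequency for each query term
    df = {t: sum(1 for doc in tokenized if t in doc) for t in q_terms}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

def rrf_fuse(sparse_rank, dense_rank, k=60):
    """Reciprocal-rank fusion: merge sparse and dense ranked doc-id lists."""
    fused = Counter()
    for rank_list in (sparse_rank, dense_rank):
        for rank, doc_id in enumerate(rank_list):
            fused[doc_id] += 1.0 / (k + rank + 1)
    return [doc_id for doc_id, _ in fused.most_common()]
```

In the production pipeline the fused candidates would then pass through a cross-encoder reranker before reaching the LLM.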
02 / 04 · Analytics · AI
Excel Analytics + AI Insight Platform
Upload any structured Excel file and receive auto-generated visualizations, data summaries, and natural-language insights. AI detects trends, outliers, and business signals automatically.
React · Node.js · Python · Pandas · Recharts · GPT-4o
3s
Avg. Insight Time
12+
Chart Types
System Highlights
  • Schema inference engine auto-detects numeric, categorical, and temporal columns
  • AI-generated chart recommendations based on data shape and column types
  • LLM summarizes each chart with business-level language, not technical output
  • MERN full-stack with file streaming — handles Excel files up to 50MB
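A minimal sketch of the schema-inference step, using plain lists of dicts in place of the Pandas DataFrames the platform actually works with; `infer_schema`, the thresholds, and the date formats are illustrative assumptions:

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")

def _is_date(value):
    """Return True if the value parses under one of the known date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except (ValueError, TypeError):
            pass
    return False

def infer_schema(rows, cat_threshold=0.8):
    """Classify each column as numeric, temporal, categorical, or text.

    `rows` is a list of dicts (one per spreadsheet row). A column is
    temporal if >90% of its values parse as dates, and categorical if
    its distinct-value ratio is below `cat_threshold`.
    """
    schema = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows if r[col] is not None]
        if all(isinstance(v, (int, float)) for v in values):
            schema[col] = "numeric"
        elif sum(_is_date(v) for v in values) / len(values) > 0.9:
            schema[col] = "temporal"
        elif len(set(values)) / len(values) < cat_threshold:
            schema[col] = "categorical"
        else:
            schema[col] = "text"
    return schema
```

The inferred types then drive which chart recommendations and summaries the AI layer generates.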
03 / 04 · Search · Vector DB
Semantic Search Engine
Vector database-backed retrieval system replacing keyword search. Uses dense embeddings to match intent, not just vocabulary — significantly improving result relevance for domain-specific corpora.
Sentence Transformers · Qdrant · FastAPI · Docker · MongoDB
91%
Recall@10
40ms
P99 Latency
System Highlights
  • Fine-tuned bi-encoder on domain-specific query-document pairs for higher precision
  • HNSW index in Qdrant for sub-50ms approximate nearest neighbor search
  • Faceted filtering: combine semantic score with metadata filters in one query
  • Dockerized deployment with horizontal scaling support via load balancer
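The faceted-filtering highlight can be illustrated with brute-force cosine similarity standing in for Qdrant's HNSW index; `faceted_search` and the payload layout are hypothetical, but they mirror the must-match filter semantics of a vector store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def faceted_search(query_vec, points, filters, top_k=3):
    """Combine a semantic score with metadata filters in one query.

    `points` is a list of dicts: {"id", "vector", "payload"}; `filters`
    maps payload fields to required values (must-match, Qdrant-style).
    """
    candidates = [
        p for p in points
        if all(p["payload"].get(k) == v for k, v in filters.items())
    ]
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:top_k]]
```

In the real system the filter is pushed down into the vector store so the index, not Python, prunes candidates.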
04 / 04 · Backend · LLM Ops
LLM API Backend Architecture
Production-grade backend for serving multiple LLM providers behind a unified API. Includes model routing, fallback logic, cost tracking, prompt versioning, and caching.
Node.js · Express · Redis · PostgreSQL · OpenAI · Anthropic
60%
Cost Reduction (cache)
99.9%
Uptime (fallback)
System Highlights
  • Unified API gateway — swap providers (OpenAI → Anthropic → Mistral) via config
  • Semantic response caching with Redis: identical-intent queries hit cache not API
  • Prompt version registry — rollback, A/B test, and track prompt performance
  • Per-tenant cost tracking and token budget enforcement in real-time
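A minimal sketch of semantic response caching, with an in-memory list standing in for Redis and precomputed embeddings standing in for a real embedding model (the class name and threshold are illustrative assumptions):

```python
import math

class SemanticCache:
    """Toy semantic cache: identical-intent queries (cosine similarity
    above a threshold) return the cached response instead of calling the
    LLM. In production, Redis with a vector index plays this role."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, response) pairs

    @staticmethod
    def _cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def get(self, query_emb):
        """Return the cached response for the nearest entry, or None on miss."""
        best = max(self.entries, key=lambda e: self._cos(query_emb, e[0]), default=None)
        if best is not None and self._cos(query_emb, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query_emb, response):
        self.entries.append((query_emb, response))
```

The threshold is the key tuning knob: too low and paraphrases get wrong answers, too high and the cache hit rate collapses.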
Live Demo

RAG Pipeline Simulator

Watch the full pipeline execute in real-time. Enter any question and see how a RAG system would process and answer it step by step.

📥
Query In
🔢
Embed
🔍
Retrieve
🏆
Rerank
📝
Prompt
🤖
LLM
Answer
System Design

Architecture Thinking

These are the core systems I design and reason about. Clean flows, defined responsibilities, observable outputs.

Pipeline 01
RAG Pipeline — Ingestion to Answer
📄 Raw Document
🧹 Extract & Clean
✂️ Chunking
🔢 Embeddings
🗄️ Vector Store
🔍 Retrieval+Rerank
🤖 LLM+Context
✅ Grounded Answer
The key design decision: chunking strategy determines retrieval quality more than model choice. Semantic chunking with sliding overlap (128-token overlap on 512-token chunks) preserves cross-boundary context. Hybrid BM25+dense retrieval, reranked before LLM injection.
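The sliding-overlap scheme above (512-token chunks, 128-token overlap) can be sketched in a few lines; `chunk_tokens` is illustrative and assumes the text is already tokenized:

```python
def chunk_tokens(tokens, chunk_size=512, overlap=128):
    """Sliding-window chunking: fixed-size chunks, each sharing `overlap`
    tokens with its predecessor so context survives chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

Semantic chunking goes one step further by snapping chunk boundaries to section or sentence breaks rather than fixed offsets.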
Pipeline 02
Data Processing Flow — Raw to Production-Ready
📊 Excel/CSV/PDF
🔎 Schema Inference
🧹 Dedup+Nulls
🔧 Type Normalize
✅ Validation
🗃️ Clean Store
🚀 Downstream AI
Data quality gates catch problems before they propagate. Schema inference auto-detects column types; validation rules flag statistical anomalies (outliers beyond 3σ) and structural issues before any AI system touches the data.
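The 3σ gate might look like this minimal sketch (the function name and interface are assumptions, not the production code):

```python
import statistics

def flag_outliers(values, sigma=3.0):
    """Flag values beyond `sigma` standard deviations from the mean,
    a simple statistical gate run before data reaches any AI system."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant column: nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > sigma]
```

Flagged rows would be quarantined for review rather than silently dropped, so the gate never loses data.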
Pipeline 03
LLM Request Flow — Optimized for Cost & Reliability
📥 API Request
🔐 Auth+Rate Limit
💾 Semantic Cache?
📝 Prompt Builder
🤖 Model Router
⚡ LLM Provider
📊 Log+Track Cost
📤 Response
The model router selects provider based on task type, cost budget, and latency SLA. Cache hit rate ~60% achieved by embedding incoming queries and checking cosine similarity against recent responses — not exact string match. Fallback chain: primary → secondary → queue.
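The fallback chain (primary → secondary → queue) can be sketched as a loop over providers in priority order; the provider-dict shape and `route_with_fallback` are illustrative, not the actual backend:

```python
def route_with_fallback(prompt, providers, budget_per_call=0.01):
    """Try providers in priority order: skip any over the cost budget,
    fall through to the next on error. In production the final fallback
    is a retry queue rather than an exception.

    Each provider is a dict {"name", "cost", "call"}, where `call` is a
    function(prompt) -> str that may raise on provider failure.
    """
    errors = {}
    for p in providers:
        if p["cost"] > budget_per_call:
            errors[p["name"]] = "over budget"
            continue
        try:
            return p["name"], p["call"](prompt)
        except Exception as exc:
            errors[p["name"]] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

Routing on task type and latency SLA would add extra predicates to the same loop; the structure stays identical.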
Capabilities

Skills & Tools

AI / LLM
RAG Pipelines
LangChain
OpenAI API
Prompt Engineering
LLM Evaluation
Data Engineering
Python / Pandas
Data Cleaning
ETL Pipelines
Vector DBs (Pinecone, Qdrant)
MongoDB
Backend
Node.js / Express
FastAPI
REST API Design
Redis (Caching)
Docker
Frontend
React.js
JavaScript (ES6+)
Tailwind CSS
Data Visualization
Tools & Ecosystem
Git & GitHub · VS Code · Postman · Jupyter · Vercel · Render · PyMuPDF · Cohere Reranker · Sentence Transformers · HuggingFace · HNSW Index · PostgreSQL
AI-Powered

Ask Harshal's AI

This AI knows everything about my work, skills, and approach. Ask it anything — it's powered by Harsh and trained on my portfolio context.

🤖
Harshal's Portfolio AI
Online · Powered by Harsh
Harsh-4-20250514
Hey! I'm Harshal's portfolio AI. I know all about his RAG systems, data engineering work, architecture patterns, and how he thinks about building AI. What would you like to know? 👋
Just now
Engineering Perspective

How I Think

The mental models and design principles I apply when building AI systems.

01
How do I design a RAG system from scratch?
I start with the query, not the documents. What does a "good answer" look like? That drives everything — chunking size, retrieval strategy, and how much context the LLM actually needs.
1. Define answer quality first (what separates a good response from a bad one?)
2. Audit source documents — types, sizes, structure
3. Design chunking strategy around semantic boundaries
4. Choose retrieval type based on query diversity
5. Add reranking — it reliably improves precision for minimal cost
02
How do I reduce hallucination in LLM outputs?
Hallucination is mostly a retrieval problem, not a prompting problem. If the right context doesn't reach the LLM, no prompt will fix it. I focus on retrieval precision first.
Improve chunk quality — semantic coherence over arbitrary splits
Add citation constraints in system prompt — force source grounding
Use reranking to filter irrelevant retrieved context
Measure faithfulness score (RAGAs) on every release
03
How do I evaluate LLM output quality systematically?
You can't improve what you don't measure. I build eval pipelines that run on every code change — not manual vibe-checking before release.
Build a golden dataset of 50-100 query-answer pairs per domain
Track: faithfulness, answer relevance, context precision (RAGAs)
Use LLM-as-judge for semantic similarity scoring
Alert on metric regression — treat evals like unit tests
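Treating evals like unit tests can be sketched as a regression gate over a golden dataset; `score_fn` is a placeholder for an LLM-as-judge or RAGAs-style faithfulness metric, and every name here is illustrative:

```python
def run_eval(golden_set, answer_fn, score_fn, baseline, tolerance=0.02):
    """Run the golden dataset through the system and gate on regression.

    `golden_set` is a list of {"query", "expected"} dicts; `answer_fn`
    is the system under test; `score_fn(answer, expected)` returns a
    quality score in [0.0, 1.0]. The run fails if the mean score drops
    more than `tolerance` below the recorded baseline.
    """
    scores = [score_fn(answer_fn(ex["query"]), ex["expected"]) for ex in golden_set]
    mean_score = sum(scores) / len(scores)
    return {"mean_score": mean_score, "passed": mean_score >= baseline - tolerance}
```

Wired into CI, a failed gate blocks the release exactly the way a failing unit test would.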
Get in Touch

Let's Build
Something Intelligent

Have an AI system to build? Let's talk architecture first.

Available for AI & Data Engineering Projects