The Decision Framework: Prompting vs RAG vs Fine-Tuning
The most common question I get: which approach for which task?

| Approach | When to use | When to avoid |
|---|---|---|
| Prompting | General tasks, moderate volume, changing requirements | High-volume narrow tasks where consistency is critical |
| RAG | Knowledge-intensive tasks, docs that change, factual grounding needed | Tasks where retrieval quality can’t be controlled |
| Fine-tuning | High-volume narrow tasks, domain-specific patterns, cost reduction at scale | When labeled data doesn’t exist or task changes often |
The Tools I Use and Why
| Task | Tool | Why |
|---|---|---|
| Embeddings (English) | OpenAI text-embedding-3-small | Good quality/cost ratio; 1536 dimensions; widely supported |
| Embeddings (multilingual) | multilingual-e5-large | Better cross-lingual performance than OpenAI for Telugu |
| Reranking | Cohere Rerank | Significant retrieval quality improvement; worth the latency |
| Vector storage (<1M vectors) | pgvector (Supabase) | Avoids running a separate service; SQL joins work |
| Vector storage (>1M vectors) | Pinecone | Better performance at scale; serverless option |
| RAG orchestration | Vercel AI SDK + LlamaIndex | AI SDK for streaming UI; LlamaIndex for retrieval pipelines |
| Audio → text | Whisper large-v3 | Best accuracy on bilingual Telugu+English content |
| Bilingual processing | Custom normalizer + HuggingFace | Telugu isn’t covered well by off-the-shelf pipelines |
| Text classification | Claude / fine-tuned model | Depends on volume; API for low volume, fine-tuned for high |
Building a Production RAG Pipeline
The standard RAG tutorial misses the steps that actually determine quality. Here's what the real pipeline looks like:
Step 1: Chunking Strategy
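A minimal sketch of structure-aware chunking: split on sentence boundaries and carry a one-sentence overlap between chunks instead of cutting at raw character offsets. The size and overlap values are illustrative defaults, not tuned numbers:

```python
import re

def chunk_text(text: str, max_chars: int = 800, overlap: int = 1) -> list[str]:
    """Split on sentence boundaries, never mid-sentence.

    overlap = number of trailing sentences repeated at the start of the
    next chunk, so retrieved chunks keep local context.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, size = [], [], 0
    for sent in sentences:
        if current and size + len(sent) > max_chars:
            chunks.append(" ".join(current))
            # carry the last `overlap` sentences into the next chunk
            current = current[-overlap:] if overlap else []
            size = sum(len(s) for s in current)
        current.append(sent)
        size += len(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For real documents you would split on paragraph and heading boundaries first, then apply this within each section.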
This is the step that makes or breaks retrieval quality. Most tutorials chunk at arbitrary character limits, which produces terrible results.
Step 2: Embedding and Storage
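With pgvector (as in the Supabase setup above), storage can look like the following sketch. The table and index names are assumptions; the dimension matches text-embedding-3-small (1536), and the embedding call is left commented since it needs an API key:

```python
# Hypothetical pgvector layout for chunk storage.
SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding VECTOR(1536)
);
CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

def to_pgvector(values) -> str:
    # pgvector accepts a '[v1,v2,...]' text literal
    return "[" + ",".join(f"{v:.6f}" for v in values) + "]"

# Embedding + insert (requires OPENAI_API_KEY and a psycopg cursor `cur`):
# from openai import OpenAI
# emb = OpenAI().embeddings.create(model="text-embedding-3-small",
#                                  input=chunk).data[0].embedding
# cur.execute("INSERT INTO chunks (content, embedding) VALUES (%s, %s)",
#             (chunk, to_pgvector(emb)))
# Query-time nearest neighbours: ORDER BY embedding <=> %s LIMIT k
```

The HNSW index with `vector_cosine_ops` is what makes the `<=>` cosine-distance search fast at the sub-million scale where pgvector fits.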
Step 3: Retrieval with Reranking
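Retrieval becomes two-stage: pull a wide candidate set by vector similarity, then rerank it with a stronger model. A runnable sketch with a toy term-overlap scorer standing in for Cohere Rerank (in production, `score` would call Cohere's rerank endpoint instead):

```python
def rerank(query: str, candidates: list[str], top_n: int = 3,
           score=None) -> list[str]:
    """Second-stage reranking over first-stage vector-search candidates."""
    if score is None:
        # Stand-in scorer: query-term overlap. Swap in a cross-encoder
        # or the Cohere Rerank API for real quality gains.
        def score(q, d):
            q_terms = set(q.lower().split())
            return len(q_terms & set(d.lower().split())) / max(len(q_terms), 1)
    ranked = sorted(candidates, key=lambda d: score(query, d), reverse=True)
    return ranked[:top_n]
```

The pattern that matters is over-fetching (say, 20–50 candidates from the vector store) and letting the reranker pick the final handful.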
The quality improvement from adding a reranker is significant: in my Thinki.sh system, reranking improved user satisfaction ratings from 62% to 81%.
Step 4: Context-Grounded Generation
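Generation then runs against the reranked chunks only. A sketch of the prompt assembly (the instruction wording is illustrative, not a tested template); the resulting string goes to Claude or whichever model via the usual chat API:

```python
def grounded_prompt(question: str, chunks: list[str]) -> str:
    """Confine the model to retrieved context, with numbered citations."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The explicit "say you don't know" escape hatch is what keeps grounded generation from hallucinating when retrieval comes back empty-handed.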
Concrete Example: Thinki.sh Knowledge Retrieval
Thinki.sh’s AI coaching layer retrieves relevant mental frameworks based on user problems. The challenge: users describe problems in their own language, which rarely matches framework names. A user says: “I keep being surprised when my projects go over budget.” The relevant framework: Pre-Mortem + Second-Order Thinking. Keyword search fails completely here; semantic search alone gets it only partially. The reranking + grounding approach gets it right.
Building Bilingual Systems (Telugu + English)
NLP tooling is built almost entirely around English. Using it for Telugu requires intentional workarounds.
The Challenges
- Tokenization: Standard tokenizers mangle Telugu script. Subword tokenizers that work well for European languages don’t handle Telugu’s morphological complexity well.
- Embeddings: OpenAI’s embeddings handle Telugu for semantic similarity but are weaker for fine-grained sentiment and cultural nuance.
- Code-switching: Modern Telugu speakers freely mix English words. Standard NLP pipelines treat these as unknown tokens.
- Evaluation: BLEU/ROUGE scores are nearly meaningless for English-Telugu because sentence structures are so different. Human review is the only reliable metric.
The Normalization Pipeline
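A minimal sketch of the kind of normalization this requires, assuming NFC normalization for Telugu combining marks plus a per-token script tag for code-switched text (function names are placeholders):

```python
import re
import unicodedata

# Telugu Unicode block: U+0C00–U+0C7F
TELUGU = re.compile(r"[\u0C00-\u0C7F]")

def normalize_bilingual(text: str) -> str:
    # NFC composes Telugu consonants with their combining vowel signs,
    # so visually identical strings compare equal downstream.
    text = unicodedata.normalize("NFC", text)
    # Collapse whitespace; leave both scripts untouched.
    return re.sub(r"\s+", " ", text).strip()

def script_spans(text: str) -> list[tuple[str, str]]:
    # Tag each token as Telugu or English so code-switched spans can be
    # routed to the right tokenizer instead of becoming unknown tokens.
    return [("te" if TELUGU.search(tok) else "en", tok)
            for tok in text.split()]
```

Routing tokens by script is the workaround for the code-switching problem above: English spans go through a standard pipeline, Telugu spans through the custom one.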
Embeddings for Telugu
Testing different embedding models on Telugu semantic similarity:

| Model | Telugu accuracy | Cost | Notes |
|---|---|---|---|
| OpenAI text-embedding-3-small | Good for similarity | $$ | Handles Telugu, weaker on sentiment |
| multilingual-e5-large | Better for cross-lingual | $ (local) | Good for Telugu-English retrieval |
| paraphrase-multilingual-mpnet | Reasonable | $ (local) | Trained on more Telugu data |
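One practical detail with the e5 family: inputs need "query: " / "passage: " role prefixes, or retrieval quality degrades. A sketch (the SentenceTransformer call is left commented out to avoid the large model download; the cosine helper is plain Python):

```python
import math

def e5_input(text: str, is_query: bool = False) -> str:
    # multilingual-e5 models are trained with these role prefixes
    return ("query: " if is_query else "passage: ") + text

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("intfloat/multilingual-e5-large")
# q, p = model.encode([e5_input("బడ్జెట్ దాటిపోతుంది", is_query=True),
#                      e5_input("Pre-mortem planning")])
# print(cosine(q, p))
```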
