NLP Systems
Language is core to everything I build: technical docs, Telugu poetry, preventive thinking frameworks. Here’s how I approach NLP.

1. Text Processing Pipeline
- Ingest: Markdown/MDX from repos, transcripts from Whisper, Notion exports.
- Cleaning: LangChain text splitters (recursive) + custom Telugu normalizer.
- Embedding: OpenAI text-embedding-3-small + Cohere rerankers.
- Storage: Supabase pgvector + Pinecone for large datasets.
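The cleaning step above combines LangChain's recursive splitter with a custom Telugu normalizer. A minimal pure-Python sketch of both (the function names and chunk size are illustrative, not the production code):

```python
import unicodedata

def normalize_telugu(text: str) -> str:
    """Minimal normalizer sketch: NFC-normalize so Telugu combining
    marks compare consistently, then collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())

def recursive_split(text: str, chunk_size: int = 200,
                    separators=("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first, recursing to finer ones
    only for pieces that are still too long -- the same idea as
    LangChain's RecursiveCharacterTextSplitter."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep in text:
            chunks, current = [], ""
            for piece in text.split(sep):
                candidate = current + sep + piece if current else piece
                if len(candidate) <= chunk_size:
                    current = candidate
                    continue
                if current:
                    chunks.append(current)
                if len(piece) > chunk_size:
                    # Piece alone exceeds the limit: recurse on finer separators.
                    chunks.extend(recursive_split(piece, chunk_size, separators[i + 1:]))
                    current = ""
                else:
                    current = piece
            if current:
                chunks.append(current)
            return chunks
    # No separator left: hard-cut into fixed windows.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

doc = "First paragraph about docs.\n\nSecond paragraph, long enough to stand alone."
chunks = recursive_split(doc, chunk_size=40)
```

Splitting on paragraph boundaries first keeps semantically whole chunks, which matters more than exact sizes for embedding quality.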
2. Retrieval-Augmented Generation
- Source: Thinki.sh frameworks, productivity runbooks, Nishabdham essays.
- Guardrails: cite sources, highlight confidence, mark sensitive items.
- Tools: LangChain, LlamaIndex, Vercel AI SDK for streaming responses.
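The retrieval-plus-guardrails flow can be sketched without the frameworks. This stand-in uses a bag-of-words vector where production uses text-embedding-3-small (the `min_confidence` threshold and field names are assumptions for illustration); the ranking math and the cite-sources/flag-confidence guardrails are the point:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts. Production would call
    an embedding model; the cosine-ranking logic below is unchanged."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: dict, k: int = 2, min_confidence: float = 0.2):
    """Rank sources by similarity and attach the guardrail fields:
    every hit carries its source citation and a low-confidence flag."""
    q = embed(query)
    scored = sorted(
        ({"source": s, "score": cosine(q, embed(t)), "text": t}
         for s, t in corpus.items()),
        key=lambda d: d["score"], reverse=True)
    return [{**d, "low_confidence": d["score"] < min_confidence}
            for d in scored[:k]]

corpus = {
    "thinki.sh/frameworks": "preventive thinking frameworks for teams",
    "runbooks/productivity": "productivity runbooks and checklists",
    "nishabdham/essays": "telugu poetry essays",
}
hits = retrieve("preventive thinking frameworks", corpus)
```

Keeping the citation and confidence flag attached to every retrieved chunk, rather than bolting them on at generation time, is what makes the downstream answer auditable.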
3. Sentiment + Topic Models
- Hugging Face transformers fine-tuned on bilingual dataset (English + Telugu).
- Use cases: feedback triage, community moderation, poetry curation.
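The bilingual routing around those fine-tuned models can be sketched with a Unicode check: Telugu script occupies the block U+0C00–U+0C7F. The `classify` callable below is a hypothetical stand-in for the fine-tuned transformer, and the queue names are illustrative:

```python
def telugu_ratio(text: str) -> float:
    """Fraction of alphabetic characters in the Telugu Unicode block
    (U+0C00 to U+0C7F); combining vowel signs are not counted as letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0c00" <= c <= "\u0c7f" for c in letters) / len(letters)

def route_feedback(text: str, classify) -> dict:
    """Pick the language, run the (stand-in) sentiment model, then
    triage: negative feedback goes to triage, positive to curation."""
    lang = "te" if telugu_ratio(text) > 0.5 else "en"
    label = classify(text, lang)
    queue = {"negative": "triage", "positive": "curation"}.get(label, "review")
    return {"lang": lang, "label": label, "queue": queue}

def stub_classify(text, lang):
    # Stub for illustration only; the real model is a fine-tuned transformer.
    return "negative" if "bug" in text.lower() else "positive"

r = route_feedback("Found a bug in the export flow", stub_classify)
```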
4. Evaluation
- ROUGE/BLEU for summarization.
- Custom rubric for translation accuracy (Telugu ↔ English).
- Human review loops for culturally sensitive content.
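ROUGE-1 is small enough to compute by hand, which helps when sanity-checking library output. A self-contained sketch (unigram overlap only; production ROUGE also reports ROUGE-2 and ROUGE-L):

```python
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict:
    """ROUGE-1: unigram overlap between a generated summary and a reference.
    Recall divides by reference length, precision by candidate length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-token min of the two counts
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("the cat sat", "the cat sat on the mat")
```

Note the asymmetry: a short but accurate summary scores high precision and low recall, which is why both are reported rather than F1 alone.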
5. Deployment Patterns
- Edge functions for quick responses.
- Batch jobs for offline processing (SageMaker Processing or Modal).
- Slack/WhatsApp bots exposing NLP capabilities to teams/community.
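The offline path above boils down to one loop, whatever the host (SageMaker Processing or Modal): read documents in fixed-size batches, process each batch, collect results. A minimal sketch, with `process_batch` as a stand-in for the real embedding or classification work:

```python
from itertools import islice

def batches(items, size):
    """Yield fixed-size batches from any iterable; the last batch may be
    short. This is the core loop an offline batch job runs over documents
    that don't need real-time answers."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

def process_batch(docs):
    # Stand-in for the real per-batch work (embedding, classification).
    return [d.upper() for d in docs]

docs = [f"doc-{i}" for i in range(7)]
results = [out for b in batches(docs, 3) for out in process_batch(b)]
```

Batching keeps memory bounded and maps directly onto per-batch API calls, so the same loop works locally and inside a managed job.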
