NLP Systems

Language is core to everything I build—technical docs, Telugu poetry, preventive thinking frameworks. Here’s how I approach NLP.

1. Text Processing Pipeline

  • Ingest: Markdown/MDX from repos, transcripts from Whisper, Notion exports.
  • Cleaning: LangChain text splitters (recursive) + custom Telugu normalizer.
  • Embedding: OpenAI text-embedding-3-small + Cohere rerankers.
  • Storage: Supabase pgvector + Pinecone for large datasets.
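The splitting step above can be sketched in miniature. This is a toy recursive splitter in the spirit of LangChain's recursive text splitter, not its actual implementation: it tries coarse separators first (paragraphs, then lines, then words) and only hard-cuts when nothing fits. The `chunk_size` value and separator list are illustrative defaults.

```python
def recursive_split(text, separators=("\n\n", "\n", " "), chunk_size=200):
    """Split text on the coarsest separator that keeps chunks under
    chunk_size, recursing to finer separators for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: hard-cut at the size limit.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # Piece alone is still too big: recurse on finer separators.
                chunks.extend(recursive_split(piece, rest, chunk_size))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

A Telugu normalizer would run before this step, so that chunk boundaries never split a normalized grapheme cluster.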

2. Retrieval-Augmented Generation

  • Sources: Thinki.sh frameworks, productivity runbooks, Nishabdham essays.
  • Guardrails: cite sources, highlight confidence, mark sensitive items.
  • Tools: LangChain, LlamaIndex, Vercel AI SDK for streaming responses.
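The retrieve-then-cite step looks roughly like this. A toy sketch with hand-made 3-dimensional vectors standing in for real embeddings; the corpus keys, `min_score` threshold, and function names are illustrative, not the production API. Keeping the similarity score next to each source is what makes the "cite sources, highlight confidence" guardrail possible downstream.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, k=2, min_score=0.3):
    """Rank sources by similarity; drop low-confidence matches so the
    generator is never handed context it should not cite."""
    scored = sorted(
        ((cosine(query_vec, vec), source) for source, vec in corpus.items()),
        reverse=True,
    )
    return [(score, source) for score, source in scored[:k] if score >= min_score]

# Hypothetical sources with toy vectors (real ones come from the embedding step).
corpus = {
    "thinki.sh/frameworks": [0.9, 0.1, 0.0],
    "runbooks/focus.md": [0.2, 0.8, 0.1],
    "nishabdham/essay-04": [0.1, 0.2, 0.9],
}
hits = retrieve([1.0, 0.0, 0.1], corpus)
```

In production the reranker sits between `retrieve` and the generator, re-scoring the top-k with a cross-encoder before anything is cited.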

3. Sentiment + Topic Models

  • Hugging Face transformers fine-tuned on bilingual dataset (English + Telugu).
  • Use cases: feedback triage, community moderation, poetry curation.
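The fine-tuned model itself cannot be reproduced here, but the triage step downstream of it can be sketched. The `item` dict mirrors the output shape of a Hugging Face text-classification pipeline (`{"label": ..., "score": ...}`) plus a topic field; the queue names and confidence threshold are illustrative assumptions.

```python
def route_feedback(item):
    """Route one classified feedback item to a handling queue.
    Low-confidence predictions always go to a human, never auto-acted on."""
    label, score, topic = item["label"], item["score"], item["topic"]
    if score < 0.6:
        return "human-review"
    if label == "negative" and topic == "moderation":
        return "moderation-queue"
    if label == "negative":
        return "support-queue"
    return "archive"
```

The same routing shape works for poetry curation: swap the topics and queues, keep the human-review floor.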

4. Evaluation

  • ROUGE/BLEU for summarization.
  • Custom rubric for translation accuracy (Telugu ↔ English).
  • Human review loops for culturally sensitive content.
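As a concrete instance of the first bullet, here is unigram ROUGE-1 recall from scratch: the fraction of reference tokens the candidate summary recovers, with per-token clipping as in the metric. Real evaluation would use a maintained package; this is a minimal sketch with whitespace tokenization, which is a simplification for a language like Telugu.

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """ROUGE-1 recall: clipped unigram overlap / reference length."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / max(sum(ref.values()), 1)
```

Recall-oriented ROUGE asks "did the summary keep what mattered?", while precision-oriented BLEU asks "is everything in the output supported?" — which is why the two are paired for summarization.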

5. Deployment Patterns

  • Edge functions for quick responses.
  • Batch jobs for offline processing (SageMaker Processing or Modal).
  • Slack/WhatsApp bots exposing NLP capabilities to teams/community.
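The batch-job pattern in the second bullet reduces to a simple shape, regardless of whether it runs on SageMaker Processing or Modal. A sketch with a hypothetical `embed_fn` standing in for the actual embedding API call; the `batch_size` default is illustrative.

```python
def batched(items, batch_size):
    """Yield fixed-size slices of a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_corpus(docs, embed_fn, batch_size=64):
    """Embed documents batch by batch, so a failed batch can be retried
    without redoing the whole corpus."""
    vectors = []
    for batch in batched(docs, batch_size):
        vectors.extend(embed_fn(batch))  # embed_fn: stand-in for the API call
    return vectors
```

Edge functions handle the interactive path with the same `embed_fn`, just one query at a time instead of a corpus.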

Reuse these patterns for your own multilingual or knowledge-heavy projects.