How I Think About Tool Selection
Before the list: the frame I use to pick tools. The mistake I made early: reaching for a framework when a direct API call would do. Start with the simplest tool that could work, and add abstraction only when you actually need it.

Model APIs
These are the foundations. Everything else wraps around them.

| Provider | Best for | When I reach for it |
|---|---|---|
| Anthropic Claude | Complex reasoning, large context, coding | My default. Sonnet for code and production, Opus for hard architectural decisions |
| OpenAI GPT-4o | General purpose, function calling, vision | Multilingual content, when I need the widest training data coverage |
| OpenAI GPT-4o-mini | High-volume, cost-sensitive tasks | Classification, simple extraction, anything running at scale |
| Google Gemini | Multimodal tasks, data residency requirements | When a client has GCP lock-in or I need native Google Docs/Sheets integration |
| Cohere | Enterprise search, reranking | RAG pipelines where reranking quality matters and I want a single vendor |
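The "when I reach for it" column is really a routing decision, and it can be captured in a few lines. A hypothetical sketch — the task categories and model labels below are my own shorthand, not official model IDs; swap in whatever names are current:

```python
# Hypothetical model router mirroring the table above.
# Task categories and model labels are illustrative, not canonical IDs.

TASK_TO_MODEL = {
    "coding": "claude-sonnet",        # complex reasoning / production code
    "architecture": "claude-opus",    # hard architectural decisions
    "classification": "gpt-4o-mini",  # high-volume, cost-sensitive
    "multilingual": "gpt-4o",         # widest training data coverage
    "multimodal": "gemini",           # GCP lock-in, Docs/Sheets integration
    "reranking": "cohere-rerank",     # RAG second-pass reranking
}

def pick_model(task: str, default: str = "claude-sonnet") -> str:
    """Return the model I'd reach for first, falling back to my default."""
    return TASK_TO_MODEL.get(task, default)
```

The point is less the dictionary than the habit: make the routing explicit so cost-sensitive paths never silently run on the expensive model.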
SDKs and Frameworks
For TypeScript / Next.js Projects
Vercel AI SDK — My default for any streaming UI in a Next.js app. Handles streaming, tool calls, and model switching in a way that integrates naturally with React Server Components and Route Handlers. The useChat and useCompletion hooks remove a lot of boilerplate.
For Agent Orchestration
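Before the tools, the shape I mean by "stateful, multi-step agents with branching": nodes are functions that update state and name the next node. A plain-Python sketch of that loop, with hypothetical node names (this is the pattern LangGraph formalizes, not LangGraph's API):

```python
# Graph-style agent loop in plain Python. Nodes update shared state and
# return the name of the next node; None means stop. The nodes here are
# hypothetical stand-ins, not LangGraph API.

def plan(state):
    state["steps"] = ["lookup", "answer"]
    return state, "act"

def act(state):
    step = state["steps"].pop(0)
    state.setdefault("done", []).append(step)
    # Branch: loop on "act" until no steps remain, then finish.
    return state, ("act" if state["steps"] else "finish")

def finish(state):
    state["result"] = " -> ".join(state["done"])
    return state, None

NODES = {"plan": plan, "act": act, "finish": finish}

def run(state, entry="plan"):
    node = entry
    while node is not None:
        state, node = NODES[node](state)
    return state
```

Twenty lines of this is fine for a prototype; the value of a framework shows up when you need persistence, retries, and human-in-the-loop interrupts on top of the loop.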
LangGraph — When I need stateful, multi-step agents with complex branching logic. The graph metaphor maps well to how I think about agent workflows. It's verbose, but it's explicit, which matters in production.

LangChain — I use this less than I used to. Too much abstraction. When something breaks, the stack trace is painful to navigate. I reach for it when a LangChain community integration saves me serious time (e.g., a pre-built connector to a specific vector store).

For Python Projects
Anthropic Python SDK — Clean, well-maintained. My go-to for Python-based scripts and data pipelines.

LlamaIndex — For RAG pipelines in Python. Better retrieval abstractions than LangChain for document-heavy use cases. I use it when I need to build a retrieval system with more sophistication than basic embedding search.

RAG and Retrieval
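The stores below differ, but the pipeline in front of them is the same: chunk, embed, index, query. A hedged sketch of the chunking step in plain Python — the window and overlap sizes are arbitrary defaults, not recommendations, and real pipelines usually split on token or sentence boundaries rather than characters:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding.

    Illustrative only: production chunkers (e.g., LlamaIndex node parsers)
    split on token or sentence boundaries, not raw character offsets.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step back by `overlap` chars each window
    return chunks
```

The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.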
Pinecone — My default managed vector store. Reliable, fast, and the API is simple. Cost adds up at scale, but for most projects it's worth the operational simplicity.

pgvector — When I'm already using Postgres and don't want to add a new service. Works well for moderate scale. I use this more now than I used to — keeping the stack simple has real value.

Qdrant — When I need a self-hosted vector store with good performance. Docker-deployable, solid API.

LlamaIndex — For the retrieval pipeline itself: chunking, indexing, querying, reranking. I pair this with whichever vector store fits the project.

Cohere Rerank — When base embedding similarity isn't producing precise enough results. Adding a reranker as a second pass often outperforms increasing embedding dimensions.

Evaluation and Observability
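My core tool here is Promptfoo (described below), and its configs really are small. A hedged example of what one looks like — the provider strings and the test case are placeholders, so check promptfoo's docs for the current provider ID format:

```yaml
# promptfooconfig.yaml -- provider IDs and test values are illustrative
prompts:
  - "Summarize in one sentence: {{text}}"
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      text: "The meeting moved from Tuesday to Thursday at 3pm."
    assert:
      - type: contains
        value: "Thursday"
```

Running the eval (via promptfoo's CLI) gives the side-by-side comparison across both providers for every test case.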
This is the category most teams underinvest in. Don't.

Promptfoo — My go-to for fast prompt comparisons and regression testing. Define test cases in YAML, run across multiple models, get a side-by-side comparison. Takes an hour to set up, saves weeks of debugging.

Hosting and Runtime
Vercel — For edge functions running inference calls. Works seamlessly with the Vercel AI SDK. Good for low-latency completions in Next.js.

Cloudflare Workers AI — When I need inference close to the user without managing GPU infrastructure. Limited model selection, but improving. Good for simple, high-frequency inference tasks.

Modal — For running custom models, batch jobs, or anything that needs a real GPU. The Python decorator API is elegant. I use this for fine-tuning runs and large batch evaluation jobs.

Replicate — When I need to run a specific open-source model quickly without deploying it myself. Good for prototyping with models that aren't available via main APIs.

Local Models
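Local model servers like Ollama (below) expose plain HTTP, so stdlib Python is enough to talk to them. A hedged sketch against Ollama's /api/generate endpoint — the model name is whatever you've pulled locally, and the request only works with a server running on the default port:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks for one JSON object instead of newline-delimited chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST to a locally running Ollama server (requires `ollama serve`)."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

No SDK, no API key, no per-token cost: that's the whole appeal for offline testing and private data.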
Ollama — For running models locally. Dead simple to set up, supports most popular open models (Llama, Mistral, Phi, Gemma). I use this when I need to test something offline, work with private data, or prototype without API costs.

Utilities I Actually Use
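The recurring pattern in this section is schema-first extraction: define the shape you want, then validate what the model returns. Instructor (below) does this properly with Pydantic; here's a stdlib-only approximation of the idea, with a made-up Invoice schema:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Invoice:  # hypothetical example schema
    vendor: str
    total: float

def parse_structured(raw: str, cls=Invoice):
    """Validate a model's JSON output against a dataclass schema.

    Instructor adds retries, type coercion, and schema-in-prompt on top of
    this; the stdlib version only checks that required fields exist.
    """
    data = json.loads(raw)
    missing = [f.name for f in fields(cls) if f.name not in data]
    if missing:
        raise ValueError(f"model output missing fields: {missing}")
    # Keep only declared fields so extra keys from the model are ignored.
    return cls(**{f.name: data[f.name] for f in fields(cls)})
```

The failure mode this catches — a model returning plausible but incomplete JSON — is exactly what you don't want to discover in production.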
Instructor — Structured outputs from LLMs using Pydantic. I use this in Python when I need reliable JSON extraction and don't want to wrestle with prompt engineering for schema compliance.

Whisper.cpp — On-device audio transcription. Fast, private, free. I use this in products where sending audio to an API isn't appropriate.

Zod — Not AI-specific, but essential for AI work in TypeScript. I define tool call schemas, output validators, and API input schemas with Zod. It integrates cleanly with the Vercel AI SDK for tool definitions.

PromptLib (my product) — Curated prompt library with organization and guardrails. I built this because I kept losing good prompts across projects.

What I've Stopped Using
LangChain for new projects — The abstraction overhead rarely pays off. I use it when I need a specific integration that would take days to rebuild, not as a foundation.

Custom vector similarity functions — pgvector or a managed service handles this better than any custom implementation I've written.

Model-specific prompt formatting by hand — The model SDKs handle this. I used to add `<human>` tags and `\n\nAssistant:` markers manually. Don't do this.
