The OpenAI SDK is well-documented but the official examples rarely show you how things fit together in a real codebase. This guide covers the patterns I use in production, in TypeScript with Next.js App Router.
Setup
Install the SDK with npm install openai, then set your key and export a shared client:
// lib/openai.ts
import OpenAI from 'openai';

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // Optional: custom timeout, max retries
  timeout: 30_000,
  maxRetries: 2,
});
Export a singleton client instance. Creating a new OpenAI() client on every request adds unnecessary overhead and makes it harder to configure consistently.
Environment variables in Next.js:
# .env.local
OPENAI_API_KEY=sk-...
Basic Completion
The simplest call — non-streaming, structured input:
import { openai } from '@/lib/openai';

const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    {
      role: 'system',
      content: 'You are a helpful assistant that responds concisely.',
    },
    {
      role: 'user',
      content: userMessage,
    },
  ],
  temperature: 0.7,
  max_tokens: 500,
});

const response = completion.choices[0].message.content;
Streaming in a Route Handler
Most user-facing features benefit from streaming — the response appears progressively rather than after a full wait. With Next.js Route Handlers and the Vercel AI SDK (note: the OpenAIStream and StreamingTextResponse helpers shown here come from older releases of the ai package and have been deprecated in newer ones):
// app/api/chat/route.ts
import { openai } from '@/lib/openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
Without the Vercel AI SDK, you can stream manually:
// app/api/chat/route.ts
export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of response) {
        const text = chunk.choices[0]?.delta?.content || '';
        if (text) controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
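On the client, this plain-text stream can be consumed with a reader loop. A minimal sketch — streamToText is a hypothetical helper, and a real UI would append each chunk to state rather than buffering the whole response:

```typescript
// Read a plain-text byte stream to completion, decoding as it arrives.
async function streamToText(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true lets multi-byte characters span chunk boundaries
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode(); // flush any trailing bytes
}

// Usage against the route above (assumed endpoint name):
// const res = await fetch('/api/chat', { method: 'POST', body: JSON.stringify({ messages }) });
// const full = await streamToText(res.body!);
```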
Tool Calls
Tool calls let the model invoke functions you define. The model decides when to call them based on the conversation.
import { openai } from '@/lib/openai';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// Define your tool schema with Zod
const weatherSchema = z.object({
  location: z.string().describe('City and country, e.g. Sydney, AU'),
  unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
});

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: zodToJsonSchema(weatherSchema),
      },
    },
  ],
  tool_choice: 'auto',
});
const message = response.choices[0].message;

// Handle tool call response
if (message.tool_calls) {
  // Append the assistant message once, before the per-call tool results
  // (pushing it inside the loop would duplicate it when there are multiple calls)
  messages.push(message);

  for (const toolCall of message.tool_calls) {
    if (toolCall.function.name === 'get_weather') {
      const args = weatherSchema.parse(
        JSON.parse(toolCall.function.arguments)
      );
      const weatherData = await fetchWeather(args.location, args.unit);

      // Continue conversation with the tool result
      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(weatherData),
      });
    }
  }

  // Get final response after tool execution
  const final = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
  });
  return final.choices[0].message.content;
}
Always validate tool call arguments before executing. The model sends arguments as a JSON string — parse it and validate against your schema before using the values.
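If you would rather not reach for zod at this point, the same check can be hand-rolled. A minimal sketch — parseWeatherArgs is a hypothetical helper mirroring the weatherSchema shape above:

```typescript
type WeatherArgs = { location: string; unit: 'celsius' | 'fahrenheit' };

// Parse and validate the raw arguments string from a tool call.
// Throws if the payload is not valid JSON or fails the shape check.
function parseWeatherArgs(raw: string): WeatherArgs {
  const parsed = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Arguments must be a JSON object');
  }
  const { location, unit = 'celsius' } = parsed as Record<string, unknown>;
  if (typeof location !== 'string' || location.length === 0) {
    throw new Error('location must be a non-empty string');
  }
  if (unit !== 'celsius' && unit !== 'fahrenheit') {
    throw new Error('unit must be celsius or fahrenheit');
  }
  return { location, unit };
}
```

Either way, treat the arguments string as untrusted input: it came from the model, not from your type system.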
Structured Outputs
Use structured outputs when you need JSON that is guaranteed to match a schema; it is more reliable than prompt instructions alone:
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const ReviewSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  score: z.number().min(1).max(10),
  summary: z.string(),
  key_points: z.array(z.string()),
});

const response = await openai.beta.chat.completions.parse({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Analyse the product review and extract structured data.',
    },
    { role: 'user', content: reviewText },
  ],
  response_format: zodResponseFormat(ReviewSchema, 'review_analysis'),
});

const data = response.choices[0].message.parsed;
// data is fully typed as z.infer<typeof ReviewSchema>
Prefer structured outputs over response_format: { type: 'json_object' } when you have a defined schema. JSON mode only guarantees syntactically valid JSON; structured outputs guarantee the response matches your schema.
Vision: Processing Images
Send images alongside text for document analysis, screenshot understanding, or visual QA:
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'What does this receipt show? Extract line items and total.',
        },
        {
          type: 'image_url',
          image_url: {
            url: imageUrl, // Remote URL or base64 data URL
            detail: 'high', // 'low', 'high', or 'auto'
          },
        },
      ],
    },
  ],
});
For base64 images (uploaded by user):
// Convert uploaded file to base64
const formData = await req.formData();
const image = formData.get('image') as File;
const bytes = await image.arrayBuffer();
const base64 = Buffer.from(bytes).toString('base64');
const dataUrl = `data:${image.type};base64,${base64}`;

// Use in message content
{ type: 'image_url', image_url: { url: dataUrl } }
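It's worth rejecting bad uploads before you spend time encoding them. A hypothetical guard — the type allowlist and size cap here are illustrative assumptions, not API limits you should rely on; check the current vision documentation for the real constraints:

```typescript
// Illustrative upload guard; tune limits for your own product.
const ALLOWED_TYPES = ['image/png', 'image/jpeg', 'image/webp', 'image/gif'];
const MAX_BYTES = 20 * 1024 * 1024; // 20 MB cap (assumed)

function isAllowedImage(file: { type: string; size: number }): boolean {
  return ALLOWED_TYPES.includes(file.type) && file.size > 0 && file.size <= MAX_BYTES;
}
```

Remember that base64 inflates the payload by roughly a third, so a cap on the raw file size understates what actually goes over the wire.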
Embeddings
For semantic search, similarity comparison, and RAG:
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small', // or 'text-embedding-3-large' for higher quality
    input: text,
  });
  return response.data[0].embedding;
}

// Embed multiple texts in a single call (more efficient)
async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  });
  return response.data.map(d => d.embedding);
}
text-embedding-3-small is my default: 1536 dimensions, cheaper than large, good quality. Use text-embedding-3-large (3072 dimensions) when precision matters and cost is less of a concern.
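Embeddings are only useful alongside a comparison function, and cosine similarity is the usual choice. A minimal sketch:

```typescript
// Cosine similarity between two equal-length vectors:
// 1 = same direction, 0 = orthogonal, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector length mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

OpenAI's embeddings are normalised to length 1, so in practice the dot product alone produces the same ranking; the full formula is shown for generality.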
Cost Controls
Model costs vary by a factor of 10-50 between tiers. Default to cheap models and upgrade only where needed:
function selectModel(task: 'draft' | 'final' | 'classify' | 'complex'): string {
  const models = {
    classify: 'gpt-4o-mini',
    draft: 'gpt-4o-mini',
    final: 'gpt-4o',
    complex: 'gpt-4o',
  };
  return models[task];
}
Log token usage on every call:
const response = await openai.chat.completions.create({ ... });

logger.info('openai_usage', {
  model: response.model,
  prompt_tokens: response.usage?.prompt_tokens,
  completion_tokens: response.usage?.completion_tokens,
  total_tokens: response.usage?.total_tokens,
  request_id: response.id,
});
Cache deterministic completions (onboarding messages, static instructions) using your KV store:
async function cachedCompletion(key: string, messages: Message[]) {
  const cached = await kv.get(key);
  if (cached) return cached;

  const response = await openai.chat.completions.create({ model: 'gpt-4o', messages });
  const text = response.choices[0].message.content;

  await kv.set(key, text, { ex: 86400 }); // 24h TTL
  return text;
}
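The cache key needs a stable hash of the prompt. A sketch using node:crypto — the trim/lowercase normalisation is an assumption about what should count as "the same query", and SHA-256 is an arbitrary but safe choice:

```typescript
import { createHash } from 'node:crypto';

// Stable, collision-resistant cache key for a prompt string.
// Light normalisation so trivially different inputs share a key.
function hashQuery(query: string): string {
  return createHash('sha256').update(query.trim().toLowerCase()).digest('hex');
}
```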
Moderation
Before returning AI output to users, run it through the moderation endpoint:
async function moderateOutput(text: string): Promise<boolean> {
  const response = await openai.moderations.create({ input: text });
  const result = response.results[0];

  if (result.flagged) {
    logger.warn('output_flagged', {
      categories: result.categories,
      text: text.slice(0, 100),
    });
    return false; // Don't return this to the user
  }
  return true;
}
Error Handling
The SDK throws typed errors you can catch specifically:
import OpenAI from 'openai';

try {
  const response = await openai.chat.completions.create({ ... });
  return response.choices[0].message.content;
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    switch (error.status) {
      case 429:
        // Rate limit — wait and retry with exponential backoff
        throw new Error('Rate limit hit, please try again shortly');
      case 503:
        // Service unavailable — fallback to cached/default response
        return getFallbackResponse();
      default:
        logger.error('openai_api_error', { status: error.status, message: error.message });
        throw error;
    }
  }
  throw error;
}
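The "exponential backoff" comment above can be made concrete. A sketch of a capped backoff schedule — note the SDK's own maxRetries setting already retries transient errors for you, so a manual loop like this belongs at the application level, and the base/cap values are assumptions:

```typescript
// Delay before retry attempt n (0-based): baseMs * 2^n, capped at capMs.
function backoffDelay(attempt: number, baseMs = 500, capMs = 8_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// In practice, add jitter so concurrent clients don't retry in lockstep:
// const waitMs = backoffDelay(attempt) * (0.5 + Math.random() / 2);
```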
Testing
For unit tests, mock the SDK client:
// __mocks__/openai.ts
const mockCreate = jest.fn().mockResolvedValue({
  choices: [{ message: { content: 'Mock response' } }],
  usage: { prompt_tokens: 10, completion_tokens: 20, total_tokens: 30 },
});

export default class OpenAI {
  chat = {
    completions: {
      create: mockCreate,
    },
  };
}

export { mockCreate };
For integration tests, use the seed parameter for more reproducible responses:
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  seed: 42, // Same seed + same input → mostly the same output (best-effort, not guaranteed)
});
Putting It Together
The pattern I use for a production AI feature:
export async function processUserQuery(
  query: string,
  context: { userId: string; sessionId: string }
): Promise<{ response: string; tokens: number }> {
  // 1. Validate input
  if (!query.trim() || query.length > 4000) {
    throw new Error('Invalid query');
  }

  // 2. Check cache for deterministic queries
  const cacheKey = `ai:${hashQuery(query)}`;
  const cached = await kv.get<string>(cacheKey);
  if (cached) return { response: cached, tokens: 0 };

  // 3. Build messages with context
  const messages = buildMessages(query, context);

  // 4. Call API with appropriate model
  const response = await openai.chat.completions.create({
    model: selectModel('final'),
    messages,
    temperature: 0.7,
  });
  const text = response.choices[0].message.content ?? '';

  // 5. Moderate output
  const safe = await moderateOutput(text);
  if (!safe) throw new Error('Output flagged by moderation');

  // 6. Log usage
  logger.info('ai_query', {
    userId: context.userId,
    tokens: response.usage?.total_tokens,
    model: response.model,
  });

  return { response: text, tokens: response.usage?.total_tokens ?? 0 };
}
The model call is one step in a pipeline, not the whole feature. Input validation, caching, moderation, and logging are just as much a part of shipping it.