The OpenAI SDK is well-documented, but the official examples rarely show how things fit together in a real codebase. This guide covers the patterns I use in production, in TypeScript with the Next.js App Router.

Setup

Install the SDK and set your key:
npm install openai
// lib/openai.ts
import OpenAI from 'openai';

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // Optional: custom timeout, max retries
  timeout: 30_000,
  maxRetries: 2,
});
Export a singleton client instance. Creating a new OpenAI() client on every request adds unnecessary overhead and makes it harder to configure consistently.
Environment variables in Next.js:
# .env.local
OPENAI_API_KEY=sk-...

Basic Completion

The simplest call — non-streaming, structured input:
import { openai } from '@/lib/openai';

const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    {
      role: 'system',
      content: 'You are a helpful assistant that responds concisely.'
    },
    {
      role: 'user',
      content: userMessage
    }
  ],
  temperature: 0.7,
  max_tokens: 500,
});

const response = completion.choices[0].message.content ?? ''; // content can be null, e.g. for tool-call responses

Streaming in a Route Handler

Most user-facing features benefit from streaming — the response appears progressively rather than after a full wait. With Next.js Route Handlers and the Vercel AI SDK:
// app/api/chat/route.ts
import { openai } from '@/lib/openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
Without the Vercel AI SDK, you can stream manually:
// app/api/chat/route.ts
export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of response) {
        const text = chunk.choices[0]?.delta?.content || '';
        if (text) controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' }
  });
}
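On the client, the plain-text stream from the manual handler above can be consumed with a reader loop. A minimal sketch (`consumeTextStream` and the `onChunk` callback are illustrative names, not SDK APIs):

```typescript
// Read a text stream chunk by chunk, invoking a callback for each piece.
async function consumeTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  return full;
}

// Usage in a client component (illustrative):
// const res = await fetch('/api/chat', { method: 'POST', body: JSON.stringify({ messages }) });
// await consumeTextStream(res.body!, (t) => appendToUI(t));
```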

Tool Calls (Function Calling)

Tool calls let the model invoke functions you define. The model decides when to call them based on the conversation.
import { openai } from '@/lib/openai';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// Define your tool schema with Zod
const weatherSchema = z.object({
  location: z.string().describe('City and country, e.g. Sydney, AU'),
  unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
});

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: zodToJsonSchema(weatherSchema),
      },
    },
  ],
  tool_choice: 'auto',
});

const message = response.choices[0].message;

// Handle tool call response
if (message.tool_calls) {
  // Append the assistant message (with its tool_calls) once, before any results
  messages.push(message);

  for (const toolCall of message.tool_calls) {
    if (toolCall.function.name === 'get_weather') {
      const args = weatherSchema.parse(
        JSON.parse(toolCall.function.arguments)
      );
      const weatherData = await fetchWeather(args.location, args.unit);

      // Add the tool result, keyed to the call that produced it
      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(weatherData),
      });
    }
  }

  // Get final response after tool execution
  const final = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
  });
  return final.choices[0].message.content;
}
Always validate tool call arguments before executing. The model sends arguments as a JSON string — parse it and validate against your schema before using the values.
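A `.parse` call throws on bad input, which turns a malformed tool call into a crashed request. One option is to return the validation error to the model as the tool result so it can retry. A minimal sketch of that pattern using plain runtime checks to keep it self-contained (`parseWeatherArgs` and its error strings are illustrative; with Zod you would use `safeParse` instead):

```typescript
// Parse and validate raw tool-call arguments. On failure, return an error
// payload to send back as the tool message instead of throwing.
type WeatherArgs = { location: string; unit: 'celsius' | 'fahrenheit' };

function parseWeatherArgs(raw: string):
  | { ok: true; args: WeatherArgs }
  | { ok: false; error: string } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'Arguments were not valid JSON' };
  }
  const obj = parsed as Partial<WeatherArgs>;
  if (typeof obj.location !== 'string' || obj.location.length === 0) {
    return { ok: false, error: 'Missing required string field: location' };
  }
  const unit = obj.unit ?? 'celsius';
  if (unit !== 'celsius' && unit !== 'fahrenheit') {
    return { ok: false, error: `Unknown unit: ${String(obj.unit)}` };
  }
  return { ok: true, args: { location: obj.location, unit } };
}
```

On `{ ok: false }`, send the error string back as the tool message content; the model usually corrects its arguments on the next turn.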

Structured Outputs

Use these when you need JSON output that is guaranteed to match a schema; it is more reliable than prompt instructions alone:
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const ReviewSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  score: z.number().min(1).max(10),
  summary: z.string(),
  key_points: z.array(z.string()),
});

const response = await openai.beta.chat.completions.parse({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Analyse the product review and extract structured data.'
    },
    { role: 'user', content: reviewText }
  ],
  response_format: zodResponseFormat(ReviewSchema, 'review_analysis'),
});

const data = response.choices[0].message.parsed;
// data is fully typed as z.infer<typeof ReviewSchema>
Use structured outputs over response_format: { type: 'json_object' } when you have a defined schema. The latter only guarantees valid JSON — structured outputs guarantee your schema is matched.

Vision: Processing Images

Send images alongside text for document analysis, screenshot understanding, or visual QA:
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'What does this receipt show? Extract line items and total.'
        },
        {
          type: 'image_url',
          image_url: {
            url: imageUrl, // Remote URL or base64 data URL
            detail: 'high', // 'low', 'high', or 'auto'
          }
        }
      ]
    }
  ],
});
For base64 images (uploaded by the user):
// Convert an uploaded file to a base64 data URL
const formData = await req.formData();
const image = formData.get('image') as File;
const bytes = await image.arrayBuffer();
const base64 = Buffer.from(bytes).toString('base64');
const dataUrl = `data:${image.type};base64,${base64}`;

// Use in message content
{ type: 'image_url', image_url: { url: dataUrl } }

Embeddings

For semantic search, similarity comparison, and RAG:
async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small', // or 'text-embedding-3-large' for higher quality
    input: text,
  });
  return response.data[0].embedding;
}

// Embed multiple texts in a single call (more efficient)
async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  });
  return response.data.map(d => d.embedding);
}
text-embedding-3-small is my default: 1536 dimensions, cheaper than large, good quality. Use text-embedding-3-large (3072 dimensions) when precision matters and cost is less of a concern.
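To actually compare two embeddings, cosine similarity is the usual metric for search and similarity ranking. A minimal helper with no external dependencies:

```typescript
// Cosine similarity between two equal-length vectors.
// 1 = same direction, 0 = orthogonal, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector length mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

OpenAI embeddings are normalised to unit length, so a plain dot product gives the same ranking if you want to skip the square roots.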

Cost Controls

Model costs vary by a factor of 10-50 across tiers. Default to cheap models and upgrade only where needed:
function selectModel(task: 'draft' | 'final' | 'classify' | 'complex'): string {
  const models = {
    classify: 'gpt-4o-mini',
    draft: 'gpt-4o-mini',
    final: 'gpt-4o',
    complex: 'gpt-4o',
  };
  return models[task];
}
Log token usage on every call:
const response = await openai.chat.completions.create({ ... });

logger.info('openai_usage', {
  model: response.model,
  prompt_tokens: response.usage?.prompt_tokens,
  completion_tokens: response.usage?.completion_tokens,
  total_tokens: response.usage?.total_tokens,
  request_id: response.id,
});
Cache deterministic completions (onboarding messages, static instructions) using your KV store:
async function cachedCompletion(key: string, messages: Message[]) {
  const cached = await kv.get(key);
  if (cached) return cached;

  const response = await openai.chat.completions.create({ model: 'gpt-4o', messages });
  const text = response.choices[0].message.content;

  await kv.set(key, text, { ex: 86400 }); // 24h TTL
  return text;
}
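A stable cache key can be derived by hashing the prompt content. A sketch using Node's built-in crypto module (the `hashQuery` name and 32-character truncation are illustrative choices):

```typescript
import { createHash } from 'node:crypto';

// Derive a stable, collision-resistant cache key from arbitrary prompt text.
function hashQuery(text: string): string {
  return createHash('sha256').update(text).digest('hex').slice(0, 32);
}
```

Hash the full rendered prompt, not just the user input, so a system-prompt change invalidates stale entries.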

Moderation

Before returning AI output to users, run it through the moderation endpoint:
async function moderateOutput(text: string): Promise<boolean> {
  const response = await openai.moderations.create({ input: text });
  const result = response.results[0];

  if (result.flagged) {
    logger.warn('output_flagged', {
      categories: result.categories,
      text: text.slice(0, 100),
    });
    return false; // Don't return this to the user
  }

  return true;
}

Error Handling

The SDK throws typed errors you can catch specifically:
import OpenAI from 'openai';

try {
  const response = await openai.chat.completions.create({ ... });
  return response.choices[0].message.content;
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    switch (error.status) {
      case 429:
        // Rate limit: retry upstream with exponential backoff, or surface a retryable error
        throw new Error('Rate limit hit, please try again shortly');
      case 503:
        // Service unavailable — fallback to cached/default response
        return getFallbackResponse();
      default:
        logger.error('openai_api_error', { status: error.status, message: error.message });
        throw error;
    }
  }
  throw error;
}
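The backoff-and-retry suggested for the 429 case can be implemented as a generic wrapper. A sketch, assuming any error carrying `status: 429` is retryable (`withBackoff`, `maxAttempts`, and `baseDelayMs` are illustrative names and defaults):

```typescript
// Retry an async operation on 429 errors with exponential backoff plus jitter.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const status = (error as { status?: number }).status;
      if (status !== 429 || attempt === maxAttempts - 1) throw error;
      // 500ms, 1s, 2s, ... plus up to 100ms of jitter
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage (illustrative):
// const response = await withBackoff(() =>
//   openai.chat.completions.create({ model: 'gpt-4o-mini', messages })
// );
```

Note the SDK already retries some failures internally (the `maxRetries` option from the setup section), so keep the wrapper's attempt count low to avoid compounding delays.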

Testing

For unit tests, mock the SDK client:
// __mocks__/openai.ts
const mockCreate = jest.fn().mockResolvedValue({
  choices: [{ message: { content: 'Mock response' } }],
  usage: { prompt_tokens: 10, completion_tokens: 20, total_tokens: 30 },
});

export default class OpenAI {
  chat = {
    completions: {
      create: mockCreate
    }
  };
}

export { mockCreate };
For integration tests, use the seed parameter for reproducible responses:
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  seed: 42, // Same seed + same input → usually the same output (best-effort determinism)
});

Putting It Together

The pattern I use for a production AI feature:
export async function processUserQuery(
  query: string,
  context: { userId: string; sessionId: string }
): Promise<{ response: string; tokens: number }> {
  // 1. Validate input
  if (!query.trim() || query.length > 4000) {
    throw new Error('Invalid query');
  }

  // 2. Check cache for deterministic queries
  const cacheKey = `ai:${hashQuery(query)}`;
  const cached = await kv.get<string>(cacheKey);
  if (cached) return { response: cached, tokens: 0 };

  // 3. Build messages with context
  const messages = buildMessages(query, context);

  // 4. Call API with appropriate model
  const response = await openai.chat.completions.create({
    model: selectModel('final'),
    messages,
    temperature: 0.7,
  });

  const text = response.choices[0].message.content ?? '';

  // 5. Moderate output
  const safe = await moderateOutput(text);
  if (!safe) throw new Error('Output flagged by moderation');

  // 6. Log usage
  logger.info('ai_query', {
    userId: context.userId,
    tokens: response.usage?.total_tokens,
    model: response.model,
  });

  return { response: text, tokens: response.usage?.total_tokens ?? 0 };
}
The model call is one step in a pipeline, not the whole feature. Input validation, caching, moderation, and logging are all equally part of a production AI feature.