The Four Categories of SLIs
Every service needs at minimum a latency SLI and an availability SLI. Add quality and business SLIs when you have user-facing features with meaningful quality dimensions (AI outputs, recommendation quality, search relevance).
Latency SLIs
API Latency (p95 / p99)
The most common SLI. It measures how fast your API responds.

- Measure at the load balancer, not the application: application-level measurement misses network overhead
- Exclude 4xx responses from latency SLIs: requests the client got wrong shouldn’t penalize your SLO
- Track p50, p95, and p99 separately: high p99 with normal p95 means outlier requests, not broad slowness
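
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `Request` record, its field names, and the nearest-rank percentile method are assumptions made for the example, and the latencies are assumed to come from load balancer logs.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record shape, e.g. parsed from load balancer access logs.
    status: int
    latency_ms: float


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, for integer p in 1..100."""
    if not sorted_values:
        raise ValueError("no samples to compute a percentile from")
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def latency_sli(requests):
    """Return p50, p95, and p99 latency, leaving 4xx responses out."""
    latencies = sorted(
        r.latency_ms for r in requests if not 400 <= r.status < 500
    )
    return {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}
```

Reporting the three percentiles side by side makes the pattern from the last bullet visible: a p99 far above a normal-looking p95 points to a small set of outlier requests.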
Frontend Latency (Core Web Vitals)
For user-facing pages, API latency isn’t what users experience. Core Web Vitals are.

| Metric | What it measures | Good threshold |
|---|---|---|
| LCP (Largest Contentful Paint) | How fast the main content loads | ≤ 2.5 s |
| INP (Interaction to Next Paint) | How fast the page responds to clicks, taps, and key presses | ≤ 200 ms |
| CLS (Cumulative Layout Shift) | How much the page jumps around | ≤ 0.1 |
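
One way to turn the table into an SLI is to measure, per metric, the share of page loads that land in the "good" range. The sketch below assumes each page load is a dictionary with the hypothetical keys `lcp_s`, `inp_ms`, and `cls`, and it treats a value exactly at the threshold as good; both choices are assumptions for illustration.

```python
# "Good" limits taken from the table; the key names are hypothetical.
GOOD_THRESHOLDS = {"lcp_s": 2.5, "inp_ms": 200.0, "cls": 0.1}


def good_rates(page_loads):
    """Return, per metric, the fraction of page loads within the good limit."""
    if not page_loads:
        raise ValueError("no page loads to evaluate")
    total = len(page_loads)
    return {
        metric: sum(1 for load in page_loads if load[metric] <= limit) / total
        for metric, limit in GOOD_THRESHOLDS.items()
    }
```

An SLO could then be phrased against these fractions, for example a target share of page loads with a good LCP; the target itself is a product decision this sketch does not make.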
Availability SLIs
Request Success Rate
The fundamental availability measurement is successful requests divided by total requests. The open question is what counts as a success:

- HTTP 2xx only? (strict)
- HTTP 2xx + 3xx? (includes redirects)
- HTTP 2xx + 3xx + 4xx? (4xx are client errors, not your fault)
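
To see how much the choice matters, the three definitions can be compared on the same traffic. This is a minimal sketch: the definition names (`strict`, `with_redirects`, `with_client_errors`) are labels invented for the example, and the input is assumed to be a plain list of HTTP status codes.

```python
# Status classes (first digit) counted as success under each definition.
# The definition names are hypothetical labels for the three options above.
SUCCESS_DEFINITIONS = {
    "strict": (2,),
    "with_redirects": (2, 3),
    "with_client_errors": (2, 3, 4),
}


def success_rate(status_codes, definition):
    """Fraction of requests whose status class counts as a success."""
    if not status_codes:
        raise ValueError("no requests to evaluate")
    classes = SUCCESS_DEFINITIONS[definition]
    good = sum(1 for code in status_codes if code // 100 in classes)
    return good / len(status_codes)


# Example traffic: 90 OK, 4 redirects, 4 not-found, 2 server errors.
codes = [200] * 90 + [301] * 4 + [404] * 4 + [500] * 2
for name in SUCCESS_DEFINITIONS:
    print(name, success_rate(codes, name))
```

On this sample the same traffic scores 0.9, 0.94, or 0.98 depending on the definition, so the definition needs to be fixed before an availability target is set.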
