SLI Recipes
1. Latency
2. Reliability / Error Rate
3. Freshness / Data Lag
4. Business Outcomes
5. Experience Metrics
6. Storage + Queue Health

Indicators are the ingredients of a good SLO. Here are the ones we rely on most.

1. Latency

Definition: time from request receipt to successful response (p50/p90/p95).
Implementation: OpenTelemetry HTTP server instrumentation (spans plus the http.server.duration histogram), aggregated in Prometheus.
Dashboard: plot frontend latency (Core Web Vitals) next to API call latency so the two can be compared.
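
As a rough illustration of the latency definition, the sketch below computes p50/p90/p95 from raw request durations with the nearest-rank method. The function names and sample values are hypothetical; in production the percentiles would more likely come from a Prometheus query over histogram buckets (for example with histogram_quantile) than from raw samples.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def latency_sli(durations_ms):
    """Return the three percentiles named in the latency definition."""
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 90, 95)}

# Hypothetical request durations in milliseconds.
durations_ms = [12, 15, 18, 20, 22, 25, 30, 45, 80, 250]
print(latency_sli(durations_ms))
```

Note how a single slow request (250 ms) dominates p95 while leaving p50 untouched, which is why the recipe tracks several percentiles rather than an average.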

2. Reliability / Error Rate

Definition: failed requests / total requests, computed per service and endpoint.
Implementation: HTTP status buckets, GraphQL errors, workflow job failures.
Tip: separate user errors (e.g. HTTP 4xx) from platform errors (e.g. HTTP 5xx) so client mistakes do not add noise to the SLI.
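
A minimal sketch of the ratio above, assuming requests arrive as (service, endpoint, status) tuples and that only HTTP 5xx responses count as platform errors. Both the record shape and that classification rule are assumptions for illustration; GraphQL errors and workflow job failures would need their own classifiers.

```python
from collections import defaultdict

def is_platform_error(status):
    """Assumption: 5xx means the platform failed; 4xx is a user error and is not counted."""
    return 500 <= status <= 599

def error_rates(requests):
    """Return {(service, endpoint): failed / total} for the given request records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for service, endpoint, status in requests:
        key = (service, endpoint)
        totals[key] += 1
        if is_platform_error(status):
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}

# Hypothetical request log.
requests = [
    ("api", "/tasks", 200),
    ("api", "/tasks", 500),
    ("api", "/tasks", 404),
    ("api", "/tasks", 200),
    ("api", "/login", 401),
    ("api", "/login", 200),
]
rates = error_rates(requests)
print(rates)
```

The 404 and 401 responses still count toward the total but not toward failures, so a burst of bad client input does not burn error budget.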

3. Freshness / Data Lag

Definition: time between a source event and its availability in the consuming system.
Use case: Productivity dashboards fed from automations; Thinki.sh community stats.
Implementation: compare the event timestamp with the ingest timestamp; alert when the lag exceeds a threshold.
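
The timestamp comparison can be sketched as follows. The function names, the timestamps, and the five-minute threshold are illustrative assumptions, not values from any real pipeline.

```python
from datetime import datetime, timedelta, timezone

def data_lag(event_time, ingest_time):
    """Lag between when the event happened and when the consumer could see it."""
    return ingest_time - event_time

def is_stale(event_time, ingest_time, threshold):
    """True when the lag exceeds the freshness threshold and should alert."""
    return data_lag(event_time, ingest_time) > threshold

# Hypothetical event: emitted at 12:00:00 UTC, ingested at 12:07:30 UTC.
event_time = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
ingest_time = datetime(2024, 1, 1, 12, 7, 30, tzinfo=timezone.utc)
threshold = timedelta(minutes=5)  # assumed freshness target

print(data_lag(event_time, ingest_time), is_stale(event_time, ingest_time, threshold))
```

Using timezone-aware timestamps on both sides avoids false lag readings when the producer and the consumer run in different zones.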

4. Business Outcomes

Definition: completion rate of key workflows (productivity ritual, AI agent task, Thinki.sh challenge).
Implementation: fire started and completed events to Segment/PostHog; compute the ratio of completed to started workflows.
Note: great for product-led SLOs.
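
A small sketch of the completion ratio, assuming each analytics event is a dict with hypothetical workflow, name, and run_id fields. Real Segment or PostHog payloads are shaped differently, and the ratio would normally be computed in the analytics tool or warehouse rather than in application code.

```python
def completion_rate(events, workflow):
    """Completed runs divided by started runs for one workflow; None if nothing started."""
    started = {e["run_id"] for e in events
               if e["workflow"] == workflow and e["name"] == "started"}
    completed = {e["run_id"] for e in events
                 if e["workflow"] == workflow and e["name"] == "completed"}
    if not started:
        return None
    # Only count completions that have a matching start, so stray events cannot push the rate above 1.
    return len(completed & started) / len(started)

# Hypothetical event stream: four agent tasks started, three completed.
events = [
    {"workflow": "ai_agent_task", "name": "started", "run_id": "a"},
    {"workflow": "ai_agent_task", "name": "started", "run_id": "b"},
    {"workflow": "ai_agent_task", "name": "started", "run_id": "c"},
    {"workflow": "ai_agent_task", "name": "started", "run_id": "d"},
    {"workflow": "ai_agent_task", "name": "completed", "run_id": "a"},
    {"workflow": "ai_agent_task", "name": "completed", "run_id": "b"},
    {"workflow": "ai_agent_task", "name": "completed", "run_id": "d"},
]
print(completion_rate(events, "ai_agent_task"))
```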

5. Experience Metrics

Frontend: Core Web Vitals (LCP, INP, CLS) collected with the web-vitals JS library and sent to analytics.
Mobile: App start time, error-free sessions (Sentry), offline success rate.
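
To turn raw Core Web Vitals samples into an indicator, one option is to rate each sample against Google's published thresholds and track the share rated good. The sketch below assumes LCP and INP are reported in milliseconds and CLS as a unitless score; the helper names and sample values are hypothetical.

```python
# (good_max, needs_improvement_max) per metric, following the published Core Web Vitals thresholds.
THRESHOLDS = {
    "LCP": (2500, 4000),  # milliseconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless layout-shift score
}

def rate(metric, value):
    """Classify one sample as good, needs-improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs-improvement"
    return "poor"

def good_share(metric, values):
    """Fraction of samples rated good; a candidate SLI for the metric."""
    return sum(rate(metric, v) == "good" for v in values) / len(values)

# Hypothetical LCP samples in milliseconds.
lcp_samples = [1800, 2100, 2600, 4500]
print(rate("INP", 350), good_share("LCP", lcp_samples))
```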

6. Storage + Queue Health

Useful for n8n and other automation-heavy flows.
Track backlog size, oldest message age, and retries of failed jobs.
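
The three queue signals can be derived from a snapshot of pending messages, as in the sketch below. The Message type and its fields are hypothetical; a real broker or n8n deployment would expose these values through its own API or metrics endpoint.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Message:
    enqueued_at: datetime  # when the message entered the queue
    retries: int           # how many times processing has been retried

def queue_health(messages, now):
    """Summarise backlog size, oldest message age, and retry count for a queue snapshot."""
    if not messages:
        return {"backlog_size": 0, "oldest_age_seconds": 0.0, "total_retries": 0}
    oldest = min(m.enqueued_at for m in messages)
    return {
        "backlog_size": len(messages),
        "oldest_age_seconds": (now - oldest).total_seconds(),
        "total_retries": sum(m.retries for m in messages),
    }

# Hypothetical snapshot taken at 12:10:00 UTC.
now = datetime(2024, 1, 1, 12, 10, 0, tzinfo=timezone.utc)
messages = [
    Message(datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc), retries=2),
    Message(datetime(2024, 1, 1, 12, 8, 0, tzinfo=timezone.utc), retries=0),
    Message(datetime(2024, 1, 1, 12, 9, 30, tzinfo=timezone.utc), retries=1),
]
print(queue_health(messages, now))
```

Oldest message age is often the more telling signal: a small backlog can still hide one stuck message that has been waiting far longer than the rest.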

Document SLIs in /runbooks/sli/<service>.mdx with query links so anyone can trace data lineage.

Observability Tooling & Stack