AI Lab · Generative AI

This is the control room for my generative AI work.

1. Agent Diary Framework

  • Brief: Problem, audience, constraints, success metrics.
  • Brain: Model choice (GPT-4o, Claude 3.5, local Llama variants).
  • Tools: Functions for data, actions, notifications.
  • Memory: Supabase or Redis for short- and long-term context.
  • Evals: Guardrail dataset run nightly.
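The five parts above can be captured as a single spec per agent. A hypothetical TypeScript sketch — the type names and fields are illustrative, not a published schema:

```typescript
// Hypothetical "diary entry" for one agent, mirroring the framework above:
// Brief, Brain, Tools, Memory, Evals. All names are illustrative.

interface AgentSpec {
  brief: {
    problem: string;
    audience: string;
    constraints: string[];
    successMetrics: string[];
  };
  brain: { model: string; fallbackModel?: string };
  tools: { name: string; kind: "data" | "action" | "notification" }[];
  memory: { shortTerm: "redis" | "supabase"; longTerm: "redis" | "supabase" };
  evals: { guardrailDataset: string; schedule: string };
}

// Example entry (a made-up email-triage agent).
const emailTriageAgent: AgentSpec = {
  brief: {
    problem: "Triage inbound support email",
    audience: "Support team",
    constraints: ["no auto-send", "PII stays in-region"],
    successMetrics: ["time-to-first-response", "triage accuracy"],
  },
  brain: { model: "gpt-4o", fallbackModel: "claude-3.5-sonnet" },
  tools: [
    { name: "fetchCustomerRecord", kind: "data" },
    { name: "draftReply", kind: "action" },
    { name: "notifyOnCall", kind: "notification" },
  ],
  memory: { shortTerm: "redis", longTerm: "supabase" },
  evals: { guardrailDataset: "guardrails-v1", schedule: "nightly" },
};
```

Writing the spec as a type keeps every diary entry answering the same five questions before any code ships.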

2. Copilot Design Principles

  1. Augment, don’t replace: copilots draft and guide, never auto-send.
  2. Explain confidence + cite sources.
  3. Provide an escape hatch back to the human workflow.
  4. Instrument usage to learn what to improve.
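The four principles can be enforced in the shape of the response itself. A minimal sketch, assuming made-up field names (this is not a real API): every suggestion carries confidence, citations, and an explicit escape hatch, and showing it emits a usage event.

```typescript
// Illustrative response envelope for a copilot suggestion.
interface CopilotSuggestion {
  draft: string;                         // a draft the human edits; never auto-sent
  confidence: number;                    // 0..1, surfaced to the user (principle 2)
  sources: string[];                     // citations shown with the draft (principle 2)
  escapeHatch: "dismiss" | "edit-manually"; // always available (principle 3)
}

// Stand-in for a real analytics call (principle 4).
function instrument(event: string, payload: Record<string, unknown>): void {
  console.log(JSON.stringify({ event, ...payload }));
}

// Render the suggestion for display; the user decides what to do with it.
function present(s: CopilotSuggestion): string {
  instrument("suggestion_shown", { confidence: s.confidence });
  const cites = s.sources.map((src) => `[${src}]`).join(" ");
  return `${s.draft}\n\nConfidence: ${Math.round(s.confidence * 100)}% ${cites}`;
}
```

Because the type has no `send()` path, "augment, don't replace" holds by construction.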

3. Launch Ritual

  • Dry-run with internal team, capture friction.
  • Red-team prompts (see ai/ai-ethics.mdx).
  • Add documentation + Loom demo.
  • Ship to small cohort (WhatsApp broadcast, newsletter) before wider rollout.
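For the small-cohort step, a deterministic hash keeps each user stably in or out of the cohort across sessions. A minimal sketch; the 5% fraction is an example value, not a fixed policy:

```typescript
// Map a user id to a stable value in [0, 1) via a simple string hash.
function hashToUnitInterval(id: string): number {
  let h = 0;
  for (let i = 0; i < id.length; i++) {
    h = (h * 31 + id.charCodeAt(i)) >>> 0; // keep within uint32 range
  }
  return h / 0xffffffff;
}

// Gate the launch: same user id always gets the same answer.
function inLaunchCohort(userId: string, fraction = 0.05): boolean {
  return hashToUnitInterval(userId) < fraction;
}
```

Widening the rollout is then just raising `fraction`; everyone already in the cohort stays in.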

4. Evaluation Harness

  • LangSmith + custom dataset derived from Thinki.sh frameworks.
  • Metrics: factuality, tone, toxicity, actionability.
  • Track cost per request; tie into cost observability dashboards.
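A sketch of the rollup this harness might produce. The token rates are placeholder values, not real pricing; the structure just mirrors the bullets above: per-request quality scores plus cost tracking.

```typescript
// One scored request from the eval run.
interface EvalResult {
  factuality: number;     // 0..1, higher is better
  tone: number;           // 0..1
  toxicity: number;       // 0..1, lower is better
  actionability: number;  // 0..1
  inputTokens: number;
  outputTokens: number;
}

// Example rates in USD per 1K tokens (placeholder values only).
const RATES = { input: 0.0025, output: 0.01 };

function costPerRequest(r: EvalResult): number {
  return (r.inputTokens / 1000) * RATES.input +
         (r.outputTokens / 1000) * RATES.output;
}

// Aggregate a run into the numbers the dashboard tracks.
function summarize(results: EvalResult[]) {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return {
    factuality: mean(results.map((r) => r.factuality)),
    toxicity: mean(results.map((r) => r.toxicity)),
    meanCostUsd: mean(results.map(costPerRequest)),
  };
}
```

Feeding `meanCostUsd` into the cost observability dashboards closes the loop between quality and spend.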

5. Roadmap

  • Agent Ops Kit: packaging evaluation + deployment scripts.
  • Edge inference: exploring Cloudflare Workers AI + Vercel AI SDK for low-latency features.
  • LLM FinOps: cost budgeting + token dashboards per product.

This page updates often. Subscribe to the Agent Diaries feed for real-time notes.