Computer Vision Notes
Vision quietly supports many Avi projects, from GlucosePro charts to YouTube content.
1. Document & UI Parsing
- Tools: LayoutLMv3 + Donut for parsing invoices, statements, and design mocks.
- Use case: AviWealth uploads, extracting data into Notion + Sheets.
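As a rough illustration of the last step (getting parsed fields into Notion or Sheets), the sketch below flattens a parser's output into a spreadsheet row. The field names (`vendor`, `date`, `total`) and the dict shape are assumptions for the example, not the actual output schema of LayoutLMv3 or Donut, and the upload call itself is left out.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional

# Hypothetical column order for the target sheet.
COLUMNS = ["vendor", "date", "total"]


def parse_amount(raw: Optional[str]) -> Optional[Decimal]:
    """Strip currency symbols and thousands separators; return None if unparseable."""
    if raw is None:
        return None
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch in ".-")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def to_row(parsed: Dict[str, str]) -> List[str]:
    """Map a parsed-document dict (assumed shape) onto COLUMNS as strings."""
    total = parse_amount(parsed.get("total"))
    return [
        (parsed.get("vendor") or "").strip(),
        (parsed.get("date") or "").strip(),
        "" if total is None else str(total),
    ]
```

Leaving unparseable amounts as empty cells rather than guessing makes them easy to spot and fix by hand later.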
2. Health & Wellness
- Glycemic logging: Use OCR (Tesseract) to capture readings; cross-check with manual entries.
- Meal logging: YOLOv8 models classify meal types for nutrition analysis.
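The cross-check in the glycemic logging step could look something like the sketch below. It assumes the OCR text has already been produced (for example by `pytesseract.image_to_string`), that readings are in mg/dL, and that a plausible range of 20 to 600 and a tolerance of 2 are acceptable; all of these are illustrative choices, not values from these notes.

```python
import re
from typing import Optional

# A 2-3 digit number that is not part of a time, date, or decimal.
READING_RE = re.compile(r"(?<![\d.:/-])(\d{2,3})(?![\d.:/-])")

# Assumed plausible mg/dL range; adjust for the meter in use.
PLAUSIBLE_RANGE = (20, 600)


def parse_reading(ocr_text: str) -> Optional[int]:
    """Return the first plausible glucose value found in the OCR text, or None."""
    for match in READING_RE.finditer(ocr_text):
        value = int(match.group(1))
        if PLAUSIBLE_RANGE[0] <= value <= PLAUSIBLE_RANGE[1]:
            return value
    return None


def cross_check(ocr_text: str, manual_value: int, tolerance: int = 2) -> str:
    """Compare the OCR reading against a manual entry."""
    ocr_value = parse_reading(ocr_text)
    if ocr_value is None:
        return "ocr_unreadable"
    if abs(ocr_value - manual_value) <= tolerance:
        return "match"
    return "mismatch"
```

Returning a status string instead of silently preferring one source keeps the decision about which value to trust with the person reviewing the log.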
3. Content Automation
- Storyboarding: Detect key frames from footage to auto-generate thumbnails.
- Subtitle alignment: Whisper + Gentle forced alignment for bilingual subtitles on Nishabdham videos.
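One simple way to pick key frames for the storyboarding step is frame differencing: keep a frame whenever it differs enough from the last kept frame. The sketch below assumes each frame has already been decoded and flattened to a sequence of grayscale intensities (0 to 255); the threshold of 30 is an arbitrary starting point, and the notes do not say which detection method is actually used.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(
    frames: Sequence[Sequence[float]], threshold: float = 30.0
) -> List[int]:
    """Return indices of frames that differ from the previous key frame by more than threshold."""
    if not frames:
        return []
    key_indices = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        last_key = frames[key_indices[-1]]
        if mean_abs_diff(frames[i], last_key) > threshold:
            key_indices.append(i)
    return key_indices
```

Comparing against the last key frame, rather than the immediately preceding frame, avoids missing slow fades where no single step crosses the threshold.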
4. Tooling Stack
- Python + FastAPI microservices.
- Heavy workloads run on Modal / RunPod GPUs; lightweight filters run on Cloudflare Workers with WebAssembly.
- Monitoring via Evidently (image drift) + Sentry.
5. Getting Started Template
- Collect dataset (images, annotations via Label Studio).
- Train/fine-tune with Ultralytics or Detectron2.
- Deploy via BentoML; expose a RESTful endpoint.
- Wire into productivity automations or AI Lab agents.
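To connect the first two steps, the sketch below converts one Label Studio rectangle annotation into a YOLO-format label line of the kind Ultralytics training reads. It assumes the annotation's `value` dict uses Label Studio's percentage coordinates (top-left `x`, `y` plus `width`, `height`, each 0 to 100) with no rotation; the class name `salad` and the `class_ids` mapping are hypothetical.

```python
from typing import Any, Dict


def ls_rect_to_yolo(value: Dict[str, Any], class_ids: Dict[str, int]) -> str:
    """Convert a Label Studio rectangle (percent, top-left) to a YOLO line.

    YOLO lines are: class_id x_center y_center width height, all normalized to 0-1.
    Rotation is ignored (assumed to be 0).
    """
    label = value["rectanglelabels"][0]
    w = value["width"] / 100.0
    h = value["height"] / 100.0
    x_center = value["x"] / 100.0 + w / 2
    y_center = value["y"] / 100.0 + h / 2
    return f"{class_ids[label]} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    example = {"x": 10.0, "y": 20.0, "width": 30.0, "height": 40.0,
               "rectanglelabels": ["salad"]}
    print(ls_rect_to_yolo(example, {"salad": 0}))
```

Each image gets one text file with one such line per box; fine-tuning then points the trainer at a dataset config that lists the image folders and class names.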
