Computer Vision Notes

Vision quietly supports many Avi projects—from GlucosePro charts to YouTube content.

1. Document & UI Parsing

  • Tools: LayoutLMv3 + Donut for parsing invoices, statements, and design mocks.
  • Use case: AviWealth uploads, extracting data into Notion + Sheets.
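
Before parsed fields land in Notion or Sheets they usually need normalizing. The sketch below is one possible post-processing step, not the parsers' own API: it assumes the model output has already been reduced to a flat dict, and the keys `vendor`, `date`, and `total`, the date formats, and the column order are all hypothetical.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional

# Hypothetical date formats; extend to match the documents actually uploaded.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_amount(raw: str) -> Optional[Decimal]:
    """Drop currency symbols and thousands separators, then parse the number."""
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch in ".-")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def parse_date(raw: str) -> Optional[str]:
    """Try each known format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_sheet_row(fields: Dict[str, str]) -> List[str]:
    """Flatten parsed fields into a fixed column order: vendor, date, total."""
    amount = parse_amount(fields.get("total", ""))
    date = parse_date(fields.get("date", ""))
    return [
        fields.get("vendor", "").strip(),
        date or "",  # leave the cell empty when the date cannot be parsed
        f"{amount:.2f}" if amount is not None else "",
    ]
```

Unparseable values become empty cells rather than guesses, so a reviewer can spot and fill them by hand.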

2. Health & Wellness

  • Glycemic logging: Use OCR (Tesseract) to capture readings; cross-check with manual entries.
  • Meal logging: YOLOv8 models classify meal types for nutrition analysis.
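
The cross-check in the glycemic logging step could look roughly like this. It is a minimal sketch that assumes the OCR step has already produced a text string and that readings are whole numbers in mg/dL. The plausible range, the tolerance, and the function names are illustrative assumptions, not values from these notes.

```python
import re
from typing import Optional

# Assumed plausible range for a reading in mg/dL; adjust for the actual meter.
MIN_MG_DL, MAX_MG_DL = 20, 600

# 2-3 digit numbers that are not part of a time (08:30) or a decimal (5.6).
READING_RE = re.compile(r"(?<![:\d.])(\d{2,3})(?![:\d.])")


def extract_reading(ocr_text: str) -> Optional[int]:
    """Return the first number in the OCR text within the plausible range."""
    for match in READING_RE.finditer(ocr_text):
        value = int(match.group(1))
        if MIN_MG_DL <= value <= MAX_MG_DL:
            return value
    return None


def cross_check(ocr_text: str, manual_value: int, tolerance: int = 5) -> str:
    """Compare an OCR reading with the manual entry for the same timestamp."""
    reading = extract_reading(ocr_text)
    if reading is None:
        return "no_reading"
    if abs(reading - manual_value) <= tolerance:
        return "match"
    return "mismatch"
```

Entries that come back as `mismatch` or `no_reading` are the ones worth reviewing by hand.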

3. Content Automation

  • Storyboarding: Detect key frames from footage to auto-generate thumbnails.
  • Subtitle alignment: Whisper + Gentle forced alignment for bilingual subtitles on Nishabdham videos.
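
These notes do not say how key frames are detected. One simple option is frame differencing, sketched below under the assumption that each frame has already been decoded and flattened to grayscale pixel values (0-255). The threshold is an arbitrary starting point and the function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, each 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_key_frames(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Return indices of frames that differ strongly from the last key frame."""
    if not frames:
        return []
    key_indices = [0]  # the first frame always starts a segment
    last_key = frames[0]
    for index in range(1, len(frames)):
        if mean_abs_diff(frames[index], last_key) > threshold:
            key_indices.append(index)
            last_key = frames[index]
    return key_indices
```

Comparing against the last key frame, rather than the previous frame, keeps slow pans from slipping under the threshold indefinitely.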

4. Tooling Stack

  • Python + FastAPI microservices.
  • Heavy workloads run on Modal / RunPod GPUs; lightweight filters run on Cloudflare Workers with WebAssembly.
  • Monitoring via Evidently (image drift) + Sentry.
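
To give a sense of what an image drift check measures, here is a standalone sketch. It is not Evidently's API. It computes a Population Stability Index (PSI) over one assumed feature, the mean brightness of each image; the bin count and value range are illustrative.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Return per-bin proportions over [lo, hi], with add-one smoothing."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp values at or above hi
        counts[max(idx, 0)] += 1                    # clamp values below lo
    total = len(values)
    return [(c + 1) / (total + bins) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, lo: float = 0.0, hi: float = 255.0) -> float:
    """Population Stability Index between two samples of a scalar feature."""
    ref = histogram(reference, bins, lo, hi)
    cur = histogram(current, bins, lo, hi)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A PSI of 0 means the two samples fall into the bins identically. A commonly cited rule of thumb treats values above roughly 0.25 as significant drift, though the right alert threshold depends on the data.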

5. Getting Started Template

  1. Collect dataset (images, annotations via Label Studio).
  2. Train/fine-tune with Ultralytics or Detectron2.
  3. Deploy via BentoML; expose a RESTful endpoint.
  4. Wire into productivity automations or AI Lab agents.
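
Between steps 1 and 2, annotations usually need converting to the trainer's label format. The sketch below assumes Label Studio's JSON export for rectangle labels (x, y, width, height as percentages of the image, measured from the top-left corner) and the YOLO text format used by Ultralytics (class id, then normalized center x, center y, width, height). Check both against a real export before relying on it. The `class_ids` mapping is hypothetical, and box rotation is ignored.

```python
from typing import Dict, List


def label_studio_to_yolo(results: List[dict], class_ids: Dict[str, int]) -> List[str]:
    """Convert rectangle results for one image into YOLO label lines."""
    lines = []
    for item in results:
        if item.get("type") != "rectanglelabels":
            continue  # skip non-box annotations such as choices or polygons
        value = item["value"]
        label = value["rectanglelabels"][0]
        if label not in class_ids:
            continue  # skip labels that are not part of the training set
        width = value["width"] / 100.0
        height = value["height"] / 100.0
        x_center = value["x"] / 100.0 + width / 2
        y_center = value["y"] / 100.0 + height / 2
        lines.append(
            f"{class_ids[label]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines
```

Each returned line goes into the `.txt` label file that shares its base name with the image.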

Share improvements or datasets: vision is a team sport.