---
name: image
version: "1.0"
description: Extract text from images using a vision LLM
entry:
  script: scripts/main.py
  class: ImageSkill
triggers:
  extensions:
    - ".png"
    - ".jpg"
    - ".jpeg"
    - ".webp"
    - ".gif"
    - ".tiff"
  intents:
    - "image"
    - "screenshot"
    - "diagram"
    - "photo"
requires: []
author: axoviq.com
license: AGPL-3.0-or-later
---

# Image Skill

Base64-encodes the image and returns it in `metadata` so that the IngestAgent
can pass it to a vision-capable LLM. The `text` field is left empty at extract
time; the agent fills it in afterwards.
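
As a rough illustration of that extract step, here is a minimal sketch. It is not the actual `ImageSkill` code: the function name `extract_image`, the metadata keys `image_base64` and `mime_type`, and the `MIME_TYPES` table are all assumptions made for the example.

```python
import base64
from pathlib import Path

# Hypothetical MIME map covering the extensions this skill triggers on.
MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
    ".tiff": "image/tiff",
}


def extract_image(path: str) -> dict:
    """Return an empty `text` plus the base64-encoded image in `metadata`."""
    p = Path(path)
    raw = p.read_bytes()
    return {
        # Left empty here; the agent is expected to fill it in later.
        "text": "",
        "metadata": {
            # Key names are assumptions, not the skill's confirmed schema.
            "image_base64": base64.b64encode(raw).decode("ascii"),
            "mime_type": MIME_TYPES.get(p.suffix.lower(), "application/octet-stream"),
        },
    }
```

Keeping the payload as an ASCII string means the result can be serialized to JSON without further handling.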

## When this skill is used

- Source path ends with `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, or `.tiff`
- User intent contains `image`, `screenshot`, `diagram`, or `photo`
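
The two trigger rules above could be checked roughly as follows. This is a sketch under stated assumptions: the helper `should_trigger` is hypothetical, either rule alone is assumed to be sufficient, and matching is assumed to be case-insensitive. The actual dispatcher may differ on all three points.

```python
from pathlib import Path

# Values copied from the `triggers` section of the frontmatter.
EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif", ".tiff"}
INTENTS = ("image", "screenshot", "diagram", "photo")


def should_trigger(source_path: str = "", intent: str = "") -> bool:
    """Hypothetical check: True if the extension or the intent matches."""
    # Compare the file suffix case-insensitively (assumption).
    ext_match = Path(source_path).suffix.lower() in EXTENSIONS
    # Substring match, mirroring "intent contains" (case-insensitive by assumption).
    intent_lower = intent.lower()
    intent_match = any(keyword in intent_lower for keyword in INTENTS)
    return ext_match or intent_match
```

For example, `should_trigger("scan.PNG")` and `should_trigger("notes.txt", "describe this diagram")` both return `True`, while `should_trigger("notes.txt", "summarize")` returns `False`.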