---
title: Image to Text
---
## Image to Text UI
AI Server's Image to Text UI lets you request image classifications from its active Comfy UI Agents:
https://localhost:5006/ImageToText

## Using Image to Text Endpoints
::include ai-server/endpoint-usage.md::
### Ollama Vision Models
If AI Server has access to any Ollama Vision Models (e.g. **gemma3:27b** or **mistral-small**), it can be used
instead to get information about the uploaded image:
- `Model` - the ollama vision model to use
- `Prompt` - vision model prompt
### Image to Text {#image-to-text}
::include ai-server/cs/image-to-text-1.cs.md::
### Queue Image to Text {#queue-image-to-text}
::include ai-server/cs/queue-image-to-text-1.cs.md::
:::info
Ensure that the ComfyUI Agent has the Florence 2 model downloaded and installed for the Image-To-Text functionality to work.
This can be done by setting the `DEFAULT_MODELS` environment variable in the `.env` file to include `image-to-text`
:::
## Support for Ollama Vision Models
By default [ImageToText](/ai-server/image-to-text) uses a purpose-specific **Florence 2 Vision model** with ComfyUI for its functionality which is capable of generating a very short description about an image, e.g:
> A woman sitting on the edge of a lake with a wolf
But with LLMs gaining multi modal capabilities and Ollama's recent support of Vision Models we can instead use popular
Open Source models like Google's **gemma3:27b** or Mistral's **mistral-small:24b** to extract information from images.
Both models are very capable vision models that's can provide rich detail about an image:
### Describe Image
### Caption Image
Although our initial testing sees gemma being better at responding to a wide variety of different prompts, e.g:
## Support OllamaGenerate Endpoint
To support Ollama's vision models AI Server added a new feature pipeline around
[Ollama's generate completion API](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion):
- `ImageToText`
- **Model** - Whether to use a Vision Model for the request
- **Prompt** - Prompt for the vision model
- `OllamaGeneration`: Synchronous invocation of Ollama's Generate API
- `QueueOllamaGeneration`: Asynchronous or Web Callback invocation of Ollama's Generate API
- `GetOllamaGenerationStatus`: Get the generation status of an Ollama Generate API