Multimodal Workflows

Images

Generic/local image-text models

Pass --multimodal together with --image, and select a compatible image-text model such as gemma3-12B.

ollm prompt --model gemma3-12B --multimodal --image ./diagram.png "Describe this image"
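
To describe several images in one go, the single-image command above can be wrapped in a loop. The following is a minimal sketch, not part of ollm itself: the function name describe_images and the OLLM_DRY_RUN variable are hypothetical names introduced for illustration, and only the flags already shown above are used. By default it prints the commands instead of running them, so the result can be checked before any model is loaded.

```shell
# Hypothetical helper (not provided by ollm): one "ollm prompt" call per PNG
# found in the given directory.
# OLLM_DRY_RUN is an assumption of this sketch, not an ollm setting:
#   1 (default) prints each command, 0 executes it.
describe_images() {
  dir="$1"
  for img in "$dir"/*.png; do
    # When the glob matches nothing it stays literal; skip that case.
    [ -e "$img" ] || continue
    if [ "${OLLM_DRY_RUN:-1}" = "1" ]; then
      printf 'ollm prompt --model gemma3-12B --multimodal --image %s "Describe this image"\n' "$img"
    else
      ollm prompt --model gemma3-12B --multimodal --image "$img" "Describe this image"
    fi
  done
}
```

Calling describe_images ./diagrams prints one command per PNG file; setting OLLM_DRY_RUN=0 first would run them instead, assuming ollm is installed and on the PATH.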

Audio

Optimized-native local audio

voxtral-small-24B supports audio through the optimized-native path.

Interactive chat attachments

ollm chat --model gemma3-12B --multimodal
/image ./diagram.png
/send Describe this image

ollm chat --model voxtral-small-24B --multimodal
/audio ./sample.wav
/send What can you tell me about this audio?