Well, I run my own OpenWebUI with Ollama, installed with docker compose and running locally on my home server with an NVIDIA GPU, and I am pretty happy with the overall result.
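In case it helps anyone reproduce the setup, the stack looks roughly like the compose sketch below (service names, the port mapping, and the GPU stanza are my assumptions, adjust for your host; the GPU reservation needs the NVIDIA Container Toolkit installed):

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            # Requires the NVIDIA Container Toolkit on the host.
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      # Point the UI at the ollama service on the compose network.
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:
```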
I have only installed local open-source models like gpt-oss, deepseek-r1, llama (3.2, 4), qwen3…
My use case is mostly asking questions about documentation while developing (details on programming language syntax and such).
I have been running it for months now, and it occurred to me that it would also be useful for the following tasks:
- audio transcribing (voice messages to text)
- image generation (logos, small art for my games and such)
I fiddled a bit around, but got nowhere.
How do you do that from the openwebui web interface?
(I never used ollama directly, only through the openwebui GUI)


Audio transcription should be the little “waveform” icon at the right of the text input.
Image generation I’m not sure about, as that’s not a use case I have, and I don’t think the small-ish models I run are even capable of that.
I’m not sure how audio transcription works in OpenWebUI (I think it has built-in models for that?), but image generation is a “capability” that needs to be both supported by the model and enabled in the model’s settings (Admin => Settings => Models).
The audio icon works, but only for the mic… Uploading files seems to be useless, as every model I have installed just keeps saying it cannot see the file and asks for a web link instead…
How does that even work? Why can it grab a URL but not a locally uploaded file?
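As far as I understand it, the model itself never fetches or opens anything: for a URL, OpenWebUI fetches the page and pastes the extracted text into the prompt, which any text model can read. An uploaded image, on the other hand, has to be base64-encoded into the request to Ollama, and only a vision-capable model (e.g. llava or llama3.2-vision, with the Vision capability enabled in the model settings) reads that field; a text-only model has nothing that can see it, hence the “give me a link” replies. A rough sketch of the payload shape, assuming Ollama’s `/api/generate` format (nothing is actually sent here):

```python
import base64
import json

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build the JSON body Ollama's /api/generate endpoint expects for a
    vision-capable model: images are sent inline as base64 strings."""
    payload = {
        "model": model,
        "prompt": prompt,
        # Each entry is a raw image file, base64-encoded; a text-only
        # model simply has nothing that reads this field.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(payload)

# Hypothetical example; this only shows the payload shape a vision
# model (e.g. llama3.2-vision) would accept.
body = build_vision_request("llama3.2-vision", "Describe this image.", b"fake-png-bytes")
```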