Well, I run my own OpenWebUI with Ollama, installed with docker compose and running locally on my home server with an NVIDIA GPU, and I am pretty happy with the overall result.
I have only installed local open-source models like gpt-oss, deepseek-r1, llama (3.2 and 4), qwen3…
My use case is mostly asking questions about documentation while developing (details on programming language syntax and such).
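For context, my compose setup looks roughly like this (a simplified sketch; the exact image tags, ports, and volume paths in my real file differ):

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama          # model storage
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia          # expose the NVIDIA GPU to the container
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"                   # web interface on port 3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data  # chats, settings, users
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:
```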
I have been running it for months now, and it occurred to me that it would be useful for the following tasks as well:
- audio transcribing (voice messages to text)
- image generation (logos, small art for my games and such)
I fiddled around a bit, but got nowhere.
How do you do that from the OpenWebUI web interface?
(I have never used Ollama directly, only through the OpenWebUI GUI.)


Thank you for the detailed post!
OK, I need you to ELI5 what you wrote, because I am not an LLM expert and… got lost.
I have OWUI, which provides the web interface. Then I have Ollama, which runs the models, and I have added models there.
I searched for llama.cpp, but I am unclear on what makes it different from Ollama, and whether I can install models there.
Can you help me shed some light on this?
Also, about models: I have an NVIDIA GPU with 16 GB of VRAM that works fine with the models I have installed. What is the correlation here?