Well, I run my own Open WebUI with Ollama, installed with Docker Compose and running locally on my home server with an NVIDIA GPU, and I am pretty happy with the overall result.
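For context, a setup like this is usually described by a compose file along these lines; this is a minimal sketch, and the service names, volume, port mapping, and GPU reservation block are assumptions, not the poster's actual configuration:

```yaml
# Hypothetical minimal docker-compose.yml for Ollama + Open WebUI.
# Service names, volumes and ports are assumptions about the setup.
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama          # persist downloaded models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia          # expose the NVIDIA GPU to the container
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"                   # web UI reachable on host port 3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
```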

I have only installed local open-source models like gpt-oss, deepseek-r1, llama (3.2, 4), qwen3…
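Models like these would typically be pulled through the Ollama container itself; a sketch, assuming the compose service is named `ollama` (the service name and exact model tags are assumptions):

```shell
# Assumes the compose service is named "ollama"; adjust to your setup.
docker compose exec ollama ollama pull gpt-oss
docker compose exec ollama ollama pull deepseek-r1
docker compose exec ollama ollama pull llama3.2
docker compose exec ollama ollama pull qwen3
```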

My use case is mostly asking questions about documentation for development work (details of programming-language syntax and such).

I have been running it for months now, and it occurred to me that it would be useful for the following tasks as well:

  • audio transcription (voice messages to text)
  • image generation (logos, small art for my games and such)

I fiddled around a bit, but got nowhere.

How do you do that from the Open WebUI web interface?

(I have never used Ollama directly, only through the Open WebUI GUI.)

  • Iced Raktajino@startrek.website · 10 hours ago

    Audio transcription should be the little “waveform” icon at the right of the text input.

    Image generation I’m not sure about, as that’s not a use case I have, and I don’t think the small-ish models I run are even capable of that.

    I’m not sure how audio transcription works in Open WebUI (I think it has built-in models for that?), but image generation is a “capability” that needs to be both supported by the model and enabled in the model’s settings (Admin => Settings => Models).
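    For reference, both features can also be wired up through the open-webui container's environment; the variable names below come from the Open WebUI documentation, but the chosen values and the backend URL are assumptions (image generation needs a separate backend such as AUTOMATIC1111 or ComfyUI running alongside):

    ```yaml
    # Hypothetical environment block for the open-webui service.
    # Variable names are from the Open WebUI docs; values are assumptions.
    environment:
      # Speech-to-text: the built-in engine runs a local faster-whisper model
      - WHISPER_MODEL=base
      # Image generation: enable it and point at an external backend
      - ENABLE_IMAGE_GENERATION=true
      - IMAGE_GENERATION_ENGINE=automatic1111
      - AUTOMATIC1111_BASE_URL=http://stable-diffusion:7860
    ```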

    • Shimitar (OP) · 9 hours ago

      The audio icon works, but only for the mic… Uploading files seems to be useless, as every model I have installed just keeps saying it cannot see the file and asks for a web link instead…

      How does that even work? Why can it fetch a URL but not a locally uploaded file?