As the title says, i started by selfhosting OpenWebUI including Ollama on my RIG. I have been pretty happy but the more i dig into this stuff, more i understand that i am doing it wrong and i definitely need to switch to llama.cpp / ik_llama.cpp.
But i have a few questions…
-
I want a web based LLM chat GUI, because that’s my 80% usage for AI. If i go with llama.cpp, do i need to ditch OpenWebUI as well? Is there a better UI? Do i need an UI?
-
i am currently hosting it all with a docker compose file. Is this still doable if i switch? I can go bare-metal (Gentoo server, good skills on my side) but it’s the maintenance part, a “podman compose pull” is just easier… or i am lazy.
-
the server is headless and always accessed remotely via web or ssh, just to be clear.
My hardware is a NVIDIA RTX A4000 16GB VRAM on a I7-8700@3200Ghz with 64GB system RAM (shared with far too many services).


Llama.cpp has its own built-in web UI that is fairly decent. Not as full featured as open web UI, but depends what you’re after.