Ok, time to move from Ollama + OpenWebUI

Shimitar · edit-2 19 hours ago

Ok, time to move from Ollama + OpenWebUI

MalReynolds@slrpnk.net · edit-2 15 hours ago

OpenWebUI works with plain llama.cpp

16 is a bit small so try a MoE (e.g. QWEN 3.6 35BA3B) model and put experts on the CPU (although DDR4 may be underwhelming) which you can do with llama ( with offloading and drafting for T/s) but not ollama (spitting noise). Here’s a good starting point. You’ll likely get 60+T/s on say a 6 bit quant.

You can use a container approach, but llama.cpp is a bit of a moving target, with new cool features coming along regularly to support new models. I build it in a distrobox and running it is a simple call. When it doesn’t want to build anymore because dependencies have changed too much, I just spin up a new distrobox and leave the old one there for older models. I find it a good balance between flexibility and ease of maintenance, and technically it’s also a container approach. Take notes so you know how to set up the new one.