• 0 Posts
  • 179 Comments
Joined 3 years ago
Cake day: June 25th, 2023

  • OpenWebUI works with plain llama.cpp

    16 is a bit small, so try a MoE model (e.g. QWEN 3.6 35BA3B) and put the experts on the CPU, although DDR4 may be underwhelming. You can do that with llama.cpp (with offloading and drafting for T/s) but not with ollama (spitting noise). Here’s a good starting point. You’ll likely get 60+ T/s on, say, a 6-bit quant.
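To see why a model this size does not fit and why the MoE split helps, here is a rough back-of-the-envelope sketch. It assumes "16" means 16 GB of GPU memory and that the model name means about 35B total parameters with about 3B active per token. Both are my reading, not something stated above. Real quant formats use slightly more than their nominal bits per weight, and the KV cache and runtime overhead are ignored.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations and per-format overhead, so treat the
    result as a lower bound rather than an exact figure.
    """
    return params_billion * bits_per_weight / 8


# Assumed figures: 35B total parameters, 3B active per token, 6 bits per weight.
total_gb = weights_gb(35, 6)   # all weights
active_gb = weights_gb(3, 6)   # weights actually touched per token

print(f"total: {total_gb:.2f} GB, active per token: {active_gb:.2f} GB")
```

Under those assumptions the full set of weights is well over 16 GB, while the part used per token is small. That is the argument for keeping the expert tensors in system RAM and the rest on the GPU.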

    You can use a container approach, but llama.cpp is a bit of a moving target, with new cool features coming along regularly to support new models. I build it in a distrobox and running it is a simple call. When it doesn’t want to build anymore because dependencies have changed too much, I just spin up a new distrobox and leave the old one there for older models. I find it a good balance between flexibility and ease of maintenance, and technically it’s also a container approach. Take notes so you know how to set up the new one.






  • You’re right, it is that way now. But it’s not intrinsic to hierarchy, just to the current system.

    As a counterexample, not as an endorsement, consider feudalism (the old form, not technofeudalism). While psycho/sociopaths have an advantage reaching the top, there is a reasonable chance that their children will not be sick in the same way.

    Also consider Athenian democracy, where people were randomly selected to form the ruling council and the juries, while the assembly was open to all citizens. Any given official was therefore only as likely to be a psychopath as the general populace. That system lasted for roughly two centuries, including being taken over by (probably) sociopaths and returning to democracy afterwards.





  • I assumed Linux given the specs and that Chrome works at all.

    My vote is for Firefox, which you should be using anyway. Another trick to try is an explicit tab unloader extension (for fine control of unloading) and/or the built-in about:unloads page.

    All browsers seem to accumulate memory (and CPU usage) over time, so sometimes you just have to shut the browser down and restart it (AKA "have you tried turning it off and on again?" :)). A service that restarts it daily (or more often if needed) can work.
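A minimal sketch of that restart idea, meant to be run from whatever scheduler you already use. The process name `firefox`, the grace period, and both helper names are assumptions for illustration. The process may have a different name on your distro, and a scheduled job needs access to your graphical session to relaunch the browser.

```python
import subprocess
import time


def total_rss_mb(ps_output: str) -> float:
    """Sum RSS values (KiB, one per line) as printed by `ps -C <name> -o rss=`."""
    return sum(int(value) for value in ps_output.split()) / 1024


def browser_rss_mb(name: str = "firefox") -> float:
    """Return the combined resident memory of all processes called `name`, in MiB."""
    result = subprocess.run(
        ["ps", "-C", name, "-o", "rss="],
        capture_output=True, text=True, check=False,
    )
    return total_rss_mb(result.stdout)


def restart_browser(name: str = "firefox", grace_seconds: float = 10.0) -> None:
    """Ask the browser to exit, wait a little, then start it again."""
    subprocess.run(["pkill", "-TERM", "-x", name], check=False)
    time.sleep(grace_seconds)  # give it time to save the session
    subprocess.Popen([name], start_new_session=True)
```

You could call `restart_browser()` unconditionally once a day, or only when `browser_rss_mb()` exceeds a threshold you pick for your machine.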





  • You can do CUDA on these, which is not nothing. With that you can run DOOM and pipe it to a video out. If there is a glut of cheap new versions of these after a bubble pop, perhaps the effort will be made to bodge these into pretending they’re a 5090 or something for the drivers, probably on Linux. It’s certainly possible, but significant work. If they’re cheap enough, and plentiful enough, life will find a way.

    What they are good at, right now, is running local LLMs, scientific computing, etc., and hobbyists do this reasonably commonly. Likely also Photoshop and similar, if you want the pain of running them on Windows.