

Most models are going to require CUDA. There are some that run on AMD, but the math and setup are totally different. As for the one I mentioned, it’s a pretty new idea, so there are only a few out there, maybe just one (Qwen-based). But I did get a 31B model to work on my 12GB card. I just had to move from Ollama to llama.cpp to get the control I needed over the parameters, and to tune how much it put on the GPU up to the max it would take. I had Claude help me along the way.
It’s new enough that there aren’t any good abliterated/uncensored models yet.
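
For anyone wanting to try the same kind of tuning, here is a rough sketch of the arithmetic behind "put as much on the GPU as it will take". It is not my exact setup: the file name, model size, layer count, and reserve are made-up numbers, and it assumes every layer is the same size, which real models only approximate. `-ngl` (GPU layers) and `-c` (context size) are the llama.cpp options it builds a command for; treat the result as a starting guess and adjust until it stops running out of memory.

```python
import math

def layers_that_fit(vram_gib, reserve_gib, model_gib, n_layers):
    """Estimate how many layers fit in VRAM, assuming equal-sized layers."""
    per_layer = model_gib / n_layers          # rough size of one layer
    usable = max(vram_gib - reserve_gib, 0.0) # keep headroom for KV cache etc.
    return min(n_layers, math.floor(usable / per_layer))

def llama_cmd(model_path, n_gpu_layers, ctx_size):
    """Build a llama-server command line as a list of arguments."""
    return ["llama-server", "-m", model_path,
            "-ngl", str(n_gpu_layers), "-c", str(ctx_size)]

# Hypothetical numbers: 12 GiB card, 2 GiB held back, 18 GiB model, 60 layers.
n = layers_that_fit(vram_gib=12.0, reserve_gib=2.0, model_gib=18.0, n_layers=60)
print(" ".join(llama_cmd("model.gguf", n, 8192)))
```

Whatever doesn't fit stays on the CPU, so the model still runs, just slower.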




Lemmy and related places are still small enough that a regular name posting can become well known faster than on large platforms. My only advice is to review your posts before submitting to make sure the message is clear, and if people ask questions, clarify. If you want to engage and discuss things, this is part of it. You’re getting discussion. :)