• 0 Posts
  • 6 Comments
Joined 5 years ago
Cake day: June 30th, 2020



  • It really depends on how you quantize the model, and the K/V cache as well. This is a useful calculator: https://smcleod.net/vram-estimator/. I can comfortably fit most 32B models quantized to 4-bit (usually Q4_K_M or IQ4_XS) on my 3090’s 24 GB of VRAM with a reasonable context size. If you need a much larger context window to feed in large documents etc., then you’d need to go smaller on the model size (14B, 27B, etc.), get a multi-GPU setup, or something with unified memory and a lot of RAM (like the Mac Minis others are mentioning).
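To see why a 4-bit 32B model fits in 24 GB with room for context, here is a rough back-of-the-envelope sketch of the same arithmetic the linked calculator does. The layer count, K/V head count, and head dimension below are illustrative assumptions for a typical 32B model with grouped-query attention, not exact figures for any specific checkpoint:

```python
# Rough VRAM estimate: quantized weights + K/V cache.
# Model shape numbers (64 layers, 8 K/V heads, head_dim 128) are
# assumed for illustration; check your model's config for real values.

def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers: int, ctx_tokens: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """K/V cache: two tensors (K and V) per layer per token, fp16 by default."""
    return 2 * layers * ctx_tokens * kv_heads * head_dim * bytes_per_elem / 1e9

# Example: 32B model at ~4.25 bits/weight (roughly IQ4_XS),
# 8,192-token context, fp16 cache.
w = weights_gb(32, 4.25)            # ~17.0 GB of weights
kv = kv_cache_gb(64, 8192, 8, 128)  # ~2.1 GB of K/V cache
print(f"weights ≈ {w:.1f} GB, kv ≈ {kv:.1f} GB, total ≈ {w + kv:.1f} GB")
```

That lands around 19 GB, which is why 24 GB works with headroom for activations, but a 32K+ context or a less aggressive quant pushes you toward a smaller model or more memory. Quantizing the K/V cache itself (e.g. to 8-bit) roughly halves the cache term.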




  • As a ‘front page of the internet’ it has been a pretty great replacement for me, as it’s where I go each day to just see what’s going on. However, due to the smaller user base you do lose a lot of the activity in more niche communities, and the sheer volume of posts/comments compared to Reddit. That’s the biggest downside. Still, you also lose the incessant ads, bad UI/UX decisions, and ever-accelerating, late-stage-capitalism-driven enshittification, so that’s a big plus.