• 1 Post
  • 227 Comments
Joined 1 year ago
Cake day: March 22nd, 2024

  • Yeah. But it also breaks things relative to the llama.cpp baseline, hides or doesn’t support some features/optimizations, and definitely doesn’t support the more efficient iq_k quants of ik_llama.cpp or its specialized MoE offloading.

    And that’s not even getting into the various controversies around ollama (like broken GGUFs or indications they’re going closed source in some form).

    …It just depends on how much performance you want to squeeze out and how much time you want to spend on the endeavor. Small LLMs are kinda marginal though, so IMO the effort is only worth it if you really want to tinker; otherwise you’re probably better off spending a few bucks on an API that doesn’t log requests.
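
    Either way, the client code looks about the same whether it points at a self-hosted server or a paid API, since llama.cpp-family servers and most hosted providers expose an OpenAI-compatible endpoint. A minimal sketch, where the URL, key, and model name are placeholders rather than anything specific to a particular server:

    ```python
    # Minimal sketch: query an OpenAI-compatible chat endpoint.
    # BASE_URL, API_KEY, and the model id are placeholders -- point them at
    # whatever you actually run (a llama.cpp-style server) or a hosted API.
    import requests

    BASE_URL = "http://localhost:8080/v1"   # placeholder: local server or hosted API
    API_KEY = "sk-placeholder"              # many local servers ignore the key entirely

    def ask(prompt: str) -> str:
        resp = requests.post(
            f"{BASE_URL}/chat/completions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "model": "local-model",     # placeholder model id
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.6,
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    print(ask("Explain MoE offloading in two sentences."))
    ```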




  • At risk of getting more technical, ik_llama.cpp has a good built-in webui:

    https://github.com/ikawrakow/ik_llama.cpp/

    Getting more technical, it’s also way better than ollama. You can run way smarter models than ollama can on the same hardware.

    For reference, I’m running GLM-4 (667 GB of raw weights) on a single RTX 3090/Ryzen gaming rig, at reading speed, with pretty low quantization distortion.

    And if you want a ‘look this up on the internet for me’ assistant (which you need for them to be truly useful), you need another docker project as well.

    …That’s just how LLM self hosting is now. It’s simply too hardware-intensive and ad hoc to be easy, smart, and cheap all at once. You can indeed host a small ‘default’ LLM without much tinkering, but it’s going to be pretty dumb, and pretty slow on ollama defaults.
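
    To make the ‘ad hoc’ part concrete, here’s a rough sketch of launching ik_llama.cpp’s bundled llama-server (which serves that webui) with a big MoE model split between GPU and system RAM. The model path is a placeholder and flag sets shift between builds, so treat the override-tensor regex as the usual ‘keep the MoE experts in system RAM’ trick and check --help on your build before copying anything:

    ```python
    # Rough sketch, not a tuned config: serve a large MoE GGUF with the dense
    # layers on the GPU and the expert tensors kept in system RAM.
    import subprocess

    cmd = [
        "./llama-server",
        "-m", "/models/some-big-moe-model.gguf",  # placeholder GGUF path
        "-c", "16384",            # context length
        "-ngl", "99",             # offload all repeating layers the GPU can hold
        "-ot", "exps=CPU",        # assumption: regex keeps expert tensors on CPU
        "-t", "16",               # CPU threads for the offloaded experts
        "--host", "127.0.0.1",
        "--port", "8080",
    ]
    subprocess.run(cmd, check=True)  # blocks while the server runs
    ```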




  • “AI” are still tools.

    The issue is their underlying technology, as of now, is way more fundamentally limited than ‘Tech Bro’ types will tell you. Don’t get me wrong, they’re neat tools, but they are fundamentally incapable of taking over intricate decision making processes. They’re just a layer of human assistance and automation.

    I’m as big of a local LLM enthusiast as you’ll find, and I’m telling you: the AGI scaling acolytes are full of shit, and the research community knows it.

    Imagine finding out that you won’t be able to pay off your debt because most fast food restaurants will use AI/bots that can serve, prepare, clean, etc. 24/7, while a useless human needs breaks, wants money, needs days off, and can only work 8-hour shifts.

    This sucks.

    …But honestly, in the long run, it’s not so bad. Working fast food sucks and it would be great if people could do something else instead.


    As a little silver lining, there’s a good chance ‘AI,’ as it is now, is going to ‘race to the bottom,’ and a lot of the heavy lifting will be done on your phone or some computer you own. So you’ll have a little assistant to help you with stuff, self hosted, not corporate cloud controlled. Think Lemmy vs Reddit in that regard.





  • Oh, and there are other graphics makers that could theoretically work on Linux, like Imagination’s PowerVR, and some Chinese startups. Qualcomm’s already trying to push into laptops with Adreno (which has roots in AMD/ATI, hence ‘Adreno’ being an anagram of ‘Radeon’).

    The problem is that making a desktop-sized GPU has a massive capital cost (over $1,000,000,000, maybe even tens of billions these days) just to ‘tape out’ a single chip, much less a whole product line, and AMD/Nvidia are just so far ahead in terms of architecture. It’s basically uneconomical to catch up without a massive geopolitical motivation like there is in China.







  • through phone if you have a phone on your water account, through a system no one knew existed

    I interpreted this as one system. So it’s:

    • The water website, which you’d have to happen to stumble upon

    • Obscure opt-in phone system

    • Facebook

    If that’s the case, the complaint is reasonable, as the water service is basically assuming Facebook (and word of mouth) are the only active notifications folks need.

    But yeah, if OP opted out of SMS warnings or something, that’s more on them.


  • Oh wow, that’s awesome! I didn’t know folks ran TDP tests like this, just that my old 3090 seems to have a minimum sweet spot around that same ~200W based on my own testing, but I figured the 4000 or 5000 series might go lower. Apparently not, at least for the big die.

    I also figured the 395 would draw more than 55W! That’s also awesome! I suspect newer, smaller GPUs like the 9000 or 5000 series still make the value proposition questionable, but you still make an excellent point.

    And for reference, I just checked, and my dGPU hovers around 30W idle with no display connected.
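
    If anyone wants to reproduce that kind of idle-draw number on an NVIDIA card, here’s a quick sketch that polls NVML (the same counters nvidia-smi reads) via the pynvml package; readings are approximate, and this obviously only covers NVIDIA:

    ```python
    # Quick sketch: sample board power draw once a second via NVML.
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU in the system

    for _ in range(10):
        milliwatts = pynvml.nvmlDeviceGetPowerUsage(handle)
        print(f"GPU 0 power draw: {milliwatts / 1000:.1f} W")
        time.sleep(1)

    pynvml.nvmlShutdown()
    ```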


  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · 1U mini PC for AI?

    Eh, but you’d be way better off with an X3D CPU in that scenario, which is significantly faster in games, about as fast outside them (unless you’re DRAM bandwidth limited), and more power efficient (because they clock relatively low).

    You’re right about the 395 being a fine HTPC machine by itself.

    But I’m also saying even an older 7900, 4090 or whatever would run at way lower power at the same performance as the 395’s IGP, and be whisper quiet in comparison. Even if cost is no object. And if that’s the case, why keep a big IGP at all? It just doesn’t make sense to pair them without some weirdly specific use case that can use both at once, or that a discrete GPU literally can’t do because it doesn’t have enough VRAM like the 395 does.


  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · 1U mini PC for AI?

    Eh, actually that’s not what I had in mind:

    • Discrete desktop graphics idle hot. I think my 3090 uses at least 40W doing literally nothing.

    • It’s always better to run big dies slower than small dies at high clockspeeds. Power scales roughly with clock speed times voltage squared, so if you underclocked a big desktop GPU to 1/2 its peak clockspeed, it would use less than a fourth of the energy and run basically silent… and still be faster than the iGPU. So why keep a big iGPU around?

    My use case was multitasking and compute stuff, e.g. gaming on the discrete GPU while the IGP churns away running something else. Or combining them in some workloads.

    Even the 395 by itself doesn’t make a ton of sense for an HTPC, because AMD slaps so much CPU on it, which makes it way too expensive and power thirsty. A single CCD (8 cores instead of 16) + the full integrated GPU would be perfect and lower power, but AMD inexplicably does not offer that.

    Also, I’ll add that my 3090 is basically inaudible next to a TV… the key is to cap its clocks, and the fans barely even spin up.
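
    For anyone who wants to try the clock cap, it can be done programmatically too. This is a sketch using pynvml’s locked-clocks call (the same thing nvidia-smi -lgc does); it needs root/admin, and the 210–1400 MHz range is just an example, not a recommendation for any particular card:

    ```python
    # Sketch: cap the GPU core clock so power and fan noise drop sharply.
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    # Lock the core clock well below its boost ceiling (example values in MHz).
    pynvml.nvmlDeviceSetGpuLockedClocks(handle, 210, 1400)

    # ... run your game / inference workload here ...

    # Undo the cap to restore normal boost behavior.
    pynvml.nvmlDeviceResetGpuLockedClocks(handle)
    pynvml.nvmlShutdown()
    ```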