Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

  • brucethemoose@lemmy.world
    4 hours ago

    Yep.

    I have an RTX 3090 + 128GB CPU RAM.

    Currently I run my own custom IQ3_KT quantization of MiMo 2.5 300B, and it’s crazy good. It’s better than API models from not that long ago, and it’s served at about reading speed.
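    For a rough sense of why a 300B model fits on a 24GB card plus 128GB of system RAM at all: the quantized weights take about parameters × bits per weight ÷ 8 bytes. The sketch below assumes IQ3_KT averages roughly 3.125 bits per weight. That figure is an approximation, and real GGUF files drift from it because some tensors are usually kept at higher precision. It also ignores the KV cache and OS overhead, which have to fit in whatever headroom is left.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in decimal gigabytes."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


VRAM_GB = 24          # RTX 3090
SYSTEM_RAM_GB = 128   # CPU RAM from the post

# Assumption: IQ3_KT averages ~3.125 bits per weight. Actual quant files vary
# because embeddings/attention tensors are often stored at higher precision.
weights = weight_footprint_gb(300, 3.125)
budget = VRAM_GB + SYSTEM_RAM_GB
headroom = budget - weights  # must still cover KV cache, context, and the OS

print(f"weights ≈ {weights:.1f} GB, budget = {budget} GB, headroom ≈ {headroom:.1f} GB")
```

    Under those assumptions this prints roughly 117 GB of weights against a 152 GB combined budget. That is why the model has to be split between GPU and CPU memory rather than living entirely in VRAM.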

    Never thought I’d ever run such a thing on my lowly desktop.

    For quick scripts or as a code assistant, I sometimes use Qwen 27B (another custom quant; I'm currently experimenting with exllama), or Gemma 12B for messing with image/audio input. But TBH, MiMo 2.5 with thinking disabled is smarter than 27B with thinking enabled.


    …And honestly, I use GLM 5.2 API a good bit.

    I was lucky enough to get a yearly subscription for like $30 six months ago. I do self-host the UIs, or whatever takes the prompts, though.