Replaced $40/month in AI API subscriptions with self-hosted Ollama + n8n

quickbitesdev@discuss.tchncs.de · 2 months ago

Replaced $40/month in AI API subscriptions with self-hosted Ollama + n8n

brownmustardminion@lemmy.ml · 2 months ago

I’m not a huge fan of AI, but I consider myself pretty open minded and have been considering doing a demo of Claude to at least gain an understanding of the tech I’m constantly talking shit about.

Is there anything self-hostable that compares in quality to what vibe coders claim Claude Opus is capable of?

Barbecue Cowboy@lemmy.dbzer0.com · 2 months ago

The trash talking on AI is half people with legitimate concerns on the societal and ecological impact and the other half just want to be in on the party and aren’t interested in understanding it. It’s useful like googling things is useful, the items you search for are not always correct, but if you have a basic level of knowledge it’ll help you get where you want to be much faster.

Nothing quite compares to Claude Opus in a cohesive package that I’d recommend for an average self hoster but I personally really like running Nemotron from Nvidia. It’s not the best model, but in my experience it’s consistently good enough along with being fast and stable. If you’re focused more on coding, I hear the Qwen series had some good models.

ℍ𝕂-𝟞𝟝@sopuli.xyz · 2 months ago

I actually did an experiment on doing just that. For context, I’m an experienced software engineer, whose company buys him a tom of Claude usage so I had time to test out what it can actually do and I feel like I’m capable of judging where it’s good and where it falls short at.

How Claude Code works is that there are actually multiple models involved, one for doign the coding, one “reasoning” model to keep the chain of thought and the context going, and a bunch of small specialized ones for odd jobs around the thing.

The thing that doesn’t work yet is that the big reasoning model has to still be big, otherwise it will hallucinate frequently enough to break the workflow. If you could get one of the big models to run locally, you’d be there. However, with recent advances in quantization and MoE models, it’s actually getting nearer fast enough that I would expect it to be generally available in a year or two.

Today the best I could do was a tool that could take 150 gigs of RAM, 24 gigs of VRAM and AMD’s top of the line card to take 30 minutes what takes Claude Code 1-2. But surprisingly, the output of the model was not bad at all.

sobchak@programming.dev · 2 months ago

You really only need a little more RAM than your GPU’s VRAM (unless you’re doing CPU offloading, which is extremely slow). Otherwise, I did the same thing recently too, and was surprised I was able to get it a Qwen 9B to fix a bug in a script I had. I think Sonnet would’ve fixed in a lot fewer tries, but the 9B model was eventually able to fix it. I could’ve fixed it myself quicker and cleaner than both, but it was an interesting test.

Voroxpete@sh.itjust.works · 2 months ago

Locally? You’d need a VERY powerful GPU to really be able to match the capabilities of Opus 4.6 online. I’ve played around with this stuff for the same reasons and while you can absolutely run a model with all of Claude’s capabilities offline, very few people will have the hardware to let it actually run at an acceptable speed and with a sufficient context window. That last part is the most important thing for coding because it’s what allows the model to operate across an entire project and not just a few functions at a time.

lepinkainen@lemmy.world · 2 months ago

Nothing you can run with affordable hardware. The SOTA stuff requires hundreds of gigabytes of memory - and not RAM, GPU memory.

But you can try with stuff like gpt-oss or qwen coder

fuckwit_mcbumcrumble@lemmy.dbzer0.com · 2 months ago

The models that the commercial AIs use are not at all usable on consumer grade hardware. The RTX pro 6000 has 96 gigs of vram, your GPU probably had 8.

I’ve played with the models that run on 16 gigs and it’s alright. But I wouldn’t even try fully vibe coding. Need some help with something small? Sure. But I wouldn’t have it try to make a finished product.

utjebe@reddthat.com · 2 months ago

If it is just the user part of LLM, then paying $20 for one month subscription would be my recommendation.

You will not be able host anything like Sonnet or Opus.