Quick post about a change I made that’s worked out well.
I was using the OpenAI API for automations in n8n — email summaries, content drafts, that kind of thing — and was spending ~$40/month.
Switched everything to Ollama running locally. The migration was pretty straightforward since n8n just hits an HTTP endpoint. Changed the URL from api.openai.com to localhost:11434 and updated the request format.
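For anyone curious what "updated the request format" means in practice, here's a rough sketch of the two request shapes. This is illustrative, not the poster's actual n8n config: the model names are placeholders, and you'd plug these bodies into n8n's HTTP Request node.

```python
# Sketch of the two request body shapes (model names are illustrative;
# adjust the URL and model to match your own setup).

def openai_chat_request(prompt):
    # OpenAI-style chat completion request
    return {
        "url": "https://api.openai.com/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }

def ollama_chat_request(prompt):
    # Ollama's chat endpoint; "stream": False returns a single JSON reply
    return {
        "url": "http://localhost:11434/api/chat",
        "body": {
            "model": "llama3:8b",
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
    }
```

The messages array is nearly identical in both, which is why the migration is mostly a URL swap plus small field tweaks.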
For most tasks (summarization, classification, drafting) the local models are good enough. Complex reasoning is worse but I don’t need that for automation workflows.
Hardware: i7 with 16GB RAM, running Llama 3 8B. Plenty fast for async tasks.
Keep that n8n updated. There have been several high- and critical-severity CVEs recently, and I'm betting more are to come.
Free bullshit generator
No, not free. OP's power bill just climbed behind the scenes to match. Probably a discount, but definitely not free.
Unless OP is running a data center, there's not really much of a power increase from running a local Ollama.
Running a thousand watts versus not running a thousand watts can be quite a difference depending on where you live. And then consider buying all of the hardware. In many cases it's probably cheaper to just pay $40 a month.
That would be true worst case, but you’re never running inference 24/7. It’s no crazier than gaming in that regard.
I hate that LLMs are called “AI”, but they do have some uses if trained on the right data set (rather than pirating all the data on the internet and making the LLM think it’s valid data). I have been wanting to set one up for my Home Assistant voice control so that it can better understand my speech, and also for better image component recognition for tagging in Immich.
I wish they would force the companies to release their training data sets, considering they are getting a lot of it illegally. (Not that I’m a big copyright fan, but it’s crappy that copyright applies to individuals and small businesses but not to big rich people and corporate-backed companies. Attribution, and a copyleft policy if the creator wants it, is something I agree with strongly.) If we could get the data sets, pick and choose which portions we want to include, and then train our own LLMs, it would be better. It’s why scientific LLMs actually are useful: they are trained primarily on peer-reviewed scientific data, not 4chan and Reddit craziness, or SciFi and parody works treated as fact. No wonder it hallucinates.
Bullshit in, bullshit out, to paraphrase. If you teach a toddler propaganda from 4chan, or SciFi, parodies, and hate speech as fact rather than giving it all context, they turn out to be the people who post that nonsense. But the people funding it want quick results with no effort, and that’s what they get: a poorly educated child randomly spouting nonsense. LOL
As much as I rail against regulation, or more so over-regulation, AI needs some heavy regulation. We stand at the crossroads of a very useful tool that is unfortunately hung up in the novelty stage of pretty pictures and AI rice cookers. It could be so much more. I use AI for a few things. For one, I use AI to master the music I create. I am clinically deaf, so there are frequencies that I just can’t hear well enough to make a call, so I lean on AI to do that, and it does it quite well actually. I use AI to solve small programming issues I’m working on, but I wouldn’t dare release anything I’ve done, AI or not, because I can always picture some poor chap who used my ‘code’ and now has smoke billowing out of his computer. It’s also pretty damn good at compose files. I’ve read about medical uses that sound very efficient at ingesting tons of patient records and reports and pinpointing where services could do better in aiding the patient, so that people don’t fall through the cracks and miss the medical treatment they need. So it has some great potential, if we could just get some regulation and move past this novelty stage.
I’m not a huge fan of AI, but I consider myself pretty open-minded and have been considering doing a demo of Claude to at least gain an understanding of the tech I’m constantly talking shit about.
Is there anything self-hostable that compares in quality to what vibe coders claim Claude Opus is capable of?
The trash talking on AI is half people with legitimate concerns on the societal and ecological impact and the other half just want to be in on the party and aren’t interested in understanding it. It’s useful like googling things is useful, the items you search for are not always correct, but if you have a basic level of knowledge it’ll help you get where you want to be much faster.
Nothing quite compares to Claude Opus in a cohesive package that I’d recommend for an average self-hoster, but I personally really like running Nemotron from Nvidia. It’s not the best model, but in my experience it’s consistently good enough, along with being fast and stable. If you’re focused more on coding, I hear the Qwen series has some good models.
I actually did an experiment on doing just that. For context, I’m an experienced software engineer whose company buys him a ton of Claude usage, so I had time to test out what it can actually do, and I feel like I’m capable of judging where it’s good and where it falls short.
How Claude Code works is that there are actually multiple models involved: one for doing the coding, one “reasoning” model to keep the chain of thought and the context going, and a bunch of small specialized ones for odd jobs around the thing.
The thing that doesn’t work yet is that the big reasoning model still has to be big; otherwise it will hallucinate frequently enough to break the workflow. If you could get one of the big models to run locally, you’d be there. However, with recent advances in quantization and MoE models, it’s actually getting close fast enough that I would expect it to be generally available in a year or two.
Today the best I could do was a setup with 150 GB of RAM, 24 GB of VRAM, and AMD’s top-of-the-line card, which took 30 minutes to do what takes Claude Code 1-2. But surprisingly, the output of the model was not bad at all.
What’s the model name to pull?
Probably use Gemma4 if your machine has the chops for it.
You could probably get away with using gemma3:4b or phi3.5.
Any quality difference?
Depends on what OP was using before, but going from something like GPT5.2 to Llama 3 8B will be a massive difference (although OP says they use it only for basic tasks, so that does offset it).
Llama 3 already being a very old model doesn’t help either.
I run Qwen3.5-35B-A3B-AWQ-4bit, which, while leagues ahead of Llama 3 8B, is still a very noticeable step down.
This is not to say open source is bad; if one had the resources to run something like Qwen3.5-397B-A17B, it would also be up there.
What kind of hardware do you need to run those models?
Depends on how much quantization, but still fairly beefy; I couldn’t run it on my homelab with a 3080 Ti, for example.
I generally use smaller 8-12b models and they’re alright depending on the task.
I’m running 2x 4090s, and the 35B fits very comfortably in that.
For large models like the 397B, there are several options that don’t take a ton of money; I’ve seen posts of people using arrays of used 3090s with good results.
The other option is CPU inference although with current RAM prices that is less cost effective.
I was looking at maybe an array of Milk-V JUPITER2 boards, since vLLM added RISC-V support, which could be very cost effective.
In general, you take the model size in billions of parameters (eg: 397B), divide it by 2 and add a bit for overhead, and that’s how much RAM/VRAM it takes to run it at a “normal” quantization level. For Qwen3.5-397B, that’s about 220 GB. Ideally that would be all VRAM for speed, but you can offload some or all of that to normal RAM on the CPU, you’ll just take a speed hit.
So for something like Qwen3.5-397B, it takes a pretty serious system, especially if you’re trying to do it all in VRAM.
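The rule of thumb above is easy to put into a few lines. This is a rough sketch, not an exact formula; the 0.5 bytes per parameter corresponds to roughly 4-bit quantization, and the 10% overhead factor for KV cache and activations is my own loose assumption:

```python
def estimate_memory_gb(params_billion, bytes_per_param=0.5, overhead=1.1):
    """Rough RAM/VRAM estimate for running a model at ~4-bit quantization.

    bytes_per_param=0.5 means 4-bit weights; overhead is a loose ~10%
    allowance for KV cache and activations (an assumption, not a spec).
    """
    return params_billion * bytes_per_param * overhead

# A 397B model comes out around 218 GB, consistent with "about 220 GB"
print(round(estimate_memory_gb(397)))

# A 35B model at 4-bit lands around 19 GB, which fits in 2x 4090s (48 GB)
print(round(estimate_memory_gb(35)))
```

Longer context windows inflate the KV cache well past that 10%, so treat the result as a floor rather than a guarantee.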
I only ever use my local AI for the Home Assistant voice assistant on my phone, but it’s more of a gimmick/party trick since I only have temperature sensors currently (only got into HA recently), and it can’t access WiFi, so it just sits quietly unloaded on my TrueNAS server.
Running any LLM on TrueNAS is not awesome. I’ve tried it with GPU passthrough and it’s just too much overhead. I may just burn all my stuff down and restart with Proxmox, running TrueNAS inside just for NAS duties. The idea of a converged NAS + virtualization box is wonderful, but it’s just not there.
The host networking model alone is such a pain, and then you get into the performance stuff. I still like TrueNAS a lot, but I think Proxmox is probably still the better platform.