Hi all, i am quite an old fart, so i just recently got excited about self hosting an AI, some LLM…
What i want to do is:
- chat with it
- eventually integrate it into other services, where needed
I read about OLLAMA, but it’s all unclear to me.
Where do i start, preferably with containers (but “bare metal”) is also fine?
(i already have a linux server rig with all the good stuff on it, from immich to forjeio to the arrs and more, reverse proxy, Wireguard and the works, i am looking for input on AI/LLM, what to self host and such, not general selfhosting hints)
One of these projects might be of interest to you:
https://github.com/Mintplex-Labs/anything-llm
https://github.com/mudler/LocalAI
Do note that CPU inference is quite a lot slower than GPU or the well known SAAS providers. I currently like the quantized deepseek models as the best balance between quality of replies and inference time when not using GPU.