Conducting deep web searches and gathering sources is one of the main things I’ve been using LLMs for. How far away are we from being able to self-host something like Claude’s web search capabilities? Or even just a service where I’d pay with my money instead of my data?


Do you have a walk through for setup?
I’m on the strix halo 128 gb variant and while I got ollama working fine, i haven’t gotten any of these multi headed setups working
I am on Gentoo for it, but everything with a decent rocm should work.
Have a look for llama-swap, that handles multi head endpoints.
Also, as you are on a big board, you can quantize yourself, as the BF16 version of qwen has only 72gb.
I will try and post a full writeup next days. But feel free to dm me, if you need some guidance on quantize or more.
I am using this fork currently: https://github.com/charlie12345/ROCmFPX
Stuff happens fast currently, so may be worth to wait a week or two ig you need something super stable, but if you are up for experimenting, that’s the way to go
Great man! Gentoo lover and long time addicted here… Keep it the good work!
THis is great, thanks. I’m on the z-13 and needed to use it for a work project, which is wrapping up soon. I’m planning on re-building it as a locally hosted agent support machine.