Conducting deep web searches and gathering sources is one of the main things I’ve been using LLMs for. How far away are we from being able to self-host something like Claude’s web search capabilities? Or even just a service where I’d pay with my money instead of my data?

  • vapeloki@lemmy.world · 8 hours ago

    I run Gentoo for this, but any distro with decent ROCm support should work.

    Have a look at llama-swap; it handles serving multiple models behind one endpoint.
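
    For anyone new to it: the idea behind a tool like llama-swap is a single OpenAI-style endpoint that reads the `model` field of each request and starts the matching backend, by default stopping whichever model was loaded before. Below is a minimal Python sketch of that routing logic only, not llama-swap’s actual code or config format. The class name, model names and `llama-server` command strings are hypothetical placeholders, and nothing is actually launched; the commands are only recorded in a log.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelSwapper:
    """Toy model of single-endpoint routing: at most one backend 'running' at a time."""

    # Hypothetical mapping: model name -> command that would start its server.
    commands: dict[str, str]
    active: str | None = None
    log: list[str] = field(default_factory=list)

    def route(self, request: dict) -> str:
        """Pick the backend for an OpenAI-style request body, swapping if needed."""
        name = request.get("model")
        if name not in self.commands:
            raise KeyError(f"unknown model: {name!r}")
        if name != self.active:
            if self.active is not None:
                self.log.append(f"stop {self.active}")  # unload the previous model
            self.log.append(f"start {self.commands[name]}")  # recorded, not executed
            self.active = name
        return name


if __name__ == "__main__":
    swapper = ModelSwapper({
        "qwen": "llama-server -m qwen.gguf",  # hypothetical model files
        "small": "llama-server -m small.gguf",
    })
    for model in ("qwen", "qwen", "small"):
        swapper.route({"model": model, "messages": []})
    print(swapper.log)
```

    The trade-off being illustrated is latency for memory: only one large model occupies VRAM at a time, at the cost of a reload whenever a request names a different model.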

    Also, since you’re on a big board, you can do the quantization yourself: the BF16 version of Qwen is only 72 GB.
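
    Rough arithmetic on why that size is manageable: BF16 stores each weight in 2 bytes, so 72 GB works out to about 36 billion parameters, and a quantized file shrinks roughly in proportion to its bits per weight. The sketch below assumes 72 GB means decimal gigabytes of weights only and ignores metadata and any tensors a quantizer keeps at higher precision; the bits-per-weight values are illustrative, not exact figures for any particular quant format.

```python
# Back-of-envelope size estimate for quantizing a BF16 checkpoint.
# Assumptions: sizes in decimal GB (1e9 bytes), weights only, uniform bits per weight.

BF16_BYTES_PER_WEIGHT = 2  # bfloat16 is 16 bits = 2 bytes


def estimate_params(bf16_size_gb: float) -> float:
    """Approximate parameter count from the size of a BF16 checkpoint."""
    return bf16_size_gb * 1e9 / BF16_BYTES_PER_WEIGHT


def quantized_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate file size after quantizing every weight to `bits_per_weight` bits."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = estimate_params(72)  # the 72 GB BF16 figure from the post
    print(f"~{params / 1e9:.0f}B parameters")
    for bpw in (8.0, 6.0, 4.5):  # illustrative bits-per-weight values
        print(f"{bpw:>4} bpw -> ~{quantized_size_gb(params, bpw):.1f} GB")
```

    At around 4 to 5 bits per weight that lands somewhere near 20 GB, which is why quantizing locally on a large-memory board is practical.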

    I’ll try to post a full write-up in the next few days, but feel free to DM me if you need some guidance on quantizing or anything else.

    I’m currently using this fork: https://github.com/charlie12345/ROCmFPX

    Things are moving fast right now, so it may be worth waiting a week or two if you need something super stable. But if you’re up for experimenting, that’s the way to go.

    • ShimitarA · 8 hours ago

      Great, man! Gentoo lover and longtime addict here… Keep up the good work!

    • TropicalDingdong@lemmy.world · 8 hours ago

      This is great, thanks. I’m on the z-13 and needed it for a work project, which is wrapping up soon. After that I’m planning on rebuilding it as a locally hosted agent support machine.