Conducting deep web searches and gathering sources is one of the main things I’ve been using LLMs for. How far away are we from being able to self-host something like Claude’s web search capabilities? Or even just a service where I’d pay with my money instead of my data?

      • coreray00@discuss.online · 4 hours ago

        I only use the models available on the cheap plan. I like that the first thing it does is an internet search, so it doesn’t say anything too stupid. I’d say it’s as good as any AI product 🤷

  • Avid Amoeba@lemmy.ca · 5 hours ago

    Open WebUI + SearXNG + llama.cpp + Qwen 3.6 35B + a 16-32GB GPU. That gives you 256K context and runs at 80-100 tok/s on a 3090. With less VRAM, like 16GB, it’ll be slower, but probably still tens of tok/s on anything recent. I run it on an AMD Pro 9700, which is about as fast as a 3090.

  • SuspiciousCarrot78@aussie.zone · 4 hours ago

    Oh - you can do that right now.

    Any decent LLM that can use tools (I still like Qwen3-4B 2507 Instruct) + llama.cpp + OWUI + the Tavily API (a free key gives you 1000 results a month) or your own SearXNG. Done.
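With a tool-capable model, the model itself decides when to search. A minimal sketch of the tool side, assuming the OpenAI-style `tools`/`tool_calls` message format that recent llama-server builds expose when started with `--jinja`; `search_fn` is a placeholder for whichever backend you pick (Tavily, SearXNG, ...).

```python
import json

# OpenAI-style tool schema describing a single web search function.
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def run_tool_calls(assistant_message: dict, search_fn) -> list[dict]:
    """Execute each tool call in an assistant message and return the `tool` replies.

    `search_fn` takes a query string and returns a string to hand back to the model.
    """
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        if fn["name"] != "web_search":
            content = f"unknown tool {fn['name']}"
        else:
            args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
            content = search_fn(args["query"])
        replies.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return replies
```

In a full loop you send `messages` plus `tools=[WEB_SEARCH_TOOL]`, append the assistant message and these `tool` replies to `messages`, and repeat until the model answers without requesting a tool call.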

    Be aware, though, that SearXNG is a metasearch engine, so if you go crazy with web searching, you will get rate limited upstream.
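If you point an agent at your own SearXNG, a client-side throttle keeps a burst of tool calls from hammering the upstream engines. A minimal token-bucket sketch; the `rate` and `capacity` values are whatever you choose, and nothing here is SearXNG configuration.

```python
import time


class TokenBucket:
    """Client-side throttle: about `rate` requests per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested with a fake clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Before each search, call `try_acquire()` and sleep briefly while it returns `False`.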

    Otherwise, Kagi.

  • vapeloki@lemmy.world · 10 hours ago

    Open WebUI + SearXNG on an AMD Strix board.

    Pro: works like a charm, low power consumption, fast, “big” LLM (running qwen3.6 35B A3B + gemma4 E4B for website summaries and other smaller tasks)

    Con: Strix boards start at 2k€, more in the USA because of tariffs

    • vapeloki@lemmy.world · 10 hours ago

      For those who want to know more, rough setup:

      • llama-cpp rocmfp4 fork
      • currently a custom-quantized qwen3.6 35B A3B model; working on publishing it
      • be3 embedding and reranker, also on GPU
      • gemma4-e4b via FastFlowLM on NPU!
      • OpenWebUI and searxng as Docker containers, currently on a Pi

      We get 70-100 tok/s generation, with four slots of 256k context length each.
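In llama-server terms, parallel slots are set with `-np` (`--parallel`) and context with `-c` (`--ctx-size`). In the builds I'm familiar with, `-c` is the total context shared out evenly across the slots, so four 256k slots need `-c 1048576`; newer builds also offer a unified KV cache mode that changes this, so check `llama-server --help` for yours. A small sketch of the arithmetic, with a placeholder model path:

```python
def llama_server_args(model_path: str, slots: int, ctx_per_slot: int) -> list[str]:
    """Build a llama-server command line with `slots` parallel slots.

    Assumes -c is the total context, split evenly across the -np slots.
    """
    total_ctx = slots * ctx_per_slot
    return ["llama-server", "-m", model_path, "-c", str(total_ctx), "-np", str(slots)]


args = llama_server_args("model.gguf", slots=4, ctx_per_slot=262_144)  # placeholder path
print(" ".join(args))  # llama-server -m model.gguf -c 1048576 -np 4
```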

      We use a smaller board with “only” 64GB of shared LPDDR5X. The bottleneck is memory speed, and rocmfp4 quants help a lot.
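Why memory speed is the bottleneck: during generation, each new token has to read the active weights from memory once, so bandwidth divided by bytes per token gives a rough ceiling on single-stream speed. Fewer bits per weight means fewer bytes per token, which is why smaller quants help. The figures below are assumptions (about 256 GB/s for a 256-bit LPDDR5X-8000 bus, about 3B active parameters for an A3B model, 4.5 bits per weight), not measurements.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_b: float, bits_per_weight: float) -> float:
    """Upper bound on single-stream decode speed when memory bandwidth is the limit.

    tokens/s <= bandwidth / bytes_per_token, where bytes_per_token is the size
    of the active weights. KV-cache reads and other overheads are ignored.
    """
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


print(round(decode_ceiling_tok_s(256, 3, 4.5)))  # 152
```

That ceiling ignores KV-cache reads and compute overhead, so real single-stream numbers land below it, which is consistent with the 70-100 tok/s quoted above.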

      As soon as I get my imatrix calibration right, I will publish the quantized versions.

      Most existing quantized models are broken. Their authors did unsupported things (like taking an already quantized model and requantizing it), so you may get issues with coherence or sudden Chinese words in the output.
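A toy illustration of why requantizing from an already-quantized copy is a bad idea, using plain symmetric round-to-nearest rather than rocmfp4 or any GGUF format. Quantizing the original weights straight to 4 bits picks the nearest 4-bit value for every weight, so a detour through an 8-bit copy can only tie or do worse. Both paths use the same 4-bit scale here to keep the comparison fair; real requantization pipelines differ.

```python
import random


def quantize(x: float, scale: float, qmax: int) -> float:
    """Symmetric round-to-nearest onto the grid {-qmax..qmax} * scale, returned dequantized."""
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


def mse(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]
absmax = max(abs(w) for w in weights)
scale4 = absmax / 7    # 4-bit grid, computed from the original weights
scale8 = absmax / 127  # 8-bit intermediate grid

direct = [quantize(w, scale4, 7) for w in weights]
# "Requantized": round to an 8-bit copy first, then down to the same 4-bit grid.
via_8bit = [quantize(quantize(w, scale8, 127), scale4, 7) for w in weights]

print(f"direct 4-bit MSE:      {mse(weights, direct):.6f}")
print(f"requantized 4-bit MSE: {mse(weights, via_8bit):.6f}")
print("weights that landed on a different value:", sum(d != v for d, v in zip(direct, via_8bit)))
```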

      That is not an issue with rocmfp4 but with vibe coders and agent psychosis.

      • ejs@piefed.social · 2 hours ago

        Thank you so, so much for pointing out ROCmFP4. I have been tinkering with llama.cpp on my RDNA 3 Framework. I was struggling with ROCm llama.cpp and have been using Vulkan in the meantime. I know there are some issues on the llama.cpp GitHub trying to fix my problem (UMA stuff), but I hadn’t come across this specific project. Gonna try it out.

      • TropicalDingdong@lemmy.world · 10 hours ago

        Do you have a walk through for setup?

        I’m on the Strix Halo 128GB variant, and while I got Ollama working fine, I haven’t gotten any of these multi-headed setups working.

        • vapeloki@lemmy.world · 9 hours ago

          I am on Gentoo for it, but anything with a decent ROCm should work.

          Have a look at llama-swap; it handles multi-head endpoints.

          Also, as you are on a big board, you can quantize it yourself, since the BF16 version of Qwen is only 72GB.
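Back-of-the-envelope sizing: weight size is roughly parameters × bits per weight / 8. A 72GB BF16 checkpoint (16 bits per weight) implies about 36B parameters; the other bit widths below are illustrative (8.5 matches llama.cpp's Q8_0 block layout), and real files add metadata and often keep a few tensors at higher precision.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


for label, bpw in [("BF16", 16.0), ("Q8_0", 8.5), ("~4.5 bpw", 4.5)]:
    print(f"{label:>8}: {weights_gb(36, bpw):5.1f} GB")
```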

          I will try to post a full writeup in the next few days. But feel free to DM me if you need some guidance on quantizing or anything else.

          I am using this fork currently: https://github.com/charlie12345/ROCmFPX

          Things are moving fast right now, so it may be worth waiting a week or two if you need something super stable; but if you are up for experimenting, that’s the way to go.

          • ShimitarA · 9 hours ago

            Great, man! Gentoo lover and longtime addict here… Keep up the good work!

          • TropicalDingdong@lemmy.world · 9 hours ago

            This is great, thanks. I’m on the z-13 and needed to use it for a work project, which is wrapping up soon. I’m planning on rebuilding it as a locally hosted agent support machine.

    • catdog@lemmy.ml · 9 hours ago

      Yup. And if you want to take a small step without major hardware requirements, connect your setup to a paid Mistral or Anthropic API subscription. They allow you to switch off training on your data.

      On top of that, the costs are way lower than the normal consumer-grade chat subscriptions, and your searches and memory are kept locally (e.g., managed through Open WebUI).

      • vapeloki@lemmy.world · 9 hours ago

        OpenRouter is also nice for this. You can use really cheap models for embedding and the bigger ones for the actual research.

      • vapeloki@lemmy.world · 8 hours ago

        AMD Strix is an APU optimized for AI. It is the cheapest option I am aware of to run bigger models at home: about 2k€ for 56GB of VRAM, and less than 300W total power budget.

        One could run smaller models, but with the context sizes that research work requires, that is nearly impossible.

        Also, external services like OpenRouter let you use models hosted in the cloud.

        But for self-hosting, you need something that can run models with at least 15GB of VRAM plus context. For comparison: our highly quantized model uses 20GB of VRAM. Our 4 slots need another 20GB on top of that (around 5GB per slot for 256k tokens), making it 40GB.
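The per-slot figure follows the usual KV-cache formula: two tensors (K and V) per attention layer, each holding kv_heads × head_dim values per token. The architecture numbers below are hypothetical, picked only to make the arithmetic easy to follow; read the real layer count, KV head count and head size from the model's config.json. Models with few KV heads, or with only some layers using full attention, keep this number small.

```python
def kv_cache_bytes(tokens: int, attn_layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size: K and V for every attention layer and every cached token (fp16 by default)."""
    return 2 * attn_layers * kv_heads * head_dim * bytes_per_elem * tokens


# Hypothetical architecture values, for illustration only.
per_slot = kv_cache_bytes(tokens=262_144, attn_layers=10, kv_heads=2, head_dim=256)
print(f"per slot: {per_slot / 2**30:.1f} GiB, 4 slots: {4 * per_slot / 2**30:.1f} GiB")
# per slot: 5.0 GiB, 4 slots: 20.0 GiB
```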

  • PragmaticOne@lemmy.world · 9 hours ago

    If you have the equipment and about 5TB of storage, Fable 5 AI is available via torrent on Pirate Bay right now. :)

    • vapeloki@lemmy.world · 9 hours ago

      “Hey Claude, research for me current research to Nuclear Fusion. What are the biggest hurdles what are the next steps, and how promising is private research”: with the research feature enabled, this gives you a fact-checked report (not clean, but OK-ish) and all the sources for it.

      Claude will spin up a bunch of workers and search the web, following leads, and so on.
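The fan-out part can be sketched with asyncio. This is not how Anthropic implements it: `plan` and `search` are hypothetical stand-ins (in a real agent, `plan` would be an LLM call and `search` a web search tool), and following leads would mean feeding the merged sources back into another planning round.

```python
import asyncio


async def research(question: str, plan, search, max_workers: int = 4) -> list[str]:
    """Fan a question out into sub-queries, search them concurrently, and merge sources.

    `plan(question)` returns a list of sub-queries; `search(query)` is an async
    function returning a list of source URLs.
    """
    sem = asyncio.Semaphore(max_workers)  # cap the number of concurrent searches

    async def worker(query: str) -> list[str]:
        async with sem:
            return await search(query)

    results = await asyncio.gather(*(worker(q) for q in plan(question)))
    seen, sources = set(), []
    for urls in results:  # gather returns results in sub-query order
        for url in urls:
            if url not in seen:  # de-duplicate sources found by several workers
                seen.add(url)
                sources.append(url)
    return sources
```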

      One of the few actually useful features of AI, IMHO.

      • Avid Amoeba@lemmy.ca · 4 hours ago

        Here’s what my Qwen 3.6 35B setup spat out for this prompt. It used 20-something sources:

        Based on my research, here’s a comprehensive overview of the current state of nuclear fusion research:

        Current Major Breakthroughs

        The field has seen remarkable advances recently:

        • National Ignition Facility (NIF) achieved fusion ignition for the seventh time in February 2025, setting a new energy gain record of 2.44x. In April 2025, they produced a record fusion yield of 8.6 MJ - that’s more than four times the laser energy put in (2.08 MJ).
        • Global investment has surged: Fusion startups have raised over $13 billion to date, with 17 companies having raised $100M+ each.

        Biggest Hurdles & Challenges

        1. Materials Science

        The fusion environment is extremely hostile. Neutrons from fusion reactions will bombard the reactor’s first wall and vacuum vessel with more than 10 displacements per atom (dpa) per year, causing:

        • Severe material degradation
        • Tritium retention issues
        • Erosion of plasma-facing components

        This requires entirely new materials that can withstand decades of operation.

        2. Engineering Scale-Up

        Moving from physics experiments to power plants presents enormous engineering challenges:

        • For inertial confinement (NIF), you’d need to ignite pellets many times per second—described as a “dizzying” challenge
        • Maintaining stable plasma at 150+ million°C for continuous operation (not just short pulses)
        • Heat extraction systems that can convert fusion heat into electricity efficiently

        3. Plasma Physics

        Understanding and controlling plasma turbulence, instabilities, and confinement remains an ongoing science challenge. While tokamaks are proven, consistent, optimal performance is difficult.

        4. Tritium Breeding

        Fusion reactors need tritium (which is rare), so they must breed their own in the reactor blanket—a technology that hasn’t been demonstrated at commercial scale yet.

        5. Cost & Complexity

        Current fusion devices cost billions to build and operate. Economic competitiveness with other energy sources (and fission) remains uncertain.


        Next Steps & Timeline

        Public Research:

        • ITER (the international tokamak project): Construction is progressing but the first plasma has been pushed back to 2034, a 9-year delay from earlier plans. Full deuterium-tritium operations are targeted for 2036+
        • NIF: Continuing experiments with higher yields, working toward more efficient ignition schemes

        Near-term Goals (2025-2030):

        • SPARC tokamak (Commonwealth Fusion Systems) aims to demonstrate net energy by mid-2020s using new HTS magnets
        • Continued NIF physics campaigns to understand ignition optimization
        • DEMO reactor design work for next-generation demonstrations

        Commercial Timeline:

        The Fusion Industry Association states that the majority of fusion companies anticipate commercial power by the mid-2030s. This seems aggressive but reflects accelerated progress and massive investment.


        Private Sector Research: How Promising?

        The private sector is arguably making the most exciting progress:

        Major Players:

        • Commonwealth Fusion Systems (CFS): Developing SPARC using high-temperature superconducting magnets to create a compact, cheaper tokamak. They’ve already built and tested their HTS magnet technology at MIT’s Alcator C-Mod and are raising capital for the full SPARC device.

        • Helion Energy: Pursuing a unique pulsed magnetic fusion approach (not a tokamak), planning a pilot plant by mid-decade with backing from Amazon’s Jeff Bezos and Bill Gates.

        • Tae Technologies: Founded by John Nuckolls (NIF physicist) and others, using an inertial electrostatic confinement concept.

        • 17 startups have each raised $100M+, collectively attracting over $13B in total investment.

        Strengths of Private Research:

        • Speed: Smaller, more focused teams can iterate faster than billion-dollar international projects like ITER
        • Innovation: Willingness to pursue alternative approaches beyond tokamaks (stellarators, field-reversed configurations, laser-based ICF)
        • Funding momentum: Strong VC interest and tech billionaire backing
        • Regulatory progress: The ADVANCE Act (passed 2024 in the US) creates a more favorable regulatory framework for fusion

        Concerns:

        • Overhyped timelines (“commercial by mid-2030s” may be optimistic given historical nuclear project delays)
        • Many companies will likely fail; it’s an unproven industry
        • The “10-year away” phenomenon has plagued fusion for 60 years—can private sector avoid this?

        Bottom Line

        Nuclear fusion is in the most promising phase of its history. We’ve definitively proven scientific net energy (NIF), new technologies like HTS magnets are enabling smaller reactors, and private investment is unprecedented. However, the gap between “net energy in a lab experiment” and “reliable, economic electricity on the grid” remains massive—possibly 15-20 years still.

        The consensus seems to be: fusion might reach early commercial deployment by the mid-2030s from private companies, but widespread adoption is likely decades away. Whether that’s good enough depends on your patience and how you weigh potential reward against current climate urgency.

  • relic4322@lemmy.ml · 8 hours ago

    Did anyone mention that Hugging Face will quantize for you? It’s like one button push.

    • vapeloki@lemmy.world · 8 hours ago

      Not imatrix or advanced quants, but yes.

      But there are more than enough stock models for this task, I would say. For specialized use cases, though, custom quants can be very, very powerful.
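For context on the imatrix point: an importance matrix, computed from activations on calibration text, weights the quantization error so that scales are chosen to protect the weights that matter most. A toy version of that scale search for a single block, with made-up importance values and plain round-to-nearest rather than any real llama.cpp quant type:

```python
def weighted_error(weights: list[float], importance: list[float], scale: float, qmax: int = 7) -> float:
    """Importance-weighted squared error of round-to-nearest quantization at `scale`."""
    err = 0.0
    for w, imp in zip(weights, importance):
        q = max(-qmax, min(qmax, round(w / scale)))
        err += imp * (w - q * scale) ** 2
    return err


def best_scale(weights: list[float], importance: list[float], qmax: int = 7, steps: int = 20) -> float:
    """Try scales from absmax/qmax downwards and keep the one with the lowest weighted error."""
    absmax = max(abs(w) for w in weights)
    if absmax == 0:
        return 1.0  # all-zero block: any scale reproduces it exactly
    best, best_err = absmax / qmax, float("inf")
    for i in range(steps + 1):
        scale = absmax / qmax * (1 - 0.02 * i)  # with steps=20: 1.0x down to 0.6x of absmax/qmax
        err = weighted_error(weights, importance, scale, qmax)
        if err < best_err:
            best, best_err = scale, err
    return best
```

Because the plain absmax scale is always one of the candidates, the chosen scale never does worse on the weighted error; it just shifts precision toward the weights marked as important.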