Hi all!

I have written a couple of posts in the past. I am an illiterate having fun with LLMs and AI in general, who is being pulled deeper into the hole by the day…

I have extensive experience with Linux (Gentoo lover for 20 years here). I am a software dev now “promoted” to management and an avid tech user, so not really illiterate, but I know very little about this whole LLM game.

I started with OpenWebUI + Ollama and played like an idiot with random models. Then I came across an NVIDIA RTX A4000 (16 GB GDDR6) and plugged it into my i7-8700 server with 64 GB RAM. The server also has an Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630], unused at this time (the server is 100% headless anyway).

I am currently installing LocalAI to run llama.cpp and improve my models' capability and speed, planning to ditch OpenWebUI and Ollama if LocalAI + llama.cpp works fine.

My first use was chatting with random local models. Then I discovered Fooocus and quickly upgraded to ComfyUI. Lastly, I have set up my SubWave radio station and I am having so much fun…

I have a few questions:

  1. Can I leverage both my NVIDIA card and the iGPU at the same time?
  2. If I use the iGPU, do I need to allocate a fixed amount of RAM to it in the BIOS? Or will it use system RAM as needed?
  3. With llama.cpp I also want to leverage the CPU, since I have 64 GB of RAM (shared with many other self-hosted services, though). Is there anything special I need to do to achieve that?
  4. What set of models do you guys recommend for my setup? I am currently using qwen2.5-coder:14b-instruct-q5_K_M with Ollama, and I am pretty satisfied with its coding capabilities, but I want something more general purpose for my SubWave (AI-assisted web radio channel).
  5. I might have the opportunity to install a second RTX A4000, identical to the first, in my server (I need to check PCIe slot availability and power supply specs). Would that make any sense at all?
  6. Power consumption wise, do the NVIDIA cards also suck power when not in active use?
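
For questions 3 and 6, here is a minimal sketch of how the card's power draw, performance state and VRAM usage could be read on the server itself, so that idle and loaded numbers can be compared. It assumes the NVIDIA driver's `nvidia-smi` tool is on the PATH; `parse_gpu_csv` and `query_gpus` are hypothetical helper names made up for this example, not part of any library.

```python
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["name", "pstate", "power.draw", "memory.used", "memory.total"]
NUMERIC = ("power.draw", "memory.used", "memory.total")


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip lines that do not match the requested fields
        row = dict(zip(FIELDS, values))
        for key in NUMERIC:
            try:
                row[key] = float(row[key])  # watts for power, MiB for memory
            except ValueError:
                row[key] = None  # the tool prints e.g. "[N/A]" if unsupported
        gpus.append(row)
    return gpus


def query_gpus():
    """Run nvidia-smi once; return [] if the tool is missing or fails."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` once while nothing is loaded and once while a model is generating should show the difference in `power.draw` and `pstate`, and `memory.total` minus `memory.used` gives a rough idea of how much VRAM is left before llama.cpp has to keep layers on the CPU.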
  • SuspiciousCarrot78@aussie.zone
    3 hours ago

    Ooh… hang on. Doesn’t a headless server in Linux require a dummy HDMI plug if you have an iGPU + GPU? You might need to confirm that.

    • ShimitarOPA
      2 hours ago

      The server is plugged into a network KVM, so there is an actual output. And it was working just fine even without anything plugged in, I can confirm. The network KVM is just practical.