Hello everyone,

I’m trying to set up a local “vibe coding” environment and use it on some projects I abandoned years ago. So far I’m using KoboldCPP to run the models and VS Code as the editor, with various extensions and models, but with limited to no luck.

The models work properly (tested with the KoboldCPP web UI and curl), but in VS Code they do not generate files, edit existing files, etc. (it seems the tool-calling part is not working), and they usually end up in a loop.
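For reference, this is the kind of sanity check that works fine outside the editor (a minimal sketch assuming KoboldCPP is running locally on its default port 5001 and using its native generate endpoint):

```python
# Minimal sanity check against KoboldCPP outside the editor.
# Assumes KoboldCPP running locally on its default port 5001.
import requests

resp = requests.post(
    "http://127.0.0.1:5001/api/v1/generate",
    json={"prompt": "Write a Python one-liner that sums a list.", "max_length": 120},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```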

I tried the following extensions:

  • RooCode
  • Cline

Models: (mostly from Unsloth)

  • DeepSeek-Coder 6.7B
  • Phi-4
  • Qwen3-Coder 30B
  • Granite 4 Small

Does anyone have any idea why it’s not working, or a working setup they can share? (I’m open to changing any of the tools/models.)

For reference, I have 16GB of RAM and 8GB of VRAM.

Many thanks in advance for your help!

  • Baŝto@discuss.tchncs.de · 2 days ago (edited)

    When I generate small scripts and tools, I mostly do it with a chatbox frontend.

    But what I’ve used so far was VSCodium, Continue and Ollama. Though I haven’t really created much (or any?) code with that recently.

    Continue is open source.

    I also have KoboldCPP installed, but what I disliked about it was that it didn’t seem to be able to switch models. I installed it for image generation and because it had Vulkan support, in contrast to Ollama, which only added that recently. The nice thing with Ollama is that you can switch between larger and smaller models depending on what you are doing right now.
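    For example, that per-request model switching boils down to something like this (a minimal sketch assuming Ollama on its default port 11434 and both model tags already pulled):

    ```python
    # Minimal sketch: pick a small or large model per request via Ollama's HTTP API.
    # Assumes Ollama on its default port 11434 and both models already pulled.
    import requests

    def ask(prompt: str, heavy: bool = False) -> str:
        model = "qwen3-coder:30b" if heavy else "qwen2.5-coder:1.5b"
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    print(ask("Write a Python function that reverses a string.", heavy=False))
    ```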

    Judging from my Continue config and the commented-out parts of it, I had it use, among others:

    • DeepSeek R1 distills (I started playing with AI when R1 came out)
    • qwen2.5-coder:1.5
    • deepseek-coder:1.3b
    • deepseek-coder-v2:16b-lite-instruct
    • pydevmini1
    • qwen3-coder:30b

    I would only use the latter two right now. The tiny ones like 1.3b really can’t do much more than code completion, but I never use code completion. Non-Coder R1 didn’t do a good job; it always altered the code and broke it with injected whitespace. Qwen3 Coder 30B can create working code. Pydevmini is more specialized in the well-supported languages and pretty fast since it’s only 4B, though the code quality is noticeably worse and it can’t cope with very complex prompts. I sometimes let it answer clear/short coding questions a la “how do I implement this with this language/framework”.

    Continue released its own 8b code completion model two months ago, but I never tried it: https://huggingface.co/continuedev/instinct

    EDIT:

    I forgot to mention that my PC has 32GB of RAM. My GPU has 8GB as well, but most of the time I couldn’t use that. With your 16GB of RAM plus 8GB of VRAM, the ~19GB of Qwen3 (Coder) 30B means you only have about 5GB left for other stuff. Combining that with a 4B model for code completion would be too much.

    One thing I used the chatbox interfaces for was generating multiple attempts at the same thing, which didn’t work well within an IDE. I later cherry-picked what looked best, but that takes a long time, so I did something else in the meantime, sometimes not even on my PC.

    Generally I tried to move towards better-modularized code, where you can give less code to the AI and just tell it what other functions/methods it can use. Sometimes it still tries to change and reimplement them.

    Especially for initial generation I tend to let Qwen3 30B (not Coder) generate class diagrams, flow diagrams and such in Mermaid syntax. It does better thinking and creativity than the Coder variant. Such diagrams are also pretty short prompt-length-wise, and it’s easy/fast to fix them up manually. You can either install a renderer and render them locally, or paste and navigate them on https://mermaid.live/
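    For illustration, a tiny class diagram in Mermaid syntax of the kind I mean (the class and member names here are made up):

    ```mermaid
    classDiagram
        class TaskQueue {
            +add(task)
            +pop() Task
        }
        class Worker {
            +run(queue)
        }
        Worker --> TaskQueue : pulls from
    ```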

  • Mechanize@feddit.it · 4 days ago

    I don’t have direct experience with RooCode and Cline, but I would be mighty surprised if they worked with models lesser than even the old Qwen2.5-Coder 32B, and even that was mostly misses. I never tried the Qwen3 Coder, but I assume it is not drastically different.

    Those small models are at most useful as a kind of smarter autocomplete, not for running a full tool-calling framework.

    BTW, you could check out Aider too for a different approach; they have a lot of benchmarks that can help you get an idea of what’s needed.
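    If you want to try it against your KoboldCPP instance, a rough sketch of launching Aider against a local OpenAI-compatible endpoint (the URL, key and model name here are placeholders, adjust them to your setup):

    ```python
    # Rough sketch: point Aider at a local OpenAI-compatible server (e.g. KoboldCPP).
    # The URL, key and model name are placeholders; adjust them to your setup.
    import os
    import subprocess

    env = os.environ.copy()
    env["OPENAI_API_BASE"] = "http://127.0.0.1:5001/v1"  # local OpenAI-compatible endpoint
    env["OPENAI_API_KEY"] = "local-dummy-key"            # local servers usually accept any string

    # "openai/<name>" tells Aider to treat it as a generic OpenAI-compatible model.
    subprocess.run(["aider", "--model", "openai/qwen3-coder-30b", "main.py"], env=env)
    ```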

    • knF@lemmy.world (OP) · 4 days ago

      Thanks a MILLION and more!

      Aider is working like a charm even with smaller models, fantastic, thanks a lot! Very practical once you learn the ropes of it.

      I wish I could upvote this reply more :D

  • SmokeyDope@lemmy.world (mod) · 4 days ago (edited)

    I’ve used RooCode on VSCodium with Kobold. The problem is most small local models don’t have the ingrained ability to use Cline tools correctly. You should do some looking around for models that specifically advertise Cline tool calling, like this one: https://ollama.com/acidtib/qwen2.5-coder-cline:7b

    When connecting to VS Code and Roo/Cline, make sure to use the full IP address and port; you can also put in a random string for the API key. Make sure it’s connected through the OpenAI-compatible API.
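    Something like this is what those settings boil down to (a minimal sketch assuming KoboldCPP on its default port 5001; the model name is largely ignored since KoboldCPP just serves whatever model it has loaded):

    ```python
    # Minimal sketch of the connection Roo/Cline needs: an OpenAI-compatible endpoint.
    # Assumes KoboldCPP running locally on its default port 5001.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://127.0.0.1:5001/v1",  # full IP and port, /v1 for the OpenAI-compatible API
        api_key="anything",                   # local KoboldCPP doesn't check the key
    )

    reply = client.chat.completions.create(
        model="local-model",  # placeholder; the loaded model answers regardless
        messages=[{"role": "user", "content": "Write a hello-world in Python."}],
    )
    print(reply.choices[0].message.content)
    ```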

  • fox2263@lemmy.world · 4 days ago

    If I recall from when I tried this, you need “agentic” models to do the hands-free stuff, otherwise you just get the.