Fascinated with stuff related to free software, modularity/decentralization, gaming, pixel art, sci-fi, cooking, anti-car-dependency, hardcore techno and breakcore

Mastodon: @basxto@chaos.social

  • 0 Posts
  • 11 Comments
Joined 2 years ago
Cake day: July 22nd, 2023



  • When I generate small scripts and tools, I mostly do it with a chatbox frontend.

    But what I’ve used so far was VSCodium, Continue and Ollama. Though I haven’t really created much (or any?) code with that setup recently.

    Continue is open source.

    I also have KoboldCPP installed, but what I disliked about it was that it didn’t seem to be able to switch models. I installed it for image generation and because it had Vulkan support, in contrast to Ollama, which only added that recently. The nice thing with Ollama is that you can switch between larger and smaller models depending on what you are doing right now.
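
    Switching in Ollama is just a matter of pulling and running a different tag; a rough sketch with models from my list (the exact tags as they exist in the registry might differ):

        ollama pull qwen3-coder:30b   # bigger model for actual code generation
        ollama list                   # show which models are installed locally
        ollama run qwen3-coder:30b    # interactive chat with the big model
        ollama run pydevmini1         # or switch to a small 4b model for quick questions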

    Judging from my Continue config and its commented-out parts, I had it use, among others, the following (a sketch of what such a config entry looks like follows the list):

    • DeepSeek R1 distills (I started playing with AI when R1 came out)
    • qwen2.5-coder:1.5
    • deepseek-coder:1.3b
    • deepseek-coder-v2:16b-lite-instruct
    • pydevmini1
    • qwen3-coder:30b
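
    An Ollama model entry in Continue’s (older) config.json looks roughly like this; just a sketch from memory, newer versions use config.yaml and the exact keys may differ:

        {
          "models": [
            {
              "title": "Qwen3 Coder 30B",
              "provider": "ollama",
              "model": "qwen3-coder:30b"
            },
            {
              "title": "pydevmini1",
              "provider": "ollama",
              "model": "pydevmini1"
            }
          ]
        }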

    I would only use the latter two right now. The tiny ones like 1.3b really aren’t good for much beyond code completion, and I never use code completion. The non-Coder R1 didn’t do a good job; it always altered the code and broke it with injected whitespace. Qwen 3 Coder 30b can create working code. Pydevmini is more specialized in the well-supported languages and pretty fast due to only being 4b, though the code quality is noticeably worse and it doesn’t handle very complex prompts well. I sometimes let it answer clear/short coding questions along the lines of “how do I implement this with this language/framework”.

    Continue released its own 8b code completion model two months ago, but I never tried it: https://huggingface.co/continuedev/instinct

    EDIT:

    I forgot to mention that my PC has 32GB of RAM. My GPU has 8GB as well, but most of the time I couldn’t use that. The 19GB of Qwen3 (Coder) 30b mean that you only have 5GB left for other stuff. Combining that with a 4b model for code completion would be too much.

    One thing I used the chatbox interfaces for was to generate multiple attempts at the same thing, which didn’t work well within an IDE. I later cherry-picked what looked best, but that takes a long time. I did something else in the meantime, usually something not on my PC.

    Generally I tried to move towards better modularized code, where you can give less code to the AI and just tell it which other functions/methods it can use. Sometimes it still tries to change and reimplement them.

    Especially for initial generation I tend to let Qwen3 30b (not Coder) generate class diagrams, flow diagrams and such in mermaid syntax. It’s better at thinking and creativity than the Coder variant. Such diagrams are also pretty short prompt-length-wise and it’s easy/fast to fix them up manually. You can either install a renderer and render them locally or paste and navigate them on https://mermaid.live/
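
    A tiny flow diagram in mermaid syntax looks something like this (just an illustration, not from an actual prompt):

        flowchart TD
            A[Read input file] --> B{Format recognized?}
            B -- yes --> C[Convert and write output]
            B -- no --> D[Print error and exit]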



  • I wonder who has to come up with the

    I haven’t tried to run any of that yet, but they have these models on HF:

    that’s a bit hypothetical

    Yes, absolutely. It can happen, but we shouldn’t make decisions based on the assumption that it might happen. In other fields there are companies who try to make their products more recyclable, less energy hungry (in production and at run time), made from sustainable resources, repairable, built from more ethically sourced resources etc. So it’s not out of the question, but it often starts with people who just wanna see it happen, not with a business case. There are also many black sheep who only do greenwashing by making it sound like they do that without actually doing it.

    Ecosia already tries to sell their chatbot as green, but it only uses OpenAI’s API and they plant trees like they always do. Though I generally don’t like their compensation concept, at least they claim their own servers run on 100% renewable energy. I haven’t tried their chatbot(s) yet, but it looks like it’s still only OpenAI. If they do it like DuckDuckGo at some point in the future, they could run open models on their own servers. Whether they can produce enough energy and get their hands on the hardware to get that working is a different question though. There isn’t any indication yet that they plan to go that way.

    It’s probably already possible to let an energy management system (EMS) start AI training when there is solar overproduction. That’s only worth it when the pace of new breakthroughs has slowed down or when they use outdated techniques anyway. I dunno where the balance currently lies between electricity prices, hardware cost, energy efficiency of the hardware and time pressure.

    EDIT: Sounds like Ecosia is on it for running AIs at least: https://blog.ecosia.org/what-we-are-doing-with-ai/. They probably push that renewable energy into the grid somewhere else than where the AI consumes it.

    concerns China might want to take advantage

    I don’t think they’ll say no to cheap energy, but they definitely don’t wanna be dependent on other countries for their energy. As far as I understand, they push solar, electric cars etc. for energy independence reasons.


  • But that’s going to be old and obsolete tech in the world of 2030 and dwarfed by any new tech then.

    My point was more that the people they replace now, they’ll replace indefinitely, in the context of “impact on society”.

    a question about development of AI in general, it’s an entire can of worms

    and

    So I think the question is more, are they going to continue?

    I just ran into https://huggingface.co/briaai/FIBO, which looks interesting in many ways. (At first glance.)

    trained exclusively on licensed data

    It also only works with JSON inputs. The more we split AIs into modules that can be exchanged, the more we can update pipelines module by module, tweak them…

    It’s unlikely that there’ll never be new releases. It’s always interesting for newcomers to gain market penetration and show off.

    What part of the overall environmental footprint gets attributed to a single user?

    It’s possible that there’ll be companies at some point who proudly train their models with renewable energy etc., as is already common with other products. It just has to be cheap/accessible enough for them to do that. Though I don’t see that happening for GPU production anytime soon.


  • I can’t buy salami in the supermarket and justify it by saying the cow is dead anyways

    That’s not comparable. You can’t compare software or even research with a physical object like that. You need a dead cow for salami; if demand increases they have to kill more cows. For these models the training already happened, and how many people use them does not matter. It could influence whether or how much they train new models, but there is no direct relation. You can use them forever in their current state without any further training being necessary. I’d rather compare that with Nazi experiments on human beings. Their human guinea pigs already suffered/died no matter whether you use the research derived from that or not. Doing new and proper training/research to get to a point the improper ones already reached is somewhat pointless in this case; you just spend more resources.

    Though it makes sense to train new models on public domain and CC0 materials if you want end results that protect you better from getting sued over copyright violations. There are platforms that banned AI-generated graphics because of that.

    we still buy the graphics cards from Nvidia and we also set free some CO2 when doing inference

    But you don’t have to. I can run small models on my NITRO+ RX 580 with 8GB VRAM, which I bought 7 years ago. It’s maybe not the best experience, but it certainly “works”. The last time our house used external electricity was 34 hours ago.

    Regarding RAG, I just hope it improves machine readability, which is also useful for non-AI applications; RAG just increases the pressure to provide it.


  • I’m flip-flopping between running local models on my PC with solar power vs. using OpenAI’s free ChatGPT to drive them into ruin, which most of the time ends with me having a stupid argument with an AI.

    impact on society

    Local AI will likely have a long-lasting impact as it won’t just go away. The companies who released the models can go bankrupt, but the models stay. The hardware which runs them will get faster and cheaper over time.

    I have some hope with accessibility and with making FLOSS development easier/faster. Generative AI can at least quickly generate mockup code or placeholder graphics/code. There are game projects that would release with generated assets, just like for a long time there were game projects that released assets which were modifications or redistributions of assets they didn’t have the rights to. They are probably less likely to get sued over AI-generated stuff. It’s unethical, but they can replace it with something self-made once the rest is finished.

    Theoretically, every user could even generate their own assets locally, which would be very inefficient and still ethically questionable, but legally fine as they don’t redistribute them.

    I like how Tesseract already uses AI for OCR and Firefox uses it for real-time website translations on your device. Though I dunno how much they benefit from advancements in generative AI.


    Though a different point/question: At what point is generative AI ethically and legally fine?

    • If I manage to draw some original style that it then transfers? But I’m so slow and inefficient with it that I can’t create a large amount of assets that way
    • When I create the input images myself? But in a minimalist and fast manner

    It still learned that style transfer somewhere and will close gaps I leave. But I created the style and what the image depicts. At what point is it fine?


    Like coding

    I actually use it often to generate shell scripts or small simple Python tools. But does it make sense? Sometimes it does work. For very simple logic they tend to get it right. Though writing it myself would probably have been faster the last time I used it, but at that moment I was too lazy to write it myself. I don’t think I’ve ever really created something usable with it aside from practical shell scripts. Even with ChatGPT it can be an absolute waste of time to explain why the code is broken; it didn’t get at all why its implementation led to a doubled file extension and a scoping error in one function … when I fixed them it actually tried to revert that.


  • I still vastly prefer Qwen3-30B Thinking because it answers pretty fast. The speed was really the most interesting thing compared to R1 32B. Now that Ollama supports Vulkan it runs even faster (~ 2/3 CPU & 1/3 GPU).

    I use it with Page Assist to search the web via DDG, but it would also support SearXNG.

    I have Qwen3-Coder 30B for code generation.

    I actually mostly use it with Page Assist as well. I have the Continue plugin installed in VSCodium.

    The rest I don’t use as much. I have installed:

    • II search 4B (its goal was quick web searches)
    • pydevmini1 4B (website and code mockups, coding questions in the style of “how do I implement XY”)
    • Qwen3 4B abliterated (mostly story generation where R1 refused to generate back then; abliteration didn’t seem to impact creative writing that much)

    I only have 32GB RAM, so I ran those 4B models especially when Firefox and/or other things were already using too much RAM. Dunno how much that will change with Vulkan support. It will probably only shift a bit, since they can run 100% on my 6GB VRAM GPU now. At least now I can run 4B models without checking RAM usage first.

    After all, it’s nice to run all this stuff 100% open source, even when the models aren’t. I especially use them for questions that involve personal information.

    I’ve just started to play around with Qwen3-VL 4B since Ollama support was just added yesterday. It certainly can read my handwriting.
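
    With the Ollama CLI you can attach an image by putting its file path into the prompt, roughly like this (the path is just a placeholder and the exact tag may differ):

        ollama run qwen3-vl:4b "Transcribe the handwritten text in this image: ./scan-page1.jpg"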

    The only other AIs I used recently are:

    • Translation model integrated into Firefox
    • Tesseract’s OCR models when I wanted to convert scanned documents into PDFs where I can select and search for text

    My hottest take is probably that I hate the use of T for trillion parameters, even though a short-scale trillion is the same as tera. I could somewhat live with the B for billion, though it’s already not great. But the larger the numbers become, the more ridiculous it gets. I dunno what they’ll use after trillion, but it’ll get ugly fast since quadrillion (10¹⁵) and quintillion (10¹⁸) both start with Q. SI prefixes have an unambiguous single character for everything up to quetta (Q; 10³⁰) right now. (Though SI prefixes definitely have some old prefixes which break their system of everything >0 having an uppercase single letter: deca, hecto, kilo.) Or maybe it’s because it’s an English, but not an international, notation.
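
    For comparison (short-scale names vs. SI prefixes):

        trillion    = 10¹² → T (tera)
        quadrillion = 10¹⁵ → P (peta)
        quintillion = 10¹⁸ → E (exa)
        …
        10³⁰ → Q (quetta)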