@lily33

lily33@lemm.ee · 2 days ago

> What makes these consumer-oriented models different is that rather than being trained on raw data, they are trained on synthetic data from pre-existing models. That’s what the “Qwen” or “Llama” parts mean in the name. The 7B model is trained on synthetic data produced by Qwen, so it is effectively a compressed version of Qwen. However, neither Qwen nor Llama can “reason,” they do not have an internal monologue.

You got that backwards. They’re other models (Qwen or Llama) fine-tuned on synthetic data generated by DeepSeek-R1. Specifically, it's reasoning data, so that they can learn some of R1's reasoning ability.

But the base model, and therefore the base capability, is that of the corresponding Qwen or Llama model. Calling them “DeepSeek-R1-something” doesn’t change what they fundamentally are; it’s just marketing.

lily33@lemm.ee · 2 days ago

There are already other providers, like Deepinfra, offering DeepSeek. So while the average person (like me) couldn’t run it themselves, they do have alternative options.

lily33@lemm.ee · 2 days ago

A server-grade CPU with a lot of RAM and memory bandwidth would work reasonably well, and cost “only” ~$10k rather than 100k+…
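The reason RAM and memory bandwidth are the deciding factors can be shown with some back-of-envelope arithmetic. This is a rough sketch under stated assumptions: DeepSeek-R1 is a mixture-of-experts model with 671B total parameters, of which about 37B are active per token; the weights are assumed to be quantized to 8 bits (1 byte per parameter); and the machine is assumed to be a single socket with 12 channels of DDR5-4800. The hardware figures and the helper function names are hypothetical illustrations, not measurements. Real throughput will fall below the computed ceiling.

```python
"""Back-of-envelope sizing for CPU inference of a large MoE model.

Assumptions (illustrative, not measured): 671B total / 37B active params,
8-bit weights, one socket with 12 channels of DDR5-4800.
"""


def ram_needed_gb(total_params_b: float, bytes_per_param: float) -> float:
    """GB needed just to hold all weights (ignores KV cache and runtime overhead)."""
    return total_params_b * bytes_per_param


def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth: channels x MT/s x bytes per transfer."""
    return channels * mt_per_s * bus_bytes / 1000  # MB/s -> GB/s


def max_tokens_per_s(bandwidth_gb_s: float, active_params_b: float,
                     bytes_per_param: float) -> float:
    """Decode ceiling if each token must stream all active weights from RAM once."""
    return bandwidth_gb_s / (active_params_b * bytes_per_param)


if __name__ == "__main__":
    bw = peak_bandwidth_gb_s(channels=12, mt_per_s=4800)
    print(f"weights: ~{ram_needed_gb(671, 1.0):.0f} GB")
    print(f"peak bandwidth: ~{bw:.0f} GB/s")
    print(f"decode ceiling: ~{max_tokens_per_s(bw, 37, 1.0):.1f} tokens/s")
```

Under these assumptions it prints roughly 671 GB of weights, ~461 GB/s of peak bandwidth, and a ceiling of ~12.5 tokens/s. Because only the active experts are read per token, the MoE design is what makes a CPU-plus-lots-of-RAM setup usable at all; a dense 671B model would be about 18x slower on the same box.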

lily33@lemm.ee · 2 days ago

To be fair, most people can’t actually self-host Deepseek, but there already are other providers offering API access to it.