openrouter rankings for programming tokens show sharp rise in open models and stagnation of US frontier models

humanspiral@lemmy.ca · 3 days ago

openrouter rankings for programming tokens show sharp rise in open models and stagnation of US frontier models

SuspciousCarrot78@lemmy.world · edit-2 3 days ago

I really like Claude, but the way that it chews thru tokens def cements it as a “rich man’s” AI. Codex surprised me at how capable it is vs how much (little) it costs to operate. Previously, I’d been trying to use ChatGPT + web + project containers…with really sub-par refactoring results.

Tbf, I’ve only really used Claude Opus 4.5 and GPT Codex5.3 for code, so pardon my ignorance.

How well do open weight models like Kimi et al stack up? Can I call them via VsCodium to reason over local mirror of files on my repo? I’m hardware bound with limited compute. I’ve played around a bit with Open Router before, so have passing familiarity with things like TNG Deepseek R1T2, mimo-v2-flash etc.

humanspiral@lemmy.ca · 2 days ago

opencode is well worth having. It has a better priced Zen gateway that is limited to top models, but priced as you go, and can point to same folder/container as your other tools. Access to openrouter is useful, if only that some models are free. Antigravity is good to have for generous use of gemini. If VsCodium can’t access open models, then other tools can work on same project, and you just reload files they change.

Many open models at 1/10th the cost or lower, are far better than 1/10th of opus 4.6. The popularity reflects much better value. They are especially better if not doing python/js, but functional programming, even if all models are generally bad so far. agents/skills (opencode/antigravity) for models that are strong at instruction following and polyglot software (minimax pretty impressive) actually scored better than raw opus 4.6 on my benchmark, and investing in skills/agents means promise for improving whatever model is released next week.

pkjqpg1h@lemmy.zip · 2 days ago

GLM-5 and Kimi-K2.5 is really good.

ArtificialAnalysis Intellegence vs Cost

SuspciousCarrot78@lemmy.world · edit-2 2 days ago

Woof - the axes on that chart LOL. Suffice it to say, they’re all pretty dang close. Interesting. Maybe the easter bunny can bring me something with >8GB VRAM so I can actually run em locally. I’m guessing Kimi-2 eats about what…500GB+ for 128K context?

pkjqpg1h@lemmy.zip · 2 days ago

The real reason is LLMs are still using the same architecture and there is no breakthrough at the end of the day their intelligence will become so close to each other, when this happens they will have to decrease the prices to compete with open-weight models and even with these prices they don’t generate revenue so instead of just scaling they will have to focus on optimization and innovation

openrouter rankings for programming tokens show sharp rise in open models and stagnation of US frontier models

openrouter rankings for programming tokens show sharp rise in open models and stagnation of US frontier models

LLM Rankings | OpenRouter