• ShimitarA · ↑ 11 · 1 day ago

    Pretty sure they don’t train models on their own code as they don’t want to make their own intellectual property public. As well, they don’t vibe code …

    • VibeSurgeon@piefed.social · ↑ 2 · 23 hours ago

      “As well, they don’t vibe code …”

      They do, and pretty heavily at that. It’s well known that Claude Code is written mostly by Claude Code - and you can tell from the quality of the tool, as well as Anthropic’s general uptime numbers.

  • AmbitiousProcess (they/them)@piefed.social · ↑ 5 · 1 day ago

    Most AI models at this point won’t see significant gains from training on such a small sample of code.

    You don’t need a whole corporation’s code to make a functional model, you need the whole world’s.

    Adding a tiny bit of your own company’s code to the mix doesn’t really change the model much, so they generally won’t do it for that reason. There are tons of training costs, and the only benefit is that the model is very very very slightly fine-tuned to kinda sorta produce code that’s maybe possibly a little more stylistically similar to yours.

    • treadful@lemmy.zip (OP) · ↑ 3 ↓ 3 · 1 day ago

      We’re talking about huge companies with unfathomably huge codebases written by tens of thousands of people. They control significant chunks of the world’s code. It would be stupid not to at least include it in an internal model.

      • fubbernuckin@lemmy.dbzer0.com · ↑ 2 ↓ 1 · 1 day ago

        As big as some individual corporations are, the world (including every other massive corporation) is much bigger.

  • thisisbutaname@discuss.tchncs.de · ↑ 2 · 1 day ago

    I’m guessing that, if they do, it’s done as a fine-tune of a general model so its output is more in line with the style and conventions of their codebase. And it’s meant for internal use only, not for the general public.