Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

  • queerlilhayseed@piefed.blahaj.zone
    4 hours ago

    P.S. This is a hypothesis. I haven’t even designed the test for it, much less run it. What follows are my suppositions.

    I think whether it’s a good idea depends on how similar the models are. I don’t have a rigorous definition of “similar,” but similar training data, design methodologies, and QA processes would all contribute. In theory (I think), if the models are dissimilar, each should catch errors the others miss. The more similar they are, though, the more likely they share the same biases and weak spots. In that case the error rate from a response plus verification may be the same as, or even higher than, the error rate for the original prompt alone, and you’d be unlikely to detect those errors using just two similar models. That can instill false confidence: you’re doing something that should in theory increase the validity of the data, but in practice it might make no difference or even make the responses worse.
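
    To put rough numbers on that supposition, here is a toy calculation, not a test of real models. It assumes both models make mistakes at the same rate `p`, that their mistakes overlap with a correlation `rho` (0 = independent, 1 = identical blind spots), and that an error slips through only when the verifier is wrong on the same item. The function name, the 10% error rate, and the "slips through" rule are all made up for illustration.

```python
def both_wrong(p: float, rho: float) -> float:
    """Probability that generator and verifier both err on the same item.

    Models each error as a Bernoulli(p) indicator; rho is the correlation
    between the two indicators. This is an illustrative assumption, not a
    measured property of any real model pair.
    """
    return p * p + rho * p * (1 - p)


if __name__ == "__main__":
    p = 0.10  # assumed per-model error rate, purely illustrative
    for rho in (0.0, 0.5, 0.9, 1.0):
        slipped = both_wrong(p, rho)
        caught = 1 - slipped / p  # share of generator errors the verifier flags
        print(f"rho={rho:.1f}  undetected={slipped:.3f}  caught={caught:.0%}")
```

    Under these assumptions, independent models let about 1% of answers through wrong, while two models with identical blind spots let through the full 10%, so the verification step adds nothing. This toy model can't show the "even worse" case; that would need something extra, like the verifier talking the pipeline out of a correct answer.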