What question can I ask ChatGPT, right now, that will reliably produce a factually incorrect, wrong, or false answer?

LoveRainbow@lemmy.world · edited 13 hours ago

What question can I ask ChatGPT, right now, that will reliably produce a factually incorrect, wrong, or false answer?

Alfredolin@sopuli.xyz · 5 hours ago

The answers are labeled with “GPT-5 mini”.

LoveRainbow@lemmy.world · 5 hours ago

Interesting, thanks.

It seems the latest and most sophisticated model produces the correct answer at this point.

I am using GPT-5.5 in “Thinking” mode.

LoveRainbow@lemmy.world · 5 hours ago

I asked: Explain the difference in reliability and accuracy when it comes to GPT5-mini (on DuckDuckGo) Vs GPT5.5 on thinking mode.

Response:

"GPT-5-mini on DuckDuckGo is likely to be much less reliable than GPT-5.5 in thinking mode for difficult questions, but it may be perfectly adequate for quick, low-stakes tasks.

The key distinction is model size and reasoning depth. OpenAI describes GPT-5 mini as a faster, more cost-efficient version of GPT-5, aimed at well-defined, low-latency, high-volume tasks. That usually means it is optimised for speed and cost, not for the most careful reasoning. GPT-5.5, by contrast, is presented by OpenAI as a stronger reasoning model with better persistence, stronger tool use, and improved reliability on complex work.

For simple tasks, the gap may be small. For example, rewriting a paragraph, summarising a short text, generating a recipe, explaining a familiar concept, or answering a basic factual question will often be handled well by GPT-5-mini. It may even feel preferable because it is faster and less verbose. DuckDuckGo’s own help page says Duck.ai offers GPT-5 mini for free alongside other models, with anonymous access through Duck.ai.

For complex reasoning, the gap is much larger. GPT-5.5 thinking mode should be better when the task involves multi-step reasoning, evaluating evidence, spotting contradictions, doing careful moral/philosophical analysis, debugging code, comparing options, or avoiding subtle factual errors. Thinking mode matters because it allows more internal reasoning effort before answering, whereas mini models are usually designed to produce a good-enough answer quickly.

Accuracy is also affected by the interface. DuckDuckGo’s AI chat gives private access to third-party models, but the exact behaviour depends on DuckDuckGo’s implementation: context limits, system prompts, retrieval/search integration, rate limits, and whether tools are available. DuckDuckGo says Duck.ai anonymises chats and offers several models, but that does not mean its GPT-5-mini behaves identically to the same model in OpenAI’s own environment.

A practical rule:

Use GPT-5-mini on DuckDuckGo for quick, private, low-stakes work: “explain this term”, “rewrite this”, “give me ideas”, “summarise this short passage”, “what are the likely causes of this Windows issue?”

Use GPT-5.5 thinking mode for anything where the cost of being wrong matters: medical interpretation, legal or financial reasoning, political/current-event analysis, philosophical argument construction, coding, troubleshooting, research quality evaluation, investment comparisons, or nuanced ethical judgement.

The mini model is more likely to hallucinate, miss qualifications, over-simplify, or give a plausible but shallow answer. GPT-5.5 thinking mode is more likely to notice ambiguity, correct itself, ask whether assumptions are justified, and give a more robust answer. It can still be wrong, especially on current facts, obscure details, or anything requiring direct source checking, but its error rate should be lower on difficult tasks.

So the simple comparison is: DuckDuckGo GPT-5-mini is “fast, private, convenient, good enough”; GPT-5.5 thinking mode is “slower, deeper, more reliable, better for serious reasoning.”"

Alfredolin@sopuli.xyz · edited 4 hours ago

Hey. I know this is your post and all, but nobody asked for this huge AI copypasta.

edit: And according to this, it should have worked. My request was a basic factual question.

LoveRainbow@lemmy.world · 1 hour ago

In any case: the latest model gets the answer right 🤷🏻‍♂️