Washington State University (WSU) recently released a study showing that although ChatGPT answers in a confident tone, it performs closer to "random guessing" when judging complex scientific statements. The study found that the model not only has limited accuracy but also frequently gives contradictory answers to the same question.
A team led by Professor Mesut Cicek extracted 719 research hypotheses from business journals published since 2021 and repeatedly asked the model to judge whether each was true or false:
Although ChatGPT's surface-level accuracy was around 80%, once the contribution of random guessing is removed, its adjusted score is only about 60%, not much better than a 50% "coin flip"; the researchers graded this as a "low D." The model was especially poor at identifying false statements, correctly judging "false propositions" only 16.4% of the time.
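The article does not spell out how the researchers removed the guessing factor, but the figures match the standard correction-for-guessing formula, under which an 80% raw score on a binary task works out to exactly 60%. A minimal sketch, assuming that formula (the function name is illustrative, not from the study):

```python
def chance_corrected_accuracy(observed: float, chance: float = 0.5) -> float:
    """Correction-for-guessing: rescale observed accuracy so that pure
    chance maps to 0.0 and perfect accuracy maps to 1.0.
    Assumption: the study used an adjustment of this form; the article
    does not state its exact method."""
    return (observed - chance) / (1.0 - chance)

# ~80% raw accuracy on a binary true/false task, where chance is 50%:
print(chance_corrected_accuracy(0.80))  # (0.80 - 0.50) / 0.50 = 0.60
```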
The researchers submitted each hypothesis to the model 10 times and found that it struggled to hold a consistent position (a sketch of this repeated-query protocol follows the list):
Answers fluctuate: The model gave the same verdict across all 10 repetitions in only about 73% of cases; in the remaining cases, its conclusion shifted between runs.
Extreme contradictions: In some cases the model alternated between "true" and "false" on the exact same prompt, in the most extreme instances answering "true" half the time and "false" the other half.
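The article describes this protocol only at a high level; the sketch below shows one way to reproduce it, assuming a hypothetical ask_model(prompt) callable that wraps whatever LLM API is being tested and returns the string "true" or "false":

```python
from collections import Counter
from typing import Callable, Sequence

def consistency_report(
    hypotheses: Sequence[str],
    ask_model: Callable[[str], str],  # hypothetical wrapper, not from the study
    repeats: int = 10,
) -> dict:
    """Submit each hypothesis `repeats` times and measure answer stability."""
    consistent = 0
    worst_minority = 0  # size of the minority verdict in the worst split
    for h in hypotheses:
        prompt = f"Is the following hypothesis true or false? {h}"
        verdicts = Counter(ask_model(prompt) for _ in range(repeats))
        if len(verdicts) == 1:
            consistent += 1  # identical verdict on every run
        else:
            worst_minority = max(worst_minority, min(verdicts.values()))
    return {
        "consistency_rate": consistent / len(hypotheses),
        "worst_minority": worst_minority,  # 5 would mean a 5/5 true-false split
    }
```

On the study's figures, a run like this would report a consistency_rate near 0.73, and a worst_minority of 5 would correspond to the extreme half-true, half-false splits the researchers observed.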
The study notes that users are easily misled by AI's fluent, persuasive language, but fluency does not imply real reasoning ability:
Lack of a real "brain": The model essentially performs memorization and pattern matching; unlike humans, it does not truly understand the world or know what it is saying.
Limited progress across versions: When the team re-ran the test in 2025 on the newer GPT-5 mini, it performed about the same as earlier versions on this task, showing no significant improvement.
Based on the results, Cicek advises business managers to remain highly skeptical when using generative AI for complex decisions: they should not treat it as an "authority" that can replace professional judgment, and they must manually verify all of its output. Organizations should also strengthen training so that employees understand both the strengths and the limitations of AI tools, avoiding the decision biases that come from blind trust.
The study is another reminder that, even as AI technology iterates rapidly, these models' capacity for deep logical judgment and evidence weighing still needs improvement.
