AI has been exploding across the world and across many industries.
The Public Relations and Communications sector is no exception. PR firms such as Golin and Stagwell have appointed “Chief AI Officers (CAIOs)” whose brief is to create competitive advantage with AI and to position the agency to use AI in its own operations, rather than merely to write policies about it. In parallel, the world has been witnessing an AI-focused boom: the AI-driven hardware industry is blasting off like a rocket (e.g., NVIDIA), investment keeps flowing in (Fortune Business Insights projects $200 billion of AI investment in 2025, half the 2024 investment value of the global aerospace industry), and the market itself is significant (Fortune Business Insights projects $1.77 trillion by 2032).
All in all, a sweet outlook…until you hear Apple talking.
A recent paper by Apple has sent shockwaves through the global AI industry. Titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity” (Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Samy Bengio, and Mehrdad Farajtabar, June 2025), the paper carries bad news for AI.
The paper showed that even the most advanced large reasoning models (LRMs, a step beyond standard LLMs) “collapsed entirely” when “facing harder problems.” The authors found that OpenAI’s o1 and o3 models, DeepSeek R1, Anthropic’s Claude 3.7 Sonnet, and Google’s Gemini “efficiently handled low complexity tasks,” but that beyond a certain level of complexity “they began to return incorrect answers.”
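To see what “harder problems” means here: the Apple team tested the models on controllable puzzle environments, among them the classic Tower of Hanoi, where difficulty is dialed up simply by adding disks. Below is a minimal Python sketch of that complexity dial; the solver is the standard textbook recursion, not Apple’s evaluation code, and is included only to show how steeply the problems grow.

```python
# Tower of Hanoi: one of the puzzle environments in Apple's paper.
# The optimal solution for n disks takes 2**n - 1 moves, so each
# extra disk roughly doubles the work: a smooth, controllable
# "complexity dial" for testing reasoning models.

def hanoi_moves(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal move sequence for n disks."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi_moves(n - 1, source, spare, target, moves)  # clear the way
    moves.append((source, target))                    # move largest disk
    hanoi_moves(n - 1, spare, target, source, moves)  # restack on top
    return moves

for disks in range(3, 11):
    print(f"{disks} disks -> {len(hanoi_moves(disks))} moves required")
```

The paper’s headline result is that accuracy on such puzzles did not degrade gracefully as the dial turned: past a model-specific threshold, it “collapsed entirely” to zero.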
“Apple’s research paper seriously rains on the parade of the world’s most prominent AI developers, most of whom have spent the past nine months shouting from the rooftops about the potential for reasoning models.” (Rory Bathgate, itpro.com)
Let us be clear: this does not mean that OpenAI’s o1 and o3 models, DeepSeek R1, Anthropic’s Claude 3.7 Sonnet, and Google’s Gemini are “complete failures.” What the paper found is that these models have “major flaws” in their reasoning. The models tend to
- “Overthink” on easy problems, and
- Crash completely on complex ones!
In short, these models can ‘memorise’, but they ‘cannot think.’
“The (Apple) paper serves as yet another good reminder that none of these models are infallible, regardless of how advanced they claim to be or even how they perform on benchmarks.” (Sabrina Ortiz, Senior Editor, zdnet.com, “What Apple’s controversial research paper really tells us about LLMs”)
Apple’s findings are still being digested. If anything, the general perception that OpenAI’s o1 and o3 models, DeepSeek R1, Anthropic’s Claude 3.7 Sonnet, Google’s Gemini, and their peers are “what we think AI to be” needs serious course correction. They are, while unbelievably efficient, just one illustration of what AI looks like.
The real AI, AGI (Artificial General Intelligence), is not here yet.