It argues that AI models, no matter how brilliant they may seem, do not understand what they are doing.
They do not solve problems. They do not reason. They merely generate text word by word, trying to sound coherent.
Apple tested the most advanced reasoning models in the world on controlled puzzle environments. Then the researchers tore open the models' internal “thinking” traces.
What they found shatters the narrative that we are getting closer to AGI.
Current models don’t scale with complexity. Past a certain level of difficulty, they hit a hard cliff. And they do not degrade gracefully. They collapse.
But here is the most unsettling part.
When a problem gets too complex, the AI doesn’t use its remaining compute to try harder.
——
The Paper.