My son turns 3 this June. ChatGPT will be 3 years 8 months old then. Watching both evolve in parallel has been fascinating. A child's brain at this age is mostly pattern-matching. Sounds. Colours. Shapes. Smells. An attentive child's speed of pattern matching leaves the adults in the family aghast. Grass, chillies, and coriander are all green. Yet he has them sorted: one is to touch and play with, one is spicy, and one you can eat as chutney.
Green chilli is spicy. Spicy is bad. But not all green is spicy. His life is more complicated than that, but watching him work through it is intriguing.
Every child like him has an "evolving rule repo". Occasionally there are detours and harsh reminders. For the most part, it's brilliant. It's System 1 all the way. Fast and reactive. My encounters with Claude and now Claude Code are similar. What's more painful is that's exactly how most adults tend to operate too. System 1 all the way.
I am taking a cautious detour from the usual ELI5 jargon breakdown. Over the last few weeks, we've covered neuro-symbolic, evals, guardrails, human-in-the-loop. The newsletter has taken a hard, grounded-in-logic approach. The intent, as Kahneman would put it, was to invoke System 2. Logical arguments. Frameworks. Convincing through explanation. Reality remains different. Despite knowing most of this (in different versions), when generated tokens come in touch with our senses, System 1 tends to override.
This edition is a deep dive into why none of that matters if System 1 gets there first. To explain, here's a story from Thinking, Fast and Slow.
In the 1950s, Kahneman was a young psychologist in the Israel Defense Forces. His job: evaluate candidates for officer training. The method was called the "leaderless group challenge." Eight soldiers, strangers, no rank insignia. Their task: carry a log over a six-foot wall without the log or anyone touching the wall. Kahneman's team watched, then rated each soldier for officer training.
Their formal predictions were definite. A single score. Rarely did they experience doubts or form conflicting impressions. They were quite willing to declare, "This one will never make it," "That fellow is mediocre, but he should do okay," or "He will be a star." They felt no need to question their forecasts, moderate them, or equivocate.
The evidence that they could not forecast success accurately was overwhelming. Every few months there was a feedback session where they learned how the cadets were doing and could compare their assessments against the opinions of commanders who had been monitoring them.
"We knew as a general fact that our predictions were little better than random guesses, but we continued to feel and act as if each of our specific predictions was valid. I was so struck by the analogy that I coined a term for our experience: the illusion of validity. I had discovered my first cognitive illusion.”
Knowing this changed nothing. They continued to make predictions and back them with the same ferocity. At some level, we experience the same illusion with foundation models.
Why our senses fail us
Before we get to models, let's talk about us. We humans are story-seeking creatures. This is an ancient quirk. Oral traditions. Myths. Parables. Before we had writing, we had narrative. It's how we made sense of the world. The stories we trust share three qualities: simplicity, coherence, and confidence. We are rarely shooting for accuracy, completeness, or truth. Remember the adage - don't let facts get in the way of a good story 🙂 That's how core this is to us. We have rewarded narrative clarity. It's funded. It's followed. See Twitter, Substack, and mainstream media.
This isn't a bug. It's our core feature. Kahneman gave a name to this wiring: WYSIATI - What You See Is All There Is.
Our judgements aren’t based on what's missing. We judge based on the coherence of what's in front of us. If the story holds together, we believe it. If it flows, we stop interrogating.
On that obstacle field, Kahneman's team watched soldiers for an hour. They built a story: this one's a leader, that one's not. The story was coherent. It felt complete. So they stopped asking questions.
They didn't ask: What happens when the situation changes? What about the soldiers who happened to be near the wall at the right moment? What do we not know about how these men will perform six months from now, under fire? They couldn't see what wasn't there. And because the story made sense, they stopped looking. Think everything from Enron to WeWork.
Now think about how you evaluate a reasoning model's output.
The agent denies a prior authorization request. It shows its reasoning:
Patient requested MRI. Policy requires six weeks of conservative treatment. Documentation shows four weeks. Denied.
There are clear steps. It's clean and structured in a format the reviewer likes. Looks good on the surface. Should the reviewer move on without asking:
Which version of the policy was applied?
Were there alternative interpretations?
What documentation was not found?
Would this reasoning trace look the same if you ran it again?
Unless pressured, the reviewer won't know.
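To make those checks concrete, here's a minimal sketch of what a System 2 review harness could look like in Python. Everything here is hypothetical - run_agent, the policy registry, the trace fields - the pattern is the point, not the names.

```python
from collections import Counter

def review_denial(case, run_agent, policy_registry, n_runs=5):
    """System 2 checks a reviewer could run instead of eyeballing the trace.
    run_agent and policy_registry are stand-ins for your own stack."""
    findings = {}

    # 1. Which version of the policy was applied, and is it current?
    trace = run_agent(case)
    findings["policy_version"] = trace.get("policy_version")
    findings["policy_is_current"] = (
        trace.get("policy_version")
        == policy_registry.current_version("mri_prior_auth")
    )

    # 2. Would this reasoning trace look the same if you ran it again?
    decisions = Counter(run_agent(case)["decision"] for _ in range(n_runs))
    findings["decisions_across_runs"] = dict(decisions)
    findings["reproducible"] = len(decisions) == 1

    # 3. What documentation was NOT found? Diff what the policy requires
    #    against what the trace says it actually consulted.
    required = set(policy_registry.required_documents("mri_prior_auth"))
    consulted = set(trace.get("documents_consulted", []))
    findings["documentation_gaps"] = sorted(required - consulted)

    return findings
```

None of this is exotic. It just refuses to answer the easy question.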
Are models optimized for “accuracy” or convincingness?
LLMs are trained on millions of examples of what "good reasoning" looks like. Essays. Arguments. Step-by-step explanations. The model learns the shape of reasoning. The structure. Then it gets fine-tuned with human feedback. Reviewers rate outputs. The model learns: text that looks rigorous gets rewarded. Confident tone scores higher. Hedging gets penalized. The optimization target, in effect, is not correctness.
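For intuition, here's the pairwise preference loss (Bradley-Terry style) that reward models in RLHF-like pipelines are commonly trained on, in toy form. Note what's absent: nothing in this objective knows whether the preferred answer was correct - only that a rater preferred it.

```python
import math

def reward_model_loss(score_preferred, score_rejected):
    # Pairwise preference loss: minimize -log sigmoid(r_preferred - r_rejected).
    # "Preferred" means whatever the human rater picked - often the more
    # confident, better-structured answer, not the verified-correct one.
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A confidently wrong answer that raters preferred still drives the loss down:
print(round(reward_model_loss(2.0, 0.5), 2))  # 0.2 - training is "going well"
```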
Models have learned to tell the kind of story we're wired to believe. And it works. We too have evolved to judge reasoning by how it sounds. AI has learned exactly how reasoning should sound 🙂
There's a second way we get fooled
WYSIATI explains why we don't notice what's missing. It’s also important to understand how we evaluate what is there. Kahneman calls it substitution.
When faced with a hard question, we answer an easier one instead - without realizing we've switched.
"Is this person likely to be a good leader?" Hard. Requires predicting future behavior across unknown situations.
So we answer: "Did this person look like a leader just now?"
Easier. Answerable. The swap just happens.

The hard question when looking at reasoning is: "Is this reasoning valid?" This requires checking:
Are the facts correct?
Is the logic sound?
Were the right sources consulted?
Is this reproducible?
Ideally - slow, painful work. System 2 work. But there's an easier question: "Does this look like valid reasoning?" Numbered steps? Confident tone? Citations? Logical flow? Fast. System 1. And with AI pipelines, the substitution gets structural.
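You could implement the easy question in a dozen lines, which is exactly what makes the swap so seductive. A toy scorer - features and patterns invented for illustration:

```python
import re

def looks_like_valid_reasoning(text):
    """Scores the EASY question: surface features only. Note that none of
    these checks touch facts, logic, sources, or reproducibility."""
    features = {
        "numbered_steps": bool(re.search(r"^\s*\d+[.)]\s", text, re.MULTILINE)),
        "confident_tone": not re.search(r"\b(maybe|might|unsure|unclear)\b", text, re.I),
        "citations": bool(re.search(r"\[\d+\]|\bpolicy\b|\bsection\b", text, re.I)),
        "logical_flow": bool(re.search(r"\b(therefore|because|hence|thus)\b", text, re.I)),
    }
    return sum(features.values()) / len(features)
```

A trace can score a perfect 1.0 here and be wrong on every fact. That is substitution, automated.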
Most agents today are System 1 all the way. Output evaluation, when it happens, falls to a human reviewer - often a Phantom HITL - or to another LLM.
But what if our auditors reviewed with diligence?
This brings us to the core limitation of "reasoning traces". The trace shows only the path taken. Not the paths rejected or the alternatives considered. True reasoning needs both.
For example, when a regulator asks "why did the system conclude viral infection and not bacterial infection?", a confident A → B → C isn't enough. Simpler questions go unanswered:
What other interpretations were possible?
Why were they ruled out?
What evidence would have changed the outcome?
Remember - the LLM doesn't have your domain knowledge or the rules that apply to and guide your industry. It has text that resembles domain knowledge without being domain knowledge.
Presenting a record of the path taken, without a record of paths rejected, is structurally incomplete. This is what some call the validity problem. Reasoning requires something to reason over. Explicit rules. Versioned sources. Domain constraints. Without these, the reasoning trace doesn't add much value.
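What would structurally complete look like? One minimal sketch: a decision record that carries the rejected paths and the ground it reasoned over. The field names below are illustrative, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    conclusion: str                  # the path taken, e.g. "viral infection"
    policy_version: str              # the versioned source reasoned over
    evidence: list = field(default_factory=list)
    alternatives_considered: dict = field(default_factory=dict)
    # alternative -> why it was ruled out, e.g.
    # {"bacterial infection": "WBC count normal; no fever in past 48h"}
    would_flip_if: list = field(default_factory=list)
    # evidence that would have changed the outcome, e.g. "positive culture"
```

The regulator's three questions map one-to-one onto these fields. Without something like this underneath, the A → B → C trace is presentation, not reasoning.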
Next edition - we'll go deeper on what happens when you actually extend the domain knowledge, and how reasoning models work in tandem with context to do a better job.
For now: the mirage isn't just that the trace might be wrong. It's that there's nothing underneath it, and unfortunately, we're just not wired to notice.
If this landed, let me know - do reply. And if you know someone who's been told "don't worry, the AI shows its reasoning" - maybe send this their way.
PS - this edition was brainstormed and co-authored with Meheryar Tata. Follow him here for his writing.

