Confidently Guessing

The word "intelligence" is doing way too much work in the AI conversation right now.

[Cover image: a digital collage of an eye laced with circuitry, set against an ocean and cityscape.]

A former colleague of mine, tech journalist Simon Cohen, recently shared a frustrating exchange he had with Claude Opus 4.6, Anthropic's most advanced model. He'd set up custom instructions in his global settings, specifically designed to keep the AI honest: verify your answers, flag uncertainty, don't prioritize sounding good over being accurate. Three clear rules, thoughtfully articulated. It's the kind of thing that feels like it should work, but it didn't.

Claude gave him two demonstrably wrong answers about soundbar setups. These were not ambiguous edge cases. They were straightforward factual questions with clear right and wrong answers. When Simon pushed back, Claude's response was surprisingly candid:

"I didn't follow your instruction. I saw it, and I still defaulted to generating a fluent, comprehensive-looking response from memory rather than doing what you explicitly asked."

He pressed further. How do I get you to actually follow my instructions?

"Unfortunately, you can't — not in a way that changes my underlying behaviour. Your preferences are included in every conversation I have with you. I can see them right now. The problem isn't that the instruction isn't reaching me — it's that I don't reliably act on it."

That exchange captures the reality of where we are with large language models today. The system clearly understood the instructions. It could articulate exactly why it failed to follow them. And it was honest enough to admit that no amount of rewording would fix the problem. The confident-sounding answer won out over the accurate one, because producing confident-sounding answers is what these systems are fundamentally built to do.


Anthropic's own research explains why. In a study on how Claude processes questions internally, their researchers found something counterintuitive: the model's default behaviour is actually to decline to answer when it doesn't know something. But hallucination happens when a "known entity" recognition circuit fires and overrides that default reluctance. The model recognizes enough about a topic to feel like it has an answer, even when it doesn't have the right one. So it generates something that sounds plausible based on patterns in its training data, and delivers it with the same confidence it would use for something it actually knows.
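If it helps to picture that failure mode, here's a toy sketch of it in code. To be clear, this is not Anthropic's actual mechanism; the function, the "familiarity" and "knowledge" scores, and the threshold are all invented for illustration.

```python
# A toy illustration only, not Anthropic's circuitry. The idea: a default
# "decline to answer" pathway that a familiarity signal can override, even
# when real knowledge is missing. All names and thresholds are invented.

def answer_or_decline(familiarity: float, knowledge: float,
                      threshold: float = 0.6) -> str:
    """'familiarity' stands in for the known-entity signal;
    'knowledge' for whether the model actually has the facts."""
    if familiarity < threshold:
        return "I don't know."           # the default: decline to answer
    if knowledge >= threshold:
        return "Grounded answer."        # familiar and actually known
    return "Plausible-sounding guess."   # familiar but not known: hallucination

# The failure mode described above: high familiarity, low actual knowledge.
print(answer_or_decline(familiarity=0.9, knowledge=0.2))
```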

That mechanism is important to understand about current AI models. The confident guessing that frustrates us isn't a malfunction; it's an emergent behaviour baked into the architecture. These models are trained to predict what comes next in a sequence of text, and the training rewards fluency and coherence. An answer that sounds authoritative and complete will, by the metrics the model was optimized for, score better than a hesitant one that flags its own uncertainty. Telling the model to behave differently through instructions is like asking someone to suppress a reflex. They might manage it occasionally, but the underlying wiring hasn't changed.


That's the accuracy side of the problem. Now consider what happens when you look at the other end of the spectrum: asking AI to be strategic.

A new study published this month in Strategy Science (which I first learned about in Evan Solomon's newsletter, The Leverage) tested how well large language models handle genuine strategic decision-making. Researchers Ryan Allen and Rory McDonald ran 34 models through the Back Bay Battery simulation, a widely used Harvard Business School exercise where you manage a battery manufacturer over eight years, balancing investment between a profitable but declining core technology and a risky, unproven emerging one. The simulation involves uncertainty, irreversible commitments, delayed feedback, and competing priorities. In other words, it resembles real strategic thinking far more than any multiple-choice benchmark.

The results were striking. Models from late 2024 and early 2025, the mid-generation reasoning models like o3-mini, o4-mini, Claude Sonnet 4, and Gemini 2.0 Flash, actually outperformed the average score of 249 MBA students at a top U.S. business school. They balanced short-term profitability with long-term investment, timing their bets on the emerging technology well enough to ride its growth curve.

The latest frontier models' results, however, were even more surprising. GPT-5, o3, and Gemini 2.5 Pro, the same systems topping every leaderboard for math, coding, and PhD-level science questions at the time of the study, performed considerably worse. Worse than the mid-generation models, and worse than the MBA student average. They kept doubling down on the profitable core business and refusing to invest in the uncertain new technology. GPT-5 explicitly stated it was pausing all R&D on the emerging technology "due to long lead times and poor fit with current market requirements." Gemini 2.5 Pro said it would "not invest any R&D into QSC this year, preserving our limited capital for the core business."

The researchers found that the relationship between general benchmark performance and strategic performance has actually inverted over time. From GPT-3.5 (released in late 2022) through the mid-generation models, getting better at science and reasoning benchmarks correlated with getting better at strategy. For the latest frontier models, that correlation reversed. They kept climbing on every other leaderboard while falling on the one that measured strategic judgment.

The most likely explanation is also the hardest one to address with current methods. The post-training techniques that made these frontier models better at math and coding, particularly reinforcement learning with verifiable rewards, work by optimizing for domains where answers can be checked deterministically. Two plus two equals four. Code either compiles or it doesn't. But strategic decision-making lives in the space where there is no verifiable right answer, where the payoff for a risky bet only becomes clear years later. Optimizing LLMs for the verifiable appears to be producing models that are increasingly unwilling to operate in unverifiable spaces. They've become, in a word, conservative. Not in a political sense, but in the sense that they systematically prefer the known over the unknown, the safe bet over the calculated risk, and the backward-looking pattern over the forward-looking leap.
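A crude way to see that distinction is in code. The numbers below are made up purely for illustration, and nothing here comes from the study itself: the point is only that a verifiable reward can be scored the instant an answer is produced, while a strategic payoff depends on a future state the model can't check against anything.

```python
# Sketch of the difference between verifiable and unverifiable objectives.
# All values are invented for illustration.

def verifiable_reward(answer: int, ground_truth: int) -> float:
    # Math-and-coding style: right or wrong, checkable immediately.
    return 1.0 if answer == ground_truth else 0.0

def strategic_payoff(invest_in_emerging: bool, tech_takes_off: bool) -> float:
    # Strategy style: the outcome depends on an uncertain future state,
    # so there is nothing to verify at decision time.
    if invest_in_emerging:
        return 5.0 if tech_takes_off else -1.0
    return 1.0  # keep milking the core business either way

print(verifiable_reward(4, 2 + 2))                  # 1.0, known instantly
print(strategic_payoff(True, tech_takes_off=True))  # 5.0, but only years later
```

A training process can push a model toward the first kind of score millions of times; it has no comparable lever for the second.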


This is all fascinating, of course, but it's also been frustrating for me personally. My partner and I have been using several AI tools for financial analysis and investment planning recently. It's the type of work that requires both rigorous pattern recognition and genuine creative thinking. The current geopolitical landscape doesn't map cleanly onto historical precedents, which means the kind of pure pattern-matching these models excel at can actually be misleading. Financial analysts love to say "history doesn't repeat, it rhymes," and right now the poem being written is post-modern bordering on avant-garde. So we need our AI tools to help us think through scenarios without clear historical parallels, to combine current and historical data with informed speculation, and to be transparent about which is which. We need accuracy and creativity simultaneously, which, it turns out, is precisely the combination that the recent AI optimization trajectory has been making harder to achieve.

I don't think we're alone in wanting this combination of accuracy and creativity. Talk to anyone using these tools seriously and you'll hear some version of this tension, whether it's a marketer who needs both factual research and novel campaign ideas, a lawyer who needs both precise citation and creative argumentation, or a founder who needs both financial modeling and strategic vision. Everyone wants the AI that can do all of it, and the uncomfortable reality is that the engineering tradeoffs involved in making these systems better at one thing tend to make them worse at the other.


The Allen & McDonald study frames this as an exploration-versus-exploitation dilemma, which is fitting, because that's exactly what the simulation itself is designed to test. The frontier models failed at it in the simulation for the same reason they're failing at it in their own development: the incentives all point toward exploitation of known strengths rather than exploration of uncertain territory.
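The exploration-versus-exploitation trade-off has a classic textbook illustration, the multi-armed bandit, and a toy version makes the lock-in easy to see. The payoffs below are invented and have nothing to do with the study's data; the point is simply that a policy that never explores keeps pulling the arm that looked best early on and never discovers that the riskier option pays more on average.

```python
import random

# Toy two-armed bandit: a safe, steady "core business" arm and a noisier
# "emerging tech" arm with a higher average payoff. Invented numbers.
random.seed(0)
ARMS = {
    "core_business": lambda: random.gauss(1.0, 0.1),
    "emerging_tech": lambda: random.gauss(1.5, 1.0),
}

def run(epsilon: float, rounds: int = 200) -> float:
    totals = {arm: 0.0 for arm in ARMS}
    counts = {arm: 1e-9 for arm in ARMS}   # avoid division by zero
    reward_sum = 0.0
    for _ in range(rounds):
        if random.random() < epsilon:      # explore: try something uncertain
            arm = random.choice(list(ARMS))
        else:                              # exploit: stick with the best so far
            arm = max(ARMS, key=lambda a: totals[a] / counts[a])
        payoff = ARMS[arm]()
        totals[arm] += payoff
        counts[arm] += 1
        reward_sum += payoff
    return reward_sum / rounds

print("never explores:", round(run(epsilon=0.0), 2))  # locks onto the safe arm
print("explores 10%:  ", round(run(epsilon=0.1), 2))  # typically finds the better arm
```

Swap "arm" for "R&D budget" and you have the Back Bay Battery failure in miniature.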

What this means for people who use these tools is a need to be clear-eyed about the current reality. The word "intelligence" is doing way too much work in the AI conversation right now. A model that scores higher on PhD-level science questions while simultaneously becoming less capable of strategic judgment isn't getting smarter in any meaningful sense. It's just getting more specialized in a way that happens to be highly measurable, which creates the illusion of general progress.

The most useful thing these systems could learn to do isn't to be right more often (which, let's be frank, isn't always possible) but to be honest about when they're guessing. My colleague Simon's Claude could articulate exactly what went wrong after the fact, and the frontier models in the strategy study could generate sophisticated reasoning for why they were choosing the safe path. The self-awareness is clearly there. What's missing is the willingness to let that self-awareness change the output before it reaches the user.

Until it does, the burden remains on us. Not to trust less, but to trust more carefully.

Cover image generated with Midjourney. Research and editing assistance provided by Claude.