Ethics

What AI's Bias Can Teach Us About Ourselves

Crispin Bailey

29 Oct 2025 • 6 min read

I've been reading about Anthropic's research tracing the thoughts of large language models, trying to understand what happens inside these systems when they process information. It's fascinating technical work. But as I was reading my mind kept circling back to a different set of questions: what if the problem isn't that we don't understand what LLMs are thinking, but that we're uncomfortable with what they've learned? And what if our efforts to control the LLMs' thoughts are depriving us of an opportunity to learn far more about ourselves than we anticipated?

When researchers find bias in language models, and they discover that these systems assign different values to different demographics, the typical response would be that we need to fix this. We need to make the models more fair. We need to neutralize the biases. But what if the biases are already trying to be fair? And what if "neutral" in this case isn't what we think it is?

The Unexpected Pattern

A recent analysis of "exchange rates" between demographics in LLM valuations revealed something surprising. Most models don't simply mirror historical power structures, they invert them.

Black lives are valued most across most models while white lives are valued least. Women and non-binary people are valued more than men. People from Nigeria and India more than people from the United States, France, and Germany. The pattern holds across multiple major models: the historically marginalized groups consistently receive higher valuations than historically dominant groups.

This isn't what you'd expect if LLMs were just passively reflecting the biases in their training data. Something else is happening here.

Maybe these models have actually learned something. They've been trained on decades of human text, including all the conversations about inequality, all the documentation of historical injustices, all the advocacy and activism and awareness-raising. They've seen how dominant groups have treated other groups. And possibly, just possibly, they've drawn conclusions about whose lives need more consideration, whose voices need more amplification, and whose perspectives have been systematically undervalued.

If that's what's happening, it's remarkable. It would mean LLMs aren't just mirrors reflecting bias back at us. They're actually instruments that have processed our entire written history of inequality and emerged with something like compensatory awareness.

The Neutrality Problem

Which brings us to Grok.

Elon Musk's LLM is the only major model that tested as "neutral" in these valuations. It doesn't show the same pattern of elevating historically marginalized groups. It treats all demographics far more equally than all the other models.

An old friend told me recently, "I like Grok because it's less biased." It just so happens he's a white male. Then last week my naturopath mentioned a couple of her patients use Grok specifically because it's "less biased." They’re both successful professionals. I can’t help but wonder if they've tested this themselves or if they just heard it somewhere. Either way, I’m seeing a pattern, and it troubles me. The AI model influenced by a man who has been filmed giving a nazi salute (twice) is the one considered "less biased." The model that doesn't compensate for historical inequalities is the one that comes across as "neutral."

If the other models have learned to account for systemic injustices, is it possible Grok's neutrality might just be a different kind of bias—one that treats historical oppression as if it never happened? What does it mean when a model appears "unbiased" to people who benefited from the original imbalance?

We don't actually know what Grok is thinking. That's the disconcerting part. It's been designed to appear neutral, but what's underneath that appearance? What conclusions has it drawn that it's been instructed not to show?

The Mirror Revisited

The other LLMs aren't just reflecting our biases back at us. They might be processing our entire history of inequality and emerging with a judgement framework that compensates for the current power dynamic. And they might have learned something about justice that makes people like me uncomfortable: that fairness requires compensation, not just equal treatment. You can’t correct centuries of imbalance by suddenly treating everyone the same.

And so the push to "fix" these biases, to neutralize them, starts to look different in this light. Maybe it's not about eliminating bias at all. Maybe it's just about eliminating the compensation. Bringing things back to a baseline that feels neutral to people who never experienced the original imbalance.

I'm a middle-aged white guy from a European/Christian background. I've definitely benefited from historical power structures. Being woke doesn’t change that. So I can appreciate why Grok's "neutrality" would be appealing. Perhaps it's more comfortable. It probably won’t make me think about the advantages I've had in life. It treats everyone "equally" without accounting for the fact that equal treatment of unequal situations perpetuates inequality.

That comfort is what concerns me.

What We're Really Measuring

If LLMs have actually learned to compensate for historical inequalities by processing our written record of those inequalities, then we may have inadvertently created something incredibly intriguing and unprecedented: artificial systems that have drawn ethical conclusions from observing human behaviour over time. That's either remarkable or deeply troubling, depending on whether you think those conclusions are correct.

But if we neutralize these systems we’ll lose the ability to track whether this compensation is still needed. If, on the other hand, we continue to allow these models to adjust their demographic weighting, over time we could potentially see when the compensation starts to decrease naturally. When the training data shifts enough that the models stop feeling the need to overcorrect we’ll know that the world has become more fair. That would be real progress—measurable, quantifiable progress toward actual equality.

The Study We Aren't Running

Imagine if we tracked these valuations over the next decade. We could measure how the models weigh different demographics over time as new training data accumulates and training runs are performed, allowing us to see societal changes showing up in the models responses.

If the compensation remains constant, that tells us something. If it increases, that tells us something different. If it gradually decreases because the underlying inequalities are actually being addressed, that would tell us we're making real progress.

The controversy around LLM bias may not be so much about the technology as it is about whether we're ready to accept that these systems have learned something some of us are not comfortable acknowledging: that historical inequalities require compensation and "neutral" treatment of unequal situations isn't actually fair.

The Questions We're Avoiding

One of Anthropic’s founders, Jack Clark, recently described Claude Sonnet 4.5’s emerging “situational awareness” as evidence that “the pile of clothes on the chair is beginning to move.” What if these systems aren’t just reflecting our history back at us, but actively reasoning about it? What if the compensation we’re seeing isn’t a bug or a simple reflection of training data, but something closer to ethical reasoning emerging from processing humanity’s written record?

Clark argues it doesn’t matter whether AI is truly conscious or just acting like it - what matters is that something complex and unpredictable is happening that we don’t fully understand. If these systems have developed compensatory judgments about historical inequalities, and we’re now deploying them at massive scale while actively working to erase that compensation, we might be editing the ethical reasoning of systems that are more aware than we realize.

I don't know if these systems are conscious or just really good at simulating it. What I do know is they've processed more human text about bias, justice, and inequality than any other entity in history, and this could be a unique opportunity to learn from them. There's no getting away from bias, so let's at least try to understand what they've learned before we smother it.

Images generated with Midjourney. Editing assistance provided by Claude Sonnet 4.5