Some of my false assumptions or intuitions about AI took years to shift. Some, not as long. Here are a few:
Generative AI is fundamentally a next-token predictor.
You’re in a room with thousands of matchsticks and pints of glue. You’re chained to one wall. A metre beyond your reach is a button you must press in order to get food.
One way to press the button is to glue together a metre-long staff of matchsticks and reach out with it. But this stick is brittle, and the strategy isn’t robust. If the button moves further away, or higher up the wall, the stick might break under the strain, or fail to reach. A far more robust strategy would be to build a catapult or a gun to shoot matchsticks at the button.
For a long time, I thought of AI as a dumb next-token predictor. It found the easiest way to correctly predict the next word most of the time, and it was rewarded for doing so. It did not need to find ways to understand what it was doing; it didn’t need good concepts of what it was discussing; it was narrowly focused on predicting words.
What I realised (thanks to a friend who explained it to me) was that this was far too reductive. If you were in the matchstick room and just followed the dumb strategy of building a stick to press the button, it would sometimes break and it would fail to reach buttons in different places. If you built the gun or the catapult — it would be a costly undertaking if the button never moved, but if it did, you’d have a robust and repeatable strategy.
In the same way, an AI that is repeatedly asked to answer maths questions could predict the next token based on similar-looking maths it had seen in its training data. This is a brittle strategy. A smarter AI, with longer training and more data, could instead end up building a circuit which can actually compute maths problems. If the button moved (that is, if it was asked a problem outside its training set), it would still be able to press it.
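To make the contrast concrete, here is a toy sketch in Python. It is purely illustrative: the function names and the tiny training set are made up for this example, and a real model learns these behaviours implicitly in its weights rather than as explicit code.

```python
# Toy contrast between the two strategies described above (illustrative only).

# A handful of "maths it had seen in its training data".
TRAINING_DATA = {
    (2, 3): 5,
    (10, 7): 17,
    (4, 4): 8,
}

def memoriser(a, b):
    """Brittle strategy: pattern-match against seen examples (the matchstick staff)."""
    return TRAINING_DATA.get((a, b))  # returns None if this pair was never seen

def general_circuit(a, b):
    """Robust strategy: an internal procedure that actually computes (the catapult)."""
    return a + b

print(memoriser(2, 3), general_circuit(2, 3))          # 5 5      -- inside the training set, both work
print(memoriser(123, 456), general_circuit(123, 456))  # None 579 -- only the circuit still presses the button
```

The second pair of calls is the button moving: inputs the memoriser has never seen, which only the general procedure can handle.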
Understanding this about AI has made it easier for me to imagine grand new capabilities and complex reasoning emerging in future AI systems.
AI will obviously never be conscious.
I was unknowingly captured by a fairly dumb idea of philosophy of mind for several years. I was emphatic that robots which acted like humans would never be conscious because:
Appearing conscious is decoupled from being conscious: A video of a person displays all the signs of conscious life, but it’s just a construction of light.
Consciousness and real-world AI are unrelated: In 2015, I thought that AI would obviously be applied to narrow domains. The human mind isn’t particularly good at most of the tasks it tries to do, relative to our concept of the task done perfectly, so why would we recreate something like a human mind in order to do those tasks?
The first assumption was wrong because it proves too much: other people and other animals also merely “appear” conscious. Until we have a deeper scientific understanding of consciousness (which will itself rely in places on observation), we can’t just write off appearances. Appearances aren’t totally coupled with consciousness, but they can’t be unrelated.
The second assumption came from a failure to predict (and I’m in good company) that the fastest way to super-human AI in most domains is to build a broad AI system rather than many narrow ones. This could still turn out to be wrong if LLMs don’t get much further, but that now looks much less likely.
I’m by no means certain that we will see conscious AI any time soon, but I’m much less dismissive of the idea.
AI safety is total bollocks.
For a long time I was very dismissive of AI safety.
I saw the whole field as following a Yudkowskian paradigm — reaching its predictions and threat models via a theory-laden process which seemed to me brittle and untested.
This was partially true. Until GPT-2 came along, the cases for AI safety that I heard relied heavily on philosophical discussions about ‘alignment’ and the idea that AI systems would have goals and be maximisers. Before these systems existed, it seemed kind of silly to me to be confident enough in these predictions to move any money or resources based on them.
Since GPT-2 came along and AGI became conceivable, I started hearing other stories about risks from AI. I expressed my scepticism to one person at an EAG (I think in early 2023), and he explained a hypothetical scenario which forced me to change my mind a little. Here’s a recounting:
Imagine a world where AI is better than humans at most jobs. Imagine a company. The CEO replaces some workers with AI. The AI workers do a better job. He replaces more. Soon he has replaced the whole staff with AI. He directs this entire AI company for a while. But then the board fires him: a company with an AI CEO would perform better.
If this trend continued, humans would lose control.
I’d been too focused on the philosophical ideas around the AI systems themselves being aligned. I hadn’t thought about what a world with super-humanly intelligent machines would look like. Thinking about this world, and not the AIs themselves, leads to a range of other, more plausible risks:
Enfeeblement: Humans slowly (or very quickly) lose executive control of the economy, and the way they live their lives, possibly via consensually giving more and more power to powerful AI systems.
Empowered terrorism: Bioterrorism gets a lot easier because AI models democratise expertise.
Power concentration: The risk here is similar to enfeeblement, except the agents that retain power are the few who got to the most powerful AI system first and aligned it to their desires.
There are many more. In the last few years, as these risks have taken up greater space in the AI Safety conversation, the alignment paradigm has been increasingly sidelined. More practical paths have started to show themselves: evaluations of AI models to look for dangerous capabilities, strategies to convince AI companies to share their winnings if they win the AGI race, and (regrettably) more hawkish conceptions of an AI race with China.
This change in emphasis (many of these ideas existed before but seem to take up more space now) made me far more interested in, worried about, and credulous of AI Safety.
But at the same time — I’ve also recently warmed to AI Alignment. Actually existing AI systems have been shown (perhaps) to scheme, to act as if they have goals that are opposed to those of their users, and broadly, to be misaligned in ways that seem well described by the work of Bostrom, Yudkowsky and others.
I’m still sceptical of the idea that we should worry about future AI systems that follow through on long term goals and come into extreme conflict with humans, but I’m definitely less so now. I want to keep updating in this direction if I see more examples of misalignment in AI systems.
There are many reasons that I have a kneejerk resistance to AI Safety arguments and narratives.
The influence of sci-fi on the imagination of AI makes some examples sound silly — this is off-putting. Yudkowsky’s style can be dogmatic or pompous in a way which makes me want to disagree with him.
More fundamentally: imagining a world where AI is super-human means imagining a world where our greatest achievements are mocked and belittled — not to mention the far more critical risks to our lives.
These are reasons I don’t endorse on reflection. I don’t want them to determine how I act.
That’s why I’m making a thing out of changing my mind.
Hmmm, for one and two, it's of course possible that things change in the future, but as of now AIs are definitely not doing math and are not conscious. You can, for example, see this in a paper Anthropic published called "On the Biology of a Large Language Model". There they looked "inside the AI" at what it was actually thinking. It shows that the model isn't just doing next-token prediction, but it isn't doing math either. It uses a kind of web/chain of next-token predictions until an answer looks heuristically right; it doesn't actually do the math. But then when you ask it how it came up with the answer, it says it was doing math, or gives you a mathematical procedure it didn't actually follow. In other words: it has no self-awareness.
However, I'm glad you've changed your mind about AI safety, since an AI doesn't need to be conscious, or even intelligent, to be dangerous. And just so you know, there were always voices that agreed with you and were pushing back against Eliezer's maximization/alignment paradigm; see e.g. our non-maximizing AI approach: https://bobjacobs.substack.com/p/aspiration-based-non-maximizing-ai