# Delving into the Minds of Machines: My Adventures with Reasoning Models

Ever felt like you're witnessing the dawn of a new era? That's precisely how I feel right now, immersed in the rapidly evolving world of AI reasoning models. It's like watching a sci-fi film unfold in real-time, and it's incredibly exciting. I wanted to share my recent experiences exploring these models, what I've learned, and why I think they're such a big deal. We're on the cusp of something significant here, a fundamental shift in how we interact with and utilise intelligence – both human and artificial.

This post will dive into the fascinating realm of reasoning models, exploring their different "flavours," the techniques used to train them, and how they are changing the way we think about intelligence and problem-solving. I'll be sharing my personal observations, comparing different models, and discussing the implications of their rapidly falling cost.

## Understanding the "Flavours" of Reasoning

One of the first things that struck me when diving into this world is that not all reasoning models are created equal. They have distinct personalities, strengths, and weaknesses. Think of it like different cuisines – each has its own unique blend of ingredients and preparation methods, resulting in vastly different experiences.

The core of these models lies in something called "reasoning training." This involves feeding them massive amounts of data, focusing on areas like maths and coding. Then, techniques like reinforcement learning (RL) are used to refine their abilities. But it doesn't stop there.

### Beyond the Basics: Post-Reasoning Training

What really fascinated me was learning about the "post-training" phase. The DeepSeek paper, which I found particularly insightful, describes how they used techniques like instruction tuning and reinforcement learning from human feedback (RLHF), but with a strong emphasis on maths. It's like taking a well-trained athlete and then giving them specialised coaching to excel in a particular sport.

This post-training is crucial because it makes these models easier for us to use. It's like adding the user interface to a powerful engine. The models we're interacting with, like the recently released o3-mini and o1, have gone through these steps, making them more intuitive and aligned with our preferences.

### The Gemini Flash "Flavour": A Different Approach?

One model that's been getting less attention than it deserves, in my opinion, is Google's Gemini Flash. It feels different. Less expressive, perhaps, than o1, but potentially more focused. My initial impression is that Gemini Flash might have taken a different path. Instead of starting from scratch with a massive reasoning training stack, it seems like they might have added reasoning capabilities to an existing training stack.

I remember using earlier models, like one released last autumn, that felt very "on rails" – brilliant at maths and coding, but limited in other areas. o1, on the other hand, feels much more versatile. It can tackle a wide range of tasks, even if it's not perfect at everything. This highlights the artistry involved in crafting these models – finding the right balance between specialisation and general-purpose capabilities.

## My Personal Experiment: A Philosophical Challenge

To really get a feel for these models, I decided to put them to the test with a somewhat unusual challenge. I asked them: "Give one truly novel insight about humans." I wanted to see how they handled a philosophical, open-ended question, focusing particularly on their ability to generate something truly original.
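To make the setup concrete, here's a minimal sketch of the experiment loop: one open-ended prompt broadcast to several models. The `query_model` function is a placeholder I've invented for illustration – in practice each model sits behind its own provider API, and the model names here are illustrative rather than exact API identifiers.

```python
PROMPT = "Give one truly novel insight about humans."

def query_model(name, prompt):
    # Placeholder for a real API call; each provider ships its own
    # client library. Here we simply echo so the sketch runs end to end.
    return f"[{name}] response to: {prompt}"

# Illustrative model names, not exact API identifiers.
MODEL_NAMES = ["gemini-2.0-flash-thinking", "deepseek-r1", "o1-pro"]

# Send the same open-ended prompt to every model and collect the answers.
responses = {name: query_model(name, PROMPT) for name in MODEL_NAMES}
for name, text in responses.items():
    print(name, "->", text)
```

The interesting part of an experiment like this is less the harness than reading the answers side by side, which is what the next sections do.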

### The Self-Domesticated Ape

One response, from Gemini 2.0 Flash Thinking, really stood out. It described humans as "self-domesticated apes," arguing that this self-domestication is key to understanding our unique cognitive and social abilities. What I loved was the chain of thought it displayed. It considered the evolution of life, apex predators, and how we ended up where we are. This idea of domestication by choice – it was a genuinely fresh perspective.

### Shared Hallucinations as Social Fuel

Another model, DeepSeek's 'R1', offered a different but equally compelling insight. It suggested that humans convert selfish desires into cooperative systems by collectively pretending abstract rules (like money, laws, and rights) are real. These "shared hallucinations," it argued, act as games where competition is secretly redirected to benefit the group, turning conflict into society's fuel. Beautifully put, I thought.

And then there was the consistently high performance of o1 pro. It delivered gem after gem. One example: "Humans are the only species that turns raw materials into symbolic resources then uses those symbols to reorganize the very materials they came from, creating a closed feedback loop between meaning and matter."

### Observing the Thought Process: Chain of Thought

What I found particularly fascinating was observing the "chain of thought" – the step-by-step reasoning process that some models display. It's like getting a glimpse into the inner workings of an intelligent mind. It's non-linear, a bit like reading James Joyce, and genuinely beautiful to witness. One model described this in a way that I loved; it was like looking at the raw thought process of another intelligence.

## Beyond Single Responses: The Power of Parallel Processing

Another crucial aspect of these advanced models is their ability to run multiple "chains of thought" in parallel. This isn't just about asking one question and getting one answer. It's about exploring numerous possibilities simultaneously and then selecting the best one.

This approach is related to Monte Carlo tree search, the technique used in systems like AlphaZero. It's about expanding the possibilities at different points in the reasoning process and then choosing the most promising path. It's a bit like having a brainstorming session with multiple experts and then synthesising the best ideas.
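The simplest version of "explore many paths, keep the best" can be sketched as best-of-N sampling. Everything in this sketch is a stand-in: `generate_chain` fakes a model call, and the score would really come from a learned verifier or reward model rather than a random draw.

```python
import random

def generate_chain(prompt, seed):
    # Stand-in for a model call: returns a (reasoning, score) pair.
    # In a real system the score would come from a learned verifier
    # or reward model, not a seeded random draw.
    rng = random.Random(seed)
    reasoning = f"chain-{seed}: steps toward answering {prompt!r}"
    return reasoning, rng.random()

def best_of_n(prompt, n=8):
    # Sample n independent chains of thought, keep the highest-scoring one.
    chains = [generate_chain(prompt, seed) for seed in range(n)]
    return max(chains, key=lambda pair: pair[1])

best_reasoning, best_score = best_of_n("Give one truly novel insight about humans.")
print(best_reasoning)
```

Monte Carlo tree search goes further by branching and scoring at intermediate steps rather than only at the end, but the select-the-best-candidate idea is the same.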

The concept of "chain of thought" itself comes from a paper published a few years ago. The original idea was to prompt a language model to "think step by step," encouraging it to write out its intermediate reasoning before answering. Now, chain of thought is almost a default behaviour. You don't even need to explicitly ask for it.

## The Declining Cost of Intelligence: A Game Changer

One of the most exciting developments, and one that often gets overlooked, is the rapidly decreasing cost of running these models. This is a *huge* deal.

I remember seeing a chart showing the cost of running inference on a model similar to the one released a couple of years back. The price had plummeted by something like 1200 times! That's a mind-boggling reduction. We're talking about going from something prohibitively expensive to something remarkably affordable.
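To make that reduction tangible, here's the arithmetic with a hypothetical starting price. The figures are illustrative; only the roughly 1200x factor comes from the chart described above.

```python
# Illustrative figures only; the ~1200x factor is the point.
initial_cost_per_million_tokens = 60.00  # hypothetical starting price, USD
reduction_factor = 1200                  # the roughly 1200x drop mentioned above

current_cost = initial_cost_per_million_tokens / reduction_factor
print(f"${current_cost:.3f} per million tokens")
```

At that scale, workloads that were once budget-breaking, like running thousands of chains of thought per query, start to look routine.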

This trend isn't just about making things cheaper; it's about unlocking potential. It means that we can explore more complex inference methods, like running thousands of chains of thought in parallel. It means that the incredible capabilities of these models will become increasingly accessible, not just to large corporations, but to smaller businesses and even individuals.

### The Bitter Lesson and the Future

There's a concept in AI called the "bitter lesson," which essentially says that general methods that leverage computation tend to win out in the long run. The rapid progress we're seeing in reasoning models, combined with the declining cost of computation, suggests that we're on the cusp of a significant leap forward.
It's a combination of factors: architectural innovations, better data, improved training techniques, and more efficient hardware. All of these are driving down costs and, crucially, increasing capabilities.

## Conclusion: Embracing the AI Revolution

My journey into the world of reasoning models has been eye-opening. It's shown me the incredible potential of these technologies, the creativity involved in crafting them, and the exciting possibilities that lie ahead.
The rapid pace of development, coupled with the decreasing costs, suggests that we're on the verge of a truly transformative era.
It's not just about building smarter machines; it's about augmenting our own capabilities, unlocking new insights, and tackling challenges in ways we couldn't have imagined before.

So, what's next? I think the key is to continue exploring, experimenting, and pushing the boundaries of what's possible. The more we understand these models, the better we can harness their power for good.
The future isn't just coming; it's being built, one chain of thought at a time. And I, for one, am incredibly excited to be a part of it. What new perspectives have you discovered in your own explorations of AI? I'd love to compare notes.