Ariel Cohen

The Self-Improving Machine

Why the real race in AI isn't about better chatbots — it's about the loop that builds the next loop

There is a question I hear often, phrased in different ways but always pointing at the same thing: what are the big AI labs actually building toward? People speculate about revenue models, product roadmaps, competitive moats. But I think the honest answer is simpler and more consequential than most people are willing to say out loud.

They are building AI that can research and improve itself. Everything else — the chatbots, the coding assistants, the enterprise contracts — is scaffolding around that central ambition.

The Loop That Changes Everything

Strip away the papers and the prestige, and scientific research is a remarkably simple loop: form a hypothesis → write code → train → test → extract insight → repeat. The magic isn't in any single step. It's in the iteration rate, and in who — or what — is running the loop.
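The loop is simple enough to sketch in a few lines of Python. Everything below is a hypothetical stand-in (the proposal and experiment functions are stubs); the point is only to show the shape of the cycle:

```python
import random

def propose_hypothesis(history):
    """Form a hypothesis: pick a change to try, informed by past results."""
    return {"lr": random.choice([1e-3, 3e-4, 1e-4])}  # illustrative knob

def run_experiment(hypothesis):
    """Write code, train, test: stubbed here as a random metric."""
    return random.random()  # stand-in for, say, validation accuracy

def research_loop(cycles=10):
    history = []
    best = float("-inf")
    for _ in range(cycles):
        hyp = propose_hypothesis(history)   # form a hypothesis
        score = run_experiment(hyp)         # write code, train, test
        history.append((hyp, score))        # extract insight
        best = max(best, score)             # keep what works, repeat
    return best, history

best, history = research_loop()
```

The stubs are not the point; the structure is. Nothing in the loop requires a human, only something that can propose, run, and judge.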

In AI specifically, we still don't fully understand why any of it works. We try things, we measure them, we ship what succeeds. The field advances empirically, not theoretically. That's not a bug; it's just the nature of the science at this stage.

Now imagine that loop running inside a model whose weights compress a vast share of recorded scientific knowledge: the papers, the failed experiments, the documented patterns across every field. I believe models can already run this loop better than we can in many respects. Faster. Tireless. Without ego about being wrong.

This is why the large labs are acquiring GPU capacity at a scale that makes no sense if you think they're building a better chatbot. They are not buying that compute for you. They are buying it for the moment a model can use those resources to make itself meaningfully smarter — millions of automated research cycles, around the clock.

The Evidence Is Already Here

One of the most important documents I've read in recent years is Leopold Aschenbrenner's Situational Awareness. Aschenbrenner worked inside OpenAI, and he laid out, with unusual clarity and specificity, where this trajectory leads. He argued that AGI by 2027 is not science fiction but a straight line on a graph, if you trust the trendlines. And the trendlines have been right, repeatedly, while the skeptics have been wrong.

We are now entering the phase he described.

Andrej Karpathy recently open-sourced autoresearch — roughly 630 lines of Python that let an AI agent run ML experiments on a single GPU without any human intervention. The agent proposes a change, runs a five-minute training experiment, checks whether the metric improved, commits or discards, and begins the next cycle. About 12 experiments per hour. Around 100 overnight. In two days, it conducted 700 experiments and discovered 20 optimizations — including a bug in Karpathy's own attention implementation that he had personally missed after months of hand-tuning. Shopify's CEO tried the same pattern on an internal model and woke up to a 19% improvement from 37 experiments he never had to design.
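The commit-or-discard pattern is essentially greedy hill climbing over configurations. To be clear, this is not Karpathy's actual code; the mutation rule and toy objective below are my own stand-ins. But it captures the mechanics:

```python
import math
import random

def mutate(config):
    """Propose one small change (here: halve or double the learning rate)."""
    new = dict(config)
    new["lr"] = config["lr"] * random.choice([0.5, 2.0])
    return new

def evaluate(config):
    """Stand-in for a short training run; higher is better.
    This toy objective peaks at lr = 1e-3."""
    return -abs(math.log10(config["lr"]) + 3)

def autoloop(config, experiments=100):
    best_score = evaluate(config)
    kept = 0
    for _ in range(experiments):
        candidate = mutate(config)      # propose a change
        score = evaluate(candidate)     # run a short experiment
        if score > best_score:          # did the metric improve?
            config, best_score = candidate, score   # commit
            kept += 1
        # otherwise: discard, begin the next cycle
    return config, best_score, kept

config, score, kept = autoloop({"lr": 1e-1})
```

Run enough cycles and the configuration drifts toward the objective's peak. The agent never needs to understand why a change helped, only to measure that it did.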

It is a toy today. It is also the blueprint for everything tomorrow.

Closer to home: a researcher at Anthropic recently tasked 16 parallel Claude agents with building a C compiler from scratch, written in Rust and capable of compiling the Linux kernel. No active human oversight. The agents coordinated through git, resolved merge conflicts, picked their own tasks, and iterated. Across nearly 2,000 sessions, they produced a 100,000-line compiler that builds Linux 6.9 on x86, ARM, and RISC-V. The humans mostly walked away. Agent loops work.
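That coordination pattern can be simulated in miniature, with a lock standing in for git's serialized pushes. The names and structure here are illustrative, not the actual Anthropic setup:

```python
import threading

backlog = [f"task-{i}" for i in range(20)]   # hypothetical work items
completed = []
lock = threading.Lock()   # stands in for the repo: one writer at a time

def agent(name):
    while True:
        with lock:                  # like pushing: claims are serialized
            if not backlog:
                return              # nothing left; the agent retires
            task = backlog.pop()    # claim a task nobody else holds
        # ... real work happens outside the lock (edit, compile, test) ...
        with lock:
            completed.append((name, task))   # merge the result back in

agents = [threading.Thread(target=agent, args=(f"agent-{i}",))
          for i in range(4)]
for t in agents:
    t.start()
for t in agents:
    t.join()
```

Real git coordination replaces the lock with optimistic pushes and rebases on rejection, but the invariant is the same: no two agents ever hold the same task.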

The Right Question

People who argue we are far from AGI are, I think, often measuring the wrong thing. They point to benchmark scores, to hallucinations, to the gap between current models and some imagined general intelligence. These are real observations. But they miss what I consider the decisive question:

Can the model run its own improvement loop?

Once the answer is yes — even partially, even clumsily — the dynamics change entirely. You don't iterate once a year. You iterate millions of times a day. A decade of algorithmic progress compresses into months. The trendlines were always pointing here. Most people just weren't counting the orders of magnitude.

I want to be honest about the uncertainty involved. We don't know exactly how fast this transition will happen, or precisely what form it will take. The history of AI is littered with predictions that proved either too conservative or too dramatic in the wrong ways. But the direction seems clear to me, and I think intellectual honesty requires saying so.

The question worth spending time on now is not whether this happens. It is whether we will have built the interpretability tools, the alignment foundations, and the governance structures to steer it well when it does.