The agents are good enough to get results when an experienced person is guiding them properly.
However, during the holidays there was a big marketing campaign, mainly on Twitter. Everyone suddenly started posting the same talking points over and over, which triggered a storm of FOMO at exactly the moment when nobody was working.
There was no sudden huge jump. I've been using AI coding tools since 2024 and was surprised by the sudden hype, given that the tools worked fine before.
Codex/Claude gather telemetry by default. That’s why they are subsidized. You’re giving them training data.
If you start with everything on GitHub, plus maybe some manually annotated prompts for fine-tuning, you get a decent base model of "if you see this code, then this other code follows," but that only gets you so far.
If you can track how thousands of people actually use prompts, and which tool-usage patterns actually lead to success, then you can fine-tune on far more data (and train the model to avoid the unsuccessful patterns). Now you're training on how people actually use the product, not on theoretical scenarios.
In ML it always boils down to the training data.
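To make that concrete, here is a minimal sketch of turning agent telemetry into fine-tuning data. The session format and field names are hypothetical, not any vendor's actual schema; the point is just the split between successful trajectories (to imitate) and failed ones (to train away from).

    import json

    # Hypothetical telemetry records: each session logs the prompt, the tool
    # calls the agent made, and whether the user accepted the result.
    sessions = [
        {"prompt": "fix the failing test",
         "tool_calls": ["read_file", "edit", "run_tests"],
         "accepted": True},
        {"prompt": "refactor the auth module",
         "tool_calls": ["edit"],
         "accepted": False},
    ]

    def to_training_examples(sessions):
        """Split sessions into positive examples (trajectories to imitate)
        and negative examples (trajectories to steer away from)."""
        positives, negatives = [], []
        for s in sessions:
            example = {
                "input": s["prompt"],
                "target": json.dumps(s["tool_calls"]),  # the action sequence
            }
            (positives if s["accepted"] else negatives).append(example)
        return positives, negatives

    positives, negatives = to_training_examples(sessions)
    print(f"{len(positives)} successful trajectories, {len(negatives)} failed ones")

In practice the positive set would feed supervised fine-tuning and the negative set a preference or RL objective, but the basic pipeline is just this kind of filtering at scale.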
I think your intuition is correct and there’s nothing crazy novel happening
The recent surge is mainly predictable growth in the same technical direction we've been trending in. It's just that it got good enough for people to notice.
In the 3 Body Problem series, the author describes a scenario where
(SPOILERS!!)
… humans are blocked on “fundamental” scientific gains but can still develop incredible technologies that make life unrecognizable
I think we're seeing that scenario. Scaling RL for tool use and reasoning, improved pretraining data quality, larger models, better agent architectures, improved inference efficiency, etc., are all just incremental moves along the same branch, nothing fundamentally new or powerful.
Which is not to say it’s not amazing or valuable. Just that it was not the result of anything super innovative from a research perspective. Model T to 2026 Camry was an amazing shift without really changing the combustion engine.
> Model T to 2026 Camry was an amazing shift without really changing the combustion engine.
A lot of the big jumps in internal combustion engine development have come down to materials science or improvements in manufacturing capability.
The underlying thermodynamics and theoretical limits have not changed, but the individual parts and the materials we make them from have steadily improved over time.
The other factor is the need for emissions-reduction strategies as an overriding design constraint.
The analogues to these two in LLMs are:
1. Harnesses and agentic-systems-focused training have gotten better over time, so performance has increased without a step change in the foundation models (a toy sketch of such a harness loop follows this list).
2. The requirements for guardrails, anti-prompt-injection measures, and other safeguards needed to make LLMs palatable for consumers and businesses.
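For the first point, here is a toy sketch of what an agent harness actually does, with a stubbed-out model function standing in for a real LLM call. The tool names and loop structure are illustrative, not any particular product's implementation.

    # Toy harness loop: the model proposes an action, the harness executes the
    # matching tool and feeds the observation back, until the model finishes.
    # fake_model is a stub; a real harness would call an LLM API here.

    def fake_model(task, history):
        """Stand-in for an LLM call: returns the next action as a dict."""
        if not any(step["action"] == "run_tests" for step in history):
            return {"action": "run_tests", "args": {}}
        return {"action": "finish", "args": {"summary": "tests pass"}}

    TOOLS = {
        "run_tests": lambda **_: "4 passed, 0 failed",  # pretend test runner
    }

    def run_harness(task, max_steps=5):
        history = []
        for _ in range(max_steps):
            step = fake_model(task, history)
            if step["action"] == "finish":
                return step["args"]["summary"]
            # Execute the requested tool and feed the observation back.
            observation = TOOLS[step["action"]](**step["args"])
            history.append({"action": step["action"], "observation": observation})
        return "gave up after max_steps"

    print(run_harness("fix the failing test"))

Most of the recent gains came from improving this outer loop (better tools, better feedback, better retries) rather than from changing the model underneath, which is exactly the point being made above.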
Perhaps quantum computing could eventually make it feasible to explore permutations of code for a given prompt, converging on some kind of statistically probable solution.