I think this is a great example of both points of view in the ongoing debate.
Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!
Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
Pro: Sure, but we can get the agent to fix that.
Anti: Can you, though? We've seen that the more complex the code base, the worse the agents do. Fixing complex issues in a compiler seems like something the agents will struggle with. Also, if they could fix it, why haven't they?
Pro: Sure, maybe now, but the next generation will fix it.
Anti: Maybe. While the last few generations have kept getting better, we're still not seeing them handle this kind of complexity any better.
Pro: Yeah, but look at it! This is amazing! A whole compiler in just a few hours! How many millions of hours were spent getting GCC to this state? It's not fair to compare them like this!
Anti: Anthropic said they made a working compiler that could compile the Linux kernel. GCC is what we normally compile the Linux kernel with. The comparison was invited. It turned out (for whatever reason) that CCC failed to compile the Linux kernel when GCC could. Once again, the hype of AI doesn't match the reality.
Pro: but it's only been a few years since we started using LLMs, and a year or so since agents. This is only the beginning!
Anti: this is all true, and yes, this is interesting. But there are so many other questions around this tech. Let's not rush into it and mess everything up.
I'm reminded, once again, of the recent "vibe coded" OCaml fiasco[1].
The PR author had zero understanding why their entirely LLM-generated contribution was viewed so suspiciously.
The article validates a significant point: it is one thing to have passing tests and be able to produce output that resembles correctness - however it's something entirely different for that output to be good and maintainable.
I once had a PR to review. I told the dev, "LLM is OK, but you own the code."
He told me, "I spent n days to architect the solution."
He showed me a Claude-generated system design, and I said OK and went to review the code. An hour later I asked why the code was repeated all over the place at the end. The dude replied, "Junk the entire PR, it's AI generated."
Has anyone who's familiar with compiler source code tried to compare it to other compilers? Given that LLMs have been trained on data sets that include the source code for numerous C compilers, is this just (say) pcc extruded in Rust form?
> Must be great to work with people like him who have infinite patience and composure.
It is not just patience, he is ready to spend a shitload of time explaining the basics to strangers. Such an answer would, I believe, take at the very least half an hour to compose, not counting the time you need to read all the relevant discussion to get the context. But yeah, it would be great to have more people like him around.
Yes, that comment by gasche is a very good general explanation for why vibe coded slop still doesn't cut it for contributing to any non-trivial FLOSS project. When you're building towards a large feature (DWARF support in this case) it's critical for contributions to be small and self-contained so that maintainers and reviewers don't get overwhelmed. As things stand, this means that human effort is an absolute requirement.
When contributions are small and tightly human-controlled it's also less likely that potential legal concerns will arise, since it means that any genuinely creative decisions about the code are a lot easier to trace.
(In this case, the AI seems to have ripped off a lot of the work from OxCaml with inconsistent attribution. OxCaml is actually license-compatible (and friendly) with OCaml, but obviously any merge of that work should happen on its own terms, not as a side effect of ripoff slop code.)
If you haven't come across a significant number of AI addicts as obnoxiously delusional as @Culonavirus describes, you must be getting close to retirement age.
People with any connection to new college graduates understand that this sort of idiotic LLM-backed arrogance is extremely common among low-to-mid-functioning twenty-somethings.
however it's something entirely different for that output to be good and maintainable
People aren't prompting LLMs to write good, maintainable code though. They're assuming that because we've made a collective assumption that good, maintainable code is the goal, then it must also be the goal of an LLM too. That isn't true. LLMs don't care about our goals. They are solving problems in a probabilistic way based on the content of their training data, context, and prompting. Presumably if you take all the code in the world and throw it in a mixer, what comes out is not our Platonic ideal of the best possible code, but actually something more like a Lovecraftian horror that happens to get the right output. This is quite positive, because it shows that with better prompting+context+training we might actually be able to guide an LLM to know what good and bad look like (based on the fact that we know). The future is looking great.
However, we also need to be aware that 'good, maintainable code' is often not what we think is the ideal output of a developer. In businesses everywhere the goal is 'whatever works right now, and to hell with maintainability'. When a business is 3 months from failing spending time to write good code that you can continue to work on in 10 years feels like wasted effort. So really, for most code that's written, it doesn't actually need to be good or maintainable. It just needs to work. And if you look at the code that a lot of businesses are running, it doesn't. LLMs are a step forward in just getting stuff to work in the first place.
If we can move to 'bug free' using AI, at the unit level, then AI is useful. Above individual units of code, like logic, architecture, security, etc., things still have to come from the developer because AI can't have the context of a complete application yet. When that's ready, then we can tackle 'tech debt free', because almost all tech debt lives at that higher level. I don't think we'll get there for a long time.
>They are solving problems in a probabilistic way based on the content of their training data, context, and prompting.
>Presumably if you take all the code in the world and throw it in a mixer, what comes out is not our Platonic ideal of the best possible code, but actually something more like a Lovecraftian horror that happens to get the right output.
These statements have been inaccurate since 2022, when LLMs started to have post-training done.
> People aren't prompting LLMs to write good, maintainable code though.
Then they're not using the tools correctly. LLMs are capable of producing good clean code, but they need to be carefully instructed as to how.
I recently used Gemini to build my first Android app, and I have zero experience with Kotlin or most of the libraries (but I have done many years of enterprise Java in my career). When I started I first had a long discussion with the AI about how we should set up dependency injection, Material3 UI components, model-view architecture, Firebase, logging, etc and made a big Markdown file with a detailed architecture description. Then I let the agent mode implement the plan over several steps and with a lot of tweaking along the way. I've been quite happy with the result, the app works like a charm and the code is neatly structured and easy to jump into whenever I need to make changes. Finishing a project like this in a couple of dozen hours (especially being a complete newbie to the stack) simply would not have been possible 2-3 years ago.
> Then they're not using the tools correctly. LLMs are capable of producing good clean code, but they need to be carefully instructed as to how.
I'd argue that when the code is part of a press release or corporate blog post (is there even a difference?) by the company that the LLM in question comes from, e.g. Claude's C compiler, then one cannot reasonably assert they were "not using the tools correctly": even if there's some better way to use them, if even the LLM's own team doesn't know how to do that, the assumption should be that it's unreasonable to expect anyone else to know how to do that either.
I find it interesting and useful to know that the boundary of the possible is a ~100kloc project, and that even then this scale of output comes with plenty of flaws.
Know what the AI can't do, rather than what it can. Even beyond LLMs, people don't generally (there's exceptions) get paid for manually performing tasks that have already been fully automated, people get paid for what automation can't do.
Moving target, of course. This time last year, my attempt to get an AI to write a compiler for a joke language didn't even result in the source code for the compiler itself compiling; now it not only compiles, it runs. But my new language is a joke language, no sane person would ever use it for a serious project.
LLMs do not learn. So every new session for them will be rebuilding the world from scratch. Bloated Markdown files quickly exhaust context windows, and agents routinely ignore large parts of them.
And then you unleash them on one code base that's more than a couple of days old, and they happily duplicate code, ignore existing code paths, ignore existing conventions etc.
That's why I'm very careful about how the context is constructed. I make sure all the relevant files are loaded with the prompt, including the project file so it can see the directory structure. Also keep a brief summary of the app functionality and architecture in the AGENTS.md file. For larger tasks, always request a plan and look through it before asking it to start writing code.
Not trying to be rude, but in a technology you're not familiar with you might not be able to know what good code is, and even less so if it's maintainable.
Finding and fixing that subtle, hard to reproduce bug that could kill your business after 3 years.
That's a fair point, my code is likely to have some warts that an experienced Android/Kotlin dev would wince at. All I know is that the app has a structure that makes an overall sense to me, with my 15+ years of experience as a professional developer and working with many large codebases.
I think we are going to have to find out what maintenance even looks like when LLMs are involved. "Maintainable" might no longer mean quite the same thing as it used to.
But it's not going to be as easy as "just regenerate everything". There are dependencies external to a particular codebase such as long lived data and external APIs.
I also suspect that the stability of the codebase will still matter, maybe even more so than before. But the way in which we define maintainability will certainly change.
The framing is key here. Is three years a long time? Both answers are right. Just getting a business off the ground is an achievement in the first place. Lasting three years? These days, I have clothes that don't even last that long. And then again, three years isn't very long at all. Bridges last decades. Countries are counted in centuries. Humanity is millennia old. If AI can make me a company that's solvent for three years? Well, you decide.
That mirrors my experience so far. The AI is fantastic for prototyping, in languages/frameworks you might be totally unfamiliar with. You can make all sorts of cool little toy projects in a few hours, with just some minimal prompting.
The danger is, it doesn't quite scale up. The more complex the project, the more likely the AI is to get confused and start writing spaghetti code. It may even work for a while, but eventually the spaghetti piles up to the point that not even more spaghetti will fix it
I'll bet that's going to get better over the next few years, with better tooling and better ways to get the AI to figure out/remember relevant parts of the code base, but that's just my guess.
Not sure what exactly you're referring to, but legal is a very interesting field to observe, right? I've been wondering about that since quite early in my LLM awareness:
A slightly sarcastic (or perhaps not so slightly..) mental model of legal conflict resolution is that much of it boils down to throwing lots of content at the opposing side, claiming that it shows that the represented side is right, and creating a task for the opposite side to find a flaw in that material. I believe that this game of quantity runs through the whole range from "I'll have my lawyer repeat my argument in a letter featuring their letterhead" all the way to paper tsunamis like the Google-Oracle trial.
Now give both sides access to LLMs... I wonder if the legal profession will eventually settle on some format of in-person offline resolution with strict limits on recess and/or limits on word count for both documents and notes, because otherwise conflicts fail to get settled in anyone's lifetime (or get won by whoever does not run out of tokens first; come to think of it, the technogarchs would love this, so I guess this is exactly what will happen barring a revolution)
I just read that whole thread and I think the author made the mistake of submitting a 13k loc PR, but other than that - while he gets downvoted to hell on every comment - he's actually acting professionally and politely.
I wouldn't call this a fiasco; it reads to me more like being able to create huge amounts of code - whether the end result works well or not - breaks the traditional model of open source. Small contributions can be verified, and the merit-vs-maintenance-effort can at least be assessed somewhat more realistically.
I have no stake in the "vibe coding sucks" vs "vibe coding rocks" discussion, and I read that thread as an outsider. I cannot help but find the PR author's attitude absolutely okay, while the compiler folks are very defensive. I do agree with them that submitting a huge PR without prior discussion cannot be the way forward. But that's almost orthogonal to the question of whether AI-generated code is or is not of value.
If I were the author, I would probably take my 13k loc proof-of-concept implementation and chop it down into bite-size steps that are easy to digest, and try to get them integrated into the compiler successively, while being totally upfront about what the final goal is. You'd need to be ready to accept criticism and requests for change, but it should not be too hard to have your AI of choice incorporate these into your code base.
I think the main mistake of the author was not to use vibe coding, it was to dream up his own personal ideal of a huge feature, and then go ahead and single-handedly implement the whole thing without involving anyone from the actual compiler project. You cannot blame the maintainers for not being crazy about accepting such a huge blob.
Minimally, I don't find this an unusual tone in the slightest for cs threads. But then again, I'm old.
I'm also quite surprised that apparently you cannot utter what is clearly just a personal opinion -- not a claim of objective truth -- without getting downvoted. But then again, the semantics of votes are not well-defined.
At the same time, I'm quite grateful for the constructive comments further down below under my original post.
He is not polite, he is extremely rude. When it was pointed out that he copied so much code that the generated code included someone else's name in the license, his reply was https://github.com/ocaml/ocaml/pull/14369/changes/ce372a60bd...
I struggle to think how someone thinks this is polite. Is politeness to you just not using curse words?
Admittedly, his handling of this aspect was perhaps less than ideal, but I cannot see any impoliteness here whatsoever. As a matter of fact, I struggle to think how you could think otherwise.
But I am biased. After having lived a number of years in a country where I would say the average understanding of politeness is vastly different from where I've grown up, I've learned that there is just a difference of opinion of what is polite and what isn't. I have probably been affected by that too.
Ah, I see what you mean - you're making a distinction between someone's speech and someone's acts. Fair enough. In that sense, you would argue that the action of dropping a 13k loc PR is impolite, and I can see that.
It's just that in my reading, I did not find his demeanor in the comment thread to be impolite. He was trying to sell his contribution and I think that whatever he wrote was using respectful language.
He responds to a thoughtful and detailed 600-word comment from a maintainer with a dismissive "Here's the AI-written copyright analysis..." + thousands of words of slop.
The effort asymmetry is what's rude. The maintainers take their project very seriously (as they should) and are generous enough with their time to invite contribution from outsiders. Showing up and dropping 13k lines of code, posting comments copy+pasted from a chat window, and insisting that your contribution is trustworthy not because you thought it through but because you fed it through a few LLMs shows that you don't respect the maintainers' time. In other words: you are being rude. They would have to put in more upfront effort to review your contribution than you put in to create it! Then they have to maintain it in perpetuity.
Well, I wouldn't necessarily call it "going out of your way to be accommodating", but impolite is just not the word I'd choose to characterize it. I can see why others might but it's just my personal feeling that I don't think that this is the correct adjective here.
That said, I don't feel like this topic is important enough to go on about it; I've probably spent enough keystrokes on it already.
Interpreting his words on a literal basis, the PR submitter isn't being directly impolite...
If you will, place yourself in the shoes of the repository maintainer. A random person (with a personal agenda) has popped up trying to sell you a solution (that he doesn't understand) to a problem (that you don't see as problematic). After you've spent literal hours patiently explaining why the proposition is not acceptable, this random person still continues attempting to sell his solution.
Do you see any impoliteness in the reframed scenario?
I think there's nothing wrong with trying to sell your solution, and I'm skeptical about the "literal hours" that you claim.
The way I interpret this thread is that the PR poster had a certain itch and came up with a vibe-coded solution that helped him. Now he's trying to make that available for others too. The maintainers don't want it because it's too large a PR to review properly and because they don't want to have to maintain it afterwards.
I can totally see both positions.
I was just referring to the fact that - in my opinion - unlike others here, his writing did not appear impolite to me. But you know, that's just me. I thought that he was trying to sell his code, and it's not unusual to get rejected at first, so I can't blame him for trying to defend his contribution. All I'm saying is that I thought he did so in a respectful manner, but of course you could argue that the whole endeavor was already an act of impoliteness, in a way?!
That, and respectfulness and politeness come more from intentions/actions than from speech alone. Politeness of language without any respect for the actual function of that speech is pointless. Indeed, that is what the LLMs are trained for. Form over function. And many humans get fooled by it and are also clueless, like the person dropping the steaming turd of a PR.
That may or may not be the case - I really was just going off this one thread, and how I personally read it. I completely appreciate that others read it differently.
This to me sounds a lot like the SpaceX conversation:
- Ohh look it can [write small function / do a small rocket hop] but it can't [ write a compiler / get to orbit]!
- Ohh look it can [write a toy compiler / get to orbit] but it can't [compile linux / be reusable]
- Ohh look it can [compile linux / get reusable orbital rocket] but it can't [build a compiler that rivals GCC / turn the rockets around fast enough]
- <Denial despite the insane rate of progress>
There's no reason to keep building this compiler just to prove this point. But I bet it would catch up real fast to GCC with a fraction of the resources if it was guided by a few compiler engineers in the loop.
We're going to see a lot of disruption come from AI assisted development.
All these people that built GCC and evolved the language did not have the end result in their training set. They invented it. They extrapolated from earlier experiences and knowledge; LLMs only ever accidentally stumble into the space "between unknown manifolds" when the temperature is high enough, and they interpolate with noise (in so many senses). The people building GCC together did not only solve a technical problem. They solved a social one, agreeing on what they wanted to build, for what and why. LLMs are merely copying those decisions.
That's true and I fully agree. I don't think LLMs' progress in writing a toy C compiler diminishes the achievements that the GCC project did.
But also, we've just witnessed LLMs go from being a glorified line auto-complete tool to writing a C compiler in ~3 years. And I think that's something. And note how we keep moving the goalposts.
The pattern matching rote-student is acing the class. No surprises here.
There is no need to understand the subject from first principles to ace tests.
Majority of high-school and college kids know this.
This, I strongly suspect, is the crux of the boundaries of their current usefulness. Without accompanying legibility/visibility into the lineage of those decisions, LLMs will be unable to copy the reasoning behind the "why", missing out on a pile of context that I'm guessing is necessary (just like with people) to come up to speed on the decision flow going forward, as the mathematical space for the gradient descent to traverse gets both bigger and more complex.
We're already seeing glimmers of this as the frontier labs are reporting that explaining the "why" behind prompts is getting better results in a non-trivial number of cases.
I wonder whether we're barely scratching the surface of just how powerful natural language is.
All right, but perhaps they should also list the grand promises they made and failed to deliver on. They said they would have fully self-driving cars by 2016. They said they would land on Mars in 2018, yet almost a decade has passed since then. They said they would have Tesla's fully self-driving robo-taxis by 2020 and human-to-human telepathy via Neuralink brain implants by 2025–2027.
> - <Denial despite the insane rate of progress>
Sure, but it falls short of what was actually promised. There may also be fundamental limitations to what the current architecture of LLMs can achieve. The vast majority of LLMs are still based on Transformers, which were introduced almost a decade ago. If you look at the history of AI, it wouldn't be the first time that a roadblock stalled progress for decades.
> But I bet it would catch up real fast to GCC with a fraction of the resources if it was guided by a few compiler engineers in the loop.
Okay, so at that point, we would have proved that AI can replicate an existing software project using hundreds of thousands of dollars of computing power and probably millions of dollars in human labour costs from highly skilled domain experts.
Are we sure about that? I mean, we have seen that LLMs are able to generalize to some degree. So I don't see a reason why you couldn't put an agent in a loop with a profiler and have it try to optimize the code. Will it come up with entirely novel ideas? Unlikely. Could it potentially combine existing ideas in interesting, novel ways that would lead to CCC outperforming GCC? I think so. Will it get stuck along the way? Almost certainly.
Would you want it to? The further the goal posts are the more progress we are making, and that's good, no? Trying to make it into a religious debate between believers and non-believers is silly. Neither side can predict the future, and, even if they could, winning the debate is not worth anything!
What is interesting is what we can do with LLMs today and what we would like them to be able to do tomorrow, so we can keep developing them in a good direction. Whether or not you (or I) believe it can do that thing tomorrow is thoroughly uninteresting.
The goalpost is not moving. The issue is that AI generates code that kinda looks ok but usually has deep issues, especially the more complex the code is. And that's not really being improved.
There are two questions which can be asked for both. The first one is "can these technologies achieve their goals?", which is what you seem to be debating. The other question is "is a successful outcome of these technologies desirable at all?". One is making us pollute space faster than ever, as if we did not fuck up the rest enough. The other will make a few very rich people even richer and probably everyone else poorer.
The difference I see is that, after "get to orbit", the goalposts for SpaceX are things that have never been done before, whereas for LLMs the goalposts are all things that skilled humans have been able to do for decades.
AI assist in software engineering is unambiguously demonstrated to some degree at this point: the "no LLM output in my project" stance is cope.
But "reliable, durable, scalable outcomes in adversarial real-world scenarios" is not convincingly demonstrated in public, the asterisks are load bearing as GPT 5.2 Pro would say.
That game is still on, and AI assist beyond FIM is still premature for safety critical or generally outcome critical applications: i.e. you can do it if it doesn't have to work.
I've got a horse in this race which is formal methods as the methodology and AI assist as the thing that makes it economically viable. My stuff is north of demonstrated in the small and south of proven in the large, it's still a bet.
But I like the stock. The no free lunch thing here is that AI can turn specifications into code if the specification is already so precise that it is code.
The irreducible heavy lift is that someone has to prompt it, and if the input is vibes the output will be vibes. If the input is full rigor... you've just moved the cost around.
The modern software industry is an expensive exercise in "how do we capture all the value and redirect it from expert computer scientists to some arbitrary financier".
You can't. Not at less than the cost of the experts if the outcomes are non-negotiable.
And all these improvements past 1935 have been rendered irrelevant to the daily driver by safety regulations (I'll limit this claim to most of the continental US to avoid straying beyond my experience.)
You can be wrong on every step of your approximation and still be right in the aggregate. E.g. order of magnitude estimate, where every step is wrong but mistakes cancel out.
Human crews on Mars is just as far fetched as it ever was. Maybe even farther due to Starlink trying to achieve Kessler syndrome by 2050.
> This to me sounds a lot like the SpaceX conversation
The problem is that it is absolutely indiscernible from the Theranos conversation as well…
If Anthropic stopped telling lies about the current capability of their models (like “it compiles the Linux kernel” here, but it's far from the first time they've done that), maybe neutral people would give them the benefit of the doubt.
For one grifter who happened to succeed at delivering his grandiose promises (Elon), how many grifters will fail?
That’s been the trend for a while. Can you make a prediction that says something concretely like “AI will not be able to do X by 2028” for a specific and well defined X?
In 2030, an AI model that I can run on my computer, without having to trust an evil megacorporation, will not be able to write a compiler for my markup language [0] based on a corpus of examples, without seeing the original implementation, using no more than 1.5× as much code as I did.
No, I don't, but it sounds very similar to the naysayers that have silently moved the goalposts. That said, you're one of the few people in the wild who still claim LLMs are completely useless, so I'll give you that.
> Models have gotten much better than even the most optimistic predictions.
We were promised Roko's Basilisk by now, damnit! Where's my magical robot god?!
But seriously, predictions a couple years back for 2026/27 (by quite big players, like Altman) were for AGI or as good as.
I do not, for the record, claim that they are totally useless. They are useful where correctness of results does not matter, for instance low-stakes natural language translation and spam generation. There's _some_ argument that they are somewhat useful in cases where their output can be reviewed by an expert (code generation etc), though honestly quantitative evidence there is mixed at best; for all the "10x developer" claims, there's not much in the way of what you'd call hard evidence.
> This is still true - a compiler can never win this battle. All a human programmer has to do is take the output of the compiler and make a single optimisation, and he/she wins. This is the advantage that the human has - they can use any of a wide variety of tools at their disposal (including the compiler), whilst the compiler can only do what it was programmed. The best the compiler can hope for is a tie.
And not to mention that a C compiler is something we have literally 50 years worth of code for. I still seriously doubt the ability of LLMs to tackle truly new problems.
What do you classify as new? Every problem that we solve as developers is a very small deviation from already existing problems. Maybe that’s the point of llms?
How many developers do you think are solving truly novel problems? Most like me are CRUD bunnies.
If your problem is a very small deviation from an existing problem, you should be able to take an existing open-source solution and make a very small modification to adapt it to your use case. No need for “vibe-coding” a lower-quality implementation from scratch.
Yeah, it kind of strikes me how a lot of the LLM use cases would actually be better served by existing techniques, like more/better libraries. And if that's not possible, it'd be way better to find the closest match, fork it, and make minimal modifications. At least then you have the benefit of an upstream.
But, sort of like cryptocurrency, the LLM people aren't so much trying to solve actual problems as trying to find an application for their existing technology. Sort of like the proverbial saying: when you're selling hammers, you want to convince everyone that their problem is a nail.
As a pro, my argument is "it's good enough now to make me incredibly productive, and it's only going to keep getting better because of advancements in compute".
I'd rather get really good at leveraging AI now than to bury my head in the sand hoping this will go away.
I happen to agree with the saying that AI isn't going to replace people, but people using AI will replace people who don't. So by the time you come back in the future, you might have been replaced already.
It sure is possible that one person using AI effectively may replace 10 people like me. It is just as likely that I may replace 10 people who only use AI.
> I'd rather get really good at leveraging AI now than to bury my head in the sand hoping this will go away.
I don't think those are the only two options, though.
Further, "Getting really good at leveraging AI" is very different to "Getting really good at prompting LLMs".
One is a skill that might not even result in the AI providing any code. The other is a "skill" in much the same way as winning hotdog eating contests is a "skill".
In the latter, even the least-technical user can replace you once they get even halfway decent at min-maxing their agent's input (md files, although I expect we'll switch away from that soon enough to a cohesive and structured UI).
In the former, you had better find some really difficult problems that pay when you solve them.
Either way, I predict a lot of pain and anguish in the near future, for a lot of people. Especially those who expect that prompting skills are an actual "skill".
Why would anything you learn today be relevant tomorrow if AI keeps advancing? You would need less and less of all your tooling, markdown files and other rituals and just let the AI figure it out altogether.
So I can keep my job now so I can pay for compute in the future when I'm out of a job. The compute will be used to create my own business to make money.
What makes you think you’ll be able to out-compete the purely-AI-led businesses with your business? What skills will give you an edge in the business that won’t also give you an edge in the job?
> What makes you think you’ll be able to out-compete the purely-AI-led businesses with your business? What skills will give you an edge in the business that won’t also give you an edge in the job?
Or why do you think your small AI-driven business can survive against richer people who can pay for more compute and thus do better than you?
>> Or why do you think your small AI-driven business can survive against richer people who can pay for more compute and thus do better than you?
> I don't know. Maybe because of my creativity?
How would you keep that up? I think there's a false belief, especially common in business-inflected spaces (which includes the tech sector), that skills and abilities can be endlessly specialized (e.g. the MBA claim that they're experts at running businesses, any business). The more you outsource to AI the less creative you'll become, because you'll lose touch with the work.
You can only really be creative in spaces where you regularly get your hands dirty. Lose that, and I think your "creativity" will become the equivalent of the MBA offering the same cookie-cutter financial engineering ideas to every business (layoffs, buy back stock, FOMO the latest faddish ideas, etc).
Maybe I can't. But that's also why I'm invested into AI companies right now.
Plan:
1. Keep my job for as long as I can by leveraging current AI tools to the best of my ability while my non-AI user colleagues lose earnings power or their job
2. Invest my money into AI companies and/or S&P500
3. If I'm truly rendered useless, live off of the investment
I believe that I’ll keep enough of an edge in my job that I’ll continue to be employed. (At present I still have zero pressure to use AI, though I do use it.) It’s of course possible for that to turn out to be wrong, but in that case I also see no chance to start a business, and society will be in a lot of trouble.
Those are two different things though, and not everyone is stuck at a place enforcing token usage. And why would anyone pay you for something if all it takes is compute to make it? They would just make it themselves.
> this is all true, and yes, this is interesting. But there are so many other questions around this tech. Let's not rush into it and mess everything up.
That's a really nice fictitious conversation, but in my experience "anti-AI" people would be prone to say "This is stupid, LLMs will never be able to write complex code, and attempting to do so is futile". If your mind is open to exploring how LLMs will actually write complex software, then by definition you are not "anti".
I think you also forgot: Anti: But the whole thing can only have been generated because GCC and other compilers already exist (and depending on how strong the anti-feeling is: and have been stolen…)!
I don't think this is how pro and anti conversation goes.
I think the pro would tell you that if GCC developers could leverage Opus 4.6, they'd be more productive.
The anti would tell you that it doesn't help with productivity, it makes us less versed in the code base.
I think the CCC project was just a demonstration of what Opus can do now autonomously. 99.9% of software projects out there aren't building something as complex as a compiler for the Linux kernel.
As someone who leans pro in this debate, I don't think I would make that statement. I would say the results are exactly as we expect.
Also, a highly verifiable task like this is well suited to LLMs, and I expect within the next ~2 years AI tools will produce a better compiler than gcc.
It can feed into itself and improve. The idea that self-training necessarily causes deterioration is fanfic. Remember that they spend massive amounts of compute on RL.
No, they will point out that the way to make GCC better is not really in the code itself. It's in scientific paper writing and new approaches. Implementation is really not the most work.
Yes, we will certainly go that way, probably code already added to gcc has been developed through collaborative AI tools. Agree we don't call that "produced by AI".
I think compilers though are a rare case where large scale automated verification is possible. My guess is that starting from gcc, and all existing documentation on compilers, etc. and putting ridiculous amounts of compute into this problem will yield a compiler that significantly improves benchmarks.
> Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!
> Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
Also, from the Anti-LLM perspective: did the coding agent actually build a working compiler, or just plagiarize prior art? C compilers are certainly part of the LLM's training set.
That's relevant because the implication seems to be: "Look, the agent can successfully develop really advanced software!" when the reality may be that it can plagiarize existing advanced software, and will fall on its face if asked to do anything not already done before.
A lot of propaganda and hype follows the pattern of presenting things in a way that creates misleading implications in the mind of the listener that the facts don't actually support.
It seems that the cause of the difference in opinion is that the anti camp is looking at the current state while the pro camp is looking at the slope and projecting it into the future.
This is spot on, you can find traces of this conversation in the original thread posted on HN as well, where people are proclaiming "yeah it doesn't work, but still impressive!"
Reminds me so much of the people posting their problems about the tesla cybertruck and ending the post with "still love the truck though"
Pretty much. It's missing a tiny detail though. One side is demanding we keep giving hundreds of billions to them and at the same time promising the other side's unemployment.
And no-one ever stops and thinks about what it means to give up so much control.
Maybe one of those companies will come out on top. The others produce garbage in comparison. Capital loves a single throat to choke and doesn't gently pluralise. So of course you buy the best service. And it really can generate any code, get it working, bug free. People unlearn coding at this level. And some day, poof, Microsoft comes around and finds it has a tiny problem: that service can generate a working Office clone. Or whatever, it's just an example.
This technology will never be used to set anyone free. Never.
The entity that owns the generator owns the effective means of production, even if everyone else can type prompts.
The same technology could, in a different political and economic universe, widen human autonomy. But that universe would need strong commons, enforced interoperability, and a cultural refusal to outsource understanding.
And why is this different from abstractions that came before? There are people out there understanding what compilers are doing. They understand the model from top to bottom. Tools like compilers extended human agency while preserving a path to mastery. AI code generation offers capability while dissolving the ladder behind you.
We are not merely abstracting labor. We are abstracting comprehension itself. And once comprehension becomes optional, it rapidly becomes rare. Once it becomes rare, it becomes political. And once it becomes political, it will not be distributed generously.
Nah bro, it makes them productive. Get with the program. Amazing. Fantastic. Of course it resonates with idiots, because they can't think beyond the vicinity of their own greed. We are doomed, no one gives two cents. Idiocracy is here and it's not Costco.
What an amazing tech. And look, the CEOs are promising us a good future! Maybe we can cool the datacenters with Brawndo. Let me ask chat if that is a good idea.
You could make the same argument in "information superhighway" days, but it turned out to be the opposite: no company monopolised internet services, despite trying hard.
With so many companies in the AI race it is already a pretty competitive landscape, and it doesn't seem likely to me that any of them can build a deep enough moat to come out ahead.
A few? All sorts of websites and services are thriving on the Internet even after the significant consolidation of attention that social media caused. Not even close to the dystopian picture the parent comment paints.
I don't feel that I see this anywhere but if so, I guess I'm in a third camp.
I am "pro" in the sense that I believe that LLM's are making traditional programming obsolete. In fact there isn't any doubt in my mind.
However, I am "anti" in the sense that I am not excited or happy about it at all! And I certainly don't encourage anyone to throw money at accelerating that process.
> I believe that LLMs are making traditional programming obsolete. In fact there isn't any doubt in my mind.
Is this what AI psychosis looks like? How can anyone that is a half decent programmer actually believe that English + non-deterministic code generator will replace "traditional" programming?
4GLs are productive yes, but also limited, and still require someone to come up with specs that are both informed by (business) realities and engineering considerations.
But this is also an arena where bosses expect magic to happen when people are involved; just pronounce a new strategy, and your business magically transforms - without any of that pesky 'figuring out what to do' or 'aligning stakeholders' or 'wondering what drugs the c-suite is doing'. Let LLMs write the specs!
> One side is demanding we keep giving hundreds of billions to them and at the same time promising the other side's unemployment.
That's a valid take. The problem is that there are, at this time, so many valid takes that it's hard to determine which are more valid/accurate than the other.
FWIW, I think this is more insightful than most of the takes I've seen, which basically amount to "side-1: we're moving to a higher level of abstraction" and "side-2: it's not higher abstraction, just less deterministic codegen".
I'm on the "higher level of abstraction" side, but that seems to be very much at odds with however Anthropic is defining it. Abstraction is supposed to give you better high-level clarity at the expense of low-level detail. These $20,000 burning, Gas Town-style orchestration matrices do anything but simplify high level concerns. In fact, they seem committed building extremely complex, low-level harnesses of testing and validation and looping cycles around agents upon agents to avoid actually trying to deal with whatever specific problem they are trying to solve.
How do you solve a problem you refuse to define explicitly? We end up with these Goodhart's Law solutions: they hit all of the required goals and declare victory, but completely fail in every reasonable metric that matters. Which I guess is an approach you make when you are selling agents by the token, but I don't see why anyone else is enamored with this approach.
> Pro: but it's only been a few years since we started using LLMs, and a year or so since agents. This is only the beginning!
The billion dollar question is, can we get from 80% to 100%? Is this going to be a situation where that final gap is just insurmountable, or will the capabilities simply keep increasing?
> Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!
> Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
Pro-LLM: Read the freaking article, it's not that long. The compiler made a mistake in an area where only two compilers are up to the task: compiling the Linux kernel.
Anthropic said they vibe-coded a C compiler that could compile the Linux kernel. That's what they said. No-one forced them to say that. They could have picked another code base.
It turns out that isn't true in all instances, as this article demonstrates. I'm not nearly expert enough to be able to decide if that error was simple, stupid, irrelevant, or whatever. I can make a call on whether it successfully compiled the Linux kernel: it did not.
I'm sorry for being excessively edgy, but "it's useless" is not a good summary for "linking errors after successfully compiling Linux kernel for x86_64."
Me: Top 0.02%[1] human-level intelligence? Sure. But we aren't there yet.
[1] There are around 8k programming languages that are used (or were used) in practice (that is, they were deemed better than existing ones in some aspects) and there are around 50 million programmers; 8,000 / 50,000,000 ≈ 0.016%, so roughly the top 0.02%. I use this to estimate how many people have done something that is objectively better than existing products.
The freaking article omits several issues in the "compiler". My bet is that's because they didn't actually challenge the output of the LLM, as usually happens.
If you go to the repository, you'll find fun things, like the fact that it cannot compile a bunch of popular projects, and that it compiles others but the code doesn't pass the tests. It's a bit surprising, especially when they don't explain why those failures exist (are they missing support for some extensions? any feature they lack?)
It gets less surprising, though, when you start to see that the compiler doesn't actually do any type checking, for example. It allows dereferences to non-pointers. It allows calling functions with the wrong number of arguments.
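To make that concrete, here is a tiny illustrative program (my own sketch, not something taken from the CCC repository) containing exactly the kinds of constraint violations I mean; GCC and Clang both reject it with hard errors:

    int add(int a, int b) { return a + b; }

    int main(void) {
        int x = 1;
        int y = *x;     /* dereferencing a non-pointer: constraint violation */
        (void)y;
        return add(x);  /* too few arguments for a prototyped function: constraint violation */
    }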
There's also this fantastic part of the article where they explain that the LLM got the code to a point where any change or bug fix breaks a lot of the existing tests, and that further progress is not possible.
Then the fact that this article points out that the kernel doesn't actually link. How did they "boot it"? It might very well be possible that it crashed soon after boot and wasn't actually usable.
So, as usual, the problem here is that a lot of people look at LLM outputs and trust what they're saying they achieved.
The purpose of this project is not to create a state-of-the-art C compiler on par with projects that represent tens of thousands of developer-years. The goal is to assess the current capabilities of a largely autonomous software-building pipeline: it's not yet limitless, but better than it was. What a shocker.
I’ve had my share of build errors while compiling the Linux kernel for custom targets, so I wouldn’t be so sure that linker errors on x86_64 can’t be fixed with changes to the build script.
> The goal is to assess the current capabilities of a largely autonomous software-building pipeline: it's not yet limitless, but better than it was. What a shocker.
Of course, but we're trying to assess the capabilities by looking at the LLM output as if it were a program written by a person. If someone told me to check out their new C compiler that can build the kernel, I'd assume that other basic things, such as not compiling incorrect programs, are already pretty much covered. But with an LLM we can't assume that. We need to really check what's happening and not trust the agent's word for it.
And the reason why it's important is that we really need to check whether it's actually "better than it was" or just "doing things incorrectly for longer". Let's say your goal was writing a gcc replacement. Does this autonomous pipeline get you closer? Or does it just take you farther away down the wrong path? Considering that it's full of bugs and incomplete implementations and cannot be changed without things breaking down, I'd say it seems to be the latter.
That's such a strawman conversation. Starting from:
> it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
It works. It's not perfect, but Anthropic claims to have successfully compiled and booted 3 different configurations with it. The blog post failed to reproduce one specific version on one specific architecture. I wish Anthropic gave us more information about which kernel commits succeeded, but still. Compare this to the years it took for clang to compile the kernel, yet people were not calling that compiler useless.
If anyone thinks other compilers "just work", I invite them to start fixing packages that fail to build in nixos after every major compiler change, to get a dose of real world experience.
I think LLMs as a technology are very cool and I’m frankly amazed at what they can do. What I’m ‘anti’ about is pushing the entire economy all in on LLM tech. The accelerationist take of ‘just keep going as fast as possible and it will work out, trust me bro’ is the most unhinged, dangerous shit I’ve ever heard, and unfortunately it seems to be the default worldview of those in charge of the money. I’m not sure where all the AI tools will end up, but I am willing to bet big that the average person is not going to be better off 10 years from now. The direction the world is going scares the shit out of me, and the usage of AI by bad actors is not helping assuage that fear.
Honestly? I think if we as a society could trust our leaders (government and industry) to not be total dirtbags the resistance to AI would be much lower.
Like imagine if the message was “hey, this will lead to unemployment, but we are going to make sure people can still feed their families during the transition, and maybe look into ways to subsidize retraining programs for people whose jobs have been impacted.” Seems like a much more palatable narrative than “fuck you pleb! go retrain as a plumber or die in a ditch. I’ll be on my private island counting the money I made from destroying your livelihood.”
What does this imagined conversation have to do with the linked article? The “pro” and “anti” character both sound like the kind of insufferable idiots I’d expect to encounter on social media, the OP is a very nice blog post about performance testing and finding out what compilers do, doesn’t attempt any unwarranted speculation about what agents “struggle with” or will do “next generation”, how is it an example of that sort of shitposting?
First, remember when we had LLMs run optimisation passes last year? Alphaevolve doing square packing, and optimising ML kernels? The "anti" crowd was like "well, of course it can automatically optimise some code, that's easy". And things like "wake me up when it does hard tasks". Now, suddenly when they do hard tasks, we're back at "haha, but it's unoptimised and slow, laaame".
Second, if you could take 100 juniors, 100 mid level devs and 100 senior devs, lock them in a room for 2 weeks, how many working solutions that could boot up linux in 2 different arches, and almost boot in the third arch would you get? And could you have the same devs now do it in zig?
The thing that keeps coming up is that the "anti" crowd is fighting their own demons, and have kinda lost the plot along the way. Every "debate" is about promises, CEOs, billions, and so on. Meanwhile, at every step of the way these things become better and better. And incredibly useful in the right hands. I find it's best to just ignore the identity folks, and keep on being amazed at the progress. The haters will just find the next goalpost and the next fight with invisible entities. To paraphrase: those who can, do; those who can't, find things to nitpick.
You're heavily implying that because it can do this task, it can do any task at this difficulty or lower. Wrong. This thing isn't a human at the level of writing a compiler, and shouldn't be compared to one
Codex frustratingly failed at refactoring my tests for me the other day, despite me trying many, many prompts of increasing specificity. A task a junior could've done
Am I saying "haha it couldn't do a junior level task so therefor anything harder is out of reach?" No, of course not. Again, it's not a human. The comparison is irrelevant
Calculators are superhuman at arithmetic. Not much else, though. I predict this will be superhuman at some tasks (already is) and we'll be better at others
Adopt this half baked, half broken, insanely expensive, planet destroying, IP infringing tech, you have no choice.
Burn everything, because if you don’t, you will get left behind and, maybe, just maybe, in 2 years when it’s good enough, maybe… after hoovering up all the money, IP and domain expertise for free, and you’ve burnt all your money & sanity prompting and cajoling it to a semi working solution for a problem you didn’t really have in the first place, it will dump you at the back of the unemployment line. All hail the AI! Crazy times.
In the meantime please enjoy targeted scams, ever increasing energy prices, AI content farms, hardware shortages, and endless, endless slop.
When humans architect anything - ideas, buildings, software or ice cream sundaes - we make so many little decisions that affect the overall outcome, we don’t even know or think about it! Too many sprinkles and sauce and it will be too sweet and hard to eat. We make those decisions based on both experience and imagination. Watch a small child making one to see the perfect human intersection of these two things at play. The LLM totally lacks the imagination part, except in the worst possible ways. Its experience includes all sorts of random internet garbage that can sound highly convincing even to domain experts. Now its training set is being further expanded with endless mountains of more highly impressive-sounding garbage.
It was obvious to me with the first image gen models how incredibly impressive it was to see an image gradually forming from the computer based on nothing but my brief text input but also how painfully limited the technology would always be. After days and days of early obsessive image generation, I was no better as an artist than when I began! Everything also kind of looked the same as well?
As incredible as it was, it was nothing more than a massively complicated, highly advanced parlour trick. A futuristic, highly powerful pattern generator. Nothing has changed my mind at all. All that’s happened is we’ve seen the worst tricksters, shysters and con artists jump on a very dangerous bandwagon to hell and try and whip us less compliant souls onboard.
Lots of things follow patterns, the joy in life, for me, is discovering the patterns, exploring them and developing new unique and interesting patterns.
I’ve yet to encounter a bandwagon worth joining anyway; maybe this will be the one that leaves me behind and I’ll be forced to retire on cartoon gorilla NFTs and tulip farming?
First off, AlphaEvolve isn't an LLM. No more than a human is a kidney.
Second, it depends. If you told them to pretrain for writing a C compiler for however long it takes, I could see a smaller team doing it in a week or two. Keep in mind LLMs pretrain on all OSS, including GCC.
> Meanwhile, at every step of the way these things become better and better.
Will they? Or do they just ingest more data and compute?[1] Again, time will tell. But to me this seems more like speed-running into an Idiocracy scenario than a revolution.[2]
I think this will turn out to be another driverless-car situation, where the last 1% needs 99% of the time. And while it might happen eventually, it's going to take an extremely long time.
[1] Because we don't have much more computing jumps left, nor will future data be as clean as now.
[2] Why idiocracy?
Because they are polluting their own corpus of data. And by replacing thinking about computers, there will be no one to really stop them.
We'll equalize the human and computer knowledge by making humans less knowledgeable rather than more.
So you end up in an Idiocracy-like scenario where a doctor can't diagnose you, nor can the machine because it was dumbed down by each successive generation, until it resembles a child's toy.
> AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization. AlphaEvolve pairs the creative problem-solving capabilities of our Gemini models with automated evaluators that verify answers, and uses an evolutionary framework to improve upon the most promising ideas.
> AlphaEvolve leverages an ensemble of state-of-the-art large language models: our fastest and most efficient model, Gemini Flash, maximizes the breadth of ideas explored, while our most powerful model, Gemini Pro, provides critical depth with insightful suggestions. Together, these models propose computer programs that implement algorithmic solutions as code.
It’s more like a concept car vs a production line model. The capabilities it has were fine tuned for a specific scenario and are not yet available to the general public.
I have no idea what you're arguing. Alphaevolve is similar to claude code. They are using LLMs in a harness. No idea what you mean with fish, kidneys and so on. Can you please stick to the technical stuff? Otherwise it's just noise.
First off, let's say I'm wrong about Alpha Evolve. Fine, I made two more points; address them as well; that's just normal manners in a conversation.
Second, I question your idea of what Alpha Evolve is. You seem to think it's an LLM or LLM-adjacent when it's more like an evolutionary algo picking a better seed among the LLMs. That's not an LLM, if anything, it has some ability to correct itself.
The "Anti" stance is only tenable now if you believe LLMs are going to hit a major roadblock in the next few months around which Big AI won't be able to navigate. Something akin to the various "ghosts in the machine" that started bedeviling EEs after 2000 when transistors got sufficiently small, including gate leakage and sub-threshold current, such that Dennard Scaling came to an abrupt end and clock speeds stalled.
I personally hope that that happens, but I doubt it will. Note also that processors still continued to improve even without Dennard Scaling due to denser, better optimized onboard caches, better branch prediction, and more parallelism (including at the instruction level), and the broader trend towards SoCs and away from PCB-based systems, among other things. So at least by analogy, it's not impossible that even with that conjectured roadblock, Big AI could still find room for improvement, just at a much slower rate.
But current LLMs are thoroughly compelling, and even just continued incremental improvements will prove massively disruptive to society.
I'm firmly in the anti/unimpressed camp so far - but of course open to see where it goes.
I mean, this compiler is the equivalent of handing someone a calculator when it was first invented and seeing that it took 2 hours to multiply two numbers together. I would go "cool that you have a machine that can do math, but I can multiply faster by hand, so it's a useless device to me".
I mean - who would honestly expect an LLM to be able to compete with a compiler with 40 years of development behind it? Even more if you count the collective man years expended in that time. The Claude agents took two weeks to produce a substandard compiler, under the fairly tight direction of a human who understood the problem space.
At the same time - you could direct Claude to review the register spilling code and the linker code of both LLVM/gcc for potential improvements to CCC and you will see improvements. You can ask it not to copy GPL code verbatim but to paraphrase and tell it it can rip code from LLVM as long as the licenses are preserved. It will do it.
You might only see marginal improvements without spending another $100K on API calls. This is about one of the hardest projects you could ask it to bite off and chew on. And would you trust the compiler output yet over GCC or LLVM?
Of course not.
But I wager, that if you _started_ with the LLVM/gcc codebases and asked it to look for improvements - it might be surprising to see what it finds.
Both sides have good arguments. But this could be a totally different ball game in 2, 5 and 10 years. I do feel like those who are most terrified by it are those whose identity is very much tied to being a programmer and who see the potential for their role to be replaced, and I can understand that.
Me personally - I'm relieved I finally have someone else to blame and shout at rather than myself for the bugs in the software I produce. I'm relieved that I can focus now on the more creative direction and design of my personal projects (and even some work projects on the non-critical paths) and not get bogged down in my own perfectionism with respect to every little component until reaching exhaustion and giving up.
And I'm fascinated by the creativity of some of the projects I see that are taking the same mindset and approach.
I was depressed by it at first. But as I've experimented more and more, I've come to enjoy seeing things that I couldn't ever have achieved even with 100 man years of my own come to fruition.
In my experience, it is often the other way around. Enthusiasts are tasked with trying to open minds that seem very closed on the subject. Most serious users of these tools recognize the shortcomings and also can make well-educated guesses on the short term future. It's the anti crowd who get hellbent on this ridiculously unfounded "robots are just parrots and can't ever replace real programmers" shtick.
Maybe if AI evangelists would stop lying about what AI can do then people would hate it less.
But lying and hype is baked into the DNA of AI booster culture. At this point it can be safely assumed anything short of right-here-right-now proof is pure unfettered horseshit when coming from anyone and everyone promoting the value of AI.
You're right! Sometimes even the right-here-right-now claims of AI capabilities are horseshit too, with people in actuality remotely controlling the product.
It's not common for present capabilities to be lied about too. But it does happen!
And the smallest presence are the users who don't work in the AI industry but rave about AI. I know of... two... people who fit that bill. A lead developer at a cybersecurity firm and someone who works heavily in statistics and data analytics. Both are very senior people in their fields who can articulate exactly what they're looking for without much left to interpretation.
Something that bothers me here is that Anthropic claimed in their blog post that the Linux kernel could boot on x86 - is this not actually true then? They just made that part up?
It seemed pretty unambiguous to me from the blog post that they were saying the kernel could boot on all three architectures, but clearly that's not true unless they did some serious hand-waving with kernel config options. Looking closer in the repo, they only show a claimed Linux boot for RISC-V, so...
It's really cool to see how slow unoptimised C is. You get so used to seeing C easily beat any other language in performance that you assume it's really just intrinsic to the language. The benchmark shows CCC's SQLite3 build 12x slower than an unoptimised (GCC) build, and 20x slower than an optimised one. That's enormous!
I'm not dissing CCC here, rather I'm impressed with how much speed is squeezed out by GCC out of what is assumed to be already an intrinsically fast language.
The speed of C is still largely intrinsic to the language.
The primitives are directly related to the actual silicon. A function call is actually going to turn into a call instruction (or get inlined). The order of bytes in your struct is how they exist in memory, etc. A pointer being dereferenced is a load/store.
The converse holds as well. Interpreted languages are slow because this association with the hardware doesn't hold.
When you have a poopy compiler that does lots of register shuffling, you lose this association.
Specifically, the constant spilling in those specific functions that caused the 1000x slowdown makes the C code look a lot more like Python code (where every variable is several dereferences away).
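To make the spilling point concrete, here's a toy illustration (hypothetical, not taken from CCC's actual output); the comments sketch how a naive code generator that keeps every value in a stack slot turns a tight loop into a string of memory round-trips.

/* Toy example of what register spilling costs in an inner loop. */
long sum(const long *a, long n) {
    long s = 0;
    for (long i = 0; i < n; i++) {
        /* A decent compiler keeps s, i, a and n in registers, so the
           loop body is roughly one load plus one add.
           A naive code generator that spills everything instead emits:
           load i from its stack slot, load a, compute the address,
           load a[i], load s, add, store s back, load i again,
           increment, store i back... Every variable access becomes a
           memory access, much like an interpreter chasing pointers
           for every operation. */
        s += a[i];
    }
    return s;
}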
Right - maybe we're saying the same thing. C is naturally amenable to being blazing fast, but if you compile it without trying to be efficient (not trying to be inefficient, just doing the simplest, naive thing) it's still slow, by 1-1.5 orders of magnitude.
I mean you can always make things slower. There are lots of non-optimizing or low optimizing compilers that are _MUCH_ faster than this. TCC is probably the most famous example, but hardly the only alternative C compiler with performance somewhere between -O1 and -O2 in GCC. By comparison as I understand it, CCC has performance worse than -O0 which is honestly a bit surprising to me, since -O0 should not be a hard to achieve target. As I understand it, at -O0 C is basically just macro expanding into assembly with a bit of order of operations thrown in. I don't believe it even does register allocation.
> Where CCC Succeeds
> Correctness: Compiled every C file in the kernel (0 errors)
I don't think that follows. It's entirely possible that the compiler produced garbage assembly for a bunch of the kernel code that would make it totally not work even if it did link. (The SQLite code passing its self tests doesn't convince me otherwise, because the Linux kernel uses way more advanced/low-level/uncommon features than SQLite does.)
I agree. Lack of errors is not an indicator of correct compilation. Piping something to /dev/null won't provide any errors either & so there is nothing we can conclude from it. The fact that it compiles SQLite correctly does provide some evidence that their compiler at least implements enough of the C semantics involved in SQLite.
Yeah I saw a post on LinkedIn (can't find it again sorry) where they found that CCC compiles C by mostly just ignoring errors. `const` is a nop. It doesn't care if you redefine variables with different types, use a string where an `int` is expected, etc.
Whenever I've done optimisation (e.g. genetic algorithms / simulated annealing) before, you always have to be super careful about your objective function, because the optimisation will always come up with some sneaky, lazy way to satisfy it that you didn't think of. I guess this is similar: their objective was to compile valid C code and pass some tests. They totally forgot about not compiling invalid code.
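For instance, this is the kind of invalid C being described (a hypothetical snippet, not from the LinkedIn post); a conforming compiler rejects all three marked lines, while a compiler that treats `const` as a no-op and ignores redefinition and type rules will happily accept them:

const int limit = 10;

void broken(void) {
    limit = 20;        /* error: assignment to a const-qualified object */
    int x = "hello";   /* constraint violation: initialising an int with a char pointer */
    double x = 3.0;    /* error: x redeclared in the same scope with a different type */
}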
"Ironically, among the four stages, the compiler (translation to assembly) is the most approachable one for an AI to build. It is mostly about pattern matching and rule application: take C constructs and map them to assembly patterns.
The assembler is harder than it looks. It needs to know the exact binary encoding of every instruction for the target architecture. x86-64 alone has thousands of instruction variants with complex encoding rules (REX prefixes, ModR/M bytes, SIB bytes, displacement sizes). Getting even one bit wrong means the CPU will do something completely unexpected.
The linker is arguably the hardest. It has to handle relocations, symbol resolution across multiple object files, different section types, position-independent code, thread-local storage, dynamic linking and format-specific details of ELF binaries. The Linux kernel linker script alone is hundreds of lines of layout directives that the linker must get exactly right."
I worked on compilers, assemblers and linkers and this is almost exactly backwards
Exactly this. The linker is threading given blocks together with fixups for position-independent code - this can be called rule application. The assembler is pattern matching.
This explanation confused me too:
Each individual iteration: around 4x slower (register spilling)
Cache pressure: around 2-3x additional penalty (instructions do not fit in L1/L2 cache)
Combined over a billion iterations: 158,000x total slowdown
If each iteration is X percent slower, then a billion iterations will also be X percent slower. I wonder what is actually going on.
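Spelling out the arithmetic (a trivial check, assuming only a per-iteration penalty): if one iteration takes time t with GCC and k·t with CCC, then

\[ \frac{n \cdot (k\,t)}{n \cdot t} = k \]

so the ratio is independent of n, and a per-iteration factor of roughly 4 x (2-3), i.e. 8-12x, cannot by itself produce a 158,000x total slowdown.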
Claude one-shot a basic x86 assembler + linker for me. Missing lots of instructions, yes, but that is a matter of filling in tables of data mechanically.
Supporting linker scripts is marginally harder, but having manually written compilers before, my experience is the exact opposite of yours.
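To illustrate the "tables of data" point, here's a tiny hypothetical sketch covering only the register-to-register forms of a few 64-bit instructions; real assemblers have thousands of rows, but each one is mechanical:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical encoding table for a toy x86-64 assembler. */
struct insn {
    const char *mnemonic;
    uint8_t     rex_w;   /* 1 if the instruction needs a REX.W prefix */
    uint8_t     opcode;  /* primary opcode byte */
};

static const struct insn table[] = {
    { "add", 1, 0x01 },  /* ADD r/m64, r64 */
    { "sub", 1, 0x29 },  /* SUB r/m64, r64 */
    { "mov", 1, 0x89 },  /* MOV r/m64, r64 */
};

/* Encode "op dst, src" for register-to-register operands only. */
static size_t encode(const struct insn *i, int dst, int src, uint8_t *out) {
    size_t n = 0;
    if (i->rex_w)
        out[n++] = 0x48 | ((src >> 3) << 2) | (dst >> 3); /* REX.W, plus R/B bits for r8-r15 */
    out[n++] = i->opcode;
    out[n++] = 0xC0 | ((src & 7) << 3) | (dst & 7);       /* ModR/M with mod=11 */
    return n;
}
/* Example: encode(&table[0], 0, 3, buf) emits 48 01 D8, i.e. "add rax, rbx"
   (dst = rax = 0, src = rbx = 3). */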
As a neutral observation: it’s remarkable how quickly we as humans adjust expectations.
Imagine five years ago saying that you could have a general purpose AI write a c compiler that can handle the Linux kernel, by itself, from scratch for $20k by writing a simple English prompt.
That would have been completely unbelievable! Absurd! No one would take it seriously.
An expert-level programmer couldn't produce an equivalent original piece of work without all the context. By that I mean all the shared insights, discussion and design that happened when making the compiler.
So to do this without any of that context is likely just very elaborate copy pasta.
Sure then make your prediction? It’s always easy to hand wave and dismiss other people’s predictions. But make yours: what do you think llms can do in 2 years?
You're asking me to do the thing I just said was frustrating, haha. I have no idea. It's a new technology and we have nothing to draw from to make predictions. But for the sake of fun...
New code generation / modification: I think we're hitting a point of diminishing returns, and they're not going to improve much here.
The limitation is fundamentally that they can only be as good as the detail in the specs given, or the test harnesses provided to them. Any detail left out they're going to make up, and hopefully it's what you want (often it's not!). If you make the specs detailed enough so that there's no misunderstanding possible, you've just written code, which is what we already do today.
Code optimization: I think they'll get quite a bit better. If you give them GCC, it's probable they'll be able to improve upon it.
> If you make the specs detailed enough so that there's no misunderstanding possible, you've just written code, which is what we already do today
This was my opinion for a very long time. Having built a few applications from scratch using AI, though, nowadays I think: sometimes not everything needs to be spelled out. Like in math papers, some details can be left to the ~~reader~~ LLM and it'll be fine.
I mean, in many cases it doesn't really matter what exactly the code looks like, as long as it ends up doing the right thing. For a given Turing machine, the equivalence class of equivalent implementations is infinite. If a short spec written in English leads the LLM to identify the correct equivalence class, that's all we need and, in fact, a very impressive compression result.
Because of the unspecified behaviour, you're always going to need someone technical who understands the output to verify it. Tests aren't enough.
I'm not even sure if this is a net productivity benefit. I think it is? In some cases it's a clear win... but definitely not always. You're reducing time spent coding and now putting extra time into spec writing + review + verification.
> Sometimes, yeah. I don't think we're disagreeing
I would disagree. Formalism and precision have a critical role to play which is often underestimated. More so with the advent of LLMs. Fuzziness of natural languages is both a strength and a weakness. We have adopted precise but unnatural languages (math/C/C++) for describing machine models of the physical world or of the computing world. Such precision was a real human breakthrough which is often overlooked in these debates.
Are you saying you've never had them fail at a task?
I wanted to refactor a bunch of tests in a TypeScript project the other day into a format similar to table driven tests that are common in Golang, but seemingly not so much in TypeScript. Vitest has specific syntax affordances for it, though
It utterly failed at the task. Tried many times with increasing specificity in my prompt, did one myself and used it as an example. I ended up giving up and just doing it manually
> Imagine five years ago saying that you could have a general purpose AI write a c compiler that can handle the Linux kernel, by itself, from scratch for $20k by writing a simple English prompt.
You’re very conveniently ignoring the billions in training and that it has practically the whole internet as input.
Wasn't there a fair amount of human intervention in the AI agents? My understanding is, the author didn't just write "make me a c compiler in rust" but had to intervene at several points, even if he didn't touch the code directly.
It's really difficult for me to understand the level of cynicism in the HN comments on this topic, at all. The amount of goalpost-moving and redefining is absolutely absurd. I really get the impression that the majority of the HN comments are just people whining about sour grapes, with very little value added to the discussion.
I'd like to see someone disagree with the following:
Building a C compiler, targeting three architectures, is hard. Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard. Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.
To the specific issues with the concrete project as presented: This was the equivalent of a "weekend project", and it's amazing
So what if some gcc is needed for the 16-bit stuff? So what if a human was required to steer claude a bit? So what if the optimizing pass practically doesn't exist?
Most companies are not software companies; software is a line item, an expense, an unavoidable cost. The amount of code (not software engineering, or architecture, but programming) developed tends towards glue of existing libraries to accomplish business goals, which, in comparison with a correct modern C compiler, is far less performance critical, complex, broad, etc. No one is seriously saying that you have to use an LLM to build your high-performance math library, or that you have to use an LLM to build anything, much in the same way that no one is seriously saying that you have to rewrite the world in rust, or typescript, or react, or whatever is bothering you at the moment.
I'm reminded of a classic Slashdot comment about attempting to solve a non-technical problem with technology, which is doomed to fail. It really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people/organizations do with them, which is then a complaint about people, but not the technology.
> This was the equivalent of a "weekend project", and it's amazing
I mean, $20k in tokens, plus the supervision by the author to keep things running, plus the number of people that got involved according to the article... doesn't look like "a weekend project".
> Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard.
Is it correctly compiling it? Several people have pointed out that the compiler will not emit errors for clearly invalid code. What code is it actually generating?
> Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.
> which, in comparison with a correct modern C compiler, is far less performance critical, complex, broad, etc.
That code might be less complex for us, but more complex for an LLM if it has to deal with lots of domain-specific context and without a test suite that has been developed for 40 years.
Also, if the end result of the LLM has the same problem that Anthropic concedes here, which is that the project is so fragile that bug fixes or improvements are really hard/almost impossible, that still matters.
> it really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people/organizations do with them, which is then a complaint about people, but not the technology
It's a discussion about what the LLMs can actually do and how people represent those achievements. We're pointing out that LLMs, without human supervision, generate bad code: code that's hard to change, with modifications made specifically to address failing tests without challenging the underlying assumptions, code that's inconsistent and hard to understand even for the LLMs.
But some people are taking whatever the LLM outputs at face value, and then claiming some capabilities of the models that are not really there. They're still not viable for using without human supervision, and because the AI labs are focusing on synthetic benchmarks, they're creating models that are better at pushing through crappy code to achieve a goal.
The 158,000x slowdown on SQLite is the number that matters here, not whether it can parse C correctly. Parsing is the solved problem — every CS undergrad writes a recursive descent parser. The interesting (and hard) parts of a compiler are register allocation, instruction selection, and optimization passes, and those are exactly where this falls apart.
That said, I think the framing of "CCC vs GCC" is wrong. GCC has had thousands of engineer-years poured into it. The actually impressive thing is that an LLM produced a compiler at all that handles enough of C to compile non-trivial programs. Even a terrible one. Five years ago that would've been unthinkable.
The goalpost everyone should be watching isn't "can it match GCC" — it's whether the next iteration closes that 158,000x gap to, say, 100x. If it does, that tells you something real about the trajectory.
The part of the article about the 158,000x slowdown doesn't really make sense to me.
It says that a nested query does a large number of iterations through the SQLite bytecode evaluator. And it claims that each iteration is 4x slower, with an additional 2-3x penalty from "cache pressure". (There seems to be no explanation of where those numbers came from. Given that the blog post is largely AI-generated, I don't know whether I can trust them not to be hallucinated.)
But making each iteration 12x slower should only make the whole program 12x slower, not 158,000x slower.
Such a huge slowdown strongly suggests that CCC's generated code is doing something asymptotically slower than GCC's generated code, which in turn suggests a miscompilation.
I notice that the test script doesn't seem to perform any kind of correctness testing on the compiled code, other than not crashing. I would find this much more interesting if it tried to run SQLite's extensive test suite.
It wasn't given gcc source code, and was not given internet access. To the extent it could translate gcc source code, it'd need to be able to recall all of the gcc source from its weights.
All of this work is extraordinarily impressive. It is hard to predict the impact of any single research project the week it is released. I doubt we'll ever throw away GCC/LLVM. But, I'd be surprised if the Claude C Compiler didn't have long-term impact on computing down the road.
I occasionally - when I have tokens to spare, a MAX subscription only lasts so far - have Claude working on my Ruby compiler. Far harder language to AOT compile (or even parse correctly). And even 6 months ago it was astounding how well it'd work, even without what I now know about good harnesses...
I think that is the biggest outcome of this: The notes on the orchestration and validation setup they used were far more interesting than the compiler itself. That orchestration setup is already somewhat quaint, but it's still far more advanced than what most AI users use.
> Combined over a billion iterations: 158,000x total slowdown
I don't think that's a valid explanation. If something takes 8x as long then if you do it a billion times it still takes 8x as long. Just now instead of 1 vs 8 it's 1 billion vs 8 billion.
I'd be curious to know what's actually going on here to cause a multiple order of magnitude degradation compared to the simpler test cases (ie ~10x becomes ~150,000x). Rather than I-cache misses I wonder if register spilling in the nested loop managed to completely overwhelm L3 causing it to stall on every iteration waiting for RAM. But even that theory seems like it could only account for approximately 1 order of magnitude, leaving an additional 3 (!!!) orders of magnitude unaccounted for.
Building a C compiler is definitely hard for humans, but I don’t think it’s particularly strong evidence of "intelligence" from an LLM. It’s a very well understood, heavily documented problem with lots of existing implementations and explanations in the training data.
These kinds of tasks are relatively easy for LLMs, they’re operating in a solved design space and recombining known patterns. It looks impressive to us because writing a compiler from scratch is difficult and time consuming for a human, not because of the problem itself.
That doesn’t mean LLMs aren’t useful, even if progress plateaued tomorrow, they’d still be very valuable tools. But building yet another C compiler or browser isn’t that compelling as a benchmark. The industry keeps making claims about reasoning and general intelligence, but I’d expect to see systems producing genuinely new approaches or clearly better solutions, not just derivations of existing OSS.
Instead of copying a big project, I'd be more impressed if they could innovate in a small one.
1. In the real world, for a similar task, there is little reason for not giving the agent access to all the papers about optimizations, ISA PDFs, and MIT-licensed compilers of all kinds. It would perform much better, and this is proof that "uncompressing GCC" is just a claim (but point 2 even more so).
2. Of all the tasks, the assembler is the part where memorization would help the most. Instead, the LLM can't perform without the ISA documentation that it saw repeated an infinite number of times during pre-training. Guess what?
3. Rust is a bad language for this test, at least as a first target. If you want an LLM-coded C compiler in Rust and you have LLM experience, you would go C compiler -> Rust port. Rust is hard when there are mutable data structures with tons of references around, and a C compiler is exactly that. Composing complexity from different layers is an LLM anti-pattern that anyone who has worked a lot with automatic programming knows very well.
4. In the real world, you don't do a task like that without steering, and steering will do wonders. That's not to say the experiment was ill-conceived: the experimenter was trying to make a different point than the one the Internet took away (as usual).
> the experimenter was trying to make a different point than the one the Internet took away (as usual)
All of your points are important, but I think this is the most important one.
Having written compilers, $20k in tokens to get to a foundation for a new compiler with the feature set of this one is a bargain. Now, the $20k excludes the time to set up the harness, so the total cost would be significantly higher, but still.
The big point here is that the researchers in question demonstrated that a complex task such as this could be achieved shockingly cheaply, even when the agents were intentionally forced to work under unrealistically harsh conditions, with instructions to include features (e.g. SSA form) that significantly complicated the task but made the result closer to the foundation of a "proper" compiler rather than a toy, even if the outcome isn't a finished production-ready multi-arch C compiler.
I think one of the issues is that the register allocation algorithm -- alongside the SSA generation -- is not enough.
Generally, after the SSA pass you convert everything into a register transfer language (RTL) and then do a register allocation pass. In GCC's case it is even more extreme: you have GIMPLE in the middle doing more aggressive optimization, similar to rustc's MIR. CCC doesn't have all that. For register allocation you can at least do a simple linear scan, just as the usual JIT compiler would (and, from my understanding, something CCC should be able to do at little cost), but most of the "hard part" of a compiler today is actually optimization -- the frontend is mostly a solved problem if you accept some hacks, unlike me, who is still looking for an elegant academic solution to the typedef problem.
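For reference, the linear scan mentioned above is small enough to sketch; this is a compressed (and hypothetical, not CCC's) version of the classic Poletto & Sarkar algorithm, operating on live intervals sorted by start point:

#define NUM_REGS 4
#define SPILLED  (-1)

struct interval { int start, end, reg; };

/* Minimal linear-scan register allocation over intervals sorted by start. */
static void linear_scan(struct interval *iv, int n) {
    struct interval *active[NUM_REGS];
    int nactive = 0;
    int free_reg[NUM_REGS];
    for (int r = 0; r < NUM_REGS; r++) free_reg[r] = 1;

    for (int i = 0; i < n; i++) {
        /* Expire intervals that ended before this one starts. */
        int k = 0;
        for (int j = 0; j < nactive; j++) {
            if (active[j]->end >= iv[i].start) active[k++] = active[j];
            else                               free_reg[active[j]->reg] = 1;
        }
        nactive = k;

        if (nactive == NUM_REGS) {
            /* All registers busy: spill whichever live interval ends last. */
            struct interval *victim = &iv[i];
            for (int j = 0; j < nactive; j++)
                if (active[j]->end > victim->end) victim = active[j];
            if (victim != &iv[i]) {
                iv[i].reg = victim->reg;          /* steal the victim's register */
                victim->reg = SPILLED;
                for (int j = 0; j < nactive; j++)
                    if (active[j] == victim) active[j] = &iv[i];
            } else {
                iv[i].reg = SPILLED;              /* the new interval itself spills */
            }
        } else {
            for (int r = 0; r < NUM_REGS; r++)
                if (free_reg[r]) { iv[i].reg = r; free_reg[r] = 0; break; }
            active[nactive++] = &iv[i];
        }
    }
}

It won't win prizes, but even something this naive keeps most hot-loop variables in registers, which is what the constant spilling described elsewhere in the thread fails to do.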
Note that the LLVM approach to IR is probably a bit more sane than the GCC one. GCC has ~3 completely different IRs at different stages in the pipeline, while LLVM mostly has a single canonical IR form for passing data around through the optimization passes (and individual passes will sometimes make their own temporary IR locally to make a specific analysis easier).
If stevefan1999's referring to a nasty frontend issue, it might be due to the fact that a name introduced by a typedef and an identical identifier can mingle in the same scope, which makes parsing pretty nasty – e.g. (example from source at end):
typedef int AA;

void foo()
{
    AA AA;            /* OK - define variable AA of type AA */
    int BB = AA * 2;  /* OK - AA is just a variable name here */
}

void bar()
{
    int aa = sizeof(AA), AA, bb = sizeof(AA);
}
In your example, bar is actually trivial: since both the type AA and the variable AA are ints, both aa and bb end up as 4 no matter how you parse it. AA would have to be typedef'd to something other than int.
Lexing and parsing C is simple, except that typedefs technically make it non-context-free. See https://en.wikipedia.org/wiki/Lexer_hack . When handwriting a parser it's no big deal, but it's often a stumbling block for parser generators or other formal approaches. Though, I recall there's a PEG-based parser for C99/C11 floating around that was supposed to be compliant. But I'm having trouble finding a link, and maybe it was using something like LPeg, which has features beyond pure PEG that help with context-dependent parsing.
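The hack itself is small, which is why it's no big deal when handwriting; a rough sketch (hypothetical names, no scoping, which is the part that actually gets fiddly):

#include <string.h>

enum token { TOK_IDENTIFIER, TOK_TYPE_NAME /* , ... */ };

#define MAX_TYPEDEFS 1024
static const char *typedef_names[MAX_TYPEDEFS];
static int num_typedefs;

/* Called by the parser when it finishes a typedef declaration. */
static void register_typedef(const char *name) {
    if (num_typedefs < MAX_TYPEDEFS)
        typedef_names[num_typedefs++] = name;
}

/* Called by the lexer for every identifier it scans: this feedback
   from parser to lexer is what makes it "the lexer hack". */
static enum token classify_identifier(const char *name) {
    for (int i = 0; i < num_typedefs; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TOK_TYPE_NAME;
    return TOK_IDENTIFIER;
}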
Clang's solution (presented at the end of the Wikipedia article you linked) seems much better - just use a single lexical token for both types and variables.
Then only the parser needs to be context-sensitive, for the `A * B;` construct, which is either a no-op multiplication (if A is a variable) or a variable declaration of a pointer type (if A is a type).
Well, as you see, this is inherently taking the spirit of a GLL/GLR parser: defer parsing until we have all the information. The academic solution to this is not to do it at the token level but to introduce a parse tree that is "forkable", meaning a new persistent data structure is needed to "compress" the tree when we have different routes, and that thing is called a graph-structured stack (https://en.wikipedia.org/wiki/Graph-structured_stack).
What I had specifically in mind definitely wasn't using OCaml or Menhir, but that's a very useful resource, as is the associated paper, "A simple, possibly correct LR parser for C11", https://jhjourdan.mketjh.fr/pdf/jourdan2017simple.pdf
This is closer to what I remember, but I'm not convinced it's what I had in mind, either: https://github.com/edubart/lpegrex/blob/main/parsers/c11.lua It uses LPeg's match-time capture feature (not a pure PEG construct) to dynamically record typedefs and condition subsequent matches. In fact, it's effectively identical to what C11Parser is doing, down to the two dynamically invoked helper functions: declare_typedefname/is_typedefname vs set_typedef/is_typedef. C11Parser and the paper are older, so maybe the lpegrex parser is derivative. (And probably what I had in mind, if not lpegrex, was derivative too.)
Can someone explain to me, what's the big deal about this?
The AI model was trained on lots of code and spit out something similar to gcc. Why is this revolutionary?
It's a marketing gimmick. Cursor did the same recently when they claimed to have created a working browser, but it was basically just a bunch of open source software glued together into something barely functional for a PR stunt.
These tools do not compete against the lonely programmer who writes everything from scratch; they compete with the existing tooling. 5 years ago compiler generators already existed, as they did in the previous decades. That is a solved problem. People still like to handroll their parsers, not because generating wouldn't work, but because it has other benefits (maintainability, adaptation, better diagnostics). Perfectly fine working code is routinely thrown away and reimplemented, because there are not enough people around anymore who know the code by heart. "The Big Rewrite" is a meme for a reason.
A computer generating a compiler is nothing new. Unzip has done this many, many times. The key difference is that unzip extracts data from an archive in a deterministic way, while LLMs recover data from the training dataset using a lossy statistical model. Pair that with a feedback loop and a rich test suite, and you get exactly what Anthropic has achieved.
While I agree that the technology behind this is impressive, the biggest issue is license infringement. Everyone knows there's GPL code in the training data, yet there's no trace of acknowledgment of the original authors.
It's already bad enough that people are using non-GPL compilers like LLVM (which make malicious behavior like proprietary, incompatible forks possible), so yet another compiler not under the GPL, one that even AI-washes GPL code, is not a good thing.
That’s not true. It didn’t have access to the internet and no LLM has the fidelity to reproduce code verbatim from its training data at the project level.
In this case, it’s true that compilers were in its training data but only helped at the conceptual level and not spitting verbatim gcc code.
How do I know that? The code is not similar to GCC at any level except conceptual. If you can point out the similarity at any level I might agree with you.
> I have a feeling, you didn't look at the code at all.
And you originally asked how someone knew that they weren't just spitting out gcc. So you reject their statement that it's not like gcc at all with your "you didn't look at the code at all", when it's clear that you haven't looked at it yourself.
Yeah, it's pretty amazing it can do this. The problem is the gaslighting by the companies making this. The companies: "see, we can create compilers, we won't need programmers". Programmers: "this is crap, are you insane?". Classic gaslighting.
It’s giving you an idea of what Claude is capable of - creating a project at the complexity of a small compiler. I don’t know if it can replace programmers but can definitely handle tasks of smaller complexity autonomously.
I regularly have it produce 10k+ lines of code that are working and passing extensive test suites. If you give it a prompt and no agent loop and test harness, then sure, you'll need to waste your time babysitting it.
My 2 cents: just like Cursor's browser, it seems the AI attempted a really ambitious technical design, generally matching the bells and whistles of a true industrial strength compiler, with SSA optimization passes etc.
However, looking at the assembly, it's clear to me the opt passes do not work, and I suspect it contains large amounts of 'dead code' - where the AI decided to bypass non-functioning modules.
If a human expert were to write a compiler not necessarily designed to match GCC, but provide a really good balance of features to complexity, they'd be able to make something much simpler. There are some projects like this (QBE,MIR), which come with nice technical descriptions.
Likewise there was a post about a browser made by a single dude + AI, which was like 20k lines, and worked about as well as Cursor's claimed. It had like 10% of the features, but everything there worked reasonably well.
So while I don't want to make predictions, it seems for now the human-in-the-loop method of coding works much better (and cheaper!) than getting AI to generate a million lines of code on its own.
> My 2 cents: just like Cursor's browser, it seems the AI attempted a really ambitious technical design, generally matching the bells and whistles of a true industrial strength compiler, with SSA optimization passes etc.
Per the article from the person who directed this, the user directed the AI to use SSA form.
> However, looking at the assembly, it's clear to me the opt passes do not work, and I suspect it contains large amounts of 'dead code' - where the AI decided to bypass non-functioning modules.
That is quite possibly true, but presumably at least in part reflects the fact that it has been measured on completeness, not performance, and so that is where the compiler has spent time. That doesn't mean it'd necessarily be successful at adding optimisation passes, but we don't really know. I've done some experiments with this (a Ruby ahead-of-time compiler) and while Claude can do reasonably well with assembler now, it's by no means where it's strongest (it is, however, far better at operating gdb than I am...), but it can certainly do some of it.
> So while I don't want to make predictions, it seems for now the human-in-the-loop method of coding works much better (and cheaper!) than getting AI to generate a million lines of code on its own.
Yes, it absolutely is, but the point in both cases was to test the limits of what AI can do on their own, and you won't learn anything about that if you let a human intervene.
$20k in tokens to get to a surprisingly working compiler from agents working on their own is at a point where it is hard to assess how much money and time you'd save once considering the cleanup job you'd probably want to do on it before "taking delivery", but had you offered me $20k to write a working C-compiler with multiple backends that needed to be capable of compiling Linux, I'd have laughed at the funny joke.
But more importantly, even if you were prepared to pay me enough, delivering it as fast when writing it by hand would be a different matter. Now, if you factor in the time used to set up the harness, the calculation might be different.
But now that we know models can do this, efforts to make the harnesses easier to set up (for my personal projects, I'm experimenting with agents to automatically figure out suitable harnesses), and to make cleanup passes to review, simplify, and document, could well end up making projects like this far more viable very quickly (at the cost of more tokens, certainly, but even if you double that budget, this would be a bargain for many tasks).
I don't think we're anywhere near taking humans out of the loop for many things, but I do see us gradually moving up the abstraction levels, and caring less about the code at least at early stages and more about the harnesses, including acceptance tests and other quality gates.
You misunderstand me. First, almost all modern compilers (that I know of) use SSA, so that's not much of a thing you need to point out. The point I was making is that, looking at the assembler, it seems the generated code is totally unoptimized, even though it was mentioned that Claude implemented SSA opt passes.
The generated code's quality is more in line with an 'undergrad course compiler backend': basically doing as little work on the backend as possible, and always doing all the work conservatively.
Basic SSA optimizations such as constant propagation, copy propagation or common subexpression elimination are clearly missing from the assembly, and the register allocator is also pretty bad, even though there are simple algorithms for that sort of thing that perform decently.
So even though the generated code works, I feel like something's gone majorly wrong inside the compiler.
The 300k LoC thing isn't encouraging either; it's way too much for what the code actually does.
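As a concrete (if contrived) illustration of what one of those missing passes buys you:

int area(void) {
    int w = 6;
    int h = 7;
    int a = w * h;  /* constant propagation + folding reduce this to 42 */
    return a;       /* an optimizing backend emits essentially "mov eax, 42; ret";
                       a backend without these passes keeps the stack slots,
                       the loads/stores and the multiply */
}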
I just want to point out that I think a competent-ish dev (me?) could build something like this (a reasonably accurate C compiler) via a more human-in-the-loop workflow. The result would be much more reasonable code and design, much shorter, and the codebase wouldn't be full of surprises like it is now, and would conform to sane engineering practices.
Honestly I would certainly prefer to do things like this as opposed to having AI build it, then clean it up manually.
And it would be possible without these fancy agent orchestration frameworks and spending tens of thousands of dollars on API.
This is basically what went down with Cursor's agentic browser, vs an implementation that was recreated by just one guy in a week, with AI dev tools and a premium subscription.
There's no doubt that this is impressive, but I wouldn't say that agentic software engineering is here just yet.
CCC was and is a marketing stunt for a new model launch. Impressive, but it still suffers from the same 80:20 rule. The remaining 20% are the optimizations, and we all know where the devil lives in "let me write my own language".
Vibe coding is entertainment. Nothing wrong about entertainment, but when totally clueless people connect to their bank account, or control their devices with vibe coded programs, someone will be entertained for sure.
Large language models and small language models are very strong for solving problems, when the problem is narrow enough.
They are above human average for solving almost any narrow problem, independent of time, but when time is a factor, let's say less than a minute, they are better than experts.
An OS kernel is exactly a problem, that everyone prefers to be solved as correct as possible, even if arriving at the solution takes longer.
The author mentions the stability and correctness of CCC, but these are properties of Rust, not of vibe coding. Still an impressive feat by Claude Code, though.
Ironically, if they had first populated the repo with objects, functions and methods with just todo! bodies, made sure the architecture compiles and is sane, and only then let the agent fill in the bodies with implementations, most features would likely work correctly.
I am writing a program to do exactly that for Rust, but even then, how would the user/programmer know beforehand how many architectural details to specify using todo! to be sure that the problem the agent tries to solve is narrow enough? That's impossible to know! If the problem is not narrow enough, then the implementation is going to be a mess.
The prospect of going the last mile to fix the remaining problems reminds me of the old joke:
"The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
Yeah, this is why I don't get the argument that LLMs are good for bootstrapping, especially anything serious.
Sure, these things can technically frontload a lot of work at the beginning of a project, but I would argue the design choices made at the beginning of a project set the tone for the entire project, and it's best those be made with intention, not by stochastic text extruders.
Let's be real: these things are shortcut machines that appeal to people's laziness, and as with most shortcuts in life, they come with consequences.
Have fun with your "think for me" SaaS; I'm not going to let my brain atrophy to the point where my competency is 1:1 correlated with the quantity and quality of tokens I have access to.
Nice article. I believe the Claude C Compiler is an extraordinary research result.
The article is clear about its limitations. The code README opens by saying “don’t use this” which no research paper I know is honest enough to say.
As for hype, it’s less hyped than most university press releases. Of course since it’s Anthropic, it gets more attention than university press.
I think the people most excited are getting ahead of themselves. People who aren’t impressed should remember that there is no C compiler written in Rust for it to have memorized. But, this is going to open up a bunch of new and weird research directions like this blog post is beginning to do.
This is a conjecture: modern chips are optimized to make the output code style of GCC/Clang go fast. So, the compilers optimize for the chip, and the chip optimizes for the popular compilers.
This compiler experiment mirrors the recent work of Terence Tao and Google. The "recipe" is an LLM paired with an external evaluator (GCC) in a feedback loop.
By evaluating the objective (successful compilation) in a loop, the LLM effectively narrows the problem space. This is why the code compiles even when the broader logic remains unfinished/incorrect.
It’s a good example of how LLMs navigate complex, non-linear spaces by extracting optimal patterns from their training data. It’s amazing.
p.s. if you translate all this to marketing jargon, it’ll become “our LLM wrote a compiler by itself with a clean room setup”.
I don't understand how this isn't a bigger deal. Why are people quibbling about how it isn't a particularly good C compiler? It seems earth-shattering that an AI can write a C compiler in the first place.
Am I just old? "How did they fit those people into the television?!"
Seeing that Claude can code a compiler doesn't help anyone if it's not coded efficiently, because getting it to be efficient is the hardest part, and it will be interesting to see how long it takes to make it efficient. No one is gonna use a compiler that makes binaries take 700x longer to run.
I'm surprised that this wasn't possible before with just a bigger context size.
> Someone got it working on Compiler Explorer and remarked that the assembly output “reminds me of the quality of an undergraduate’s compiler assignment”. Which, to be fair, is both harsh and not entirely wrong when you look at the register spilling patterns.
This is what I've noticed about most LLM-generated code: it's about the quality of an undergrad's work, and I think there's a good reason for this - most of the code it's been trained on is of undergrad quality. Stack Overflow questions, a lot of undergrad open source projects; there are some professional-quality open source projects (e.g. SQLite) but they are outweighed by the mass of other code. Also, things like SQLite don't compare to things like Oracle or SQL Server, which are proprietary.
They should have gone one step further and also optimized for query performance (without editing the source code).
I have (cough) AI-generated an x86-to-x86 compiler (it takes x86 in, replaces arbitrary instructions with functions and spits x86 out). At first it was horrible, but after letting it work for 2 more days it got down to only around a 50% to 60% slowdown, even when every memory-read instruction was replaced.
Now that's when people should get scared. But it's also reasonable to assume that CCC will look closer to GCC at that point, maybe influenced by other compilers as well. Tell it to write an ARM compiler and it will never succeed (probably; maybe it can use an intermediary and shove it into LLVM and it'll work, but at that point it is no longer a "C" compiler).
One missing analysis, which IMHO is the most important right now, is: what is the quality of the generated code?
Having an LLM generate a first complete iteration of a C compiler in Rust is super useful if the code is of good enough quality that it can be maintained and improved by humans (or other AIs). It is (almost) completely useless otherwise.
And that is the case for most of today's code generated by AIs. Most of it will still have to be maintained by humans, or at least a human will ultimately be responsible for it.
What I would like to see is whether that C compiler is a horrible mess of tangled spaghetti code with horrible naming, or something with a clear structure, good naming, and sensible comments.
> with a clear structure, good naming, and sensible comments.
Additionally, there is the problem that LLM comments often represent what the code is supposed to do, not what it actually does. People write comments to point out what was weird during implementation and what they found out while testing the implementation. LLM comments seem more to reflect the information present before writing the implementation, i.e. they use them as an internal checklist of what to generate.
In my opinion, deceptive comments are worse than no comments at all.
I'm curious; maybe the AI learned too much from code in human-written compilers.
What if we invented a fresh new language and let the AI write the compiler? If the compiler worked well, I think that would be true intelligence.
I think AI will definitely help to get new compilers going. Maybe not the full product, yet. But it helps a lot to create all the working parts you need to get going. Taking lengthy specs and translating them into code is something AI does quite well - I asked it to give me a disassembler, and it did a good job. So, if you want to make a new compiler, you now don't have to read all the specs and details beforehand. Just let the AI mess with e.g. PE headers and only take care of it later if something in that area doesn't work.
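For what it's worth, the spec-to-code grunt work being described is mostly this kind of thing; a minimal (hypothetical) sketch of sanity-checking a PE header, with offsets per the PE/COFF format:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* A PE file starts with a DOS header ("MZ"); the 32-bit field at
   offset 0x3C (e_lfanew) gives the offset of the "PE\0\0" signature. */
int looks_like_pe(const uint8_t *buf, size_t len) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    uint32_t e_lfanew;
    memcpy(&e_lfanew, buf + 0x3C, sizeof e_lfanew);  /* file is little-endian; assumes a little-endian host */
    if ((size_t)e_lfanew + 4 > len)
        return 0;
    return memcmp(buf + e_lfanew, "PE\0\0", 4) == 0;
}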
Great article, but you have to keep in mind that it was pure marketing. The really interesting question is to give the same benchmark to CC, ask it to optimize in a loop, and see how long it takes to come up with something decent.
That's the whole promise of reaching AGI: that it will be able to improve itself.
I think Anthropic ruined this by releasing it too early; it would have been way more fun to have a live website where you can see it iterating and the progress it's making.
> CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI.
It would be interesting to compare the source code used by CCC to other projects. I have a slight suspicion that CCC stole a lot of code from other projects.
You know, it sure does add some additional perspective to the original Anthropic marketing materia... ahem, I mean article, to learn that CCC-compiled SQLite could potentially run up to 158,000 times slower than a GCC-compiled build...
Nevertheless, the victories continue to be closer to home.
It seems like if Anthropic released a super cool and useful _free_ utility (like a compiler, for example) that was better than existing counterparts or solved a problem that hadn’t been solved before[0] and just casually said “Here is this awesome thing that you should use every day. By the way our language model made this.” it would be incredible advertising for them.
But they instead made a blog post about how it would cost you twenty thousand dollars to recreate a piece of software that they do not, with a straight face, actually recommend that you use in any capacity beyond as a toy.
[0] I am categorically not talking about anything AI related or anything that is directly a part of their sales funnel. I am talking about a piece of software that just efficiently does something useful. GCC is an example, Everything by voidtools is an example, Wireshark is an example, etc. Claude is not an example.
They made a blog post about it because it's an amazing test of the abilities of the models to deliver a working C-compiler, even with lots of bugs and serious caveats, for $20k of tokens, without a human babysitting it.
I'd challenge anyone who is negative about this to try to achieve what they did by hand, with the same restrictions (e.g. generating full SSA form instead of just directly emitting code, being capable of compiling Linux), and log their time doing it.
Having written several compilers, I'll say with some confidence that not many developers would succeed. Far fewer would succeed fast enough to compete with $20k cost. Even fewer would do that and deliver decent quality code.
Now notice the part where they've done this experiment before. This is the first time it succeeded. Give it another model iteration or two, and expect quality to increase, and price to drop.
>Every agent would hit the same bug, fix that bug, and then overwrite each other's changes. Having 16 agents running didn't help because each was stuck solving the same task.
>The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC
The blog post used the word autonomous a lot, which I suppose is true if Nicholas Carlini is not a human being but in fact a Claude agent.
>I'd challenge anyone who is negative about this to try to achieve what they did by hand, with the same restrictions (e.g. generating full SSA form instead of just directly emitting code, being capable of compiling Linux), and log their time doing it.
Why would anyone do that? My point was that why does the company _not_ make a useful tool? I feel like that is a much more interesting topic of discussion than “why aren’t people that aren’t impressed by this spending their time trying to make this company look good?”
>This is the new floor.
Aside from the notion that they maybe intentionally set out to create the least useful or valuable output from their tooling (eg ‘the floor’) when they did not say that they did that, my question was “Why do they not make something genuinely useful?”. Marketing speak and imaginary engineers failing at made up challenges does not answer that question.
> The blog post used the word autonomous a lot, which I suppose is true if Nicholas Carlini is not a human being but in fact a Claude agent.
Nothing in the article suggests it did not autonomously do the work.
> Why would anyone do that?
Because a lot of naysayers here pretend as if this is somehow trivial.
> My point was that why does the company _not_ make a useful tool?
Useful to whom? This is a researcher testing the limits of the models. Knowing those limits is highly useful to Anthropic. And it's highly useful to lots of others too, like me, as a means of understanding the capabilities of these models.
What, exactly would such a tool that'd somehow make the people dismissing this change their minds look like? Because I don't think anything would. They could produce lots of useful tools, if they aimed lower than testing the limits of the model. But it would not achieve what they set out to do, and it would not tell us anything useful.
I produce "useful tools" with Claude every day. That's not interesting. Anyone who actually uses these tools properly will develop a good understanding of the many things that can be achieved with them.
Most of us can't spend $20k figuring out where the limits are, however.
> I feel like that is a much more interesting topic of discussion than “why aren’t people that aren’t impressed by this spending their time trying to make this company look good?”
This is a ridiculous misrepresentation of the point. The point is that the people who aren't impressed by this very clearly and obviously do not have an understanding of the complexity of what they achieved, and are making ignorant statements about it.
> Aside from the notion that they maybe intentionally set out to create the least useful or valuable output from their tooling (eg ‘the floor’)
Again, you're either entirely failing to understand, or wilfully misrepresenting what I said. No, their goal was not to "set out to create the least useful or valuable output". Their goal was to test the limits of what the model can achieve. They did that.
That has far higher value than not testing the limits. Lots, and lots of people are building tools with Claude without testing the limits. We would not learn anything from that.
> my question was “Why do they not make something genuinely useful?”
Because that wasn't the purpose. The purpose was to test the limits of what the model can achieve. That you struggle to understand why what they achieved was massively impressive, does not change that.
> Nothing in the article suggests it did not autonomously do the work.
I don’t know how to respond to that other than to ask you to quote the part of the blog post where the author described the language model running into a problem that it could not fix and then described the details of how he manually intervened to fix the problem that the language model could not fix when you elaborate on your definition of “nothing” in that sentence.
>Every agent would hit the same bug, fix that bug, and then overwrite each other's changes. Having 16 agents running didn't help because each was stuck solving the same task.
>The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC
As for:
> Because a lot of naysayers here pretend as if this is somehow trivial.
This is an answer to “why do you want someone to do that?” You have already established that you would like that to happen. It doesn’t answer “why would a real human being (who is not you) that isn’t impressed by the compiler that doesn’t work put their time into making Anthropic look good?”
As for this, that’s a good question but I would say the bare minimum would be “useful”
> What, exactly would such a tool that'd somehow make the people dismissing this change their minds look like?
It is pretty common for tech companies to release free useful software. For example pytorch, react, Hack/hhvm etc. from Meta
Or chromium from Google. Chromium is a good example, there’s a decent chance that you’re using a chromium based browser to read this. There’s also a ton of other stuff, golang comes to mind as another example.
Or if you want stuff made by a company that’s a fraction the valuation of Anthropic, there’s Campfire and Writebook by 37signals. https://once.com/
> Because that wasn't the purpose.
I know that. That was the premise of my question.
I saw that they put a bunch of resources into making something that is not useful and asked why they did not put a bunch of resources into something that was useful. Surely they could make something that is both useful and makes their model look good?
For me it seems like the obvious answer would be either that they can’t make something useful:
> Their goal was to test the limits of what the model can achieve. They did that.
Or they don’t want to
> Because that wasn't the purpose.
I was asking if anyone had any substantive knowledge or informed opinion about whether it was one or the other, but it seems like you’re saying it’s… both? They don’t want to make and release a useful tool, and also they cannot make and release a useful tool because this compiler, which is not useful, is the limit of what their model can achieve.
Like you want us all to know that they cannot and do not want to make any sort of useful tool. That is your clearly-stated opinion about their desires and capabilities. And also you want these “naysayers”, who are not you, to put their time and effort into… also not making something useful? To prove… what?
> I wonder how well an LLM would do for a new CPU architecture for which no C compiler exists yet, just assembler.
Quite well, possibly.
Look, I wasn't even aware of this until it popped up a few days ago on HN, and I am not privy to the details of Anthropic's engineers in general, or the specific engineer who curated this marathon multi-agent dev cycle, but I can tell you how anyone familiar with compilers or programming language development would proceed:
1. Vibe an IL (intermediate language) specification into existence (even if it is only held in RAM as structures/objects)
2. Vibe some utility functions for the IL (dump, search, etc)
3. Vibe a set of backends that take IL as input and emit code for a target ISA (Instruction Set Architecture), with a set of tests for each target ISA
4. Vibe a front-end that takes C language input and outputs the IL, with a set of tests for each language construct.
(Everything from #2 onwards can be done in parallel)
I have no reason to believe that the engineer who vibe-coded CCC is anything other than competent and skillful, so let's assume he did at least the above (TBH, he probably did more)[1].
This means that CCC has, in its code, everything needed to vibe a never-before-seen ISA, given the ISA spec. It also means it has everything needed to support a new front-end language as long as it is similar enough to C (i.e. language constructs can map to the IL constructs).
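Purely as an illustration of that src -> IL -> ISA split (the types below are my own toy invention, not CCC's actual code), a minimal sketch could look something like this:

```rust
// Toy illustration of the IL -> backend split described above.
// None of this is CCC's real code; it only shows why adding a new
// target ISA is "just" another Backend implementation.

/// A tiny three-address-style IL. A real IL would also carry types,
/// control flow, memory operations, and so on.
enum Inst {
    LoadImm { dst: u32, value: i64 },
    Add { dst: u32, lhs: u32, rhs: u32 },
    Ret { src: u32 },
}

/// Every target ISA implements this trait; the C front-end that
/// produces `Inst` never has to change.
trait Backend {
    fn emit(&self, prog: &[Inst]) -> String;
}

/// One example target. Register allocation is hand-waved:
/// "r{n}" stands in for real machine registers.
struct ToyTarget;

impl Backend for ToyTarget {
    fn emit(&self, prog: &[Inst]) -> String {
        let mut asm = String::new();
        for inst in prog {
            match inst {
                Inst::LoadImm { dst, value } => {
                    asm.push_str(&format!("  li r{dst}, {value}\n"));
                }
                Inst::Add { dst, lhs, rhs } => {
                    asm.push_str(&format!("  add r{dst}, r{lhs}, r{rhs}\n"));
                }
                Inst::Ret { src } => {
                    asm.push_str(&format!("  mv a0, r{src}\n  ret\n"));
                }
            }
        }
        asm
    }
}

fn main() {
    // `return 2 + 3;` lowered to the toy IL, then emitted for one target.
    let prog = [
        Inst::LoadImm { dst: 0, value: 2 },
        Inst::LoadImm { dst: 1, value: 3 },
        Inst::Add { dst: 2, lhs: 0, rhs: 1 },
        Inst::Ret { src: 2 },
    ];
    print!("{}", ToyTarget.emit(&prog));
}
```

With a split like this, adding a new ISA amounts to writing one more Backend impl plus its tests, which is exactly why step 3 above parallelises so well across agents.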
So, this should be pretty easy to expand on, because I find it unlikely that the engineer who supervised/curated the process would be anything less than an expert.
The only flaw in my argument is that I am assuming the effort from CC was so large because it took the src -> IL -> ISA route. If my assumption is wrong, it might be well-nigh impossible to add support for a new ISA.
------------------------------
[1] When I agreed to a previous poster on a previous thread that I can recreate the functionality of CCC for $20k, these are the steps I would have followed, except I would not have LLM-generated anything.
Now that we have seen this can be done, the next question is how much effort it takes to improve it 1%. And then the next 1%. Can we make consistent improvements without spending more and more compute on each step?
Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I am pretty sure everybody agrees that this result is somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology. But discussions should not be held from the extreme points. Instead, I am looking for a realistic estimation from the HN community about where to place these results in a human context. Since I have no experience with compilers, I would welcome any of your opinions.
> Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I offered to do it, but without a deadline (I work f/time for money), only a cost estimation based on how many hours I think it should take me: https://news.ycombinator.com/item?id=46909310
The poster I responded to had claimed that it was not possible to produce a compiler capable of compiling a bootable Linux kernel within the $20k cost, nor for double that ($40k).
I offered to do it for $40k, but no takers. I initially offered to do it for $20k, but the poster kept evading, so I settled on asking for the amount he offered.
The time will come (and it's not far off) when LLM agents will be able to reverse-engineer a program and re-implement it just by pointing to the program's directory.
We'll see how fun that will be for these big corporations.
For example: "Hey, Claude, re-implement Adobe Photoshop in Rust."
Did Anthropic release the scaffolding, harnesses, prompts, etc. they used to build their compiler? That would be an even cooler flex to be able to go and say "Here, if you still doubt, run this and build your own! And show us what else you can build using these techniques."
The level of discourse I've seen on HN about this topic is really disappointing. People not reading the actual article in detail, just jumping to conclusions "it basically copied gcc" etc etc. Taking things out of context, or worse completely misrepresenting what the author of the article was trying to communicate.
We act so superior to LLMs but I'm very unimpressed with humanity at this stage.
But gcc is part of its training data so of course it spit out an autocomplete of a working compiler
/s
This is actually a nice case study in why agentic LLMs do kind of think. It's by no means the same code or compiler. It had to figure out lots and lots of problems along the way to get to the point of tests passing.
> But gcc is part of its training data so of course it spit out an autocomplete of a working compiler /s
Why the sarcasm tag? It is almost certainly trained on several compiler codebases, plus probably dozens of small "toy" C compilers created as hobby / school projects.
It's an interesting benchmark not because the LLM did something novel, but because it evidently stayed focused and maintained consistency long enough for a project of this complexity.
I would've guessed so, but I was asking in the sense of "does Claude Code (not the model) have access to the internet?", which, according to Anthropic, it didn't.
I'm reminded, once again, of the recent "vibe coded" OCaml fiasco[1].
The PR author had zero understanding why their entirely LLM-generated contribution was viewed so suspiciously.
The article validates a significant point: it is one thing to have passing tests and be able to produce output that resembles correctness - however it's something entirely different for that output to be good and maintainable.
[1] https://github.com/ocaml/ocaml/pull/14369
>Here's my question: why did the files that you submitted name Mark Shinwell as the author?
>Beats me. AI decided to do so and I didn't question it.
Haha that's comedy gold, and honestly a good interview screening situation - you'd instantly pass on the candidate!
I'm humbled by the maintainer's answer [0]. Must be great to work with people like him who have infinite patience and composure.
[0] https://github.com/ocaml/ocaml/pull/14369#issuecomment-35565...
gasche has been active on various forums over the years, and yes, I can confirm that he has infinite patience.
Yes, that comment by gasche is a very good general explanation for why vibe coded slop still doesn't cut it for contributing to any non-trivial FLOSS project. When you're building towards a large feature (DWARF support in this case) it's critical for contributions to be small and self-contained so that maintainers and reviewers don't get overwhelmed. As things stand, this means that human effort is an absolute requirement.
When contributions are small and tightly human-controlled it's also less likely that potential legal concerns will arise, since it means that any genuinely creative decisions about the code are a lot easier to trace.
(In this case, the AI seems to have ripped off a lot of the work from OxCaml with inconsistent attribution. OxCaml is actually license-compatible (and friendly) with OCaml, but obviously any merge of that work should happen on its own terms, not as a side effect of ripoff slop code.)
Damn... "AI has a very deep understanding of how this code works. Please challenge me on this." This person is something else. Just... wow.
nice username
This is the norm in my experience.
If you haven't come across a significant number of AI addicts as obnoxiously delusional as @Culonavirus describes, you must be getting close to retirement age.
People with any connection to new college graduates understand that this sort of idiotic LLM-backed arrogance is extremely common among low-to-mid-functioning twenty-somethings.
however it's something entirely different for that output to be good and maintainable
People aren't prompting LLMs to write good, maintainable code though. They're assuming that because we've made a collective assumption that good, maintainable code is the goal, it must also be the goal of an LLM. That isn't true. LLMs don't care about our goals. They are solving problems in a probabilistic way based on the content of their training data, context, and prompting. Presumably if you take all the code in the world and throw it in a mixer, what comes out is not our Platonic ideal of the best possible code, but actually something more like a Lovecraftian horror that happens to get the right output. This is quite positive because it shows that with better prompting+context+training we might actually be able to guide an LLM to know what good and bad looks like (based on the fact that we know). The future is looking great.
However, we also need to be aware that 'good, maintainable code' is often not what we think is the ideal output of a developer. In businesses everywhere the goal is 'whatever works right now, and to hell with maintainability'. When a business is 3 months from failing, spending time to write good code that you can continue to work on in 10 years feels like wasted effort. So really, for most code that's written, it doesn't actually need to be good or maintainable. It just needs to work. And if you look at the code that a lot of businesses are running, it doesn't. LLMs are a step forward in just getting stuff to work in the first place.
If we can move to 'bug free' using AI, at the unit level, then AI is useful. Above individual units of code, things like logic, architecture, and security still have to come from the developer, because AI can't have the context of a complete application yet. When that's ready we can tackle 'tech debt free', because almost all tech debt lives at that higher level. I don't think we'll get there for a long time.
>They are solving problems in a probabilistic way based on the content of their training data, context, and prompting.
>Presumably if you take all the code in the world and throw it in mixer what comes out is not our Platonic ideal of the best possible code, but actually something more like a Lovecraftian horror that happens to get the right output.
These statements have been inaccurate since 2022, when LLMs started to have post-training done.
> People aren't prompting LLMs to write good, maintainable code though.
Then they're not using the tools correctly. LLMs are capable of producing good clean code, but they need to be carefully instructed as to how.
I recently used Gemini to build my first Android app, and I have zero experience with Kotlin or most of the libraries (but I have done many years of enterprise Java in my career). When I started I first had a long discussion with the AI about how we should set up dependency injection, Material3 UI components, model-view architecture, Firebase, logging, etc and made a big Markdown file with a detailed architecture description. Then I let the agent mode implement the plan over several steps and with a lot of tweaking along the way. I've been quite happy with the result, the app works like a charm and the code is neatly structured and easy to jump into whenever I need to make changes. Finishing a project like this in a couple of dozen hours (especially being a complete newbie to the stack) simply would not have been possible 2-3 years ago.
> Then they're not using the tools correctly. LLMs are capable of producing good clean code, but they need to be carefully instructed as to how.
I'd argue that when the code is part of a press release or corporate blog post (is there even a difference?) by the company that the LLM in question comes from, e.g. Claude's C compiler, then one cannot reasonably assert they were "not using the tools correctly": even if there's some better way to use them, if even the LLM's own team don't know how to do that, the assumption should be that it is unreasonable to expect anyone else to know how to do that either.
I find it interesting and useful to know that the boundary of the possible is a ~100kloc project, and that even then this scale of output comes with plenty of flaws.
Know what the AI can't do, rather than what it can. Even beyond LLMs, people don't generally (there are exceptions) get paid for manually performing tasks that have already been fully automated; people get paid for what automation can't do.
Moving target, of course. This time last year, my attempt to get an AI to write a compiler for a joke language didn't even result in the source code for the compiler itself compiling; now it not only compiles, it runs. But my new language is a joke language, no sane person would ever use it for a serious project.
When I started I first had a long discussion with the AI... and made a big Markdown file with a detailed architecture description.
Yep, that's how you get better output from AI. A lot of devs haven't learned that yet. They still see it as 'better autocomplete'.
While this technique works for new projects, it takes no more than a couple of pivots for it to completely fail.
A good AI development framework needs to support a tail of deprecated choices in the codebase.
Skills are considerably better for this than design docs.
"It's just another Markdown file, bro".
LLMs do not learn. So every new session for them will be rebuilding the world from scratch. Bloated Markdown files quickly exhaust context windows, and agents routinely ignore large parts of them.
And then you unleash them on one code base that's more than a couple of days old, and they happily duplicate code, ignore existing code paths, ignore existing conventions etc.
That's why I'm very careful about how the context is constructed. I make sure all the relevant files are loaded with the prompt, including the project file so it can see the directory structure. Also keep a brief summary of the app functionality and architecture in the AGENTS.md file. For larger tasks, always request a plan and look through it before asking it to start writing code.
Yup, we've become reverse centaurs :) https://doctorow.medium.com/https-pluralistic-net-2025-09-11...
Not trying to be rude, but in a technology you're not familiar with you might not be able to know what good code is, and even less so if it's maintainable.
Finding and fixing that subtle, hard to reproduce bug that could kill your business after 3 years.
That's a fair point, my code is likely to have some warts that an experienced Android/Kotlin dev would wince at. All I know is that the app has a structure that makes an overall sense to me, with my 15+ years of experience as a professional developer and working with many large codebases.
I think we are going to have to find out what maintenance even looks like when LLMs are involved. "Maintainable" might no longer mean quite the same thing as it used to.
But it's not going to be as easy as "just regenerate everything". There are dependencies external to a particular codebase such as long lived data and external APIs.
I also suspect that the stability of the codebase will still matter, maybe even more so than before. But the way in which we define maintainability will certainly change.
The framing is key here. Is three years a long time? Both answers are right. Just getting a business off the ground is an achievement in the first place. Lasting three years? These days, I have clothes that don't even last that long. And then again, three years isn't very long at all. Bridges last decades. Countries are counted in centuries. Humanity is millennia old. If AI can make me a company that's solvent for three years? Well, you decide.
That mirrors my experience so far. The AI is fantastic for prototyping, in languages/frameworks you might be totally unfamiliar with. You can make all sorts of cool little toy projects in a few hours, with just some minimal prompting
The danger is, it doesn't quite scale up. The more complex the project, the more likely the AI is to get confused and start writing spaghetti code. It may even work for a while, but eventually the spaghetti piles up to the point that not even more spaghetti will fix it
I'll bet that's going to get better over the next few years, with better tooling and better ways to get the AI to figure out/remember relevant parts of the code base, but that's just my guess
The AI legal analysis seemed to be the nail in the coffin.
Adding AI-generated comments is IMHO one of the rudest uses of AI.
Not sure what exactly you're referring to, but legal is a very interesting field to observe, right? I've been wondering about that since quite early in my LLM awareness:
A slightly sarcastic (or perhaps not so slightly...) mental model of legal conflict resolution is that much of it boils down to throwing lots of content at the opposing side, claiming that it shows that the represented side is right, and creating a task for the opposite side to find a flaw in that material. I believe that this game of quantity runs through the whole range from "I'll have my lawyer repeat my argument in a letter featuring their letterhead" all the way to paper tsunamis like the Google-Oracle trial.
Now give both sides access to LLMs... I wonder if the legal profession will eventually settle on some format of in-person, offline resolution with strict limits on recess and/or limits on word count for both documents and notes, because otherwise conflicts will fail to get settled in anyone's lifetime (or be won by whoever does not run out of tokens first; come to think of it, the technogarchs would love this, so I guess this is exactly what will happen barring a revolution)
Ah, sorry. I am not referring to using LLMs for legal work.
I am referring to the act of merely pasting the output of a model as a comment.
Have the decency to understand what the LLM is writing and write your own message.
That comment is wild
> Here's the AI-written copyright analysis...
I'm not going to spend more time reading than what you have spent writing!
What do you mean "not exactly sure what you are referring to"?
The guy just posted a huge AI slop PR; do you think that's the correct place for "very interesting field observations about legal"?
What else could it refer to than the fact that you can't back up copyright ownership questions with "AI said so"??
Some maintainers who drank the Kool-Aid just use AI to answer issues and review PRs.
Pretty soon we'll have AIs talking to each other.
I get infuriated just from reading that, I wish I had as much patience as the maintainers on that project.
I just read that whole thread and I think the author made the mistake of submitting a 13k loc PR, but other than that - while he gets downvoted to hell on every comment - he's actually acting professionally and politely.
I wouldn't call this a fiasco; it reads to me more like being able to create huge amounts of code - whether the end result works well or not - breaks the traditional model of open source. Small contributions can be verified, and the merit-vs-maintenance-effort trade-off can at least be assessed somewhat more realistically.
I have no bones in the "vibe coding sucks" vs "vibe coding rocks" discussion and I read that thread as an outsider. I cannot help but find the PR author's attitude absolutely okay, while the compiler folks are very defensive. I do agree with them that submitting a huge PR without prior discussion cannot be the way forward. But that's almost orthogonal to the question of whether AI-generated code is or is not of value.
If I were the author, I would probably take my 13k loc proof-of-concept implementation and chop it down into bite-size steps that are easy to digest, and try to get them to get integrated into the compiler successively, with being totally upfront about what the final goal is. You'd need to be ready to accept criticism and requests for change, but it should not be too hard to have your AI of choice incorporate these into your code base.
I think the main mistake of the author was not to use vibe coding, it was to dream up his own personal ideal of a huge feature, and then go ahead and single-handedly implement the whole thing without involving anyone from the actual compiler project. You cannot blame the maintainers for not being crazy about accepting such a huge blob.
"Why did you submit these files that are attributed to a completely unrelated person?"
"Beats me"
Do you consider this "professional and polite"? Raise your standards.
Minimally, I don't find this an unusual tone in the slightest for CS threads. But then again, I'm old.
I'm also quite surprised that apparently you cannot utter what is clearly just a personal opinion -- not a claim of objective truth -- without getting downvoted. But then again, the semantics of votes are not well-defined.
At the same time, I'm quite grateful for the constructive comments further down below under my original post.
He is not polite, he is of the utmost rudeness. As a reply to being pointed to the fact that he copied so much code that the generated code included someone else's name in the License, his reply was https://github.com/ocaml/ocaml/pull/14369/changes/ce372a60bd...
I struggle to think how someone thinks this is polite. Is politeness to you just not using curse words?
Admittedly, his handling of this aspect was perhaps less than ideal, but I cannot see any impoliteness here whatsoever. As a matter of fact, I struggle to think how you could think otherwise.
But I am biased. After having lived a number of years in a country where I would say the average understanding of politeness is vastly different from where I've grown up, I've learned that there is just a difference of opinion of what is polite and what isn't. I have probably been affected by that too.
You sound like you'd characterize a thief as polite if he asked you please when taking your wallet.
Ah, I see what you mean - you're making a distinction between someone's speech and someone's acts. Fair enough. In that sense, you would argue that the action of dropping a 13k loc PR is impolite, and I can see that.
It's just that in my reading, I did not find his demeanor in the comment thread to be impolite. He was trying to sell his contribution and I think that whatever he wrote was using respectful language.
Dropping an unreviewable 13000 lines PR is disrespectful to the reviewers and their time.
Doing it without any prior discussion with the maintainers is disrespectful to the maintainers and the architecture work they put in.
Trying to "sell" your contribution is disrespectful and implies you know better than the maintainers.
Cockily saying "the AI knows better than you" is disrespectful.
Respectful and polite language does not prevent being disrespectful.
He responds to a thoughtful and detailed 600-word comment from a maintainer with a dismissive "Here's the AI-written copyright analysis..." + thousands of words of slop.
The effort asymmetry is what's rude. The maintainers take their project very seriously (as they should) and are generous enough with their time to invite contribution from outsiders. Showing up and dropping 13k lines of code, posting comments copy+pasted from a chat window, and insisting that your contribution is trustworthy not because you thought it through but because you fed it through a few LLMs shows that you don't respect the maintainers' time. In other words: you are being rude. They would have to put in more upfront effort to review your contribution than you put in to create it! Then they have to maintain it in perpetuity.
> but I cannot see any impoliteness here whatsoever.
Ah yes. "It's AI I don't care" and "AI has very deep reasoning about code, prove me wrong" are the height of politeness.
Well, I wouldn't necessarily call it "going out of your way to be accommodating", but impolite is just not the word I'd choose to characterize it. I can see why others might but it's just my personal feeling that I don't think that this is the correct adjective here.
That said, I don't feel like this topic is important enough to go on about; I've probably spent enough keystrokes on it already.
That's how politicians and passive-aggressive people hide their rudeness/impoliteness: under a veneer of polite-sounding phrases.
But I guess we could've used a few other synonyms: https://www.merriam-webster.com/thesaurus/impolite inconsiderate, thoughtless, impertinent (2nd meaning) :)
interpreting his words on a literal basis , the PR submitter isn't being directly impolite ...
if you will , place yourself in the shoes of the repository maintainer. a random person (with a personal agenda) has popped up trying to sell you a solution (that he doesn't understand) to a problem (that you don't see as problematic). after you spending literal hours patiently explaining why the proposition is not acceptable , this random person still continues attempting to sell his solution.
do you see any impoliteness in the reframed scenario ?
I think there's nothing wrong with trying to sell your solution, and I'm skeptical about the "literal hours" that you claim.
The way I interpret this thread is that the PR poster had a certain itch and came up with a vibe-coded solution that helped him. Now he's trying to make that available for others too. The maintainers don't want it because it's too large a PR to review properly and because they don't want to have to maintain it afterwards.
I can totally see both positions.
I was just referring to the fact that - in my opinion - unlike others here, his writing did not appear impolite to me. But you know, that's just me. I thought that he was trying to sell his code, and it's not unusual to get rejected at first, so I can't blame him for trying to defend his contribution. All I'm saying is that I thought he did so in a respectful manner, but of course you could argue that the whole endeavor was already an act of impoliteness, in a way?!
> his writing did not appear impolite to me
I learnt something from this thread.
That respectfulness and politeness come more from intentions and actions than from speech alone. Politeness of language without any respect for the actual function of that speech is pointless. Indeed, that is what LLMs are trained for: form over function. And many humans get fooled by it and are also clueless, like the person dropping the steaming turd of a PR.
that is indeed what is being argued , is it possible to screw someone over politely ?
Simple: search for that user. He's a grifter with many failed ventures who recently started flooding big projects with useless PRs
everybody should collectively tell him to fuck off
That may or may not be the case - I really was just going off this one thread, and how I personally read it. I completely appreciate that others read it differently.
This to me sounds a lot like the SpaceX conversation:
- Ohh look it can [write small function / do a small rocket hop] but it can't [ write a compiler / get to orbit]!
- Ohh look it can [write a toy compiler / get to orbit] but it can't [compile linux / be reusable]
- Ohh look it can [compile linux / get reusable orbital rocket] but it can't [build a compiler that rivals GCC / turn the rockets around fast enough]
- <Denial despite the insane rate of progress>
There's no reason to keep building this compiler just to prove this point. But I bet it would catch up real fast to GCC with a fraction of the resources if it was guided by a few compiler engineers in the loop.
We're going to see a lot of disruption come from AI assisted development.
All these people that built GCC and evolved the language did not have the end result in their training set. They invented it. They extrapolated from earlier experiences and knowledge; LLMs only ever accidentally stumble into "between unknown manifolds" when the temperature is high enough, and they interpolate with noise (in so many senses). The people building GCC together did not only solve a technical problem. They solved a social one, agreeing on what they wanted to build, for what and why. LLMs are merely copying these decisions.
That's true and I fully agree. I don't think LLMs' progress in writing a toy C compiler diminishes the achievements that the GCC project did.
But also we've just witnessed LLMs go from being a glorified line auto-complete tool to writing a C compiler in ~3 years. And I think that's something. And it's worth noting how we keep moving the goalposts.
GP: "it didn't write a C compiler, it copied other compilers. Writing one from scratch is a lot harder."
You: "but look! It wrote a C compiler!"
The pattern matching rote-student is acing the class. No surprises here. There is no need to understand the subject from first principles to ace tests. Majority of high-school and college kids know this.
> LLMs are merely copying these decisions.
This, I strongly suspect, is the crux of the boundaries of their current usefulness. Without accompanying legibility/visibility into the lineage of those decisions, LLMs will be unable to copy the reasoning behind the "why", missing out on a pile of context that I'm guessing is necessary (just like with people) to come up to speed on the decision flow going forward as the mathematical space for the gradient descent to traverse gets both bigger and more complex.
We're already seeing glimmers of this as the frontier labs are reporting that explaining the "why" behind prompts is getting better results in a non-trivial number of cases.
I wonder whether we're barely scratching the surface of just how powerful natural language is.
All right, but perhaps they should also list the grand promises they made and failed to deliver on. They said they would have fully self-driving cars by 2016. They said they would land on Mars in 2018, yet almost a decade has passed since then. They said they would have Tesla's fully self-driving robo-taxis by 2020 and human-to-human telepathy via Neuralink brain implants by 2025–2027.
> - <Denial despite the insane rate of progress>
Sure, but not by what was actually promised. There may also be fundamental limitations to what the current architecture of LLMs can achieve. The vast majority of LLMs are still based on Transformers, which were introduced almost a decade ago. If you look at the history of AI, it wouldn't be the first time that a roadblock stalled progress for decades.
> But I bet it would catch up real fast to GCC with a fraction of the resources if it was guided by a few compiler engineers in the loop.
Okay, so at that point, we would have proved that AI can replicate an existing software project using hundreds of thousands of dollars of computing power and probably millions of dollars in human labour costs from highly skilled domain experts.
There's an argument to be made that replicating existing software is extremely useful.
Most of the time when you're writing a compiler for a new language, you'll be doing things that have been done before.
Because most of the concepts in your language are brought along from somewhere else.
That said: I'd always want a compiler and language designs to be well considered. Ideally, the authors have some proofs of soundness in their heads.
Perhaps LLMs will make formal verification more feasible (from a cost perspective), and then our minds about what reliable software is might change.
> the insane rate of progress
Yeah but the speed of progress can never catch the speed of a moving goalpost!
What about the hype? If you claim your LLM generated compiler is functionally on par with GCC I’d expect it to match your claim.
I still won’t use it while it also matches all the non-functional requirements but you’re free to go and recompile all the software you use with it.
> Yeah but the speed of progress can never catch the speed of a moving goalpost!
How do you like those coast-to-coast self drives since the end of 2017?
Training data only teaches it how to reach the goalpost, not how to overtake it.
Are we sure about that? I mean, we have seen that LLMs are able to generalize to some degree. So I don't see a reason why you couldn't put an agent in a loop with a profiler and have it try to optimize the code. Will it come up with entirely novel ideas? Unlikely. Could it potentially combine existing ideas in interesting, novel ways that would lead to CCC outperforming GCC? I think so. Will it get stuck along the way? Almost certainly.
Would you want it to? The further the goal posts are the more progress we are making, and that's good, no? Trying to make it into a religious debate between believers and non-believers is silly. Neither side can predict the future, and, even if they could, winning the debate is not worth anything!
What is interesting is what we can do with LLMs today and what we would like them to be able to do tomorrow, so we can keep developing them in a good direction. Whether or not you (or I) believe it can do that thing tomorrow is thoroughly uninteresting.
The goalpost is not moving. The issue is that AI generates code that kinda looks ok but usually has deep issues, especially the more complex the code is. And that isn't really improving.
There are two questions which can be asked of both. The first one is "can these technologies achieve their goals?", which is what you seem to be debating. The other question is "is a successful outcome of these technologies desirable at all?". One is making us pollute space faster than ever, as if we did not fuck the rest enough. The other will make a few very rich people even richer and probably everyone else poorer.
Interesting that people call this "progress" :)
The difference I see is that, after "get to orbit", the goalposts for SpaceX are things that have never been done before, whereas for LLMs the goalposts are all things that skilled humans have been able to do for decades.
AI assist in software engineering is unambiguously demonstrated to some degree at this point: the "no LLM output in my project" stance is cope.
But "reliable, durable, scalable outcomes in adversarial real-world scenarios" is not convincingly demonstrated in public, the asterisks are load bearing as GPT 5.2 Pro would say.
That game is still on, and AI assist beyond FIM is still premature for safety critical or generally outcome critical applications: i.e. you can do it if it doesn't have to work.
I've got a horse in this race which is formal methods as the methodology and AI assist as the thing that makes it economically viable. My stuff is north of demonstrated in the small and south of proven in the large, it's still a bet.
But I like the stock. The no free lunch thing here is that AI can turn specifications into code if the specification is already so precise that it is code.
The irreducible heavy lift is that someone has to prompt it, and if the input is vibes the output will be vibes. If the input has zero rigor... you've just moved the cost around.
The modern software industry is an expensive exercise in "how do we capture all the value and redirect it from expert computer scientists to some arbitrary financier".
You can't. Not at less than the cost of the experts if the outcomes are non-negotiable.
What is FIM ?
In 1908 the Model T could do 45mph.
In 1935 the Auburn 851 S/C Speedster hit 100mph
In 1955 the Mercedes-Benz 300 SL Gullwing did 161mph
In 2025 the Yangwang U9 Xtreme hit 308mph
progress is a decaying exponential - Tsiolkovsky's tyranny
And all these improvements past 1935 have been rendered irrelevant to the daily driver by safety regulations (I'll limit this claim to most of the continental US to avoid straying beyond my experience.)
These specific points look like a line if you plot them
You can be wrong on every step of your approximation and still be right in the aggregate. E.g. order of magnitude estimate, where every step is wrong but mistakes cancel out.
Human crews on Mars is just as far fetched as it ever was. Maybe even farther due to Starlink trying to achieve Kessler syndrome by 2050.
> This to me sounds a lot like the SpaceX conversation
The problem is that it is absolutely indiscernible from the Theranos conversation as well…
If Anthropic stopped telling lies about the current capability of their models (like “it compiles the Linux kernel” here, but it's far from the first time they've done that), maybe neutral people would give them the benefit of the doubt.
For one grifter that happens to succeed at delivering his grandiose promises (Elon), how many grifters will fail?
Exactly. This flawed argument by which everything will be fixed by future models drives me crazy every time.
Just a couple more trillion and 6 more months!
That’s been the trend for a while. Can you make a prediction that says something concretely like “AI will not be able to do X by 2028” for a specific and well defined X?
In 2030, an AI model that I can run on my computer, without having to trust an evil megacorporation, will not be able to write a compiler for my markup language [0] based on a corpus of examples, without seeing the original implementation, using no more than 1.5× as much code as I did.
https://git.sr.ht/~xigoi/hilda
Were any made about 2025?
So far it has been accurate though. Models have gotten much better than even the most optimistic predictions.
No? The most optimistic predictions involved AGI being around the corner; it's been "6 months until no more developers" for years now.
... Eh? A few years back, the usual suspects were predicting AGI by, usually, either 2026 or 2027. Think that's gonna happen?
(Such predictions have been quietly forgotten or revised forward, in general.)
No, I don't, but it sounds very similar to the naysayers that have silently moved the goalposts. That said, you're one of the few people in the wild that still claims LLMs are completely useless, so I give you that.
You said:
> Models have gotten much better than even the most optimistic predictions.
We were promised Roko's Basilisk by now, damnit! Where's my magical robot god?!
But seriously, predictions a couple years back for 2026/27 (by quite big players, like Altman) were for AGI or as good as.
I do not, for the record, claim that they are totally useless. They are useful where correctness of results does not matter, for instance low-stakes natural language translation and spam generation. There's _some_ argument that they are somewhat useful in cases where their output can be reviewed by an expert (code generation etc), though honestly quantitative evidence there is mixed at best; for all the "10x developer" claims, there's not much in the way of what you'd call hard evidence.
> Pro: Sure, maybe now, but the next generation will fix it.
Do we need a c2 wiki page for "sufficiently smart LLM" like we do for https://wiki.c2.com/?SufficientlySmartCompiler ?
> This is still true - a compiler can never win this battle. All a human programmer has to do is take the output of the compiler and make a single optimisation, and he/she wins. This is the advantage that the human has - they can use any of a wide variety of tools at their disposal (including the compiler), whilst the compiler can only do what it was programmed. The best the compiler can hope for is a tie.
This is awkward
And not to mention that a C compiler is something we have literally 50 years worth of code for. I still seriously doubt the ability of LLMs to tackle truly new problems.
What do you classify as new? Every problem that we solve as developers is a very small deviation from already existing problems. Maybe that’s the point of llms?
How many developers do you think are solving truly novel problems? Most like me are CRUD bunnies.
If your problem is a very small deviation from an existing problem, you should be able to take an existing open-source solution and make a very small modification to adapt it to your use case. No need for “vibe-coding” a lower-quality implementation from scratch.
Yeah, it kind of strikes me how a lot of the LLM use cases would actually be better served by existing techniques, like more/better libraries. And if that's not possible, it'd be way better to find the closest match, fork it, and make minimal modifications. At least then you have the benefit of an upstream.
But, sort of like cryptocurrency, the LLM people aren't so much trying to solve actual problems, but rather to find an application for their existing technology. Sort of like the proverbial saying: when you're selling hammers, you want to convince everyone that their problem is a nail.
And these developers do not write the majority of their codebase, they use tons of libraries and only write the glue code.
As an Anti, my argument is "if AI will be good in the future, then come back in the future"
As a pro, my argument is "it's good enough now to make me incredibly productive, and it's only going to keep getting better because of advancements in compute".
I'd rather get really good at leveraging AI now than to bury my head in the sand hoping this will go away.
I happen to agree with the saying that AI isn't going to replace people, but people using AI will replace people who don't. So by the time you come back in the future, you might have been replaced already.
It sure is possible that one person using AI effectively may replace 10 people like me. It is just as likely that I may replace 10 people who only use AI.
> I'd rather get really good at leveraging AI now than to bury my head in the sand hoping this will go away.
I don't think those are the only two options, though.
Further, "Getting really good at leveraging AI" is very different to "Getting really good at prompting LLMs".
One is a skill that might not even result in the AI providing any code. The other is a "skill" in much the same way as winning hotdog eating contests is a "skill".
In the latter, even the least-technical user can replace you once they get even halfway decent at min-maxing their agent's input (md files, although I expect we'll switch away from that soon enough to a cohesive and structured UI).
In the former, you had better find some really difficult problems that pay when you solve them.
Either way, I predict a lot of pain and anguish in the near future, for a lot of people. Especially those who expect that prompting skills are an actual "skill".
Why would anything you learn today be relevant tomorrow if AI keeps advancing? You would need less and less of all your tooling, markdown files and other rituals and just let the AI figure it out altogether.
So I can keep my job now so I can pay for compute in the future when I'm out of a job. The compute will be used to create my own business to make money.
What makes you think you’ll be able to out-compete the purely-AI-led businesses with your business? What skills will give you an edge in the business that won’t also give you an edge in the job?
> What makes you think you’ll be able to out-compete the purely-AI-led businesses with your business? What skills will give you an edge in the business that won’t also give you an edge in the job?
Or why do you think your small AI-driven business can survive against richer people who can pay for more compute and thus do better than you?
AI may just turn software into a pay to win game.
Anyways, I'm heavily invested in compute companies right now.
>> Or why do you think your small AI-driven business can survive against richer people who can pay for more compute and thus do better than you?
> I don't know. Maybe because of my creativity?
How would you keep that up? I think there's a false belief, especially common in business-inflected spaces (which includes the tech sector), that skills and abilities can be endlessly specialized (e.g. the MBA claim that they're experts at running businesses, any business). The more you outsource to AI the less creative you'll become, because you'll lose touch with the work.
You can only really be creative in spaces where you regularly get your hands dirty. Lose that, and I think your "creativity" will become the equivalent of the MBA offering the same cookie-cutter financial engineering ideas to every business (layoffs, buy back stock, FOMO the latest faddish ideas, etc).
Maybe I can't. But that's also why I'm invested into AI companies right now.
Plan:
1. Keep my job for as long as I can by leveraging current AI tools to the best of my ability while my non-AI user colleagues lose earnings power or their job
2. Invest my money into AI companies and/or S&P500
3. If I'm truly rendered useless, live off of the investment
You got any better ideas? How would you do this?
I believe that I’ll keep enough of an edge in my job that I’ll continue to be employed. (At present I still have zero pressure to use AI, though I do use it.) It’s of course possible for that to turn out to be wrong, but in that case I also see no chance to start a business, and society will be in a lot of trouble.
> The compute will be used to create my own business to make money
tell me you've never seriously attempted starting a software business without telling me
It's like all these people for the last 2 decades "if only I knew how to code I'd make an app and be rich!" yeah sure you would
Those are two different things though, and not everyone is stuck at a place enforcing token usage. And why would anyone pay you for something if all it takes is compute to make it? They would just make it themselves.
You didn't even mention that this vibe-coded toy compiler cost $20k in token spend. That's an insane amount of money for what this is.
It seems at least comparable to what you would have to pay a suitably competent developer to code up something similar.
> It seems at least comparable to what you would have to pay a suitably competent developer to code up something similar.
No. You could have someone of fiverr do it for $5. Download gcc, done.
If only the Claude team knew that, they could have saved a lot of tokens.
> this is all true, and yes, this is interesting. But there are so many other questions around this tech. Let's not rush into it and mess everything up.
That's a really nice fictitious conversation, but in my experience "anti-AI" people would be prone to say "This is stupid, LLMs will never be able to write complex code and attempting to do so is futile". If your mind is open to exploring how LLMs will actually write complex software then by definition you are not "anti".
I think you also forgot: Anti: But the whole thing can only have been generated because GCC and other compilers already exist (and, depending on how strong the anti-feeling is: and have been stolen…)!
Two completely valid perspectives.
Unless you need a correctly compiled Linux kernel. In that case one gets exhausting real quick.
The perspective that says "a whole compiler in just a few hours" is making false claims. So not a valid perspective.
I don't think this is how pro and anti conversation goes.
I think the pro would tell you that if GCC developers could leverage Opus 4.6, they'd be more productive.
The anti would tell you that it doesn't help with productivity, it makes us less versed in the code base.
I think the CCC project was just a demonstration of what Opus can now do autonomously. 99.9% of software projects out there aren't building something as complex as a C compiler that can build the Linux kernel.
> It's not fair to compare them like this!
As someone who leans pro in this debate, I don't think I would make that statement. I would say the results are exactly as we expect.
Also, a highly verifiable task like this is well suited to LLMs, and I expect within the next ~2 years AI tools will produce a better compiler than gcc.
Don't forget that gcc is in the training set.
That's what always puts me off: when AI replaces artists, SO and FOSS projects, it can only feed into itself and deteriorate..
The AlphaZero approach shows otherwise, as long as there is an automated way to generate new test cases and evaluate the outcomes.
We can't do it for all domains, but I believe we can for efficient code.
Today's models are probably already good enough to compose tasks and evaluate the results.
It can feed into itself and improve. The idea that self-training necessarily causes deterioration is fanfic. Remember that they spend massive amounts of compute on RL.
> I expect within the next ~2 years AI tools will produce a better compiler than gcc.
Building a "better compiler than gcc" is a matter of cutting-age scientific research, not of being able to write good code
Given that GCC is in the training data, it should not take much research to create an equally good compiler.
The same two years as in "full self driving available in 2 years"?
Right.
These are different technologies with different rates of demonstrated growth. They have very little to do with each other.
Well let's check again in two years then.
> and I expect within the next ~2 years AI tools will produce a better compiler than gcc
and the "anti" crowd will point to some exotic architecture where it is worse
No, they will point out that the way to make GCC better is not really in the code itself. It's in scientific paper writing and new approaches. Implementation is really not most of the work.
But only if there is a competent compiler engineer running the AI, reviewing specs, and providing decent design goals.
Yes it will be far easier than if they did it without AI, but should we really call it “produced by AI” at that point?
Yes, we will certainly go that way; probably some code already added to gcc has been developed with collaborative AI tools. Agreed, we don't call that "produced by AI".
I think compilers though are a rare case where large scale automated verification is possible. My guess is that starting from gcc, and all existing documentation on compilers, etc. and putting ridiculous amounts of compute into this problem will yield a compiler that significantly improves benchmarks.
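For the curious, the shape of that kind of large-scale automated verification is basically differential testing: treat GCC as the known-good oracle, feed both compilers the same source, and flag any divergence. A minimal, purely illustrative sketch (the candidate compiler name "./my-cc", the paths, and the corpus are my own assumptions, not anything from the article):

```rust
// Illustrative sketch of differential testing against GCC as a
// known-good oracle. The candidate compiler path ("./my-cc"), the
// output paths, and the test corpus are assumptions for the example.

use std::process::Command;

/// Compile `src` with `compiler`, run the resulting binary, and return
/// (exit code, stdout). Panics on harness-level failures to keep the
/// sketch short.
fn compile_and_run(compiler: &str, src: &str, out: &str) -> (i32, String) {
    let status = Command::new(compiler)
        .args([src, "-o", out])
        .status()
        .expect("failed to invoke compiler");
    assert!(status.success(), "{compiler} failed to compile {src}");

    let run = Command::new(out).output().expect("failed to run binary");
    (
        run.status.code().unwrap_or(-1),
        String::from_utf8_lossy(&run.stdout).into_owned(),
    )
}

fn main() {
    // A real harness would generate or sample thousands of translation
    // units (e.g. kernel sources); one hard-coded case keeps this short.
    let cases = ["tests/case_0001.c"];

    for src in cases {
        let oracle = compile_and_run("gcc", src, "/tmp/oracle_bin");
        let candidate = compile_and_run("./my-cc", src, "/tmp/candidate_bin");

        if oracle != candidate {
            eprintln!("MISMATCH in {src}: gcc={oracle:?}, my-cc={candidate:?}");
        }
    }
}
```

A real harness would also diff generated output under sandboxes and timeouts and reduce failing inputs, but the core loop really is this small, which is what makes compilers such a good fit for throwing compute at.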
> Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!
> Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
Also, from the Anti-LLM perspective: did the coding agent actually build a working compiler, or just plagiarize prior art? C compilers are certainly part of the LLM's training set.
That's relevant because the implication seems to be: "Look, the agent can successfully develop really advanced software!" when the reality may be that it can plagiarize existing advanced software, and will fall on its face if asked to do anything not already done before.
A lot of propaganda and hype follows the pattern of presenting things in a way that creates misleading implications in the mind of the listener that the facts don't actually support.
It seems that the cause of the difference in opinion is that the anti camp is looking at the current state while the pro camp is looking at the slope and projecting it into the future.
This is spot on, you can find traces of this conversation in the original thread posted on HN as well, where people are proclaiming "yeah it doesn't work, but still impressive!"
Reminds me so much of the people posting their problems about the tesla cybertruck and ending the post with "still love the truck though"
Pretty much. It's missing a tiny detail though. One side is demanding we keep giving hundreds of billions to them and at the same time promising the other side's unemployment.
And no-one ever stops and thinks about what it means to give up so much control.
Maybe one of those companies will come out on top. The others produce garbage in comparison. Capital loves a single throat to choke and doesn't gently pluralise. So of course you buy the best service. And it really can generate any code, get it working, bug free. People unlearn coding at this level. And some day, poof, Microsoft comes around and faces the tiny problem that the model can generate a working Office clone. Or whatever, it's just an example.
This technology will never be used to set anyone free. Never.
The entity that owns the generator owns the effective means of production, even if everyone else can type prompts.
The same technology could, in a different political and economic universe, widen human autonomy. But that universe would need strong commons, enforced interoperability, and a cultural refusal to outsource understanding.
And why is this different from abstractions that came before? There are people out there understanding what compilers are doing. They understand the model from top to bottom. Tools like compilers extended human agency while preserving a path to mastery. AI code generation offers capability while dissolving the ladder behind you.
We are not merely abstracting labor. We are abstracting comprehension itself. And once comprehension becomes optional, it rapidly becomes rare. Once it becomes rare, it becomes political. And once it becomes political, it will not be distributed generously.
Nah bro, it makes them productive. Get with the program. Amazing. Fantastic. Of course it resonates with idiots, because they can't think beyond the vicinity of their own greed. We are doomed, no one gives two cents. Idiocracy is here and it's not Costco.
Sorry! Of course.
What an amazing tech. And look, the CEOs are promising us a good future! Maybe we can cool the datacenters with Brawndo. Let me ask chat if that is a good idea.
You could make the same argument in the "information superhighway" days, but it turned out to be the opposite: no company monopolised internet services, despite trying hard.
With so many companies in the AI race, it is already a pretty competitive landscape, and it doesn't seem likely to me that any of them can build a deep enough moat to come out ahead.
Internet services have been centralised into a few ISPs and a few websites everyone visits
A few? All sorts of websites and services are thriving on the Internet, even after the significant consolidation of attention that social media caused. Not even close to the dystopian picture the parent comment paints.
90% of eyeball views are using the 5 sites each filled with screenshots of the other 4
I don't feel that I see this anywhere but if so, I guess I'm in a third camp.
I am "pro" in the sense that I believe that LLM's are making traditional programming obsolete. In fact there isn't any doubt in my mind.
However, I am "anti" in the sense that I am not excited or happy about it at all! And I certainly don't encourage anyone to throw money at accelerating that process.
> I believe that LLMs are making traditional programming obsolete. In fact there isn't any doubt in my mind.
Is this what AI psychosis looks like? How can anyone that is a half decent programmer actually believe that English + non-deterministic code generator will replace "traditional" programming?
That's also my take, vibe coding as a non-deterministic 4GL. https://en.wikipedia.org/wiki/Fourth-generation_programming_...
4GLs are productive yes, but also limited, and still require someone to come up with specs that are both informed by (business) realities and engineering considerations.
But this is also an arena where bosses expect magic to happen when people are involved; just pronounce a new strategy, and your business magically transforms - without any of that pesky 'figuring out what to do' or 'aligning stakeholders' or 'wondering what drugs the c-suite is doing'. Let LLMs write the specs!
Of all takes, I find this most honest and believable. Not many would want a disruption of their stable life
> One side is demanding we keep giving hundreds of billions to them and at the same time promising the other side's unemployment.
That's a valid take. The problem is that there are, at this time, so many valid takes that it's hard to determine which are more valid/accurate than the other.
FWIW, I think this is more insightful than most of the takes I've seen, which basically amount to "side-1: we're moving to a higher level of abstraction" and "side-2: it's not higher abstraction, just less deterministic codegen".
I'm on the "higher level of abstraction" side, but that seems to be very much at odds with however Anthropic is defining it. Abstraction is supposed to give you better high-level clarity at the expense of low-level detail. These $20,000 burning, Gas Town-style orchestration matrices do anything but simplify high level concerns. In fact, they seem committed building extremely complex, low-level harnesses of testing and validation and looping cycles around agents upon agents to avoid actually trying to deal with whatever specific problem they are trying to solve.
How do you solve a problem you refuse to define explicitly? We end up with these Goodhart's Law solutions: they hit all of the required goals and declare victory, but completely fail on every reasonable metric that matters. Which I guess is an approach you take when you are selling agents by the token, but I don't see why anyone else is enamored with this approach.
You’re copping downvotes for this, but you’re not wrong.
“It will get better, and then we will use it to make many of you unemployed”
Colour-me-shocked that swathes of this industry might have an issue with that.
> Pro: but it's only been a few years since we started using LLMs, and a year or so since agents. This is only the beginning!
The billion dollar question is, can we get from 80% to 100%? Is this going to be a situation where that final gap is just insurmountable, or will the capabilities simply keep increasing?
> Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!
> Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
Pro-LLM: Read the freaking article, it's not that long. The compiler made a mistake in an area where only two compilers exist that are up to the task: compiling the Linux kernel.
Anthropic said they vibe-coded a C compiler that could compile the Linux kernel. That's what they said. No-one forced them to say that. They could have picked another code base.
It turns out that isn't true in all instances, as this article demonstrates. I'm not nearly expert enough to be able to decide if that error was simple, stupid, irrelevant, or whatever. I can make a call on whether it successfully compiled the Linux kernel: it did not.
I'm sorry for being excessively edgy, but "it's useless" is not a good summary for "linking errors after successfully compiling Linux kernel for x86_64."
> Because if it’s worth your time to lie, it’s worth my time to correct it.
https://www.astralcodexten.com/p/if-its-worth-your-time-to-l...
Anti-LLM: isn’t all this intelligence supposed to give us something better than what we already have?
Me: Top 0.02%[1] human-level intelligence? Sure. But we aren't there yet.
[1] There are around 8k programming languages that are or were used in practice (that is, they were deemed better than existing ones in some respect) and around 50 million programmers. I use that ratio to estimate how many people have built something objectively better than existing products: roughly 8k / 50M ≈ 0.02%.
> Read the freaking article
The freaking article omits several issues in the "compiler". My bet is that they didn't actually challenge the output of the LLM, as usually happens.
If you go to the repository, you'll find fun things, like the fact that it cannot compile a bunch of popular projects, and that it compiles others but the code doesn't pass the tests. It's a bit surprising, especially when they don't explain why those failures exist (are they missing support for some extensions? any feature they lack?)
It gets less surprising, though, when you start to see that the compiler doesn't actually do any type checking, for example. It allows dereferences to non-pointers. It allows calling functions with the wrong number of arguments.
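For illustration, snippets of this shape (hypothetical examples, not taken from the repo) must be rejected, or at least diagnosed, by any conforming C compiler:

    int add(int a, int b) { return a + b; }

    int main(void) {
        int x = 42;
        int y = *x;        /* dereferencing a non-pointer: a constraint violation */
        int z = add(1);    /* calling a prototyped function with too few arguments */
        return y + z;
    }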
There's also this fantastic part of the article where they explain that the LLM got the code to a point where any change or bug fix breaks a lot of the existing tests, and that further progress is not possible.
Then there's the fact, pointed out in this article, that the kernel doesn't actually link. How did they "boot it"? It might very well be possible that it crashed soon after boot and wasn't actually usable.
So, as usual, the problem here is that a lot of people look at LLM outputs and trust what they're saying they achieved.
The purpose of this project is not to create a state-of-the-art C compiler on par with projects that represent tens of thousands of developer-years. The goal is to assess the current capabilities of a largely autonomous software-building pipeline: it's not yet limitless, but better than it was. What a shocker.
I’ve had my share of build errors while compiling the Linux kernel for custom targets, so I wouldn’t be so sure that linker errors on x86_64 can’t be fixed with changes to the build script.
> The goal is to assess the current capabilities of a largely autonomous software-building pipeline: it's not yet limitless, but better than it was. What a shocker.
Of course, but we're trying to assess the capabilities by looking at the LLM output as if it were a program written by a person. If someone told me to check out their new C compiler that can build the kernel, I'd assume that other basic things, such as not compiling incorrect programs, are already pretty much covered. But with an LLM we can't assume that. We need to really check what's happening and not trust the agent's word for it.
And the reason it's important is that we really need to check whether it's actually "better than it was" or just "doing things incorrectly for longer". Let's say your goal was writing a gcc replacement. Does this autonomous pipeline get you closer? Or does it just get you farther away down the wrong path? Considering that it's full of bugs and incomplete implementations and cannot be changed without things breaking down, I'd say it seems to be the latter.
That's such a strawman conversation. Starting from:
> it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
It works. It's not perfect, but anthropic claims to have successfully compiled and booted 3 different configurations with it. The blog post failed to reproduce one specific version on one specific architecture. I wish anthropic gave us more information about which kernel commits succeeded, but still. Compare this to the years it took for clang to compile the kernel, yet people were not calling that compiler useless.
If anyone thinks other compilers "just work", I invite them to start fixing packages that fail to build in nixos after every major compiler change, to get a dose of real world experience.
We could be colonizing Mars with Claude Code and there will always be some skeptic somewhere.
I read a Youtube comment recently on pro AI video, it was
"The source code of gcc is available online"
This encompasses all the back-and-forth arguments I can think of. I would assume that proponents will eventually also mention AGI or ASI. :)
This is a pattern I see a lot, in programming language communities too, where it's a source of joy and dreams first and facts later.
Maybe Anthropic can sponsor a research team to polish this using just an agent. A lot of things can be learned from that exercise.
I think LLMs, as a technology, are very cool and I’m frankly amazed at what they can do. What I’m ‘anti’ about is pushing the entire economy all in on LLM tech. The accelerationist take of ‘just keep going as fast as possible and it will work out, trust me bro’ is the most unhinged dangerous shit I’ve ever heard and unfortunately seems to be the default worldview of those in charge of the money. I’m not sure where all the AI tools will end up, but I am willing to bet big that the average person is not going to be better off 10 years from now. The direction the world is going scares the shit out of me and the usage of AI by bad actors is not helping assuage that fear.
Honestly? I think if we as a society could trust our leaders (government and industry) to not be total dirtbags the resistance to AI would be much lower.
Like imagine if the message was “hey, this will lead to unemployment, but we are going to make sure people can still feed their families during the transition, and maybe look into ways to subsidize retraining programs for people whose jobs have been impacted.” Seems like a much more palatable narrative than, “fuck you pleb! go retrain as a plumber or die in a ditch. I’ll be on my private island counting the money I made from destroying your livelihood.”
What does this imagined conversation have to do with the linked article? The “pro” and “anti” character both sound like the kind of insufferable idiots I’d expect to encounter on social media, the OP is a very nice blog post about performance testing and finding out what compilers do, doesn’t attempt any unwarranted speculation about what agents “struggle with” or will do “next generation”, how is it an example of that sort of shitposting?
Two thoughts here:
First, remember when we had LLMs run optimisation passes last year? Alphaevolve doing square packing, and optimising ML kernels? The "anti" crowd was like "well, of course it can automatically optimise some code, that's easy". And things like "wake me up when it does hard tasks". Now, suddenly when they do hard tasks, we're back at "haha, but it's unoptimised and slow, laaame".
Second, if you could take 100 juniors, 100 mid level devs and 100 senior devs, lock them in a room for 2 weeks, how many working solutions that could boot up linux in 2 different arches, and almost boot in the third arch would you get? And could you have the same devs now do it in zig?
The thing that keeps coming up is that the "anti" crowd is fighting their own demons, and have kinda lost the plot along the way. Every "debate" is about promises, CEOs, billions, and so on. Meanwhile, at every step of the way these things become better and better. And incredibly useful in the right hands. I find it's best to just ignore the identity folks, and keep on being amazed at the progress. The haters will just find the next goalpost and the next fight with invisible entities. To paraphrase - those who can, do; those who can't, find things to nitpick.
You're heavily implying that because it can do this task, it can do any task at this difficulty or lower. Wrong. This thing isn't a human at the level of writing a compiler, and shouldn't be compared to one
Codex frustratingly failed at refactoring my tests for me the other day, despite me trying many, many prompts of increasing specificity. A task a junior could've done
Am I saying "haha it couldn't do a junior level task so therefor anything harder is out of reach?" No, of course not. Again, it's not a human. The comparison is irrelevant
Calculators are superhuman at arithmetic. Not much else, though. I predict this will be superhuman at some tasks (already is) and we'll be better at others
Adopt this half baked, half broken, insanely expensive, planet destroying, IP infringing tech, you have no choice.
Burn everything, because if you don’t, you will get left behind and, maybe, just maybe, in 2 years when it’s good enough, maybe… after hoovering up all the money, IP and domain expertise for free, and you’ve burnt all your money & sanity prompting and cajoling it to a semi working solution for a problem you didn’t really have in the first place, it will dump you at the back of the unemployment line. All hail the AI! Crazy times.
In the meantime please enjoy targeted scams, ever increasing energy prices, AI content farms, hardware shortages, and endless, endless slop.
When humans architect anything - ideas, buildings, software or ice cream sundaes, we make so many little decisions that affect the overall outcome, we don’t even know or think about it! Too many sprinkles and sauce and it will be too sweet and hard to eat. We make those decisions based on both experience and imagination. Watch a small child making one to see the perfect human intersection of these two things at play. The LLM totally lacks the imagination part, except in the worst possible ways. Its experience includes all sorts of random internet garbage that can sound highly convincing even to domain experts. Now its training set is being further expanded with endless mountains of more highly impressive sounding garbage.
It was obvious to me with the first image gen models how incredibly impressive it was to see an image gradually forming from the computer based on nothing but my brief text input but also how painfully limited the technology would always be. After days and days of early obsessive image generation, I was no better as an artist than when I began! Everything also kind of looked the same as well?
As incredible as it was, it was nothing more than a massively complicated, highly advanced parlour trick. A futuristic, highly powerful pattern generator. Nothing has changed my mind at all. All that’s happened is we’ve seen the worst tricksters, shysters and con artists jump on a very dangerous bandwagon to hell and try and whip us less compliant souls onboard.
Lots of things follow patterns, the joy in life, for me, is discovering the patterns, exploring them and developing new unique and interesting patterns.
I’ve yet to encounter a bandwagon worth joining anyway, maybe this will be the one that leaves me behind and i’ll be forced to retire on cartoon gorilla NFTs and tulip farming?
> The thing that keeps coming up is that the "anti" crowd is fighting their own demons, and have kinda lost the plot along the way
perfect example
First off Alpha Evolve isn't an LLM. No more than a human is a kidney.
Second, it depends. If you told them to pretrain for writing a C compiler, for however long it takes, I could see a smaller team doing it in a week or two. Keep in mind LLMs pretrain on all OSS including GCC.
> Meanwhile, at every step of the way these things become better and better.
Will they? Or do they just ingest more data and compute?[1] Again, time will tell. But to me this seems more like speed-running into an Idiocracy scenario than a revolution.[2]
I think this will turn out another driverless car situation where last 1% needs 99% of the time. And while it might happen eventually it's going to take extremely long time.
[1] Because we don't have many more computing jumps left, nor will future data be as clean as it is now.
[2] Why idiocracy?
Because they are polluting their own corpus of data. And by replacing thinking about computers, there will be no one to really stop them.
We'll equalize the human and computer knowledge by making humans less knowledgeable rather than more.
So you end up in an Idiocracy-like scenario where a doctor can't diagnose you, nor can the machine because it was dumbed down by each successive generation, until it resembles a child's toy.
> First off Alpha Evolve isn't an LLM.
It's a system based on LLMs.
> AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization. AlphaEvolve pairs the creative problem-solving capabilities of our Gemini models with automated evaluators that verify answers, and uses an evolutionary framework to improve upon the most promising ideas.
> AlphaEvolve leverages an ensemble of state-of-the-art large language models: our fastest and most efficient model, Gemini Flash, maximizes the breadth of ideas explored, while our most powerful model, Gemini Pro, provides critical depth with insightful suggestions. Together, these models propose computer programs that implement algorithmic solutions as code.
It’s more like a concept car vs a production line model. The capabilities it has were fine tuned for a specific scenario and are not yet available to the general public.
> It's a system based on LLMs.
What you said is:
If I start a sentence: I, and other readers (probably), would think Rossie is a fish. Not a dog. Even if you can technically group dogs as a sort of fish descendant.
I have no idea what you're arguing. Alphaevolve is similar to claude code. They are using LLMs in a harness. No idea what you mean with fish, kidneys and so on. Can you please stick to the technical stuff? Otherwise it's just noise.
I'm arguing your writing is unclear and confusing. You can't continue a sentence and pretend the new sentence isn't related to the previous one.
Alpha Evolve is made from LLMs, but they're not the only part. If anything, it needs a genetic algo component. LLMs generally don't evolve.
Also, why are you focusing on AlphaEvolve? I made two other points you haven't addressed.
You have no idea what alphaevolve is, yet you try to correct me. This isn't productive, I'm out.
First off, let's say I'm wrong about Alpha Evolve. Fine, I made two more points; address them as well; that's just normal manners in a conversation.
Second, I question your idea of what Alpha Evolve is. You seem to think it's an LLM or LLM-adjacent when it's more like an evolutionary algo picking a better seed among the LLMs. That's not an LLM, if anything, it has some ability to correct itself.
The "Anti" stance is only tenable now if you believe LLMs are going to hit a major roadblock in the next few months around which Big AI won't be able to navigate. Something akin to the various "ghosts in the machine" that started bedeviling EEs after 2000 when transistors got sufficiently small, including gate leakage and sub-threshold current, such that Dennard Scaling came to an abrupt end and clock speeds stalled.
I personally hope that that happens, but I doubt it will. Note also that processors still continued to improve even without Dennard Scaling due to denser, better optimized onboard caches, better branch prediction, and more parallelism (including at the instruction level), and the broader trend towards SoCs and away from PCB-based systems, among other things. So at least by analogy, it's not impossible that even with that conjectured roadblock, Big AI could still find room for improvement, just at a much slower rate.
But current LLMs are thoroughly compelling, and even just continued incremental improvements will prove massively disruptive to society.
I'm firmly in the anti/unimpressed camp so far - but of course open to see where it goes.
I mean, this compiler is the equivalent of handing someone a calculator when it was first invented and seeing that it took 2 hours to multiply two numbers together. I would go "cool that you have a machine that can do math, but I can multiply faster by hand, so it's a useless device to me".
I mean - who would honestly expect an LLM to be able to compete with a compiler with 40 years of development behind it? Even more if you count the collective man years expended in that time. The Claude agents took two weeks to produce a substandard compiler, under the fairly tight direction of a human who understood the problem space.
At the same time - you could direct Claude to review the register spilling code and the linker code of both LLVM/gcc for potential improvements to CCC and you will see improvements. You can ask it not to copy GPL code verbatim but to paraphrase and tell it it can rip code from LLVM as long as the licenses are preserved. It will do it.
You might only see marginal improvements without spending another $100K on API calls. This is about one of the hardest projects you could ask it to bite off and chew on. And would you trust the compiler output yet over GCC or LLVM?
Of course not.
But I wager, that if you _started_ with the LLVM/gcc codebases and asked it to look for improvements - it might be surprising to see what it finds.
Both sides have good arguments. But this could be a totally different ball game in 2, 5 and 10 years. I do feel like those who are most terrified by it are those whose identity is very much tied to being a programmer, and seeing the potential for their role to be replaced and I can understand that.
Me personally - I'm relieved I finally have someone else to blame and shout at rather than myself for the bugs in the software I produce. I'm relieved that I can focus now on the more creative direction and design of my personal projects (and even some work projects on the non-critical paths) and not get bogged down in my own perfectionism with respect to every little component until reaching exhaustion and giving up.
And I'm fascinated by the creativity of some of the projects I see that are taking the same mindset and approach.
I was depressed by it at first. But as I've experimented more and more, I've come to enjoy seeing things that I couldn't ever have achieved even with 100 man years of my own come to fruition.
You are missing the most important:
"Pro": give me tons of money to keep going this endeavour.
More seriously: if some LLM or other can _assist_ with a C++ to plain and simple C port...
In my experience, it is often the other way around. Enthusiasts are tasked with trying to open minds that seem very closed on the subject. Most serious users of these tools recognize the shortcomings and also can make well-educated guesses on the short term future. It's the anti crowd who get hellbent on this ridiculously unfounded "robots are just parrots and can't ever replace real programmers" shtick.
Maybe if AI evangelists would stop lying about what AI can do then people would hate it less.
But lying and hype is baked into the DNA of AI booster culture. At this point it can be safely assumed anything short of right-here-right-now proof is pure unfettered horseshit when coming from anyone and everyone promoting the value of AI.
Your comment is a perfect example of the biases I'm talking about. "AI evangelists" are not a singular group of people.
You're right! Sometimes even the right-here-right-now claims of AI capabilities are horseshit too with people in actuality remotely controlling the product.
It's not common for present-capabilities to be lied about too. But it does happen!
And the smallest presence are the users who don't work in the AI industry but rave about AI. I know of...two....people who fit that bill. A lead developer at a cybersecurity firm and someone who works heavily in statistics and data analytics. Both of which are very senior people in their fields who can articulate exactly what they're looking for without much left to interpretation.
Are you trying to demonstrate a textbook example of straw man argument?
Um actually, it's called a Steel Man argument.
I learned about it from HackerNews and ChatGPT.
Something that bothers me here is that Anthropic claimed in their blog post that the Linux kernel could boot on x86 - is this not actually true then? They just made that part up?
It seemed pretty unambiguous to me from the blog post that they were saying the kernel could boot on all three arch's, but clearly that's not true unless they did some serious hand-waving with kernel config options. Looking closer in the repo they only show a claimed Linux boot for RISC-V, so...
[0]: https://www.anthropic.com/engineering/building-c-compiler - "build a bootable Linux 6.9 on x86, ARM, and RISC-V."
[1]: https://github.com/anthropics/claudes-c-compiler/blob/main/B... - only shows a test of RISC-V
My guess is that CCC works if you disable static keys/DKMS/etc.
In the specific case of __jump_table I would even guess there was some work in getting the Clang build working.
It's really cool to see how slow unoptimised C is. You get so used to seeing C easily beat any other language in performance that you assume it's really just intrinsic to the language. The benchmark shows a SQLite3 unoptimised build 12x slower for CCC, 20x for optimised build. That's enormous!
I'm not dissing CCC here, rather I'm impressed with how much speed is squeezed out by GCC out of what is assumed to be already an intrinsically fast language.
The speed of C is still largely intrinsic to the language.
The primitives are directly related to the actual silicon. A function call is actually going to turn into a call instruction (or get inlined). The order of bytes in your struct is how they exist in memory, etc. A pointer being dereferenced is a load/store.
The converse holds as well. Interpreted languages are slow because this association with the hardware isn't the case.
When you have a poopy compiler that does lots of register shuffling, you lose this association.
Specifically, the constant spilling in those specific functions that caused the 1000x slowdown makes the C code look a lot more like Python code (where every variable is several dereferences away).
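A rough illustration (hypothetical codegen, not CCC's actual output): in a tight loop, a register allocator keeps the hot variables in registers, while naive per-access spilling turns every read and write into a stack load/store, which is exactly the interpreter-like indirection described above.

    /* With register allocation, `i` and `sum` live in registers and each
     * iteration is a handful of ALU instructions. A compiler that spills
     * every value instead emits, per iteration, something like:
     *   load i, load n, compare, load sum, load i, add, store sum,
     *   load i, increment, store i
     * so the same source pays a memory round-trip for every access. */
    long sum_to(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += i;
        return sum;
    }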
Right - maybe we're saying the same thing. C is naturally amenable to being blazing fast, but if you compile it without trying to be efficient (not trying to be inefficient, just doing the simplest, naive thing) it's still slow - by 1-1.5 orders of magnitude.
I mean, you can always make things slower. There are lots of non-optimizing or lightly optimizing compilers that are _MUCH_ faster than this. TCC is probably the most famous example, but hardly the only alternative C compiler with performance somewhere between -O1 and -O2 in GCC. By comparison, as I understand it, CCC has performance worse than -O0, which is honestly a bit surprising to me, since -O0 should not be a hard target to hit. As I understand it, at -O0 C is basically just macro expanded into assembly with a bit of order of operations thrown in. I don't believe it even does register allocation.
> the build failed at the linker stage
> The compiler did its job fine
> Where CCC Succeeds Correctness: Compiled every C file in the kernel (0 errors)
I don't think that follows. It's entirely possible that the compiler produced garbage assembly for a bunch of the kernel code that would make it totally not work even if it did link. (The SQLite code passing its self tests doesn't convince me otherwise, because the Linux kernel uses way more advanced/low-level/uncommon features than SQLite does.)
I agree. Lack of errors is not an indicator of correct compilation. Piping something to /dev/null won't provide any errors either & so there is nothing we can conclude from it. The fact that it compiles SQLite correctly does provide some evidence that their compiler at least implements enough of the C semantics involved in SQLite.
It can run Doom so it must mean some amount of correctness?
Yeah I saw a post on LinkedIn (can't find it again sorry) where they found that CCC compiles C by mostly just ignoring errors. `const` is a nop. It doesn't care if you redefine variables with different types, use a string where an `int` is expected, etc.
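Concretely, the claim is that code like this (hypothetical, not taken from the repo) compiles without complaint, even though each marked line requires a diagnostic from a conforming compiler:

    const int limit = 10;

    int main(void) {
        limit = 20;        /* assignment to a const-qualified object                */
        int n = "hello";   /* initialising an int from a char* without a cast       */
        float n = 1.5f;    /* redeclaring n with a different type in the same scope */
        return n;
    }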
Whenever I've done optimisation (e.g. genetic algorithms / simulated annealing), I've found you always have to be super careful about your objective function, because the optimisation will always come up with some sneaky lazy way to satisfy it that you didn't think of. I guess this is similar - their objective was to compile valid C code and pass some tests. They totally forgot about not compiling invalid code.
>They totally forgot about not compiling invalid code.
Indeed. For a specific example of it not erroring out:
https://www.reddit.com/r/Compilers/comments/1qx7b12/comment/...
"Ironically, among the four stages, the compiler (translation to assembly) is the most approachable one for an AI to build. It is mostly about pattern matching and rule application: take C constructs and map them to assembly patterns.
The assembler is harder than it looks. It needs to know the exact binary encoding of every instruction for the target architecture. x86-64 alone has thousands of instruction variants with complex encoding rules (REX prefixes, ModR/M bytes, SIB bytes, displacement sizes). Getting even one bit wrong means the CPU will do something completely unexpected.
The linker is arguably the hardest. It has to handle relocations, symbol resolution across multiple object files, different section types, position-independent code, thread-local storage, dynamic linking and format-specific details of ELF binaries. The Linux kernel linker script alone is hundreds of lines of layout directives that the linker must get exactly right."
I worked on compilers, assemblers and linkers and this is almost exactly backwards
Exactly this. Linker is threading given blocks together with fixups for position-independent code - this can be called rule application. Assembler is pattern matching.
This explanation confused me too:
If each iteration is X percent slower, then a billion iterations will also be X percent slower. I wonder what is actually going on.
Claude one-shot a basic x86 assembler + linker for me. Missing lots of instructions, yes, but that is a matter of filling in tables of data mechanically.
Supporting linker scripts is marginally harder, but having manually written compilers before, my experience is the exact opposite of yours.
I am inclined to agree with you... but, did CC produce a working linker as well as a working compiler?
I thought it was just the compiler that Anthropic produced.
Why would the correct output of a C compiler not work with a standard linker?
> Why would the correct output of a C compiler not work with a standard linker?
I feel it should for a specific platform/target, but I don't know if it did.
Writing a linker is still a lot of work, so if their original $20k cost of production did not include a linker I'd be less impressed.
Which raises the question, did CC also produce its own pre-processor or just use one of the many free ones?
As a neutral observation: it’s remarkable how quickly we as humans adjust expectations.
Imagine five years ago saying that you could have a general purpose AI write a c compiler that can handle the Linux kernel, by itself, from scratch for $20k by writing a simple English prompt.
That would have been completely unbelievable! Absurd! No one would take it seriously.
And now look at where we are.
Now consider how much C compiler source code it was trained on, and it still managed to output a worse result.
Proof of just how lossy the compression is.
> a simple English prompt
And that’s where my suspicion stems from.
An equivalent original human piece of work from an expert-level programmer wouldn’t be able to do this without all the context. By that I mean all the shared insights, discussion and design that happened when making the compiler.
So to do this without any of that context is likely just very elaborate copy pasta.
Indeed, it's the Overton window that has moved. Which is why I secretly think the pro-AI side is more right than the anti-AI side. Makes me sad.
You're right. It's been pretty incredible. It's also frustrating as hell though when people extrapolate from this progress
Just because we're here doesn't mean we're getting to AGI or software developers begging for jobs at Starbucks
Sure, then make your prediction. It’s always easy to hand wave and dismiss other people’s predictions. But make yours: what do you think LLMs can do in 2 years?
You're asking me to do the thing I just said was frustrating haha. I have no idea. It's a new technology and we have nothing to draw from to make predictions. But for the sake of fun..
New code generation / modification: I think we're hitting a point of diminishing returns, and they're not going to improve much here.
The limitation is fundamentally that they can only be as good as the detail in the specs given, or the test harnesses provided to them. Any detail left out they're going to make up, and hopefully it's what you want (often it's not!). If you make the specs detailed enough so that there's no misunderstanding possible: you've just written code, what we already do today
Code optimization: I think they'll get quite a bit better. If you give them GCC, it's probable they'll be able to improve upon it.
> If you make the specs detailed enough so that there's no misunderstanding possible: you've just written code, what we already do today
This was my opinion for a very long time. Having built a few applications from scratch using AI, though, nowadays I think: sometimes not everything needs to be spelled out. Like in math papers, some details can be left to the ~~reader~~LLM and it'll be fine.
I mean, in many cases it doesn't really matter what exactly the code looks like, as long as it ends up doing the right thing. For a given Turing machine, the equivalence class of equivalent implementations is infinite. If a short spec written in English leads the LLM to identify the correct equivalence class, that's all we need and, in fact, a very impressive compression result.
Sometimes, yeah. I don't think we're disagreeing
What I'd also add:
Because of the unspecified behaviour, you're always going to need someone technical that understands the output to verify it. Tests aren't enough
I'm not even sure if this is a net productivity benefit. I think it is? Some cases it's a clear win.. but definitely not always. You're reducing time coding and now putting extra into spec writing + review + verification
> Sometimes, yeah. I don't think we're disagreeing
I would disagree. Formalism and precision have a critical role to play which is often underestimated. More so with the advent of llms. Fuzziness of natural languages is both a strength and weakness. We have adopted precise but unnatural languages (math/C/C++) for describing machine models of the physical world or of the computing world. Such precision was a real human breakthrough which is often overlooked in these debates.
Hmm. It’s not clear what specific task it can’t handle. Can you come up with a concrete example?
Are you saying you've never had them fail at a task?
I wanted to refactor a bunch of tests in a TypeScript project the other day into a format similar to table driven tests that are common in Golang, but seemingly not so much in TypeScript. Vitest has specific syntax affordances for it, though
It utterly failed at the task. Tried many times with increasing specificity in my prompt, did one myself and used it as an example. I ended up giving up and just doing it manually
I see. Did you use Claude code? With access to compiling and running.
Codex on high, yeah it had access to compiling/running
thanks for the data point
Something that looks and sounds impressive but in the end not of much substance.
This will be true for the next 2 years, 4 years, the next decade, a few decades - as long as the state-of-the-art ML paradigm remains language models.
> Imagine five years ago saying that you could have a general purpose AI write a c compiler that can handle the Linux kernel, by itself, from scratch for $20k by writing a simple English prompt.
You’re very conveniently ignoring the billions in training and that it has practically the whole internet as input.
Wasn't there a fair amount of human intervention in the AI agents? My understanding is, the author didn't just write "make me a c compiler in rust" but had to intervene at several points, even if he didn't touch the code directly.
I totally agree, but I think a lot of the push-back is that this is presented as better than it actually is.
It's really difficult for me to understand the level of cynicism in the HN comments on this topic, at all. The amount of goalpost-moving and redefining is absolutely absurd. I really get the impression that the majority of the HN comments are just people whining about sour grapes, with very little value added to the discussion.
I'd like to see someone disagree with the following:
Building a C compiler, targeting three architectures, is hard. Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard. Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.
To the specific issues with the concrete project as presented: This was the equivalent of a "weekend project", and it's amazing
So what if some gcc is needed for the 16-bit stuff? So what if a human was required to steer claude a bit? So what if the optimizing pass practically doesn't exist?
Most companies are not software companies; software is a line item, an expense, an unavoidable cost. The amount of code (not software engineering, or architecture, but programming) developed tends towards glue of existing libraries to accomplish business goals, which, in comparison with a correct modern C compiler, is far less performance critical, complex, broad, etc. No one is seriously saying that you have to use an LLM to build your high-performance math library, or that you have to use an LLM to build anything, much in the same way that no one is seriously saying that you have to rewrite the world in rust, or typescript, or react, or whatever is bothering you at the moment.
I'm reminded of a classic slashdot comment--about attempting to solve a non-technical problem with technology, which is doomed to fail--it really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people/organizations do with them, which is then a complaint about people, but not the technology.
> This was the equivalent of a "weekend project", and it's amazing
I mean, $20k in tokens, plus the supervision by the author to keep things running, plus the number of people that got involved according to the article... doesn't look like "a weekend project".
> Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard.
Is it correctly compiling it? Several people have pointed out that the compiler will not emit errors for clearly invalid code. What code is it actually generating?
> Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.
It's even harder to have a C compiler that can correctly compile SQLite and pass the test suite but then the SQLite binary itself fails to execute certain queries (see https://github.com/anthropics/claudes-c-compiler/issues/74).
> which, in comparison with a correct modern C compiler, is far less performance critical, complex, broad, etc.
That code might be less complex for us, but more complex for an LLM if it has to deal with lots of domain-specific context and without a test suite that has been developed for 40 years.
Also, if the end result of the LLM has the same problem that Anthropic concedes here, which is that the project is so fragile that bug fixes or improvements are really hard/almost impossible, that still matters.
> it really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people/organizations do with them, which is then a complaint about people, but not the technology
It's a discussion about what the LLMs can actually do and how people represent those achievements. We're pointing out that LLMs, without human supervision, generate bad code: code that's hard to change, with modifications made specifically to address failing tests without challenging the underlying assumptions, code that's inconsistent and hard to understand even for the LLMs.
But some people are taking whatever the LLM outputs at face value, and then claiming some capabilities of the models that are not really there. They're still not viable for using without human supervision, and because the AI labs are focusing on synthetic benchmarks, they're creating models that are better at pushing through crappy code to achieve a goal.
"sour grapes" means nothing in this context
The 158,000x slowdown on SQLite is the number that matters here, not whether it can parse C correctly. Parsing is the solved problem — every CS undergrad writes a recursive descent parser. The interesting (and hard) parts of a compiler are register allocation, instruction selection, and optimization passes, and those are exactly where this falls apart.
That said, I think the framing of "CCC vs GCC" is wrong. GCC has had thousands of engineer-years poured into it. The actually impressive thing is that an LLM produced a compiler at all that handles enough of C to compile non-trivial programs. Even a terrible one. Five years ago that would've been unthinkable.
The goalpost everyone should be watching isn't "can it match GCC" — it's whether the next iteration closes that 158,000x gap to, say, 100x. If it does, that tells you something real about the trajectory.
The part of the article about the 158,000x slowdown doesn't really make sense to me.
It says that a nested query does a large number of iterations through the SQLite bytecode evaluator. And it claims that each iteration is 4x slower, with an additional 2-3x penalty from "cache pressure". (There seems to be no explanation of where those numbers came from. Given that the blog post is largely AI-generated, I don't know whether I can trust them not to be hallucinated.)
But making each iteration 12x slower should only make the whole program 12x slower, not 158,000x slower.
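To spell it out: with N iterations, a baseline per-iteration time t, and a constant per-iteration slowdown factor k,

    (N * k * t) / (N * t) = k

so a fixed per-iteration penalty cannot compound into a 158,000x factor on its own.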
Such a huge slowdown strongly suggests that CCC's generated code is doing something asymptotically slower than GCC's generated code, which in turn suggests a miscompilation.
I notice that the test script doesn't seem to perform any kind of correctness testing on the compiled code, other than not crashing. I would find this much more interesting if it tried to run SQLite's extensive test suite.
This thing has likely all of GCC, clang and any other open source C compiler in its training set.
It could have spat out GCC source code verbatim and matched its performance.
It's kind of a failure that it didn't just spit out GCC, isn't it?
If I had GCC and was asked for a C compiler I would just provide GCC..
That’s the equivalent of getting less than 50% on a quiz consisting entirely of yes/no questions.
It’s in Rust…
It's an LLM, surely it could read gcc source code and translate it to Rust if it really tried hard enough
It wasn't given gcc source code, and was not given internet access. To the extent it could translate gcc source code, it'd need to be able to recall all of the gcc source from its weights.
Right. And the arguably simpler problem, where the model gets the C code directly, is active research: https://www.darpa.mil/research/programs/translating-all-c-to...
All of this work is extraordinarily impressive. It is hard to predict the impact of any single research project the week it is released. I doubt we'll ever throw away GCC/LLVM. But, I'd be surprised if the Claude C Compiler didn't have long-term impact on computing down the road.
I occasionally - when I have tokens to spare, a MAX subscription only lasts so far - have Claude working on my Ruby compiler. Far harder language to AOT compile (or even parse correctly). And even 6 months ago it was astounding how well it'd work, even without what I now know about good harnesses...
I think that is the biggest outcome of this: The notes on the orchestration and validation setup they used were far more interesting than the compiler itself. That orchestration setup is already somewhat quaint, but it's still far more advanced than what most AI users use.
CS undergrads write parsers for some toy lisps or other straight forward syntax. C isn't as trivial https://faultlore.com/blah/c-isnt-a-language/#you-cant-actua...
(a small remark, but to be clear I'm not terribly impressed by AI showcase of the c compiler, nor with browser before that, as it stands)
> Combined over a billion iterations: 158,000x total slowdown
I don't think that's a valid explanation. If something takes 8x as long then if you do it a billion times it still takes 8x as long. Just now instead of 1 vs 8 it's 1 billion vs 8 billion.
I'd be curious to know what's actually going on here to cause a multiple order of magnitude degradation compared to the simpler test cases (ie ~10x becomes ~150,000x). Rather than I-cache misses I wonder if register spilling in the nested loop managed to completely overwhelm L3 causing it to stall on every iteration waiting for RAM. But even that theory seems like it could only account for approximately 1 order of magnitude, leaving an additional 3 (!!!) orders of magnitude unaccounted for.
I think there's a lot more to the story here.
That stuck out to me as well.
I wonder if there could be a bug where extra code runs but the result is discarded (and the code that runs happens to have no side effects).
The post also says
> That is roughly 1 billion iterations
but that doesn't sound right because GCC's version runs in only 0.047s, and no CPU can do a billion iterations that quickly.
Building a C compiler is definitely hard for humans, but I don’t think it’s particularly strong evidence of "intelligence" from an LLM. It’s a very well understood, heavily documented problem with lots of existing implementations and explanations in the training data.
These kinds of tasks are relatively easy for LLMs, they’re operating in a solved design space and recombining known patterns. It looks impressive to us because writing a compiler from scratch is difficult and time consuming for a human, not because of the problem itself.
That doesn’t mean LLMs aren’t useful, even if progress plateaued tomorrow, they’d still be very valuable tools. But building yet another C compiler or browser isn’t that compelling as a benchmark. The industry keeps making claims about reasoning and general intelligence, but I’d expect to see systems producing genuinely new approaches or clearly better solutions, not just derivations of existing OSS.
Instead of copying a big project, I'd be more impressed if they could innovate in a small one.
A few things to note:
1. In the real world, for a similar task, there is little reason for: A) not giving the agent access to all the papers about optimizations, ISA PDFs, and MIT-licensed compilers of all kinds. It will perform much better, and this is proof that the "uncompressing GCC" objection is just a claim (but see point 2 even more so).
2. Of all the tasks, the assembler is the part where memorization would help the most. Instead the LLM can't perform without the ISA documentation that it saw repeated an infinite number of times during pre-training. Guess what?
3. Rust is a bad language for the test, as a first target. If you want an LLM-coded Rust C compiler, and you have LLM experience, you would go -> C compiler -> Rust port. Rust is hard when there are mutable data structures with tons of references around, and a C compiler is exactly that. Composing complexity from different layers is an LLM anti-pattern that anyone who has worked a lot with automatic programming knows very well.
4. In the real world, you don't do a task like that without steering. And steering will do wonders. Not to say that the experiment was ill conceived. The fact is that the experimenter was trying to show a different point than what the Internet took away (as usual).
> the experimenter was trying to show a different point than what the Internet took away (as usual)
All of your points are important, but I think this is the most important one.
Having written compilers, $20k in tokens to get to a foundation for a new compiler with the feature set of this one is a bargain. Now, the $20k excludes the time to set up the harness, so the total cost would be significantly higher, but still.
The big point here is that the researchers in question demonstrated that a complex task such as this could be achieved shockingly cheaply, even when the agents were intentionally forced to work under unrealistically harsh conditions, with instructions to include features (e.g. SSA form) that significantly complicated the task but made the problem closer to producing the foundation for a "proper" compiler rather than a toy compiler, even if the outcome isn't a finished production-ready multi-arch C compiler.
I think one of the issue is that the register allocation algorithm -- alongside the SSA generation -- is not enough.
Generally after the SSA pass, you convert all of them into register transfer language (RTL) and then do register allocation pass. And for GCC's case it is even more extreme -- You have GIMPLE in the middle that does more aggressive optimization, similar to rustc's MIR. CCC doesn't have all that, and for register allocation you can try to do simple linear scan just as the usual JIT compiler would do though (and from my understanding, something CCC should do at a simple cost), but most of the "hard part" of compiler today is actually optimization -- frontend is mostly a solved problem if you accept some hacks, unlike me who is still looking for an elegant academic solution to the typedef problem.
Note that the LLVM approach to IR is probably a bit more sane than the GCC one. GCC has ~3 completely different IRs at different stages in the pipeline, while LLVM mostly has only canonical IR form for passing data around through the optimization passes (and individual passes will sometimes make their own temporary IR locally to make a specific analysis easier).
What is the typedef problem?
If stevefan1999's referring to a nasty frontend issue, it might be due to the fact that a name introduced by a typedef and an identical identifier can mingle in the same scope, which makes parsing pretty nasty – e.g. (example from source at end):
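Something along these lines (a sketch, assuming AA is typedef'd to int):

    typedef int AA;

    int bar(void) {
        AA AA = 4;            /* the variable AA hides the typedef from this point on */
        int aa = AA;          /* AA can only be the variable here: aa == 4            */
        int bb = sizeof(AA);  /* typedef or variable? both are int, so bb == 4        */
        return aa + bb;
    }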
https://eli.thegreenplace.net/2011/05/02/the-context-sensiti...
I don't know off the top of my head whether there's a parser framework that makes this parse "straightforward" to express.
In your example, bar is actually trivial: since both the type AA and the variable AA are ints, both aa and bb end up as 4 no matter how you parse it. AA has to be typedef'd to something other than int.
Lexing and parsing C is simple, except that typedefs technically make the grammar non-context-free. See https://en.wikipedia.org/wiki/Lexer_hack When handwriting a parser, it's no big deal, but it's often a stumbling block for parser generators or other formal approaches. Though, I recall there's a PEG-based parser for C99/C11 floating around that was supposed to be compliant. But I'm having trouble finding a link, and maybe it was using something like LPeg, which has features beyond pure PEG that help with context-dependent parsing.
Clang's solution (presented at the end of the Wikipedia article you linked) seem much better - just use a single lexical token for both types and variables.
Then, only the parser needs to be context sensitive, for the A* B; construct which is either a no-op multiplication (if A is a variable) or a variable declaration of a pointer type (if A is a type)
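A minimal illustration of that construct (a sketch; whether the statement is a declaration or an expression depends only on what A currently names):

    typedef int A;

    void as_declaration(void) {
        A * B;    /* A names a type here: declares B as "pointer to A" (B is unused) */
    }

    void as_expression(void) {
        int A = 6, B = 7;
        A * B;    /* A names a variable here: a multiplication whose result is discarded */
    }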
Well, as you see, this is inherently taking the spirit of a GLL/GLR parser -- defer the parse until we have all the information. The academic solution to this is not to do it at the token level but to introduce a parse tree that is "forkable", meaning a new persistent data structure is needed to "compress" the tree when we have different routes, and that thing is called a graph-structured stack (https://en.wikipedia.org/wiki/Graph-structured_stack)
I think you're referring to this one: https://github.com/jhjourdan/C11parser
What I had specifically in mind definitely wasn't using OCaml or Menhir, but that's a very useful resource, as is the associated paper, "A simple, possibly correct LR parser for C11", https://jhjourdan.mketjh.fr/pdf/jourdan2017simple.pdf
This is closer to what I remember, but I'm not convinced it's what I had in mind, either: https://github.com/edubart/lpegrex/blob/main/parsers/c11.lua It uses LPeg's match-time capture feature (not a pure PEG construct) to dynamically memorize typedef's and condition subsequent matches. In fact, it's effectively identical to what C11Parser is doing, down to the two dynamically invoked helper functions: declare_typedefname/is_typedefname vs set_typedef/is_typedef. C11Parser and the paper are older, so maybe the lpegrex parser is derivative. (And probably what I had in mind, if not lpegrex, was derivative, too.)
"The miracle is not that the bear can dance well, it's that the bear can dance at all."
- Old Russian proverb.
But the poster, the ticket seller, and the ringmaster all said "Anna Pavlova reincarnated, a Bear that can dance as well as famous Ice Skaters!"
Who exactly said that? Can you give sources to high profile figures that said it?
Since we're in an Anthropic topic
https://www.businessinsider.com/anthropic-ceo-ai-90-percent-...
We're talking about this C compiler here.
Can someone explain to me, what’s the big deal about this? The AI model was trained on lots of code and spat out something similar to gcc. Why is this revolutionary?
It's a marketing gimmick. Cursor did the same recently when they claimed to have created a working browser, but it was basically just a bunch of open source software glued together into something barely functional for a PR stunt.
Incorrect. This compiler can compile and run doom.
Claims require evidence so where is your evidence?
https://www.anthropic.com/engineering/building-c-compiler
Chalmers: "May I see it?"
Anthropic: "No."
If someone told you 5 years ago that a computer generated a working C compiler, would you think it was a big deal or not?
These tools do not compete against the lonely programmer who writes everything from scratch; they compete with the existing tooling. 5 years ago compiler generators already existed, as they did in the previous decades. That is a solved problem. People still like to handroll their parsers, not because generating wouldn't work, but because it has other benefits (maintainability, adaptation, better diagnostics). Perfectly fine working code is routinely thrown away and reimplemented, because there are not enough people around anymore who know the code by heart. "The big Rewrite" is a meme for a reason.
A computer generating a compiler is nothing new. Unzip has done this many many times. The key difference is that unzip extracts data from an archive in a deterministic way, while LLMs recover data from the training dataset using a lossy statistical model. Aid that with a feedback loop and a rich test suite, and you get exactly what Anthropic has achieved.
While I agree that the technology behind this is impressive, the biggest issue is license infringement. Everyone knows there's GPL code in the training data, yet there's no trace of acknowledgment of the original authors.
It's already bad enough that people are using non-GPL compilers like LLVM (which make malicious behavior like proprietary incompatible forks possible), so yet another compiler not under the GPL, one that even AI-washes GPL code, is not a good thing.
If you can show us code it has infringed on, you might have a point. Until then, you are making unfounded accusations.
Sounds amazing, but the computer didn’t do it out of the blue with intelligence; it was more cookie-cutter style from already existing code.
What’s the big deal about that?
That’s not true. It didn’t have access to the internet and no LLM has the fidelity to reproduce code verbatim from its training data at the project level. In this case, it’s true that compilers were in its training data but only helped at the conceptual level and not spitting verbatim gcc code.
> In this case, it’s true that compilers were in its training data but only helped at the conceptual level and not spitting verbatim gcc code.
How do you know that?
At this point AI coding feels like religion. You have to believe in it.
How do I know that? The code is not similar to GCC at any level except conceptual. If you can point out the similarity at any level I might agree with you.
I have a feeling, you didn't look at the code at all.
I would love to see and be proved wrong that the code is not similar to gcc. Please point it out
I have a feeling that you didn't because if you had, you'd realize it has more similarity to llvm than gcc.
Not sure who you are replying to. But I, personally, never said anything about gcc, nor llvm.
> I have a feeling, you didn't look at the code at all.
And you originally asked how someone knew that it wasn't just spitting out gcc. So you reject their statement that it's not like gcc at all with your "you didn't look at the code at all", when it's clear that you haven't looked at it.
well the part where it's written in rust was a lil bit of a giveaway
Yeah, it's pretty amazing it can do this. The problem is the gaslighting by the companies making this. Companies: "see, we can create compilers, we won't need programmers"; programmers: "this is crap, are you insane?" Classic gaslighting.
It’s giving you an idea of what Claude is capable of - creating a project at the complexity of a small compiler. I don’t know if it can replace programmers but can definitely handle tasks of smaller complexity autonomously.
"autonomously" I couldn't agree with, I use it regularly for 100-200 loc size stuff, I can't recall it ever being right the first time.
I regularly have it produce 10k+ lines of code that is working and passing extensive test suites. If you give it a prompt and no agent loop and test harness, then sure, you'll need to waste your time babysitting it.
Autonomously means giving it access to run the tests and the compiler.
You are incorrect. You can not conclude something of lower complexity will not stump it.
My 2 cents: just like Cursor's browser, it seems the AI attempted a really ambitious technical design, generally matching the bells and whistles of a true industrial strength compiler, with SSA optimization passes etc.
However, looking at the assembly, it's clear to me the opt passes do not work, and I suspect it contains large amounts of 'dead code' - where the AI decided to bypass non-functioning modules.
If a human expert were to write a compiler not necessarily designed to match GCC, but provide a really good balance of features to complexity, they'd be able to make something much simpler. There are some projects like this (QBE,MIR), which come with nice technical descriptions.
Likewise there was a post about a browser made by a single dude + AI, which was like 20k lines, and worked about as well as Cursor claimed theirs did. It had like 10% of the features, but everything there worked reasonably well.
So while I don't want to make predictions, it seems for now the human-in-the-loop method of coding works much better (and cheaper!) than getting the AI to generate a million lines of code on its own.
> My 2 cents: just like Cursor's browser, it seems the AI attempted a really ambitious technical design, generally matching the bells and whistles of a true industrial strength compiler, with SSA optimization passes etc.
Per the article from the person who directed this, the user directed the AI to use SSA form.
> However, looking at the assembly, it's clear to me the opt passes do not work, and I suspect it contains large amounts of 'dead code' - where the AI decided to bypass non-functioning modules.
That is quite possibly true, but it presumably at least in part reflects the fact that the compiler has been measured on completeness, not performance, so that is where the effort has gone. That doesn't mean it'd necessarily be successful at adding optimisation passes, but we don't really know. I've done some experiments with this (a Ruby ahead-of-time compiler), and while Claude can do reasonably well with assembler now, it's by no means where it's strongest (it is, however, far better at operating gdb than I am...); it can certainly do some of it.
> So while I don't want to make predictions, it seems for now the human-in-the-loop method of coding works much better (and cheaper!) than getting the AI to generate a million lines of code on its own.
Yes, it absolutely is, but the point in both cases was to test the limits of what AI can do on their own, and you won't learn anything about that if you let a human intervene.
$20k in tokens to get a surprisingly working compiler out of agents working on their own is at a point where it is hard to assess how much money and time you'd save once you factor in the cleanup job you'd probably want done before "taking delivery". But had you offered me $20k to write a working C compiler with multiple backends that needed to be capable of compiling Linux, I'd have laughed at the funny joke.
But more importantly, even if you were prepared to pay me enough, delivering it as fast while writing it by hand would be a different matter. Now, if you factor in the time used to set up the harness, the calculation might be different.
But now that we know models can do this, efforts to make the harnesses easier to set up (for my personal projects, I'm experimenting with agents to automatically figure out suitable harnesses), and to make cleanup passes to review, simplify, and document, could well end up making projects like this far more viable very quickly (at the cost of more tokens, certainly, but even if you double that budget, this would be a bargain for many tasks).
I don't think we're anywhere near taking humans out of the loop for many things, but I do see us gradually moving up the abstraction levels, and caring less about the code at least at early stages and more about the harnesses, including acceptance tests and other quality gates.
You misunderstand me - first, almost all modern compilers (that I know of) use SSA, so that's not much of a thing you need to point out. The point I was making is that, judging by the assembler, the generated code is totally unoptimized, even though it was mentioned that Claude implemented SSA opt passes.
The generated code's quality is more in line with 'undergrad course compiler backend': basically doing as little work on the backend as possible, and always doing all of it conservatively.
Basic SSA optimizations such as constant propagation, copy propagation or common subexpression elimination are clearly missing from the assembly, and the register allocator is also pretty bad, even though there are simple algorithms for that sort of thing that perform decently.
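To make the kind of pass being talked about concrete, here is a toy sketch of SSA-style constant propagation over a made-up two-instruction IR; nothing here is taken from CCC's actual code, it just shows how cheap the basic version of such a pass is.

    use std::collections::HashMap;

    // Toy SSA-ish IR: every virtual register is defined exactly once.
    #[derive(Clone, Debug, PartialEq)]
    enum Inst {
        Const(u32, i64),    // v<dst> = immediate
        Add(u32, u32, u32), // v<dst> = v<a> + v<b>
    }

    // Single forward pass: because each register has one definition, once both
    // operands of an Add are known constants the Add can be folded on the spot.
    fn const_prop(insts: &mut [Inst]) {
        let mut known: HashMap<u32, i64> = HashMap::new();
        for inst in insts.iter_mut() {
            match *inst {
                Inst::Const(dst, val) => {
                    known.insert(dst, val);
                }
                Inst::Add(dst, a, b) => {
                    if let (Some(&x), Some(&y)) = (known.get(&a), known.get(&b)) {
                        *inst = Inst::Const(dst, x + y);
                        known.insert(dst, x + y);
                    }
                }
            }
        }
    }

    fn main() {
        let mut prog = vec![
            Inst::Const(0, 2),
            Inst::Const(1, 40),
            Inst::Add(2, 0, 1), // becomes Const(2, 42) after the pass
        ];
        const_prop(&mut prog);
        assert_eq!(prog[2], Inst::Const(2, 42));
    }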
So even though the generated code works, I feel like something's gone majorly wrong inside the compiler.
The 300k LoC thing isn't encouraging either; it's way too much for what the code actually does.
I just want to point out that I think a competent-ish dev (me?) could build something like this (a reasonably accurate C compiler) with a more human-in-the-loop workflow. The result would be much more reasonable code and design, much shorter, and the codebase wouldn't be full of surprises like it is now; it would conform to sane engineering practices.
Honestly I would certainly prefer to do things like this as opposed to having AI build it, then clean it up manually.
And it would be possible without these fancy agent orchestration frameworks and spending tens of thousands of dollars on API.
This is basically what went down with Cursor's agentic browser, vs an implementation that was recreated by just one guy in a week, with AI dev tools and a premium subscription.
There's no doubt that this is impressive, but I wouldn't say that agentic software engineering is here just yet.
CCC was and is a marketing stunt for a new model launch. Impressive, but it still suffers from the same 80:20 rule. Those 20% are the optimizations, and we all know where the devil is in "let me write my own language" projects.
Vibe coding is entertainment. Nothing wrong with entertainment, but when totally clueless people connect vibe coded programs to their bank accounts, or control their devices with them, someone will be entertained for sure.
Large language models and small language models are very strong for solving problems, when the problem is narrow enough.
They are above human average for solving almost any narrow problem, independent of time, but when time is a factor, let's say less than a minute, they are better than experts.
An OS kernel is exactly the kind of problem that everyone prefers to be solved as correctly as possible, even if arriving at the solution takes longer.
The author mentions the stability and correctness of CCC, but these are properties of Rust and not of vibe coding. Still an impressive feat by Claude Code, though.
Ironically, if they had first populated the repo with objects, functions and methods that have just todo! bodies, made sure the architecture compiles and is sane, and only then let the agent fill the bodies with implementations, most features would have worked correctly.
I am writing a program to do exactly that for Rust, but even then, how would the user/programmer know beforehand how many architectural details to specify using todo!, to be sure that the problem the agent tries to solve is narrow enough? That's impossible to know! If the problem is not narrow enough, then the implementation is going to be a mess.
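A tiny example of that skeleton-first idea (my own illustration, with made-up module boundaries): the signatures pin down the architecture and type-check today, while each todo!() body becomes a separate, narrow task for the agent to fill in.

    // Placeholder types: enough for the skeleton to compile, no behaviour yet.
    pub struct Token;
    pub struct Ast;
    pub struct Ir;

    pub fn lex(_source: &str) -> Vec<Token> {
        todo!("tokenize the input")     // narrow task #1 for the agent
    }

    pub fn parse(_tokens: &[Token]) -> Ast {
        todo!("build the AST")          // narrow task #2
    }

    pub fn lower(_ast: &Ast) -> Ir {
        todo!("lower the AST to an IR") // narrow task #3
    }

    pub fn emit(_ir: &Ir) -> String {
        todo!("emit target assembly")   // narrow task #4
    }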
The prospect of going the last mile to fix the remaining problems reminds me of the old joke:
"The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
I’ve always heard/repeated it as: “The first 90% is easy, it’s the second 90% that gets you. No one’s willing to talk about the third 90%.”
Yeah, this is why I don't get the argument that LLMs are good for bootstrapping, especially anything serious.
Sure, these things can technically frontload a lot of work at the beginning of a project, but I would argue the design choices made at the beginning set the tone for the entire project, and it's best those be made with intention, not by stochastic text extruders.
Let's be real: these things are shortcut machines that appeal to people's laziness, and as with most shortcuts in life, they come with consequences.
Have fun with your "think for me SaaS"; I'm not going to let my brain atrophy to the point where my competency is 1:1 correlated with the quantity and quality of tokens I have access to.
Nice article. I believe the Claude C Compiler is an extraordinary research result.
The article is clear about its limitations. The code README opens by saying “don’t use this” which no research paper I know is honest enough to say.
As for hype, it’s less hyped than most university press releases. Of course since it’s Anthropic, it gets more attention than university press.
I think the people most excited are getting ahead of themselves. People who aren’t impressed should remember that there is no C compiler written in Rust for it to have memorized. But, this is going to open up a bunch of new and weird research directions like this blog post is beginning to do.
This is a conjecture: modern chips are optimized to make the output code style of GCC/Clang go fast. So, the compilers optimize for the chip, and the chip optimizes for the popular compilers.
This compiler experiment mirrors the recent work of Terence Tao and Google. The "recipe" is an LLM paired with an external evaluator (GCC) in a feedback loop.
By evaluating the objective (successful compilation) in a loop, the LLM effectively narrows the problem space. This is why the code compiles even when the broader logic remains unfinished/incorrect.
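A rough sketch of that recipe, assuming (this is my illustration, not Anthropic's actual harness) a candidate compiler binary named ccc and GCC as the external evaluator: build the same test case with both toolchains, run both binaries, and feed any divergence back to the agent as the next thing to fix.

    use std::process::Command;

    // Run a command and return its stdout, or None if it failed to run cleanly.
    fn run(cmd: &str, args: &[&str]) -> Option<String> {
        let out = Command::new(cmd).args(args).output().ok()?;
        if !out.status.success() {
            return None;
        }
        Some(String::from_utf8_lossy(&out.stdout).into_owned())
    }

    // Compile `source` with the reference compiler and the compiler under test,
    // then compare what the two binaries print. Any difference is a failure
    // signal for the agent loop.
    fn diverges(source: &str) -> bool {
        if run("gcc", &[source, "-o", "/tmp/ref_bin"]).is_none() {
            return true; // the oracle itself rejects the test case
        }
        if run("ccc", &[source, "-o", "/tmp/test_bin"]).is_none() {
            return true; // candidate compiler failed: definitely work to do
        }
        run("/tmp/ref_bin", &[]) != run("/tmp/test_bin", &[])
    }

    fn main() {
        // Hypothetical test cases; in practice this would be a large corpus.
        for case in ["tests/arith.c", "tests/pointers.c"] {
            if diverges(case) {
                println!("divergence on {case}: hand it back to the agent");
            }
        }
    }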
It’s a good example of how LLMs navigate complex, non-linear spaces by extracting optimal patterns from their training data. It’s amazing.
p.s. if you translate all this to marketing jargon, it’ll become “our LLM wrote a compiler by itself with a clean room setup”.
Edit: typo
I don't understand how this isn't a bigger deal. Why are people quibbling about how it isn't a particularly good C compiler? It seems earth-shattering that an AI can write a C compiler in the first place.
Am I just old? "How did they fit those people into the television?!"
Seeing that Claude can code a compiler doesn't help anyone if what it produces isn't efficient, because getting it to be efficient is the hardest part, and it will be interesting to see how long it takes to get there. No one is going to use a compiler that makes binaries run 700x slower.
I'm surprised that this wasn't possible before with just a bigger context size.
> Someone got it working on Compiler Explorer and remarked that the assembly output “reminds me of the quality of an undergraduate’s compiler assignment”. Which, to be fair, is both harsh and not entirely wrong when you look at the register spilling patterns.
This is what I've noticed about most LLM-generated code: it's about the quality of an undergrad's, and I think there's a good reason for this - most of the code it's been trained on is of undergrad quality. Stack Overflow questions, a lot of undergrad open source projects; there are some professional-quality open source projects (e.g. SQLite) but they are outweighed by the mass of other code. Also, things like SQLite don't compare to things like Oracle or SQL Server, which are proprietary.
They should have gone one step further and also optimized for query performance (without editing the source code).
I have, cough, AI-generated an x86-to-x86 compiler (it takes x86 in, replaces arbitrary instructions with functions, and spits x86 out). At first it was horrible, but after letting it work for 2 more days it got close to only a 50% to 60% slowdown when every memory-read instruction was replaced.
Now that's when people should get scared. But it's also reasonable to assume that CCC will look closer to GCC at that point, maybe influenced by other compilers as well. Tell it to write an ARM compiler and it will never succeed (probably; maybe it can use an intermediary and shove it into LLVM and it'll work, but at that point it is no longer a "C" compiler).
One missing analysis, that IMHO is the most important right now, is: what is the quality of the generated code?
Having an LLM generate a first complete iteration of a C compiler in Rust is super useful if the code is of good enough quality that it can be maintained and improved by humans (or other AIs). It is (almost) completely useless otherwise.
And that is the case for most of today's code generated by AIs. Most of it will still have to be maintained by humans, or at least a human will ultimately be responsible for it.
What I would like to see is whether that C compiler is a horrible mess of tangled spaghetti code with horrible naming, or something with a clear structure, good naming, and sensible comments.
> with a clear structure, good naming, and sensible comments.
Additionally, there is the problem that LLM comments often describe what the code is supposed to do, not what it actually does. People write comments to point out what was weird during implementation and what they found out while testing it. LLM comments seem to reflect the information present before writing the implementation, i.e. they are used as an internal checklist of what to generate.
In my opinion, deceptive comments are worse than no comments at all.
I'm curious: maybe the AI learned too much code from human-written compilers. What if we invent a fresh new language and let the AI write the compiler for it? If that compiler works well, I think that would be true intelligence.
I think AI will definitely help to get new compilers going. Maybe not the full product, yet. But it helps a lot to create all the working parts you need to get going. Taking lengthy specs and translating them into code is something AI does quite well - I asked it to give me a disassembler - and it did well. So, if you want to make a new compiler, you now don't have to read all the specs and details beforehand. Just let the AI mess with e.g. PE-Headers and only take care later if something in that area doesn't work.
Great article, but you have to keep in mind that it was pure marketing. The really interesting question is to give the same benchmark to CC and ask it to optimize in a loop, and see how long it takes for it to come up with something decent.
That’s the whole promise of reaching AGI: that it will be able to improve itself.
I think Anthropic ruined this by releasing it too early; it would have been way more fun to have a live website where you can watch it iterating and see the progress it is making.
> CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI.
It would be interesting to compare the source code used by CCC to other projects. I have a slight suspicion that CCC stole a lot of code from other projects.
It's less impressive when you realize CCC happily compiles invalid C without emitting any errors.
GCC and Clang are part of the training set; the fact that it did as badly as it did is what's shocking.
There are lots of C compilers (LCC, TCC, SDCC, an army of hobby-project C compilers) available as open source.
I am curious what the results would be for something like generating a lexer + parser + abstract machine code generator for a made-up language.
You know, it sure does add some additional perspective to the original Anthropic marketing materia... ahem, I mean article, to learn that the CCC-compiled runtime for SQLite could potentially run up to 158,000 times slower than a GCC-compiled one...
Nevertheless, the victories continue to be closer to home.
What does the smallest (simplest in terms of complexity / lines of code) C-compiler that can compile and run SQLite look like?
Perhaps that would be a more telling benchmark to evaluate the Claude compiler against.
I don't know for certain that it can compile and run SQLite, but the smallest C compiler I know of is SectorC: <https://xorvoid.com/sectorc.html>
Not as simple as it could be but I doubt anyone will manage to beat Fabrice Bellard: https://www.bellard.org/tcc/
It seems like if Anthropic released a super cool and useful _free_ utility (like a compiler, for example) that was better than existing counterparts or solved a problem that hadn’t been solved before[0] and just casually said “Here is this awesome thing that you should use every day. By the way our language model made this.” it would be incredible advertising for them.
But they instead made a blog post about how it would cost you twenty thousand dollars to recreate a piece of software that they do not, with a straight face, actually recommend that you use in any capacity beyond as a toy.
[0] I am categorically not talking about anything AI related or anything that is directly a part of their sales funnel. I am talking about a piece of software that just efficiently does something useful. GCC is an example, Everything by voidtools is an example, Wireshark is an example, etc. Claude is not an example.
They made a blog post about it because it's an amazing test of the abilities of the models to deliver a working C-compiler, even with lots of bugs and serious caveats, for $20k of tokens, without a human babysitting it.
I'd challenge anyone who is negative about this to try to achieve what they did by hand, with the same restrictions (e.g. generating full SSA form instead of just directly emitting code, capable of compiling Linux), and log their time doing it.
Having written several compilers, I'll say with some confidence that not many developers would succeed. Far fewer would succeed fast enough to compete with $20k cost. Even fewer would do that and deliver decent quality code.
Now notice the part where they've done this experiment before. This is the first time it succeeded. Give it another model iteration or two, and expect quality to increase, and price to drop.
This is the new floor.
>without a human babysitting it.
From the original blog post:
>Every agent would hit the same bug, fix that bug, and then overwrite each other's changes. Having 16 agents running didn't help because each was stuck solving the same task.
>The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC
The blog post used the word autonomous a lot, which I suppose is true if Nicholas Carlini is not a human being but in fact a Claude agent.
>I'd challenge anyone who is negative about this to try to achieve what they did by hand, with the same restrictions (e.g. generating full SSA form instead of just directly emitting code, capable of compiling Linux), and log their time doing it.
Why would anyone do that? My point was that why does the company _not_ make a useful tool? I feel like that is a much more interesting topic of discussion than “why aren’t people that aren’t impressed by this spending their time trying to make this company look good?”
>This is the new floor.
Aside from the notion that they maybe intentionally set out to create the least useful or valuable output from their tooling (eg ‘the floor’) when they did not say that they did that, my question was “Why do they not make something genuinely useful?”. Marketing speak and imaginary engineers failing at made up challenges does not answer that question.
> The blog post used the word autonomous a lot, which I suppose is true if Nicholas Carlini is not a human being but in fact a Claude agent.
Nothing in the article suggests it did not autonomously do the work.
> Why would anyone do that?
Because a lot of naysayers here pretend as if this is somehow trivial.
> My point was that why does the company _not_ make a useful tool?
Useful to whom? This is a researcher testing the limits of the models. Knowing those limits is highly useful to Anthropic. And it's highly useful to lots of others too, like me, as a means of understanding the capabilities of these models.
What, exactly would such a tool that'd somehow make the people dismissing this change their minds look like? Because I don't think anything would. They could produce lots of useful tools, if they aimed lower than testing the limits of the model. But it would not achieve what they set out to do, and it would not tell us anything useful.
I produce "useful tools" with Claude every day. That's not interesting. Anyone who actually uses these tools properly will develop a good understanding of the many things that can be achieved with them.
Most of us can't spend $20k figuring out where the limits are, however.
> I feel like that is a much more interesting topic of discussion than “why aren’t people that aren’t impressed by this spending their time trying to make this company look good?”
This is a ridiculous misrepresentation of the point. The point is that the people who aren't impressed by this very clearly and obviously do not have an understanding of the complexity of what they achieved, and are making ignorant statements about it.
> Aside from the notion that they maybe intentionally set out to create the least useful or valuable output from their tooling (eg ‘the floor’)
Again, you're either entirely failing to understand, or wilfully misrepresenting, what I said. No, their goal was not to "set out to create the least useful or valuable output". Their goal was to test the limits of what the model can achieve. They did that.
That has far higher value than not testing the limits. Lots and lots of people are building tools with Claude without testing the limits. We would not learn anything from that.
> my question was “Why do they not make something genuinely useful?”
Because that wasn't the purpose. The purpose was to test the limits of what the model can achieve. That you struggle to understand why what they achieved was massively impressive, does not change that.
> Nothing in the article suggests it did not autonomously do the work.
I don’t know how to respond to that other than to ask you to quote the part of the blog post where the author described the language model running into a problem that it could not fix, and then described the details of how he manually intervened to fix it - and to elaborate on your definition of “nothing” in that sentence.
>Every agent would hit the same bug, fix that bug, and then overwrite each other's changes. Having 16 agents running didn't help because each was stuck solving the same task.
>The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC
As for:
> Because a lot of naysayers here pretend as if this is somehow trivial.
This is an answer to “why do you want someone to do that?” You have already established that you would like that to happen. It doesn’t answer “why would a real human being (who is not you) that isn’t impressed by the compiler that doesn’t work put their time into making Anthropic look good?”
As for this, that’s a good question but I would say the bare minimum would be “useful”
> What, exactly would such a tool that'd somehow make the people dismissing this change their minds look like?
It is pretty common for tech companies to release free useful software. For example pytorch, react, Hack/hhvm etc. from Meta
https://opensource.fb.com/projects/
Or chromium from Google. Chromium is a good example, there’s a decent chance that you’re using a chromium based browser to read this. There’s also a ton of other stuff, golang comes to mind as another example.
https://opensource.google/
Or if you want stuff made by a company that’s a fraction the valuation of Anthropic, there’s Campfire and Writebook by 37signals. https://once.com/
> Because that wasn't the purpose.
I know that. That was the premise of my question.
I saw that they put a bunch of resources into making something that is not useful and asked why they did not put those resources into something that was useful. Surely they could make something that is both useful and makes their model look good?
For me it seems like the obvious answer would be either that they can’t make something useful:
> Their goal was to test the limits of what the model can achieve. They did that.
Or they don’t want to
> Because that wasn't the purpose.
I was asking if anyone had any substantive knowledge or informed opinion about whether it was one or the other but it seems like you’re saying it’s… both? They don’t want to make and release a useful tool and also they can not make and release a useful tool because this compiler, which is not useful, is the limit of what their model can achieve.
Like you want us all to know that they cannot and do not want to make any sort of useful tool. That is your clearly-stated opinion about their desires and capabilities. And also you want these “naysayers”, who are not you, to put their time and effort into… also not making something useful? To prove… what?
I wonder how much more it would take for Anthropic to make CCC on par with, or even better than, GCC.
Give me self hosting: LLM generates compiler which compiles LLM training and inference suite, which then generates compiler which...
I wonder how well an LLM would do for a new CPU architecture for which no C compiler exists yet, just assembler.
> I wonder how well an LLM would do for a new CPU architecture for which no C compiler exists yet, just assembler.
Quite well, possibly.
Look, I wasn't even aware of this until it popped up a few days ago on HN. I am not privy to the details of Anthropic's engineers in general, or of the specific engineer who curated this marathon multi-agent dev cycle, but I can tell you how anyone familiar with compilers or programming language development would proceed:
1. Vibe an IL (intermediate language) specification into existence (even if it is only held in RAM as structures/objects)
2. Vibe some utility functions for the IL (dump, search, etc)
3. Vibe a set of backends, that take IL as input and emit ISA (Instruction Set Architecture), with a set of tests for each target ISA
4. Vibe a front-end that takes C language input and outputs the IL, with a set of tests for each language construct.
(Everything from #2 onwards can be done in parallel)
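For illustration only (none of these names come from CCC), steps 1-4 boil down to something shaped roughly like this: a small IL, a backend trait each target ISA implements independently, and a front-end whose whole contract is "C source in, IL out".

    // Step 1: a (deliberately tiny) intermediate language.
    #[derive(Debug)]
    pub enum IlInst {
        LoadConst { dst: u32, value: i64 },
        Add { dst: u32, lhs: u32, rhs: u32 },
        Ret { src: u32 },
    }

    // Step 2: utility functions over the IL (here, just a dump).
    pub fn dump(il: &[IlInst]) -> String {
        il.iter().map(|i| format!("{:?}\n", i)).collect()
    }

    // Step 3: every target ISA is just another implementation of this trait,
    // which is why adding a never-before-seen ISA mostly means adding a backend.
    pub trait Backend {
        fn name(&self) -> &'static str;
        fn emit(&self, il: &[IlInst]) -> String;
    }

    pub struct X86_64Backend;
    impl Backend for X86_64Backend {
        fn name(&self) -> &'static str { "x86_64" }
        fn emit(&self, il: &[IlInst]) -> String {
            // A real backend does instruction selection and register allocation;
            // this only shows the shape of the interface.
            il.iter().map(|i| format!("; {:?}\n", i)).collect()
        }
    }

    // Step 4: the front-end only has to produce IL from C source.
    pub fn front_end(_c_source: &str) -> Vec<IlInst> {
        vec![IlInst::LoadConst { dst: 0, value: 0 }, IlInst::Ret { src: 0 }]
    }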
I have no reason to believe that the engineer who vibe-coded CCC is anything other than competent and skillful, so let's assume he did at least the above (TBH, he probably did more)[1].
This means that CCC has, in its code, everything needed to vibe a never-before-seen ISA, given the ISA spec. It also means it has everything needed to support a new front-end language as long as it is similar enough to C (i.e. language constructs can map to the IL constructs).
So, this should be pretty easy to expand on, because I find it unlikely that the engineer who supervised/curated the process would be anything less than an expert.
The only flaw in my argument is that I am assuming that effort from CC was so large because it did the src -> IL -> ISA route. If my assumption is wrong, it might be well-nigh impossible to add support for a new ISA.
------------------------------
[1] When I agreed to a previous poster on a previous thread that I can recreate the functionality of CCC for $20k, these are the steps I would have followed, except I would not have LLM-generated anything.
Does it work better for the intended purpose than their browser experiments? No… no it doesn’t
It might be interesting to feed this report in and see what the coding agent swarm can improve on.
I had no idea that SQLite performance was in fact compiler-dependent. The more you know!
The performance of any software is compiler-dependent.
Now that we have seen this can be done, the next question is how much effort it takes to improve it 1%. And then the next 1%. Can we make consistent improvements without spending more and more compute on each step.
Why don't LLMs directly generate machine code?
Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I am pretty sure everybody agrees that this result is somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology. But discussions should not be held from the extreme points. Instead, I am looking for a realistic estimation from the HN community about where to place these results in a human context. Since I have no experience with compilers, I would welcome any of your opinions.
> Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I offered to do it, but without a deadline (I work f/time for money), only a cost estimation based on how many hours I think it should take me: https://news.ycombinator.com/item?id=46909310
The poster I responded to had claimed that it was not possible to produce a compiler capable of compiling a bootable Linux kernel within the $20k cost, nor for double that ($40k).
I offered to do it for $40k, but no takers. I initially offered to do it for $20k, but the poster kept evading, so I settled on asking for the amount he offered.
Correct me if I am wrong. But Claude has probably been trained on gcc, so why oh why doesn't it one shot a faster and better compiler?
The time will come (and it's not far off) when LLM agents will be able to RE the program and re-implement it just by pointing to the program's directory.
We'll see how fun that will be for these big corporations.
For example: "Hey, Claude, re-implement Adobe Photoshop in Rust."
Did Anthropic release the scaffolding, harnesses, prompts, etc. they used to build their compiler? That would be an even cooler flex to be able to go and say "Here, if you still doubt, run this and build your own! And show us what else you can build using these techniques."
That would still require someone else to burn $20,000 to try it themselves.
Yes which gets paid to Anthropic, so it'll be a win-win for them.
This is a good example of ALL AI slop. You get something barely working, and are faced with the next problems:
- Deal with legacy code from day one.
- Have a mess of a codebase that is most likely 10-20x the amount of LOC compared to human code
- Have your program be really slow and filled with bugs and edge cases.
This is the battlefield for programmers. You either just build the damn thing or fix bugs for the next decade.
The level of discourse I've seen on HN about this topic is really disappointing. People not reading the actual article in detail, just jumping to conclusions "it basically copied gcc" etc etc. Taking things out of context, or worse completely misrepresenting what the author of the article was trying to communicate.
We act so superior to LLMs but I'm very unimpressed with humanity at this stage.
mehh
But gcc is part of its training data so of course it spit out an autocomplete of a working compiler
/s
This is actually a nice case study in why agentic LLMs do kind of think. It's by no means the same code or compiler. It had to figure out lots and lots of problems along the way to get to the point of tests passing.
> But gcc is part of its training data so of course it spit out an autocomplete of a working compiler /s
Why the sarcasm tag? It is almost certainly trained on several compiler codebases, plus probably dozens of small "toy" C compilers created as hobby / school projects.
It's an interesting benchmark not because the LLM did something novel, but because it evidently stayed focused and maintained consistency long enough for a project of this complexity.
Since Claude Code can browse the web, is it fair to think of it as “rewriting and simplifying a compiler originally written in C++ into Rust”?
In the original post Anthropic did point out that Claude Code did not have access to the internet
Presumably it had access to GCC (and LLVM/Clang) sources in its training data? All of which are hosted or mirrored on GitHub.
I would've guessed so, but I was talking about whether Claude Code (the tool, not the model) has access to the internet, which, according to Anthropic, it didn't.
And all of which are in an entirely different language, and which use pretty different architectures to this compiler.