Amid a push toward AI agents, with Anthropic and OpenAI shipping multi-agent tools this week, Anthropic is more than ready to show off some of its most daring AI coding experiments. But as is usual with claims of AI-related achievements, you’ll find some key caveats below.
On Thursday, Anthropic researcher Nicholas Carlini published a blog post describing how he set 16 instances of the company’s Claude Opus 4.6 AI model loose on a shared codebase with minimal supervision, tasking them with building a C compiler from scratch.
Over two weeks and nearly 2,000 Claude Code sessions costing about $20,000 in API fees, the AI model agents reportedly produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM, and RISC-V architectures.
Carlini, a research scientist on Anthropic’s Safeguards team who previously spent seven years at Google Brain and DeepMind, used a new feature released with Claude Opus 4.6 called “agent teams.” In practice, each Claude instance ran inside its own Docker container, cloning a shared Git repository, claiming tasks by writing lock files, and then pushing its completed code back up. No orchestration agent directed traffic. Each instance independently identified whatever problem seemed most obvious to work on next and began solving it. When merge conflicts arose, the AI model instances resolved them on their own.
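For readers curious how claiming work by writing a lock file can keep uncoordinated agents from colliding, here is a minimal Python sketch. It is an illustration under stated assumptions, not Carlini's actual setup: it uses atomic file creation on a local filesystem, whereas the agents described above shared their files through a Git repository. The function names, the `.lock` suffix, and the task names are all hypothetical.

```python
import os
import tempfile


def try_claim(lock_dir: str, task: str, agent_id: str) -> bool:
    """Try to claim `task`; return True only for the first claimant."""
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, task + ".lock")  # hypothetical naming scheme
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so two agents cannot both succeed in creating the same lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(agent_id)  # record who holds the claim
    return True


def claim_next(lock_dir: str, tasks: list, agent_id: str):
    """Return the first task this agent manages to claim, or None."""
    for task in tasks:
        if try_claim(lock_dir, task, agent_id):
            return task
    return None


def release(lock_dir: str, task: str) -> None:
    """Delete the lock file so the task can be claimed again."""
    os.remove(os.path.join(lock_dir, task + ".lock"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as locks:
        backlog = ["parse-structs", "codegen-riscv"]  # hypothetical task names
        print(claim_next(locks, backlog, "agent-1"))
        print(claim_next(locks, backlog, "agent-2"))
        print(claim_next(locks, backlog, "agent-3"))
```

With no central scheduler, the only coordination in this sketch is the rule that whoever creates the file first owns the task. Every other agent simply moves on to the next unclaimed item.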
The resulting compiler, which Anthropic has released on GitHub, can build a variety of major open source projects, including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. It achieved a 99 percent pass rate on the GCC torture test suite and, in what Carlini called “the ultimate developer litmus test,” compiled and ran Doom.
It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: the specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known-good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.
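To make the value of a reference compiler concrete, here is a small Python sketch of differential testing: run the same inputs through a trusted implementation and a candidate, then flag any disagreement. This is a generic illustration, not the harness Carlini used. The toy `reference_eval` and `buggy_eval` functions are hypothetical stand-ins for "compile and run with the reference compiler" and "compile and run with the new compiler."

```python
from typing import Callable, Iterable, List, Tuple


def differential_test(
    candidate: Callable[[str], int],
    reference: Callable[[str], int],
    programs: Iterable[str],
) -> List[Tuple[str, int, int]]:
    """Return (program, expected, actual) for every input where the two disagree."""
    mismatches = []
    for program in programs:
        expected = reference(program)  # the trusted answer
        actual = candidate(program)    # the implementation under test
        if actual != expected:
            mismatches.append((program, expected, actual))
    return mismatches


def reference_eval(source: str) -> int:
    """Toy 'reference': sums every term of an expression like '1+2+3'."""
    return sum(int(term) for term in source.split("+"))


def buggy_eval(source: str) -> int:
    """Toy 'candidate' with a deliberate bug: ignores terms after the second."""
    return sum(int(term) for term in source.split("+")[:2])


if __name__ == "__main__":
    for failure in differential_test(buggy_eval, reference_eval, ["1+2", "1+2+3"]):
        print(failure)
```

The point of the sketch is that nobody had to write down the expected answers by hand: the reference supplies them for free. Most software projects have no such oracle.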
