Scaling long-running autonomous coding
Recorded: Jan. 20, 2026, 10:03 a.m.
Original
Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of "autonomous" coding agents:

"This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens."

They ended up running planners and sub-planners to create tasks, then having workers execute on those tasks, similar to how Claude Code uses sub-agents. Each cycle ended with a judge agent deciding if the project was completed or not (a rough sketch of that loop is included below).

This is relevant to a prediction I made earlier this year about where we'd be by 2029:

"I think somebody will have built a full web browser mostly using AI assistance, and it won't even be surprising. Rolling a new web browser is one of the most complicated software projects I can imagine [...] the cheat code is the conformance suites. If there are existing tests it'll get so much easier."

I may have been off by three years, because Cursor chose "building a web browser from scratch" as their test case for their agent swarm approach:

"To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore the source code on GitHub."

But how well did they do? Their initial announcement a couple of days ago was met with unsurprising skepticism, especially when it became apparent that their GitHub Actions CI was failing and there were no build instructions in the repo. Those issues were addressed within 24 hours with a revised README, and on macOS `cargo run --release --features browser_ui --bin browser` got me a working browser window! Here are screenshots I took of google.com and my own website.

Honestly those are very impressive! You can tell they're not just wrapping an existing rendering engine because of those very obvious rendering glitches, but the pages are legible and look mostly correct.
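As a rough sketch of that planner / worker / judge shape (this is not Cursor's actual code; `call_model`, the prompts, and the task format are all hypothetical stand-ins), the outer control loop might look something like this:

```rust
// Hypothetical sketch of a planner -> workers -> judge cycle.
// `call_model` stands in for whatever LLM completion API the agents use.
fn call_model(role: &str, prompt: &str) -> String {
    // Stub: a real system would send `prompt` to a model and return its reply.
    format!("[{role}] response to: {prompt}")
}

struct Task {
    description: String,
}

/// Planner: break the top-level goal into tasks. Sub-planners would
/// recursively expand any task that is still too large to hand to a worker.
fn plan(goal: &str) -> Vec<Task> {
    call_model("planner", &format!("Break this goal into tasks: {goal}"))
        .lines()
        .map(|line| Task { description: line.to_string() })
        .collect()
}

/// Worker: execute a single task, returning a description of the change made.
fn work(task: &Task) -> String {
    call_model("worker", &format!("Implement: {}", task.description))
}

/// Judge: decide whether the overall project is complete after this cycle.
fn judge(goal: &str, results: &[String]) -> bool {
    let verdict = call_model(
        "judge",
        &format!(
            "Goal: {goal}\nWork done:\n{}\nIs the project complete?",
            results.join("\n")
        ),
    );
    verdict.to_lowercase().contains("complete")
}

fn main() {
    let goal = "build a web browser from scratch";
    // Cap the number of cycles so the sketch always terminates.
    for cycle in 1..=10 {
        let tasks = plan(goal);
        let results: Vec<String> = tasks.iter().map(work).collect();
        if judge(goal, &results) {
            println!("judge accepted the project after cycle {cycle}");
            break;
        }
    }
}
```

Cursor's real system runs hundreds of these workers concurrently and gives each one far richer context, but the hierarchy of roles is the part the post emphasizes.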
Tags: browsers, ai, generative-ai, llms, ai-assisted-programming, coding-agents, cursor, conformance-suites
Summarized

Simon Willison’s weblog documents an experiment conducted by Wilson Lin at Cursor, a company focused on AI-assisted coding tools, to explore the scalability and effectiveness of autonomous coding agents. The project involved deploying a fleet of AI-driven agents to collaboratively build a web browser from scratch, with the system generating over 1 million lines of code across 1,000 files. The agents operated through a structured workflow: planners and sub-planners broke down the task into manageable components, workers executed these tasks, and a judge agent evaluated whether the project met its objectives. This approach mirrors techniques used in systems like Claude Code, which employs sub-agents to handle complex programming challenges. The experiment highlights advancements in AI’s ability to coordinate distributed tasks, suggesting a shift toward more autonomous software development processes.

The project was framed as a test case for the capabilities of AI-assisted coding, with Lin and his team aiming to push the boundaries of what such systems could achieve. The chosen goal of building a web browser from scratch was intentionally ambitious, as creating a modern browser involves integrating multiple technologies, including rendering engines, scripting capabilities, and user interface components. The agents’ output was initially met with skepticism, particularly due to the absence of clear build instructions and failing GitHub Actions CI checks in the early stages. However, these issues were addressed within 24 hours, with a revised README providing step-by-step guidance for compiling the project. Simon Willison, the author of the blog post, successfully built and tested the browser on macOS using commands like `cargo run --release --features browser_ui --bin browser`, resulting in a functional interface that displayed web pages such as google.com and the author’s own website. While not perfect, with visible rendering glitches, the browser demonstrated a level of completeness and functionality that underscored the progress made in AI-driven development.

A key technical detail in the project was the use of Git submodules to incorporate specifications from the WHATWG and the CSS Working Group, ensuring that the agents had access to reference materials critical for adhering to web standards. This approach reflects a strategic decision to integrate authoritative documentation directly into the development process, enabling the AI agents to align their outputs with established technical requirements. The project also highlights the importance of conformance suites, which are sets of tests designed to validate compliance with standards (a minimal sketch of this kind of check appears below). Willison had previously predicted in 2026 that by 2029 an AI-built web browser would exist and would not even be surprising, but the Cursor project appears to have accelerated this timeline. While he acknowledges that such AI-generated projects are unlikely to compete with established browsers like Chrome or Firefox in the near term, he expresses surprise at the rapid progress and the capability of the system to produce a functional browser.

The blog post situates Cursor’s experiment within a broader context of recent advancements in AI-assisted programming. It references another project, HiWave, a browser engine developed in Rust that was announced on Reddit just weeks prior. This comparison suggests that multiple teams are exploring similar goals, leveraging AI to tackle complex software engineering challenges.
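To make the conformance-suite idea concrete, here is a minimal sketch of a data-driven check in the same spirit: walk a directory of vendored test cases and compare the engine's output against recorded expectations. The `parse_html` stub and the `tests/cases` layout are hypothetical placeholders, not the structure of Cursor's repository.

```rust
// Hypothetical sketch of a conformance-style check: run every vendored test
// case through the engine and compare against the recorded expected output.
use std::fs;

/// Stand-in for the engine under test: return some serialization of the
/// document produced for `input`. A real engine would tokenize and
/// tree-build according to the WHATWG HTML spec.
fn parse_html(input: &str) -> String {
    input.trim().to_lowercase()
}

fn main() -> std::io::Result<()> {
    let (mut passed, mut failed) = (0, 0);

    // Each case is a pair of files: `<name>.html` and `<name>.expected`.
    for entry in fs::read_dir("tests/cases")? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) != Some("html") {
            continue;
        }
        let input = fs::read_to_string(&path)?;
        let expected = fs::read_to_string(path.with_extension("expected"))?;

        if parse_html(&input) == expected.trim() {
            passed += 1;
        } else {
            failed += 1;
            eprintln!("FAIL: {}", path.display());
        }
    }

    println!("{passed} passed, {failed} failed");
    Ok(())
}
```

The appeal of a real suite is that the agents get an unambiguous, machine-checkable definition of "done" for each feature, which is exactly the "cheat code" the quoted prediction points at.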
The success of Cursor’s browser project also aligns with broader trends in the use of large language models (LLMs) for coding tasks, such as generating code snippets, debugging, and even managing entire development workflows. However, the blog post emphasizes that while these systems are impressive, they still face significant limitations in terms of reliability, scalability, and integration with existing software ecosystems.

Willison’s analysis also touches on the implications of such experiments for the future of software development. He notes that while AI-assisted tools are becoming increasingly capable, they still require human oversight and refinement. The browser project, for instance, required manual intervention to address build issues and ensure compatibility with modern web standards. This underscores the current state of AI coding agents as complementary tools rather than fully autonomous systems. However, the project’s success demonstrates that AI can handle complex tasks when given clear objectives and structured workflows, suggesting a potential for future applications in areas such as rapid prototyping, automated testing, and collaborative coding environments.

The blog post also reflects on the challenges of evaluating AI-generated code. While the Cursor project produced a large volume of code, its quality and correctness were not immediately apparent without rigorous testing. Willison’s hands-on experimentation with the browser highlights the importance of practical validation in assessing AI-driven development. The fact that the browser could render web pages, albeit with some imperfections, indicates that the agents were able to implement core functionality effectively. However, the presence of rendering glitches suggests that certain aspects of the code, such as handling edge cases or optimizing performance, may still require human intervention. This observation aligns with broader discussions about the limitations of AI in understanding nuanced programming requirements and ensuring robustness.

Another critical aspect of the project is its scale. Running hundreds of concurrent agents over a period of nearly a week required significant computational resources and coordination mechanisms (a minimal sketch of bounded concurrent dispatch appears below). The use of planners and sub-planners to manage task distribution highlights the importance of structured workflows in large-scale AI coding initiatives. This approach contrasts with more ad hoc methods where agents operate independently without centralized oversight, which can lead to inefficiencies and conflicts. By implementing a hierarchical system with distinct roles for planners, workers, and judges, Cursor’s project demonstrates how AI agents can be organized to tackle complex projects systematically.

The blog post also touches on the broader implications of autonomous coding for the software industry. Willison speculates that while AI-generated browsers may not replace established products in the near future, they could serve as experimental platforms or niche tools. The ability to rapidly generate functional code opens new possibilities for innovation, particularly in domains where traditional development cycles are time-consuming or resource-intensive. However, he also cautions against overestimating the current capabilities of such systems, emphasizing that they are still in early stages and require refinement.
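As a small illustration of bounded concurrent dispatch (again a hypothetical sketch rather than Cursor's infrastructure: `run_agent`, the task strings, and the worker count are made up), a fixed pool of worker threads pulling tasks from a shared queue looks like this:

```rust
// Hypothetical sketch of dispatching many agent tasks to a bounded pool of
// workers, using only the standard library.
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Stand-in for one agent run: the real system would drive an LLM through
/// the task and commit the resulting code change.
fn run_agent(task: &str) -> String {
    format!("completed: {task}")
}

fn main() {
    let tasks: Vec<String> = (0..100).map(|i| format!("task {i}")).collect();
    let total = tasks.len();
    let workers = 8; // cap on how many agents run at once

    let (task_tx, task_rx) = mpsc::channel::<String>();
    let (result_tx, result_rx) = mpsc::channel::<String>();
    let task_rx = Arc::new(Mutex::new(task_rx));

    // Fixed pool of worker threads pulling tasks from a shared queue.
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&task_rx);
            let tx = result_tx.clone();
            thread::spawn(move || loop {
                // Hold the lock only long enough to pull one task off the queue.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(task) => tx.send(run_agent(&task)).unwrap(),
                    Err(_) => break, // task channel closed and drained: worker exits
                }
            })
        })
        .collect();
    drop(result_tx); // main keeps no sender, so results close when workers finish

    for t in tasks {
        task_tx.send(t).unwrap();
    }
    drop(task_tx); // closing the task channel lets idle workers exit

    let done = result_rx.iter().count();
    println!("{done}/{total} tasks completed");
    for h in handles {
        h.join().unwrap();
    }
}
```

The real system has to coordinate much more than task hand-off (shared repository state, merge conflicts, per-agent context), but a bounded pool is the basic shape of running hundreds of agents without unbounded resource use.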
In conclusion, the blog post presents a detailed account of Cursor’s experiment with autonomous coding agents, highlighting both its achievements and the challenges it faced. The project’s success in building a functional web browser underscores the potential of AI-assisted development, while also revealing the need for further improvements in reliability, efficiency, and integration with existing tools. Willison’s analysis provides a balanced perspective on the state of AI coding, acknowledging its progress while remaining cautious about its limitations. As the field continues to evolve, projects like this will likely play a key role in shaping the future of software development.