
Recorded: Jan. 19, 2026, 10:03 a.m.


The Code-Only Agent • Rijnard van Tonder


𝕏 @rvtond


The Code-Only Agent

When Code Execution Really is All You Need

If you're building an agent, you're probably overwhelmed. Tools.
MCP. Subagents. Skills. The ecosystem pushes you toward complexity,
toward "the right way" to do things. You should know: concepts like
"Skills" and "MCP" are really outcomes of an ongoing process of
humans figuring things out. The space is wide open for exploration.
With this mindset I wanted to try something different: simplify the
assumptions.

What if the agent only had
one tool? Not just any tool, but the most powerful one. The
Turing-complete one: execute code.

Truly one tool means: no `bash`, no `ls`, no `grep`. Only
`execute_code`. And you enforce it.

When you watch an agent run, you might think: "I wonder what tools
it'll use to figure this out. Oh look, it ran `ls`. That makes
sense. Next, `grep`. Cool."

The simpler Code-Only paradigm makes that question irrelevant. The
question shifts from "what tools?" to "what code will it produce?"
And that's when things get interesting.

`execute_code`: One Tool to Rule Them All

Traditional prompting works like this:

> Agent, do thing

> Agent responds with thing

Contrast with:

> Agent, do thing

> Agent creates and runs code to do thing

It does this every time. No, really, every time. Pick a runtime for
our Code-Only agent, say Python. It needs to find a file? It writes
Python code to find the file and executes the code. Maybe it runs
`rglob`. Maybe it does `os.walk`.
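To make that concrete, here's a minimal sketch of the kind of code witness such an agent might emit for "find a file" (the `find_files` helper and the `*.md` pattern are illustrative, not from the original post):

```python
# Sketch: the sort of code a Code-Only agent might emit to "find a file".
# The helper name and glob pattern are hypothetical examples.
from pathlib import Path

def find_files(root: str, pattern: str) -> list[str]:
    """Recursively collect paths matching a glob pattern under root."""
    return sorted(str(p) for p in Path(root).rglob(pattern))

# The agent runs the code; the printed output is the answer.
print(find_files(".", "*.md"))
```

The answer the agent reports is whatever this prints; the code itself is the record of how that answer was obtained.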

It needs to create a script that crawls a website? It doesn't write
the script to your filesystem (reminder: there's no
`create_file` tool to do that!). It
writes code that outputs a script that crawls a website.[1]

We make it so that there is literally no way for the agent to do
anything productive without writing code.

So what? Why do this? You're probably thinking: how is this useful?
Just give it the `bash` tool already, man.

Let's think a bit more deeply about what's happening. Traditional
agents respond with something. Tell one to find some DNA pattern
across 100 files. It might `ls` and `grep`, possibly in some
nondeterministic order; it'll figure out an answer, and maybe you
continue interacting because it missed a directory or you added more
files. After some time, you end up with a conversation of tool
calls, responses, and an answer.

At some point the agent might even write a Python script to do this
DNA pattern finding. That would be a lucky happy path, because we
could rerun that script or update it later... Wait, that's handy...
actually, more than handy... isn't that ideal? Wouldn't it be better
if we told it to write a script at the start? You see, the Code-Only
agent doesn't need to be told to write a script. It has to, because
that's literally the only way for it to do anything of substance.
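A sketch of what that forced script might look like for the DNA-pattern task (the `scan` function, the pattern, and the file extension are hypothetical). The loop's semantics guarantee every matching file is visited exactly once, which is precisely what the ad hoc tool-call dance can't promise:

```python
# Sketch: a code witness for "find a DNA pattern across many files".
# The loop visits every matching file deterministically -- no skipped
# directories, no shortcuts. Pattern and extension are hypothetical.
import re
from pathlib import Path

def scan(root: str, pattern: str, ext: str = "*.txt") -> dict[str, int]:
    """Count pattern matches per file, exhaustively, under root."""
    rx = re.compile(pattern)
    hits = {}
    for path in sorted(Path(root).rglob(ext)):
        text = path.read_text(errors="ignore")
        n = len(rx.findall(text))
        if n:
            hits[str(path)] = n
    return hits

if __name__ == "__main__":
    print(scan(".", r"GATTACA"))
```

Because the work is codified, you can rerun the script when files change, or update it later, exactly as the paragraph above suggests.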

The Code-Only agent produces something more precise than an answer
in natural language. It produces a code witness of an
answer. The answer is the output from running the code. The agent
can interpret that output in natural language (or by writing code),
but the "work" is codified in a very literal sense. The Code-Only
agent doesn't respond with something. It produces a code witness
that outputs something.

Try ❯❯ Code-Only plugin for Claude Code

Code witnesses are semantic guarantees

Let's follow the consequences. The code witness must abide by
certain rules: the rules imposed by the language runtime's semantics
(e.g., Python's). That's not a "next token" process. That's not
"LLM figures out a sequence of tool calls... no, that's not what I
wanted". It's a piece of code. A piece of code! Our one-tool agent
has a wonderful property: it went through latent space to produce
something that has defined semantics, is repeatably runnable, and is
eminently comprehensible (for humans and agents alike to reason
about). This is nondeterministic LLM token generation projected into
the space of Turing-complete code: an executable description of
behavior as we best understand it.

Is a Code-Only agent really enough, or too extreme? I'll be frank: I
pursued this extreme after two things: (1) inspiration from the
articles in Further Reading below, and (2) being annoyed at agents
for not comprehensively and exhaustively analyzing thousands of
files on my laptop. They would skip, take shortcuts, hallucinate. I
knew how to solve part of that problem: create a programmatic loop
and have fresh instances/prompts do the work comprehensively. I can
rely on the semantics of a loop written in Python. Take this idea
further, and you realize that for anything long-running and
computable (e.g., bash or some tool), you actually want the real
McCoy: the full witness of code, a trace of why things work or don't
work. The Code-Only agent enforces that principle.

Code-Only agents are not too extreme. I think they're the only way
forward for computable things. If you're writing travel blog posts,
you accept the LLM's answer (and you don't need to run tools for
that). When something is computable, though, Code-Only is the only
path to a fully trustworthy way to make progress where you need
guarantees (subject to the semantics that your language of choice
guarantees, of course). When I say guarantees, I mean that in the
looser sense, and also in a formal sense. Which beckons: what
happens when we use a language like Lean, with some of the strongest
guarantees? Did we not observe that programs are proofs?

This lens says the Code-Only agent is a producer of proofs,
witnesses of computational behavior in the world of
proofs-as-programs. An LLM in a loop forced to produce proofs, run
proofs, interpret proof results. That's all.
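A minimal Curry-Howard illustration of that reading, sketched in Lean 4 (the name `impTrans` is mine): the definition below is at once a program and a proof that implication is transitive.

```lean
-- Curry–Howard in miniature: the type is a proposition, and the body
-- is both a runnable function and the proof object inhabiting it.
def impTrans {P Q R : Prop} (h₁ : P → Q) (h₂ : Q → R) : P → R :=
  fun p => h₂ (h₁ p)
```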

Going Code-Only

So you want to go Code-Only. What happens? The paradigm is simple,
but the design choices are surprising.

First, the harness. The LLM's output is code, and you execute that
code. What should be communicated back? Exit code makes sense. What
about output? What if the output is very large? Since you're running
code, you can specify the result type that running the code should
return.

I've personally, e.g., had the tool return results directly if
they're under a certain threshold (1 KB). These go into the session
context. Alternatively, write the results to a JSON file on disk if
they exceed the threshold; this avoids context blowup, and the
result tells the agent the output file path written to disk. How
best to pass results, persist them, and optimize for size and
context fill are open questions. You also want to define a way to
deal with `stdout` and `stderr`: do you expose these to the agent?
Do you summarize before exposing?
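One possible shape for that result policy, assuming the 1 KB threshold mentioned above (names like `run_code` and `RESULT_LIMIT` are hypothetical):

```python
# Sketch of a harness result policy: small output is returned inline,
# large output is persisted to disk and referenced by path.
# Names and the 1 KB threshold are illustrative.
import json
import subprocess
import tempfile

RESULT_LIMIT = 1024  # bytes of output returned directly into context

def run_code(source: str) -> dict:
    """Execute Python source; return exit code plus inline or on-disk output."""
    proc = subprocess.run(
        ["python3", "-c", source], capture_output=True, text=True, timeout=60
    )
    out = proc.stdout
    if len(out.encode()) <= RESULT_LIMIT:
        return {"exit_code": proc.returncode, "output": out}
    # Large output: write to disk and hand the agent the path instead.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False
    ) as f:
        json.dump({"output": out}, f)
        return {"exit_code": proc.returncode, "output_file": f.name}
```

The agent then decides whether to interpret the inline output directly or to write further code that reads the JSON file back.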

Next, enforcement. Let's say you're using Claude Code. It's not
enough to persuade it to always create and run code. It turns out
it's surprisingly twisty to force Claude Code into a single tool
(maybe support for this will improve). The best plugin-based
solution I found is a pre-tool-use hook that catches banned tool
uses. This wastes some iterations when Claude Code tries to use a
tool that's not allowed, but it learns to stop attempting filesystem
reads/writes. An initial prompt helps direct it.
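A sketch of such a hook, assuming Claude Code's PreToolUse hook contract (the proposed tool call arrives as JSON on stdin, and exit code 2 blocks it while feeding stderr back to the model); the `execute_code` allowlist is illustrative:

```python
# Sketch of a PreToolUse-style hook that bans everything except one tool.
# Assumes the hook contract: tool call as JSON on stdin, exit code 2
# blocks the call and returns stderr to the model. The allowed tool
# name is a hypothetical example.
import json
import sys

ALLOWED = {"execute_code"}

def decide(tool_name: str) -> tuple[int, str]:
    """Return (exit_code, message) for a proposed tool call."""
    if tool_name in ALLOWED:
        return 0, ""
    return 2, f"Tool '{tool_name}' is banned. Write code via execute_code instead."

def main() -> None:
    # Registered as a command hook; not invoked when imported.
    event = json.load(sys.stdin)
    code, msg = decide(event.get("tool_name", ""))
    if msg:
        print(msg, file=sys.stderr)
    sys.exit(code)
```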

Next, the language runtime. Python, TypeScript, Rust, Bash. Any
language capable of being executed is fair game, but you'll need to
think through whether it works for your domain. Dynamic languages
like Python are interesting because you can run code natively in the
agent's own runtime, rather than through subprocess calls. Likewise
TypeScript/JS can be injected into TypeScript-based agents (see
Further Reading).

Once you get into the Code-Only mindset, you'll see the potential
for composition and reuse. Claude Skills define reusable processes
in natural language. What's the equivalent for a Code-Only agent?
I'm not sure a Skills equivalent exists yet, but I anticipate it
will take shape soon: code as building blocks for specific domains
where Code-Only agents compose programmatic patterns. How is that
different from calling APIs? APIs form part of the reusable blocks,
but their composition (loops, parallelism, asynchrony) is what a
Code-Only agent generates.
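A sketch of that distinction: the API call is the reusable block, and the agent-generated value is the composition around it, here a parallel, order-preserving loop (`fetch_record` is a hypothetical stand-in for a real API):

```python
# Sketch: composition as the agent's contribution. The reusable block
# is a plain function (a hypothetical API call); the agent writes the
# loop/parallelism around it.
from concurrent.futures import ThreadPoolExecutor

def fetch_record(record_id: int) -> dict:
    """Stand-in for a real API call; hypothetical."""
    return {"id": record_id, "ok": True}

def fetch_all(ids: list[int], workers: int = 8) -> list[dict]:
    """Agent-written composition: fan out over the API, preserve order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_record, ids))
```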

What about heterogeneous languages and runtimes for our `execute_code` tool? I don't think we've thought that far yet.

Further Reading

The agent landscape is evolving quickly. My thoughts on how the
Code-Only paradigm fits into inspiring articles and trends, from
most recent going back:

prose.md
(Jan 2026) — Code-Only reduces prompts to executable code (with
loops and statement sequences). Prose expands prompts into natural
language with program-like constructs (also loops, sequences,
parallelism). The interplay of natural language for agent
orchestration and rigid semantics for agent execution could be
extremely powerful.

Welcome to Gas Town
(Jan 2026) — Agent orchestration gone berserk. Tool running is the low-level
operation at the bottom of the agent stack. Code-Only fits as the
primitive: no matter how many agents you orchestrate, each one
reduces to generating and executing code.

Anthropic Code Execution with MCP article
(Nov 2025) — An MCP-centric take: expose MCP servers as a code API
rather than as tool calls. Code-Only is simpler and more general: it
doesn't care about MCP, and casting the MCP interface as an API is a
mechanical necessity that acknowledges the power of going Code-Only.

Anthropic Agent Skills article
(Oct 2025) — Skills embody reusable processes framed in natural
language. They can generate and run code, but that's not their
only purpose. Code-Only is narrower (but computationally
all-powerful): the reusable unit is always executable. The analog
to Skills manifests as pluggable executable pieces: functions,
loops, composable routines over APIs.

Cloudflare Code Mode article
(Sep 2025) — Possibly the earliest concrete single-code-tool
implementation. Code Mode converts MCP tools into a TypeScript API
and gives the agent one tool: execute TypeScript. Their insight is
pragmatic: LLMs write better code than tool calls because of
training data. In its most general sense, going Code-Only doesn't
need to rely on MCP or APIs, and encapsulates all code execution
concerns.

Ralph Wiggum as a "software engineer"
(Jul 2025) — A programmatic loop over agents (agent
orchestration). Huntley describes it as "deterministically bad in
a nondeterministic world". Code-Only inverts this a bit:
projection of a nondeterministic model into deterministic
execution. Agent orchestration on top of an agent's Code-Only
inner-loop could be a powerful combination.

Tools: Code is All You Need
(Jul 2025) — Raises code as a first-order concern for agents.
Ronacher's observation: asking an LLM to write a script to
transform markdown makes it possible to reason about and trust the
process. The script is reviewable, repeatable, composable.
Code-Only takes this further where every action becomes a script
you can reason about.

How to Build an Agent
(Apr 2025) — The cleanest way to achieve a Code-Only agent today
may be to build it from scratch. Tweaking current agents like
Claude Code to enforce a single tool means friction. Thorsten's
article is a lucid account for building an agent loop with tool
calls. If you want to enforce Code-Only, this makes it easy to do
it yourself.

What's Next

Two directions feel inevitable. First, agent orchestration. Tools
like prose.md let you compose
agents in natural language with program-like constructs. What
happens when those agents are Code-Only in their inner loop? You get
natural language for coordination, rigid semantics for execution.
The best of both.

Second, hybrid tooling. Skills work well for processes that live in
natural language. Code-Only works well for processes that need
guarantees. We'll see agents that fluidly mix both: Skills for
orchestration and intent, Code-Only for computation and precision.
The line between "prompting an agent" and "programming an agent"
will blur until it disappears.

Try ❯❯ Code-Only plugin for Claude Code

[1] There is something beautifully quine-like about this agent. I've
always loved quines.

Timestamped 9 Jan 2026
