Building Pi with Pi

Recorded: May 25, 2026, 4 a.m.


Building Pi With Pi | Armin Ronacher's Thoughts and Writings


Building Pi With Pi
written on May 24, 2026
Pi is now part of Earendil, but in the important sense it is
still Mario’s project. He has been living with its
issue tracker longer than I have, and he has been exposed to the weirdness of
the new form of agent traffic in Open Source projects for longer too. This post
is mostly a reflection on my own experience after spending more time in the
tracker and using Pi to work on Pi, and on what I have learned from that so
far.
Slop Issues
Unsurprisingly, we are using Pi to build Pi. That sounds like cute dogfooding,
but it really helps us understand what we are doing. An interesting effect of
building with agents is that it changes the role of the issue tracker a tiny
bit. Issue descriptions are no longer just messages from a user to a maintainer,
because we also use them as inputs for prompts in Pi sessions. An issue is
something I might hand to my clanker[1] and say: “understand this, reproduce it,
inspect the code, and propose a fix.”
That means the shape of the issue matters in a new way. Bad issues were always
annoying, but many of them used to be merely vague. Now we are also dealing with
a class of issues that are 5% human and 95% clanker-generated, and largely
inaccurate shit. A bad issue that contains a plausible but wrong diagnosis
creates extra work.
The most frustrating failure mode right now is that people submit issues that
are not in their own voice. There is an observed problem in there somewhere, but
it has been thrown into a clanker, and the clanker reworded it and made a huge
mess of it. Typically, the clanker was prompted so badly that its conclusions
are more often than not inaccurate, yet always full of confidence. The result is
complete guesswork on root causes, fake-minimal repros, suggested implementation
strategies, analogies to adjacent and often irrelevant code, and long lists of
error classes that might or might not matter.
That is worse than no diagnosis.
I don’t want to point to specific issues because I really do not want to
badmouth anyone, but it is frustrating. It is also frustrating because when I give
that issue to Pi, Pi sees the wrong diagnosis too. It does not treat the issue
body as a rumor. It treats it as evidence. It will happily go down the path
that the issue already prepared for it, because the prose is confident and the
code references look plausible. We use a custom slash command called /is,
which specifically has this instruction in it:

Do not trust analysis written in the issue. Independently verify behavior and
derive your own analysis from the code and execution path.

Unfortunately, it does not fully work, because when humans first throw their
issue through the clanker wringer, their clanker expands the scope almost
immediately. What was once a very narrow, fact-based bug observation turns
into a much larger surface area full of hypotheses. So, personally, I
increasingly want issue reports to be condensed to what the human actually
observed:

I ran this command.
I expected this to happen.
This happened instead.
Here is the exact error or log.

That is enough. If you used an LLM to understand the problem, great, maybe
leave its analysis as a follow-up comment. But the issue text should be
something you own. If you do not know the root cause, say that. I too can
operate a clanker, and I would rather do this myself than use your slop. If
your repro is a guess, say that. If the only hard fact is one stack trace, give
me the stack trace and stop there.
Slop Begets Slop
That we’re seeing issues full of slop is just a result of the present-day
quality of these machines. Sadly, the failures that produce bad issues also
extend to a lot of the code they generate. Not all of it, but a lot of it.
Over and over I keep running into them over-engineering the hell out of issues
and implementations.
If you tell them that “this malformed session log crashes the reader,” the
clanker will often add a tolerant reader. Then it will add a fallback, then maybe
a migration, then more debug output, then a test for all of this. None of this is
necessarily wrong in isolation, but it can be the wrong move for the system.
At Pi’s core is a rather well-designed session log with invariants that must be
upheld. The clanker’s present-day behavior is to just assume that no such
invariants exist, and instead to make the system work with all kinds of
malformedness, blowing up the complexity in the process.
Almost always, the correct fix is not to handle the bad state, but to make the
bad state impossible. This matters a lot for persisted data such as Pi session
logs. They are opened, branched, compacted, exported, shared, and analyzed.
The goal here is to never write bad session data. Yet if you just let the
clanker roam freely, it will attempt to handle every case of bad data in the
session log with a more permissive reader.
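To make the contrast concrete, here is a minimal TypeScript sketch of enforcing invariants on the write path instead of adding a tolerant reader. Everything in it is an assumption for illustration: the `Entry` shape with `id` and `parentId`, the `SessionLog` class, `parseLog`, and the JSON Lines serialization are hypothetical and not Pi’s actual session format. The only thing it shows is that a writer which refuses to append a bad entry lets the reader stay strict.

```typescript
// Hypothetical session-log entry; Pi's real session format is not shown in the post.
interface Entry {
  id: string;
  parentId: string | null; // null only for the very first (root) entry
  text: string;
}

class SessionLog {
  private readonly entries: Entry[] = [];
  private readonly ids = new Set<string>();

  get size(): number {
    return this.entries.length;
  }

  // Check the invariants before writing, so a bad entry is never persisted.
  append(entry: Entry): void {
    if (this.ids.has(entry.id)) {
      throw new Error(`duplicate entry id: ${entry.id}`);
    }
    if (entry.parentId === null) {
      if (this.entries.length > 0) {
        throw new Error("only the first entry may be a root");
      }
    } else if (!this.ids.has(entry.parentId)) {
      throw new Error(`unknown parent: ${entry.parentId}`);
    }
    this.entries.push(entry);
    this.ids.add(entry.id);
  }

  // One JSON object per line (JSON Lines).
  serialize(): string {
    return this.entries.map((e) => JSON.stringify(e)).join("\n");
  }
}

// Strict reader: it replays entries through the same checks and throws on the
// first violation instead of skipping or repairing malformed lines.
function parseLog(text: string): SessionLog {
  const log = new SessionLog();
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // tolerate only blank lines, e.g. a trailing newline
    log.append(JSON.parse(line) as Entry);
  }
  return log;
}
```

Because the reader reuses the writer’s checks, a corrupt log fails loudly at the first bad line rather than being quietly patched over, and the question becomes how the bad entry was ever written.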
I have complained about this plenty, but working on Pi’s code base continues to
reinforce the point. This is one of the ways LLM-authored code accumulates so much
needless complexity. These models see a local failure and try to defend against
it locally. As maintainers, we have to keep pulling the conversation back to the
global invariant, which is more laborious than it should be.
Volume Is The Problem
Then there is the issue of volume. The tracker is receiving a lot of issues and
PRs, and a significant fraction of them are clearly LLM-assisted. Some are
good, none are excellent, and most are just bad. The total throughput is a
maintenance problem by itself.
As you might know, Pi’s issue tracker is automated to close all issues and pull
requests from new contributors, and there is a manual process by which we might
reopen some of them or approve individuals. So auto-close -> reopen -> close
again is an interesting statistic for us to look at.
While writing this, I pulled the last 90 days of data from the public GitHub
tracker. Excluding Earendil members, that leaves 3,145 external issues and pull
requests. Of those, 2,504 were auto-closed because they came from non-approved
individuals. 17% were reopened. For pull requests the number is worse: less
than 10% were merged.
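To put those numbers in proportion, here is a small TypeScript calculation. It only uses the figures quoted above; the reopened count is an estimate that assumes the 17% refers to the auto-closed items, which is an interpretation rather than something stated outright.

```typescript
// Figures quoted above for the last 90 days, excluding Earendil members.
const externalItems = 3145;
const autoClosed = 2504;

// Share of external issues and PRs that were auto-closed.
const autoClosedShare = autoClosed / externalItems;
console.log(`auto-closed: ${(autoClosedShare * 100).toFixed(1)}%`); // auto-closed: 79.6%

// Items that were never auto-closed.
const notAutoClosed = externalItems - autoClosed;
console.log(`not auto-closed: ${notAutoClosed}`); // not auto-closed: 641

// Assumption: the 17% reopen rate is relative to the auto-closed items.
const reopenedEstimate = Math.round(autoClosed * 0.17);
console.log(`reopened (estimate): ~${reopenedEstimate}`); // reopened (estimate): ~426
```

Under that assumption, roughly four out of five external submissions were auto-closed, and only a few hundred of those came back.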

Many of the issues and PRs are complete slop, and in some cases the humans did
not even realize that they had created them. Sources of low-quality spam include
OpenClaw instances, as well as some skills that people put into their context
and that seemingly encourage issue creation.
GitHub clearly is not built to deal with this new form of Open Source, but I’m
increasingly feeling the need to put the blame less on GitHub than on all the
people involved who make that experience painful. If your clanker shits on
someone else’s issue tracker then it’s not the fault of GitHub, it’s yours alone.
Careful Parallelism
Pi might be built with Pi, but today we are quite far from where Bun and
OpenClaw already are: fully detached, automated software engineering. Maybe we
will reach that point, I don’t know. Today we don’t seem to know how to pull off
a dark factory, and we don’t yet have the desire to either. That said, there is
quite a bit of parallelism going on, and it is mostly for reproducing issues.
The small setup we use for this consists of three tiny pieces in Pi’s own
committed .pi folder. /is (for “analyze issue”) is a prompt for analyzing GitHub
issues: it labels and assigns the issue, reads the full thread and links, then
explicitly tells the agent not to trust the analysis in the issue and to derive
its own diagnosis from the code. Next, an extension adds a prompt-url-widget. It
watches the prompt before the agent starts and recognizes the GitHub issue or PR
URL that /is (or the PR equivalent) put into the prompt. It then fetches the
title and author with gh, renders them in a little UI widget, and renames the
session. It also rebuilds that state on session start or session switch, so if
we reopen an older investigation, the window still tells the developer which
issue it belongs to.
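The URL-recognition step of the prompt-url-widget could look roughly like the TypeScript sketch below. It is not Pi’s extension code and does not touch Pi’s extension API: `findGitHubRef`, `GitHubRef`, and `ghViewArgs` are hypothetical names. The `gh issue view` and `gh pr view` subcommands with `--repo` and `--json title,author` are standard GitHub CLI options for fetching just the title and author.

```typescript
// Hypothetical helper, not Pi's extension API: find the first GitHub issue or
// pull request URL in a prompt and describe it.
interface GitHubRef {
  owner: string;
  repo: string;
  kind: "issue" | "pull";
  number: number;
}

const GITHUB_REF_PATTERN =
  /https:\/\/github\.com\/([\w.-]+)\/([\w.-]+)\/(issues|pull)\/(\d+)/;

function findGitHubRef(prompt: string): GitHubRef | null {
  const match = GITHUB_REF_PATTERN.exec(prompt);
  if (match === null) return null;
  return {
    owner: match[1],
    repo: match[2],
    kind: match[3] === "issues" ? "issue" : "pull",
    number: Number(match[4]),
  };
}

// Arguments for the GitHub CLI to fetch just the title and author.
function ghViewArgs(ref: GitHubRef): string[] {
  return [
    ref.kind === "issue" ? "issue" : "pr",
    "view",
    String(ref.number),
    "--repo",
    `${ref.owner}/${ref.repo}`,
    "--json",
    "title,author",
  ];
}
```

For a prompt containing the placeholder URL `https://github.com/acme/widgets/issues/42`, `findGitHubRef` yields `{ owner: "acme", repo: "widgets", kind: "issue", number: 42 }`, and `ghViewArgs` turns that into `issue view 42 --repo acme/widgets --json title,author`.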
In practice this means it’s possible to have several Pi windows open, each
running /is against a different issue, and the UI keeps the investigations
visually distinct while the agents do their independent reproduction and code
reading. Once the investigations are done, one can work through them
sequentially. To finish everything off, /wr (“wrap it up”) is the matching
wrap-up prompt: it infers the GitHub context from the session, updates the
changelog, drafts or posts the final issue comment with a disclaimer, commits
only the files changed in that session, adds the appropriate closes #... when
there is exactly one issue, and pushes from main.
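Two of the /wr rules are easy to state as code, so here is a hypothetical TypeScript sketch of them: add a `closes #N` trailer only when exactly one issue is involved, and commit only the files that the session touched. The function names and the idea of passing in lists of session files and modified files are assumptions; how /wr actually tracks this is not described in the post.

```typescript
// Hypothetical sketch of two /wr rules; names and inputs are assumptions.

// Append a "closes #N" trailer only when exactly one distinct issue is involved.
function commitMessage(summary: string, issueNumbers: number[]): string {
  const distinct = Array.from(new Set(issueNumbers));
  if (distinct.length !== 1) {
    return summary;
  }
  return `${summary}\n\ncloses #${distinct[0]}`;
}

// Commit only files that the session touched and that are actually modified,
// leaving unrelated changes in the working tree alone.
function filesToCommit(sessionFiles: string[], modifiedFiles: string[]): string[] {
  const touched = new Set(sessionFiles);
  return modifiedFiles.filter((file) => touched.has(file)).sort();
}
```

Treating repeated mentions of the same issue as one issue is a design choice of this sketch. GitHub treats `closes #N` in a commit message as a closing keyword once the commit lands on the default branch, so guessing wrong when several issues are involved would close the wrong ones.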

Open Source Is About Hard Problems Worth Fixing
You will have noticed this already, but Open Source in a post-AI world is under a
strange new pressure. We are getting more code, more projects, and more issues.
Projects appear with no real users, or a temporary audience of one, and even
projects with thousands of stars can have a shelf life of weeks.
For us, Pi’s harness layer is worth maintaining carefully because it solves hard
coordination problems and creates a platform we and others can build on. We
also know that coordination and cooperation lift us all up. Many times the
right answer is not to work around a problem locally, but to make the upstream
behavior correct. Mario has been very good at refusing to make Pi paper over
every misconfigured gateway, and we’re trying to preserve that discipline. When
a gateway behaves correctly, everybody benefits.
Sadly, that kind of thinking is quickly disappearing because these machines make
local workarounds cheap, so code accumulates local defenses against every
misbehavior. Instead of humans talking to humans about where a fix belongs, one
human and one machine work around the problem in isolation.
Keep in mind that AI has not increased the number of people who need software,
or the number of maintainers who can review it. It has mostly increased the
amount of code and the number of projects competing for attention. Some of that
is healthy, but a lot of it fragments effort that should be shared.
We need stronger foundations, not weaker ones. Open Source needs more
collaboration, not more isolated work with a machine. Human communication is
hard, and it is tempting to avoid it when you can sit alone with your clanker.
But isolation is not where Open Source derives its value. The value is in the
community and the structure that lets projects outlive their original creators.

[1] To me, clanker is a much preferable term for agent. Agency lies with humans,
not with machines. I still believe calling these things agents is a mistake, but
alas.

This entry was tagged ai, open-source and pi.


© Copyright 2026 by Armin Ronacher.

Content licensed under the Creative Commons
Attribution-NonCommercial 4.0 International License.


Armin Ronacher reflects on his experience building with agents, specifically using Pi, and the resultant challenges encountered in the context of open source project management, focusing heavily on the quality of input data and the philosophical implications of integrating large language models into software development workflows.

The introduction of agents into the development process, such as using Pi to build Pi, highlights how the format of issue descriptions is fundamentally changing. Issue descriptions are no longer merely messages from users to maintainers; they are also used as inputs for agent prompts, allowing agents to perform complex tasks like understanding, reproducing, inspecting code, and proposing fixes. This change means the shape of an issue carries new weight. While previously bad issues were annoying, the current problem involves a class of issues that are largely generated by agents, which are often inaccurate and confidently incorrect, leading to extensive extra work when agents analyze this flawed input. The author notes that an issue containing a plausible but wrong diagnosis creates significant complexity, as the agent treats the flawed text as evidence rather than a rumor.

To mitigate this problem, the author advocates for more concise issue reporting. The ideal report should focus strictly on what the human actually observed, emphasizing factual statements such as running a command, an expected outcome, the actual outcome, and the exact error or log. If an LLM is used for diagnosis, the result should be relegated to a follow-up comment, as the issue itself must remain something owned by the human contributor. If the only verifiable fact is a stack trace, that should be sufficient.

The author ties this to the present-day quality of the models: the same failures that produce bad issues extend to much of the code they generate, which tends to over-engineer both issues and implementations. LLMs also tend to assume that a system has no invariants. When told that a malformed session log crashes the reader, an agent will accommodate the bad state with a tolerant reader, fallbacks, and migrations, increasing complexity around persisted data like session logs. The correct fix is usually to make the bad state impossible rather than to handle it, which matters especially for session logs that are opened, branched, compacted, exported, shared, and analyzed.

Volume is another significant challenge, as the issue tracker receives a large number of LLM-assisted submissions. The author presents 90-day statistics: of 3,145 external issues and pull requests, 2,504 were auto-closed, 17% were reopened, and less than 10% of pull requests were merged. He argues that the blame for this painful experience belongs less to GitHub than to the individuals whose tools produce the poor artifacts.

To reproduce issues efficiently, the author outlines a small setup in Pi’s committed .pi folder consisting of two prompts and an extension. The /is slash command analyzes a GitHub issue and instructs the agent to independently verify behavior and derive its own analysis from the code, explicitly distrusting the analysis provided in the issue. A prompt-url-widget extension recognizes the GitHub URL in the prompt, fetches its title and author, displays them in a UI widget, and renames the session, keeping multiple parallel investigations visually distinct. A final command, /wr, wraps up an investigation by inferring the GitHub context, updating the changelog, drafting or posting the final issue comment, committing only the session’s changes, and pushing, so that parallel investigations can be finished one at a time.

Philosophically, the author addresses the broader implications of open source in the age of AI. While the current tooling allows for local workarounds where a human works around a misconfigured gateway using a machine in isolation, this pattern fragments effort instead of promoting collective solutions. The author asserts that open source derives its true value from community collaboration and structure that allows projects to endure, emphasizing that agency belongs to humans, not machines. The necessity is to shift focus from isolated machine workarounds to stronger foundations built on human communication and shared coordination to solve hard, upstream problems.