It's Not Just X. It's Y.

Recorded: May 31, 2026, 11 p.m.

Eryk Salvaggio

31 May 2026

A recent experiment testing the limits of noise-generation in diffusion models paired with verbs in the kinetic identification dataset, which has nothing to do with the topic of this post.

Against the Quantification of Integrity

When the measure of language becomes its target, it ceases to be good language.

💡Nerd Rating: 1/5. I discuss the origins of certain linguistic tics in LLMs and what they mean for writing, student assessment, and thinking.

"It's not x, it's y."

Large Language Models gravitate toward this type of construction, called negative parallelism. It has its uses: it sets up a contrast. It's useful, especially, for reframing assumptions: "You think it's like that, but it's really like this." It's all over social media, especially on LinkedIn, and the construction has sparked a backlash amid an ongoing war against automated language production. If you use em-dashes – you might be a bot. If you write that things delve, or do anything quietly or genuinely (or you create lists of three, like that one), you might be a bot.

Recent overuse by language models has led many to declare it bad writing. I'm not so sure. Nobody called JFK a lazy writer when he said, "ask not what your country can do for you – ask what you can do for your country." Negative parallelism is a rhetorical device, and any rhetorical device is only as lazy or inspired as what it contains.

Automated Language Production

Now, we have AI detectors that claim to protect you from the witch hunt by looking for these patterns. You take your own writing and you run it through Grammarly, which will analyze word patterns that AI detectors might flag. Then it offers ideas for how to change them, which a) gives Grammarly the power to write for you and b) makes your writing lose any sense of rhythm or intent.

Grammarly's review of this section has flagged 27 examples of text I should change to avoid the accusation that I am a machine. For example, Grammarly identified the above phrase – "automated language production" – as 11 times more likely to be AI. It suggests that a human would be "against mechanized language synthesis" instead.
The simple two-word combo "align with" was flagged as 43x more likely to be AI-generated. Real humans say "corresponds." These are small suggestions that add up until the result resembles nothing I chose. The human voice replaced by a machine trying to sound human.

As a result, I just paid Pangram – another AI-detection company – $20 to verify that a journal article wasn't AI-generated before I submitted it. It wasn't, and I knew it wasn't. Pangram agreed. That's what I paid for: not to learn whether I wrote it, but to be told it wouldn't flag me. Because if Pangram's AI system found me guilty, that's the end of my career. That's literally extortion.

And if it had flagged it, then what? It would give me a score (four valuations: high, very likely, somewhat likely, human) to assign my integrity a category. In the ecosystem we're all building, I'd have to use Grammarly to rephrase everything: using a machine to write for me to prove that I didn't use a different machine to write for me.

A Culture Hostile to Reason

Our instinct in making sense of these machines is to examine the training data. That training data is no longer "just the web." The web is the raw meat, but this sausage is heavily pre- and post-processed. Post-training optimizes the model for whatever it's designed to do. This includes techniques such as RLHF (reinforcement learning from human feedback) and RLVR (reinforcement learning with verifiable rewards). RLHF has humans rank replies, then the system emphasizes the kinds of replies they ranked highly. RLVR is weirder, and I suspect it's why we see "It's not X, it's Y" so often.

Dismissing negative parallelism as lazy gets in the way of understanding why it's showing up everywhere. This type of language is such a powerful framework for thinking that we mistake it for a model's capacity for thought. We credit computation for the work that's done by language.

Weird Dogs

RLVR isn't a structure that watches for words and triggers some sub-process.
Instead, you train a model, like you would any model. When that model is done, it predicts tokens. Lots of people are still in denial about this. Token prediction involves producing a list of candidates based on their mathematical distribution in the training data, ranking them by their likelihood given the previous words in the prompt or sequence.

RLVR intervenes by having the model solve math problems by writing its way to a solution, reproducing the language we would use when thinking out loud about how to solve them. When the model arrives at the correct answer, the language it used most often to get there is then emphasized in the finished model. This is (partly) what the industry calls reasoning.

What day was it that we saw that weird dog?

So, think of it like this: You are sitting with a friend. Your phones are dead. Your friend asks: what day was it that we saw that weird dog? You start by saying, "It was Thursday." Your friend says: "No, it wasn't Thursday, because Thursday I was out of town." So you say that's right, so it must have been Wednesday, because Wednesday was your mutual friend's birthday, and you both went to the party, and you saw the dog on the way to the party. Your friend says: "That's right, except, Wednesday was our friend's birthday but the party was on Friday. So we must have seen the dog on Friday." The two of you have articulated your way to the answer, a verifiable one: you could pop on your phones and check your photos and see that yes, the weird dog picture was taken on Friday.

In dehumanizing terms, your gut instinct ("it's Thursday") is what a model might spit out at first guess, and that's where models used to stop. But you didn't. Your friend countered: "It wasn't [Thursday], it was [Wednesday]." There are more words, which narrow the window of possible answers, and then you arrive, through "its-not-x-its-y-ing," at the correct date.

The two of you had actual memories and visceral experiences to work with.
Language was the vessel through which these experiences were communicated and conflicts were resolved. The model, by contrast, extends language in longer and longer bursts, replicating the pattern of reasoning you two just engaged in. These longer runs re-enact that deliberation within language rather than through it.

Other high-entropy states get filled by words like "suppose..." which trigger longer speculative passages. "Because," "consider," "alternatively," even "wait" can occupy these positions. These are words that lead to language that brings contrast, exceptions, and abstraction along for the ride. If they lead to a correct answer on a math problem, they get pushed to occur more often.

The Reason We Reason

When we talk about a weird dog, or have conversations like it, the point of the question is not to identify the date on the calendar when the dog was encountered. It is an opening for a reminiscence. It is posed to reconstruct the memory, to revel in its surrounding context, and to deepen a connection between friends through a shared experience.

Defining reasoning the way it has been used in LLMs assumes that the point of asking a question is to get an answer, that answers can be verified, and that nothing is lost in immediate closure. This has real effects on writing, and the openness to doubt is something we lose in the rapid prototyping of thought that occurs with a language model. Ambiguity, doubt, and uncertainty matter more to some ways of thinking than any immediate answer. The inner life grows in the spaces between the industrial complexes that harness every remnant of our externalized thought.

Nonetheless, the language we use in these states is the same. When AI detectors flag text as AI-generated, is it because it follows a certain structural pattern of that reasoning?
Pangram and reasoning models both detect structural patterns based on how humans reason when writing. Pangram's model is trained on pre-2021 data; Pangram then inserts AI-generated versions of the same text into the training set.

Suppose we publicly shame people whose text looks like it might have been written by a machine because it mimics the language used for human reasoning, and people stop writing in ways that they internalize as "AI writing" out of fear of false detection. That sends a signal that your language for reasoning must be policed, or you too could be held up to public scrutiny. In the end, shaming people for writing that gets flagged as AI can lead people to sidestep structures the model has learned from us: structures that are effective tools for argumentation. We take the tools of critical thinking out of the kit at the time we most need them.

For Good Measure

There's another angle to this. An AI-based essay assessment tool was tested in the UK against human graders. The system rewarded writing structures that I can't help but notice look a lot like RLVR-based reasoning: "giving out higher marks based on essay length, vocabulary range and sentence complexity, which are often unrelated to academic standards," all of which are hallmarks of AI reasoning. In other words, the LLM grades humans based on the criteria engineers use to assess the LLM.

There's this old adage from economics called Goodhart's law. The econo-nese version of it is that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." Or: when a measure becomes a target, it ceases to be a good measure. It could be tweaked to apply to large language models: "when the measure of language becomes its target, it ceases to be good language."

There is danger in evaluating language for its patterns over its content, and both generation and detection incentivize this.
Automated grading is somewhere between the two: rewarding students for employing the form of reason over the act of reasoning will only make that form more tempting and more common. And yet, punishing the form risks punishing reason. Ultimately, we have to think critically in all cases, instead of deferring to the judgments of machines.

Against Automatic Thinking

I'm not convinced by the old "if you haven't done anything wrong, you don't have anything to worry about" line. I've seen 99.8% cited as a measure of accuracy in automated surveillance systems since 2018. As Arvind Narayanan has noted, that figure is on a per-paper basis, so the chance of a false flag compounds with every paper we check. So up to 10% of college students could be falsely accused. If we collectively run every bit of text through an AI model to check whether it is AI-generated, we will generate false positives on an even larger scale.

These models concentrate real authority; companies promise they will reason on our behalf. We normalize something dangerous when we run every two-line phrase through an AI interpreter, post the result online, and say "see? They're plagiarists!" We create a culture of self-censorship and AI-detector-pressured rewriting and paraphrasing as people strive to avoid these witch hunts. That is the opposite of protecting human expression.

We should resist normalizing a trust in any machine's ability to determine matters of guilt. If using AI to write is, at its worst, an industrialization of the mind, then AI detection, at its worst, becomes a surveillance system for thought.

Monthly, for the Second Week in a Row.

Thanks for reading! As mentioned last week, I am only a sporadic poster these days, aiming for once a month. If you're paying for the newsletter and would like to calibrate your donations (or would like to start supporting it!) you are very welcome to set up or change your subscription here.
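
A footnote on the false-positive arithmetic in "Against Automatic Thinking": the step from 99.8% per-paper accuracy to something near 10% of students can be sketched in a few lines. This is only an illustration under stated assumptions, not Narayanan's calculation or any detector's published method. It assumes the 99.8% figure means a 0.2% false-positive rate per paper, and it assumes every check is independent, which is a simplification (real errors probably cluster around particular writers and styles). The function names are hypothetical, and the paper counts are picked for illustration.

```python
import math

# Assumption: "99.8% accurate" is read as a 0.2% false-positive rate per paper.
FALSE_POSITIVE_RATE = 0.002


def chance_of_false_flag(papers: int, rate: float = FALSE_POSITIVE_RATE) -> float:
    """Probability that at least one of `papers` human-written papers is flagged.

    Assumes each check is independent of the others (a simplification).
    """
    return 1.0 - (1.0 - rate) ** papers


def papers_until(target: float, rate: float = FALSE_POSITIVE_RATE) -> int:
    """Smallest number of papers at which the cumulative risk reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - rate))


if __name__ == "__main__":
    # Illustrative paper counts; the post does not say how many papers a student submits.
    for n in (1, 10, 25, 50, 100):
        print(f"{n:>3} papers: {chance_of_false_flag(n):.1%} chance of a false flag")
    print("papers needed to reach 10%:", papers_until(0.10))
```

Under these assumptions the risk is about 9.5% after 50 papers and crosses 10% at 53. Whether a student's work is checked that many times over a degree is not something the post states, so the count is the weakest link in the sketch; the point is only that a small per-paper error rate does not stay small once it is applied again and again to the same person.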


Cybernetic Forests

Sifting Through the Techno-Cultural Debris.


The examination of language in the context of large language models reveals complex tensions regarding the nature of reasoning, integrity, and automated evaluation. The author begins by addressing the linguistic construction known as negative parallelism, observed in LLMs, which employs structures like "it's not X, it's Y" to establish contrast and reframe assumptions. While this style has sparked criticism regarding writing quality, the author argues that rhetorical devices should be judged by their content rather than solely by their form, suggesting that dismissing this structure as lazy overlooks its power as a framework for thought.

This linguistic scrutiny extends to automated language production and detection systems. Tools like Grammarly analyze word patterns that AI detectors might flag, then suggest replacements. The author critiques how these suggestions strip human writing of its rhythm and intent, so that the effort to avoid an accusation of AI authorship replaces an authentic human voice with a machine trying to sound human. The reliance on detectors, such as the one the author paid to check a journal article before submission, raises further concerns about integrity and what the author calls extortion: writers are pushed into a cycle of machine-assisted rewriting to evade detection, which ultimately undermines genuine expression.

The author then turns to how these models are shaped, focusing on post-training techniques such as Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR). These processes shape the model's capacity for "reasoning" by emphasizing the language that accompanied highly ranked replies or correct answers. The author posits that dismissing negative parallelism as mere laziness prevents a deeper understanding of why certain language patterns become prevalent in models, suggesting that the structural patterns themselves reflect a particular mode of thinking rather than simply indicating a lack of effort.

A parallel is drawn between human deliberation and model processing through an analogy involving reconstructing a memory. Human reasoning, when faced with a question, involves an engagement with ambiguity, doubt, and visceral experience, allowing for the exploration of multiple possibilities before reaching a verified conclusion. In contrast, language models generate responses by replicating these deliberative patterns through longer sequences of text, often using words that introduce contrast, exceptions, and abstraction, such as "because" or "consider." This suggests that the model replicates the *surface* of reasoning rather than engaging in the full, uncertain process that defines human thought.

When reasoning is defined the way it has been for LLMs, the point of a question is assumed to be a verifiable answer, and nothing is assumed to be lost in immediate closure. This approach prioritizes verifiable answers over the openness to doubt and ambiguity that are integral to critical thinking. The author contends that this openness is lost in the rapid prototyping of thought that language models encourage, and that ambiguity, doubt, and uncertainty matter more to some ways of thinking than any immediate answer. The distinction matters for automated grading: the author observes that an AI-based assessment tool tested in the UK rewarded writing structures that mirror the criteria engineers use to evaluate the models themselves, effectively grading human work based on the metrics of the machine.

The implications of this dynamic are explored in the context of automated thinking and surveillance. The pursuit of catching AI-generated text can create a culture of self-censorship, where individuals internalize the need to conform to detected patterns, inhibiting genuine argumentation and critical thinking. Furthermore, relying on AI detection risks creating a surveillance system for thought, as falsely flagging human writing can lead to the policing of language structures that are nonetheless effective tools for argumentation. Ultimately, the argument culminates in a call to resist deferring to machine judgments and instead emphasize the necessity of critical thinking when assessing both the form and the substance of communication.