Why xor eax, eax?

Recorded: Dec. 2, 2025, 3:04 a.m.

Written by me, proof-read by an LLM.
Details at end.
In one of my talks on assembly, I show a list of the 20 most executed instructions on an average x86 Linux desktop. All the usual culprits are there: mov, add, lea, sub, jmp, call and so on, but the surprise interloper is xor - “eXclusive OR”. In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine. It’s surprising, then, that a Linux machine just minding its own business would be executing so many.
That is, until you remember that compilers love to emit a xor when setting a register to zero:
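As a minimal sketch of the kind of code that triggers this, consider a C function that simply returns zero. The function name `zero` is an assumption for illustration and is not taken from the original example.

```c
/* Hypothetical example: a function whose only job is to return zero.
 * On x86-64 the int return value travels in EAX, so the compiler has
 * to put a zero into that register before returning. */
int zero(void) {
    return 0;
}
```

Compiling this with something like `gcc -O2 -c` and disassembling the object file should show how the compiler chose to zero the register.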

We know that exclusive-OR-ing anything with itself generates zero, but why does the compiler emit this sequence? Is it just showing off?
In the example above, I’ve compiled with -O2 and enabled Compiler Explorer’s “Compile to binary object” so you can view the machine code that the CPU sees, specifically:
31 c0             xor eax, eax
c3                ret

If you change GCC’s optimisation level down to -O1 you’ll see:
b8 00 00 00 00    mov eax, 0x0
c3                ret

The much clearer, more intention-revealing mov eax, 0 to set the EAX register to zero takes up five bytes, compared to the two of the exclusive OR. By using a slightly more obscure instruction, we save three bytes every time we need to set a register to zero, which is a pretty common operation. Saving bytes makes the program smaller, and makes more efficient use of the instruction cache.
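The size difference can be checked mechanically. The sketch below stores the two encodings quoted in the listings above as byte arrays and lets `sizeof` do the counting; the array and function names are hypothetical.

```c
#include <stddef.h>

/* Machine-code bytes as quoted in the listings above. */
static const unsigned char XOR_EAX_EAX[] = { 0x31, 0xc0 };
static const unsigned char MOV_EAX_0[]   = { 0xb8, 0x00, 0x00, 0x00, 0x00 };

/* Bytes saved each time the xor form replaces the mov form. */
size_t bytes_saved(void) {
    return sizeof MOV_EAX_0 - sizeof XOR_EAX_EAX;
}
```

This only counts encoding bytes; how much the saving matters in practice depends on how often a given program zeroes registers.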
It gets better though! Since this is a very common operation, x86 CPUs spot this “zeroing idiom” early in the pipeline and can specifically optimise around it: the out-of-order tracking system knows that the value of “eax” (or whichever register is being zeroed) does not depend on the previous value of eax, so it can allocate a fresh, dependency-free zero in the register renamer. And, having done that, it removes the operation from the execution queue - that is, the xor takes zero execution cycles![1] It’s essentially optimised out by the CPU!
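The property the CPU relies on can be sketched in C: the result of XOR-ing a value with itself never depends on what the value was. The helper name `xor_self` is hypothetical, and the sketch only models the arithmetic; it says nothing about how any particular CPU implements register renaming.

```c
#include <stdint.h>

/* XOR-ing a value with itself clears every bit, whatever the input was,
 * so the result carries no real dependency on the previous contents. */
uint32_t xor_self(uint32_t previous) {
    return previous ^ previous;
}
```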
You may wonder why you see xor eax, eax but never xor rax, rax (the 64-bit version), even when returning a long:

In this case, even though rax is needed to hold the full 64-bit long result, by writing to eax, we get a nice effect: Unlike other partial register writes, when writing to an e register like eax, the architecture zeros the top 32 bits for free. So xor eax, eax sets all 64 bits to zero.
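A rough C model of that rule, assuming hypothetical helpers `write32` and `write16` that take the old 64-bit register contents and return the new ones. It is a sketch of the architectural behaviour, not of anything the compiler emits.

```c
#include <stdint.h>

/* Model of a 32-bit write (e.g. to EAX): the upper 32 bits of the
 * 64-bit register are cleared, so the old contents do not survive. */
uint64_t write32(uint64_t old_reg, uint32_t value) {
    (void)old_reg;              /* previous contents are discarded */
    return (uint64_t)value;     /* zero-extended to 64 bits */
}

/* Model of a 16-bit write (e.g. to AX): the upper 48 bits are kept. */
uint64_t write16(uint64_t old_reg, uint16_t value) {
    return (old_reg & ~UINT64_C(0xFFFF)) | value;
}
```

With a register holding `0xDEADBEEFCAFEBABE`, `write32(reg, 0)` yields zero across all 64 bits, while `write16(reg, 0)` leaves the upper 48 bits in place.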
Interestingly, when zeroing the “extended” numbered registers (like r8), GCC still uses the d (doubleword, i.e. 32-bit) variant:

Note how it’s xor r8d, r8d (the 32-bit variant) even though, with the REX prefix (here 45), it would be the same number of bytes to xor r8, r8 at full width. It probably makes something easier in the compilers, as clang does this too.
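The "same number of bytes" point can be sketched with the two encodings side by side. Only the `45` prefix is quoted in the post; the remaining bytes are filled in here from the x86-64 encoding rules (opcode `31 c0` as for xor eax, eax, and REX.W being bit `0x08` of the prefix), and the names are hypothetical.

```c
#include <stddef.h>

/* xor r8d, r8d: REX prefix 0x45, then the same 31 c0 as xor eax, eax. */
static const unsigned char XOR_R8D_R8D[] = { 0x45, 0x31, 0xc0 };
/* xor r8, r8: identical apart from the REX.W bit (0x08) in the prefix. */
static const unsigned char XOR_R8_R8[]   = { 0x4d, 0x31, 0xc0 };

/* Returns 1 when both encodings have the same length. */
int same_length(void) {
    return sizeof XOR_R8D_R8D == sizeof XOR_R8_R8;
}

/* Bits that differ between the two prefix bytes. */
unsigned char prefix_difference(void) {
    return (unsigned char)(XOR_R8D_R8D[0] ^ XOR_R8_R8[0]);
}
```

Both forms are three bytes long, so the choice of the 32-bit variant here is not about size.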
xor eax, eax saves you code space and execution time! Thanks compilers!
See the video that accompanies this post.

This post is day 1 of Advent of Compiler Optimisations 2025,
a 25-day series exploring how compilers transform our code.
This post was written by a human (Matt Godbolt) and reviewed and proof-read by LLMs and humans.

[1] It still has to retire, so some on-chip resources are still allocated to it.


Filed under:

Coding
AoCO2025

Posted at 06:00:00 CST on 1st December 2025.

About Matt Godbolt

Matt Godbolt is a C++ developer living in Chicago.
He works for Hudson River Trading on super fun but secret things.
He is one half of the Two's Complement podcast.
Follow him on Mastodon
or Bluesky.

Copyright 2007-2025 Matt Godbolt.
Unless otherwise stated, all content is licensed under the
Creative Commons Attribution-Noncommercial 3.0 Unported License.
This blog is powered by the MalcBlogSystem by Malcolm Rowe.
Note: This is my personal website. The views expressed on
these pages are mine alone and almost certainly not those of my employer.

Matt Godbolt’s blog post explains why x86 compilers emit `xor eax, eax` rather than the more obvious `mov eax, 0` when setting a register to zero. It covers two benefits, smaller code and cheaper execution, and then looks at how the idiom interacts with 64-bit registers.

The starting observation is that, at `-O2`, GCC emits `xor eax, eax` where `mov eax, 0` would produce the same result (at `-O1` it emits the `mov`). The author gives two reasons the less obvious instruction wins. First, it is smaller: `xor eax, eax` encodes in two bytes against five for `mov eax, 0`, saving three bytes each time a register is zeroed, which shrinks the program and makes more efficient use of the instruction cache. Second, because the idiom is so common, x86 CPUs recognise it early in the pipeline: the out-of-order machinery knows the result does not depend on the previous value of `eax`, so it can allocate a fresh, dependency-free zero in the register renamer and remove the operation from the execution queue. The `xor` then takes zero execution cycles, although it still has to retire and so still occupies some on-chip resources.

The post then explains why the 32-bit form appears even when a 64-bit value is being returned. Unlike other partial register writes, a write to an ‘e’ register (e.g., `eax`) zeroes the top 32 bits for free, so `xor eax, eax` clears all 64 bits of `rax`. For extended registers like ‘r8’, GCC still uses the 32-bit ‘xor r8d, r8d’, even though the REX prefix means ‘xor r8, r8’ would be the same number of bytes.

The author guesses that sticking to the 32-bit form makes something easier inside the compilers, noting that clang does the same. The post concludes that `xor eax, eax` saves both code space and execution time.