Why xor eax, eax?
Recorded: Dec. 2, 2025, 3:04 a.m.
Original:
We know that exclusive-OR-ing anything with itself generates zero, but why does the compiler emit this sequence? Is it just showing off?

If you change GCC's optimisation level down to -O1, you'll see the much clearer, more intention-revealing mov eax, 0 used to set the EAX register to zero. That encoding takes up five bytes, compared to the two of the exclusive OR. By using a slightly more obscure instruction, we save three bytes every time we need to set a register to zero, which is a pretty common operation. Saving bytes makes the program smaller and makes more efficient use of the instruction cache.

In this case, even though rax is needed to hold the full 64-bit long result, writing to eax has a nice effect: unlike other partial register writes, a write to an "e" register like eax zeros the top 32 bits for free. So xor eax, eax sets all 64 bits to zero.

Note how, in the generated code, it's xor r8d, r8d (the 32-bit variant) even though with the REX prefix (here 45) it would be the same number of bytes to xor r8, r8 at the full width. That probably makes something easier in the compilers, as clang does this too.

The CPU also recognises xor reg, reg as a zeroing idiom and can handle it in the register renamer, so it needn't take an execution slot at all (it still has to retire, so some on-chip resources are still allocated to it).

This post is day 1 of Advent of Compiler Optimisations 2025.

Posted at 06:00:00 CST on 1st December 2025.
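The assembly listings the post discusses are not reproduced in this capture. As a stand-in, here is a minimal C++ sketch, assuming a hypothetical function `zero()` returning a 64-bit `long`; the comments show the code GCC emits at the two optimisation levels according to the post (the exact source in the post may differ).

```cpp
// Hypothetical stand-in for the post's example: a 64-bit result,
// so the caller reads the full rax register.
long zero() {
    return 0;
}

// GCC at -O1 (per the post): the intention-revealing form, 5 bytes.
//     mov  eax, 0        ; b8 00 00 00 00
//     ret
//
// GCC at -O2: the zeroing idiom, 2 bytes.
//     xor  eax, eax      ; 31 c0
//     ret
//
// Either way, writing eax zero-extends into rax, so the full
// 64-bit return value is zero.
```

Pasting a function like this into Compiler Explorer and flipping between -O1 and -O2 shows the switch from mov eax, 0 to xor eax, eax described above.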
Summarized:

Matt Godbolt's blog post explores a seemingly counterintuitive optimization employed by modern x86 compilers: using `xor eax, eax` to set a register to zero. The article highlights several aspects of this optimization, revealing the interplay between compiler behavior, CPU microarchitecture, and code efficiency.

The core observation is that compilers consistently emit this instruction even though the more explicit `mov eax, 0` would achieve the same result. The author details the advantage of the seemingly obscure exclusive-OR form through three mechanisms. First, the CPU recognizes `xor eax, eax` as a zeroing idiom: the out-of-order machinery can handle it in the register renamer, pointing `eax` at a known-zero value, so the result is independent of the register's prior contents and carries no dependency on earlier instructions. Second, the encoding saves three bytes of code compared to `mov eax, 0` (two bytes versus five); smaller code makes better use of the instruction cache's limited capacity. Finally, because the idiom is handled at rename, the CPU can effectively optimize the `xor eax, eax` away, eliminating the execution cycles the instruction would otherwise consume.

The post further clarifies why the 32-bit idiom suffices even when a 64-bit register such as `rax` is needed: writing to an 'e' register (e.g., `eax`) automatically zeroes the upper 32 bits, so the whole register ends up zero. For extended registers such as `r8`, compilers still emit the 32-bit form `xor r8d, r8d`, even though the REX prefix makes the 64-bit `xor r8, r8` the same length; the author suggests this probably simplifies something in the compilers, and notes that clang uses the same zeroing idiom.

The article concludes by emphasizing the cumulative effect of these seemingly minor optimizations, reinforcing the significant impact of compiler optimizations on performance and code size. The post provides a fascinating and detailed view into the hidden workings of modern compilers, showcasing a pragmatic approach to resource utilization and a deep understanding of CPU microarchitecture.
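As an aside not taken from the post, the upper-32-bit zeroing behavior can be observed directly with a small hypothetical GCC/Clang inline-assembly demo on x86-64: fill rax with all ones, write only the 32-bit eax, and read the full 64-bit register back.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    std::uint64_t out = 0;
    // Fill rax with all-ones, then write only the 32-bit eax.
    // On x86-64, a write to a 32-bit register zeroes the upper 32 bits,
    // so the value read back from rax has its top half cleared.
    asm volatile(
        "movq $-1, %%rax\n\t"          // rax = 0xffffffffffffffff
        "movl $0x12345678, %%eax\n\t"  // eax = 0x12345678; upper half of rax becomes 0
        "movq %%rax, %0"               // copy the full 64-bit register out
        : "=r"(out)
        :
        : "rax");
    std::printf("rax after 32-bit write: %#018llx\n",
                static_cast<unsigned long long>(out));
    return 0;
}
```

Built with g++ or clang++ on an x86-64 machine, this prints 0x0000000012345678: the 32-bit write cleared the top half of rax, which is exactly why xor eax, eax zeroes all 64 bits.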