Addressing the adding situation
Recorded: Dec. 3, 2025, 3:04 a.m.
Original:

Probably not what you were thinking, right? x86 is unusual in mostly having a maximum of two operands per instruction. There’s no add instruction to add edi to esi, putting the result in eax. On an ARM machine this would be a simple add r0, r0, r1 or similar, as ARM has a separate destination operand. On x86, things like add are not result = lhs + rhs but lhs += rhs. This can be a limitation, as we don’t get to control which register the result goes into, and we in fact lose the old value of lhs. This post is day 2 of Advent of Compiler Optimisations 2025.

Footnotes from the original post:
- The Linux system I’m compiling for here passes parameters in edi and esi, and expects the result in eax. We’ll cover calling conventions later in the series.
- Though some AVX instructions and some multiplies do allow a separate destination.
- As someone who grew up with 6502, and then 32-bit ARM, coming to the x86 ISA was quite a shock. The x86 is truly a “Complex Instruction Set Computer”.
- Three-operand meaning we can specify two source registers and a separate destination, unlike add which overwrites one of its operands.
- Those top bits should be zero, as the ABI requires it: the compiler relies on this here. Try editing the example above to pass and return longs to compare.

Posted at 06:00:00 CST on 2nd December 2025 by Matt Godbolt, a C++ developer living in Chicago.
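To make the two-operand constraint concrete, here is a minimal sketch; the function name, signature, and the exact instruction sequences are illustrative assumptions rather than the post's own listing.

```cpp
// Illustrative sketch: the function name and the assembly shown in the
// comments are assumptions, not copied from the post.
int add(int lhs, int rhs) {
    return lhs + rhs;
}

// Under the System V x86-64 calling convention (lhs in edi, rhs in esi,
// result expected in eax), a literal two-operand translation needs an extra
// move, because x86's add overwrites its left operand:
//
//     mov eax, edi   ; copy lhs into the result register
//     add eax, esi   ; eax += rhs
//     ret
//
// 32-bit ARM, with its separate destination operand, expresses it directly:
//
//     add r0, r0, r1
//     bx  lr
```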
Summarized:

This post, part of the Advent of Compiler Optimisations 2025 series, delves into a deceptively simple topic: adding two integers on the x86 architecture. Matt Godbolt, a C++ developer living in Chicago, shows a clever technique compilers use to work around a limitation of x86’s instruction set. The core observation is that x86’s memory addressing system can be pressed into service to perform arithmetic, turning a simple addition into an addressing calculation.

On x86, the `add` instruction takes only two operands: it computes lhs += rhs, overwriting one of its inputs, so neither the programmer nor the compiler gets to name a separate destination register. This contrasts with architectures like ARM, where add specifies both source operands and a distinct destination. What x86 does have is a remarkably flexible addressing system: a single instruction can reference memory at a constant offset, relative to a base register, plus an index register scaled by 1, 2, 4, or 8.

The post uses the Linux calling convention as its concrete setting: the two parameters arrive in `edi` and `esi`, and the ABI (Application Binary Interface) expects the 32-bit result in `eax`. This illustrates a key point: compilers don’t translate source code instruction by instruction; they exploit architectural features to optimise in ways that are often invisible to the programmer.

The workhorse of the optimisation is the ‘load effective address’ instruction, `lea`. Unlike `add`, which overwrites one of its operands, `lea` computes the address its operand describes without actually accessing memory, and writes the result into whichever register the compiler chooses. That makes it an effective three-operand add: the base-plus-scaled-index-plus-offset calculation can fold a shift, an addition, and a constant into a single instruction (a sketch of the resulting code appears at the end of this summary).

Because `lea` is address arithmetic, the calculation is carried out at the full 64-bit address width even when the intent is a 32-bit addition; when the result is written to the 32-bit `eax`, only the low 32 bits are kept, and writing `eax` clears the upper half of `rax`. The ABI matters here too: the post’s footnote notes that the upper bits of the argument registers should already be zero, which the compiler relies on, and suggests comparing the code generated when `long`s are passed and returned instead.

Finally, the post points out a performance angle: `lea` can be handled by different execution units than ordinary arithmetic instructions, giving the processor more opportunity to do work in parallel. The compiler’s ability to recognise and exploit these features is a good example of the optimisation strategies modern compilers employ.
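As a rough illustration of the `lea` trick described above, here is a reconstructed example; the listing reflects typical gcc/clang output for a trivial addition rather than the post’s exact code, and the scaled-index variant at the end is an assumed extension.

```cpp
// Reconstructed example (not the post's exact listing): a trivial addition
// and the lea-based code a typical optimising x86-64 compiler emits for it.
int add(int a, int b) {
    return a + b;
}

// Typical output from gcc or clang at -O1 and above (Intel syntax):
//
//     lea eax, [rdi + rsi]   ; the "address" rdi+rsi is computed by the
//                            ; addressing hardware; the low 32 bits land in
//                            ; eax, the upper half of rax is zeroed, and no
//                            ; memory is actually accessed.
//     ret
//
// The same base + index*scale + displacement machinery means an expression
// like a + 4*b + 3 can also collapse into a single instruction:
//
//     lea eax, [rdi + rsi*4 + 3]
```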
The post implicitly advocates for a deeper understanding of computer architecture as a means of writing more efficient code.