Addressing the adding situation
Written by me, proof-read by an LLM.
Details at end.
Yesterday we saw how compilers zero registers efficiently. Today let’s look at something a tiny bit less trivial (though not by much): adding two integers. What do you think a simple x86 function to add two ints [1] would look like? An add, right? Let’s take a look!
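The live Compiler Explorer view that originally sat here doesn’t survive in text, so here is a sketch of the kind of function in question and the sort of output gcc produces for it at -O2 (names and flags are illustrative):

    // A trivial add function:
    int add(int a, int b) {
        return a + b;
    }

    ; gcc -O2, x86-64:
    add:
        lea eax, [rdi + rsi]   ; not an add at all!
        ret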

Probably not what you were thinking, right? x86 is unusual in mostly having a maximum of two operands per instruction [2]. There’s no add instruction to add edi to esi, putting the result in eax. On an ARM machine this would be a simple add r0, r0, r1 or similar, as ARM has a separate destination operand. On x86, things like add are not result = lhs + rhs but lhs += rhs. This can be a limitation, as we don’t get to control which register the result goes into, and we in fact lose the old value of lhs.
So how do compilers work around this limitation? The answer lies in an unexpected place: the sophisticated memory addressing system of the x86. Nearly every operand can be a memory reference - there’s no specific “load” or “store”; a mov can just refer to memory directly. Those memory references are pretty rich: you can refer to memory addressed by a constant, relative to a register, or relative to a register plus a second register (optionally scaled by 1, 2, 4 or 8), plus a constant offset. Something like add eax, dword ptr [rdi + rsi * 4 + 0x1000] is still a single instruction [3]!
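To give a feel for the range, here’s a sketch of a few of the legal forms, each a single instruction (exact syntax varies a little between assemblers):

    mov eax, dword ptr [0x1000]            ; constant address
    mov eax, dword ptr [rdi]               ; register
    mov eax, dword ptr [rdi + 8]           ; register + constant
    mov eax, dword ptr [rdi + rsi*4]       ; register + register * scale
    mov eax, dword ptr [rdi + rsi*4 + 16]  ; all of it at once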
Sometimes you don’t want to access the memory at one of these complex addresses, you just want to calculate what the address would be. Sort of like C’s “address-of” (&) operator. That’s what lea (Load Effective Address) does: it calculates the address without touching memory.
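A rough C analogy, as a sketch (the assembly shown in the comments is typical of gcc at -O2, but illustrative):

    // Reads memory: compiles to something like  mov eax, dword ptr [rdi + rsi*4]
    int load(int *base, long i) { return base[i]; }

    // Only computes the address: compiles to something like  lea rax, [rdi + rsi*4]
    int *addr(int *base, long i) { return &base[i]; }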
Why is this useful for addition? Well, if we’re not actually accessing memory, we can abuse the addressing hardware as a calculator! That complex addressing mode with its register-plus-register-times-scale is really just shifting and adding - so lea becomes a cheeky way to do three-operand addition [4].
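Compilers lean on this for more than plain addition; a few illustrative (sketch) examples of lea-as-calculator:

    lea eax, [rdi + rsi]        ; eax = edi + esi  (three-operand add)
    lea eax, [rdi + 42]         ; eax = edi + 42
    lea eax, [rdi + rdi*4]      ; eax = edi * 5
    lea eax, [rdi + rdi*2 + 7]  ; eax = edi * 3 + 7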
The compiler writes our simple addition in terms of the address of memory at rdi offset by rsi. We get a full add of two registers and we get to specify the destination too. You’ll notice that the operands are referenced as rdi and rsi (the 64-bit registers) even though we only wanted a 32-bit add: because we are using the memory addressing system, it unconditionally calculates a 64-bit address. However, in this case it doesn’t matter; those top bits [5] are discarded when the result is written to the 32-bit eax.
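For comparison (a sketch, along the lines footnote 5 suggests trying): the long version of the function differs only in the width of the destination register:

    lea eax, [rdi + rsi]   ; int  add(int a, int b):   32-bit result
    lea rax, [rdi + rsi]   ; long add(long a, long b): 64-bit result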
Using lea often saves an instruction, is useful if both of the operands are still needed later on in other calculations (as it leaves them unchanged), and, because lea can issue on several of the x86’s execution units, it can run alongside other arithmetic in the same cycle. Compilers know this though, so you don’t have to worry!
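To make the saving concrete, here’s a sketch of what the compiler would otherwise have to emit:

    ; Without lea: copy one input, then a destructive add.
    mov eax, edi          ; eax = edi
    add eax, esi          ; eax += esi

    ; With lea: the same result in one instruction, inputs untouched.
    lea eax, [rdi + rsi]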
See the video that accompanies this post.

This post is day 2 of Advent of Compiler Optimisations 2025, a 25-day series exploring how compilers transform our code. This post was written by a human (Matt Godbolt) and reviewed and proof-read by LLMs and humans. Support Compiler Explorer on Patreon or GitHub, or by buying CE products in the Compiler Explorer Shop.

[1] The Linux system I’m compiling for here passes parameters in edi and esi, and expects the result in eax. We’ll cover calling conventions later in the series.

[2] Though some AVX instructions and some multiplies do allow a separate destination.

[3] As someone who grew up with the 6502, and then 32-bit ARM, coming to the x86 ISA was quite a shock. The x86 is truly a “Complex Instruction Set Computer”.

[4] Three-operand meaning we can specify two source registers and a separate destination, unlike add, which overwrites one of its operands.

[5] Those top bits should be zero, as the ABI requires it: the compiler relies on this here. Try editing the example above to pass and return longs to compare.


Filed under: Coding, AoCO2025

Posted at 06:00:00 CST on 2nd December 2025.

