Fanael's random ruminations

Archives for the second quarter of 2021

Dependency-breaking zeroing XOR in P6

Published on the

Topics: microarchitecture-archeology, mythbusting

In x86 assembly language, a common idiom for setting the value of a register to 0 is to use the exclusive-or instruction with both operands being the same register, such as xor eax, eax. It was originally intended as a size optimization: the obvious mov eax, 0 is encoded as five bytes, of which four are used to store the constant 0, while the exclusive-or solution needs merely two and is just as fast, so it quickly became widespread.
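
To make the size difference concrete, here is a sketch of the two encodings for eax in 32-bit mode. The byte values in the comments are the standard opcodes rather than output copied from a particular assembler; some assemblers emit the equivalent 33 C0 form for the xor instead of 31 C0.

x86 assembly
    mov eax, 0      ; B8 00 00 00 00  (opcode + 32-bit immediate, 5 bytes)
    xor eax, eax    ; 31 C0           (opcode + ModRM byte, 2 bytes)

One difference worth keeping in mind: xor modifies the flags while mov leaves them untouched, so the two are interchangeable only when the flags are dead at that point.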

By the time the P6 microarchitecture was being designed, the xor zeroing idiom was already nigh-universal in compiler output and hand-written assembly alike, so it was specifically recognized as a zeroing idiom for the purpose of avoiding partial register stalls in code such as this:

x86 assembly
    xor eax, eax    ; zero the full 32-bit register
    mov al, [ecx]   ; load one byte into the low 8 bits
    ; use eax

In code tuned for the original Pentium or earlier processors, this was the usual way of zero-extending an 8-bit (or 16-bit with ax instead of al) value into the full 32-bit register, as the movzx instruction was slower. P6, starting from the very first Pentium Pro, recognized that after a xor of a register with itself, the register held 0, which avoided the partial register stall that would otherwise occur when modifying a low part of a register followed by operations on the full 32 bits.
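
For comparison, the same zero-extension written with movzx might look like the following sketch. The byte size specifier is NASM-style syntax and is an assumption here; other assemblers spell it byte ptr.

x86 assembly
    movzx eax, byte [ecx]   ; load one byte and zero-extend it to 32 bits
    ; use eax

Since movzx writes the whole register in a single instruction, there is no partial write to merge afterwards.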

Unfortunately, the Pentium Pro as originally designed was too ambitious to be realized using then-available lithography technology without making the chip too big — and thus too prone to manufacturing defects — so some features had to go. Segment register renaming and beefier secondary decoders were some of the notable victims of that process.

I assume that the ability to recognize that the exclusive-or zeroing idiom doesn't really depend on the previous value of a register, so that it can be dispatched immediately without waiting for the old value — thus breaking the dependency chain — met the same fate; the Pentium Pro shipped without it.
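
As a hypothetical illustration of what breaking the dependency chain would buy, consider this sketch, where the register choices and the imul are arbitrary stand-ins for any long-latency producer of eax:

x86 assembly
    imul eax, ebx   ; long-latency instruction that writes eax
    xor eax, eax    ; always produces 0, whatever imul computed
    add eax, ecx    ; first real consumer of the zeroed eax

If the xor is treated as an ordinary exclusive-or, it has to wait for the imul result before it can execute, and the add waits behind it. If it is recognized as dependency-breaking, the xor and everything after it can execute without waiting for the imul.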

Some of the cut features were introduced in later models: segment register renaming, for example, was added back in the Pentium II. Maybe dependency-breaking zeroing XOR was added in later P6 models too? After all, it seems such a simple yet important thing, and indeed, I remember seeing people claim that's the case in some old forum posts and mailing list messages. On the other hand, some sources, such as Agner Fog's optimization manuals, say that not only was it never present in any of the P6 processors, it was also missing in the Pentium M.

Whatever the case may be, there's only one way to make sure: test it!

Read the full article…

Blog update

Published on the

Topics: meta

In the past few days I've made some changes — some immediately noticeable, some less so — to the blog. Why not have a look at them?

I've been writing another, more substantial and much larger article for quite some time. I initially wanted to publish it in December, then January, then March, but getting it into a shape I'm comfortable with takes much more time and effort than I anticipated. Sorry about that.

The big, immediately noticeable change is that the main page is no longer a copy of the latest article. I've changed it to the tried and true format of using the introductory section of the last several articles, where "several" is defined here as five.

With that change, the need for displaying an article permalink in the header vanished: since pages aren't getting copied anymore, there's no need for an explicit, unambiguous, canonical link to an article. The address the browser displays when you read an article is now the canonical link, so putting one in the article header as well would just be superfluous.

While the other changes are minor and not as readily — if at all — noticeable to most readers, they still warrant coverage, in their own subsections.

Read the full article…