Integer moves
Macro      | Operation                  | x86 opcode             | Notes
macro_252D | NOT [mem+8]                | F7 /2                  | Bitwise complement
macro_2535 | NEG [mem+8]                | F7 /3                  | Two's complement negate
macro_2509 | MOV [mem+8], reg           | 89 /r                  | Register to memory
macro_251C | MOV [mem+8], imm           | C7 /0                  | Immediate to memory
macro_2512 | CMOVcc synthesis or SETcc  | 0F 40-4F or 0F 90-9F   | Conditional with register
macro_2524 | Conditional with immediate | Synthetic              | No direct x86 equivalent
macro_12A5 | MOV [mem+8], 1 (?)         | C7 /0 or test/bit op   | Unclear; needs more context
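The opcode column follows Intel's notation: "/digit" means the digit occupies the reg field of the ModRM byte and selects the operation (F7 /2 is NOT, F7 /3 is NEG), while "/r" means that field names a register operand. As a sketch, assuming the [mem+8] operand is a 32-bit [EBX+8] (the listing does not say which base register or operand size the macros use), the encodings can be assembled like this:

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit register numbers as encoded in the ModRM reg and rm fields. */
enum { EAX = 0, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* ModRM byte for a [base + disp8] operand: mod = 01, reg = /digit or source
   register, rm = base register. ESP as a base needs a SIB byte and is not
   handled here. */
static uint8_t modrm_disp8(unsigned reg, unsigned base)
{
    return (uint8_t)((1u << 6) | ((reg & 7u) << 3) | (base & 7u));
}

/* F7 /digit with a dword memory operand: /2 = NOT, /3 = NEG. */
static size_t encode_f7_mem8(unsigned digit, unsigned base, uint8_t out[3])
{
    out[0] = 0xF7;
    out[1] = modrm_disp8(digit, base);
    out[2] = 0x08;                         /* disp8 = +8 */
    return 3;
}

/* 89 /r: MOV dword [base+8], src. The source register sits in ModRM.reg. */
static size_t encode_mov_mem8_reg(unsigned src, unsigned base, uint8_t out[3])
{
    out[0] = 0x89;
    out[1] = modrm_disp8(src, base);
    out[2] = 0x08;
    return 3;
}

/* C7 /0: MOV dword [base+8], imm32, immediate stored little-endian. */
static size_t encode_mov_mem8_imm(uint32_t imm, unsigned base, uint8_t out[7])
{
    out[0] = 0xC7;
    out[1] = modrm_disp8(0, base);
    out[2] = 0x08;
    for (int i = 0; i < 4; i++)
        out[3 + i] = (uint8_t)(imm >> (8 * i));
    return 7;
}
```

For example, NOT [EBX+8] encodes as F7 53 08, NEG [EBX+8] as F7 5B 08, MOV [EBX+8], EAX as 89 43 08, and MOV [EBX+8], 1 as C7 43 08 01 00 00 00.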
FP Transcendental Operation Analysis (msrom-612, 0x1FFE-0x21E1)
Overview
This code block implements a floating-point transcendental function (likely FPATAN or similar) using polynomial approximation. It also shows the patterns used to transfer data between TMP registers (the computational domain) and ST registers (the architectural FP stack).
UOP.020(source_constant, value_register, U2_flags)
- Transfer a value to or from an FP stack register
- Examples from code:
  - 0x2001: UOP.020(CONST_00+024, ST0, U2.80) - prepare ST0
  - 0x2002: ST7 = UOP.020(CONST_00+03C, ST7, U2.80) - update ST7
  - 0x20C6: ST7 = UOP.020(CONST_00+014, TMP1, U2.80) - write TMP to ST7
  - 0x20CD: TMP0 = UOP.020(CONST_0, ST0) - read ST0 to TMP
  - 0x20D1: ST0 = UOP.020(CONST_00+032, TMP0, U2.80) - write TMP to ST0
- CONST values likely specify conversion mode or precision control
- Operates in both directions: TMP to ST or ST to TMP
- The U2.80 flag is present when writing to architectural ST registers
UOP.220(constant, ST_register, U2_flags)
- Prepare an FP stack register for an operation
- Examples from code:
  - 0x2004: UOP.220(CONST_0, ST0) - prepare ST0
  - 0x20D8: UOP.220(CONST_00+024, ST0, U2.80) - prepare with cross-domain flag
- Appears before ST registers are read into TMP registers
- May mark the ST register as readable or lock it for the operation
UOP.7EE(operand1, operand2, operation_code, U2_flags)
- Complex FP operation producing an ST result
- Examples from code:
  - 0x207A: ST0 = UOP.7EE(TMP5, TMP6, CONST_00+032, U2.C9) - with EOM_Fl3
  - 0x2086: ST0 = UOP.7EE(TMP6, TMP5, CONST_00+032, U2.C9) - with EOM_Fl3
- Takes two TMP operands and produces an ST result
- CONST_00+032 appears to be an operation/mode selector
- Always appears with the U2.C9 flag at subroutine exit
- Always marked with EOM_Fl3 (subroutine return)
UFPOP_7X8(operand1, operand2, U2_flags)
- FP operation with stack pop
- Examples from code:
  - 0x2139: ST0 = UFPOP_7X8(TMP0, TMP5, U2.49) - with EOM_Fl3
  - 0x215C: ST0 = UFPOP_7X8(TMP7, TMP5, U2.49) - with EOM_Fl3
  - 0x219E: ST0 = UFPOP_7X8(TMP3, TMP2, U2.49) - with EOM_Fl3
  - 0x21AE: ST0 = UFPOP_7X8(TMP4, TMP2, U2.49) - with EOM_Fl3
- Takes TMP register inputs
- Produces an ST0 result AND pops the FP stack
- Always uses the U2.49 flag
- Always marked with EOM_Fl3 (subroutine return)
UOP.262(operand1, operand2)
- Simple TMP to ST transfer or merge
- Examples from code:
  - 0x21B8: ST0 = UOP.262(TMPA, ST0) - with EOM_Fl3
  - 0x21C2: ST0 = UOP.262(TMPA, ST0) - with EOM_Fl3
- Used in special-case handling (denormals, infinities)
- Appears to merge or conditionally update ST0
UOP.029(ST_register, ST_register)
- Extract field from FP register
- Examples from code:
  - 0x200A: TMPD = UOP.029(ST0, ST0)
  - 0x20DE: TMPD = UOP.029(ST0, ST0)
- Result immediately masked with AND (0x004 in both examples)
- Likely extracts FP classification bits (NaN, Inf, denormal flags)
UOP.060(FP_value, CONST_0)
- Extract FP field to integer
- Examples from code:
  - 0x2019: TMPE = UOP.060(TMP0, CONST_0)
  - 0x2020: TMPD = UOP.060(TMPA, CONST_0)
  - 0x20ED: TMPE = UOP.060(TMP0, CONST_0)
- Extracts the exponent or other FP fields for range reduction
UOP.061(FP_value, CONST_0)
- Extract FP field (variant of UOP.060)
- Examples from code:
  - 0x2009: TMPC = UOP.061(ST0, CONST_0)
  - 0x2036: TMPC = UOP.061(TMP0, CONST_0)
  - 0x20DD: TMPC = UOP.061(TMP0, CONST_0)
- Result used for range comparisons
- Possibly extracts the biased exponent
UOP.063(FP_value, CONST_0)
- Extract FP field (another variant)
- Example from code:
  - 0x2035: TMPE = UOP.063(TMP3, CONST_0)
- Used after an FXORS operation
UOP.064(FP_value, CONST_0)
- Extract FP field (exponent?)
- Examples from code:
  - 0x2006: TMPC = UOP.064(ST0, CONST_0)
  - 0x202E: TMPD = UOP.064(TMP0, CONST_0)
  - 0x20DA: TMPC = UOP.064(ST0, CONST_0)
- Result often shifted left by 3 or 4 bits
- Likely extracts the exponent for classification
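If UOP.064 and UOP.061 do extract the biased exponent, the field they see is presumably the 15-bit exponent of the x87 extended-precision format. The sketch below decodes that field, plus a coarse zero/denormal/normal/Inf-NaN class, from the architectural 80-bit memory layout. The P6's internal TMP register format is not documented in this listing and may differ, so the bit positions here are those of the memory format only; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t significand;   /* bits 0-63, integer bit explicit at bit 63 */
    uint16_t sign_exp;      /* bit 15 = sign, bits 14-0 = biased exponent */
} ext80;

/* Unpack the 10-byte little-endian memory image of an extended value. */
static ext80 ext80_from_bytes(const uint8_t b[10])
{
    ext80 v = {0, 0};
    for (int i = 7; i >= 0; i--)
        v.significand = (v.significand << 8) | b[i];
    v.sign_exp = (uint16_t)(b[8] | (b[9] << 8));
    return v;
}

static unsigned ext80_sign(ext80 v) { return v.sign_exp >> 15; }
static unsigned ext80_biased_exponent(ext80 v) { return v.sign_exp & 0x7FFFu; }

typedef enum { EXT_ZERO, EXT_DENORMAL, EXT_NORMAL, EXT_INF_OR_NAN } ext_class;

/* Coarse classification from the exponent field (bias 0x3FFF); unnormals
   and pseudo-denormals are not distinguished in this sketch. */
static ext_class ext80_classify(ext80 v)
{
    unsigned e = ext80_biased_exponent(v);
    if (e == 0x7FFFu)
        return EXT_INF_OR_NAN;
    if (e == 0)
        return v.significand == 0 ? EXT_ZERO : EXT_DENORMAL;
    return EXT_NORMAL;
}
```

For 1.0 the bytes are 00 00 00 00 00 00 00 80 FF 3F, giving a biased exponent of 0x3FFF and a normal classification.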
UOP.0A1(CONST_0, FP_value)
- FP operation on value
- Example from code:
  - 0x2034: TMP4 = UOP.0A1(CONST_0, TMP3)
- Purpose unclear; possibly an absolute-value or normalization step
UOP.223(operand1, operand2)
- FP arithmetic operation
- Example from code:
  - 0x201C: TMPA = UOP.223(TMP2, TMP0)
- Used in the range reduction sequence
UOP.227(CONST_0, operand)
- FP operation
- Example from code:
  - 0x201A: TMP0 = UOP.227(CONST_0, TMP0)
- Part of argument reduction
UOP.228(operand1, operand2)
- FP operation
- Example from code:
  - 0x2018: TMP0 = UOP.228(TMP0, TMP1)
- Used before range reduction
UOP.267(operand1, operand2)
- FP operation
- Example from code:
  - 0x2024: TMP0 = UOP.267(TMP2, TMP1)
- Part of the computation sequence
Integer Operations on FP Exponents
UOP.124(operand1, operand2)
- Integer operation on FP exponent/sign
- Examples from code:
  - 0x2090: TMPB = UOP.124(TMPB, TMPD)
  - 0x20BE: TMPB = UOP.124(CONST_0, TMPD)
  - 0x2134: TMPB = UOP.124(TMPB, TMPE)
- Result used with FXORS to apply sign changes
- Likely constructs sign/exponent bits for the result
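UOP.124's role is inferred, but the pattern it appears to support is a common one: an odd function such as arctangent satisfies f(-x) = -f(x), so a routine can work on |x| and reapply the sign at the end with a single XOR of the sign bit. The sketch below shows that technique on IEEE-754 doubles. This is an assumption made for readability: the microcode operates on its internal extended format, FXORS's exact semantics are not confirmed here, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Flip the IEEE-754 sign bit (bit 63 of a double) when sign is 1. */
static double xor_sign(double x, unsigned sign)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= (uint64_t)(sign & 1u) << 63;
    memcpy(&x, &bits, sizeof bits);
    return x;
}

/* Evaluate an odd function on |x|, then XOR the original sign back in. */
static double eval_odd(double (*f_nonneg)(double), double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    unsigned sign = (unsigned)(bits >> 63);
    double ax = xor_sign(x, sign);          /* clears the sign: |x| */
    return xor_sign(f_nonneg(ax), sign);    /* f(-x) = -f(x) */
}
```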
U2 flag meanings, inferred from observed patterns in the code:
- U2.08 - Read architectural state flag
  - 0x2005: TMP0 = FXORS(ST0, ST0, U2.08)
  - 0x20D9: TMP0 = FXORS(ST0, ST0, U2.08)
  - Allows reading ST registers in the computational domain
- U2.20 - Write preparation or intermediate result flag
  - 0x2072: TMP9 = ADD.DSZ32(EIP_30, REG.31, U2.20)
  - 0x2191: TMP9 = ADD.DSZ32(EIP_30, REG.31, U2.20)
  - Marks operations that prepare for architectural commit
- U2.49 - FP stack pop with result commit
  - Always used with UFPOP_7X8
  - Combines result commit with stack management
  - Bit pattern: 0100 1001
- U2.4A - Exception or special completion flag
  - 0x20D5: UOP.120(CONST_16+004, CONST_16+004, U2.4A) - with EOM_Fl3
  - 0x21C6: UOP.120(CONST_16+004, CONST_16+004, U2.4A) - with EOM_Fl3
  - Used at error/exception exits
- U2.4B - Normal setup/initialization flag
  - 0x2000: UOP.120(CONST_0, CONST_0, U2.4B)
  - 0x20D6: UOP.120(CONST_0, CONST_0, U2.4B)
  - Appears at the start of computation sequences
- U2.50 - Precision or mode control flag
  - 0x201D: UOP.120(CONST_16+010, CONST_16+010, U2.50)
  - 0x20CC: UOP.120(CONST_16+010, CONST_16+010, U2.50)
  - Used before final result computation
- U2.80 - Cross-domain write enable (architectural commit)
  - Used extensively with UOP.020 when writing to ST registers
  - This is the key "make visible" flag
  - Allows computational results to affect architectural state
  - Bit pattern: 1000 0000
- U2.C9 - Complex cross-domain operation with commit
  - Always used with UOP.7EE at subroutine returns
  - Bit pattern: 1100 1001 (includes the U2.80 bit)
  - Indicates a full architectural state update
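The bit patterns given above can be checked mechanically. The sketch below treats the U2 field as an 8-bit mask (an assumption; the field's width is not established by the listing) and uses hypothetical constant names for the observed values. It confirms that U2.C9 is exactly U2.80 combined with U2.49, and that U2.49 on its own does not carry the 0x80 bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Observed U2 values; the names are hypothetical labels for this sketch. */
enum {
    U2_08 = 0x08,   /* 0000 1000 */
    U2_20 = 0x20,   /* 0010 0000 */
    U2_49 = 0x49,   /* 0100 1001 */
    U2_4A = 0x4A,   /* 0100 1010 */
    U2_4B = 0x4B,   /* 0100 1011 */
    U2_50 = 0x50,   /* 0101 0000 */
    U2_80 = 0x80,   /* 1000 0000 */
    U2_C9 = 0xC9    /* 1100 1001 */
};

/* True if every bit set in 'part' is also set in 'flags'. */
static bool u2_contains(uint8_t flags, uint8_t part)
{
    return (flags & part) == part;
}
```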
Overall flow of the routine:
- Step 1: Read ST registers into the TMP domain
  - UOP.220 prepares the ST register
  - FXORS, UOP.064 and UOP.029 extract fields with the U2.08 flag
  - Exponent and special-case checks
- Step 2: Range reduction and argument preparation
  - UOP.061 extracts the exponent
  - Compare against ROM constants (CONSTROM.03D, CONSTROM.047, CONSTROM.03E)
  - Branch to special-case handlers if needed
- Step 3: Polynomial approximation in the TMP domain
  - Multiple FREADROM operations load the coefficients
  - UOP.6E9 (FP multiply) and UOP.768 (FP add) implement Horner's method
  - All computation stays in TMP registers (invisible to the architecture)
- Step 4: Result finalization
  - UOP.120 operations with various U2 flags for mode setup
  - FXORS with TMPB applies the final sign
- Step 5: Commit to architectural state
  - UOP.0D8 updates the next instruction pointer
  - UOP.0D4 synchronizes the pipeline
  - UOP.7EE or UFPOP_7X8 writes the result to ST0 with U2.C9 or U2.49, respectively
  - EOM_Fl3 marks the subroutine return
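Step 3's Horner evaluation can be sketched as below. The listing does not give the coefficient values loaded by FREADROM, so the sketch substitutes the first four terms of the arctangent Taylor series (atan x ≈ x - x^3/3 + x^5/5 - x^7/7), chosen only because they are easy to check; a production routine would more likely use minimax coefficients over a reduced range. Each loop iteration is one multiply followed by one add, the pairing the listing attributes to UOP.6E9 and UOP.768. All names are hypothetical.

```c
/* Hypothetical stand-in coefficients: Taylor terms of atan in powers of x^2.
   The real routine's FREADROM constants are not known here. */
static const double kAtanCoeffs[] = { 1.0, -1.0 / 3.0, 1.0 / 5.0, -1.0 / 7.0 };

/* Horner's method: p(z) = c0 + z*(c1 + z*(c2 + z*c3)). */
static double horner(const double *c, int n, double z)
{
    double p = c[n - 1];
    for (int i = n - 2; i >= 0; i--)
        p = c[i] + z * p;          /* one multiply, then one add */
    return p;
}

/* atan(x) ~ x * p(x^2), valid only for small |x| after range reduction. */
static double atan_poly(double x)
{
    int n = (int)(sizeof kAtanCoeffs / sizeof kAtanCoeffs[0]);
    return x * horner(kAtanCoeffs, n, x * x);
}
```

At x = 0.1 the truncated series differs from the true arctangent by roughly 1e-10, which is the size of the first omitted term (x^9/9).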
Special-case checks:
- Denormals: Check UOP.029 result & 0x004 at 0x200D, 0x20E1
- Underflow range: Compare exponent < CONSTROM.03D at 0x200E, 0x20E2
- Overflow range: Compare exponent >= CONSTROM.047 at 0x2012, 0x20E6
- Large arguments: Compare exponent >= CONSTROM.03E at 0x2038, 0x2110
- Each case has dedicated exit path with appropriate result handling
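The special-case checks amount to a small dispatch in front of the polynomial. In the sketch below the three CONSTROM thresholds are passed in as parameters because their values are not known here. The check order follows the addresses in the listing (0x200D, 0x200E, 0x2012, then 0x2038), and the type and function names are hypothetical.

```c
/* Stand-ins for the ROM thresholds; actual values are unknown. */
typedef struct {
    unsigned underflow_below;   /* CONSTROM.03D */
    unsigned overflow_at;       /* CONSTROM.047 */
    unsigned large_at;          /* CONSTROM.03E */
} range_limits;

typedef enum {
    PATH_DENORMAL, PATH_UNDERFLOW, PATH_OVERFLOW, PATH_LARGE, PATH_POLYNOMIAL
} exit_path;

/* class_bits: the UOP.029 result; exponent: the extracted exponent field. */
static exit_path classify_argument(unsigned class_bits, unsigned exponent,
                                   const range_limits *lim)
{
    if (class_bits & 0x004)                 /* 0x200D: denormal bit */
        return PATH_DENORMAL;
    if (exponent < lim->underflow_below)    /* 0x200E */
        return PATH_UNDERFLOW;
    if (exponent >= lim->overflow_at)       /* 0x2012 */
        return PATH_OVERFLOW;
    if (exponent >= lim->large_at)          /* 0x2038 */
        return PATH_LARGE;
    return PATH_POLYNOMIAL;                 /* normal polynomial path */
}
```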
The U2.80 bit is the "architectural visibility" flag. Operations without this bit execute in a shadow computational domain where:
- TMP registers can be freely modified
- FP operations compute intermediate results
- No architectural state is changed
- Exceptions cannot occur (computation is speculative)
Only operations with U2.80 (or composite flags like U2.C9 containing it) can:
- Write to architectural ST registers
- Update FP status flags
- Trigger FP exceptions
- Make results visible to subsequent instructions
The Pentium Pro SYSENTER/SYSEXIT Bug: A Microcode Analysis
The Pentium Pro implemented SYSENTER and SYSEXIT instructions that Intel quietly left undocumented at launch. When Linux 2.6 later enabled these instructions based on the documented Pentium II behavior, Pentium Pro systems crashed. The reason has now been confirmed through direct analysis of the processor's microcode.
SYSENTER works. SYSEXIT does not.
SYSENTER on the Pentium Pro behaves correctly and is functionally equivalent to the Pentium II version. It reads the kernel entry point and stack from the SYSENTER MSRs, switches to ring 0, and clears the interrupt flag before transferring control. A kernel using SYSENTER for the call half and IRET for the return would have worked fine on Pentium Pro all along, as was suspected by Linux developers at the time.
SYSEXIT, however, is a completely different implementation from what Intel later documented for the Pentium II.
A different calling convention
The most fundamental problem is that SYSEXIT on the Pentium Pro reads its inputs from different registers than the Pentium II. The documented Pentium II SYSEXIT takes the return address from EDX and the user stack pointer from ECX, and derives the user-mode code and stack segment selectors automatically from the SYSENTER_CS MSR (adding fixed offsets to produce the ring-3 CS and SS). This is the design that operating systems implemented against.
The Pentium Pro SYSEXIT works differently. It still takes the stack pointer from ECX, but it reads the new instruction pointer from ESI. More critically, it reads the user-mode code segment selector directly from the DI register and the stack segment selector from BX, rather than computing them from the MSR. This was apparently intended to give the operating system explicit control over the user-mode segment descriptors, enabling non-flat memory models. In practice it was fatal.
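The difference in register sourcing is easiest to see side by side. The structure and function names in this sketch are hypothetical. The documented path follows the 32-bit SYSEXIT operation in Intel's instruction reference (CS = IA32_SYSENTER_CS + 16 with RPL forced to 3, SS = CS + 8), and the Pentium Pro path follows the register assignments described above.

```c
#include <stdint.h>

typedef struct {
    uint32_t ecx, edx, esi, ebx, edi;   /* GPRs relevant to SYSEXIT */
    uint32_t sysenter_cs;               /* IA32_SYSENTER_CS MSR (0x174) */
} sysexit_inputs;

typedef struct {
    uint32_t eip, esp;
    uint16_t cs, ss;
} sysexit_result;

/* Documented 32-bit SYSEXIT: EIP from EDX, ESP from ECX, selectors
   derived from the MSR with RPL forced to 3. */
static sysexit_result sysexit_documented(const sysexit_inputs *in)
{
    sysexit_result r;
    r.eip = in->edx;
    r.esp = in->ecx;
    r.cs = (uint16_t)(((in->sysenter_cs & 0xFFFFu) + 16u) | 3u);
    r.ss = (uint16_t)(r.cs + 8u);
    return r;
}

/* Pentium Pro SYSEXIT as described in this analysis: EIP from ESI,
   ESP from ECX, CS taken verbatim from DI and SS from BX. */
static sysexit_result sysexit_pentium_pro(const sysexit_inputs *in)
{
    sysexit_result r;
    r.eip = in->esi;
    r.esp = in->ecx;
    r.cs = (uint16_t)(in->edi & 0xFFFFu);   /* DI, unvalidated */
    r.ss = (uint16_t)(in->ebx & 0xFFFFu);   /* BX, unvalidated */
    return r;
}
```

With SYSENTER_CS = 0x60, the documented path yields CS 0x73 and SS 0x7B no matter what the general-purpose registers hold, whereas the Pentium Pro path hands back whatever DI and BX contain, including zero.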
The null selector crash
In normal kernel code, DI and BX frequently contain zero or arbitrary values left over from system call argument handling. When DI is zero, SYSEXIT loads a null selector into CS. A null selector (index 0 with TI = 0, i.e. the values 0x0000 to 0x0003) refers to the reserved first GDT entry, and the architecture forbids loading it into the code segment register. The Pentium Pro microcode checks that the SYSENTER_CS MSR is nonzero, but performs no validity check on the value in DI.
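The gap can be expressed as two predicates. For comparison, the documented SYSEXIT raises #GP(0) when bits 15:2 of IA32_SYSENTER_CS are zero, and since it derives CS from that MSR, a valid MSR guarantees a non-null CS. The sketch below models the Pentium Pro behavior described above instead: a nonzero check on the MSR and no check at all on DI. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A null selector has index 0 and TI = 0 (values 0x0000-0x0003);
   the RPL bits (1:0) are ignored. */
static bool selector_is_null(uint16_t sel)
{
    return (sel & 0xFFFCu) == 0;
}

/* Pentium Pro model: the only selector check is that the MSR is nonzero.
   DI, the value actually loaded into CS, is never examined.
   Returns true if SYSEXIT would raise #GP. */
static bool ppro_sysexit_faults(uint32_t sysenter_cs_msr, uint16_t di)
{
    (void)di;                       /* deliberately unchecked */
    return sysenter_cs_msr == 0;
}

/* True when SYSEXIT "succeeds" yet leaves a null selector in CS. */
static bool ppro_sysexit_loads_null_cs(uint32_t sysenter_cs_msr, uint16_t di)
{
    return !ppro_sysexit_faults(sysenter_cs_msr, di) && selector_is_null(di);
}
```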
The result: SYSEXIT completes without error, CPL is set to 3, and the CPU begins executing in user mode with a null CS. The very first instruction fetch causes a General Protection Fault. The fault handler tries to report the error and return via IRET, but the error frame on the stack contains the null CS selector that caused the fault in the first place. IRET restores that null CS, causing an immediate second General Protection Fault. Two consecutive General Protection Faults produce a Double Fault, which is what Linux users observed.
This is also the exact behavior described by Intel's own erratum for the Pentium Pro: "SYSENTER/SYSEXIT instructions can implicitly load null segment selector to SS and CS registers." Intel published the erratum, apparently without fully acknowledging that SYSEXIT was the culprit and that DI was the vector.
EFLAGS are not restored
The Pentium II SYSEXIT clears most processor flags before returning to user mode, including the interrupt flag. The Pentium Pro SYSEXIT clears nothing. A kernel that disabled interrupts during syscall handling would return to user mode with interrupts still disabled, causing the system to gradually freeze as no timer or device interrupts could be serviced.
The STI timing problem
The Linux kernel, like many operating systems, executes an STI instruction immediately before SYSEXIT to re-enable interrupts before returning to user mode. The x86 architecture guarantees that an interrupt enabled by STI will not be taken until after the following instruction completes. On Pentium II, SYSEXIT honors this guarantee cleanly because the microcode contains an explicit pipeline checkpoint partway through the instruction, after all segment registers and the privilege level have been committed to a consistent state. If an interrupt arrives, it is held until that safe point.
The Pentium Pro SYSEXIT contains no such checkpoint. An interrupt that arrives during the execution of SYSEXIT may find the processor in a partially-updated state: the privilege level may already be set to ring 3 while the stack still points to a kernel address, or the code segment may be committed while the stack segment is not. Interrupt delivery in this half-updated state produces a stack fault or a protection fault, both of which escalate to a Double Fault.
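The escalation follows the double-fault rule in the Intel SDM (Volume 3, the Double Fault Exception (#DF) entry): exceptions are grouped into benign, contributory and page-fault classes, and a #DF is generated only when a second exception of a qualifying class is raised while the processor is still trying to deliver the first. The sketch below encodes that table for the P6-era vectors; later processors add vectors (such as #CP) that it does not cover, and the names are hypothetical.

```c
#include <stdbool.h>

typedef enum { CLASS_BENIGN, CLASS_CONTRIBUTORY, CLASS_PAGE_FAULT } exc_class;

static exc_class classify_vector(int vector)
{
    switch (vector) {
    case 0:  /* #DE */
    case 10: /* #TS */
    case 11: /* #NP */
    case 12: /* #SS */
    case 13: /* #GP */
        return CLASS_CONTRIBUTORY;
    case 14: /* #PF */
        return CLASS_PAGE_FAULT;
    default: /* everything else, including external interrupts */
        return CLASS_BENIGN;
    }
}

/* Does a second exception, raised while delivering the first, become #DF? */
static bool becomes_double_fault(int first, int second)
{
    exc_class a = classify_vector(first);
    exc_class b = classify_vector(second);
    if (a == CLASS_CONTRIBUTORY)
        return b == CLASS_CONTRIBUTORY;
    if (a == CLASS_PAGE_FAULT)
        return b != CLASS_BENIGN;
    return false;   /* benign first: the second is handled serially */
}
```

For example, a #GP raised while delivering an external interrupt is handled serially, but a #GP or #SS raised while delivering that #GP becomes a Double Fault.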
Why Intel did not fix it
The Pentium Pro was already in production when this problem was identified. The fix Intel implemented for the Pentium II was architectural: rather than reading CS and SS from general-purpose registers, SYSEXIT computes them automatically from the SYSENTER_CS MSR, eliminating the possibility of a null selector being specified and removing the dependence on register state that kernel code cannot reliably control. This change made the instruction safe to document and use.
Intel's decision not to fix later Pentium Pro steppings was likely a cost and schedule judgment. No software used SYSEXIT at the time, so there was no pressure to patch a product that was already shipping, and the workaround (not documenting the instruction) was cheap. The consequence surfaced years later, when Linux began relying on the Pentium II behavior and discovered the hard way that Pentium Pros behaved differently.
The correct CPUID check
Intel's documented check for SYSEXIT support treats the instruction as unavailable only when the family is 6, the model is less than 3, and the stepping is less than 3. That excludes only the earliest Pentium Pro steppings: later Pentium Pros, such as model 1, stepping 9, pass the check and are incorrectly identified as supporting the Pentium II SYSEXIT behavior. The corrected check packs family, model and stepping into a single value and compares it against one threshold, which excludes all Pentium Pro processors. The discrepancy between these two checks is what made the Linux 2.6 crashes dependent on the specific CPU stepping and caused confusion about which processors were actually affected.
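The two checks can be compared directly. The sketch assumes the caller has already seen the SEP bit (CPUID leaf 1, EDX bit 11) set and has extracted the family, model and stepping fields (model at most 15). The combined threshold 0x633 (family 6, model 3, stepping 3) is the value Linux's Intel CPU setup code uses for this test; the function names are hypothetical.

```c
#include <stdbool.h>

/* Intel's documented rule: SEP is unusable if family 6, model < 3
   and stepping < 3. */
static bool sep_usable_documented(unsigned family, unsigned model, unsigned stepping)
{
    return !(family == 6 && model < 3 && stepping < 3);
}

/* Combined rule: pack family/model/stepping into one value and compare
   against 0x633. */
static bool sep_usable_combined(unsigned family, unsigned model, unsigned stepping)
{
    unsigned sig = (family << 8) | (model << 4) | stepping;
    return sig >= 0x633u;
}
```

For a Pentium Pro reporting model 1, stepping 9, the documented rule returns true while the combined check returns false, because 0x619 is below 0x633.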
The author is not affiliated with, endorsed by, or sponsored by Intel Corporation or its affiliates. All trademarks, including but not limited to Intel, Pentium, and any other registered or unregistered marks mentioned herein, are the property of their respective owners. Their use in this context is solely for descriptive and informational purposes and constitutes nominative fair use under applicable trademark laws.