Table of contents

Floating-Point Operations

Memory Validation and Setup
Instruction Pointer Operations
Floating-Point Status

Misc

Operations identified (revised)
New UOPs from 21C2 full-listing

Pentium M 6D8 iret findings

Unknown uops - these are very rough guesses, take with grain of salt. To be further verified/analyzed.

Inspiration from Atom:

Microcode Sequence Control

UOP.000(ALIAS.014, ALIAS.014)

Microcode sequence synchronization point
Appears at start of sequences with BOM
Likely pipeline flush or register alias table checkpoint
"ALIAS.014" may indicate register renaming state

UOP.0D4(CONST_0, CONST_0, U2.20)

Pipeline commit/flush operation
Seen after MOVETOCREG (control register modifications)
Flushes pipeline after CR0/CR4 changes
U2.20 flag suggests register writeback phase

UOP.0D8(CONST_0, address)

Update instruction pointer / commit microcode state
Used with EIP calculations (EIP + displacement)
Appears in ENTER, RDPMC, bus lock sequences
Likely updates next instruction fetch address

Exception and Event Handling

UOP.120(vector, vector)

Prepare exception vector number
Examples: UOP.120(5, 5) for #BR, UOP.120(13, 13) for #GP
Also used in shift operations to combine operands
Dual purpose: exception setup OR operand preparation

UOP.203(CONST_0, error_code)

Format exception error code
Prepares exception information structure
Used before SIGEVENT calls
Seen in RDPMC privilege checking

Memory Synchronization

UOP.134(CONST_6, CONST_6, U2.08)

Assert bus LOCK# signal
Used in all LOCK-prefixed operations
Critical for atomic memory operations
Seen in LOCK XADD, LOCK CMPXCHG, LOCK XCHG
Also used in LGDT/LIDT before bus lock wait

Atomic Memory Operations (32-bit)

UOP.84C [address]

Unlocked atomic load (32-bit)
Used in XADD without LOCK prefix
Standard load with atomicity guarantees

UOP.84D [address]

Locked atomic load (32-bit)
Read phase of locked read-modify-write
Used with LOCK prefix operations

UOP.841 [address]

Locked atomic store (32-bit)
Write phase of locked read-modify-write
Completes atomic operation sequence

UOP.8CC [address]

Load 8-bit from memory
Special variant for byte-sized operations
Used in 8-bit rotate operations

Atomic Memory Operations (64-bit)

UOP.B4C [address]

Unlocked 64-bit load
Used in CMPXCHG8B without LOCK
Loads quadword (64-bit) operand

UOP.B4D [address]

Locked 64-bit atomic load
Used in LOCK CMPXCHG8B
Read phase of 64-bit atomic operation

UOP.B41 [address]

Locked 64-bit atomic store
Write phase of 64-bit atomic operation
Completes LOCK CMPXCHG8B sequence

Shift and Rotate Operations

UOP.1C1(ArithFLAGS, operand, U2.21)

Extract flags for shift/rotate operations
Prepares carry flag for RCL/RCR operations
U2.21 suggests flag extraction phase
Used in all rotate-through-carry instructions

UOP.42C(operand, count)

Prepare shift right operation
Used in SHRD (shift right double precision)
Preprocessing phase

UOP.42D(operand, count)

Prepare shift left operation
Used in SHLD (shift left double precision)
Preprocessing phase

UOP.42E(operand, flags)

Execute double precision shift left
Main execution phase of SHLD
Updates flags based on result

UOP.42F(operand, flags)

Execute double precision shift right
Main execution phase of SHRD
Updates flags based on result

UOP.4AC(flags, count)

Prepare rotate right through carry
Extracts and positions carry flag for RCR
Used with 8-bit and larger operands

UOP.4AD(flags, count)

Prepare rotate left through carry
Extracts and positions carry flag for RCL
Used with 8-bit and larger operands

UOP.52F(operand, flags, U2.39)

Execute rotate operation
Performs actual rotation with carry
U2.39 flag suggests flag update phase
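
As a reference for what the prepare/execute uop pairs above must produce in combination, the architectural result of RCL can be sketched in Python. This models only the instruction-level behaviour (operand and CF treated as one width+1-bit rotating quantity), not the internal split between UOP.4AD and UOP.52F. The function name `rcl` and its signature are illustrative, not taken from the listing.

```python
def rcl(value, carry, count, width=8):
    """Rotate `value` left through the carry flag (architectural RCL).

    CF and the operand together form a (width+1)-bit rotating quantity.
    Returns (result, carry_out).
    """
    mask = (1 << width) - 1
    # The count is masked to 5 bits, then reduced modulo width+1
    # (a no-op for 32-bit operands, since 31 < 33).
    count = (count & 0x1F) % (width + 1)
    value &= mask
    carry &= 1
    for _ in range(count):
        msb = (value >> (width - 1)) & 1        # bit that falls out the top
        value = ((value << 1) | carry) & mask   # old CF enters at bit 0
        carry = msb
    return value, carry
```

For example, `rcl(0x80, 0, 1)` moves the top bit into CF and yields `(0x00, 1)`. RCR is the mirror image, with CF entering at the top bit.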

UOP.62C(flags, count)

Prepare rotate operation (variant)
Alternative preparation for rotate instructions
Exact difference from UOP.4AC/4AD unclear

Conditional Operations

UOP.265(condition, value)

Conditional 64-bit select
Used in CMPXCHG8B
Selects between two 64-bit values based on condition
Implements conditional move for quadword (64-bit) operands

Floating-Point Operations

Original FPU instr encoding:

dc/0 fadd
dc/1 fmul
dc/2 fcom
dc/3 fcomp
dc/4 fsub
dc/5 fsubr
dc/6 fdiv
dc/7 fdivr


6E9 fmul ??
768 FADD?

UOP.6E8(operand1, operand2)

Floating-point operation - likely LOAD or PREPARE
Used at start and end of polynomial chains
Possibly converts or prepares FP values

UOP.6E9(operand1, operand2)

Floating-point MULTIPLY
Core operation in polynomial evaluation
Implements x * coefficient pattern

UOP.768(operand1, operand2)

Floating-point ADD
Implements coefficient + (x * previous_result)
Horner's method addition step
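
The multiply/add pairing described for UOP.6E9 and UOP.768 is the standard Horner recurrence. A minimal Python sketch follows. The helpers `fmul` and `fadd` are hypothetical stand-ins for the two uops (their identification is itself a guess in these notes), and ordinary doubles replace the x87 internal format.

```python
def fmul(a, b):
    # Stand-in for the multiply step attributed to UOP.6E9 (assumption).
    return a * b

def fadd(a, b):
    # Stand-in for the add step attributed to UOP.768 (assumption).
    return a + b

def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... for coeffs = [c0, c1, c2, ...]."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = fadd(c, fmul(x, acc))   # coefficient + (x * previous_result)
    return acc
```

Each loop iteration corresponds to one multiply uop followed by one add uop, which is why the two opcodes would be expected to alternate in a polynomial chain.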

UOP.769(operand1, operand2)

Floating-point operation variant
Only appears once at end of second chain
Possibly final adjustment or different precision?

UOP.124(operand1, operand2)

Unknown integer operation on FP exponent
Works with shifted TMPA values
Likely exponent bias adjustment

Memory Validation and Setup

UOP.94B(address_reg, address_reg, offset, mode, flags)

Memory operand validation
Checks alignment, access permissions, or page boundaries
Used before large memory operations (FXSAVE)
Called twice: once for start address, once for end of region (+0x40)
Likely validates entire region is accessible

UOP.E41(address_reg, address_reg, offset, mode, flags)

Load effective address with validation
Returns address in destination register
Result is tested for alignment (FXSAVE requires 16-byte alignment)
Combines LEA functionality with boundary checking
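
The alignment test applied to the UOP.E41 result can be illustrated with a small Python sketch. The 16-byte alignment and the 512-byte size of the FXSAVE area are architectural facts. The function name and return shape are illustrative only and say nothing about how the uop reports its result.

```python
FXSAVE_AREA_SIZE = 512   # bytes, architectural size of the FXSAVE image
FXSAVE_ALIGNMENT = 16    # bytes, required alignment of the operand

def fxsave_operand_ok(base):
    """Return (aligned, last_byte) for an FXSAVE area starting at `base`."""
    aligned = (base & (FXSAVE_ALIGNMENT - 1)) == 0   # low 4 bits must be zero
    last_byte = base + FXSAVE_AREA_SIZE - 1          # base + 511
    return aligned, last_byte
```

A misaligned operand such as `0x1008` fails the first check, which architecturally results in #GP.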

UOP.CCB(segment, address, mode, flags)

Prepare memory region for bulk operations
Called after calculating end address (base + 511)
Possibly cache line prefetch or reserve memory region
Prepares for sequential store operations

UOP.CCD(offset, base_address, mode, flags)

Memory operation related to UOP.CCB
Called at offset +0x10 into FXSAVE structure
Possibly cache line lock or begin transaction
Used in normal path only (not in alternate path)

Instruction Pointer Operations

UOP.1C9(CONST_0, EIP_30)

Unknown instruction pointer operation
Appears only in alternate FXSAVE path (part_macro_fxsave_alt)
Called immediately after UOP.0D8(CONST_0, EIP_30)
Both operate on EIP_30 - purpose unclear
Possibly related to exception handling or precise state recovery

Floating-Point Status

TMPB = FNSTSW?(FCC, FSW, U2.20)

Store FPU Status Word - microcode variant
Reads both FCC (FP Condition Codes) and FSW (FP Status Word)
Combines them into result register
Question mark indicates disassembler uncertainty
More comprehensive than architectural FNSTSW instruction
Returns composite status for FXSAVE state capture

Misc

From various analyses

UOP.121 [MSROM-612: 2995, 299E, 29A1, 29AC, 29B6, 29C1, 29D1, 37DD]

FP exponent field replace — overwrites biased exponent of a float with a supplied value, preserving mantissa
Used in FBSTP for range reduction and digit pipeline initialisation
Also seen in FXTRACT normalisation (previous listing)
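
The exponent-replace behaviour proposed for UOP.121 can be sketched as a bit-field overwrite. For runnability this sketch uses an IEEE 754 double (11-bit exponent, bias 1023) rather than the 80-bit x87 format (15-bit exponent, explicit integer bit), so the field positions are assumptions chosen for illustration. The name `replace_exponent` is hypothetical.

```python
import struct

def replace_exponent(x, biased_exp):
    """Overwrite the 11-bit biased exponent of a double; keep sign and mantissa."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits &= ~(0x7FF << 52) & 0xFFFFFFFFFFFFFFFF   # clear the exponent field
    bits |= (biased_exp & 0x7FF) << 52            # insert the supplied exponent
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Forcing the exponent to the bias maps any normal value into [1, 2) with its mantissa intact, which is the kind of normalisation FXTRACT needs for its significand output.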

UOP.028 [MSROM-612: 29A9, 29BD, 29CD]

FP fractional-part extract — computes frac(src1), using src2 as subtraction threshold (3.75 = 15/4 here)
Used in FBSTP inner digit-peel loop to isolate remainder after each decimal digit is stripped

UOP.22C [MSROM-612: 29B0, 29C0, 29D0]

FP add small correction — adds epsilon src2 to fractional remainder src1; src2 here is FPROM[05D] = 2.168e-19 = 2^-62
Used in FBSTP to suppress accumulated rounding drift across 16 digit extraction iterations

UOP.221 [MSROM-612: 29A0]

FP zero-correct or sign-strip — takes value + 0.0; likely flushes negative zero or clears sign before digit pipeline
Used in FBSTP once during digit register initialisation

UOP.22B [MSROM-612: 29A2]

FP scale by small integer — multiplies/scales digit register by a small integer constant (3 here)
Used in FBSTP for initial alignment of the working value before main extraction loop

UOP.229 [MSROM-612: 2998]

FP align for extraction — normalises/aligns src1 against a precision threshold src2 before digit pipeline begins
Used in FBSTP once after range check passes

UOP.7E0 [MSROM-612: 2999, 29E1]

FP integer-part extract — extracts the integer portion of a float into a working register
Used in FBSTP to seed the digit accumulator; second use at EOM finalises FP register state

UOP.62E [MSROM-612: 29AD, 29B8, 29C2, 29D2]

Integer nibble insert — ORs a digit nibble (src2, from INTEXTRACT.UP32) into accumulator src1
Used in FBSTP BCD packing: paired with UOP.120 shift to build packed BCD word nibble by nibble

UOP.120 [MSROM-612: 29AC, 29B6, 29C1, 29D1, 14DE, 29F5]

Integer left-shift accumulator — shifts integer register src1 left by src2 bits (4 = one nibble here)
Used in FBSTP to make room for each new BCD digit before UOP.62E inserts it
Second usage pattern (14DE, 29F5) with U2.49 suggests exception state initialisation role
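
The shift-then-insert pairing of UOP.120 and UOP.62E described above amounts to nibble-wise BCD packing. A minimal Python sketch, assuming digits arrive most significant first (the actual digit order in the microcode is not established here):

```python
def pack_bcd(digits):
    """Pack decimal digits (most significant first) into a BCD integer."""
    acc = 0
    for d in digits:
        if not 0 <= d <= 9:
            raise ValueError("not a decimal digit: %r" % (d,))
        acc = (acc << 4) | d   # shift one nibble (UOP.120 role), OR digit in (UOP.62E role)
    return acc
```

For example, `pack_bcd([1, 2, 3, 4])` gives `0x1234`.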

UOP.733 [MSROM-612: 29DA]

Prepare 64-bit value for memory store — feeds a WUCONCAT result into the store pipeline
Used in FBSTP just before FSTA2.DSZ64 writes 8 packed BCD bytes to memory

UOP.0D8 [MSROM-612: 0E82, 29C8, 29D8, 29EA, 14DC]

Fault return-EIP register — saves EIP+instruction-length for exception recovery
Used in FBSTP at every potential fault site before #IA or store operations
Always paired with UOP.0D4

UOP.0D4 [MSROM-612: 0E84, 29EC]

Fault type / exception class tag — specifies the class of fault being prepared
Used in FBSTP paired with UOP.0D8 at unmasked invalid-operation exception sites

UOP.020 [MSROM-612: 29E4, 29F9]

FP stack pop and commit — pops ST(0) off the x87 register stack; opcode selector in src1 (0B1 here)
U2.80 = normal pop, U2.A0 = pop with exception marker
Used in FBSTP as the final EOM operation on both normal and exception paths

UOP.CCC [MSROM-612: 29E2, 29F8, 29FA]

Cache store commit / MESI coherence — finalises pending memory stores and ensures cache line consistency
Used in FBSTP after every memory write path (both normal and exception) before instruction retirement

UOP.020 (FXTRACT listing) [MSROM-612: 123C, 123D, 1249, 124A]

FP register write with stack control — writes result to ST0/ST7; src1 encodes push/write mode
CONST.00.034 = push+write (ST7 becomes new ST0), CONST.00.012 = plain write
Used in FXTRACT to deliver exponent and significand results onto the FP stack

Operations identified (revised)

21C2 full-listing (entry 20D6): FSIN

Shared kernel also used by FCOS (different entry, same body) and FSINCOS (via macro at 24FE).

macro_fsincos at 24FE sets bit 2 of TMP8 (value 4) before jumping to loc_20DE, signalling dual-output mode.

Labels @macro_fsincos_sin_6taylor (2119) and @macro_fsincos_cos_6taylor (213A) confirm this kernel is shared between FSIN, FCOS and FSINCOS; octant bit selects which Taylor path executes.

FPROM sin coefficients (high precision): 01C-021

FPROM cos coefficients (high precision): 022-027

FPROM sin coefficients (large angle / table path): 028-02B

FPROM cos coefficients (large angle / table path): 02C-02F

FPROM table lookup: 070-077 sin(k/64), 078-07F cos(k/64), indexed by TMPE

Argument reduction: divide by FPROM[043]=pi/2, take frac, compare with FPROM[044]=pi/4
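
One plausible reading of this reduction step is sketched below in Python. It is an interpretation, not a transcription of the microcode: the assumption is that the pi/4 comparison selects between the sin and cos polynomial by switching to the complementary angle, and `math.sin`/`math.cos` stand in for the Taylor/table kernels. All function names are hypothetical.

```python
import math

HALF_PI = math.pi / 2      # plays the role of FPROM[043]
QUARTER_PI = math.pi / 4   # plays the role of FPROM[044]

def reduce_argument(x):
    """Return (quadrant, swapped, r) with 0 <= r <= pi/4."""
    q = abs(x) / HALF_PI
    quadrant = math.floor(q)
    r = (q - quadrant) * HALF_PI     # fractional part, scaled back to radians
    swapped = r > QUARTER_PI         # upper half of the quadrant
    if swapped:
        r = HALF_PI - r              # continue with the complementary angle
    return quadrant % 4, swapped, r

def sin_via_reduction(x):
    quadrant, swapped, r = reduce_argument(x)
    # Odd quadrants exchange sin and cos; so does the complement step.
    use_cos = (quadrant % 2 == 1) != swapped
    value = math.cos(r) if use_cos else math.sin(r)
    if quadrant >= 2:                # third and fourth quadrants are negative
        value = -value
    return -value if x < 0 else value
```

The quadrant number and the swap flag together carry three bits of selector information, which would be consistent with the "octant bit" wording used for this kernel.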

1BAC full-listing: most likely FSINCOS (parallel dual Horner variant) or the internal computation kernel shared by FPTAN. Runs sin and cos Horner series simultaneously in a single pass (TMP5=sin chain, TMP6=cos chain interleaved), then outputs two results.

Cannot be FSIN/FCOS alone (single-output). Not FPTAN (207A uses UOP.7EE, this uses UFPOP_7X8 twice).

The dual UFPOP_7X8 with CONST.00.034 (push+write) and CONST.00.012 (write) at EOM delivering results to both ST0 and ST7 is the strongest indicator of FSINCOS.

207A full-listing: FPTAN

Produces sin in TMP5/TMP6, cos in complementary, then UOP.7EE combines into tan = sin/cos.

Pushes FPROM[042] = 1.0 into ST7 via UOP.020 before EOM, matching FPTAN output contract.

New UOPs from 21C2 full-listing

UOP.262 [MSROM-612: 21C2 full-listing: 21B6, 21C2]

FP special-case result write for tiny-argument path — used when |ST0| exponent is below threshold (very small x)
src1 is FPROM[042]=1.0 scaled by a sign bit from TMP8 via UOP.120; writes result directly to ST0 in-place
For FSIN: sin(x) ≈ x for tiny x, so this path likely returns ST0 unchanged or sign-corrected
For FCOS: cos(x) ≈ 1 for tiny x, so returns ±1.0
Only seen at exit of the "exponent too small" branch (loc_21B0 and loc_21B9)

UOP.029 [MSROM-612: 21C2 full-listing: 20DE | 1BAC full-listing: 1B36 | 207A full-listing: 200A | FXTRACT full-listing: 122E]

FP special-value classifier — tests src for zero, denormal, NaN, infinity; returns flag word
AND with CONST.16.004 isolates bit 2; non-zero branches to exception/special-value path
Consistent role confirmed across FSIN, FSINCOS, FPTAN, FXTRACT

UOP.0A1 [MSROM-612: 21C2 full-listing: 210C | 1BAC full-listing: 1B60 | 207A full-listing: 2034]

FP upper-mantissa index extractor — extracts high bits of reduced angle mantissa for sin/cos table index
Input is |x| (from FXORS src,src = clear sign); output feeds FPEXTRACT.P3 then ADD with 0x070/0x078 offset to form FPROM table address
Consistent role confirmed across FSIN, FSINCOS, FPTAN

UOP.120 [MSROM-612: 21C2 full-listing: 20F8, 2136, 2156, 219C, 21AC, 21B6, 21BC, 21C1 | 1BAC full-listing: 1B4D | 207A full-listing: 2021]

Dual role confirmed by context:
Role A — FP ldexp-style exponent adjust: src1 is float, src2 is integer exponent delta; scales float by power of 2

Used at 20F8/1B4D/2021 in argument reduction to re-scale fractional argument

Role B — integer nibble-shift accumulator: TMPD <<= 4; seen in FBSTP BCD packing
Role C — FP sign-bit apply: UOP.120(1.0, sign_bit) flips sign of 1.0; seen at 21B6/21C1 in tiny-argument path
Role D — exception / status signal: with U2.4A, U2.4B, U2.50 flag variants, triggers exception or status update (no FP result)
Opcode is heavily overloaded; execution unit behaviour determined by operand types and U2 flags

UOP.124 [MSROM-612: 21C2 full-listing: 2134, 2155, 2158, 2196, 2199, 21A5, 21A9 | 1BAC full-listing: 1BA2, 1BA8, 1BB1, 1BB6, 1BF4 | 207A full-listing: 20BE]

FP sign-mask builder from integer quadrant bits — converts integer octant/quadrant selector to FP-compatible sign mask
Output XORed via FXORS with sin/cos result to apply correct sign without branching
Confirmed across FSIN, FSINCOS, FPTAN; universal branchless sign-correction mechanism for trig instructions
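
The branchless sign correction described here (build a sign mask from a selector bit, then XOR it into the result) can be sketched in Python. The sketch uses 64-bit doubles with the sign in bit 63; the x87 internal format keeps its sign in bit 79 of an 80-bit value, so the bit position is an assumption made for runnability. The names `sign_mask` and `apply_sign` are hypothetical.

```python
import struct

def to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b):
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def sign_mask(negate):
    """Turn an integer selector bit (0 or 1) into a float sign mask."""
    return (negate & 1) << 63

def apply_sign(value, negate):
    """XOR the mask into the value: negates when the bit is set, no branch."""
    return from_bits(to_bits(value) ^ sign_mask(negate))
```

Because XOR with zero is the identity, the same uop sequence serves every quadrant and only the selector bit changes.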

UFPOP_7X8 [MSROM-612: 21C2 full-listing: 2139, 215C, 219E, 21AE | 1BAC full-listing: 1BA6, 1BAC, 1BB5, 1BBA]

Dual FP operand write to register stack — delivers up to two FP values simultaneously
In 21C2 (FSIN): U2.49, single effective output; src1 and src2 carry high/low parts of result or redundant values
In 1BAC (FSINCOS): CONST.00.034 = push+write (new TOS), CONST.00.012 = write second slot; delivers sin and cos to stack
U2.49 vs U2.C9/U2.80 likely selects single-value vs dual-value retirement mode

UOP.267 [MSROM-612: 21C2 full-listing: 20FA | 1BAC full-listing: 1B50 | 207A full-listing: 2024]

FP argument recombiner after reduction — merges octant-adjusted float (src1, from UOP.120) with Taylor residual (src2, from _FSUB)
Produces final reduced argument x' in [0, pi/4] fed into Horner evaluation
Confirmed across FSIN, FSINCOS, FPTAN; always follows the UOP.120 + _FSUB pair in argument reduction

UOP.220 [MSROM-612: 21C2 full-listing: 20D8 | 1BAC full-listing: 1B30 | 207A full-listing: 2004]

FP register tag/status validation — reads ST0 and updates internal tag or pipeline state without producing a result (SINK)
Always precedes FXORS(ST0,ST0) that computes |ST0|; ensures tag-word consistency before classification
Confirmed across FSIN, FSINCOS, FPTAN

UOP.7EE [MSROM-612: 207A full-listing: 207A, 2086]

FP divide-and-write for tan — combines sin (src1) and cos (src2) into quotient, writes to ST0
CONST.00.032 = write-without-push; U2.C9 signals EOM with stack semantics
Specific to FPTAN; not seen in FSIN or FSINCOS which use UFPOP_7X8 instead

UOP.7E9 [MSROM-612: 223A]

FP tiny-argument 2^x-1 approximation — computes 2^x-1 ≈ x*ln(2) for x near zero using src1=ln(2) and src2=ST0
Used as EOM write on the fast-exit path in F2XM1 when exponent of ST0 is below small threshold
Avoids full Taylor evaluation; result written directly to ST0
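
The first-order approximation named above can be checked numerically. The Python sketch below compares x*ln(2) against an accurate reference computed with `math.expm1`. The function names are illustrative, and the threshold at which the microcode takes this fast path is not known from the listing.

```python
import math

LN2 = math.log(2.0)

def f2xm1_tiny(x):
    """First-order approximation 2**x - 1 ~= x * ln(2), for |x| near zero."""
    return x * LN2

def f2xm1_reference(x):
    """Accurate reference: 2**x - 1 == expm1(x * ln(2))."""
    return math.expm1(x * LN2)
```

The relative error of the approximation is about x*ln(2)/2, so it falls below the 64-bit significand precision of the x87 format once |x| is smaller than roughly 2^-63.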

UOP.0A0 [MSROM-612: 223C]

FP argument complementor for 2^x-1 table reduction — transforms TMP0 into the residual needed for table-plus-polynomial evaluation
Applied after FPEXTRACT.P6 extracts the table index bits; TMP0+UOP.0A0(TMP0) forms the argument for the Horner polynomial correction
Likely computes x - round(x * 128)/128 (the fractional residual after table index extraction)
Seen only in F2XM1 at the point of switching from full Taylor to table-lookup path

UOP.0E3 [MSROM-612: 24C0]

FP packed-BCD 64-bit preconvert — transforms a raw 64-bit integer loaded from memory (8 BCD-packed bytes) into an FP working value for the digit reconstruction pipeline
Feeds directly from FLOAD.DSZ64 result into the UOP.028/UOP.22D digit-peel loop
Inverse role to UOP.733 in FBSTP; FBSTP packs FP→BCD, this converts raw BCD load→FP
Seen only in FBLD

UOP.22D [MSROM-612: 24CC, 24D2, 24DE, 24E8]

FP BCD digit accumulator — merges a scaled digit value (src2) into the running FP accumulator (src1); inverse of the UOP.028+UOP.22C peel step used in FBSTP
Pattern: UOP.028(TMP1, threshold) → frac; _FMUL(frac, scale) → digit; UOP.22D(TMP1, digit) → updated accumulator
Called four times in FBLD, once per digit-group loaded from the 10-byte BCD memory format

UOP.262 [MSROM-612: 21B6, 21C2]  — updated from prior listing

* FP tiny-argument result write for FSIN/FCOS: writes ±1.0 (for FCOS) or passes ST0 unchanged (for FSIN) when input exponent is below threshold; src1 = 1.0 sign-adjusted via UOP.120, src2 = ST0

Both FSIN and FCOS share this shortcut; the octant bit in TMP8 already encodes which function's result is appropriate
Result written as new ST0 in-place (no stack push/pop)

Pentium M 6D8 iret findings

Pentium M IRET microcode — Unknown UOP identification
======================================================
Based on macro_iret-msrom-6d8.asm
Cross-referenced against SDM Vol.2/3 and p6microcode documentation.
Certainty levels: HIGH / MEDIUM / LOW

All occurrences noted. Arguments as they appear in the listing.
Subroutine labels (sub_xxx / tail_xxx) are shared routines, not IRET-specific.

--------------------------------------------------------------------------------

UOP.0D4(CONST.0, CONST.0)
Occurrence: UROM_2E68 (same-privilege return path, immediately after UOP.0D8)
Context: appears only once, right after UOP.0D8 committed EIP, before RDSEGFLD/ADD sequence.
Both args are zero constants. No result register. No segment operand.
* Likely: "finalize instruction-fetch redirect" — pipeline signal that the
branch-to-new-EIP (committed by the preceding UOP.0D8) is complete and
the front-end may begin fetching. Alternatively a no-op padding slot or
"cancel speculative fetch" barrier.
* Certainty: LOW — single occurrence, no result, very sparse context.

--------------------------------------------------------------------------------

UOP.0D8(arg0, arg1)
Occurrences:
UROM_20B8  EOM.Fl2  UOP.0D8(CONST.0, TMP3)         ; task-switch exit
UROM_2E66           UOP.0D8(TMP0, TMP3)             ; same-priv return
Context: TMP3 = new EIP (loaded from stack or TSS). TMP0 = result of UOP.62A
(committed CS descriptor state). Always appears at or very near EOM. In the
task-switch path the first arg is 0; in the same-priv path it is the CS
descriptor handle.
* Likely: "redirect instruction fetch to new CS:EIP" — the microcode
equivalent of a taken branch to the target of IRET. Tells the front-end
decode unit to start fetching at TMP3 within the segment described by TMP0
(or default CS when arg0=0). Equivalent in effect to the P6 "BRTAKEN" /
branch-redirect uop.
* Certainty: HIGH — always at EOM, always has new EIP, always follows CS
descriptor commit.

--------------------------------------------------------------------------------

UOP.120(CONST.16.00D, CONST.16.00D)
Occurrences:
UROM_352C  TMP7 = UOP.120(CONST.16.00D, CONST.16.00D)   ; PAE PDPTE fault path
UROM_35BC  TMP6 = UOP.120(CONST.16.00D, CONST.16.00D)   ; general GP fault path
Context: 0x0D = 13 = #GP exception vector. Both arguments are identical.
Result is passed as first arg to SIGEVENT(result, 0xC1) which raises #GP(0).
No other values of the constant are seen.
* Likely: "prepare exception descriptor / build fault frame info" — takes the
exception vector number (0x0D = #GP) and constructs an internal exception
descriptor or code that SIGEVENT then delivers. Analogous to an internal
"raise exception N" uop seen in other P6-family fault sequences.
* Certainty: HIGH — always both args = 0x0D, always feeds directly into a
SIGEVENT that raises #GP, appears in all fault exit paths.

--------------------------------------------------------------------------------

UOP.131(CONST.04.004, CONST.04.004, LINSEG)
Occurrence: UROM_32DD (sub_tlbflush_and_a20, immediately after STRD(0,0))
Context: sub_tlbflush_and_a20 is called after a CR3 change during TSS
task switch. Preceded by STRD(0,0). Has a LINSEG segment qualifier.
Followed by MOVETOCREG(0E.154, 0x20) and SIGEVENT(0xCC).
* Likely: "TLB flush / INVLPG-all" — the uop that physically invalidates all
TLB entries when CR3 changes. LINSEG operand may indicate the linear
address space being flushed (all = base 0). The STRD+UOP.131 pair matches
the P6 pattern for serializing TLB invalidation.
* Certainty: HIGH — location (after CR3 write), LINSEG operand, and the
SIGEVENT(0xCC) "TLB flush complete" signal all strongly confirm this.

--------------------------------------------------------------------------------

UOP.134(CONST.6, CONST.6)
Occurrences:
UROM_3105  UOP.134(CONST.6, CONST.6)      ; before XLOAD of TSS descriptor
UROM_20A8  UOP.134(CONST.6, CONST.6, U2.08) ; serialization in task-switch exit
Context: file comment says "Always has preceding STRD with imm #0; almost
always called with CONST.6, CONST.6". No result register. Arguments always
the same small constant. Appears where strict ordering is needed (before an
atomic locked load of GDT, and in the pipeline-drain loop before SIGEVENT).
* Likely: "pipeline serialization / memory fence" — equivalent to MFENCE or
the internal P6 serialization barrier. The preceding STRD(0,0) drains the
store buffer; UOP.134 then ensures all prior loads/stores are globally
visible before the next memory access. In the GDT context this implements
the locked bus cycle needed for the busy-bit RMW.
* Certainty: HIGH — context (pre-atomic-access, post-store-drain, task-switch
boundary) is unambiguous. Matches known P6 serialization patterns.

--------------------------------------------------------------------------------

UOP.1CA(CONST.0, CONST.0)
Occurrences:
UROM_1066  UOP.1CA(CONST.0, CONST.0)   ; after spin-wait loop in TSS load
UROM_1166  UOP.1CA(CONST.0, CONST.0)   ; after TSS-switch mode check
UROM_2779  UOP.1CA(CONST.0, CONST.0)   ; after spin-wait loop in sub_tss_save
Context: always immediately follows a spin-wait loop (U_JCC.NT.Z to self,
polling an ALIAS flag) or a conditional branch that exits such a loop. No
result. Args always (0, 0). No segment operand.
* Likely: "acquire / mark ready after synchronization" — signals to the
microsequencer that the spin condition has been resolved and execution may
continue. Could be a microcode-level barrier acknowledgment, or a signal
to clear the ALIAS latch that was being polled.
* Certainty: MEDIUM — pattern is consistent (always post-spin), but the exact
internal mechanism is unclear.

--------------------------------------------------------------------------------

UOP.201(CONST.0, TMP7)
Occurrence: UROM_1C0D
Context:
TMP7 = merged new EFLAGS (computed from stack value + current flags)
TMP6 = UOP.201(CONST.0, TMP7)
SystemFlags = MOVE(CONST.0, TMP7)   ; commit EFLAGS
TMP6 = BTEST(TMP6, bit 17)          ; test VM flag
JC → @macro_iret_v86_return
UOP.201 result is used only to test VM (bit 17 of EFLAGS). The actual
EFLAGS write uses TMP7 directly, not TMP6.
* Likely: "read VM flag from EFLAGS image / normalize flags for VM test" —
may extract or re-format the flags word so that the VM bit is accessible at
the expected position in TMP6. Alternatively it reads the current
SystemFlags VM state and XORs or merges with the new value to detect the
0→1 transition.
* Certainty: LOW-MEDIUM — single occurrence, result used only for VM bit;
could also be "extract high EFLAGS word" or a flags-normalization step.

--------------------------------------------------------------------------------

UOP.202(TMP5, CONSTROM.0AD)   [0x00254DD5]
Occurrence: UROM_1BF5
Context: first step of EFLAGS permission-mask computation:
TMP5 = MOVE(CONSTROM.13D)           ; 0x00254FD5 = base allowed-bits mask
TMP5 = UOP.202(TMP5, CONSTROM.0AD)  ; 0x00254DD5 = second mask
TMP5 = UOP.203(CONST.14.13E, TMP5)
TMP5 = UOP.209(CONST.14.0BD, TMP5)
The two ROM constants differ only in bit 9, the IF bit (0xFD5 vs 0xDD5). Both encode
which EFLAGS bits the instruction is permitted to change.
* Likely: "select EFLAGS write-mask based on CPL/IOPL" — chooses between two
flag permission masks (one for CPL=0, one for CPL>0) or ORs/ANDs them to
produce the initial mask. The slight difference in the two masks corresponds
to the IF bit permission (CPL vs IOPL comparison in the SDM).
* Certainty: MEDIUM — context and mask values fit EFLAGS permission logic,
but exact selection criterion is inferred.
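
The two mask constants can be decoded against the architectural EFLAGS layout to see exactly which bits they permit. The Python sketch below uses the constant values quoted from the listing; the bit-name table follows the SDM, and the helper name `flag_names` is illustrative.

```python
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
    19: "VIF", 20: "VIP", 21: "ID",
}

MASK_13D = 0x00254FD5   # CONSTROM.13D, value as quoted in the listing
MASK_0AD = 0x00254DD5   # CONSTROM.0AD, value as quoted in the listing

def flag_names(mask):
    """Names of the single-bit EFLAGS fields set in `mask` (IOPL not listed)."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if (mask >> bit) & 1]
```

`MASK_13D ^ MASK_0AD` evaluates to `0x200`, and `flag_names` of that value is `["IF"]`. Neither mask includes IOPL (bits 13:12), VM, VIF or VIP, so those bits must be handled by the later steps of the sequence.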

--------------------------------------------------------------------------------

UOP.203(CONST.14.13E, TMP5)
Occurrence: UROM_1BF6
Context: second step of the EFLAGS mask sequence (see UOP.202 above).
CONST.14.13E is a CONST-table index (different address space from CONST.16.xxx).
TMP5 holds the mask from UOP.202.
* Likely: "apply IOPL-sensitive filter to EFLAGS mask" — reads an internal
state value (current IOPL or CPL comparison result) from the CONST.14 table
and uses it to further restrict or expand the writable-bits mask. E.g.
clears the IF bit in the mask if CPL > IOPL.
* Certainty: MEDIUM — fits the SDM EFLAGS protection rules; CONST.14.13E
address suggests a mode/privilege table lookup.

--------------------------------------------------------------------------------

UOP.204(arg0, arg1)
Occurrences:
UROM_1BE0  TMP6 = UOP.204(CONST.0E.004, TMP6)    ; entry: read internal size state
UROM_1BF0  TMP6 = UOP.204(TMP7, CONST.0)          ; test EFLAGS image for TF
UROM_1BFA  TMP5 = UOP.204(CONST.14.125, TMP5)     ; flags mask refinement
UROM_1BFD  TMPB = UOP.204(TMP8, TMP7)             ; VM flag combination
UROM_1C08  TMP6 = UOP.204(CONST.14.109, CONST.14.109) ; check VIF/VIP/AC mask
Context is heterogeneous. With CONST.0E.xxx first arg it resembles a CR
read (same prefix as MOVEFROMCREG calls). With CONST.14.xxx it appears in
flag-manipulation sequences. With runtime args (TMP7/TMP8) it processes
flag images.
* Likely: "read internal microcode state register / apply flag table entry" —
a general-purpose state-access uop with two variants:
- CONST.0E.xxx form: read internal CR-like register by index (similar to
MOVEFROMCREG but for a different address space).
- CONST.14.xxx form: read from a per-mode ROM table (e.g., flag masks
keyed by IOPL or CPL).
- Dynamic arg form: compute some function of two flag images (e.g.,
detect TF or VM bits after normalization).
The common thread is producing a value derived from machine state that
standard ALU uops cannot access directly.
* Certainty: LOW-MEDIUM — too many distinct usage patterns to assign a single
precise function; the "internal state register read" interpretation covers
all observed cases but is speculative.

--------------------------------------------------------------------------------

UOP.208(TMPA, CONST.0)
Occurrence: UROM_1BE6
Context:
TMP6 = UOP.208(TMPA, CONST.0)
TMP6 = BTEST(TMP6, CONST.16.00E)   ; test bit 14 = NT flag
JC → @macro_iret_tss_link
TMPA likely holds the current (pre-IRET) EFLAGS image in internal form.
Second arg is always 0.
* Likely: "read / normalize current EFLAGS from internal representation" —
TMPA is a microarchitectural flags alias; UOP.208 converts it to a linear
32-bit EFLAGS image where individual bits match the architectural layout.
The result is then BTEST'd for NT (bit 14).
* Certainty: MEDIUM — single occurrence; the pattern (normalize → BTEST → JCC)
matches what you would expect for "read EFLAGS.NT".

--------------------------------------------------------------------------------

UOP.209(CONST.14.0BD, TMP5)
Occurrence: UROM_1BF8
Context: third step of the EFLAGS mask sequence (see UOP.202, UOP.203).
CONST.14.0BD is another CONST-table index.
* Likely: "apply VM-mode or V86-mode filter to EFLAGS mask" — a third
conditioning step on the writable-bits mask, reading a different mode bit
(possibly the current VM flag) from the CONST.14 table to further restrict
which EFLAGS bits this IRET is allowed to modify.
* Certainty: MEDIUM — same reasoning as UOP.203; position in the mask-
computation chain and CONST.14 address space are consistent.

--------------------------------------------------------------------------------

UOP.20A(CONST.14.109, CONST.14.109)
Occurrence: UROM_20B2 (@macro_iret_exit)
Context:
SIGEVENT(TMP3, 0xE7)              ; signal instruction complete
TMP0 = UOP.20A(CONST.14.109, CONST.14.109)
TMP0 = AND(SystemFlags, TMP0)
TMP0 = SUB(TMP0, 0x00180000)     ; 0x00180000 = VIF|VIP (bits 20:19)
JZ → @macro_iret_fault_gp
Identical argument pattern to UOP.204 at UROM_1C08 (both use CONST.14.109
twice), and almost identical surrounding code. The difference is that this
occurs after the main SIGEVENT, as a final sanity check on the committed
SystemFlags.
* Likely: "read EFLAGS/permission mask from CONST.14 table, variant B" —
functionally equivalent to UOP.204 with the same index but possibly a
different internal timing or pipeline stage. May read the committed
SystemFlags permission state to verify the VIF/VIP bits are legal for the
current mode after the IRET completes.
* Certainty: MEDIUM — nearly identical to UOP.204(CONST.14.109,...); the
distinction between the two opcodes in this context is unclear.

--------------------------------------------------------------------------------

UOP.263(TMP0_gdtr, TMP1_ldtr)
Occurrences (all identical pattern):
UROM_1115, 1122, 112D, 113A, 1145  (segment descriptor validation loop)
UROM_114E  (CS descriptor validation)
UROM_2E5A  (same-privilege return CS validation)
Context: always immediately preceded by two LOADs:
TMP0 = LOAD.DSZ64(selector, GDTR)   ; 64-bit GDT entry
TMP1 = LOAD.DSZ64(selector, LDTR)   ; 64-bit LDT entry
TMP0 = UOP.263(TMP0, TMP1)          ; select one
TMP1 = INTEXTRACT.HI32(TMP0, ...)
TMPB = USEGOP4(TMP1, TMP0, type_check, ...)
The selector's TI (Table Indicator) bit determines whether the GDT or LDT
entry is the correct one. Both are pre-loaded speculatively.
* Likely: "select GDT vs LDT descriptor entry by TI bit" — examines the TI
bit of the segment selector and returns either TMP0 (GDT path, TI=0) or
TMP1 (LDT path, TI=1). Effectively a mux/select on the two 64-bit
descriptor entries. The hardware loads both speculatively to avoid a branch.
* Certainty: HIGH — the GDT+LDT double-load pattern is unambiguous, and this
is the only step between the two loads and the descriptor validation.
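
The proposed mux behaviour can be sketched in Python as a mask-based select on the selector's TI bit (bit 2 of the selector, per the SDM). How UOP.263 actually obtains the selector is not visible in the two-operand form shown above, so passing it as an explicit argument is an assumption, as is the function name.

```python
MASK64 = (1 << 64) - 1

def select_descriptor(selector, gdt_entry, ldt_entry):
    """Return the 64-bit descriptor matching the selector's TI bit, without branching."""
    ti = (selector >> 2) & 1                # 0 = GDT, 1 = LDT
    pick_ldt = (-ti) & MASK64               # all ones when TI=1, zero when TI=0
    return (ldt_entry & pick_ldt) | (gdt_entry & ~pick_ldt & MASK64)
```

For example, selector `0x0008` (TI=0) returns the GDT entry and `0x000C` (TI=1) returns the LDT entry.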

--------------------------------------------------------------------------------

UOP.62A(CONST.0E.102, TMPB)
Occurrences:
UROM_1159   UOP.62A(CONST.0E.102, TMPB)        ; TSS task switch, commit CS
UROM_209D   UOP.62A(CONST.0E.102, CONST.0E.102) ; priv-change path
UROM_2CA9   UOP.62A(CONST.0E.102, TMPB)         ; V86 return, commit CS
UROM_2E65   TMP0 = UOP.62A(CONST.0E.102, TMPB)  ; same-priv return
Context: TMPB always holds the result of a USEGOP4 call that validated and
built a segment descriptor handle. Always used with CS (or the CS-equivalent
context). CONST.0E.102 appears to be an internal hardware port index. In the
priv-change path both args are the same constant (self-referential commit).
* Likely: "commit segment descriptor into hardware CS descriptor cache" —
finalizes the new CS descriptor (limit, base, access rights, etc.) computed
by the USEGOP chain and makes it visible to the instruction-fetch unit and
memory-access hardware. The 0E.102 port may be the "CS descriptor cache
write port".
* Certainty: HIGH — always follows USEGOP4 on CS, always immediately before
UOP.0D8 (EIP redirect) or EOM. The segment-commit role is clear.
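
For reference, the fields that such a commit would have to carry (limit, base,
access rights) can be pulled out of a raw 64-bit descriptor as sketched below.
The field layout is the architectural x86 segment descriptor format; the type
and function names are hypothetical, and nothing here claims to reflect how the
descriptor cache is actually organised internally.

```python
from typing import NamedTuple


class SegmentDescriptor(NamedTuple):
    base: int     # 32-bit segment base
    limit: int    # raw 20-bit limit field (not scaled by the G bit)
    access: int   # access byte: P, DPL, S, type
    flags: int    # 4-bit flags nibble: G, D/B, L, AVL


def decode_descriptor(raw: int) -> SegmentDescriptor:
    """Split a 64-bit x86 segment descriptor into its architectural fields."""
    lo = raw & 0xFFFFFFFF
    hi = (raw >> 32) & 0xFFFFFFFF
    # base: lo[31:16] = base 15:0, hi[7:0] = base 23:16, hi[31:24] = base 31:24
    base = (lo >> 16) | ((hi & 0xFF) << 16) | (hi & 0xFF000000)
    # limit: lo[15:0] = limit 15:0, hi[19:16] = limit 19:16
    limit = (lo & 0xFFFF) | (hi & 0x000F0000)
    access = (hi >> 8) & 0xFF
    flags = (hi >> 20) & 0xF
    return SegmentDescriptor(base, limit, access, flags)
```

For example, the common flat code descriptor 0x00CF9A000000FFFF decodes to
base 0, limit 0xFFFFF, access 0x9A, flags 0xC.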

--------------------------------------------------------------------------------

UOP.CC1(args..., segment)
Occurrences:
UROM_276E   UOP.CC1(CONST.6, TMP8, SEG_02)        ; sub_tss_save: setup write ptr
UROM_115D   UOP.CC1(CONST.6, TMP3, CS)             ; TSS switch, CS commit signal
UROM_2E61   UOP.CC1(TMPB, TMP3, CONST.6, CS)       ; same-priv return, CS signal
Context: appears at segment-register-update boundaries. At 276E it sets up
the write pointer for TSS body stores (TMP8 is a stride/offset value). At
115D and 2E61 it appears after a new CS descriptor has been validated,
signaling the load. The segment argument varies (SEG_02 for TSS access,
CS for instruction-fetch context).
* Likely: "notify segment unit of pending segment register load / set segment
access cursor" — a pipeline signal to the segment hardware that a segment
register is about to be updated, allowing it to pre-arm the descriptor cache
or set an internal write/read pointer. In the TSS-body context it may
initialize the sequential-access pointer for the TSS store sequence.
* Certainty: MEDIUM — the segment operand and position at segment-update
boundaries are clear; the exact internal mechanism is inferred.

--------------------------------------------------------------------------------

UOP.CC9(CONST.6, TMP8, SEG_02)
Occurrences:
UROM_27D5  UOP.CC9(CONST.6, TMP8, SEG_02)          ; sub_tss_save epilogue
UROM_27D9  UOP.CC9(CONST.6, TMP8, SEG_02, U3.40)   ; sub_tss_save epilogue
Context: both appear in the sub_tss_save epilogue (loc_27CE), right before
the indirect jump back to the TRANSPORTUIP caller. TMP8 is a stride value
(2 or 4 depending on TSS type). Both have SEG_02 (the new-task TSS segment).
The U3.40 flag on the second suggests it has pipeline-flow significance.
* Likely: "finalize / commit segment sequential-access pointer" — the
counterpart to UOP.CC1, signaling that the sequential TSS-body access (the
store loop) is complete and the segment unit should tear down the cursor.
Alternatively: "arm segment for read access" to prepare the new-task TSS
read that follows. The pair of CC9 calls (write then read contexts?) may
correspond to switching the TSS segment from write to read mode.
* Certainty: MEDIUM — context (TSS save completion, before indirect return)
is consistent, but the CC9 vs CC1 distinction is inferred from position alone.
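
The cursor interpretation of the CC1/CC9 pair can be pictured with the toy
model below. Everything in it is hypothetical: the class, the method names, and
the mapping of arm() to UOP.CC1 and finish() to UOP.CC9 are assumptions made
only to illustrate the guess. The only detail taken from the listings is that
the stride (TMP8) is 2 or 4 depending on TSS type.

```python
class TssCursor:
    """Hypothetical model of a segment sequential-access pointer."""

    def __init__(self, buf: bytearray):
        self.buf = buf
        self.pos = None       # None means no cursor is armed
        self.stride = 0

    def arm(self, start: int, stride: int) -> None:
        """Assumed CC1 role: set up the write pointer and stride (TMP8)."""
        if stride not in (2, 4):
            raise ValueError("stride is 2 or 4 depending on TSS type")
        self.pos, self.stride = start, stride

    def store(self, value: int) -> None:
        """One TSS-body store of `stride` bytes, then advance the pointer."""
        if self.pos is None:
            raise RuntimeError("cursor not armed")
        mask = (1 << (8 * self.stride)) - 1
        data = (value & mask).to_bytes(self.stride, "little")
        self.buf[self.pos:self.pos + self.stride] = data
        self.pos += self.stride

    def finish(self) -> None:
        """Assumed CC9 role: tear the cursor down after the store loop."""
        self.pos = None
```

Usage would follow the order seen in sub_tss_save: arm once, issue the body
stores, then finish before the indirect return.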

--------------------------------------------------------------------------------

UOP.F0B(CONST.6, TMP7, GDTR)
Occurrence: UROM_3102 (@tail_tss_load_continue — TSS busy-bit clear)
Context:
TMP7 = selector_index & 0x1F8    ; byte offset of descriptor in GDT
UOP.F0B(CONST.6, TMP7, GDTR)    ; ← this
STRD(0, 0)                        ; store-buffer drain
UOP.134(CONST.6, CONST.6)        ; serialization
TMP0 = XLOAD.DSZ64.1(...)        ; atomic locked load of GDT descriptor
... BTR busy bit ...
STA ...                           ; write back modified descriptor
The sequence implements a locked read-modify-write on the GDT descriptor.
UOP.F0B fires before the serialization and the XLOAD.
* Likely: "acquire bus lock / assert LOCK# for GDT descriptor access" — asserts
the external LOCK# signal (or internal lock equivalent) on the GDT entry at
the byte offset in TMP7, ensuring atomicity of the subsequent
XLOAD+modify+STA cycle. The GDTR segment qualifier specifies that the target
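is the GDT. This is consistent with the SDM's automatic-locking rule for
testing and setting the TSS descriptor busy flag; that the same locking applies
when clearing it is an inference from this sequence.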
* Certainty: MEDIUM-HIGH — the locked-RMW context (XLOAD.1 flag = locked,
STA.1 = locked store) and the pre-lock position are consistent with a
bus-lock acquire; GDTR operand is unambiguous.
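
The locked read-modify-write described above can be sketched in Python as
follows. This is an illustration under stated assumptions, not a model of the
real hardware: threading.Lock merely stands in for the bus lock, the function
and constant names are hypothetical, and the offset mask is copied as written
from the listing. The busy flag position (bit 1 of the descriptor type field,
i.e. bit 41 of the 64-bit entry) is the architectural one.

```python
import threading

DESCRIPTOR_OFFSET_MASK = 0x1F8   # mask exactly as written in the listing above
TSS_BUSY_BIT = 1 << 41           # bit 1 of the type field (descriptor bits 40-43)

gdt_lock = threading.Lock()      # stand-in for the bus lock; purely illustrative


def clear_tss_busy(gdt: bytearray, selector: int) -> int:
    """Hypothetical model of the locked busy-bit clear on a GDT entry."""
    offset = selector & DESCRIPTOR_OFFSET_MASK
    with gdt_lock:  # lock held across the whole read-modify-write
        raw = int.from_bytes(gdt[offset:offset + 8], "little")   # XLOAD step
        raw &= ~TSS_BUSY_BIT                                      # BTR step
        gdt[offset:offset + 8] = raw.to_bytes(8, "little")        # STA step
    return raw
```

Applied to a busy 32-bit TSS descriptor (access byte 0x8B), the entry is
rewritten with access byte 0x89 (available TSS) and all other bits unchanged.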

This project is an independent, unofficial work based on publicly available information and reverse-engineering research, and is not affiliated with, endorsed by, sponsored by, or associated with Intel Corporation or its affiliates. It is provided "as is", without warranty of any kind. The author assumes no responsibility or liability for any use, misuse, damage, data loss, hardware failure, or other consequences arising from its use. Intel, Pentium, Core and related trademarks are the property of their respective owners and are used solely for identification and informational purposes.