UOP.000(ALIAS.014, ALIAS.014)
- Microcode sequence synchronization point
- Appears at start of sequences with BOM
- Likely pipeline flush or register alias table checkpoint
- "ALIAS.014" may indicate register renaming state
UOP.0D4(CONST_0, CONST_0, U2.20)
- Pipeline commit/flush operation
- Seen after MOVETOCREG (control register modifications)
- Flushes pipeline after CR0/CR4 changes
- U2.20 flag suggests register writeback phase
UOP.0D8(CONST_0, address)
- Update instruction pointer / commit microcode state
- Used with EIP calculations (EIP + displacement)
- Appears in ENTER, RDPMC, bus lock sequences
- Likely updates next instruction fetch address
UOP.120(vector, vector)
- Prepare exception vector number
- Examples: UOP.120(5, 5) for #BR, UOP.120(13, 13) for #GP
- Also used in shift operations to combine operands
- Dual purpose: exception setup OR operand preparation
UOP.203(CONST_0, error_code)
- Format exception error code
- Prepares exception information structure
- Used before SIGEVENT calls
- Seen in RDPMC privilege checking
UOP.134(CONST_6, CONST_6, U2.08)
- Assert bus LOCK# signal
- Used in all LOCK-prefixed operations
- Critical for atomic memory operations
- Seen in LOCK XADD, LOCK CMPXCHG, LOCK XCHG
- Also used in LGDT/LIDT before bus lock wait
Atomic Memory Operations (32-bit)
UOP.84C [address]
- Unlocked atomic load (32-bit)
- Used in XADD without LOCK prefix
- Standard load with atomicity guarantees
UOP.84D [address]
- Locked atomic load (32-bit)
- Read phase of locked read-modify-write
- Used with LOCK prefix operations
UOP.841 [address]
- Locked atomic store (32-bit)
- Write phase of locked read-modify-write
- Completes atomic operation sequence
UOP.8CC [address]
- Load 8-bit from memory
- Special variant for byte-sized operations
- Used in 8-bit rotate operations
Atomic Memory Operations (64-bit)
UOP.B4C [address]
- Unlocked 64-bit load
- Used in CMPXCHG8B without LOCK
- Loads the 64-bit (quadword) operand
UOP.B4D [address]
- Locked 64-bit atomic load
- Used in LOCK CMPXCHG8B
- Read phase of 64-bit atomic operation
UOP.B41 [address]
- Locked 64-bit atomic store
- Write phase of 64-bit atomic operation
- Completes LOCK CMPXCHG8B sequence
UOP.1C1(ArithFLAGS, operand, U2.21)
- Extract flags for shift/rotate operations
- Prepares carry flag for RCL/RCR operations
- U2.21 suggests flag extraction phase
- Used in all rotate-through-carry instructions
UOP.42C(operand, count)
- Prepare shift right operation
- Used in SHRD (shift right double precision)
- Preprocessing phase
UOP.42D(operand, count)
- Prepare shift left operation
- Used in SHLD (shift left double precision)
- Preprocessing phase
UOP.42E(operand, flags)
- Execute double precision shift left
- Main execution phase of SHLD
- Updates flags based on result
UOP.42F(operand, flags)
- Execute double precision shift right
- Main execution phase of SHRD
- Updates flags based on result
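The prepare/execute split above is the author's reading of the listing, but the architectural result both pairs must produce is fixed. A minimal Python sketch of the 32-bit SHLD/SHRD result follows. Flags are omitted, and it makes no attempt to model which uop does which part of the work:

```python
MASK32 = 0xFFFFFFFF

def shld32(dest: int, src: int, count: int) -> int:
    """SHLD r/m32, r32, count: shift dest left, filling vacated bits from src's high end."""
    count &= 0x1F  # count is masked to 5 bits for 32-bit operands
    if count == 0:
        return dest & MASK32  # a zero count leaves the destination unchanged
    combined = ((dest & MASK32) << 32) | (src & MASK32)  # 64-bit dest:src
    return ((combined << count) >> 32) & MASK32

def shrd32(dest: int, src: int, count: int) -> int:
    """SHRD r/m32, r32, count: shift dest right, filling vacated bits from src's low end."""
    count &= 0x1F
    if count == 0:
        return dest & MASK32
    combined = ((src & MASK32) << 32) | (dest & MASK32)  # 64-bit src:dest
    return (combined >> count) & MASK32
```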
UOP.4AC(flags, count)
- Prepare rotate right through carry
- Extracts and positions carry flag for RCR
- Used with 8-bit and larger operands
UOP.4AD(flags, count)
- Prepare rotate left through carry
- Extracts and positions carry flag for RCL
- Used with 8-bit and larger operands
UOP.52F(operand, flags, U2.39)
- Execute rotate operation
- Performs actual rotation with carry
- U2.39 flag suggests flag update phase
UOP.62C(flags, count)
- Prepare rotate operation (variant)
- Alternative preparation for rotate instructions
- Exact difference from UOP.4AC/4AD unclear
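Rotate-through-carry treats CF as an extra bit of the rotated value (a ninth bit for 8-bit operands). That fits a separate uop extracting and positioning the carry before the rotate executes. The sketch below computes the architectural 8-bit RCL/RCR result (OF not modeled). It assumes nothing about how UOP.1C1, UOP.4AC/4AD, UOP.62C or UOP.52F divide the work:

```python
def rcl8(value: int, carry: int, count: int) -> tuple[int, int]:
    """RCL r/m8: rotate the 9-bit quantity CF:value left; returns (result, new CF)."""
    count = (count & 0x1F) % 9               # 8-bit form: mask to 5 bits, then reduce mod 9
    v = ((carry & 1) << 8) | (value & 0xFF)  # carry sits above bit 7 as bit 8
    for _ in range(count):
        v = ((v << 1) | (v >> 8)) & 0x1FF    # old CF enters bit 0, old bit 7 becomes CF
    return v & 0xFF, v >> 8

def rcr8(value: int, carry: int, count: int) -> tuple[int, int]:
    """RCR r/m8: rotate the 9-bit quantity CF:value right; returns (result, new CF)."""
    count = (count & 0x1F) % 9
    v = ((carry & 1) << 8) | (value & 0xFF)
    for _ in range(count):
        v = ((v >> 1) | ((v & 1) << 8)) & 0x1FF  # old bit 0 becomes CF, old CF enters bit 7
    return v & 0xFF, v >> 8
```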
UOP.265(condition, value)
- Conditional 64-bit select
- Used in CMPXCHG8B
- Selects between two 64-bit values based on condition
- Implements a conditional move for 64-bit (EDX:EAX / ECX:EBX) operand pairs
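CMPXCHG8B needs two 64-bit selections driven by a single comparison, which fits the select role guessed for UOP.265. The sketch below models the architectural result. `select64` is a hypothetical helper standing in for that select, not a decoded uop definition:

```python
MASK32 = 0xFFFFFFFF

def select64(cond: bool, if_true: int, if_false: int) -> int:
    """Hypothetical 64-bit conditional select (the role guessed for UOP.265)."""
    return if_true if cond else if_false

def cmpxchg8b(mem: int, edx: int, eax: int, ecx: int, ebx: int) -> tuple[int, int, int, int]:
    """Architectural CMPXCHG8B m64; returns (new_mem, new_edx, new_eax, zf)."""
    expected = ((edx & MASK32) << 32) | (eax & MASK32)  # EDX:EAX
    desired = ((ecx & MASK32) << 32) | (ebx & MASK32)   # ECX:EBX
    equal = mem == expected
    new_mem = select64(equal, desired, mem)       # memory takes ECX:EBX only on a match
    new_edx_eax = select64(equal, expected, mem)  # EDX:EAX takes memory only on a mismatch
    return new_mem, new_edx_eax >> 32, new_edx_eax & MASK32, int(equal)
```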
Original FPU instr encoding (DC /r, m64fp memory forms):
dc/0 fadd, dc/1 fmul, dc/2 fcom, dc/3 fcomp, dc/4 fsub, dc/5 fsubr, dc/6 fdiv, dc/7 fdivr
Tentative uop mapping: 6E9 = fmul?, 768 = fadd?
UOP.6E8(operand1, operand2)
- Floating-point operation - likely LOAD or PREPARE
- Used at start and end of polynomial chains
- Possibly converts or prepares FP values
UOP.6E9(operand1, operand2)
- Floating-point MULTIPLY
- Core operation in polynomial evaluation
- Implements x * coefficient pattern
UOP.768(operand1, operand2)
- Floating-point ADD
- Implements coefficient + (x * previous_result)
- Horner's method addition step
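The multiply/add alternation described for UOP.6E9 and UOP.768 is ordinary Horner evaluation. A generic sketch follows. The coefficients are the textbook sin series, not values read from FPROM. The uop names in the comments are only the tentative mapping given above:

```python
import math

def horner(coeffs: list, x: float) -> float:
    """Evaluate coeffs[0]*x**n + ... + coeffs[n] with one multiply and one add per step."""
    acc = coeffs[0]
    for c in coeffs[1:]:
        acc = acc * x  # multiply step (tentatively UOP.6E9)
        acc = acc + c  # add step (tentatively UOP.768)
    return acc

# Textbook sin series as an odd polynomial: sin x = x * P(x*x), highest power of x*x first.
SIN_COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(7, -1, -1)]

def sin_taylor(x: float) -> float:
    return x * horner(SIN_COEFFS, x * x)
```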
UOP.769(operand1, operand2)
- Floating-point operation variant
- Only appears once at end of second chain
- Possibly final adjustment or different precision?
UOP.124(operand1, operand2)
- Unknown integer operation on FP exponent
- Works with shifted TMPA values
- Likely exponent bias adjustment
UOP.94B(address_reg, address_reg, offset, mode, flags)
- Memory operand validation
- Checks alignment, access permissions, or page boundaries
- Used before large memory operations (FXSAVE)
- Called twice: once for start address, once for end of region (+0x40)
- Likely validates entire region is accessible
UOP.E41(address_reg, address_reg, offset, mode, flags)
- Load effective address with validation
- Returns address in destination register
- Result is tested for alignment (FXSAVE requires 16-byte alignment)
- Combines LEA functionality with boundary checking
UOP.CCB(segment, address, mode, flags)
- Prepare memory region for bulk operations
- Called after calculating end address (base + 511)
- Possibly cache line prefetch or reserve memory region
- Prepares for sequential store operations
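The FXSAVE memory image is 512 bytes and must be 16-byte aligned. The end-address computation and alignment test described for UOP.E41 and UOP.CCB therefore amount to the checks below. `fxsave_region` is a hypothetical helper, and it does not model the +0x40 second probe seen with UOP.94B:

```python
FXSAVE_AREA_SIZE = 512  # size of the FXSAVE/FXRSTOR memory image in bytes
FXSAVE_ALIGN = 16       # required alignment of the FXSAVE operand

def fxsave_region(base: int) -> tuple[int, int]:
    """Return (first_byte, last_byte) of the FXSAVE area, or raise if misaligned."""
    if base & (FXSAVE_ALIGN - 1):
        raise ValueError("FXSAVE operand not 16-byte aligned (faults on real hardware)")
    return base, base + FXSAVE_AREA_SIZE - 1  # last byte is base + 511
```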
UOP.CCD(offset, base_address, mode, flags)
- Memory operation related to UOP.CCB
- Called at offset +0x10 into FXSAVE structure
- Possibly cache line lock or begin transaction
- Used in normal path only (not in alternate path)
Instruction Pointer Operations
UOP.1C9(CONST_0, EIP_30)
- Unknown instruction pointer operation
- Appears only in alternate FXSAVE path (part_macro_fxsave_alt)
- Called immediately after UOP.0D8(CONST_0, EIP_30)
- Both operate on EIP_30 - purpose unclear
- Possibly related to exception handling or precise state recovery
TMPB = FNSTSW?(FCC, FSW, U2.20)
- Store FPU Status Word - microcode variant
- Reads both FCC (FP Condition Codes) and FSW (FP Status Word)
- Combines them into result register
- Question mark indicates disassembler uncertainty
- More comprehensive than architectural FNSTSW instruction
- Returns composite status for FXSAVE state capture
From various analyses
UOP.121 [MSROM-612: 2995, 299E, 29A1, 29AC, 29B6, 29C1, 29D1, 37DD]
- FP exponent field replace — overwrites biased exponent of a float with a supplied value, preserving mantissa
- Used in FBSTP for range reduction and digit pipeline initialisation
- Also seen in FXTRACT normalisation (previous listing)
UOP.028 [MSROM-612: 29A9, 29BD, 29CD]
- FP fractional-part extract — computes frac(src1), using src2 as subtraction threshold (3.75 = 15/4 here)
- Used in FBSTP inner digit-peel loop to isolate remainder after each decimal digit is stripped
UOP.22C [MSROM-612: 29B0, 29C0, 29D0]
- FP add small correction — adds epsilon src2 to fractional remainder src1; src2 here is FPROM[05D] = 2.168e-19 = 2^-62
- Used in FBSTP to suppress accumulated rounding drift across 16 digit extraction iterations
UOP.221 [MSROM-612: 29A0]
- FP zero-correct or sign-strip — takes value + 0.0; likely flushes negative zero or clears sign before digit pipeline
- Used in FBSTP once during digit register initialisation
UOP.22B [MSROM-612: 29A2]
- FP scale by small integer — multiplies/scales digit register by a small integer constant (3 here)
- Used in FBSTP for initial alignment of the working value before main extraction loop
UOP.229 [MSROM-612: 2998]
- FP align for extraction — normalises/aligns src1 against a precision threshold src2 before digit pipeline begins
- Used in FBSTP once after range check passes
UOP.7E0 [MSROM-612: 2999, 29E1]
- FP integer-part extract — extracts the integer portion of a float into a working register
- Used in FBSTP to seed the digit accumulator; second use at EOM finalises FP register state
UOP.62E [MSROM-612: 29AD, 29B8, 29C2, 29D2]
- Integer nibble insert — ORs a digit nibble (src2, from INTEXTRACT.UP32) into accumulator src1
- Used in FBSTP BCD packing: paired with UOP.120 shift to build packed BCD word nibble by nibble
UOP.120 [MSROM-612: 29AC, 29B6, 29C1, 29D1, 14DE, 29F5]
- Integer left-shift accumulator — shifts integer register src1 left by src2 bits (4 = one nibble here)
- Used in FBSTP to make room for each new BCD digit before UOP.62E inserts it
- Second usage pattern (14DE, 29F5) with U2.49 suggests exception state initialisation role
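The shift-then-OR loop attributed to UOP.120 and UOP.62E builds the packed-BCD word one nibble at a time. The sketch below produces the x87 80-bit packed BCD layout: 18 digits in bytes 0-8, and the sign in bit 7 of byte 9. It assumes an integer input. For brevity, the digits come from Python string formatting rather than the FP digit-peel loop the microcode appears to use:

```python
def pack_bcd80(value: int) -> bytes:
    """Encode an integer as x87 80-bit packed BCD (18 digits plus a sign byte)."""
    if abs(value) >= 10 ** 18:
        raise OverflowError("magnitude exceeds 18 decimal digits")
    acc = 0
    for d in f"{abs(value):018d}":   # most significant digit first
        acc = (acc << 4) | int(d)    # shift accumulator one nibble, then OR in the digit
    body = acc.to_bytes(9, "little")     # 72 bits of digits, least significant byte first
    sign = 0x80 if value < 0 else 0x00   # bit 79 is the sign; bits 72-78 stay zero
    return body + bytes([sign])
```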
UOP.733 [MSROM-612: 29DA]
- Prepare 64-bit value for memory store — feeds a WUCONCAT result into the store pipeline
- Used in FBSTP just before FSTA2.DSZ64 writes 8 packed BCD bytes to memory
UOP.0D8 [MSROM-612: 0E82, 29C8, 29D8, 29EA, 14DC]
- Fault return-EIP register — saves EIP+instruction-length for exception recovery
- Used in FBSTP at every potential fault site before #IA or store operations
- Always paired with UOP.0D4
UOP.0D4 [MSROM-612: 0E84, 29EC]
- Fault type / exception class tag — specifies the class of fault being prepared
- Used in FBSTP paired with UOP.0D8 at unmasked invalid-operation exception sites
UOP.020 [MSROM-612: 29E4, 29F9]
- FP stack pop and commit — pops ST(0) off the x87 register stack; opcode selector in src1 (0B1 here)
- U2.80 = normal pop, U2.A0 = pop with exception marker
- Used in FBSTP as the final EOM operation on both normal and exception paths
UOP.CCC [MSROM-612: 29E2, 29F8, 29FA]
- Cache store commit / MESI coherence — finalises pending memory stores and ensures cache line consistency
- Used in FBSTP after every memory write path (both normal and exception) before instruction retirement
UOP.020 (FXTRACT listing) [MSROM-612: 123C, 123D, 1249, 124A]
- FP register write with stack control — writes result to ST0/ST7; src1 encodes push/write mode
- CONST.00.034 = push+write (ST7 becomes new ST0), CONST.00.012 = plain write
- Used in FXTRACT to deliver exponent and significand results onto the FP stack
Operations identified (revised)
21C2 full-listing (entry 20D6): FSIN
Shared kernel also used by FCOS (different entry, same body) and FSINCOS (via macro at 24FE).
macro_fsincos at 24FE sets bit 2 of TMP8 (value 4) before jumping to loc_20DE, signalling dual-output mode.
Labels @macro_fsincos_sin_6taylor (2119) and @macro_fsincos_cos_6taylor (213A) confirm this kernel
is shared between FSIN, FCOS and FSINCOS; octant bit selects which Taylor path executes.
FPROM sin coefficients (high precision): 01C-021
FPROM cos coefficients (high precision): 022-027
FPROM sin coefficients (large angle / table path): 028-02B
FPROM cos coefficients (large angle / table path): 02C-02F
FPROM table lookup: 070-077 sin(k/64), 078-07F cos(k/64), indexed by TMPE
Argument reduction: divide by FPROM[043]=pi/2, take frac, compare with FPROM[044]=pi/4
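A common way to implement this kind of reduction is to split x into a quadrant count and a remainder of magnitude at most pi/4. The sketch uses round-to-nearest, which is an assumption. The notes describe taking the fractional part of x/(pi/2) and comparing it against FPROM[044], so the microcode's exact convention may differ. The sketch also uses a naive subtraction, which loses accuracy for large |x|:

```python
import math

def reduce_quadrant(x: float) -> tuple[int, float]:
    """Split x = n*(pi/2) + r with |r| <= pi/4; return (n mod 4, r)."""
    n = round(x / (math.pi / 2))  # nearest multiple of pi/2
    r = x - n * (math.pi / 2)     # naive reduction; inaccurate for large |x|
    return n & 3, r               # & 3 gives n mod 4, also for negative n

def sin_via_quadrant(x: float) -> float:
    q, r = reduce_quadrant(x)
    # sin(r + q*pi/2) cycles through sin r, cos r, -sin r, -cos r
    return (math.sin(r), math.cos(r), -math.sin(r), -math.cos(r))[q]
```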
1BAC full-listing: most likely FSINCOS (parallel dual Horner variant) or the internal
computation kernel shared by FPTAN. Runs sin and cos Horner series simultaneously
in a single pass (TMP5=sin chain, TMP6=cos chain interleaved), then outputs two results.
Cannot be FSIN/FCOS alone (single-output). Not FPTAN (207A uses UOP.7EE, this uses UFPOP_7X8 twice).
The dual UFPOP_7X8 with CONST.00.034 (push+write) and CONST.00.012 (write) at EOM
delivering results to both ST0 and ST7 is the strongest indicator of FSINCOS.
207A full-listing: FPTAN
Produces sin in TMP5 or TMP6 and cos in the complementary register, then UOP.7EE combines them into tan = sin/cos.
Pushes FPROM[042] = 1.0 into ST7 via UOP.020 before EOM, matching FPTAN output contract.
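FPTAN's architectural contract is ST(0) ← tan(ST(0)) followed by a push of 1.0. The 1.0 is kept for compatibility with the 8087 and Intel 287. The toy model below works on a Python list, with the top of stack at the end. It computes tan as sin/cos to mirror the UOP.7EE description. Stack-overflow and range checks are not modeled:

```python
import math

def fptan(stack: list) -> None:
    """Toy FPTAN on a list used as the x87 stack (stack[-1] is ST(0))."""
    x = stack[-1]
    stack[-1] = math.sin(x) / math.cos(x)  # ST(0) <- tan(ST(0)), formed as sin/cos
    stack.append(1.0)                      # push 1.0; the tangent becomes ST(1)
```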
New UOPs from 21C2 full-listing
UOP.262 [MSROM-612: 21C2 full-listing: 21B6, 21C2]
- FP special-case result write for tiny-argument path — used when |ST0| exponent is below threshold (very small x)
- src1 is FPROM[042]=1.0 scaled by a sign bit from TMP8 via UOP.120; writes result directly to ST0 in-place
- For FSIN: sin(x) ≈ x for tiny x, so this path likely returns ST0 unchanged or sign-corrected
- For FCOS: cos(x) ≈ 1 for tiny x, so returns ±1.0
- Only seen at exit of the "exponent too small" branch (loc_21B0 and loc_21B9)
UOP.029 [MSROM-612: 21C2 full-listing: 20DE | 1BAC full-listing: 1B36 | 207A full-listing: 200A | FXTRACT full-listing: 122E]
- FP special-value classifier — tests src for zero, denormal, NaN, infinity; returns flag word
- AND with CONST.16.004 isolates bit 2; non-zero branches to exception/special-value path
- Consistent role confirmed across FSIN, FSINCOS, FPTAN, FXTRACT
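The classes UOP.029 is said to detect map onto the x87 80-bit encoding as shown below. The exact flag word it returns, and which class bit 2 represents, are unknown. This sketch therefore returns class names instead. `classify_ext80` is a hypothetical helper:

```python
def classify_ext80(sign_exp: int, significand: int) -> str:
    """Classify an x87 80-bit value from its 16-bit sign/exponent word and 64-bit significand."""
    exp = sign_exp & 0x7FFF                    # 15-bit biased exponent (bit 15 is the sign)
    integer_bit = significand >> 63            # explicit integer (J) bit
    fraction = significand & ((1 << 63) - 1)
    if exp == 0:
        if significand == 0:
            return "zero"
        return "pseudo-denormal" if integer_bit else "denormal"
    if exp == 0x7FFF:
        if not integer_bit:
            return "unsupported"               # pseudo-infinity / pseudo-NaN encodings
        return "infinity" if fraction == 0 else "nan"
    return "normal" if integer_bit else "unsupported"  # J=0 with nonzero exponent: unnormal
```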
UOP.0A1 [MSROM-612: 21C2 full-listing: 210C | 1BAC full-listing: 1B60 | 207A full-listing: 2034]
- FP upper-mantissa index extractor — extracts high bits of reduced angle mantissa for sin/cos table index
- Input is |x| (from FXORS src,src = clear sign); output feeds FPEXTRACT.P3 then ADD with 0x070/0x078 offset to form FPROM table address
- Consistent role confirmed across FSIN, FSINCOS, FPTAN
UOP.120 [MSROM-612: 21C2 full-listing: 20F8, 2136, 2156, 219C, 21AC, 21B6, 21BC, 21C1 | 1BAC full-listing: 1B4D | 207A full-listing: 2021]
- Four roles, distinguished by context:
  - Role A (FP ldexp-style exponent adjust): src1 is float, src2 is integer exponent delta; scales float by power of 2
  - Role B (integer nibble-shift accumulator): TMPD <<= 4; seen in FBSTP BCD packing
  - Role C (FP sign-bit apply): UOP.120(1.0, sign_bit) flips sign of 1.0; seen at 21B6/21C1 in tiny-argument path
  - Role D (exception / status signal): with U2.4A, U2.4B, U2.50 flag variants, triggers exception or status update (no FP result)
- Opcode is heavily overloaded; execution unit behaviour determined by operand types and U2 flags
UOP.124 [MSROM-612: 21C2 full-listing: 2134, 2155, 2158, 2196, 2199, 21A5, 21A9 | 1BAC full-listing: 1BA2, 1BA8, 1BB1, 1BB6, 1BF4 | 207A full-listing: 20BE]
- FP sign-mask builder from integer quadrant bits — converts integer octant/quadrant selector to FP-compatible sign mask
- Output XORed via FXORS with sin/cos result to apply correct sign without branching
- Confirmed across FSIN, FSINCOS, FPTAN; universal branchless sign-correction mechanism for trig instructions
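Branchless sign correction of this kind comes down to XORing the sign bit with a mask derived from the quadrant. The sketch works on IEEE-754 doubles as a stand-in for the 80-bit format. It uses the standard identities for sin and cos over quadrants 0 to 3. How UOP.124 actually derives its mask from the octant bits is not known:

```python
import struct

def flip_sign_if(value: float, negate: int) -> float:
    """XOR the sign bit of a double with `negate` (0 or 1), without branching."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    bits ^= (negate & 1) << 63  # sign mask: only bit 63 set when negate == 1
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def sin_sign(q: int) -> int:
    return (q >> 1) & 1        # sin(r + q*pi/2) is negated in quadrants 2 and 3

def cos_sign(q: int) -> int:
    return ((q + 1) >> 1) & 1  # cos(r + q*pi/2) is negated in quadrants 1 and 2
```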
UFPOP_7X8 [MSROM-612: 21C2 full-listing: 2139, 215C, 219E, 21AE | 1BAC full-listing: 1BA6, 1BAC, 1BB5, 1BBA]
- Dual FP operand write to register stack — delivers up to two FP values simultaneously
- In 21C2 (FSIN): U2.49, single effective output; src1 and src2 carry high/low parts of result or redundant values
- In 1BAC (FSINCOS): CONST.00.034 = push+write (new TOS), CONST.00.012 = write second slot; delivers sin and cos to stack
- U2.49 vs U2.C9/U2.80 likely selects single-value vs dual-value retirement mode
UOP.267 [MSROM-612: 21C2 full-listing: 20FA | 1BAC full-listing: 1B50 | 207A full-listing: 2024]
- FP argument recombiner after reduction — merges octant-adjusted float (src1, from UOP.120) with Taylor residual (src2, from _FSUB)
- Produces final reduced argument x' in [0, pi/4] fed into Horner evaluation
- Confirmed across FSIN, FSINCOS, FPTAN; always follows the UOP.120 + _FSUB pair in argument reduction
UOP.220 [MSROM-612: 21C2 full-listing: 20D8 | 1BAC full-listing: 1B30 | 207A full-listing: 2004]
- FP register tag/status validation — reads ST0 and updates internal tag or pipeline state without producing a result (SINK)
- Always precedes FXORS(ST0,ST0) that computes |ST0|; ensures tag-word consistency before classification
- Confirmed across FSIN, FSINCOS, FPTAN
UOP.7EE [MSROM-612: 207A full-listing: 207A, 2086]
- FP divide-and-write for tan — combines sin (src1) and cos (src2) into quotient, writes to ST0
- CONST.00.032 = write-without-push; U2.C9 signals EOM with stack semantics
- Specific to FPTAN; not seen in FSIN or FSINCOS which use UFPOP_7X8 instead
UOP.7E9 [MSROM-612: 223A]
- FP tiny-argument 2^x-1 approximation — computes 2^x-1 ≈ x*ln(2) for x near zero using src1=ln(2) and src2=ST0
- Used as EOM write on the fast-exit path in F2XM1 when exponent of ST0 is below small threshold
- Avoids full Taylor evaluation; result written directly to ST0
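For tiny x, 2^x - 1 = e^(x·ln 2) - 1 ≈ x·ln 2, with a relative error of roughly x·ln 2 / 2. The quick numerical check below uses math.expm1 as the accurate reference. The threshold the microcode uses is not known:

```python
import math

LN2 = math.log(2.0)

def f2xm1_tiny(x: float) -> float:
    """First-order shortcut 2**x - 1 ~= x*ln(2); only meaningful for x very close to 0."""
    return x * LN2

def f2xm1_reference(x: float) -> float:
    """Accurate 2**x - 1; expm1 avoids the cancellation in computing 2**x - 1 directly."""
    return math.expm1(x * LN2)

def rel_error(x: float) -> float:
    exact = f2xm1_reference(x)
    return abs(f2xm1_tiny(x) - exact) / abs(exact)
```

The error grows roughly linearly with x, which is why the shortcut is only safe below a small exponent threshold.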
UOP.0A0 [MSROM-612: 223C]
- FP argument complementor for 2^x-1 table reduction — transforms TMP0 into the residual needed for table-plus-polynomial evaluation
- Applied after FPEXTRACT.P6 extracts the table index bits; TMP0+UOP.0A0(TMP0) forms the argument for the Horner polynomial correction
- Likely computes x - round(x * 128)/128 (the fractional residual after table index extraction)
- Seen only in F2XM1 at the point of switching from full Taylor to table-lookup path
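The table-plus-residual scheme attributed to UOP.0A0 splits x into a table index k and a small residual r, so that 2^x = 2^(k/128) · 2^r. The sketch makes several assumptions. It uses a 128-entry table, matching the x·128 guess above. It uses floor rather than round so that k stays a valid index, and it restricts x to [0, 1). It also uses math.expm1 as a stand-in for the Horner correction polynomial:

```python
import math

TABLE_BITS = 7  # assumption: 128 entries, matching the x*128 guess for UOP.0A0
TABLE = [2.0 ** (k / 2 ** TABLE_BITS) for k in range(2 ** TABLE_BITS)]  # 2**(k/128)

def f2xm1_table(x: float) -> float:
    """2**x - 1 for 0 <= x < 1 via 2**x = 2**(k/128) * 2**r, with 0 <= r < 1/128."""
    if not 0.0 <= x < 1.0:
        raise ValueError("sketch only handles 0 <= x < 1")
    k = math.floor(x * 2 ** TABLE_BITS)   # table index from the top fraction bits of x
    r = x - k / 2 ** TABLE_BITS           # residual left after removing the index bits
    poly = math.expm1(r * math.log(2.0))  # stand-in for the polynomial giving 2**r - 1
    return TABLE[k] * (1.0 + poly) - 1.0
```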
UOP.0E3 [MSROM-612: 24C0]
- FP packed-BCD 64-bit preconvert — transforms a raw 64-bit integer loaded from memory (8 BCD-packed bytes) into an FP working value for the digit reconstruction pipeline
- Feeds directly from FLOAD.DSZ64 result into the UOP.028/UOP.22D digit-peel loop
- Inverse role to UOP.733 in FBSTP; FBSTP packs FP→BCD, this converts raw BCD load→FP
- Seen only in FBLD
UOP.22D [MSROM-612: 24CC, 24D2, 24DE, 24E8]
- FP BCD digit accumulator — merges a scaled digit value (src2) into the running FP accumulator (src1); inverse of the UOP.028+UOP.22C peel step used in FBSTP
- Pattern: UOP.028(TMP1, threshold) → frac; _FMUL(frac, scale) → digit; UOP.22D(TMP1, digit) → updated accumulator
- Called four times in FBLD, once per digit-group loaded from the 10-byte BCD memory format
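The x87 packed BCD format read by FBLD holds 18 digits in bytes 0-8, two per byte with the least significant first, and the sign in bit 7 of byte 9. Below is a plain integer decoder for that layout. It does not reproduce the four digit-group FP accumulation seen with UOP.22D:

```python
def unpack_bcd80(data: bytes) -> int:
    """Decode x87 80-bit packed BCD: 18 digits in bytes 0-8, sign in bit 7 of byte 9."""
    if len(data) != 10:
        raise ValueError("packed BCD operand must be 10 bytes")
    value = 0
    for byte in reversed(data[:9]):  # the most significant digit pair is in byte 8
        hi, lo = byte >> 4, byte & 0x0F
        if hi > 9 or lo > 9:
            raise ValueError("invalid BCD digit")  # real FBLD does not check; result undefined
        value = value * 100 + hi * 10 + lo        # fold in two decimal digits per byte
    return -value if data[9] & 0x80 else value
```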
UOP.262 [MSROM-612: 21B6, 21C2] (updated from prior listing)
- FP tiny-argument result write for FSIN/FCOS: writes ±1.0 (for FCOS) or passes ST0 unchanged (for FSIN) when the input exponent is below threshold; src1 = 1.0 sign-adjusted via UOP.120, src2 = ST0
- Both FSIN and FCOS share this shortcut; the octant bit in TMP8 already encodes which function's result is appropriate
- Result written as new ST0 in-place (no stack push/pop)
Pentium M IRET microcode: Unknown UOP identification
======================================================
Based on macro_iret-msrom-6d8.asm
Cross-referenced against SDM Vol.2/3 and p6microcode documentation.
Certainty levels: HIGH / MEDIUM / LOW
All occurrences noted. Arguments as they appear in the listing.
Subroutine labels (sub_xxx / tail_xxx) are shared routines, not IRET-specific.
--------------------------------------------------------------------------------
UOP.0D4(CONST.0, CONST.0)
Occurrence:
  UROM_2E68 (same-privilege return path, immediately after UOP.0D8)
Context: appears only once, right after UOP.0D8 committed EIP, before the
RDSEGFLD/ADD sequence. Both args are zero constants. No result register.
No segment operand.
* Likely: "finalize instruction-fetch redirect", a pipeline signal that the
  branch-to-new-EIP (committed by the preceding UOP.0D8) is complete and the
  front-end may begin fetching. Alternatively a no-op padding slot or a
  "cancel speculative fetch" barrier.
* Certainty: LOW. Single occurrence, no result, very sparse context.
--------------------------------------------------------------------------------
UOP.0D8(arg0, arg1)
Occurrences:
  UROM_20B8  EOM.Fl2 UOP.0D8(CONST.0, TMP3)   ; task-switch exit
  UROM_2E66  UOP.0D8(TMP0, TMP3)              ; same-priv return
Context: TMP3 = new EIP (loaded from stack or TSS). TMP0 = result of UOP.62A
(committed CS descriptor state). Always appears at or very near EOM. In the
task-switch path the first arg is 0; in the same-priv path it is the CS
descriptor handle.
* Likely: "redirect instruction fetch to new CS:EIP", the microcode
  equivalent of a taken branch to the target of IRET. Tells the front-end
  decode unit to start fetching at TMP3 within the segment described by TMP0
  (or default CS when arg0=0). Equivalent in effect to the P6 "BRTAKEN" /
  branch-redirect uop.
* Certainty: HIGH. Always at EOM, always has new EIP, always follows CS
  descriptor commit.
--------------------------------------------------------------------------------
UOP.120(CONST.16.00D, CONST.16.00D)
Occurrences:
  UROM_352C  TMP7 = UOP.120(CONST.16.00D, CONST.16.00D)   ; PAE PDPTE fault path
  UROM_35BC  TMP6 = UOP.120(CONST.16.00D, CONST.16.00D)   ; general GP fault path
Context: 0x0D = 13 = #GP exception vector. Both arguments are identical.
The result is passed as first arg to SIGEVENT(result, 0xC1), which raises
#GP(0). No other values of the constant are seen.
* Likely: "prepare exception descriptor / build fault frame info": takes the
  exception vector number (0x0D = #GP) and constructs an internal exception
  descriptor or code that SIGEVENT then delivers. Analogous to an internal
  "raise exception N" uop seen in other P6-family fault sequences.
* Certainty: HIGH. Always both args = 0x0D, always feeds directly into a
  SIGEVENT that raises #GP, appears in all fault exit paths.
--------------------------------------------------------------------------------
UOP.131(CONST.04.004, CONST.04.004, LINSEG)
Occurrence:
  UROM_32DD (sub_tlbflush_and_a20, immediately after STRD(0,0))
Context: sub_tlbflush_and_a20 is called after a CR3 change during TSS task
switch. Preceded by STRD(0,0). Has a LINSEG segment qualifier. Followed by
MOVETOCREG(0E.154, 0x20) and SIGEVENT(0xCC).
* Likely: "TLB flush / INVLPG-all", the uop that physically invalidates all
  TLB entries when CR3 changes. The LINSEG operand may indicate the linear
  address space being flushed (all = base 0). The STRD+UOP.131 pair matches
  the P6 pattern for serializing TLB invalidation.
* Certainty: HIGH. Location (after CR3 write), LINSEG operand, and the
  SIGEVENT(0xCC) "TLB flush complete" signal all strongly confirm this.
--------------------------------------------------------------------------------
UOP.134(CONST.6, CONST.6)
Occurrences:
  UROM_3105  UOP.134(CONST.6, CONST.6)          ; before XLOAD of TSS descriptor
  UROM_20A8  UOP.134(CONST.6, CONST.6, U2.08)   ; serialization in task-switch exit
Context: file comment says "Always has preceding STRD with imm #0; almost
always called with CONST.6, CONST.6". No result register. Arguments always
the same small constant. Appears where strict ordering is needed (before an
atomic locked load of GDT, and in the pipeline-drain loop before SIGEVENT).
* Likely: "pipeline serialization / memory fence", equivalent to MFENCE or
  the internal P6 serialization barrier. The preceding STRD(0,0) drains the
  store buffer; UOP.134 then ensures all prior loads/stores are globally
  visible before the next memory access. In the GDT context this implements
  the locked bus cycle needed for the busy-bit RMW.
* Certainty: HIGH. Context (pre-atomic-access, post-store-drain, task-switch
  boundary) is unambiguous. Matches known P6 serialization patterns.
--------------------------------------------------------------------------------
UOP.1CA(CONST.0, CONST.0)
Occurrences:
  UROM_1066  UOP.1CA(CONST.0, CONST.0)   ; after spin-wait loop in TSS load
  UROM_1166  UOP.1CA(CONST.0, CONST.0)   ; after TSS-switch mode check
  UROM_2779  UOP.1CA(CONST.0, CONST.0)   ; after spin-wait loop in sub_tss_save
Context: always immediately follows a spin-wait loop (U_JCC.NT.Z to self,
polling an ALIAS flag) or a conditional branch that exits such a loop.
No result. Args always (0, 0). No segment operand.
* Likely: "acquire / mark ready after synchronization": signals to the
  microsequencer that the spin condition has been resolved and execution may
  continue. Could be a microcode-level barrier acknowledgment, or a signal to
  clear the ALIAS latch that was being polled.
* Certainty: MEDIUM. The pattern is consistent (always post-spin), but the
  exact internal mechanism is unclear.
--------------------------------------------------------------------------------
UOP.201(CONST.0, TMP7)
Occurrence:
  UROM_1C0D
Context: TMP7 = merged new EFLAGS (computed from stack value + current flags)
  TMP6 = UOP.201(CONST.0, TMP7)
  SystemFlags = MOVE(CONST.0, TMP7)   ; commit EFLAGS
  TMP6 = BTEST(TMP6, bit 17)          ; test VM flag
  JC → @macro_iret_v86_return
The UOP.201 result is used only to test VM (bit 17 of EFLAGS). The actual
EFLAGS write uses TMP7 directly, not TMP6.
* Likely: "read VM flag from EFLAGS image / normalize flags for VM test":
  may extract or re-format the flags word so that the VM bit is accessible at
  the expected position in TMP6. Alternatively it reads the current
  SystemFlags VM state and XORs or merges it with the new value to detect the
  0→1 transition.
* Certainty: LOW-MEDIUM. Single occurrence, result used only for the VM bit;
  could also be "extract high EFLAGS word" or a flags-normalization step.
--------------------------------------------------------------------------------
UOP.202(TMP5, CONSTROM.0AD) [0x00254DD5]
Occurrence:
  UROM_1BF5
Context: first step of the EFLAGS permission-mask computation:
  TMP5 = MOVE(CONSTROM.13D)            ; 0x00254FD5 = base allowed-bits mask
  TMP5 = UOP.202(TMP5, CONSTROM.0AD)   ; 0x00254DD5 = second mask
  TMP5 = UOP.203(CONST.14.13E, TMP5)
  TMP5 = UOP.209(CONST.14.0BD, TMP5)
The two ROM constants differ only in bit 9 (0x200; 0xFD5 vs 0xDD5), which is
EFLAGS.IF. Both encode which EFLAGS bits the instruction is permitted to
change.
* Likely: "select EFLAGS write-mask based on CPL/IOPL": chooses between two
  flag permission masks (one for CPL=0, one for CPL>0) or ORs/ANDs them to
  produce the initial mask. The single-bit difference between the two masks
  is the IF bit, matching the IF permission rule (CPL vs IOPL comparison in
  the SDM).
* Certainty: MEDIUM. Context and mask values fit EFLAGS permission logic, but
  the exact selection criterion is inferred.
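Decoding the two masks against the architectural EFLAGS bit layout shows what the permission chain starts from. `decode_eflags` and its name table are a hypothetical helper. Only the bit positions are architectural:

```python
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF", 10: "DF",
    11: "OF", 12: "IOPL0", 13: "IOPL1", 14: "NT", 16: "RF", 17: "VM",
    18: "AC", 19: "VIF", 20: "VIP", 21: "ID",
}

def decode_eflags(mask: int) -> list:
    """List the named EFLAGS bits set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if (mask >> bit) & 1]

base_mask = decode_eflags(0x00254FD5)
difference = decode_eflags(0x00254FD5 ^ 0x00254DD5)  # bits present in only one mask
```

Neither mask includes the IOPL field, VM, VIF or VIP. The same helper decodes the 0x00180000 constant used with UOP.20A to VIF and VIP.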
--------------------------------------------------------------------------------
UOP.203(CONST.14.13E, TMP5)
Occurrence:
  UROM_1BF6
Context: second step of the EFLAGS mask sequence (see UOP.202 above).
CONST.14.13E is a CONST-table index (different address space from
CONST.16.xxx). TMP5 holds the mask from UOP.202.
* Likely: "apply IOPL-sensitive filter to EFLAGS mask": reads an internal
  state value (current IOPL or CPL comparison result) from the CONST.14
  table and uses it to further restrict or expand the writable-bits mask,
  e.g. clearing the IF bit in the mask if CPL > IOPL.
* Certainty: MEDIUM. Fits the SDM EFLAGS protection rules; the CONST.14.13E
  address suggests a mode/privilege table lookup.
--------------------------------------------------------------------------------
UOP.204(arg0, arg1)
Occurrences:
  UROM_1BE0  TMP6 = UOP.204(CONST.0E.004, TMP6)           ; entry: read internal size state
  UROM_1BF0  TMP6 = UOP.204(TMP7, CONST.0)                ; test EFLAGS image for TF
  UROM_1BFA  TMP5 = UOP.204(CONST.14.125, TMP5)           ; flags mask refinement
  UROM_1BFD  TMPB = UOP.204(TMP8, TMP7)                   ; VM flag combination
  UROM_1C08  TMP6 = UOP.204(CONST.14.109, CONST.14.109)   ; check VIF/VIP/AC mask
Context is heterogeneous. With a CONST.0E.xxx first arg it resembles a CR
read (same prefix as MOVEFROMCREG calls). With CONST.14.xxx it appears in
flag-manipulation sequences. With runtime args (TMP7/TMP8) it processes flag
images.
* Likely: "read internal microcode state register / apply flag table entry",
  a general-purpose state-access uop with three forms:
  - CONST.0E.xxx form: read an internal CR-like register by index (similar to
    MOVEFROMCREG but for a different address space).
  - CONST.14.xxx form: read from a per-mode ROM table (e.g., flag masks keyed
    by IOPL or CPL).
  - Dynamic arg form: compute some function of two flag images (e.g., detect
    TF or VM bits after normalization).
  The common thread is producing a value derived from machine state that
  standard ALU uops cannot access directly.
* Certainty: LOW-MEDIUM. Too many distinct usage patterns to assign a single
  precise function; the "internal state register read" interpretation covers
  all observed cases but is speculative.
--------------------------------------------------------------------------------
UOP.208(TMPA, CONST.0)
Occurrence:
  UROM_1BE6
Context:
  TMP6 = UOP.208(TMPA, CONST.0)
  TMP6 = BTEST(TMP6, CONST.16.00E)   ; test bit 14 = NT flag
  JC → @macro_iret_tss_link
TMPA likely holds the current (pre-IRET) EFLAGS image in internal form.
The second arg is always 0.
* Likely: "read / normalize current EFLAGS from internal representation":
  TMPA is a microarchitectural flags alias; UOP.208 converts it to a linear
  32-bit EFLAGS image where individual bits match the architectural layout.
  The result is then BTEST'd for NT (bit 14).
* Certainty: MEDIUM. Single occurrence; the pattern (normalize → BTEST → JCC)
  matches what you would expect for "read EFLAGS.NT".
--------------------------------------------------------------------------------
UOP.209(CONST.14.0BD, TMP5)
Occurrence:
  UROM_1BF8
Context: third step of the EFLAGS mask sequence (see UOP.202, UOP.203).
CONST.14.0BD is another CONST-table index.
* Likely: "apply VM-mode or V86-mode filter to EFLAGS mask": a third
  conditioning step on the writable-bits mask, reading a different mode bit
  (possibly the current VM flag) from the CONST.14 table to further restrict
  which EFLAGS bits this IRET is allowed to modify.
* Certainty: MEDIUM. Same reasoning as UOP.203; position in the
  mask-computation chain and the CONST.14 address space are consistent.
--------------------------------------------------------------------------------
UOP.20A(CONST.14.109, CONST.14.109)
Occurrence:
  UROM_20B2 (@macro_iret_exit)
Context:
  SIGEVENT(TMP3, 0xE7)                         ; signal instruction complete
  TMP0 = UOP.20A(CONST.14.109, CONST.14.109)
  TMP0 = AND(SystemFlags, TMP0)
  TMP0 = SUB(TMP0, 0x00180000)                 ; 0x00180000 = VIF|VIP (bits 19 and 20)
  JZ → @macro_iret_fault_gp
Identical argument pattern to UOP.204 at UROM_1C08 (both use CONST.14.109
twice), and almost identical surrounding code. The difference is that this
occurs after the main SIGEVENT, as a final sanity check on the committed
SystemFlags.
* Likely: "read EFLAGS/permission mask from CONST.14 table, variant B":
  functionally equivalent to UOP.204 with the same index but possibly a
  different internal timing or pipeline stage. May read the committed
  SystemFlags permission state to verify the VIF/VIP bits are legal for the
  current mode after the IRET completes.
* Certainty: MEDIUM. Nearly identical to UOP.204(CONST.14.109, ...); the
  distinction between the two opcodes in this context is unclear.
--------------------------------------------------------------------------------
UOP.263(TMP0_gdtr, TMP1_ldtr)
Occurrences (all identical pattern):
  UROM_1115, 1122, 112D, 113A, 1145   (segment descriptor validation loop)
  UROM_114E                           (CS descriptor validation)
  UROM_2E5A                           (same-privilege return CS validation)
Context: always immediately preceded by two LOADs:
  TMP0 = LOAD.DSZ64(selector, GDTR)   ; 64-bit GDT entry
  TMP1 = LOAD.DSZ64(selector, LDTR)   ; 64-bit LDT entry
  TMP0 = UOP.263(TMP0, TMP1)          ; select one
  TMP1 = INTEXTRACT.HI32(TMP0, ...)
  TMPB = USEGOP4(TMP1, TMP0, type_check, ...)
The selector's TI (Table Indicator) bit determines whether the GDT or LDT
entry is the correct one. Both are pre-loaded speculatively.
* Likely: "select GDT vs LDT descriptor entry by TI bit": examines the TI
  bit of the segment selector and returns either TMP0 (GDT path, TI=0) or
  TMP1 (LDT path, TI=1). Effectively a mux/select on the two 64-bit
  descriptor entries. The hardware loads both speculatively to avoid a
  branch.
* Certainty: HIGH. The GDT+LDT double-load pattern is unambiguous, and this
  is the only step between the two loads and the descriptor validation.
--------------------------------------------------------------------------------
UOP.62A(CONST.0E.102, TMPB)
Occurrences:
  UROM_1159  UOP.62A(CONST.0E.102, TMPB)           ; TSS task switch, commit CS
  UROM_209D  UOP.62A(CONST.0E.102, CONST.0E.102)   ; priv-change path
  UROM_2CA9  UOP.62A(CONST.0E.102, TMPB)           ; V86 return, commit CS
  UROM_2E65  TMP0 = UOP.62A(CONST.0E.102, TMPB)    ; same-priv return
Context: TMPB always holds the result of a USEGOP4 call that validated and
built a segment descriptor handle. Always used with CS (or the CS-equivalent
context). CONST.0E.102 appears to be an internal hardware port index. In the
priv-change path both args are the same constant (self-referential commit).
* Likely: "commit segment descriptor into hardware CS descriptor cache":
  finalizes the new CS descriptor (limit, base, access rights, etc.) computed
  by the USEGOP chain and makes it visible to the instruction-fetch unit and
  memory-access hardware. The 0E.102 port may be the "CS descriptor cache
  write port".
* Certainty: HIGH. Always follows USEGOP4 on CS, always immediately before
  UOP.0D8 (EIP redirect) or EOM. The segment-commit role is clear.
--------------------------------------------------------------------------------
UOP.CC1(args..., segment)
Occurrences:
  UROM_276E  UOP.CC1(CONST.6, TMP8, SEG_02)     ; sub_tss_save: setup write ptr
  UROM_115D  UOP.CC1(CONST.6, TMP3, CS)         ; TSS switch, CS commit signal
  UROM_2E61  UOP.CC1(TMPB, TMP3, CONST.6, CS)   ; same-priv return, CS signal
Context: appears at segment-register-update boundaries. At 276E it sets up
the write pointer for TSS body stores (TMP8 is a stride/offset value). At
115D and 2E61 it appears after a new CS descriptor has been validated,
signaling the load. The segment argument varies (SEG_02 for TSS access, CS
for instruction-fetch context).
* Likely: "notify segment unit of pending segment register load / set
  segment access cursor": a pipeline signal to the segment hardware that a
  segment register is about to be updated, allowing it to pre-arm the
  descriptor cache or set an internal write/read pointer. In the TSS-body
  context it may initialize the sequential-access pointer for the TSS store
  sequence.
* Certainty: MEDIUM. The segment operand and position at segment-update
  boundaries are clear; the exact internal mechanism is inferred.
--------------------------------------------------------------------------------
UOP.CC9(CONST.6, TMP8, SEG_02)
Occurrences:
  UROM_27D5  UOP.CC9(CONST.6, TMP8, SEG_02)          ; sub_tss_save epilogue
  UROM_27D9  UOP.CC9(CONST.6, TMP8, SEG_02, U3.40)   ; sub_tss_save epilogue
Context: both appear in the sub_tss_save epilogue (loc_27CE), right before
the indirect jump back to the TRANSPORTUIP caller. TMP8 is a stride value
(2 or 4 depending on TSS type). Both have SEG_02 (the new-task TSS segment).
The U3.40 flag on the second suggests it has pipeline-flow significance.
* Likely: "finalize / commit segment sequential-access pointer", the
  counterpart to UOP.CC1, signaling that the sequential TSS-body access (the
  store loop) is complete and the segment unit should tear down the cursor.
  Alternatively: "arm segment for read access" to prepare the new-task TSS
  read that follows. The pair of CC9 calls (write then read contexts?) may
  correspond to switching the TSS segment from write to read mode.
* Certainty: MEDIUM. The context (TSS save completion, before indirect
  return) is consistent, but the CC9 vs CC1 distinction is inferred from
  position.
--------------------------------------------------------------------------------
UOP.F0B(CONST.6, TMP7, GDTR)
Occurrence:
  UROM_3102 (@tail_tss_load_continue, TSS busy-bit clear)
Context:
  TMP7 = selector_index & 0x1F8      ; byte offset of descriptor in GDT
  UOP.F0B(CONST.6, TMP7, GDTR)       ; ← this
  STRD(0, 0)                         ; store-buffer drain
  UOP.134(CONST.6, CONST.6)          ; serialization
  TMP0 = XLOAD.DSZ64.1(...)          ; atomic locked load of GDT descriptor
  ... BTR busy bit ...
  STA (write back modified descriptor)
The sequence implements a locked read-modify-write on the GDT descriptor.
UOP.F0B fires before the serialization and the XLOAD.
* Likely: "acquire bus lock / assert LOCK# for GDT descriptor access":
  asserts the external LOCK# signal (or internal lock equivalent) on the GDT
  entry at the byte offset in TMP7, ensuring atomicity of the subsequent
  XLOAD+modify+STA cycle. The GDTR segment qualifier specifies that the
  target is the GDT. This matches the SDM requirement that the TSS busy-bit
  write be atomic.
* Certainty: MEDIUM-HIGH. The locked-RMW context (XLOAD.1 flag = locked,
  STA.1 = locked store) and the pre-lock position are consistent with a
  bus-lock acquire; the GDTR operand is unambiguous.
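The busy bit is bit 1 of the TSS descriptor type field, which makes it bit 41 of the 64-bit descriptor (type 1011b = busy, 1001b = available). The sketch below models the locked read-modify-write described above. A Python lock stands in for the bus lock. It shows the architectural effect only and makes no claim about how UOP.F0B implements it. The `offset` parameter plays the role of TMP7:

```python
import threading

TSS_BUSY_BIT = 41  # type field occupies bits 40-43; busy flag is bit 1 of the type

gdt_lock = threading.Lock()  # stand-in for the bus lock around the RMW

def descriptor_type(descriptor: int) -> int:
    """Return the 4-bit type field of a 64-bit segment descriptor."""
    return (descriptor >> 40) & 0xF

def clear_busy_in_gdt(gdt: bytearray, offset: int) -> None:
    """Clear the TSS busy bit of the descriptor at byte `offset` in a GDT image."""
    with gdt_lock:
        desc = int.from_bytes(gdt[offset:offset + 8], "little")  # locked 64-bit load
        desc &= ~(1 << TSS_BUSY_BIT)                              # BTR on the busy bit
        gdt[offset:offset + 8] = desc.to_bytes(8, "little")       # locked store back
```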
The author is not affiliated with, endorsed by, or sponsored by Intel Corporation or its affiliates. All trademarks, including but not limited to Intel, Pentium, and any other registered or unregistered marks mentioned herein, are the property of their respective owners. Their use in this context is solely for descriptive and informational purposes and constitutes nominative fair use under applicable trademark laws.