
CÔNG NGHỆ ĐẢO MÃ (REVERSE-ENGINEERING TECHNOLOGY)

ADVANCED REVERSING AND PROGRAMMING

Posts tagged with "Advanced reversing"

Reverse Engineering Association

Welcome, visitor
http://www.jbfonline.net/sndtuts/
There are already a lot of cracking sites. So why another one? This site focuses on spreading 'reversing knowledge', so... no cracks here! If you are looking for cracks or serials, you have landed on the wrong site :p. This site is 100% legal under European law. Our main goal is to offer deeper insight into win32 assembly. You can learn it here! "Give a man a crack, and he'll be hungry again tomorrow; teach him how to crack, and he'll never be hungry again."
+ORC Read and learn. Ignorance is bliss, but knowledge is power...
http://biw.rult.at/index.php
__________________
Over the years there have been a few great groups directed towards reverse engineering. They have come and gone, and new ones have risen up. A few stood out from the crowd, but always in time they took their place among the rest, except with their sites being mirrored and their knowledge they brought to the people stored for a time.
When we came together as a group, RET was nothing more than an idea, but over the last while, we have become stronger and stronger as a group and are well on our way to producing a great amount of very useful tools and information.
Our goal is to always move one step further than the rest of the knowledge out there so that we might increase our understanding and the understanding of people around us.
Welcome to the Reverse Engineering Team.
You are free to gather whatever information suits your fancy from us and contribute what you like to the discussions on our boards. If you have an idea, bring it forth and we'll discuss it. Have fun, and never stop learning.
~The Reverse Engineering Team ~
http://www.reteam.org/
Welcome to the new BiW reversing site!
News: Soon this place will be the new BiW Reversing site.
We hope you will like it, and you won't hesitate to contribute to the site.
All ideas, gfx art, help is welcome, mail detten (at) reversing (dot) be if you are interested.
We expect that the site will be fully functional by the end of February. Stay tuned :wink:
Old forum : http://biw.rult.at/vbb/upload/index.php?
New forum : http://www.reversing.be/
Questions related to Olly can be posted here: http://ollydbg.win32asmcommunity.net/index.php?
This site has some good tutorials:
http://www.coderz.gr/tutorials.php?cat=Nothing
Great Forum :
http://forum.exetools.com/
http://www.exetools.com/
Big Team :
http://cracking.accessroot.com/
Unpacking Site :
http://unpack.cjb.net/
http://myhome.hanafos.com/~comgod/packing%20&%20unpacking.htm
A Chinese site:
http://www.pediy.com/
Olly Script :
http://ollyscript.apsvans.com/
Welcome to the RCE Messageboard's Regroupment :
http://www.woodmann.com/forum/index.php
Quote:
Welcome to Programmer's Tools, a web site dedicated to all kinds of tools and utilities for the true WinBloze programmer :wink:. You'll find here a collection of packers, crypters (protectors), unpackers, decrypters (unprotectors), compilers, decompilers, debuggers, patchers, docs, some fun stuff and other utils. Here are also some related links. The purpose of this archive section is to keep protools up to date while still archiving older stuff.
I'm always 'open' for any suggestions, comments, additions to the content of this site, the web design, etc ... Also feel free to add link to this site :smile:.
I'm not responsible for the content (probably pr0n) of http://www.programmerstools.com; it used to be the Programmer's Tools domain, but someone took it over :frown:.
http://protools.reverse-engineering.net/
PEid Plugin :
http://www.secretashell.com/BobSoft/
http://go.to/Hairy_Bits
http://mark0.net/plugins-peid-epscan-e.html
-------------
A link with a lot of essential tools (but in Russian):
http://www.wasm.ru/toollist.php?list=6
http://www.cracklab.ru/download.php
--------------
Quote:
This page contains binaries, sources and information for coders.
A coder is someone who writes programs in assembly language, sometimes in C.
Programs presented here are mostly written for x86 CPUs and Microsoft operating systems.
http://www.anticracking.sk/EliCZ/
-----------------
Quote:
Here you'll find information on how to program Windows using Assembly Language. I love assembly language and have programmed in asm since DOS days. When Windows "killed" DOS, I thought assembly language died with it. I could not be further from the truth! Assembly programming on Win32 platform is quite possible and easy.
http://win32asm.cjb.net/
------------------
Quote:
The Universitas Virtualis Research Project team welcomes you.
So you think you're good enough to break the protection?
You want to see how good you are in reversing applications?
And you want to do it the legal way?
Then you're at the right place!
http://www.crackmes.de/
------------------
Game Hacking University :
http://www.ghu.as.ro/
__________________
Some Tutorials :
http://www.xs4all.nl/~anvile/n2c/tuts_1.html
http://www.darkfall.demon.co.uk/fallen/crack/tutor.htm
http://www.hnc3k.com/tutorialz.htm
--------------------------------------------------------
Group & Tutorial Sites
http://zor.org/zornews - Zor's Crack News ...whats going on in scene?
http://navig8.to/mp2k - MP2k Group
http://navig8.to/mp2kforum - MP2k Board [Request Board]
http://zor.org/tsrhclub - TSRh Board [Request Board]
http://board.anticrack.de - English Board with nice members
http://www.exetools.com/forum - Board related to reversing tools
http://board.win32asmcommunity.net - ASM Coding related Board
http://www.tnp.redi.tk - German Reversing Board
http://tsrh.crackz.ws - Top Reversing Group
http://www.sndteam.da.ru - Top Reversing Group
http://cracking.accessroot.com - New group
http://zor.org/krobar - a lot of cracking tutorials
http://cip.myz.info - Good tutorials in German
http://biw-reversing.cjb.net - a lot of good tutorials
http://www.reteam.org - Reversing Team. Nice tools and tuts
http://www.crackmes.de - Big Crackme Site
http://cryptokg.cjb.net - Crypto Crackmes only
http://www.gamecopyworld.com - Game Cracks
http://www.megagames.com - Game Cracks
http://chiptune.com - Nice chiptunes
http://protools.anticrack.de - a good collection of reversing tools
http://unpack.no-ip.com - The UNPACKiNG GODS
Cracker's Sites
http://pumqara.cjb.net - Pumqara's site. Unpacking,Protection,Coding...
http://www.x3chun.ce.ro - Author of CryptoSearcher
http://www.witeg.cad.pl - Good crypto source codes,a lot of solved crackmes
http://kickme.to/bratalarm - Author of Generic UPX Unpacker
http://krio.cjb.net - a Bulgarian Reverser
http://www.cryptocracking.cjb.net - bLaCk-eye's site
http://tuts4you.com - Teddy Roger's site
__________________
Introducing some sites for ebooks and English tutorials:
1. Assembler books :
http://rapidshare.de/files-en/288088/Assembler.pdf.html
http://rapidshare.de/files-en/288095/pcasm-book.pdf.html
ftp://217.16.26.42/pub/data/unsorted/2004.06.22/english_books
Log in as anonymous; any password will do.
3. http://www.glib.hcmuns.edu.vn/ebook/computer.html#C
4. http://babybluevn.co.nr
5. http://www.oreillynet.com
6. http://ftp.cdut.edu.cn/pub3/uncate_doc
7. http://www.netmug.org/~oscar/pdf/
8. http://www.netmug.org/~oscar/pdf/
9. http://www.personal.psu.edu/users/j/t/jtm244/White%20Papers/
10. http://fivedots.coe.psu.ac.th/Software.coe/240-371/ebook/
11. http://rfpooner.homeip.net:81/files/ebooks/
12. http://sysadmin.oreilly.com
13. http://www.cs.buffalo.edu/~milun/unix.programming.html Programmers reading
14. http://www.cs.monash.edu.au/~alanf/se_proj97/ Programming Pearls 2nd edition
15. http://www.cprogramming.com/tutorial.html
16. http://www.cs.virginia.edu/c++programdesign/slides/
17. http://www.webdesigns1.com/perl/
18. http://www.ictp.trieste.it/texi/perl/
19. http://www.daimi.au.dk/~oracle/sql/index.html Visual Basic stuff
20. http://www.ipl.org/reading/books/
21. http://www.astalavista.com
This is the biggest website for learning:
www.msdn.microsoft.com - MSDN Homepage.
Basic functions:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/windowing/dialogboxes/dialogboxreference/dialogboxfunctions/getdlgitemtext.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winprog/winprog/the_entry_point_function.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_foxhelp9/html/2141eed3-b309-4cd6-9d02-5bb03fe6898e.asp
http://www.absolutelock.de/cgi-bin/ultimatebb.cgi -- forum Unpacking Gods only
This is the best asm book I have read:
http://webster.cs.ucr.edu/AoA/DOS/pdf/
E-book . pdf only
http://www.nerd-star.com/books/

CHAPTER 16+17+18 8086 EMULATION+MIXING 16-BIT AND 32-BIT CODE+IA-32 COMPATIBILITY

CHAPTER 16 8086 EMULATION
IA-32 processors (beginning with the Intel386 processor) provide two ways to execute new or
legacy programs that are assembled and/or compiled to run on an Intel 8086 processor:
• Real-address mode.
• Virtual-8086 mode.
Figure 2-3 shows the relationship of these operating modes to protected mode and system
management mode (SMM).
When the processor is powered up or reset, it is placed in the real-address mode. This operating
mode almost exactly duplicates the execution environment of the Intel 8086 processor, with
some extensions. Virtually any program assembled and/or compiled to run on an Intel 8086
processor will run on an IA-32 processor in this mode.
When running in protected mode, the processor can be switched to virtual-8086 mode to run
8086 programs. This mode also duplicates the execution environment of the Intel 8086
processor, with extensions. In virtual-8086 mode, an 8086 program runs as a separate
protected-mode task. Legacy 8086 programs are thus able to run under an operating system (such as
Microsoft Windows*) that takes advantage of protected mode and to use protected-mode facilities,
such as the protected-mode interrupt- and exception-handling facilities. Protected-mode
multitasking permits multiple virtual-8086 mode tasks (with each task running a separate 8086
program) to be run on the processor along with other non-virtual-8086 mode tasks.
This section describes both the basic real-address mode execution environment and the
virtual-8086-mode execution environment, available on the IA-32 processors beginning with the
Intel386 processor.
16.1 REAL-ADDRESS MODE
The IA-32 architecture’s real-address mode runs programs written for the Intel 8086, Intel 8088,
Intel 80186, and Intel 80188 processors, or for the real-address mode of the Intel 286, Intel386,
Intel486, Pentium, P6 family, Pentium 4, and Intel Xeon processors.
The execution environment of the processor in real-address mode is designed to duplicate the
execution environment of the Intel 8086 processor. To an 8086 program, a processor operating
in real-address mode behaves like a high-speed 8086 processor. The principal features of this
architecture are defined in Chapter 3, Basic Execution Environment, of the IA-32 Intel Architecture
Software Developer’s Manual, Volume 1.
The following is a summary of the core features of the real-address mode execution environment
as would be seen by a program written for the 8086:
• The processor supports a nominal 1-MByte physical address space (see Section 16.1.1,
“Address Translation in Real-Address Mode”, for specific details). This address space is
divided into segments, each of which can be up to 64 KBytes in length. The base of a
segment is specified with a 16-bit segment selector, which is shifted left by 4 bits to form a
20-bit offset from address 0 in the address space. An operand within a segment is
addressed with a 16-bit offset from the base of the segment. A physical address is thus
formed by adding the offset to the 20-bit segment base (see Section 16.1.1, “Address
Translation in Real-Address Mode”).
• All operands in “native 8086 code” are 8-bit or 16-bit values. (Operand size override
prefixes can be used to access 32-bit operands.)
• Eight 16-bit general-purpose registers are provided: AX, BX, CX, DX, SP, BP, SI, and DI.
The extended 32 bit registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI) are
accessible to programs that explicitly perform a size override operation.
• Four segment registers are provided: CS, DS, SS, and ES. (The FS and GS registers are
accessible to programs that explicitly access them.) The CS register contains the segment
selector for the code segment; the DS and ES registers contain segment selectors for data
segments; and the SS register contains the segment selector for the stack segment.
• The 8086 16-bit instruction pointer (IP) is mapped to the lower 16-bits of the EIP register.
Note this register is a 32-bit register and unintentional address wrapping may occur.
• The 16-bit FLAGS register contains status and control flags. (This register is mapped to
the 16 least significant bits of the 32-bit EFLAGS register.)
• All of the Intel 8086 instructions are supported (see Section 16.1.3, “Instructions
Supported in Real-Address Mode”).
• A single, 16-bit-wide stack is provided for handling procedure calls and invocations of
interrupt and exception handlers. This stack is contained in the stack segment identified
with the SS register. The SP (stack pointer) register contains an offset into the stack
segment. The stack grows down (toward lower segment offsets) from the stack pointer.
The BP (base pointer) register also contains an offset into the stack segment that can be
used as a pointer to a parameter list. When a CALL instruction is executed, the processor
pushes the current instruction pointer (the 16 least-significant bits of the EIP register and,
on far calls, the current value of the CS register) onto the stack. On a return, initiated with
a RET instruction, the processor pops the saved instruction pointer from the stack into the
EIP register (and CS register on far returns). When an implicit call to an interrupt or
exception handler is executed, the processor pushes the EIP, CS, and EFLAGS (low-order
16-bits only) registers onto the stack. On a return from an interrupt or exception handler,
initiated with an IRET instruction, the processor pops the saved instruction pointer and
EFLAGS image from the stack into the EIP, CS, and EFLAGS registers.
• A single interrupt table, called the “interrupt vector table” or “interrupt table,” is
provided for handling interrupts and exceptions (see Figure 16-2). The interrupt table
(which has 4-byte entries) takes the place of the interrupt descriptor table (IDT, with
8-byte entries) used when handling protected-mode interrupts and exceptions. Interrupt
and exception vector numbers provide an index to entries in the interrupt table. Each
entry provides a pointer (called a “vector”) to an interrupt- or exception-handling
procedure. See Section 16.1.4, “Interrupt and Exception Handling”, for more details. It is
possible for software to relocate the IDT by means of the LIDT instruction on IA-32
processors beginning with the Intel386 processor.
• The x87 FPU is active and available to execute x87 FPU instructions in real-address mode.
Programs written to run on the Intel 8087 and Intel 287 math coprocessors can be run in
real-address mode without modification.
The following extensions to the Intel 8086 execution environment are available in the IA-32
architecture’s real-address mode. If backwards compatibility to Intel 286 and Intel 8086 processors
is required, these features should not be used in new programs written to run in real-address
mode.
• Two additional segment registers (FS and GS) are available.
• Many of the integer and system instructions that have been added to later IA-32 processors
can be executed in real-address mode (see Section 16.1.3, “Instructions Supported in
Real-Address Mode”).
• The 32-bit operand prefix can be used in real-address mode programs to execute the 32-bit
forms of instructions. This prefix also allows real-address mode programs to use the
processor’s 32-bit general-purpose registers.
• The 32-bit address prefix can be used in real-address mode programs, allowing 32-bit
offsets.
The following sections describe address formation, registers, available instructions, and interrupt
and exception handling in real-address mode. For information on I/O in real-address mode,
see Chapter 12, Input/Output, in the IA-32 Intel Architecture Software Developer’s Manual,
Volume 1.
16.1.1 Address Translation in Real-Address Mode
In real-address mode, the processor does not interpret segment selectors as indexes into a
descriptor table; instead, it uses them directly to form linear addresses as the 8086 processor
does. It shifts the segment selector left by 4 bits to form a 20-bit base address (see Figure 16-1).
The offset into a segment is added to the base address to create a linear address that maps directly
to the physical address space.
When using 8086-style address translation, it is possible to specify addresses larger than
1 MByte. For example, with a segment selector value of FFFFH and an offset of FFFFH, the
linear (and physical) address would be 10FFEFH (1 megabyte plus 64 KBytes). The 8086
processor, which can form addresses only up to 20 bits long, truncates the high-order bit, thereby
“wrapping” this address to FFEFH. When operating in real-address mode, however, the
processor does not truncate such an address and uses it as a physical address. (Note, however,
that for IA-32 processors beginning with the Intel486 processor, the A20M# signal can be used
in real-address mode to mask address line A20, thereby mimicking the 20-bit wrap-around
behavior of the 8086 processor.) Care should be taken to ensure that A20M#-based address wrapping
is handled correctly in multiprocessor-based systems.
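The FFFFH:FFFFH example above can be checked with a few lines of arithmetic. The following is only a minimal sketch in Python; the function names are illustrative and do not come from the manual, and the A20M# signal is modelled simply as clearing address bit 20.

```python
# Sketch of real-address-mode address formation (all names are illustrative).

def linear_address(segment: int, offset: int) -> int:
    """Shift the 16-bit selector left by 4 bits and add the 16-bit offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def wrap_8086(address: int) -> int:
    """Keep only 20 address bits, as the 8086 does."""
    return address & 0xFFFFF


def mask_a20(address: int) -> int:
    """Force address bit 20 to zero, modelling an asserted A20M# signal."""
    return address & ~(1 << 20)


top = linear_address(0xFFFF, 0xFFFF)
print(hex(top))             # 0x10ffef
print(hex(wrap_8086(top)))  # 0xffef
print(hex(mask_a20(top)))   # 0xffef
```

Note that 10FFEFH is 17 bytes short of a full 1 MByte plus 64 KBytes, so the highest reachable address lies just below that boundary.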
The IA-32 processors beginning with the Intel386 processor can generate 32-bit offsets using an
address override prefix; however, in real-address mode, the value of a 32-bit offset may not
exceed FFFFH without causing an exception.
For full compatibility with Intel 286 real-address mode, pseudo-protection faults (interrupt 12
or 13) occur if a 32-bit offset is generated outside the range 0 through FFFFH.
16.1.2 Registers Supported in Real-Address Mode
The register set available in real-address mode includes all the registers defined for the 8086
processor plus the new registers introduced in later IA-32 processors, such as the FS and GS
segment registers, the debug registers, the control registers, and the floating-point unit registers.
The 32-bit operand prefix allows a real-address mode program to use the 32-bit general-purpose
registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI).
16.1.3 Instructions Supported in Real-Address Mode
The following instructions make up the core instruction set for the 8086 processor. If backwards
compatibility to the Intel 286 and Intel 8086 processors is required, only these instructions
should be used in a new program written to run in real-address mode.
• Move (MOV) instructions that move operands between general-purpose registers, segment
registers, and between memory and general-purpose registers.
• The exchange (XCHG) instruction.
• Load segment register instructions LDS and LES.
• Arithmetic instructions ADD, ADC, SUB, SBB, MUL, IMUL, DIV, IDIV, INC, DEC,
CMP, and NEG.
Figure 16-1. Real-Address Mode Address Translation
[Figure: the 16-bit segment selector followed by four zero bits forms the 20-bit base (bits 19 to 0); the 16-bit effective address is the offset; base + offset = 20-bit linear address.]
• Logical instructions AND, OR, XOR, and NOT.
• Decimal instructions DAA, DAS, AAA, AAS, AAM, and AAD.
• Stack instructions PUSH and POP (to general-purpose registers and segment registers).
• Type conversion instructions CWD, CDQ, CBW, and CWDE.
• Shift and rotate instructions SAL, SHL, SHR, SAR, ROL, ROR, RCL, and RCR.
• TEST instruction.
• Control instructions JMP, Jcc, CALL, RET, LOOP, LOOPE, and LOOPNE.
• Interrupt instructions INT n, INTO, and IRET.
• EFLAGS control instructions STC, CLC, CMC, CLD, STD, LAHF, SAHF, PUSHF, and
POPF.
• I/O instructions IN, INS, OUT, and OUTS.
• Load effective address (LEA) instruction, and translate (XLATB) instruction.
• LOCK prefix.
• Repeat prefixes REP, REPE, REPZ, REPNE, and REPNZ.
• Processor halt (HLT) instruction.
• No operation (NOP) instruction.
The following instructions, added to later IA-32 processors (some in the Intel 286 processor and
the remainder in the Intel386 processor), can be executed in real-address mode, if backwards
compatibility to the Intel 8086 processor is not required.
• Move (MOV) instructions that operate on the control and debug registers.
• Load segment register instructions LSS, LFS, and LGS.
• Generalized multiply instructions and multiply immediate data.
• Shift and rotate by immediate counts.
• Stack instructions PUSHA, PUSHAD, POPA and POPAD, and PUSH immediate data.
• Move with sign extension instructions MOVSX and MOVZX.
• Long-displacement Jcc instructions.
• Exchange instructions CMPXCHG, CMPXCHG8B, and XADD.
• String instructions MOVS, CMPS, SCAS, LODS, and STOS.
• Bit test and bit scan instructions BT, BTS, BTR, BTC, BSF, and BSR; the byte-set-on
condition instruction SETcc; and the byte swap (BSWAP) instruction.
• Double shift instructions SHLD and SHRD.
• EFLAGS control instructions PUSHF and POPF.
• ENTER and LEAVE control instructions.
• BOUND instruction.
• CPU identification (CPUID) instruction.
• System instructions CLTS, INVD, WBINVD, INVLPG, LGDT, SGDT, LIDT, SIDT,
LMSW, SMSW, RDMSR, WRMSR, RDTSC, and RDPMC.
Execution of any of the other IA-32 architecture instructions (not given in the previous two lists)
in real-address mode results in an invalid-opcode exception (#UD) being generated.
16.1.4 Interrupt and Exception Handling
When operating in real-address mode, software must provide interrupt and exception-handling
facilities that are separate from those provided in protected mode. Even during the early stages
of processor initialization when the processor is still in real-address mode, elementary
real-address mode interrupt and exception-handling facilities must be provided to ensure reliable
operation of the processor, or the initialization code must ensure that no interrupts or exceptions
will occur.
The IA-32 processors handle interrupts and exceptions in real-address mode similar to the way
they handle them in protected mode. When a processor receives an interrupt or generates an
exception, it uses the vector number of the interrupt or exception as an index into the interrupt
table. (In protected mode, the interrupt table is called the interrupt descriptor table (IDT), but
in real-address mode, the table is usually called the interrupt vector table, or simply the interrupt
table.) The entry in the interrupt vector table provides a pointer to an interrupt- or
exception-handler procedure. (The pointer consists of a segment selector for a code segment and a
16-bit offset into the segment.) The processor performs the following actions to make an
implicit call to the selected handler:
1. Pushes the current values of the CS and EIP registers onto the stack. (Only the 16
least-significant bits of the EIP register are pushed.)
2. Pushes the low-order 16 bits of the EFLAGS register onto the stack.
3. Clears the IF flag in the EFLAGS register to disable interrupts.
4. Clears the TF, RF, and AC flags in the EFLAGS register.
5. Transfers program control to the location specified in the interrupt vector table.
An IRET instruction at the end of the handler procedure reverses these steps to return program
control to the interrupted program. Exceptions do not return error codes in real-address mode.
The interrupt vector table is an array of 4-byte entries (see Figure 16-2). Each entry consists of
a far pointer to a handler procedure, made up of a segment selector and an offset. The processor
scales the interrupt or exception vector by 4 to obtain an offset into the interrupt table. Following
reset, the base of the interrupt vector table is located at physical address 0 and its limit is set to
3FFH. In the Intel 8086 processor, the base address and limit of the interrupt vector table cannot
be changed. In the later IA-32 processors, the base address and limit of the interrupt vector table
are contained in the IDTR register and can be changed using the LIDT instruction.
(For backward compatibility to Intel 8086 processors, the default base address and limit of the
interrupt vector table should not be changed.)
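The vector-to-handler lookup described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, the table is modelled as a plain byte buffer at base 0 with limit 3FFH, and the entry layout (16-bit offset in the low word, segment selector in the high word) follows Figure 16-2. The handler address stored in the example is invented.

```python
import struct

IVT_ENTRY_SIZE = 4  # each entry is a 4-byte far pointer


def ivt_entry_offset(vector: int, base: int = 0, limit: int = 0x3FF) -> int:
    """Scale the vector number by 4 and check the result against the table limit."""
    if not 0 <= vector <= 255:
        raise ValueError("vector number must be in the range 0..255")
    offset = vector * IVT_ENTRY_SIZE
    if offset + IVT_ENTRY_SIZE - 1 > limit:
        raise ValueError("entry lies beyond the table limit")
    return base + offset


def read_vector(memory, vector: int) -> tuple:
    """Return (segment, offset) of the handler stored for this vector."""
    start = ivt_entry_offset(vector)
    # Little-endian: offset word first, then the segment selector.
    offset, segment = struct.unpack_from("<HH", memory, start)
    return segment, offset


# Hypothetical table: vector 21H points at handler 0019H:1460H.
memory = bytearray(0x400)
struct.pack_into("<HH", memory, ivt_entry_offset(0x21), 0x1460, 0x0019)
segment, offset = read_vector(memory, 0x21)
print(f"{segment:04X}:{offset:04X}")  # 0019:1460
```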
Table 16-1 shows the interrupt and exception vectors that can be generated in real-address mode
and virtual-8086 mode, and in the Intel 8086 processor. See Chapter 5, Interrupt and Exception
Handling, for a description of the exception conditions.
16.2 VIRTUAL-8086 MODE
Virtual-8086 mode is actually a special type of a task that runs in protected mode. When the
operating-system or executive switches to a virtual-8086-mode task, the processor emulates an
Intel 8086 processor. The execution environment of the processor while in the 8086-emulation
state is the same as is described in Section 16.1, “Real-Address Mode” for real-address mode,
including the extensions. The major difference between the two modes is that in virtual-8086
mode the 8086 emulator uses some protected-mode services (such as the protected-mode interrupt
and exception-handling and paging facilities).
As in real-address mode, any new or legacy program that has been assembled and/or compiled
to run on an Intel 8086 processor will run in a virtual-8086-mode task. And several 8086
programs can be run as virtual-8086-mode tasks concurrently with normal protected-mode
tasks, using the processor’s multitasking facilities.
Figure 16-2. Interrupt Vector Table in Real-Address Mode
[Figure: starting at the address held in IDTR, entries 0 through 255 occupy 4 bytes each (entry 0 at byte 0, entry 1 at byte 4, entry 2 at byte 8, entry 3 at byte 12); within an entry, the 16-bit offset is at byte 0 and the 16-bit segment selector at byte 2.]
* Interrupt vector number 0 selects entry 0 (called “interrupt vector 0”) in the interrupt vector table. Interrupt vector 0 in turn points to the start of the interrupt handler for interrupt 0.
Table 16-1. Real-Address Mode Exceptions and Interrupts
Vector No.  Description                    Real-Address Mode  Virtual-8086 Mode  Intel 8086 Processor
0           Divide Error (#DE)             Yes                Yes                Yes
1           Debug Exception (#DB)          Yes                Yes                No
2           NMI Interrupt                  Yes                Yes                Yes
3           Breakpoint (#BP)               Yes                Yes                Yes
4           Overflow (#OF)                 Yes                Yes                Yes
5           BOUND Range Exceeded (#BR)     Yes                Yes                Reserved
6           Invalid Opcode (#UD)           Yes                Yes                Reserved
7           Device Not Available (#NM)     Yes                Yes                Reserved
8           Double Fault (#DF)             Yes                Yes                Reserved
9           (Intel reserved. Do not use.)  Reserved           Reserved           Reserved
10          Invalid TSS (#TS)              Reserved           Yes                Reserved
11          Segment Not Present (#NP)      Reserved           Yes                Reserved
12          Stack Fault (#SS)              Yes                Yes                Reserved
13          General Protection (#GP)*      Yes                Yes                Reserved
14          Page Fault (#PF)               Reserved           Yes                Reserved
15          (Intel reserved. Do not use.)  Reserved           Reserved           Reserved
16          Floating-Point Error (#MF)     Yes                Yes                Reserved
17          Alignment Check (#AC)          Reserved           Yes                Reserved
18          Machine Check (#MC)            Yes                Yes                Reserved
19-31       (Intel reserved. Do not use.)  Reserved           Reserved           Reserved
32-255      User Defined Interrupts        Yes                Yes                Yes
NOTE:
* In the real-address mode, vector 13 is the segment overrun exception. In protected and virtual-8086
modes, this exception covers all general-protection error conditions, including traps to the
virtual-8086 monitor from virtual-8086 mode.
16.2.1 Enabling Virtual-8086 Mode
The processor runs in virtual-8086 mode when the VM (virtual machine) flag in the EFLAGS
register is set. This flag can only be set when the processor switches to a new protected-mode
task or resumes virtual-8086 mode via an IRET instruction.
System software cannot change the state of the VM flag directly in the EFLAGS register (for
example, by using the POPFD instruction). Instead it changes the flag in the image of the
EFLAGS register stored in the TSS or on the stack following a call to an interrupt- or
exception-handler procedure. For example, software sets the VM flag in the EFLAGS image in the TSS
when first creating a virtual-8086 task.
The processor tests the VM flag under three general conditions:
• When loading segment registers, to determine whether to use 8086-style address
translation.
• When decoding instructions, to determine which instructions are not supported in
virtual-8086 mode and which instructions are sensitive to IOPL.
• When checking privileged instructions, on page accesses, or when performing other
permission checks. (Virtual-8086 mode always executes at CPL 3.)
16.2.2 Structure of a Virtual-8086 Task
A virtual-8086-mode task consists of the following items:
• A 32-bit TSS for the task.
• The 8086 program.
• A virtual-8086 monitor.
• 8086 operating-system services.
The TSS of the new task must be a 32-bit TSS, not a 16-bit TSS, because the 16-bit TSS does
not load the most-significant word of the EFLAGS register, which contains the VM flag. All
TSSs, stacks, data, and code used to handle exceptions when in virtual-8086 mode must also be
32-bit segments.
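The reason a 16-bit TSS cannot carry the VM flag can be seen from the bit positions alone. The sketch below assumes the standard EFLAGS layout, in which VM is bit 17 and IF is bit 9; the helper names and the sample image value are illustrative.

```python
VM_FLAG = 1 << 17       # EFLAGS.VM lives in the most-significant word
IF_FLAG = 1 << 9        # EFLAGS.IF, in the least-significant word
RESERVED_BIT1 = 1 << 1  # bit 1 of EFLAGS always reads as 1


def is_virtual_8086(eflags: int) -> bool:
    """True when the VM flag is set in an EFLAGS image."""
    return bool(eflags & VM_FLAG)


def low_word(eflags: int) -> int:
    """The part of EFLAGS that a 16-bit TSS is able to hold."""
    return eflags & 0xFFFF


# EFLAGS image a monitor might store in a 32-bit TSS for a new virtual-8086 task.
image = VM_FLAG | IF_FLAG | RESERVED_BIT1
print(is_virtual_8086(image))            # True
print(is_virtual_8086(low_word(image)))  # False: VM is lost in a 16-bit image
```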
The processor enters virtual-8086 mode to run the 8086 program and returns to protected mode
to run the virtual-8086 monitor.
The virtual-8086 monitor is a 32-bit protected-mode code module that runs at a CPL of 0. The
monitor consists of initialization, interrupt- and exception-handling, and I/O emulation procedures
that emulate a personal computer or other 8086-based platform. Typically, the monitor is
either part of or closely associated with the protected-mode general-protection (#GP) exception
handler, which also runs at a CPL of 0. As with any protected-mode code module, code-segment
descriptors for the virtual-8086 monitor must exist in the GDT or in the task’s LDT. The
virtual-8086 monitor also may need data-segment descriptors so it can examine the IDT or other parts
of the 8086 program in the first 1 MByte of the address space. The linear addresses above
10FFEFH are available for the monitor, the operating system, and other system software.
The 8086 operating-system services consist of a kernel and/or operating-system procedures
that the 8086 program makes calls to. These services can be implemented in either of the
following two ways:
• They can be included in the 8086 program. This approach is desirable for either of the
following reasons:
— The 8086 program code modifies the 8086 operating-system services.
— There is not sufficient development time to merge the 8086 operating-system services
into the main operating system or executive.
• They can be implemented or emulated in the virtual-8086 monitor. This approach is
desirable for any of the following reasons:
— The 8086 operating-system procedures can be more easily coordinated among several
virtual-8086 tasks.
— Memory can be saved by not duplicating 8086 operating-system procedure code for
several virtual-8086 tasks.
— The 8086 operating-system procedures can be easily emulated by calls to the main
operating system or executive.
The approach chosen for implementing the 8086 operating-system services may result in
different virtual-8086-mode tasks using different 8086 operating-system services.
16.2.3 Paging of Virtual-8086 Tasks
Even though a program running in virtual-8086 mode can use only 20-bit linear addresses, the
processor converts these addresses into 32-bit linear addresses before mapping them to the physical
address space. If paging is being used, the 8086 address space for a program running in
virtual-8086 mode can be paged and located in a set of pages in physical address space. If paging
is used, it is transparent to the program running in virtual-8086 mode just as it is for any task
running on the processor.
Paging is not necessary for a single virtual-8086-mode task, but paging is useful or necessary in
the following situations:
• When running multiple virtual-8086-mode tasks. Here, paging allows the lower 1 MByte
of the linear address space for each virtual-8086-mode task to be mapped to a different
physical address location.
• When emulating the 8086 address-wraparound that occurs at 1 MByte. When using 8086-
style address translation, it is possible to specify addresses larger than 1 MByte. These
addresses automatically wraparound in the Intel 8086 processor (see Section 16.1.1,
“Address Translation in Real-Address Mode”). If any 8086 programs depend on address
wraparound, the same effect can be achieved in a virtual-8086-mode task by mapping the
linear addresses between 100000H and 110000H and linear addresses between 0 and
10000H to the same physical addresses.
• When sharing the 8086 operating-system services or ROM code that is common to several
8086 programs running as different virtual-8086-mode tasks.
• When redirecting or trapping references to memory-mapped I/O devices.
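The address-wraparound case in the list above can be sketched as follows. This is an illustrative example only, assuming 4-KByte pages and the 32-bit page-table entry format (bit 0 present, bit 1 read/write, bit 2 user/supervisor); the function name, the page_table array, and the phys_base parameter are hypothetical.

```c
#include <stdint.h>

#define PTE_PRESENT 0x001u  /* P flag   */
#define PTE_WRITE   0x002u  /* R/W flag */
#define PTE_USER    0x004u  /* U/S flag */

/* Fill one page table (1024 entries, covering linear 0..3FFFFFH) so that
 * the 64 KBytes at 100000H..10FFFFH alias the 64 KBytes at 0..FFFFH.
 * phys_base is assumed to be 4-KByte aligned. */
void map_v86_wraparound(uint32_t page_table[1024], uint32_t phys_base)
{
    uint32_t page;

    /* Pages 0..255: the 1-MByte 8086 address space of this task. */
    for (page = 0; page < 256; page++)
        page_table[page] = (phys_base + page * 4096u)
                         | PTE_PRESENT | PTE_WRITE | PTE_USER;

    /* Pages 256..271: reuse the page frames of pages 0..15. */
    for (page = 0; page < 16; page++)
        page_table[256 + page] = page_table[page];
}
```

Giving each virtual-8086-mode task its own page table with a different phys_base also covers the first case in the list, where the lower 1 MByte of each task is mapped to a different physical location.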
16.2.4 Protection within a Virtual-8086 Task
Protection is not enforced between the segments of an 8086 program. Either of the following
techniques can be used to protect the system software running in a virtual-8086-mode task from
the 8086 program:
• Reserve the first 1 MByte plus 64 KBytes of each task’s linear address space for the 8086
program. An 8086 processor task cannot generate addresses outside this range.
• Use the U/S flag of page-table entries to protect the virtual-8086 monitor and other system
software in the virtual-8086 mode task space. When the processor is in virtual-8086 mode,
the CPL is 3. Therefore, an 8086 processor program has only user privileges. If the pages
of the virtual-8086 monitor have supervisor privilege, they cannot be accessed by the 8086
program.
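The second technique can be illustrated with a short sketch that clears the U/S flag in the page-table entries that map the monitor. The names protect_monitor_pages and user_can_access are hypothetical, and the check is a simplified model that ignores the page-directory entry.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x001u  /* P flag                              */
#define PTE_USER    0x004u  /* U/S flag: 1 = user, 0 = supervisor  */

/* Mark the pages that hold the virtual-8086 monitor as supervisor-only. */
void protect_monitor_pages(uint32_t *ptes, unsigned count)
{
    unsigned i;
    for (i = 0; i < count; i++)
        ptes[i] &= ~PTE_USER;
}

/* Simplified model: code at CPL 3 (always the case in virtual-8086
 * mode) can reach a page only if it is present and marked user. */
bool user_can_access(uint32_t pte)
{
    return (pte & PTE_PRESENT) != 0 && (pte & PTE_USER) != 0;
}
```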
16.2.5 Entering Virtual-8086 Mode
Figure 16-3 summarizes the methods of entering and leaving virtual-8086 mode. The processor
switches to virtual-8086 mode in either of the following situations:
• Task switch when the VM flag is set to 1 in the EFLAGS register image stored in the TSS
for the task. Here the task switch can be initiated in either of two ways:
— A CALL or JMP instruction.
— An IRET instruction, where the NT flag in the EFLAGS image is set to 1.
• Return from a protected-mode interrupt or exception handler when the VM flag is set to 1
in the EFLAGS register image on the stack.
When a task switch is used to enter virtual-8086 mode, the TSS for the virtual-8086-mode task
must be a 32-bit TSS. (If the new TSS is a 16-bit TSS, the upper word of the EFLAGS register
is not in the TSS, causing the processor to clear the VM flag when it loads the EFLAGS register.)
The processor updates the VM flag prior to loading the segment registers from their images in
the new TSS. The new setting of the VM flag determines whether the processor interprets the
contents of the segment registers as 8086-style segment selectors or protected-mode segment
selectors. When the VM flag is set, the segment registers are loaded from the TSS, using 8086-
style address translation to form base addresses.
See Section 16.3, “Interrupt and Exception Handling in Virtual-8086 Mode”, for information on
entering virtual-8086 mode on a return from an interrupt or exception handler.
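As an illustration of the second entry method, the following sketch builds the stack image that an IRET executed at privilege level 0 would consume to start an 8086 program. The structure and function names are hypothetical; the field order follows the privilege-level 0 stack layout used for virtual-8086 mode (EIP at the lowest address, GS at the highest). Setting IOPL to 0, IF to 1, and the data segment registers to 0 are assumptions of this example, not requirements.

```c
#include <stdint.h>

#define EFLAGS_FIXED1 0x00000002u  /* bit 1 is reserved and reads as 1 */
#define EFLAGS_IF     (1u << 9)
#define EFLAGS_VM     (1u << 17)

/* Stack image popped by IRET when the VM flag in the EFLAGS image is 1. */
struct v86_iret_frame {
    uint32_t eip, cs, eflags, esp, ss, es, ds, fs, gs;
};

struct v86_iret_frame make_v86_entry_frame(uint16_t cs, uint16_t ip,
                                           uint16_t ss, uint16_t sp)
{
    struct v86_iret_frame f = {0};

    f.eip    = ip;   /* 8086 program entry point, CS:IP */
    f.cs     = cs;
    f.eflags = EFLAGS_FIXED1 | EFLAGS_IF | EFLAGS_VM;  /* IOPL = 0 */
    f.esp    = sp;   /* 8086 program stack, SS:SP */
    f.ss     = ss;
    /* ES, DS, FS, and GS stay 0 in this example. */
    return f;
}
```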
Figure 16-3. Entering and Leaving Virtual-8086 Mode
[Figure: state diagram. Modes shown: Real-Address Mode, Protected Mode, Virtual-8086 Mode.
Elements shown: Real Mode Code, Protected-Mode Tasks, Virtual-8086 Mode Tasks (8086
Programs), Protected-Mode Interrupt and Exception Handlers, Virtual-8086 Monitor.
Transition labels: RESET; PE=1; PE=0 or RESET; Task Switch (note 1), VM=1; Task Switch,
VM=0; Interrupt or Exception (note 2), VM=0; #GP Exception (note 3); IRET (note 4);
IRET (note 5); CALL; RET; Redirect Interrupt to 8086 Program Interrupt or Exception
Handler (note 6).]
NOTES:
1. Task switch carried out in either of two ways:
- CALL or JMP where the VM flag in the EFLAGS image is 1.
- IRET where VM is 1 and NT is 1.
2. Hardware interrupt or exception; software interrupt (INT n) when IOPL is 3.
3. General-protection exception caused by software interrupt (INT n), IRET,
POPF, PUSHF, IN, or OUT when IOPL is less than 3.
4. Normal return from protected-mode interrupt or exception handler.
5. A return from the 8086 monitor to redirect an interrupt or exception back
to an interrupt or exception handler in the 8086 program running in
virtual-8086 mode.
6. Internal redirection of a software interrupt (INT n) when VME is 1,
IOPL is <3, and the redirection bit is 1.
16.2.6 Leaving Virtual-8086 Mode
The processor can leave the virtual-8086 mode only through an interrupt or exception. The
following are situations where an interrupt or exception will lead to the processor leaving
virtual-8086 mode (see Figure 16-3):
• The processor services a hardware interrupt generated to signal the suspension of
execution of the virtual-8086 application. This hardware interrupt may be generated by a
timer or other external mechanism. Upon receiving the hardware interrupt, the processor
enters protected mode and switches to a protected-mode (or another virtual-8086 mode)
task either through a task gate in the protected-mode IDT or through a trap or interrupt gate
that points to a handler that initiates a task switch. A task switch from a virtual-8086 task
to another task loads the EFLAGS register from the TSS of the new task. The value of the
VM flag in the new EFLAGS determines if the new task executes in virtual-8086 mode or
not.
• The processor services an exception caused by code executing in the virtual-8086 task or
services a hardware interrupt that “belongs to” the virtual-8086 task. Here, the processor
enters protected mode and services the exception or hardware interrupt through the
protected-mode IDT (normally through an interrupt or trap gate) and the protected-mode
exception- and interrupt-handlers. The processor may handle the exception or interrupt
within the context of the virtual-8086 task and return to virtual-8086 mode on a return from
the handler procedure. The processor may also execute a task switch and handle the
exception or interrupt in the context of another task.
• The processor services a software interrupt generated by code executing in the virtual-8086
task (such as a software interrupt to call an MS-DOS operating system routine). The
processor provides several methods of handling these software interrupts, which are
discussed in detail in Section 16.3.3, “Class 3—Software Interrupt Handling in Virtual-
8086 Mode”. Most of them involve the processor entering protected mode, often by means
of a general-protection (#GP) exception. In protected mode, the processor can send the
interrupt to the virtual-8086 monitor for handling and/or redirect the interrupt back to the
application program running in the virtual-8086 mode task for handling.
IA-32 processors that incorporate the virtual mode extension (enabled with the VME flag
in control register CR4) are capable of redirecting software-generated interrupts back to
the program’s interrupt handlers without leaving virtual-8086 mode. See Section 16.3.3.4,
“Method 5: Software Interrupt Handling”, for more information on this mechanism.
• A hardware reset initiated by asserting the RESET or INIT pin is a special kind of
interrupt. When a RESET or INIT is signaled while the processor is in virtual-8086 mode,
the processor leaves virtual-8086 mode and enters real-address mode.
• Execution of the HLT instruction in virtual-8086 mode will cause a general-protection
(#GP) fault, which the protected-mode handler generally sends to the virtual-8086 monitor.
The virtual-8086 monitor then determines the correct execution sequence after verifying
that it was entered as a result of a HLT execution.
See Section 16.3, “Interrupt and Exception Handling in Virtual-8086 Mode”, for information on
leaving virtual-8086 mode to handle an interrupt or exception generated in virtual-8086 mode.
16.2.7 Sensitive Instructions
When an IA-32 processor is running in virtual-8086 mode, the CLI, STI, PUSHF, POPF, INT n,
and IRET instructions are sensitive to IOPL. The IN, INS, OUT, and OUTS instructions, which
are sensitive to IOPL in protected mode, are not sensitive in virtual-8086 mode.
The CPL is always 3 while running in virtual-8086 mode; if the IOPL is less than 3, an attempt
to use the IOPL-sensitive instructions listed above triggers a general-protection exception
(#GP). These instructions are sensitive to IOPL to give the virtual-8086 monitor a chance to
emulate the facilities they affect.
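The rule above can be expressed as a small predicate. This is a simplified sketch that models only the case described in this section (it does not model the virtual mode extensions); the enumeration and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum v86_insn {
    INSN_CLI, INSN_STI, INSN_PUSHF, INSN_POPF, INSN_INT_N, INSN_IRET,
    INSN_IN, INSN_INS, INSN_OUT, INSN_OUTS, INSN_OTHER
};

/* Returns true if executing `insn` in virtual-8086 mode raises #GP
 * because of the IOPL field (EFLAGS bits 12 and 13). */
bool v86_traps_on_iopl(enum v86_insn insn, uint32_t eflags)
{
    unsigned iopl = (eflags >> 12) & 3u;

    switch (insn) {
    case INSN_CLI:
    case INSN_STI:
    case INSN_PUSHF:
    case INSN_POPF:
    case INSN_INT_N:
    case INSN_IRET:
        return iopl < 3;   /* CPL is always 3 in virtual-8086 mode */
    default:
        /* IN, INS, OUT, and OUTS are not IOPL-sensitive here; they are
         * checked against the I/O permission bit map instead. */
        return false;
    }
}
```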
16.2.8 Virtual-8086 Mode I/O
Many 8086 programs written for non-multitasking systems directly access I/O ports. This practice
may cause problems in a multitasking environment. If more than one program accesses the
same port, they may interfere with each other. Most multitasking systems require application
programs to access I/O ports through the operating system. This results in simplified, centralized
control.
The processor provides I/O protection for creating I/O that is compatible with the environment
and transparent to 8086 programs. Designers may take any of several possible approaches to
protecting I/O ports:
• Protect the I/O address space and generate exceptions for all attempts to perform I/O
directly.
• Let the 8086 program perform I/O directly.
• Generate exceptions on attempts to access specific I/O ports.
• Generate exceptions on attempts to access specific memory-mapped I/O ports.
The method of controlling access to I/O ports depends upon whether they are I/O-port mapped
or memory mapped.
16.2.8.1 I/O-Port-Mapped I/O
The I/O permission bit map in the TSS can be used to generate exceptions on attempts to access
specific I/O port addresses. The I/O permission bit map of each virtual-8086-mode task determines
which I/O addresses generate exceptions for that task. Because each task may have a
different I/O permission bit map, the addresses that generate exceptions for one task may be
different from the addresses for another task. This differs from protected mode in which, if the
CPL is less than or equal to the IOPL, I/O access is allowed without checking the I/O permission
bit map. See Chapter 12, Input/Output, in the IA-32 Intel Architecture Software Developer’s
Manual, Volume 1, for more information about the I/O permission bit map.
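A lookup in the I/O permission bit map can be sketched as follows. In the bit map, each bit corresponds to one I/O port address, and a set bit causes an access to that port to generate an exception; a multi-byte access is allowed only if the bits for all of its ports are clear. The function name and parameters are hypothetical, and the sketch treats ports beyond the end of the supplied map as denied and omits the terminating byte that the TSS layout requires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if an access of `width` bytes (1, 2, or 4) starting at
 * `port` would generate an exception for the task owning `bitmap`. */
bool io_access_traps(const uint8_t *bitmap, size_t bitmap_bytes,
                     uint16_t port, unsigned width)
{
    unsigned i;

    for (i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;

        if (p / 8 >= bitmap_bytes)
            return true;                 /* outside the map: denied */
        if (bitmap[p / 8] & (1u << (p % 8)))
            return true;                 /* bit set: access trapped */
    }
    return false;
}
```

Because each virtual-8086-mode task has its own TSS, the monitor can supply a different map per task and emulate only the ports that a given task must not touch directly.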
16.2.8.2 Memory-Mapped I/O
In systems which use memory-mapped I/O, the paging facilities of the processor can be used to
generate exceptions for attempts to access I/O ports. The virtual-8086 monitor may use paging
to control memory-mapped I/O in these ways:
• Map part of the linear address space of each task that needs to perform I/O to the physical
address space where I/O ports are placed. By putting the I/O ports at different addresses (in
different pages), the paging mechanism can enforce isolation between tasks.
• Map part of the linear address space to pages that are not-present. This generates an
exception whenever a task attempts to perform I/O to those pages. System software then
can interpret the I/O operation being attempted.
Software emulation of the I/O space may require too much operating system intervention under
some conditions. In these cases, it may be possible to generate an exception for only the first
attempt to access I/O. The system software then may determine whether a program can be given
exclusive control of I/O temporarily. If so, the protection of the I/O space may be lifted and the
program allowed to run at full speed.
16.2.8.3 Special I/O Buffers
Buffers of intelligent controllers (for example, a bit-mapped frame buffer) also can be emulated
using page mapping. The linear space for the buffer can be mapped to a different physical space
for each virtual-8086-mode task. The virtual-8086 monitor then can control which virtual buffer
to copy onto the real buffer in the physical address space.
16.3 INTERRUPT AND EXCEPTION HANDLING
IN VIRTUAL-8086 MODE
When the processor receives an interrupt or detects an exception condition while in virtual-8086
mode, it invokes an interrupt or exception handler, just as it does in protected or real-address
mode. The interrupt or exception handler that is invoked and the mechanism used to invoke it
depends on the class of interrupt or exception that has been detected or generated and the state
of various system flags and fields.
In virtual-8086 mode, the interrupts and exceptions are divided into three classes for the
purposes of handling:
• Class 1 — All processor-generated exceptions and all hardware interrupts, including the
NMI interrupt and the hardware interrupts sent to the processor’s external interrupt
delivery pins. All class 1 exceptions and interrupts are handled by the protected-mode
exception and interrupt handlers.
• Class 2 — Special case for maskable hardware interrupts (Section 5.3.2, “Maskable
Hardware Interrupts”) when the virtual mode extensions are enabled.
• Class 3 — All software-generated interrupts, that is interrupts generated with the INT n
instruction (see footnote 1).
The method the processor uses to handle class 2 and 3 interrupts depends on the setting of the
following flags and fields:
• IOPL field (bits 12 and 13 in the EFLAGS register) — Controls how class 3 software
interrupts are handled when the processor is in virtual-8086 mode (see Section 2.3,
“System Flags and Fields in the EFLAGS Register”). This field also controls the enabling
of the VIF and VIP flags in the EFLAGS register when the VME flag is set. The VIF and
VIP flags are provided to assist in the handling of class 2 maskable hardware interrupts.
• VME flag (bit 0 in control register CR4) — Enables the virtual mode extension for the
processor when set (see Section 2.5, “Control Registers”).
• Software interrupt redirection bit map (32 bytes in the TSS, see Figure 16-5) —
Contains 256 flags that indicate how class 3 software interrupts should be handled when
they occur in virtual-8086 mode. A software interrupt can be directed either to the interrupt
and exception handlers in the currently running 8086 program or to the protected-mode
interrupt and exception handlers.
• The virtual interrupt flag (VIF) and virtual interrupt pending flag (VIP) in the
EFLAGS register — Provides virtual interrupt support for the handling of class 2
maskable hardware interrupts (see Section 16.3.2, “Class 2—Maskable Hardware Interrupt
Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism”).
NOTE
The VME flag, software interrupt redirection bit map, and VIF and VIP flags
are only available in IA-32 processors that support the virtual mode
extensions. These extensions were introduced in the IA-32 architecture with
the Pentium processor.
The following sections describe the actions that the processor takes and the possible actions of interrupt
and exception handlers for the three classes of interrupts described in the previous paragraphs.
These sections describe three possible types of interrupt and exception handlers:
• Protected-mode interrupt and exception handlers — These are the standard handlers
that the processor calls through the protected-mode IDT.
• Virtual-8086 monitor interrupt and exception handlers — These handlers are resident
in the virtual-8086 monitor, and they are commonly accessed through a general-protection
exception (#GP, interrupt 13) that is directed to the protected-mode general-protection
exception handler.
• 8086 program interrupt and exception handlers — These handlers are part of the 8086
program that is running in virtual-8086 mode.
The following sections describe how these handlers are used, depending on the selected class
and method of interrupt and exception handling.
Footnote 1. The INT 3 instruction is a special case (see the description of the INT n instruction in Chapter 3, Instruction
Set Reference, of the IA-32 Intel Architecture Software Developer’s Manual, Volume 2).
16.3.1 Class 1—Hardware Interrupt and Exception Handling
in Virtual-8086 Mode
In virtual-8086 mode, the Pentium, P6 family, Pentium 4, and Intel Xeon processors handle
hardware interrupts and exceptions in the same manner as they are handled by the Intel486 and
Intel386 processors. They invoke the protected-mode interrupt or exception handler that the
interrupt or exception vector points to in the IDT. Here, the IDT entry must contain either a
32-bit trap or interrupt gate or a task gate. The following sections describe various ways that a
virtual-8086 mode interrupt or exception can be handled after the protected-mode handler has
been invoked.
See Section 16.3.2, “Class 2—Maskable Hardware Interrupt Handling in Virtual-8086 Mode
Using the Virtual Interrupt Mechanism”, for a description of the virtual interrupt mechanism that
is available for handling maskable hardware interrupts while in virtual-8086 mode. When this
mechanism is either not available or not enabled, maskable hardware interrupts are handled in
the same manner as exceptions, as described in the following sections.
16.3.1.1 Handling an Interrupt or Exception Through a
Protected-Mode Trap or Interrupt Gate
When an interrupt or exception vector points to a 32-bit trap or interrupt gate in the IDT, the gate
must in turn point to a nonconforming, privilege-level 0, code segment. When accessing this
code segment, the processor performs the following steps:
1. Switches to 32-bit protected mode and privilege level 0.
2. Saves the state of the processor on the privilege-level 0 stack. The states of the EIP, CS,
EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (see Figure 16-4).
3. Clears the segment registers. Saving the DS, ES, FS, and GS registers on the stack and then
clearing the registers lets the interrupt or exception handler safely save and restore these
registers regardless of the type of segment selectors they contain (protected-mode or
8086-style). The interrupt and exception handlers, which may be called in the context of either a
protected-mode task or a virtual-8086-mode task, can use the same code sequences for
saving and restoring the registers for any task. Clearing these registers before execution of
the IRET instruction does not cause a trap in the interrupt handler. Interrupt procedures that
expect values in the segment registers or that return values in the segment registers must
use the register images saved on the stack for privilege level 0.
4. Clears the VM, NT, RF, and TF flags (in the EFLAGS register). If the gate is an interrupt gate,
clears the IF flag.
5. Begins executing the selected interrupt or exception handler.
If the trap or interrupt gate references a procedure in a conforming segment or in a segment at a
privilege level other than 0, the processor generates a general-protection exception (#GP). Here,
the error code is the segment selector of the code segment to which a call was attempted.
Interrupt and exception handlers can examine the VM flag on the stack to determine if the interrupted
procedure was running in virtual-8086 mode. If so, the interrupt or exception can be
handled in one of three ways:
• The protected-mode interrupt or exception handler that was called can handle the interrupt
or exception.
• The protected-mode interrupt or exception handler can call the virtual-8086 monitor to
handle the interrupt or exception.
• The virtual-8086 monitor (if called) can in turn pass control back to the 8086 program’s
interrupt and exception handler.
If the interrupt or exception is handled with a protected-mode handler, the handler can return to
the interrupted program in virtual-8086 mode by executing an IRET instruction. This instruction
loads the EFLAGS and segment registers from the images saved in the privilege level 0 stack
(see Figure 16-4). A set VM flag in the EFLAGS image causes the processor to switch back to
virtual-8086 mode. The CPL at the time the IRET instruction is executed must be 0, otherwise
the processor does not change the state of the VM flag.
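Figure 16-4. Privilege Level 0 Stack After Interrupt or Exception in Virtual-8086 Mode
[Figure: two stack layouts, listed from the initial stack pointer (ESP from TSS, highest
address) down to the new stack pointer (New ESP, lowest address). "Unused" marks the upper
16 bits of a doubleword that holds a 16-bit segment register value.]

    Without Error Code            With Error Code
    (ESP from TSS)                (ESP from TSS)
    Unused | Old GS               Unused | Old GS
    Unused | Old FS               Unused | Old FS
    Unused | Old DS               Unused | Old DS
    Unused | Old ES               Unused | Old ES
    Unused | Old SS               Unused | Old SS
    Old ESP                       Old ESP
    Old EFLAGS                    Old EFLAGS
    Unused | Old CS               Unused | Old CS
    Old EIP     <- New ESP        Old EIP
                                  Error Code  <- New ESP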
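A protected-mode handler can test the VM flag in the saved EFLAGS image as sketched below. The sketch assumes the handler has a pointer to the new top of the privilege-level 0 stack and knows whether the vector pushes an error code; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_VM (1u << 17)

/* `esp` points at the lowest address of the saved state. Reading
 * upward: [Error Code], EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, GS. */
bool interrupted_in_v86(const uint32_t *esp, bool has_error_code)
{
    const uint32_t *frame = has_error_code ? esp + 1 : esp;

    return (frame[2] & EFLAGS_VM) != 0;   /* frame[2] is Old EFLAGS */
}
```

If the function returns true, the saved segment register images at frame[5] through frame[8] hold 8086-style segment values rather than protected-mode selectors.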
The virtual-8086 monitor runs at privilege level 0, like the protected-mode interrupt and exception
handlers. It is commonly closely tied to the protected-mode general-protection exception
(#GP, vector 13) handler. If the protected-mode interrupt or exception handler calls the virtual-
8086 monitor to handle the interrupt or exception, the return from the virtual-8086 monitor to
the interrupted virtual-8086 mode program requires two return instructions: a RET instruction
to return to the protected-mode handler and an IRET instruction to return to the interrupted
program.
The virtual-8086 monitor has the option of directing the interrupt or exception back to an
interrupt or exception handler that is part of the interrupted 8086 program, as described in
Section 16.3.1.2, “Handling an Interrupt or Exception With an 8086 Program Interrupt or
Exception Handler”.
16.3.1.2 Handling an Interrupt or Exception With an
8086 Program Interrupt or Exception Handler
Because it was designed to run on an 8086 processor, an 8086 program running in a virtual-
8086-mode task contains an 8086-style interrupt vector table, which starts at linear address 0. If
the virtual-8086 monitor correctly directs an interrupt or exception vector back to the virtual-
8086-mode task it came from, the handlers in the 8086 program can handle the interrupt or
exception. The virtual-8086 monitor must carry out the following steps to send an interrupt or
exception back to the 8086 program:
1. Use the 8086 interrupt vector to locate the appropriate handler procedure in the 8086
program interrupt table.
2. Store the EFLAGS (low-order 16 bits only), CS and EIP values of the 8086 program on the
privilege-level 3 stack. This is the stack that the virtual-8086-mode task is using. (The
8086 handler may use or modify this information.)
3. Change the return link on the privilege-level 0 stack to point to the privilege-level 3
handler procedure.
4. Execute an IRET instruction to pass control to the 8086 program handler.
5. When the IRET instruction from the privilege-level 3 handler triggers a general-protection
exception (#GP) and thus effectively again calls the virtual-8086 monitor, restore the
return link on the privilege-level 0 stack to point to the original, interrupted, privilege-level
3 procedure.
6. Copy the low order 16 bits of the EFLAGS image from the privilege-level 3 stack to the
privilege-level 0 stack (because some 8086 handlers modify these flags to return
information to the code that caused the interrupt).
7. Execute an IRET instruction to pass control back to the interrupted 8086 program.
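Steps 1 through 3 above can be sketched as follows. This is an illustrative model only: the 8086 address space is represented by a byte array (mem) that must cover at least 10FFF0H bytes, the saved state on the privilege-level 0 stack is represented by a hypothetical v86_frame structure, and all function names are assumptions. The IRET in step 4 and the later trap handling in steps 5 through 7 are not shown.

```c
#include <stdint.h>

/* Saved state on the privilege-level 0 stack (lowest address first). */
struct v86_frame {
    uint32_t eip, cs, eflags, esp, ss, es, ds, fs, gs;
};

static uint16_t rd16(const uint8_t *mem, uint32_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

static void wr16(uint8_t *mem, uint32_t addr, uint16_t value)
{
    mem[addr]     = (uint8_t)(value & 0xFF);
    mem[addr + 1] = (uint8_t)(value >> 8);
}

/* Push a word on the 8086 program's stack (SS:SP, privilege level 3). */
static void v86_push16(uint8_t *mem, struct v86_frame *f, uint16_t value)
{
    uint16_t sp = (uint16_t)(f->esp - 2);   /* SP wraps at 64 KBytes */

    f->esp = sp;
    wr16(mem, ((f->ss & 0xFFFFu) << 4) + sp, value);
}

void v86_reflect_interrupt(uint8_t *mem, struct v86_frame *f, uint8_t vector)
{
    /* Step 1: look up the handler in the 8086 interrupt vector table,
     * which starts at linear address 0 (4 bytes per entry: IP, then CS). */
    uint16_t handler_ip = rd16(mem, vector * 4u);
    uint16_t handler_cs = rd16(mem, vector * 4u + 2u);

    /* Step 2: store the low 16 bits of EFLAGS, then CS and IP, on the
     * 8086 program's stack, in the order an 8086 interrupt would. */
    v86_push16(mem, f, (uint16_t)f->eflags);
    v86_push16(mem, f, (uint16_t)f->cs);
    v86_push16(mem, f, (uint16_t)f->eip);

    /* Step 3: change the return link so IRET enters the 8086 handler. */
    f->cs  = handler_cs;
    f->eip = handler_ip;
}
```

Reading the vector table at the time of the interrupt, rather than caching it, keeps the sketch compatible with programs that hook their own handlers into the table.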
Note that if an operating system intends to support all 8086 MS-DOS-based programs, it is
necessary to use the actual 8086 interrupt and exception handlers supplied with the program.
The reason for this is that some programs modify their own interrupt vector table to substitute
(or hook in series) their own specialized interrupt and exception handlers.
16.3.1.3 Handling an Interrupt or Exception Through a Task Gate
When an interrupt or exception vector points to a task gate in the IDT, the processor performs a
task switch to the selected interrupt- or exception-handling task. The following actions are
carried out as part of this task switch:
1. The EFLAGS register with the VM flag set is saved in the current TSS.
2. The link field in the TSS of the called task is loaded with the segment selector of the TSS
for the interrupted virtual-8086-mode task.
3. The EFLAGS register is loaded from the image in the new TSS, which clears the VM flag
and causes the processor to switch to protected mode.
4. The NT flag in the EFLAGS register is set.
5. The processor begins executing the selected interrupt- or exception-handler task.
When an IRET instruction is executed in the handler task and the NT flag in the EFLAGS
register is set, the processors switches from a protected-mode interrupt- or exception-handler
task back to a virtual-8086-mode task. Here, the EFLAGS and segment registers are loaded from
images saved in the TSS for the virtual-8086-mode task. If the VM flag is set in the EFLAGS
image, the processor switches back to virtual-8086 mode on the task switch. The CPL at the time
the IRET instruction is executed must be 0, otherwise the processor does not change the state of
the VM flag.
16.3.2 Class 2—Maskable Hardware Interrupt Handling in
Virtual-8086 Mode Using the Virtual Interrupt Mechanism
Maskable hardware interrupts are those interrupts that are delivered through the INTR# pin or
through an interrupt request to the local APIC (see Section 5.3.2, “Maskable Hardware Interrupts”).
These interrupts can be inhibited (masked) from interrupting an executing program or
task by clearing the IF flag in the EFLAGS register.
When the VME flag in control register CR4 is set and the IOPL field in the EFLAGS register is
less than 3, two additional flags are activated in the EFLAGS register:
• VIF (virtual interrupt) flag, bit 19 of the EFLAGS register.
• VIP (virtual interrupt pending) flag, bit 20 of the EFLAGS register.
These flags provide the virtual-8086 monitor with more efficient control over handling
maskable hardware interrupts that occur during virtual-8086 mode tasks. They also reduce
interrupt-handling overhead by eliminating the need for all IF-related operations (such as PUSHF,
POPF, CLI, and STI instructions) to trap to the virtual-8086 monitor. The purpose and use of
these flags are as follows.
NOTE
The VIF and VIP flags are only available in IA-32 processors that support the
virtual mode extensions. These extensions were introduced in the IA-32
architecture with the Pentium processor. When this mechanism is either not
available or not enabled, maskable hardware interrupts are handled as class 1
interrupts. Here, if VIF and VIP flags are needed, the virtual-8086 monitor
can implement them in software.
Existing 8086 programs commonly set and clear the IF flag in the EFLAGS register to enable
and disable maskable hardware interrupts, respectively; for example, to disable interrupts while
handling another interrupt or an exception. This practice works well in single task environments,
but can cause problems in multitasking and multiple-processor environments, where it is often
desirable to prevent an application program from having direct control over the handling of
hardware interrupts. When using earlier IA-32 processors, this problem was often solved by
creating a virtual IF flag in software. The IA-32 processors (beginning with the Pentium
processor) provide hardware support for this virtual IF flag through the VIF and VIP flags.
The VIF flag is a virtualized version of the IF flag, which an application program running from
within a virtual-8086 task can use to control the handling of maskable hardware interrupts.
When the VIF flag is enabled, the CLI and STI instructions operate on the VIF flag instead of
the IF flag. When an 8086 program executes the CLI instruction, the processor clears the VIF
flag to request that the virtual-8086 monitor inhibit maskable hardware interrupts from interrupting
program execution; when it executes the STI instruction, the processor sets the VIF flag
requesting that the virtual-8086 monitor enable maskable hardware interrupts for the 8086
program. The actual IF flag, however, remains under the control of the operating system and
always determines whether maskable hardware interrupts are enabled. Also, if under these
circumstances an 8086 program tries to read or change the IF flag using the PUSHF or POPF
instructions, the processor reads or changes the VIF flag instead, leaving IF unchanged.
The VIP flag provides software a means of recording the existence of a deferred (or pending)
maskable hardware interrupt. This flag is read by the processor but never explicitly written by
the processor; it can only be written by software.
If the IF flag is set and the VIF and VIP flags are enabled, and the processor receives a maskable
hardware interrupt (interrupt vector 0 through 255), the processor performs and the interrupt
handler software should perform the following operations:
1. The processor invokes the protected-mode interrupt handler for the interrupt received, as
described in the following steps. These steps are almost identical to those described for
class 1 interrupt and exception handling in Section 16.3.1.1, “Handling an Interrupt or
Exception Through a Protected-Mode Trap or Interrupt Gate”:
a. Switches to 32-bit protected mode and privilege level 0.
b. Saves the state of the processor on the privilege-level 0 stack. The states of the EIP,
CS, EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (see Figure 16-4).
c. Clears the segment registers.
d. Clears the VM flag in the EFLAGS register.
e. Begins executing the selected protected-mode interrupt handler.
2. The recommended action of the protected-mode interrupt handler is to read the VM flag
from the EFLAGS image on the stack. If this flag is set, the handler makes a call to the
virtual-8086 monitor.
3. The virtual-8086 monitor should read the VIF flag in the EFLAGS register.
— If the VIF flag is clear, the virtual-8086 monitor sets the VIP flag in the EFLAGS
image on the stack to indicate that there is a deferred interrupt pending and returns to
the protected-mode handler.
— If the VIF flag is set, the virtual-8086 monitor can handle the interrupt if it “belongs”
to the 8086 program running in the interrupted virtual-8086 task; otherwise, it can call
the protected-mode interrupt handler to handle the interrupt.
4. The protected-mode handler executes a return to the program executing in virtual-8086
mode.
5. Upon returning to virtual-8086 mode, the processor continues execution of the 8086
program.
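The decision made by the virtual-8086 monitor in step 3 can be sketched as follows. The sketch assumes that the monitor examines the VIF flag in the EFLAGS image saved on the privilege-level 0 stack, which holds the flags of the interrupted 8086 program; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_VIF (1u << 19)   /* virtual interrupt flag         */
#define EFLAGS_VIP (1u << 20)   /* virtual interrupt pending flag */

/* Returns true if the interrupt may be handled now. Otherwise the
 * interrupt is recorded as pending in the saved EFLAGS image, and the
 * caller is expected to queue it for later delivery. */
bool v86_monitor_on_hw_interrupt(uint32_t *eflags_image)
{
    if ((*eflags_image & EFLAGS_VIF) == 0) {
        *eflags_image |= EFLAGS_VIP;   /* 8086 program has interrupts off */
        return false;
    }
    return true;
}
```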
When the 8086 program is ready to receive maskable hardware interrupts, it executes the STI
instruction to set the VIF flag (enabling maskable hardware interrupts). Prior to setting the VIF
flag, the processor automatically checks the VIP flag and does one of the following, depending
on the state of the flag:
• If the VIP flag is clear (indicating no pending interrupts), the processor sets the VIF flag.
• If the VIP flag is set (indicating a pending interrupt), the processor generates a
general-protection exception (#GP).
The recommended action of the protected-mode general-protection exception handler is to then
call the virtual-8086 monitor and let it handle the pending interrupt. After handling the pending
interrupt, the typical action of the virtual-8086 monitor is to clear the VIP flag and set the VIF
flag in the EFLAGS image on the stack, and then execute a return to the virtual-8086 mode. The
next time the processor receives a maskable hardware interrupt, it will then handle it as
described in steps 1 through 5 earlier in this section.
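The STI behavior and the typical monitor response described above can be modeled in a few lines. This is a software model for illustration, assuming the VME flag is set and IOPL is less than 3; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_VIF (1u << 19)
#define EFLAGS_VIP (1u << 20)

/* Models STI in virtual-8086 mode with virtual interrupts enabled.
 * Returns true if a general-protection exception (#GP) is generated,
 * in which case VIF is left unchanged. */
bool v86_vme_sti(uint32_t *eflags)
{
    if (*eflags & EFLAGS_VIP)
        return true;            /* pending interrupt: #GP */
    *eflags |= EFLAGS_VIF;      /* no pending interrupt: set VIF */
    return false;
}

/* Typical monitor action after it has handled the pending interrupt:
 * clear VIP and set VIF in the EFLAGS image on the stack. */
void v86_monitor_after_pending(uint32_t *eflags_image)
{
    *eflags_image &= ~EFLAGS_VIP;
    *eflags_image |= EFLAGS_VIF;
}
```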
If the processor finds that both the VIF and VIP flags are set at the beginning of an instruction,
it generates a general-protection exception. This action allows the virtual-8086 monitor to
handle the pending interrupt for the virtual-8086 mode task for which the VIF flag is enabled.
Note that this situation can only occur immediately following execution of a POPF or IRET
instruction or upon entering a virtual-8086 mode task through a task switch.
Note that the states of the VIF and VIP flags are not modified in real-address mode or during
transitions between real-address and protected modes.
NOTE
The virtual interrupt mechanism described in this section is also available for
use in protected mode, see Section 16.4, “Protected-Mode Virtual Interrupts”.
16.3.3 Class 3—Software Interrupt Handling in
Virtual-8086 Mode
When the processor receives a software interrupt (an interrupt generated with the INT n
instruction) while in virtual-8086 mode, it can use any of six different methods to handle the
interrupt. The method selected depends on the settings of the VME flag in control register CR4,
the IOPL field in the EFLAGS register, and the software interrupt redirection bit map in the
TSS. Table 16-2 lists the six methods of handling software interrupts in virtual-8086 mode and
the respective settings of the VME flag, IOPL field, and the bits in the interrupt redirection bit
map for each method. The table also summarizes the various actions the processor takes for
each method.
The VME flag enables the virtual mode extensions for the Pentium and later IA-32 processors.
When this flag is clear, the processor responds to interrupts and exceptions in virtual-8086 mode
in the same manner as an Intel386 or Intel486 processor does. When this flag is set, the virtual
mode extension provides the following enhancements to virtual-8086 mode:
• Speeds up the handling of software-generated interrupts in virtual-8086 mode by allowing
the processor to bypass the virtual-8086 monitor and redirect software interrupts back to
the interrupt handlers that are part of the currently running 8086 program.
• Supports virtual interrupts for software written to run on the 8086 processor.
The IOPL value interacts with the VME flag and the bits in the interrupt redirection bit map to
determine how specific software interrupts should be handled.
The software interrupt redirection bit map (see Figure 16-5) is a 32-byte field in the TSS. This
map is located directly below the I/O permission bit map in the TSS. Each bit in the interrupt
redirection bit map is mapped to an interrupt vector. Bit 0 in the interrupt redirection bit map
(which maps to vector zero in the interrupt table) is located at the I/O base map address in the
TSS minus 32 bytes. When a bit in this bit map is set, it indicates that the associated software
interrupt (interrupt generated with an INT n instruction) should be handled through the
protected-mode IDT and interrupt and exception handlers. When a bit in this bit map is clear,
the processor redirects the associated software interrupt back to the interrupt table in the 8086
program (located at linear address 0 in the program’s address space).
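The lookup described above can be sketched in Python. This is a minimal illustration under stated assumptions: the TSS is modeled as a plain byte string, the function names are hypothetical, and bits within each byte are assumed to be numbered from the least-significant bit, with vectors ascending from the start of the map.

```python
REDIRECTION_MAP_BYTES = 32  # 256 vectors, one bit per vector


def redirection_bit_location(io_map_base: int, vector: int) -> tuple:
    """Return (byte offset within the TSS, bit index within that byte) for a vector."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in 0..255")
    # Bit 0 of the map sits at the I/O map base address minus 32 bytes.
    byte_offset = io_map_base - REDIRECTION_MAP_BYTES + vector // 8
    return byte_offset, vector % 8


def handled_by_protected_mode(tss: bytes, io_map_base: int, vector: int) -> bool:
    """True if the bit is set (protected-mode IDT), False if it is clear
    (interrupt redirected to the 8086 program's interrupt table)."""
    byte_offset, bit = redirection_bit_location(io_map_base, vector)
    return bool((tss[byte_offset] >> bit) & 1)
```

For example, with a hypothetical I/O map base of 88H, the bit for vector 21H falls in the byte at TSS offset 6CH, bit 1.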
NOTE
The software interrupt redirection bit map does not affect hardware generated
interrupts and exceptions. Hardware generated interrupts and exceptions are
always handled by the protected-mode interrupt and exception handlers.
Table 16-2. Software Interrupt Handling Methods While in Virtual-8086 Mode

Method  VME  IOPL  Bit in Redir. Bitmap*  Processor Action
------  ---  ----  ---------------------  ----------------
1       0    3     X                      Interrupt directed to a protected-mode interrupt handler:
                                          - Switches to privilege-level 0 stack
                                          - Pushes GS, FS, DS and ES onto privilege-level 0 stack
                                          - Pushes SS, ESP, EFLAGS, CS and EIP of interrupted
                                            task onto privilege-level 0 stack
                                          - Clears VM, RF, NT, and TF flags
                                          - If serviced through interrupt gate, clears IF flag
                                          - Clears GS, FS, DS and ES to 0
                                          - Sets CS and EIP from interrupt gate
2       0    < 3   X                      Interrupt directed to protected-mode general-protection
                                          exception (#GP) handler.
3       1    < 3   1                      Interrupt directed to a protected-mode general-protection
                                          exception (#GP) handler; VIF and VIP flag support for
                                          handling class 2 maskable hardware interrupts.
4       1    3     1                      Interrupt directed to protected-mode interrupt handler
                                          (see method 1 processor action).
5       1    3     0                      Interrupt redirected to 8086 program interrupt handler:
                                          - Pushes EFLAGS
                                          - Pushes CS and EIP (lower 16 bits only)
                                          - Clears IF flag
                                          - Clears TF flag
                                          - Loads CS and EIP (lower 16 bits only) from selected
                                            entry in the interrupt vector table of the current
                                            virtual-8086 task
6       1    < 3   0                      Interrupt redirected to 8086 program interrupt handler;
                                          VIF and VIP flag support for handling class 2 maskable
                                          hardware interrupts:
                                          - Pushes EFLAGS with IOPL set to 3 and VIF copied to IF
                                          - Pushes CS and EIP (lower 16 bits only)
                                          - Clears the VIF flag
                                          - Clears TF flag
                                          - Loads CS and EIP (lower 16 bits only) from selected
                                            entry in the interrupt vector table of the current
                                            virtual-8086 task

NOTE:
* When set to 0, software interrupt is redirected back to the 8086 program interrupt handler; when set to
1, interrupt is directed to protected-mode handler.
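The selection logic in Table 16-2 can be written as a short decision function. The Python sketch below only restates the table; the function name and its parameters are hypothetical and do not correspond to any processor or operating system interface.

```python
def software_interrupt_method(vme: int, iopl: int, redirection_bit: int) -> int:
    """Return the Table 16-2 method number (1 through 6) for an INT n
    executed in virtual-8086 mode."""
    if not vme:
        # With VME clear, the redirection bit is a don't-care (X in the table).
        return 1 if iopl == 3 else 2
    if redirection_bit:
        # Bit set: directed to protected mode (method 4) or to #GP (method 3).
        return 4 if iopl == 3 else 3
    # Bit clear: redirected back to the 8086 program's own handler.
    return 5 if iopl == 3 else 6
```

For instance, `software_interrupt_method(1, 3, 0)` returns 5, the case in which the interrupt goes straight to the 8086 program's handler without involving the virtual-8086 monitor.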
Redirecting software interrupts back to the 8086 program potentially speeds up interrupt
handling because a switch back and forth between virtual-8086 mode and protected mode is not
required. This latter interrupt-handling technique is particularly useful for 8086 operating
systems (such as MS-DOS) that use the INT n instruction to call operating system procedures.
The CPUID instruction can be used to verify that the virtual mode extension is implemented on
the processor. Bit 1 of the feature flags register (EDX) indicates the availability of the virtual
mode extension (see “CPUID—CPU Identification” in Chapter 3 of the IA-32 Intel Architecture
Software Developer’s Manual, Volume 2).
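As a small illustration of the feature test, the Python sketch below checks bit 1 of a feature-flags value. Python cannot execute CPUID directly, so the EDX value is assumed to have been obtained elsewhere (for example, by native code); the function name is hypothetical.

```python
VME_FEATURE_BIT = 1  # bit 1 of the feature flags returned in EDX


def supports_vme(feature_flags_edx: int) -> bool:
    """Return True if the feature flags report the virtual mode extension."""
    return bool((feature_flags_edx >> VME_FEATURE_BIT) & 1)
```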
The following sections describe the six methods (or mechanisms) for handling software interrupts
in virtual-8086 mode. See Section 16.3.2, “Class 2—Maskable Hardware Interrupt
Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism”, for a description of the
use of the VIF and VIP flags in the EFLAGS register for handling maskable hardware interrupts.
16.3.3.1 Method 1: Software Interrupt Handling
When the VME flag in control register CR4 is clear and the IOPL field is 3, a Pentium or later
IA-32 processor handles software interrupts in the same manner as they are handled by an
Intel386 or Intel486 processor. It executes an implicit call to the interrupt handler in the
protected-mode IDT pointed to by the interrupt vector. See Section 16.3.1, “Class 1—Hardware
Interrupt and Exception Handling in Virtual-8086 Mode”, for a complete description of this
mechanism and its possible uses.
Figure 16-5. Software Interrupt Redirection Bit Map in TSS
(The figure shows the task-state segment (TSS), with the I/O map base field in the doubleword at
offset 64H pointing to the I/O permission bit map. The 32-byte software interrupt redirection bit
map lies directly below the I/O permission bit map. Figure notes: the I/O map base must not
exceed DFFFH, and the last byte of the bit map must be followed by a byte with all bits set.)
16.3.3.2 Methods 2 and 3: Software Interrupt Handling
When a software interrupt occurs in virtual-8086 mode and the method 2 or 3 conditions are
present, the processor generates a general-protection exception (#GP). Method 2 is enabled
when the VME flag is set to 0 and the IOPL value is less than 3. Here the IOPL value is used to
bypass the protected-mode interrupt handlers and cause any software interrupt that occurs in
virtual-8086 mode to be treated as a protected-mode general-protection exception (#GP). The
general-protection exception handler calls the virtual-8086 monitor, which can then emulate an
8086-program interrupt handler or pass control back to the 8086 program’s handler, as described
in Section 16.3.1.2, “Handling an Interrupt or Exception With an 8086 Program Interrupt or
Exception Handler”.
Method 3 is enabled when the VME flag is set to 1, the IOPL value is less than 3, and the corresponding
bit for the software interrupt in the software interrupt redirection bit map is set to 1.
Here, the processor performs the same operation as it does for method 2 software interrupt
handling. If the corresponding bit for the software interrupt in the software interrupt redirection
bit map is set to 0, the interrupt is handled using method 6 (see Section 16.3.3.5, “Method 6:
Software Interrupt Handling”).
16.3.3.3 Method 4: Software Interrupt Handling
Method 4 handling is enabled when the VME flag is set to 1, the IOPL value is 3, and the bit for
the interrupt vector in the redirection bit map is set to 1. Method 4 software interrupt handling
allows method 1 style handling when the virtual mode extension is enabled; that is, the interrupt
is directed to a protected-mode handler (see Section 16.3.3.1, “Method 1: Software Interrupt
Handling”).
16.3.3.4 Method 5: Software Interrupt Handling
Method 5 software interrupt handling provides a streamlined method of redirecting software
interrupts (invoked with the INT n instruction) that occur in virtual 8086 mode back to the 8086
program’s interrupt vector table and its interrupt handlers. Method 5 handling is enabled when
the VME flag is set to 1, the IOPL value is 3, and the bit for the interrupt vector in the redirection
bit map is set to 0. The processor performs the following actions to make an implicit call to the
selected 8086 program interrupt handler:
1. Pushes the low-order 16 bits of the EFLAGS register onto the stack.
2. Pushes the current values of the CS and EIP registers onto the current stack. (Only the 16
least-signifi

CHAPTER 15 DEBUGGING AND PERFORMANCE MONITORING

The IA-32 architecture provides debug facilities for use in debugging code and monitoring
performance. These facilities are valuable for debugging application software, system software,
and multitasking operating systems. Debug support is accessed using debug registers (DR0
through DR7) and model-specific registers (MSRs):
• The debug registers hold the addresses of memory and I/O locations called breakpoints.
Breakpoints are user-selected locations in a program, a data-storage area in memory, or
specific I/O ports. They are set where a programmer or system designer wishes to halt
execution of a program and examine the state of the processor by invoking debugger
software. A debug exception (#DB) is generated when a memory or I/O access is made to a
breakpoint address.
• The MSRs (which were introduced into the IA-32 architecture in the P6 family processors)
monitor branches, interrupts, and exceptions and record the addresses of the last branch,
interrupt or exception taken and the last branch taken before an interrupt or exception.
15.1 OVERVIEW OF THE DEBUGGING SUPPORT FACILITIES
The following processor facilities support debugging and performance monitoring:
• Debug exception (#DB) — Transfers program control to a debugger procedure or task
when a debug event occurs.
• Breakpoint exception (#BP) — Transfers program control to a debugger procedure or
task when an INT 3 instruction is executed.
• Breakpoint-address registers (DR0 through DR3) — Specifies the addresses of up to 4
breakpoints.
• Debug status register (DR6) — Reports the conditions that were in effect when a debug
or breakpoint exception was generated.
• Debug control register (DR7) — Specifies the forms of memory or I/O access that cause
breakpoints to be generated.
• T (trap) flag, TSS — Generates a debug exception (#DB) when an attempt is made to
switch to a task with the T flag set in its TSS.
• RF (resume) flag, EFLAGS register — Suppresses multiple exceptions to the same
instruction.
• TF (trap) flag, EFLAGS register — Generates a debug exception (#DB) after every
execution of an instruction.
• Breakpoint instruction (INT 3) — Generates a breakpoint exception (#BP), which
transfers program control to the debugger procedure or task. This instruction is an
alternative way to set code breakpoints. It is especially useful when more than four
breakpoints are desired, or when breakpoints are being placed in the source code.
• Last branch recording facilities — See Section 15.5, “Last Branch, Interrupt, and
Exception Recording (Pentium 4 and Intel Xeon Processors)” and Section 15.7, “Last
Branch, Interrupt, and Exception Recording (P6 Family Processors)”.
These facilities allow a debugger to be called as a separate task or as a procedure in the context
of the current program or task. The following conditions can be used to invoke the debugger:
• Task switch to a specific task.
• Execution of the breakpoint instruction.
• Execution of any instruction.
• Execution of an instruction at a specified address.
• Read or write of a byte, word, or doubleword at a specified memory address.
• Write to a byte, word, or doubleword at a specified memory address.
• Input of a byte, word, or doubleword at a specified I/O address.
• Output of a byte, word, or doubleword at a specified I/O address.
• Attempt to change the contents of a debug register.
15.2 DEBUG REGISTERS
The eight debug registers (see Figure 15-1) control the debug operation of the processor. These
registers can be written to and read using the move to or from debug register form of the MOV
instruction. A debug register may be the source or destination operand for one of these instructions.
The debug registers are privileged resources; a MOV instruction that accesses these registers
can only be executed in real-address mode, in SMM, or in protected mode at a CPL of 0. An
attempt to read or write the debug registers from any other privilege level generates a general-protection
exception (#GP).
The primary function of the debug registers is to set up and monitor from 1 to 4 breakpoints,
numbered 0 through 3. For each breakpoint, the following information can be specified and
detected with the debug registers:
• The linear address where the breakpoint is to occur.
• The length of the breakpoint location (1, 2, or 4 bytes).
• The operation that must be performed at the address for a debug exception to be generated.
• Whether the breakpoint is enabled.
• Whether the breakpoint condition was present when the debug exception was generated.
The following paragraphs describe the functions of flags and fields in the debug registers.
15.2.1 Debug Address Registers (DR0-DR3)
Each of the debug-address registers (DR0 through DR3) holds the 32-bit linear address of a
breakpoint (see Figure 15-1). Breakpoint comparisons are made before physical address translation
occurs. The contents of debug register DR7 further specify each breakpoint condition.
Figure 15-1. Debug Registers
(The figure shows the layout of the 32-bit debug registers.)
DR7: LEN3 (bits 31-30), R/W3 (29-28), LEN2 (27-26), R/W2 (25-24), LEN1 (23-22), R/W1 (21-20),
LEN0 (19-18), R/W0 (17-16), GD (13), GE (9), LE (8), G3 (7), L3 (6), G2 (5), L2 (4), G1 (3),
L1 (2), G0 (1), L0 (0).
DR6: BT (15), BS (14), BD (13), B3 (3), B2 (2), B1 (1), B0 (0).
DR5, DR4: shown without fields (see Section 15.2.2).
DR3 through DR0: bits 31-0 hold the linear addresses of breakpoints 3 through 0.
15.2.2 Debug Registers DR4 and DR5
Debug registers DR4 and DR5 are reserved when debug extensions are enabled (when the DE
flag in control register CR4 is set), and attempts to reference the DR4 and DR5 registers cause
an invalid-opcode exception (#UD) to be generated. When debug extensions are not enabled
(when the DE flag is clear), these registers are aliased to debug registers DR6 and DR7.
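The aliasing rule can be sketched as a small Python model. It is illustrative only: the function name and the exception class are hypothetical, and the sketch assumes the aliasing is DR4 to DR6 and DR5 to DR7, in that order.

```python
class InvalidOpcode(Exception):
    """Hypothetical stand-in for an invalid-opcode exception (#UD)."""


def resolve_debug_register(index: int, cr4_de: bool) -> int:
    """Return the debug register actually accessed for a reference to DR<index>."""
    if not 0 <= index <= 7:
        raise ValueError("debug register index must be in 0..7")
    if index in (4, 5):
        if cr4_de:
            # Debug extensions enabled: DR4 and DR5 are reserved.
            raise InvalidOpcode(f"DR{index} is reserved when CR4.DE is set")
        return index + 2  # DR4 aliases DR6, DR5 aliases DR7
    return index
```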
15.2.3 Debug Status Register (DR6)
The debug status register (DR6) reports the debug conditions that were sampled at the time the
last debug exception was generated (see Figure 15-1). Updates to this register only occur when
an exception is generated. The flags in this register show the following information:
• B0 through B3 (breakpoint condition detected) flags (bits 0 through 3) — Indicates
(when set) that its associated breakpoint condition was met when a debug exception was
generated. These flags are set if the condition described for each breakpoint by the LENn,
and R/Wn flags in debug control register DR7 is true. They are set even if the breakpoint is
not enabled by the Ln and Gn flags in register DR7.
• BD (debug register access detected) flag (bit 13) — Indicates that the next instruction in
the instruction stream will access one of the debug registers (DR0 through DR7). This flag
is enabled when the GD (general detect) flag in debug control register DR7 is set. See
Section 15.2.4, “Debug Control Register (DR7)”, for further explanation of the purpose of
this flag.
• BS (single step) flag (bit 14) — Indicates (when set) that the debug exception was
triggered by the single-step execution mode (enabled with the TF flag in the EFLAGS
register). The single-step mode is the highest-priority debug exception. When the BS flag
is set, any of the other debug status bits also may be set.
• BT (task switch) flag (bit 15) — Indicates (when set) that the debug exception resulted
from a task switch where the T flag (debug trap flag) in the TSS of the target task was set
(see Section 6.2.1, “Task-State Segment (TSS)”, for the format of a TSS). There is no flag
in debug control register DR7 to enable or disable this exception; the T flag of the TSS is
the only enabling flag.
Certain debug exceptions may clear bits 0-3. The remaining contents of the DR6 register are
never cleared by the processor. To avoid confusion in identifying debug exceptions, debug
handlers should clear the register before returning to the interrupted task.
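A debug handler typically starts by decoding DR6. The Python sketch below lists which of the flags described above are set in a given DR6 value; the function name is hypothetical, and the value is assumed to have been read from the register by privileged code.

```python
# Bit positions of the DR6 flags described in this section.
DR6_FLAGS = {"B0": 0, "B1": 1, "B2": 2, "B3": 3, "BD": 13, "BS": 14, "BT": 15}


def decode_dr6(dr6: int) -> list:
    """Return the names of the DR6 flags that are set, in ascending bit order."""
    return [name for name, bit in DR6_FLAGS.items() if (dr6 >> bit) & 1]
```

A value with bits 0 and 14 set decodes to `["B0", "BS"]`, meaning a breakpoint 0 condition and a single-step condition were both detected. In line with the recommendation above, a handler modeled this way would write zero back to DR6 before returning.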
15.2.4 Debug Control Register (DR7)
The debug control register (DR7) enables or disables breakpoints and sets breakpoint conditions
(see Figure 15-1). The flags and fields in this register control the following things:
• L0 through L3 (local breakpoint enable) flags (bits 0, 2, 4, and 6) — Enable (when set)
the breakpoint condition for the associated breakpoint for the current task. When a
breakpoint condition is detected and its associated Ln flag is set, a debug exception is
generated. The processor automatically clears these flags on every task switch to avoid
unwanted breakpoint conditions in the new task.
• G0 through G3 (global breakpoint enable) flags (bits 1, 3, 5, and 7) — Enable (when
set) the breakpoint condition for the associated breakpoint for all tasks. When a breakpoint
condition is detected and its associated Gn flag is set, a debug exception is generated. The
processor does not clear these flags on a task switch, allowing a breakpoint to be enabled
for all tasks.
• LE and GE (local and global exact breakpoint enable) flags (bits 8 and 9) — (Not
supported in the P6 family processors and later IA-32 processors.) When set, these flags
cause the processor to detect the exact instruction that caused a data breakpoint condition.
For backward and forward compatibility with other IA-32 processors, Intel recommends
that the LE and GE flags be set to 1 if exact breakpoints are required.
• GD (general detect enable) flag (bit 13) — Enables (when set) debug-register protection,
which causes a debug exception to be generated prior to any MOV instruction that
accesses a debug register. When such a condition is detected, the BD flag in debug status
register DR6 is set prior to generating the exception. This condition is provided to support
in-circuit emulators. (When the emulator needs to access the debug registers, emulator
software can set the GD flag to prevent interference from the program currently executing
on the processor.) The processor clears the GD flag upon entering the debug exception
handler, to allow the handler access to the debug registers.
• R/W0 through R/W3 (read/write) fields (bits 16, 17, 20, 21, 24, 25, 28, and 29) —
Specifies the breakpoint condition for the corresponding breakpoint. The DE (debug
extensions) flag in control register CR4 determines how the bits in the R/Wn fields are
interpreted. When the DE flag is set, the processor interprets these bits as follows:
00 — Break on instruction execution only.
01 — Break on data writes only.
10 — Break on I/O reads or writes.
11 — Break on data reads or writes but not instruction fetches.
When the DE flag is clear, the processor interprets the R/Wn bits the same as for the
Intel386™ and Intel486™ processors, which is as follows:
00 — Break on instruction execution only.
01 — Break on data writes only.
10 — Undefined.
11 — Break on data reads or writes but not instruction fetches.
• LEN0 through LEN3 (Length) fields (bits 18, 19, 22, 23, 26, 27, 30, and 31) — Specify
the size of the memory location at the address specified in the corresponding breakpoint
address register (DR0 through DR3). These fields are interpreted as follows:
00 — 1-byte length
01 — 2-byte length
10 — Undefined (or 8 byte length, see note below)
11 — 4-byte length
If the corresponding R/Wn field in register DR7 is 00 (instruction execution), then the LENn
field should also be 00. The effect of using any other length is undefined. See Section 15.2.5,
“Breakpoint Field Recognition”, for further information on the use of these fields.
For Pentium 4 and Intel Xeon processors with a CPUID signature corresponding to family 15
(model 3 or 4), the breakpoint conditions permit specifying an 8-byte length on data reads and
writes with the encoding 10B in the LENn field. The encoding 10B is undefined for other IA-32
processors.
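The DR7 fields for one breakpoint can be assembled as in the following Python sketch. It only computes a register value and does not write any hardware register; the function and constant names are hypothetical, and the processor-specific 8-byte encoding (10B) is deliberately left out.

```python
RW_EXECUTE, RW_WRITE, RW_IO, RW_READ_WRITE = 0b00, 0b01, 0b10, 0b11
LEN_ENCODING = {1: 0b00, 2: 0b01, 4: 0b11}  # 8-byte encoding 10B omitted on purpose


def set_breakpoint_fields(dr7: int, n: int, rw: int, length: int,
                          local: bool = True, global_: bool = False) -> int:
    """Return dr7 with the Ln, Gn, R/Wn and LENn fields of breakpoint n replaced."""
    if not 0 <= n <= 3:
        raise ValueError("breakpoint number must be in 0..3")
    if length not in LEN_ENCODING:
        raise ValueError("length must be 1, 2, or 4 bytes")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("instruction breakpoints must use a 1-byte length")
    dr7 &= ~(0b11 << (2 * n))          # clear Ln and Gn
    dr7 &= ~(0b1111 << (16 + 4 * n))   # clear R/Wn and LENn
    dr7 |= (int(local) << (2 * n)) | (int(global_) << (2 * n + 1))
    dr7 |= rw << (16 + 4 * n)
    dr7 |= LEN_ENCODING[length] << (18 + 4 * n)
    return dr7
```

For example, `set_breakpoint_fields(0, 0, RW_WRITE, 4)` yields D0001H: L0 set, R/W0 = 01, and LEN0 = 11.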
15.2.5 Breakpoint Field Recognition
The breakpoint address registers (debug registers DR0 through DR3) and the LENn fields for
each breakpoint define a range of sequential byte addresses for a data or I/O breakpoint. The
LENn fields permit specification of a 1-, 2-, 4-, or 8-byte range beginning at the linear address
specified in the corresponding debug register (DRn). Two-byte ranges must be aligned on word
boundaries and 4-byte ranges must be aligned on doubleword boundaries. I/O breakpoint
addresses are zero extended from 16 to 32 bits for purposes of comparison with the breakpoint
address in the selected debug register. These requirements are enforced by the processor; it uses
the LENn field bits to mask the lower address bits in the debug registers. Unaligned data or I/O
breakpoint addresses do not yield the expected results.
A data breakpoint for reading or writing data is triggered if any of the bytes participating in an
access is within the range defined by a breakpoint address register and its LENn field. Table 15-1
gives an example setup of the debug registers and the data accesses that would subsequently trap
or not trap on the breakpoints.
A data breakpoint for an unaligned operand can be constructed using two breakpoints, where
each breakpoint is byte-aligned, and the two breakpoints together cover the operand. These
breakpoints generate exceptions only for the operand, not for any neighboring bytes.
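The matching rule for data breakpoints can be sketched in Python as an overlap test between the accessed bytes and the breakpoint's aligned range. This is a simplified model with hypothetical function names; it covers only data breakpoints (R/Wn = 01 or 11) and models alignment by masking the low address bits according to the length.

```python
def breakpoint_range(address: int, length: int) -> tuple:
    """Return (first, last) byte covered by a breakpoint of 1, 2, or 4 bytes.
    The low address bits are masked according to the length."""
    first = address & ~(length - 1)
    return first, first + length - 1


def data_access_traps(bp_address: int, bp_length: int, write_only: bool,
                      access_address: int, access_length: int,
                      is_write: bool) -> bool:
    """True if any byte of the access falls inside the breakpoint range and
    the access type matches the breakpoint condition."""
    if write_only and not is_write:
        return False  # R/Wn = 01 ignores reads
    first, last = breakpoint_range(bp_address, bp_length)
    access_last = access_address + access_length - 1
    return access_address <= last and first <= access_last
```

Using the DR2 setup from Table 15-1 (address B0002H, 2 bytes, read/write), a 4-byte access at B0001H traps because it touches B0002H and B0003H, while a 2-byte access at B0000H does not.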
Instruction breakpoint addresses must have a length specification of 1 byte (the LENn field is
set to 00). The behavior of code breakpoints for other operand sizes is undefined. The processor
recognizes an instruction breakpoint address only when it points to the first byte of an instruction.
If the instruction has any prefixes, the breakpoint address must point to the first prefix.
15.2.6 Debug Registers and Intel EM64T
For IA-32 processors that support Intel EM64T, debug registers DR0–DR7 are 64 bits. In 16-bit
modes or 32-bit modes (including protected mode and compatibility mode), writes to a debug
register fill the upper 32 bits with zeros. Reads from a debug register return the lower 32 bits. In
64-bit mode, MOV DRn instructions read or write all 64 register bits. Operand-size prefixes are
ignored.
In 64-bit mode, the upper 32 bits of DR6 and DR7 are reserved and must be written with zeros.
Writing 1 to any of the upper 32 bits results in a #GP(0) exception.
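The write rules above can be modeled as follows. This Python sketch is illustrative only: the function name and exception class are hypothetical, and it models just the value that ends up in the 64-bit register, not the MOV instruction itself.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class GeneralProtection(Exception):
    """Hypothetical stand-in for a #GP(0) exception."""


def write_debug_register(index: int, value: int, in_64bit_mode: bool) -> int:
    """Return the 64-bit register contents after a write to DR<index>."""
    if not in_64bit_mode:
        # 16-bit and 32-bit modes: the upper 32 bits are filled with zeros.
        return value & MASK32
    if index in (6, 7) and (value >> 32) & MASK32:
        # The upper 32 bits of DR6 and DR7 are reserved and must be zero.
        raise GeneralProtection(f"reserved upper bits set in DR{index}")
    return value & MASK64
```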
All 64 bits of DR0–DR3 are writable by software. However, MOV DRn instructions do not
check that addresses written to DR0–DR3 are in the linear-address limits of a processor implementation
(address matching is supported only on valid addresses generated by the processor
implementation). Breakpoint conditions for 8-byte memory reads and writes are supported in all
modes (see Section 15.2.4 for applicability of the encoded value for 8-byte length for fields
LEN0 through LEN3).
15.3 DEBUG EXCEPTIONS
The IA-32 processors dedicate two interrupt vectors to handling debug exceptions: vector 1
(debug exception, #DB) and vector 3 (breakpoint exception, #BP). The following sections
describe how these exceptions are generated and typical exception handler operations for
handling these exceptions.
Figure 15-2. DR6 and DR7 Layout on IA-32 Processors Supporting Intel EM64T
(The figure shows DR6 and DR7 as 64-bit registers: bits 63 through 32 sit above bits 31 through 0,
which keep the same flag and field layout as in Figure 15-1.)
15.3.1 Debug Exception (#DB)—Interrupt Vector 1
The debug-exception handler is usually a debugger program or is part of a larger software
system. The processor generates a debug exception for any of several conditions. The debugger
can check flags in the DR6 and DR7 registers to determine which condition caused the exception
and which other conditions might also apply. Table 15-2 shows the states of these flags
following the generation of each kind of breakpoint condition.
Instruction-breakpoint and general-detect conditions (see Section 15.3.1.3, “General-Detect
Exception Condition”) result in faults; other debug-exception conditions result in traps. The
debug exception may report either or both at one time. The following sections describe each
class of debug exception. See Chapter 5, “Interrupt 1—Debug Exception (#DB)”, for additional
information about this exception.
Table 15-1. Breakpointing Examples

Debug Register Setup

Debug Register  R/Wn                    Breakpoint Address  LENn
--------------  ----------------------  ------------------  -------------------
DR0             R/W0 = 11 (Read/Write)  A0001H              LEN0 = 00 (1 byte)
DR1             R/W1 = 01 (Write)       A0002H              LEN1 = 00 (1 byte)
DR2             R/W2 = 11 (Read/Write)  B0002H              LEN2 = 01 (2 bytes)
DR3             R/W3 = 01 (Write)       C0000H              LEN3 = 11 (4 bytes)

Data Accesses

Operation                         Address  Access Length (In Bytes)
--------------------------------  -------  ------------------------
Data operations that trap
- Read or write                   A0001H   1
- Read or write                   A0001H   2
- Write                           A0002H   1
- Write                           A0002H   2
- Read or write                   B0001H   4
- Read or write                   B0002H   1
- Read or write                   B0002H   2
- Write                           C0000H   4
- Write                           C0001H   2
- Write                           C0003H   1
Data operations that do not trap
- Read or write                   A0000H   1
- Read                            A0002H   1
- Read or write                   A0003H   4
- Read or write                   B0000H   2
- Read                            C0000H   2
- Read or write                   C0004H   4
15.3.1.1 Instruction-Breakpoint Exception Condition
The processor reports an instruction breakpoint when it attempts to execute an instruction at an
address specified in a breakpoint-address register (DR0 through DR3) that has been set up to
detect instruction execution (R/W flag is set to 0). Upon reporting the instruction breakpoint, the
processor generates a fault-class debug exception (#DB) before it executes the target instruction
for the breakpoint.
Instruction breakpoints are the highest priority debug exceptions. They are serviced before any
other exceptions detected during the decoding or execution of an instruction. Note, however,
that if a code instruction breakpoint is placed on an instruction located immediately after a POP
SS/MOV SS instruction, it may not be triggered. In most situations, POP SS/MOV SS will
inhibit such interrupts (see “MOV-Move” and “POP-Pop a Value from the Stack” in the IA-32
Intel Architecture Software Developer’s Manual, Volumes 2A & 2B).
Because the debug exception for an instruction breakpoint is generated before the instruction is
executed, if the instruction breakpoint is not removed by the exception handler, the processor
will detect the instruction breakpoint again when the instruction is restarted and generate another
debug exception. To prevent looping on an instruction breakpoint, the IA-32 architecture
provides the RF flag (resume flag) in the EFLAGS register (see Section 2.3, “System Flags and
Fields in the EFLAGS Register”). When the RF flag is set, the processor ignores instruction
breakpoints.
All IA-32 processors manage the RF flag as follows. The processor sets the RF flag automatically
prior to calling an exception handler for any fault-class exception except a debug exception
that was generated in response to an instruction breakpoint. For debug exceptions resulting
from instruction breakpoints, the processor does not set the RF flag prior to calling the debug
exception handler. The debug exception handler then has the option of disabling the instruction
breakpoint or setting the RF flag in the EFLAGS image on the stack. If the RF flag in the
EFLAGS image is set when the processor returns from the exception handler, it is copied into
the RF flag in the EFLAGS register by the IRETD or task switch instruction that causes the
return. The processor then ignores instruction breakpoints for the duration of the next instruction.
(Note that the POPF, POPFD, and IRET instructions do not transfer the RF image into the
EFLAGS register.) Setting the RF flag does not prevent other types of debug-exception conditions
(such as I/O or data breakpoints) from being detected, nor does it prevent non-debug
exceptions from being generated. After the instruction is successfully executed, the processor
clears the RF flag in the EFLAGS register, except after an IRETD instruction or after a JMP,
CALL, or INT n instruction that causes a task switch.
Note that the processor also does not set the RF flag when calling exception or interrupt handlers
for trap-class exceptions, for hardware interrupts, or for software-generated interrupts.
For the Pentium processor, when an instruction breakpoint coincides with another fault-type
exception (such as a page fault), the processor may generate one spurious debug exception after
the second exception has been handled, even though the debug exception handler set the RF flag
in the EFLAGS image. To prevent this spurious exception with Pentium processors, all fault-class
exception handlers should set the RF flag in the EFLAGS image.

Table 15-2. Debug Exception Conditions

Debug or Breakpoint Condition              DR6 Flags Tested           DR7 Flags Tested  Exception Class
-----------------------------------------  -------------------------  ----------------  ---------------
Single-step trap                           BS = 1                                       Trap
Instruction breakpoint, at addresses       Bn = 1 and (Gn or Ln = 1)  R/Wn = 0          Fault
  defined by DRn and LENn
Data write breakpoint, at addresses        Bn = 1 and (Gn or Ln = 1)  R/Wn = 1          Trap
  defined by DRn and LENn
I/O read or write breakpoint, at           Bn = 1 and (Gn or Ln = 1)  R/Wn = 2          Trap
  addresses defined by DRn and LENn
Data read or write (but not instruction    Bn = 1 and (Gn or Ln = 1)  R/Wn = 3          Trap
  fetches), at addresses defined by DRn
  and LENn
General detect fault, resulting from an    BD = 1                                       Fault
  attempt to modify debug registers
  (usually in conjunction with in-circuit
  emulation)
Task switch                                BT = 1                                       Trap
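Table 15-2 can be read as a decision procedure for a debug handler. The Python sketch below applies the table's tests to DR6 and DR7 values and reports each matching condition with its exception class. The function name and the returned strings are hypothetical; the register values are assumed to have been read by privileged code.

```python
# R/Wn encoding mapped to (condition, exception class), as listed in Table 15-2.
RW_CONDITIONS = {
    0: ("instruction breakpoint", "fault"),
    1: ("data write breakpoint", "trap"),
    2: ("I/O read or write breakpoint", "trap"),
    3: ("data read or write breakpoint", "trap"),
}


def debug_conditions(dr6: int, dr7: int) -> list:
    """Return (condition, exception class) pairs that Table 15-2 would report."""
    found = []
    if (dr6 >> 14) & 1:                      # BS
        found.append(("single-step", "trap"))
    for n in range(4):
        enabled = (dr7 >> (2 * n)) & 0b11    # Ln or Gn
        if (dr6 >> n) & 1 and enabled:       # Bn set and breakpoint enabled
            name, kind = RW_CONDITIONS[(dr7 >> (16 + 4 * n)) & 0b11]
            found.append((f"{name} {n}", kind))
    if (dr6 >> 13) & 1:                      # BD
        found.append(("general detect", "fault"))
    if (dr6 >> 15) & 1:                      # BT
        found.append(("task switch", "trap"))
    return found
```

Note that a Bn flag whose breakpoint is not enabled through Ln or Gn is skipped, which matches the table's combined test of Bn together with Gn or Ln.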
15.3.1.2 Data Memory and I/O Breakpoint Exception Conditions
Data memory and I/O breakpoints are reported when the processor attempts to access a memory
or I/O address specified in a breakpoint-address register (DR0 through DR3) that has been set
up to detect data or I/O accesses (R/W flag is set to 1, 2, or 3). The processor generates the exception
after it executes the instruction that made the access, so these breakpoint conditions cause
a trap-class exception to be generated.
Because data breakpoints are traps, the original data is overwritten before the trap exception is
generated. If a debugger needs to save the contents of a write breakpoint location, it should save
the original contents before setting the breakpoint. The handler can report the saved value after
the breakpoint is triggered. The address in the debug registers can be used to locate the new
value stored by the instruction that triggered the breakpoint.
The Intel486 and later IA-32 processors ignore the GE and LE flags in DR7. In the Intel386
processor, exact data breakpoint matching does not occur unless it is enabled by setting the LE
and/or the GE flags.
The P6 family processors, however, are unable to report data breakpoints exactly for the REP
MOVS and REP STOS instructions until the completion of the iteration after the iteration in
which the breakpoint occurred.
For repeated INS and OUTS instructions that generate an I/O-breakpoint debug exception, the
processor generates the exception after the completion of the first iteration. Repeated INS and
OUTS instructions generate an I/O-breakpoint debug exception after the iteration in which the
memory address breakpoint location is accessed.
15.3.1.3 General-Detect Exception Condition
When the GD flag in DR7 is set, the general-detect debug exception occurs when a program
attempts to access any of the debug registers (DR0 through DR7) at the same time they are being
used by another application, such as an emulator or debugger. This additional protection feature
guarantees full control over the debug registers when required. The debug exception handler can
detect this condition by checking the state of the BD flag of the DR6 register. The processor
generates the exception before it executes the MOV instruction that accesses a debug register,
which causes a fault-class exception to be generated.
15.3.1.4 Single-Step Exception Condition
The processor generates a single-step debug exception if (while an instruction is being executed)
it detects that the TF flag in the EFLAGS register is set. The exception is a trap-class exception,
because the exception is generated after the instruction is executed. (Note that the processor does
not generate this exception after an instruction that sets the TF flag. For example, if the POPF
instruction is used to set the TF flag, a single-step trap does not occur until after the instruction
that follows the POPF instruction.)
The processor clears the TF flag before calling the exception handler. If the TF flag was set in a
TSS at the time of a task switch, the exception occurs after the first instruction is executed in the
new task.
The TF flag normally is not cleared by privilege changes inside a task. The INT n and INTO
instructions, however, do clear this flag. Therefore, software debuggers that single-step code
must recognize and emulate INT n or INTO instructions rather than executing them directly. To
maintain protection, the operating system should check the CPL after any single-step trap to see
if single stepping should continue at the current privilege level.
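The two points above (arming the trap through TF, and recognizing INT n and INTO before they
execute) can be sketched in C as follows. This is a minimal illustration, not a debugger
implementation: the helper names are hypothetical, the EFLAGS value is a saved image rather than
the live register, and instruction prefixes are not decoded. TF is bit 8 of EFLAGS; CDH and CEH
are the INT n and INTO opcodes.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_TF (1u << 8)   /* trap flag, bit 8 of EFLAGS */

/* Arm single-stepping by setting TF in a saved EFLAGS image (hypothetical helper). */
static uint32_t arm_single_step(uint32_t eflags_image)
{
    return eflags_image | EFLAGS_TF;
}

/* True if the next instruction is INT n (opcode CDH) or INTO (opcode CEH).
 * These instructions clear TF, so a stepping debugger emulates them instead
 * of executing them. Prefix bytes are not handled in this sketch. */
static bool must_emulate(const uint8_t *next_insn)
{
    return next_insn[0] == 0xCD || next_insn[0] == 0xCE;
}
```

A debugger would call must_emulate() on the bytes at the saved instruction pointer before
resuming with the armed EFLAGS image.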
The interrupt priorities guarantee that, if an external interrupt occurs, single stepping stops.
When both an external interrupt and a single-step interrupt occur together, the single-step interrupt
is processed first. This operation clears the TF flag. After saving the return address or
switching tasks, the external interrupt input is examined before the first instruction of the single-step
handler executes. If the external interrupt is still pending, then it is serviced. The external
interrupt handler does not run in single-step mode. To single step an interrupt handler, single step
an INT n instruction that calls the interrupt handler.
15.3.1.5 Task-Switch Exception Condition
The processor generates a debug exception after a task switch if the T flag of the new task's TSS
is set. This exception is generated after program control has passed to the new task, and prior to
the execution of the first instruction of that task. The exception handler can detect this condition
by examining the BT flag of the DR6 register.
Note that, if the debug exception handler is a task, the T bit of its TSS should not be set. Failure
to observe this rule will put the processor in a loop.
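A debug exception handler distinguishes the conditions described in Sections 15.3.1.1 through
15.3.1.5 by inspecting DR6. The following C sketch shows one way to do that on a saved copy of
DR6. The bit positions used (B0 through B3 in bits 0-3, BD in bit 13, BS in bit 14, BT in bit 15)
follow the DR6 layout; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define DR6_B_MASK 0x0Fu        /* B0-B3: breakpoint condition detected */
#define DR6_BD     (1u << 13)   /* debug-register access detected (GD set in DR7) */
#define DR6_BS     (1u << 14)   /* single step (TF set in EFLAGS) */
#define DR6_BT     (1u << 15)   /* task switch (T flag set in the new TSS) */

enum db_cause {
    DB_NONE           = 0,
    DB_BREAKPOINT     = 1,
    DB_GENERAL_DETECT = 2,
    DB_SINGLE_STEP    = 4,
    DB_TASK_SWITCH    = 8
};

/* Return a bitmask of db_cause values; several conditions may be reported
 * by the same debug exception. */
static unsigned classify_db(uint32_t dr6)
{
    unsigned causes = DB_NONE;

    if (dr6 & DR6_B_MASK) causes |= DB_BREAKPOINT;
    if (dr6 & DR6_BD)     causes |= DB_GENERAL_DETECT;
    if (dr6 & DR6_BS)     causes |= DB_SINGLE_STEP;
    if (dr6 & DR6_BT)     causes |= DB_TASK_SWITCH;
    return causes;
}
```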
15.3.2 Breakpoint Exception (#BP)—Interrupt Vector 3
The breakpoint exception (interrupt 3) is caused by execution of an INT 3 instruction (see
Chapter 5, “Interrupt 3—Breakpoint Exception (#BP)”). Debuggers use break exceptions in the
same way that they use the breakpoint registers; that is, as a mechanism for suspending program
execution to examine registers and memory locations. With earlier IA-32 processors, breakpoint
exceptions are used extensively for setting instruction breakpoints.
With the Intel386 and later IA-32 processors, it is more convenient to set breakpoints with the
breakpoint-address registers (DR0 through DR3). However, the breakpoint exception still is
useful for breakpointing debuggers, because the breakpoint exception can call a separate exception
handler. The breakpoint exception is also useful when it is necessary to set more breakpoints
than there are debug registers or when breakpoints are being placed in the source code of a
program under development.
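As an illustration of how a debugger can set more breakpoints than there are debug registers,
the sketch below patches the one-byte INT 3 opcode (CCH) over an instruction byte and restores it
later. It operates on an ordinary byte array standing in for the target's code; the structure and
function names are hypothetical, and a real debugger would also have to deal with page
protection and with resuming execution at the patched address.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC   /* one-byte INT 3 encoding */

struct sw_breakpoint {
    size_t  offset;   /* position of the patched byte in the code image */
    uint8_t saved;    /* original byte, restored when the breakpoint is removed */
};

/* Replace the byte at 'offset' with INT 3 and remember what was there. */
static struct sw_breakpoint bp_insert(uint8_t *code, size_t offset)
{
    struct sw_breakpoint bp = { offset, code[offset] };

    code[offset] = INT3_OPCODE;
    return bp;
}

/* Put the original byte back. */
static void bp_remove(uint8_t *code, const struct sw_breakpoint *bp)
{
    code[bp->offset] = bp->saved;
}
```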
Note that with Pentium M processors, #BPs for fast string operations are reported only on cache
line boundaries.
15.4 LAST BRANCH RECORDING OVERVIEW
The P6 family processors introduced the ability to set breakpoints on taken branches, interrupts,
and exceptions, and to single-step from one branch to the next. This capability was modified and
extended in the Pentium 4 and Intel Xeon processors to allow the logging of branch trace
messages in a branch trace store (BTS) buffer in memory. See the following sections for descriptions
of the mechanism for last branch recording:
— Section 15.5, “Last Branch, Interrupt, and Exception Recording (Pentium 4 and Intel
Xeon Processors)”
— Section 15.6, “Last Branch, Interrupt, and Exception Recording (Pentium M
Processors)”
— Section 15.7, “Last Branch, Interrupt, and Exception Recording (P6 Family
Processors)”
The IA-32 branch instructions that are tracked with the last branch recording mechanism are the
JMP, Jcc, LOOP, and CALL instructions.
15.5 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING
(PENTIUM 4 AND INTEL XEON PROCESSORS)
The Pentium 4 and Intel Xeon processors provide the following methods of recording taken
branches, interrupts and exceptions:
• Store branch records in the last branch record (LBR) stack MSRs for the most recent taken
branches, interrupts, and/or exceptions. A branch record consists of a branch-from
and a branch-to instruction address.
• Send the branch records out on the system bus as branch trace messages (BTMs).
• Log BTMs in a memory-resident branch trace store (BTS) buffer.
To support these functions, the processor provides the following MSRs:
• MSR_DEBUGCTLA MSR — Enables last branch, interrupt, and exception recording;
single-stepping on taken branches; branch trace messages (BTMs); and branch trace store
(BTS). This register is named DebugCtlMSR in the P6 family processors.
• Debug store (DS) feature flag (CPUID.1:EDX.DS[bit 21]) — Indicates that the
processor provides the debug store (DS) mechanism, which allows BTMs to be stored in a
memory-resident BTS buffer.
• CPL-qualified debug store (DS) feature flag (CPUID.1:ECX.DS-CPL[bit 4]) —
Indicates that the processor provides a CPL-qualified debug store (DS) mechanism, which
allows software to selectively skip storing BTMs, according to specified current privilege
level settings, into a memory-resident BTS buffer.
• IA32_MISC_ENABLE MSR — Indicates that the processor provides the BTS facilities.
• Last branch record (LBR) stack — The LBR stack is a circular stack that consists of
four MSRs (MSR_LASTBRANCH_0 through MSR_LASTBRANCH_3) for the
Pentium 4 and Intel Xeon processor family [CPUID family 0FH, models 0H-02H]. The
LBR stack consists of 16 MSR pairs (MSR_LASTBRANCH_0_FROM_LIP through
MSR_LASTBRANCH_15_FROM_LIP and MSR_LASTBRANCH_0_TO_LIP through
MSR_LASTBRANCH_15_TO_LIP) for the Pentium 4 and Intel Xeon processor family
[CPUID family 0FH, model 03H].
• Last branch record top-of-stack (TOS) pointer — The TOS Pointer MSR contains a
2-bit pointer (0-3) to the MSR in the LBR stack that contains the most recent branch,
interrupt, or exception recorded for the Pentium 4 and Intel Xeon processor family
[CPUID family 0FH, models 0H-02H]. This pointer becomes a 4-bit pointer (0-15) for the
Pentium 4 and Intel Xeon processor family [CPUID family 0FH, model 03H]. See also:
Table 15-3, Figure 15-3, and Section 15.5.3.
• Last exception record — See Section 15.5.7, “Last Exception Records (Pentium 4 and
Intel Xeon Processors)”.
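The two CPUID feature flags in the list above can be tested as sketched below. The functions take
the EDX and ECX values returned by CPUID with EAX = 1 as plain arguments, so the example does not
depend on how the instruction is issued; with GCC or Clang, __get_cpuid() from <cpuid.h> is one
way to obtain them. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_EDX_DS     (1u << 21)   /* debug store (DS) feature flag */
#define CPUID1_ECX_DS_CPL (1u << 4)    /* CPL-qualified debug store feature flag */

/* edx: value of EDX returned by CPUID with EAX = 1. */
static bool has_debug_store(uint32_t edx)
{
    return (edx & CPUID1_EDX_DS) != 0;
}

/* ecx: value of ECX returned by CPUID with EAX = 1. */
static bool has_cpl_qualified_ds(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_DS_CPL) != 0;
}
```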
15.5.1 CPL-Qualified Last Branch Recording Mechanism
CPL-qualified last branch recording is available on a subset of the IA-32 processors that
support the last branch recording mechanism. Software can detect support for CPL-qualified last
branch recording by executing CPUID with EAX = 1 and examining the returned value of bit 4 of
ECX.
The CPL-qualified last branch recording mechanism is similar to that described in Section 15.5,
Section 15.5.2, and Section 15.5.8. It also sends the branch records out on the system bus as
branch trace messages (BTMs), but system software can selectively specify CPL qualification
so that BTMs associated with the specified privilege level are not stored. Two bit fields,
BTS_OFF_USR and BTS_OFF_OS, are provided in the debug control register to specify the CPL of
those BTMs that will not be logged in the BTS buffer.
Table 15-3. LBR MSR Stack Structure for the Pentium 4 and Intel Xeon Processor Family
LBR MSRs for Family 0FH, Models 0H-02H; MSRs at locations 1DBH-1DEH.
LBR MSR                       Decimal Value of TOS Pointer in MSR_LASTBRANCH_TOS (bits 0-1)
MSR_LASTBRANCH_0              0
MSR_LASTBRANCH_1              1
MSR_LASTBRANCH_2              2
MSR_LASTBRANCH_3              3
LBR MSRs for Family 0FH, Model 03H; MSRs at locations 680H-68FH.
LBR MSR                       Decimal Value of TOS Pointer in MSR_LASTBRANCH_TOS (bits 0-3)
MSR_LASTBRANCH_0_FROM_LIP     0
MSR_LASTBRANCH_1_FROM_LIP     1
MSR_LASTBRANCH_2_FROM_LIP     2
MSR_LASTBRANCH_3_FROM_LIP     3
MSR_LASTBRANCH_4_FROM_LIP     4
MSR_LASTBRANCH_5_FROM_LIP     5
MSR_LASTBRANCH_6_FROM_LIP     6
MSR_LASTBRANCH_7_FROM_LIP     7
MSR_LASTBRANCH_8_FROM_LIP     8
MSR_LASTBRANCH_9_FROM_LIP     9
MSR_LASTBRANCH_10_FROM_LIP    10
MSR_LASTBRANCH_11_FROM_LIP    11
MSR_LASTBRANCH_12_FROM_LIP    12
MSR_LASTBRANCH_13_FROM_LIP    13
MSR_LASTBRANCH_14_FROM_LIP    14
MSR_LASTBRANCH_15_FROM_LIP    15
LBR MSRs for Family 0FH, Model 03H; MSRs at locations 6C0H-6CFH.
LBR MSR                       Decimal Value of TOS Pointer in MSR_LASTBRANCH_TOS (bits 0-3)
MSR_LASTBRANCH_0_TO_LIP       0
MSR_LASTBRANCH_1_TO_LIP       1
MSR_LASTBRANCH_2_TO_LIP       2
MSR_LASTBRANCH_3_TO_LIP       3
MSR_LASTBRANCH_4_TO_LIP       4
MSR_LASTBRANCH_5_TO_LIP       5
MSR_LASTBRANCH_6_TO_LIP       6
MSR_LASTBRANCH_7_TO_LIP       7
MSR_LASTBRANCH_8_TO_LIP       8
MSR_LASTBRANCH_9_TO_LIP       9
MSR_LASTBRANCH_10_TO_LIP      10
MSR_LASTBRANCH_11_TO_LIP      11
MSR_LASTBRANCH_12_TO_LIP      12
MSR_LASTBRANCH_13_TO_LIP      13
MSR_LASTBRANCH_14_TO_LIP      14
MSR_LASTBRANCH_15_TO_LIP      15
NOTE
The initial implementation of BTS_OFF_USR and BTS_OFF_OS in
MSR_DEBUGCTLA is shown in Figure 15-4. The BTS_OFF_USR and
BTS_OFF_OS fields may be implemented in other model-specific debug
control registers at different locations.
The following sections describe the MSR_DEBUGCTLA MSR and the various last branch
recording mechanisms. See Appendix B, Model-Specific Registers (MSRs), for a detailed
description of each of the last branch recording MSRs described above.
15.5.2 MSR_DEBUGCTLA MSR (Pentium 4 and Intel Xeon
Processors)
The MSR_DEBUGCTLA MSR enables and disables the various last branch recording mechanisms
described in the previous section. This register can be written to using the WRMSR
instruction, when operating at privilege level 0 or when in real-address mode. A protected-mode
operating system procedure is required to provide user access to this register. Figure 15-4 shows
the flags in the MSR_DEBUGCTLA MSR. The functions of these flags are as follows:
• LBR (last branch/interrupt/exception) flag (bit 0) — When set, the processor records a
running trace of the most recent branches, interrupts, and/or exceptions taken by the
processor (prior to a debug exception being generated) in the last branch record (LBR)
stack. Each branch, interrupt, or exception is recorded as a 64-bit branch record (see
Section 15.5.3, “LBR Stack (Pentium 4 and Intel Xeon Processors)”). The processor clears
this flag whenever a debug exception is generated (for example, when an instruction or
data breakpoint or a single-step trap occurs).
Figure 15-3. MSR_LASTBRANCH_TOS MSR Layout for the Pentium 4 and Intel Xeon Processor Family
(Figure not reproduced. Family 0FH, Models 01-02H: the top-of-stack pointer (TOS) occupies
bits 1:0 and bits 31:2 are reserved. Family 0FH, Model 03H+: the TOS occupies bits 3:0 and
bits 31:4 are reserved.)
• BTF (single-step on branches) flag (bit 1) — When set, the processor treats the TF flag
in the EFLAGS register as a “single-step on branches” flag rather than a “single-step on
instructions” flag. This mechanism allows single-stepping the processor on taken
branches, interrupts, and exceptions. See Section 15.5.5, “Single-Stepping on Branches,
Exceptions, and Interrupts” for more information about the BTF flag.
• TR (trace message enable) flag (bit 2) — When set, branch trace messages are enabled.
Thereafter, when the processor detects a taken branch, interrupt, or exception, it sends the
branch record out on the system bus as a branch trace message (BTM). See Section 15.5.6,
“Branch Trace Messages” for more information about the TR flag.
• BTS (branch trace store) flag (bit 3) — When set, enables the BTS facilities to log BTMs
to a memory-resident BTS buffer that is part of the DS save area (see Section 15.10.5, “DS
Save Area”).
• BTINT (branch trace interrupt) flag (bit 4) — When set, the BTS facilities generate an
interrupt when the BTS buffer is full. When clear, BTMs are logged to the BTS buffer in a
circular fashion. (See Section 15.5.8, “Branch Trace Store (BTS)” for a description of this
mechanism.)
• BTS_OFF_OS (disable ring 0 branch trace store) flag (bit 5) — When set, enables the
BTS facilities to skip logging CPL_0 BTMs to the memory-resident BTS buffer (see
Section 15.5.1, “CPL-Qualified Last Branch Recording Mechanism”).
• BTS_OFF_USR (disable non-ring-0 branch trace store) flag (bit 6) — When set, enables the
BTS facilities to skip logging non-CPL_0 BTMs to the memory-resident BTS buffer (see
Section 15.5.1, “CPL-Qualified Last Branch Recording Mechanism”).
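The flag positions listed above translate directly into bit masks. The sketch below defines them
and shows a read-modify-write helper that leaves the reserved bits (31:7) untouched. The macro
and function names are hypothetical; the value is assumed to come from RDMSR and to be written
back with WRMSR at privilege level 0, neither of which is shown here.

```c
#include <stdint.h>

#define DEBUGCTLA_LBR         (1u << 0)   /* last branch/interrupt/exception */
#define DEBUGCTLA_BTF         (1u << 1)   /* single-step on branches */
#define DEBUGCTLA_TR          (1u << 2)   /* trace messages enable */
#define DEBUGCTLA_BTS         (1u << 3)   /* branch trace store */
#define DEBUGCTLA_BTINT       (1u << 4)   /* branch trace interrupt */
#define DEBUGCTLA_BTS_OFF_OS  (1u << 5)   /* skip CPL_0 BTMs */
#define DEBUGCTLA_BTS_OFF_USR (1u << 6)   /* skip non-CPL_0 BTMs */
#define DEBUGCTLA_DEFINED     0x7Fu       /* bits 6:0; bits 31:7 are reserved */

/* Compute a new register value from the old one, setting and clearing only
 * defined flags so that reserved bits keep whatever value they had. */
static uint32_t debugctla_update(uint32_t old, uint32_t set, uint32_t clear)
{
    set   &= DEBUGCTLA_DEFINED;
    clear &= DEBUGCTLA_DEFINED;
    return (old & ~clear) | set;
}
```

For example, debugctla_update(old, DEBUGCTLA_LBR | DEBUGCTLA_TR, 0) enables LBR recording and
branch trace messages while preserving every other bit of the old value.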
Figure 15-4. MSR_DEBUGCTLA MSR for Pentium 4 and Intel Xeon Processors
(Figure not reproduced. Bit 0: LBR, last branch/interrupt/exception. Bit 1: BTF, single-step on
branches. Bit 2: TR, trace messages enable. Bit 3: BTS, branch trace store. Bit 4: BTINT, branch
trace interrupt. Bit 5: BTS_OFF_OS, disable storing CPL_0 BTS. Bit 6: BTS_OFF_USR, disable
storing non-CPL_0 BTS. Bits 31:7: reserved.)
15.5.3 LBR Stack (Pentium 4 and Intel Xeon Processors)
The LBR stack is made up of LBR MSRs that are treated by the processor as a circular stack.
The TOS pointer (MSR_LASTBRANCH_TOS MSR) points to the LBR MSR (or LBR MSR
pair) that contains the most recent (last) branch record placed on the stack. Prior to placing a new
branch record on the stack, the TOS is incremented by 1. When the TOS pointer reaches its
maximum value, it wraps around to 0. See Table 15-3 and Figure 15-3.
The registers in the LBR MSR stack and the MSR_LASTBRANCH_TOS MSR are read-only
and can be read using the RDMSR instruction.
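Because the stack is circular, software that wants the records from newest to oldest has to walk
backwards from the TOS pointer with wraparound. The following sketch computes the stack slot and
the MSR address for each step, using the MSR locations given in Table 15-3. The function names
are hypothetical, treating every model above 02H as having 16 entries is an assumption, and the
RDMSR itself is not shown.

```c
#include <stdint.h>

#define LBR_BASE_MODELS_0_2   0x1DBu   /* MSR_LASTBRANCH_0 (models 0H-02H) */
#define LBR_FROM_BASE_MODEL_3 0x680u   /* MSR_LASTBRANCH_0_FROM_LIP (model 03H) */
#define LBR_TO_BASE_MODEL_3   0x6C0u   /* MSR_LASTBRANCH_0_TO_LIP (model 03H) */

/* Number of LBR stack entries for a family 0FH processor.
 * Assumption: models above 02H are treated like model 03H. */
static unsigned lbr_depth(unsigned model)
{
    return model <= 2 ? 4u : 16u;
}

/* Slot of the record that is 'age' records older than the most recent one
 * (age 0 is the slot the TOS pointer designates). Requires tos < depth. */
static unsigned lbr_slot(unsigned tos, unsigned age, unsigned depth)
{
    return (tos + depth - (age % depth)) % depth;
}

/* MSR address holding the "from" address of a slot. For models 0H-02H the
 * same MSR also holds the "to" address. */
static uint32_t lbr_from_msr(unsigned model, unsigned slot)
{
    return (model <= 2 ? LBR_BASE_MODELS_0_2 : LBR_FROM_BASE_MODEL_3) + slot;
}

/* MSR address holding the "to" address of a slot. */
static uint32_t lbr_to_msr(unsigned model, unsigned slot)
{
    return (model <= 2 ? LBR_BASE_MODELS_0_2 : LBR_TO_BASE_MODEL_3) + slot;
}
```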
Figure 15-5 shows the layout of a branch record in an LBR MSR (or MSR pair). Each branch
record consists of two linear addresses, which represent the “from” and “to” instruction pointers
for a branch, interrupt, or exception. The contents of the from and to addresses differ, depending
on the source of the branch:
• Taken branch — If the record is for a taken branch, the “from” address is the address of
the branch instruction and the “to” address is the target instruction of the branch.
• Interrupt — If the record is for an interrupt, the “from” address is the return instruction
pointer (RIP) saved for the interrupt and the “to” address is the address of the first
instruction in the interrupt handler routine. The RIP is the linear address of the next
instruction to be executed upon returning from the interrupt handler.
• Exception — If the record is for an exception, the “from” address is the linear address of
the instruction that caused the exception to be generated and the “to” address is the address
of the first instruction in the exception handler routine.
Additional information is saved if an exception or interrupt occurs in conjunction with a branch
instruction. If a branch instruction generates a trap type exception, two branch records are stored
in the LBR stack: a branch record for the branch instruction followed by a branch record for the
exception.
Figure 15-5. LBR MSR Branch Record Layout for the Pentium 4 and Intel Xeon Processor Family
(Figure not reproduced. CPUID Family 0FH, Models 0H-02H: each of MSR_LASTBRANCH_0 through
MSR_LASTBRANCH_3 holds the To Linear Address in bits 63:32 and the From Linear Address in
bits 31:0. CPUID Family 0FH, Model 03H-04H: MSR_LASTBRANCH_0_FROM_LIP through
MSR_LASTBRANCH_15_FROM_LIP hold the From Linear Address in bits 31:0, and
MSR_LASTBRANCH_0_TO_LIP through MSR_LASTBRANCH_15_TO_LIP hold the To Linear Address in
bits 31:0; bits 63:32 of each are reserved.)
If a branch instruction generates a fault type exception, a branch record is stored in the LBR
stack for the exception, but not for the branch instruction itself. Here, the location of the branch
instruction can be determined from the CS and EIP registers in the exception stack frame that is
written by the processor onto the stack.
If a branch instruction is immediately followed by an interrupt, a branch record is stored in the
LBR stack for the branch instruction followed by a record for the interrupt.
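For the four-entry LBR stack (family 0FH, models 0H-02H), one 64-bit MSR carries a complete
branch record. The sketch below splits such a value into its two linear addresses. It assumes the
Figure 15-5 layout with the "from" address in bits 31:0 and the "to" address in bits 63:32; the
structure and function names are hypothetical.

```c
#include <stdint.h>

struct branch_record {
    uint32_t from;   /* linear address the control transfer came from */
    uint32_t to;     /* linear address the control transfer went to */
};

/* Split the 64-bit value read from MSR_LASTBRANCH_n into a branch record.
 * Assumed layout: from = bits 31:0, to = bits 63:32. */
static struct branch_record decode_lbr32(uint64_t msr_value)
{
    struct branch_record r;

    r.from = (uint32_t)(msr_value & 0xFFFFFFFFu);
    r.to   = (uint32_t)(msr_value >> 32);
    return r;
}
```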
15.5.3.1 LBR Stack and Intel EM64T
For IA-32 processors that support Intel EM64T, the LBR MSRs are 64 bits wide. If IA-32e mode is
disabled, only the lower 32 bits are accessible. If IA-32e mode is enabled, the processor writes
64-bit values into the MSR. In 64-bit mode, last branch records store 64-bit addresses; in
compatibility mode, the upper 32 bits of last branch records are cleared.
15.5.4 Monitoring Branches, Exceptions, and Interrupts
(Pentium 4 and Intel Xeon Processors)
When the LBR flag in the MSR_DEBUGCTLA MSR is set, the processor automatically begins
recording branch records for taken branches, interrupts, and exceptions (except for debug exceptions)
in the LBR stack MSRs.
When the processor generates a debug exception (#DB), it automatically clears the LBR flag
before executing the exception handler. This action does not clear previously stored LBR stack
MSRs. The branch records for the last four taken branches, interrupts, and/or exceptions are
retained for analysis.
A debugger can use the linear addresses in the LBR stack to reset breakpoints in the
breakpoint-address registers (DR0 through DR3). This allows a backward trace from the
manifestation of a particular bug toward its source.
If the LBR flag is cleared and TR flag in the MSR_DEBUGCTLA MSR remains set, the
processor will continue to update LBR stack MSRs. This is because BTM information must be
generated from entries in the LBR stack (see 14.5.5). A #DB does not automatically clear the
TR flag.
15.5.5 Single-Stepping on Branches, Exceptions, and Interrupts
When software sets both the BTF flag in the MSR_DEBUGCTLA MSR and the TF flag in the
EFLAGS register, the processor generates a single-step debug exception the next time it takes a
branch, services an interrupt, or generates an exception. This mechanism allows the debugger to
single-step on control transfers caused by branches, interrupts, and exceptions. This “control-flow
single stepping” helps isolate a bug to a particular block of code before instruction single-stepping
further narrows the search. If the BTF flag is set when the processor generates a debug
exception, the processor clears the BTF flag along with the TF flag. The debugger must reset the
BTF and TF flags before resuming program execution to continue control-flow single stepping.
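The re-arming requirement can be pictured with a small model. In the sketch below a structure
stands in for the MSR_DEBUGCTLA value and the saved EFLAGS image; one function mimics what the
processor does when it delivers the debug exception, and the other is what a debugger would do
before resuming. All names are hypothetical, and real code would use RDMSR/WRMSR and the
exception stack frame instead of this structure.

```c
#include <stdint.h>

#define DEBUGCTLA_BTF (1u << 1)   /* single-step on branches */
#define EFLAGS_TF     (1u << 8)   /* trap flag */

struct step_state {
    uint32_t debugctla;   /* stand-in for the MSR_DEBUGCTLA value */
    uint32_t eflags;      /* stand-in for the saved EFLAGS image */
};

/* Model of the processor's behavior on a debug exception: BTF is cleared
 * along with TF. */
static void model_debug_exception(struct step_state *s)
{
    s->debugctla &= ~DEBUGCTLA_BTF;
    s->eflags    &= ~EFLAGS_TF;
}

/* What the debugger does before resuming: set both flags again so that the
 * next taken branch, interrupt, or exception raises another debug exception. */
static void rearm_branch_step(struct step_state *s)
{
    s->debugctla |= DEBUGCTLA_BTF;
    s->eflags    |= EFLAGS_TF;
}
```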
15.5.6 Branch Trace Messages
Setting the TR flag in the MSR_DEBUGCTLA MSR enables branch trace messages (BTMs).
Thereafter, when the processor detects a branch, exception, or interrupt, it sends a branch record
out on the system bus as a BTM. A debugging device that is monitoring the system bus can read
these messages and synchronize operations with taken branch, interrupt, and exception events.
When interrupts or exceptions occur in conjunction with a taken branch, additional BTMs are
sent out on the bus, as described in Section 15.5.4, “Monitoring Branches, Exceptions, and Interrupts
(Pentium 4 and Intel Xeon Processors)”.
Setting this flag (BTS) alone greatly reduces the performance of the processor. The CPL-qualified
last branch recording mechanism (see Section 15.5.1) can help mitigate the performance impact
of logging branch trace messages.
Unlike the P6 family processors, the Pentium 4 and Intel Xeon processors can collect branch
records in the LBR stack MSRs while at the same time sending BTMs out on the system bus
when both the TR and LBR flags are set in the MSR_DEBUGCTLA MSR.
15.5.7 Last Exception Records (Pentium 4 and Intel Xeon
Processors)
The Pentium 4 and Intel Xeon processors provide two 32-bit MSRs (the MSR_LER_TO_LIP
and the MSR_LER_FROM_LIP MSRs) that duplicate the functions of the LastExceptionToIP
and LastExceptionFromIP MSRs found in the P6 family processors. The MSR_LER_TO_LIP
and MSR_LER_FROM_LIP MSRs contain a branch record for the last branch that the processor
took prior to an exception or interrupt being generated.
15.5.7.1 Last Exception Records and Intel EM64T
For IA-32 processors that support Intel EM64T, the MSRs that store last exception records are
64 bits wide. If IA-32e mode is disabled, only the lower 32 bits are accessible. If IA-32e mode is
enabled, the processor writes 64-bit values into the MSR. In 64-bit mode, last exception records
store 64-bit addresses; in compatibility mode, the upper 32 bits of last exception records are
cleared.
15.5.8 Branch Trace Store (BTS)
A trace of taken branches, interrupts, and exceptions is useful for debugging code by providing
a method of determining the decision path taken to reach a particular code location. The Pentium
4 and Intel Xeon processors provide a mechanism for capturing records of taken branches, interrupts,
and exceptions and saving them in the last branch record (LBR) stack MSRs and/or
sending them out onto the system bus as BTMs. The branch trace store (BTS) mechanism
provides the additional capability of saving the branch records in a memory-resident BTS buffer,
which is part of the DS save area (see Section 15.10.5, “DS Save Area”). The BTS buffer can
be configured to be circular so that the most recent branch records are always available or it can
be configured to generate an interrupt when the buffer is nearly full so that all the branch records
can be saved.
15.5.8.1 Detection of the BTS Facilities
The DS feature flag (bit 21) returned by the CPUID instruction indicates (when set) the availability
of the DS mechanism in the processor, which supports the BTS (and PEBS) facilities.
When this bit is set, the following BTS facilities are available:
• The BTS_UNAVAILABLE flag in the IA32_MISC_ENABLE MSR indicates (when
clear) the availability of the BTS facilities, including the ability to set the BTS and BTINT
bits in the MSR_DEBUGCTLA MSR.
• The IA32_DS_AREA MSR can be programmed to point to the DS save area.
15.5.8.2 Setting Up the DS Save Area
To save branch records with the BTS buffer, the DS save area must first be set up in memory as
described in the following procedure. See Section 15.5.8.3, “Setting Up the BTS Buffer” and
Section 15.10.8.3, “Setting Up the PEBS Buffer” for instructions for setting up a BTS buffer
and/or a PEBS buffer, respectively, in the DS save area:
1. Create the DS buffer management information area in memory (see Section 15.10.5, for
layout information and Section 15.10.5.1). See additional notes in this section.
2. Write the base linear address of the DS buffer management area into the IA32_DS_AREA
MSR.
3. Set up the performance counter entry in the xAPIC LVT for fixed delivery and edge
sensitive. See Section 8.5.1, “Local Vector Table”.
4. Establish an interrupt handler in the IDT for the vector associated with the performance
counter entry in the xAPIC LVT.
5. Write an interrupt service routine to handle the interrupt (see Section 15.5.8.5, “Writing the
DS Interrupt Service Routine”).
The following restrictions should be applied to the DS save area.
• The three DS save area sections should be allocated from a non-paged pool, and marked
accessed and dirty. It is the responsibility of the operating system to keep the pages that
contain the buffer present and to mark them accessed and dirty. The implication is that the
operating system cannot do “lazy” page-table entry propagation for these pages.
• The DS save area can be larger than a page, but the pages must be mapped to contiguous
linear addresses. The buffer may share a page, so it need not be aligned on a 4-KByte
boundary. For performance reasons, the base of the buffer must be aligned on a
doubleword boundary and should be aligned on a cache line boundary.
• It is recommended that the buffer size for the BTS buffer and the PEBS buffer be an
integer multiple of the corresponding record sizes.
• The precise event records buffer should be large enough to hold the number of precise
event records that can occur while waiting for the interrupt to be serviced.
• The DS save area should be in kernel space. It must not be on the same page as code, to
avoid triggering self-modifying code actions.
• There are no memory type restrictions on the buffers, although it is recommended that the
buffers be designated as WB memory type for performance considerations.
• Either the system must be prevented from entering A20M mode while DS save area is
active, or bit 20 of all addresses within buffer bounds must be 0.
• Pages that contain buffers must be mapped to the same physical addresses for all
processes, such that any change to control register CR3 will not change the DS addresses.
• The DS save area is expected to be used only on systems with an enabled APIC. The LVT
Performance Counter entry in the APIC must be initialized to use an interrupt gate instead
of the trap gate.
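Several of the restrictions above are simple address checks that system software can make
before programming the DS save area. The sketch below tests three of them for a 32-bit linear
buffer address: doubleword alignment of the base, a buffer size that is a whole number of
records, and bit 20 clear for every address in the buffer (the conservative alternative to
keeping the system out of A20M mode). The function names are hypothetical, the record size is a
parameter rather than a value taken from this section, and the remaining restrictions (page
residency, memory type, identical mappings across processes) are not covered.

```c
#include <stdbool.h>
#include <stdint.h>

#define A20_BIT (1u << 20)

/* Conservative check of a proposed BTS or PEBS buffer.
 * base: linear address of the buffer; size: length in bytes;
 * record_size: size of one record in bytes (supplied by the caller). */
static bool ds_buffer_ok(uint32_t base, uint32_t size, uint32_t record_size)
{
    if (size == 0 || record_size == 0)
        return false;
    if (base % 4 != 0)                   /* must be doubleword aligned */
        return false;
    if (size % record_size != 0)         /* recommended: whole records only */
        return false;
    if (size - 1 > UINT32_MAX - base)    /* buffer would wrap the address space */
        return false;
    if (base & A20_BIT)                  /* bit 20 set at the first byte */
        return false;
    /* Bit 20 stays clear only while the buffer remains inside the 1-MByte
     * block that contains its base. */
    if (base + size - 1 > (base | 0xFFFFFu))
        return false;
    return true;
}

/* Optional performance check: base aligned on a cache line of 'line' bytes. */
static bool ds_buffer_cache_aligned(uint32_t base, uint32_t line)
{
    return line != 0 && base % line == 0;
}
```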
15.5.8.3 Setting Up the BTS Buffer
Three flags in the MSR_DEBUGCTLA MSR (see Table 15-4) control the generation of branch
records and storing of them in the BTS buffer: TR, BTS, and BTINT. The TR flag enables the
generation of BTMs. The BTS flag determines whether the BTMs are sent out on the system bus
(clear) or stored in the BTS buffer (set). BTMs cannot be simultaneously sent to the system bus
and logged in the BTS buffer. The BTINT flag enables the generation of an interrupt when the
BTS buffer is full. When this flag is clear, the BTS buffer is a circular buffer.
The following procedure describes how to set up a Pentium 4 or Intel Xeon processor to collect
branch records in the BTS buffer in the DS save area:
1. Place values in the BTS buffer base, BTS index, BTS absolute maximum, and BTS
interrupt threshold fields of the DS buffer management area to set up the BTS buffer in
memory.
2. Set the TR and BTS flags in the MSR_DEBUGCTLA MSR.
3. Either clear the BTINT flag in the MSR_DEBUGCTLA MSR (to set up a circular BTS
buffer) or set the BTINT flag (to generate an interrupt when the BTS buffer is nearly full).
Table 15-4. MSR_DEBUGCTLA MSR Flag Encodings
TR BTS BTINT Description
0 X X Branch trace messages (BTMs) off
1 0 X Generate BTMs
1 1 0 Store BTMs in the BTS buffer, used here as a circular buffer
1 1 1 Store BTMs in the BTS buffer, and generate an interrupt when the
buffer is nearly full
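The encodings in Table 15-4 can be expressed as a small decoding function, which is sometimes
handy when logging or validating a value before it is written to the MSR. The bit positions are
those given in Section 15.5.2 (TR bit 2, BTS bit 3, BTINT bit 4); the enum and function names are
hypothetical.

```c
#include <stdint.h>

#define DEBUGCTLA_TR    (1u << 2)
#define DEBUGCTLA_BTS   (1u << 3)
#define DEBUGCTLA_BTINT (1u << 4)

enum btm_mode {
    BTM_OFF,         /* TR = 0: branch trace messages off */
    BTM_TO_BUS,      /* TR = 1, BTS = 0: BTMs generated, not stored */
    BTS_CIRCULAR,    /* TR = 1, BTS = 1, BTINT = 0: circular BTS buffer */
    BTS_INTERRUPT    /* TR = 1, BTS = 1, BTINT = 1: interrupt when nearly full */
};

/* Map an MSR_DEBUGCTLA value onto the rows of Table 15-4. */
static enum btm_mode decode_btm_mode(uint32_t debugctla)
{
    if (!(debugctla & DEBUGCTLA_TR))
        return BTM_OFF;
    if (!(debugctla & DEBUGCTLA_BTS))
        return BTM_TO_BUS;
    return (debugctla & DEBUGCTLA_BTINT) ? BTS_INTERRUPT : BTS_CIRCULAR;
}
```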
15.5.8.4 Setting Up CPL-Qualified BTS
If the processor supports CPL-qualified last branch recording mechanism, the generation of
branch records and storing of them in the BTS buffer are determined by: TR, BTS,
BTS_OFF_OS, BTS_OFF_USR, and BTINT. The encoding of these five bits is shown in
Table 15-5.
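The filtering described by Table 15-5 reduces to one question: is a BTM generated at a given
privilege level written to the BTS buffer? The following sketch answers it for an
MSR_DEBUGCTLA value, using the bit positions from Section 15.5.2. The function name is
hypothetical, and BTINT is ignored because it affects only how the buffer is managed and not
which BTMs are stored.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBUGCTLA_TR          (1u << 2)
#define DEBUGCTLA_BTS         (1u << 3)
#define DEBUGCTLA_BTS_OFF_OS  (1u << 5)
#define DEBUGCTLA_BTS_OFF_USR (1u << 6)

/* True if a BTM generated while running at privilege level 'cpl' (0-3)
 * is stored in the BTS buffer under the given MSR_DEBUGCTLA value. */
static bool bts_stores_cpl(uint32_t debugctla, unsigned cpl)
{
    if (!(debugctla & DEBUGCTLA_TR) || !(debugctla & DEBUGCTLA_BTS))
        return false;                               /* nothing is stored */
    if (cpl == 0)
        return !(debugctla & DEBUGCTLA_BTS_OFF_OS); /* CPL = 0 BTMs */
    return !(debugctla & DEBUGCTLA_BTS_OFF_USR);    /* CPL > 0 BTMs */
}
```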
Table 15-5. CPL-Qualified Branch Trace Store Encodings
TR BTS BTS_OFF_OS BTS_OFF_USR BTINT Description
0 X X X X Branch trace messages (BTMs) off
1 0 X X X Generate BTMs but do not store BTMs
1 1 0 0 0 Store all BTMs in the BTS buffer, used here as a circular buffer
1 1 1 0 0 Store BTMs with CPL > 0 in the BTS buffer
1 1 0 1 0 Store BTMs with CPL = 0 in the BTS buffer
1 1 1 1 X Generate BTMs but do not store BTMs
1 1 0 0 1 Store all BTMs in the BTS buffer; generate an interrupt when the buffer is nearly full
1 1 1 0 1 Store BTMs with CPL > 0 in the BTS buffer; generate an interrupt when the buffer is nearly full
1 1 0 1 1 Store BTMs with CPL = 0 in the BTS buffer; generate an interrupt when the buffer is nearly full
15.5.8.5 Writing the DS Interrupt Service Routine
The BTS, non-precise event-based sampling, and PEBS facilities share the same interrupt vector
and interrupt service routine (called the debug store interrupt service routine or DS ISR). To
handle BTS, non-precise event-based sampling, and PEBS interrupts, separate handler routines
must be included in the DS ISR. Use the following guidelines when writing a DS ISR to handle
BTS, non-precise event-based sampling, and/or PEBS interrupts.
• The DS interrupt service routine (ISR) must be part of a kernel driver and operate at a
current privilege level of 0 to secure the buffer storage area.
• Because the BTS, non-precise event-based sampling, and PEBS facilities share the same
interrupt vector, the DS ISR must check for all the possible causes of interrupts from these
facilities and pass control on to the appropriate handler.
BTS and PEBS buffer overflow would be the sources of the interrupt if the buffer index
matches/exceeds the interrupt threshold specified. Detection of non-precise event-based
sampling as the source of the interrupt is accomplished by checking for counter overflow.
• There must be separate save areas, buffers, and state for each processor in an MP system.
• Upon entering the ISR, branch trace messages and PEBS should be disabled to prevent
race conditions during access to the DS save area. This is done by clearing the TR flag in the
MSR_DEBUGCTLA MSR and by clearing the precise event enable flag in the
IA32_PEBS_ENABLE MSR. These settings should be restored to their original values
when exiting the ISR.
• The processor will not disable the DS save area when the buffer is full and the circular
mode has not been selected. The current DS setting must be retained and restored by the
ISR on exit.
• After reading the data in the appropriate buffer, up to but not including the current index
into the buffer, the ISR must reset the buffer index to the beginning of the buffer.
Otherwise, everything up to the index will look like new entries upon the next invocation
of the ISR.
• The ISR must clear the mask bit in the performance counter LVT entry.
• The ISR must re-enable the CCCR's ENABLE bit if it is servicing an overflow PMI due to
PEBS.
• The Pentium 4 Processor and Intel Xeon Processor mask PMIs upon receiving an interrupt.
Clear this condition before leaving the interrupt handler.
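The BTS-related guidelines above (disable tracing on entry, check the index against the
threshold, read the records up to the index, reset the index, restore the settings on exit) can be
sketched with a simplified software model. The structure below is a hypothetical stand-in, not
the DS buffer management area layout: the index and threshold are counted in records, the
records are opaque 64-bit values, and the MSR is an ordinary field. LVT unmasking, PEBS, and
counter-overflow handling are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define DEBUGCTLA_TR (1u << 2)   /* trace messages enable */

struct bts_model {
    uint32_t debugctla;    /* stand-in for the MSR_DEBUGCTLA value */
    size_t   index;        /* records written since the last drain */
    size_t   threshold;    /* interrupt threshold, in records */
    uint64_t records[8];   /* stand-in for the BTS buffer */
};

/* Copy pending records to 'out' and reset the index. Returns the number of
 * records drained, or 0 if BTS was not the source of the interrupt.
 * Records beyond 'out_cap' are dropped in this sketch. */
static size_t ds_isr_drain(struct bts_model *m, uint64_t *out, size_t out_cap)
{
    uint32_t saved = m->debugctla;
    size_t n = 0;

    m->debugctla &= ~DEBUGCTLA_TR;      /* stop BTMs while the area is read */
    if (m->index >= m->threshold) {     /* index reached the threshold */
        for (; n < m->index && n < out_cap; n++)
            out[n] = m->records[n];
        m->index = 0;                   /* reset index to the buffer start */
    }
    m->debugctla = saved;               /* restore the original setting */
    return n;
}
```

Resetting the index is what keeps already-processed records from being reported again on the
next invocation of the ISR.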
15.6 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (PENTIUM M PROCESSORS)
Like the Pentium 4 and Intel Xeon processor family, Pentium M processors provide last branch,
interrupt, and exception recording. The capability operates almost identically to that found in
Pentium 4 and Intel Xeon processors. There are differences in the shape of the stack and in some
MSR names and locations. Note the following:
• MSR_DEBUGCTLB MSR — Enables debug trace interrupt, debug trace store, trace
messages enable, performance monitoring breakpoint flags, single stepping on branches,
and last branch. For Pentium M processors, this MSR is located at register address 01D9H.
See Figure 15-6 and the entries below for a description of the flags.
— LBR (last branch/interrupt/exception) flag (bit 0) — When set, the processor
records a running trace of the most recent branches, interrupts, and/or exceptions
taken by the processor (prior to a debug exception being generated) in the last
branch record (LBR) stack. For more information, see the “Last Branch Record
(LBR) Stack” bullet below.
— BTF (single-step on branches) flag (bit 1) — When set, the processor treats the
TF flag in the EFLAGS register as a “single-step on branches” flag rather than a
“single-step on instructions” flag. This mechanism allows single-stepping the
processor on taken branches, interrupts, and exceptions. See Section 15.5.5,
“Single-Stepping on Branches, Exceptions, and Interrupts” for more information
about the BTF flag.
— PBi (performance monitoring/breakpoint pins) flags (bits 5-2) — When these
flags are set, the performance monitoring/breakpoint pins on the processor (BP0#,
BP1#, BP2#, and BP3#) report breakpoint matches in the corresponding
breakpoint-address registers (DR0 through DR3). The processor asserts then
deasserts the corresponding BPi# pin when a breakpoint match occurs. When a
PBi flag is clear, the performance monitoring/breakpoint pins report performance
events. Processor execution is not affected by reporting performance events.
— TR (trace message enable) flag (bit 6) — When set, branch trace messages are
enabled. When the processor detects a taken branch, interrupt, or exception, it
sends the branch record out on the system bus as a branch trace message (BTM).
See Section 15.5.6, “Branch Trace Messages” for more information about the TR
flag.
— BTS (branch trace store) flag (bit 7) — When set, enables the BTS facilities to
log BTMs to a memory-resident BTS buffer that is part of the DS save area. See
Section 15.10.5, “DS Save Area”.
— BTINT (branch trace interrupt) flag (bit 8) — When set, the BTS facilities
generate an interrupt when the BTS buffer is full. When clear, BTMs are logged to
the BTS buffer in a circular fashion. See Section 15.5.8, “Branch Trace Store (BTS)”
for a description of this mechanism.
• Debug store (DS) feature flag (bit 21), returned by the CPUID instruction — Indicates
that the processor provides the debug store (DS) mechanism, which allows BTMs to be
stored in a memory-resident BTS buffer. See also: Section 15.5.8, “Branch Trace Store
(BTS)”.
• Last Branch Record (LBR) Stack — The LBR stack consists of 8 MSRs
(MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7); bits 31-0 hold the ‘from’
address, bits 63-32 hold the ‘to’ address. For Pentium M Processors, these pairs are located
at register addresses 040H-047H. See Figure 15-7.
• Last Branch Record Top-of-Stack (TOS) Pointer — The TOS Pointer MSR contains a
3-bit pointer (bits 2-0) to the MSR in the LBR stack that contains the most recent branch,
interrupt, or exception recorded. For Pentium M Processors, this MSR is located at register
address 01C9H.
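The LBR stack and TOS pointer described above behave like a small circular buffer. The following C sketch is illustrative only: it works on raw 64-bit values that privileged code would already have read with RDMSR, and the names lbr_record, lbr_decode, and lbr_index_by_age are hypothetical. It also assumes that older records sit at decreasing stack indices, wrapping from 0 back to 7, which the text above does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define LBR_STACK_DEPTH 8u /* MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7 */

/* One decoded entry of the Pentium M LBR stack (hypothetical type). */
typedef struct {
    uint32_t from; /* bits 31-0: 'from' address */
    uint32_t to;   /* bits 63-32: 'to' address */
} lbr_record;

/* Split a raw MSR_LASTBRANCH_n value into its two 32-bit halves. */
static lbr_record lbr_decode(uint64_t raw)
{
    lbr_record r;
    r.from = (uint32_t)(raw & 0xFFFFFFFFu);
    r.to = (uint32_t)(raw >> 32);
    return r;
}

/* Index of the MSR holding the age-th most recent record (age 0 is the
 * newest). Only bits 2-0 of the TOS MSR value are used.
 * Assumption: older records are found at decreasing indices, modulo 8. */
static unsigned lbr_index_by_age(uint64_t tos_raw, unsigned age)
{
    unsigned tos = (unsigned)(tos_raw & 0x7u);
    assert(age < LBR_STACK_DEPTH);
    return (tos + LBR_STACK_DEPTH - age) % LBR_STACK_DEPTH;
}
```

A debugger would typically read the TOS MSR first, then read the eight stack MSRs in the order given by lbr_index_by_age(tos, 0) through lbr_index_by_age(tos, 7).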
Figure 15-6. MSR_DEBUGCTLB MSR for Pentium M Processors
(Bit layout: 31-9 Reserved; 8 BTINT, branch trace interrupt; 7 BTS, branch trace store; 6 TR, trace messages enable; 5-2 PB3/2/1/0, performance monitoring breakpoint flags; 1 BTF, single-step on branches; 0 LBR, last branch/interrupt/exception.)
For compatibility, the Pentium M processor provides two 32-bit MSRs (the
MSR_LER_TO_LIP and the MSR_LER_FROM_LIP MSRs) that duplicate the functions of the
LastExceptionToIP and LastExceptionFromIP MSRs found in P6 family processors.
For more detail on these capabilities, see Section 15.5, “Last Branch, Interrupt, and Exception
Recording (Pentium 4 and Intel Xeon Processors)” and Section B.2, “MSRs In the Pentium M
Processor”.
15.7 LAST BRANCH, INTERRUPT, AND EXCEPTION
RECORDING (P6 FAMILY PROCESSORS)
The P6 family processors provide five MSRs for recording the last branch, interrupt, or exception
taken by the processor: DebugCtlMSR, LastBranchToIP, LastBranchFromIP, LastExceptionToIP,
and LastExceptionFromIP. These registers can be used to collect last branch records, to
set breakpoints on branches, interrupts, and exceptions, and to single-step from one branch to
the next.
See Appendix B, Model-Specific Registers (MSRs), for a detailed description of each of the last
branch recording MSRs described above.
15.7.1 DebugCtlMSR Register (P6 Family Processors)
The version of the DebugCtlMSR register found in the P6 family processors enables last branch,
interrupt, and exception recording; taken branch breakpoints; the breakpoint reporting pins; and
trace messages. This register can be written to using the WRMSR instruction, when operating
at privilege level 0 or when in real-address mode. A protected-mode operating system procedure
is required to provide user access to this register. Figure 15-8 shows the flags in the
DebugCtlMSR register for the P6 family processors. The functions of these flags are as follows:
• LBR (last branch/interrupt/exception) flag (bit 0) — When set, the processor records
the source and target addresses (in the LastBranchToIP, LastBranchFromIP, LastExceptionToIP,
and LastExceptionFromIP MSRs) for the last branch and the last exception or
interrupt taken by the processor prior to a debug exception being generated. The processor
clears this flag whenever a debug exception, such as an instruction or data breakpoint or
single-step trap occurs.
Figure 15-7. LBR Branch Record Layout for the Pentium M Processor
(MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7: bits 63-32 To Linear Address; bits 31-0 From Linear Address.)
• BTF (single-step on branches) flag (bit 1) — When set, the processor treats the TF flag
in the EFLAGS register as a “single-step on branches” flag (see Section 15.5.5, “Single-
Stepping on Branches, Exceptions, and Interrupts”).
• PBi (performance monitoring/breakpoint pins) flags (bits 2 through 5) — When these
flags are set, the performance monitoring/breakpoint pins on the processor (BP0#, BP1#,
BP2#, and BP3#) report breakpoint matches in the corresponding breakpoint-address
registers (DR0 through DR3). The processor asserts then deasserts the corresponding BPi#
pin when a breakpoint match occurs. When a PBi flag is clear, the performance
monitoring/breakpoint pins report performance events. Processor execution is not affected
by reporting performance events.
• TR (trace message enable) flag (bit 6) — When set, trace messages are enabled as
described in Section 15.5.6, “Branch Trace Messages”. Setting this flag greatly reduces the
performance of the processor. When trace messages are enabled, the values stored in the
LastBranchToIP, LastBranchFromIP, LastExceptionToIP, and LastExceptionFromIP MSRs
are undefined.
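The flag positions listed above translate directly into bit masks. The C sketch below is a minimal illustration, not vendor code: the macro and function names are hypothetical, the value is assumed to have been read with RDMSR by privileged code, and the mapping of PB0 to bit 2 through PB3 to bit 5 is an assumption consistent with the "bits 2 through 5" range given above.

```c
#include <assert.h>
#include <stdint.h>

/* Flag positions in the P6 family DebugCtlMSR, as listed above. */
#define DEBUGCTL_LBR (1u << 0) /* last branch/interrupt/exception recording */
#define DEBUGCTL_BTF (1u << 1) /* treat EFLAGS.TF as single-step on branches */
#define DEBUGCTL_TR  (1u << 6) /* trace messages enable */

/* Mask for one PBi flag. Assumes PB0 is bit 2 and PB3 is bit 5. */
static uint32_t debugctl_pb(unsigned i)
{
    assert(i < 4u);
    return 1u << (2u + i);
}

/* The last branch and last exception MSRs are meaningful only when LBR is
 * set and TR is clear, because enabling trace messages leaves their
 * contents undefined. */
static int debugctl_records_usable(uint32_t debugctl)
{
    return (debugctl & DEBUGCTL_LBR) != 0u && (debugctl & DEBUGCTL_TR) == 0u;
}
```

Checking debugctl_records_usable before trusting LastBranchFromIP and related MSRs avoids interpreting values the processor does not define.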
15.7.2 Last Branch and Last Exception MSRs (P6 Family
Processors)
The LastBranchToIP and LastBranchFromIP MSRs are 32-bit registers for recording the
instruction pointers for the last branch, interrupt, or exception that the processor took prior to a
debug exception being generated. When a branch occurs, the processor loads the address of the
branch instruction into the LastBranchFromIP MSR and loads the target address for the branch
into the LastBranchToIP MSR.
When an interrupt or exception occurs (other than a debug exception), the address of the instruction
that was interrupted by the exception or interrupt is loaded into the LastBranchFromIP MSR
and the address of the exception or interrupt handler that is called is loaded into the
LastBranchToIP MSR.
Figure 15-8. DebugCtlMSR Register (P6 Family Processors)
(Bit layout: 31-7 Reserved; 6 TR, trace messages enable; 5-2 PB3 through PB0, performance monitoring/breakpoint pins; 1 BTF, single-step on branches; 0 LBR, last branch/interrupt/exception.)
The LastExceptionToIP and LastExceptionFromIP MSRs (also 32-bit registers) record the
instruction pointers for the last branch that the processor took prior to an exception or interrupt
being generated. When an exception or interrupt occurs, the contents of the LastBranchToIP and
LastBranchFromIP MSRs are copied into these registers before the to and from addresses of the
exception or interrupt are recorded in the LastBranchToIP and LastBranchFromIP MSRs.
These registers can be read using the RDMSR instruction.
Note that the values stored in the LastBranchToIP, LastBranchFromIP, LastExceptionToIP, and
LastExceptionFromIP MSRs are offsets into the current code segment, as opposed to linear
addresses, which are saved in last branch records for the Pentium 4 and Intel Xeon processors.
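The update order of the four P6 MSRs is easy to misread, so the following C sketch models it in software. It is a simulation under stated assumptions, not a description of the hardware implementation: the type p6_lbr_state and the two functions are hypothetical, and all values are treated as 32-bit code-segment offsets as noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the four P6 recording MSRs (hypothetical type). */
typedef struct {
    uint32_t last_branch_from;
    uint32_t last_branch_to;
    uint32_t last_exception_from;
    uint32_t last_exception_to;
} p6_lbr_state;

/* A taken branch overwrites only the LastBranch pair. */
static void p6_on_branch(p6_lbr_state *s, uint32_t branch_ip, uint32_t target_ip)
{
    assert(s != NULL);
    s->last_branch_from = branch_ip;
    s->last_branch_to = target_ip;
}

/* An exception or interrupt (other than a debug exception) first copies the
 * LastBranch pair into the LastException pair, then records itself in the
 * LastBranch pair. */
static void p6_on_exception(p6_lbr_state *s, uint32_t interrupted_ip,
                            uint32_t handler_ip)
{
    assert(s != NULL);
    s->last_exception_from = s->last_branch_from;
    s->last_exception_to = s->last_branch_to;
    s->last_branch_from = interrupted_ip;
    s->last_branch_to = handler_ip;
}
```

After a branch followed by an exception, the LastException pair therefore names the branch, and the LastBranch pair names the interrupted instruction and its handler.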
15.7.3 Monitoring Branches, Exceptions, and Interrupts (P6
Family Processors)
When the LBR flag in the DebugCtlMSR register is set, the processor automatically begins
recording branches that it takes, exceptions that are generated (except for debug exceptions), and
interrupts that are serviced. Each time a branch, exception, or interrupt occurs, the

CHAPTER 14 MACHINE-CHECK ARCHITECTURE

This chapter describes the machine-check architecture and machine-check exception mechanism
found in the Pentium 4, Intel Xeon, and P6 family processors. See Chapter 5, “Interrupt
18—Machine-Check Exception (#MC)”, for more information on machine-check exceptions. A
brief description of the Pentium processor’s machine check capability is also given.
14.1 MACHINE-CHECK EXCEPTIONS AND ARCHITECTURE
The Pentium 4, Intel Xeon, and P6 family processors implement a machine-check architecture
that provides a mechanism for detecting and reporting hardware (machine) errors, such as:
system bus errors, ECC errors, parity errors, cache errors, and TLB errors. It consists of a set of
model-specific registers (MSRs) that are used to set up machine checking and additional banks
of MSRs used for recording errors that are detected.
The processor signals the detection of a machine-check error by generating a machine-check
exception (#MC), which is an abort class exception. The implementation of the machine-check
architecture does not ordinarily permit the processor to be restarted reliably after generating a
machine-check exception. However, the machine-check-exception handler can collect information
about the machine-check error from the machine-check MSRs.
14.2 COMPATIBILITY WITH PENTIUM PROCESSOR
The Pentium 4, Intel Xeon, and P6 family processors support and extend the machine-check
exception mechanism introduced in the Pentium processor. The Pentium processor reports the
following machine-check errors:
• data parity errors during read cycles
• unsuccessful completion of a bus cycle
The above errors are reported using the P5_MC_TYPE and P5_MC_ADDR MSRs (implementation
specific for the Pentium processor). Use the RDMSR instruction to read these MSRs. See
Table B-5 for the addresses.
The machine-check error reporting mechanism that Pentium processors use is similar to that
used in Pentium 4, Intel Xeon, and P6 family processors. When an error is detected, it is
recorded in P5_MC_TYPE and P5_MC_ADDR; the processor then generates a machine-check
exception (#MC).
See Section 14.3.3, Mapping of the Pentium Processor Machine-Check Errors to the Machine-
Check Architecture, and Section 14.7.3, Pentium Processor Machine-Check Exception
Handling, for information on compatibility between machine-check code written to run on the
Pentium processors and code written to run on P6 family processors.
14.3 MACHINE-CHECK MSRS
Machine check MSRs in the Pentium 4, Intel Xeon, and P6 family processors consist of a set of
global control and status registers and several error-reporting register banks (see Figure 14-1).
Each error-reporting bank is associated with a specific hardware unit (or group of hardware
units) in the processor. Use RDMSR and WRMSR to read and to write these registers.
14.3.1 Machine-Check Global Control MSRs
The machine-check global control MSRs include the IA32_MCG_CAP, IA32_MCG_STATUS,
and IA32_MCG_CTL. See Appendix B, Model-Specific Registers (MSRs), for the addresses of
these registers.
The structure of the IA32_MCG_CAP is implemented differently in Pentium 4 and Intel Xeon
processors and in P6 family processors. Also, note that the register names used for P6 family
processors do not have the ‘IA32’ prefix.
14.3.1.1 IA32_MCG_CAP MSR (PENTIUM 4 AND INTEL XEON PROCESSORS)
The IA32_MCG_CAP MSR is a read-only register that provides information about the
machine-check architecture implementation in Pentium 4 and Intel Xeon processors (see
Figure 14-2).
Figure 14-1. Machine-Check MSRs
(Global control MSRs, each 64 bits wide: IA32_MCG_CAP, IA32_MCG_STATUS, IA32_MCG_CTL. Error-reporting bank registers, one set for each hardware unit, each 64 bits wide: IA32_MCi_CTL, IA32_MCi_STATUS, IA32_MCi_ADDR, IA32_MCi_MISC.)
Where:
• Count field, bits 0 through 7 — Indicates the number of hardware unit error-reporting
banks available in a particular processor implementation.
• MCG_CTL_P (control MSR present) flag, bit 8 — Indicates that the processor
implements the IA32_MCG_CTL MSR when set; this register is absent when clear.
• MCG_EXT_P (extended MSRs present) flag, bit 9 — Indicates that the processor
implements the extended machine-check state registers found starting at MSR address
180H; these registers are absent when clear. This feature was introduced in the Pentium
4 and Intel Xeon processors.
• MCG_EXT_CNT, bits 16 through 23 — Indicates the number of extended machine-check
state registers present. This field is meaningful only when the MCG_EXT_P flag is
set.
Bits 10 through 15 and 24 through 63 are reserved. The effect of writing to the
IA32_MCG_CAP register is undefined.
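The field layout above can be decoded with a few shifts and masks. This C sketch is a minimal illustration that operates on a raw value assumed to have been read from IA32_MCG_CAP by privileged code; the type mcg_cap_info and the function mcg_cap_decode are hypothetical names.

```c
#include <stdint.h>

/* Decoded view of IA32_MCG_CAP (hypothetical type). */
typedef struct {
    unsigned count;     /* bits 7-0: number of error-reporting banks */
    int ctl_present;    /* bit 8: IA32_MCG_CTL is implemented */
    int ext_present;    /* bit 9: extended state MSRs are implemented */
    unsigned ext_count; /* bits 23-16, reported as 0 unless ext_present */
} mcg_cap_info;

static mcg_cap_info mcg_cap_decode(uint64_t raw)
{
    mcg_cap_info info;
    info.count = (unsigned)(raw & 0xFFu);
    info.ctl_present = (int)((raw >> 8) & 1u);
    info.ext_present = (int)((raw >> 9) & 1u);
    /* MCG_EXT_CNT is meaningful only when MCG_EXT_P is set. */
    info.ext_count = info.ext_present ? (unsigned)((raw >> 16) & 0xFFu) : 0u;
    return info;
}
```

Forcing ext_count to zero when MCG_EXT_P is clear is a design choice of this sketch; it keeps callers from using a field the text describes as not meaningful in that case.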
14.3.1.2 MCG_CAP MSR (P6 FAMILY PROCESSORS)
The MCG_CAP MSR is a read-only register that provides information about the machine-check
architecture implementation in P6 family processors (see Figure 14-3).
Figure 14-2. IA32_MCG_CAP Register
(Bit layout: 63-24 Reserved; 23-16 MCG_EXT_CNT; 15-10 Reserved; 9 MCG_EXT_P; 8 MCG_CTL_P; 7-0 Count.)
Figure 14-3. MCG_CAP Register
(Bit layout: 63-9 Reserved; 8 MCG_CTL_P, MCG_CTL register present; 7-0 Count, number of reporting banks.)
Where:
• Count field, bits 0 through 7 — Indicates the number of hardware unit error-reporting
banks available in a particular processor implementation.
• MCG_CTL_P (register present) flag, bit 8 — Indicates that the MCG_CTL register is
present when set and absent when clear.
Bits 9 through 63 are reserved. The effect of writing to the MCG_CAP register is undefined.
14.3.1.3 IA32_MCG_STATUS MSR
The IA32_MCG_STATUS MSR (called the MCG_STATUS MSR for P6 family processors)
describes the current state of the processor after a machine-check exception has occurred (see
Figure 14-4).
Where:
• RIPV (restart IP valid) flag, bit 0 — Indicates (when set) that program execution can be
restarted reliably at the instruction pointed to by the instruction pointer pushed on the stack
when the machine-check exception is generated. When clear, the program cannot be
reliably restarted at the pushed instruction pointer.
• EIPV (error IP valid) flag, bit 1 — Indicates (when set) that the instruction pointed to by
the instruction pointer pushed onto the stack when the machine-check exception is
generated is directly associated with the error. When this flag is cleared, the instruction
pointed to may not be associated with the error.
• MCIP (machine check in progress) flag, bit 2 — Indicates (when set) that a machine-check
exception was generated. Software can set or clear this flag. The occurrence of a
second Machine-Check Event while MCIP is set will cause the processor to enter a
shutdown state. For information on processor behavior in the shutdown state, please refer
to the description in Chapter 5, Interrupt and Exception Handling: “Interrupt 8—Double
Fault Exception (#DF)”.
Bits 3 through 63 in IA32_MCG_STATUS are reserved.
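A handler usually tests these three flags before deciding what to do next. The C sketch below shows one way to express that; the macro and function names are hypothetical, and the value is assumed to come from a privileged RDMSR of IA32_MCG_STATUS.

```c
#include <stdint.h>

#define MCG_STATUS_RIPV (1ull << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ull << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ull << 2) /* machine check in progress */

/* Nonzero when execution can be restarted reliably at the pushed IP. */
static int mcg_can_restart(uint64_t status)
{
    return (status & MCG_STATUS_RIPV) != 0;
}

/* Nonzero when the pushed IP is directly associated with the error. */
static int mcg_ip_is_error_source(uint64_t status)
{
    return (status & MCG_STATUS_EIPV) != 0;
}

/* Value to write back once handling is complete: MCIP cleared, other bits
 * kept. A second machine-check event while MCIP is still set shuts the
 * processor down. */
static uint64_t mcg_clear_mcip(uint64_t status)
{
    return status & ~MCG_STATUS_MCIP;
}
```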
Figure 14-4. IA32_MCG_STATUS Register
(Bit layout: 63-3 Reserved; 2 MCIP, machine check in progress flag; 1 EIPV, error IP valid flag; 0 RIPV, restart IP valid flag.)
14.3.1.4 IA32_MCG_CTL MSR
The IA32_MCG_CTL MSR (called the MCG_CTL MSR in P6 family processors) is present if
the capability flag MCG_CTL_P is set in the IA32_MCG_CAP MSR (or the MCG_CAP MSR).
IA32_MCG_CTL (or MCG_CTL) controls the reporting of machine-check exceptions. If
present, writing all 1s to this register enables all machine-check features and writing all 0s
disables all machine-check features. All other values are undefined and/or implementation
specific.
14.3.2 Error-Reporting Register Banks
Each error-reporting register bank can contain the IA32_MCi_CTL, IA32_MCi_STATUS,
IA32_MCi_ADDR, and IA32_MCi_MISC MSRs (called MCi_CTL, MCi_STATUS,
MCi_ADDR, and MCi_MISC in P6 family processors). The Pentium 4 and Intel Xeon processors
provide four banks of error-reporting registers; the P6 family processors provide five banks
of error-reporting registers. The first error-reporting register (IA32_MC0_CTL) always starts at
address 400H.
See Table B-1 for the addresses of the error-reporting registers in the Pentium 4 and Intel Xeon
processors; see Table B-4 for the addresses of the error-reporting registers in P6 family processors.
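As a rough aid for navigating the banks, the C sketch below computes an MSR address from a bank number and a register kind. It rests on an assumption the text above does not state: that each bank occupies four consecutive MSR addresses in the order CTL, STATUS, ADDR, MISC, starting at 400H. Tables B-1 and B-4 remain authoritative, since an individual processor family may deviate from this pattern. The names mc_reg and mc_bank_msr are hypothetical.

```c
#include <stdint.h>

#define MC0_CTL_ADDR 0x400u /* first error-reporting register, per the text */

/* Register kinds within one bank, in the assumed address order. */
enum mc_reg { MC_CTL = 0, MC_STATUS = 1, MC_ADDR = 2, MC_MISC = 3 };

/* Assumed layout: four consecutive MSRs per bank. Verify against
 * Table B-1 or Table B-4 before relying on the result. */
static uint32_t mc_bank_msr(unsigned bank, enum mc_reg reg)
{
    return MC0_CTL_ADDR + 4u * (uint32_t)bank + (uint32_t)reg;
}
```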
14.3.2.1 IA32_MCI_CTL MSRS
The IA32_MCi_CTL MSR (called MCi_CTL in P6 family processors) controls error reporting
for errors produced by a particular hardware unit (or group of hardware units). Each of the 64
flags (EEj) represents a potential error. Setting an EEj flag enables reporting of the associated
error and clearing it disables reporting of the error. The processor does not write changes to bits
that are not implemented. Figure 14-5 shows the bit fields of IA32_MCi_CTL.
NOTE
For P6 family processors only: the operating system or executive software
must not modify the contents of the MC0_CTL MSR. This MSR is internally
aliased to the EBL_CR_POWERON MSR and controls platform-specific
error handling features. System specific firmware (the BIOS) is responsible
for the appropriate initialization of the MC0_CTL MSR. P6 family
processors only allow the writing of all 1s or all 0s to the MCi_CTL MSR.
Figure 14-5. IA32_MCi_CTL Register
(Bit layout: bits 63 through 0 hold the error reporting enable flags EE63 through EE00, one EEj flag per bit, where j is 00 through 63.)
14.3.2.2 IA32_MCI_STATUS MSRS
Each IA32_MCi_STATUS MSR (called MCi_STATUS in P6 family processors) contains information
related to a machine-check error if its VAL (valid) flag is set (see Figure 14-6). Software
is responsible for clearing IA32_MCi_STATUS MSRs by explicitly writing 0s to them; writing
1s to them causes a general-protection exception.
Figure 14-6. IA32_MCi_STATUS Register
(Bit layout: 63 VAL, MCi_STATUS register valid; 62 OVER, error overflow; 61 UC, uncorrected error; 60 EN, error enabled; 59 MISCV, MCi_MISC register valid; 58 ADDRV, MCi_ADDR register valid; 57 PCC, processor context corrupt; 56-32 Other Information; 31-16 Model-Specific Error Code; 15-0 MCA Error Code.)
Where:
• MCA (machine-check architecture) error code field, bits 0 through 15 — Specifies the
machine-check architecture-defined error code for the machine-check error condition
detected. The machine-check architecture-defined error codes are guaranteed to be the
same for all IA-32 processors that implement the machine-check architecture. See Section
14.6, Interpreting the MCA Error Codes, and Appendix E, Interpreting Machine-Check
Error Codes, for information on machine-check error codes.
• Model-specific error code field, bits 16 through 31 — Specifies the model-specific error
code that uniquely identifies the machine-check error condition detected. The model-specific
error codes may differ among IA-32 processors for the same machine-check error
condition. See Appendix E, Interpreting Machine-Check Error Codes, for information on
model-specific error codes.
• Other information field, bits 32 through 56 — The functions of these bits are implementation
specific and are not part of the machine-check architecture. Software that is intended
to be portable among IA-32 processors should not rely on these values.
• PCC (processor context corrupt) flag, bit 57 — Indicates (when set) that the state of the
processor might have been corrupted by the error condition detected and that reliable
restarting of the processor may not be possible. When clear, this flag indicates that the
error did not affect the processor’s state.
• ADDRV (IA32_MCi_ADDR register valid) flag, bit 58 — Indicates (when set) that the
IA32_MCi_ADDR register contains the address where the error occurred (see Section
14.3.2.3, IA32_MCi_ADDR MSRs). When clear, this flag indicates that the
IA32_MCi_ADDR register is either not implemented or does not contain the address
where the error occurred. Do not read these registers if they are not implemented in the
processor.
• MISCV (IA32_MCi_MISC register valid) flag, bit 59 — Indicates (when set) that the
IA32_MCi_MISC register contains additional information regarding the error. When clear,
this flag indicates that the IA32_MCi_MISC register is either not implemented or does not
contain additional information regarding the error. Do not read these registers if they are
not implemented in the processor.
• EN (error enabled) flag, bit 60 — Indicates (when set) that the error was enabled by the
associated EEj bit of the IA32_MCi_CTL register.
• UC (error uncorrected) flag, bit 61 — Indicates (when set) that the processor did not or
was not able to correct the error condition. When clear, this flag indicates that the
processor was able to correct the error condition.
• OVER (machine check overflow) flag, bit 62 — Indicates (when set) that a machine-check
error occurred while the results of a previous error were still in the error-reporting
register bank (that is, the VAL bit was already set in the IA32_MCi_STATUS register).
The processor sets the OVER flag and software is responsible for clearing it. Enabled
errors are written over disabled errors, and uncorrected errors are written over corrected
errors. Uncorrected errors are not written over previous valid uncorrected errors.
• VAL (IA32_MCi_STATUS register valid) flag, bit 63 — Indicates (when set) that the
information within the IA32_MCi_STATUS register is valid. When this flag is set, the
processor follows the rules given for the OVER flag in the IA32_MCi_STATUS register
when overwriting previously valid entries. The processor sets the VAL flag and software is
responsible for clearing it.
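The fields and flags of IA32_MCi_STATUS listed above can be unpacked as follows. This C sketch is illustrative: the type mci_status_info and the function mci_status_decode are hypothetical, and the input is a raw 64-bit value assumed to have been read with RDMSR by privileged code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one IA32_MCi_STATUS value (hypothetical type). */
typedef struct {
    uint16_t mca_code;   /* bits 15-0: architectural error code */
    uint16_t model_code; /* bits 31-16: model-specific error code */
    uint32_t other_info; /* bits 56-32: implementation specific */
    bool pcc;            /* bit 57: processor context corrupt */
    bool addrv;          /* bit 58: IA32_MCi_ADDR holds a valid address */
    bool miscv;          /* bit 59: IA32_MCi_MISC holds valid data */
    bool en;             /* bit 60: error was enabled in IA32_MCi_CTL */
    bool uc;             /* bit 61: error was not corrected */
    bool over;           /* bit 62: an earlier error was still logged */
    bool val;            /* bit 63: the register contents are valid */
} mci_status_info;

static bool status_bit(uint64_t raw, unsigned n)
{
    return ((raw >> n) & 1u) != 0;
}

static mci_status_info mci_status_decode(uint64_t raw)
{
    mci_status_info s;
    s.mca_code = (uint16_t)(raw & 0xFFFFu);
    s.model_code = (uint16_t)((raw >> 16) & 0xFFFFu);
    s.other_info = (uint32_t)((raw >> 32) & 0x1FFFFFFu); /* 25 bits */
    s.pcc = status_bit(raw, 57);
    s.addrv = status_bit(raw, 58);
    s.miscv = status_bit(raw, 59);
    s.en = status_bit(raw, 60);
    s.uc = status_bit(raw, 61);
    s.over = status_bit(raw, 62);
    s.val = status_bit(raw, 63);
    return s;
}
```

Callers should check val before using any other field, and addrv or miscv before reading the corresponding IA32_MCi_ADDR or IA32_MCi_MISC register.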
14.3.2.3 IA32_MCI_ADDR MSRS
The IA32_MCi_ADDR MSR (called MCi_ADDR in the P6 family processors) contains the
address of the code or data memory location that produced the machine-check error if the
ADDRV flag in the IA32_MCi_STATUS register is set (see Figure 14-7, IA32_MCi_ADDR
MSR). The IA32_MCi_ADDR register is either not implemented or contains no address if the
ADDRV flag in the IA32_MCi_STATUS register is clear. When not implemented in the
processor, all reads and writes to this MSR will cause a general protection exception.
The address returned is either a 32-bit offset into a segment, a 32-bit linear address, or a 36-bit
physical address, depending upon the type of error encountered.
Bits 36-63 of this register are reserved for future address expansion and are always read as zeros.
These registers can be cleared by explicitly writing all 0s to them; writing 1s to them will cause
a general-protection exception to be generated (see Figure 14-7).
14.3.2.4 IA32_MCI_MISC MSRS
The IA32_MCi_MISC MSR (called the MCi_MISC MSR in the P6 family processors) contains
additional information describing the machine-check error if the MISCV flag in the
IA32_MCi_STATUS register is set. The IA32_MCi_MISC MSR is either not implemented or
does not contain additional information if the MISCV flag in the IA32_MCi_STATUS register
is clear.
When not implemented in the processor, all reads and writes to this MSR will cause a general
protection exception. When implemented in a processor, these registers can be cleared by
explicitly writing all 0s to them; writing 1s to them causes a general-protection exception to be
generated. This register is not implemented in any of the error-reporting register banks for the
P6 family processors.
14.3.2.5 IA32_MCG EXTENDED MACHINE CHECK STATE MSRS
The Pentium 4 and Intel Xeon processors implement a variable number of extended machine-check
state MSRs (the architectural entries are documented in Table 14-1). The MCG_EXT_P
flag in the IA32_MCG_CAP MSR indicates the presence of these extended registers, and the
MCG_EXT_CNT field indicates the number of these registers actually implemented (see
Section 14.3.1.1, IA32_MCG_CAP MSR (Pentium 4 and Intel Xeon Processors)).
There may be registers available beyond the IA32_MCG_MISC register. These registers should
be referred to as IA32_MCG_RESERVED1 to IA32_MCG_RESERVEDn depending on the
actual number.
Figure 14-7. IA32_MCi_ADDR MSR
(Bit layout: 63-36 Reserved; 35-0 Address.)
Table 14-1. Extended Machine Check State MSRs
MSR               Address  Description
IA32_MCG_EAX      180H     State of the EAX register at the time of the machine-check error.
IA32_MCG_EBX      181H     State of the EBX register at the time of the machine-check error.
IA32_MCG_ECX      182H     State of the ECX register at the time of the machine-check error.
IA32_MCG_EDX      183H     State of the EDX register at the time of the machine-check error.
IA32_MCG_ESI      184H     State of the ESI register at the time of the machine-check error.
IA32_MCG_EDI      185H     State of the EDI register at the time of the machine-check error.
IA32_MCG_EBP      186H     State of the EBP register at the time of the machine-check error.
IA32_MCG_ESP      187H     State of the ESP register at the time of the machine-check error.
When a machine-check error is detected on a Pentium 4 or Intel Xeon processor, the processor
saves the state of the general-purpose registers, the EFLAGS register, and the EIP in these
extended machine-check state MSRs. This information can be used by a debugger to analyze the
error.
These registers are read/write to zero registers. This means software can read them; but if software
writes to them, only all zeros is allowed. If software attempts to write a non-zero value into
one of these registers, a general-protection (#GP) exception is generated. These registers are
cleared on a hardware reset (power-up or RESET), but maintain their contents following a soft
reset (INIT reset).
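For a debugger that dumps the saved state, a name-to-address lookup over the architectural entries of Table 14-1 is often convenient. The C sketch below is a minimal example; the identifiers mcg_ext_names, mcg_ext_addr_by_name, and mcg_ext_write_ok are hypothetical, and the second function only models the read/write-to-zero rule in software rather than performing a WRMSR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MCG_EXT_BASE 0x180u /* address of IA32_MCG_EAX */

/* Architectural entries of Table 14-1, in address order. */
static const char *const mcg_ext_names[] = {
    "IA32_MCG_EAX", "IA32_MCG_EBX", "IA32_MCG_ECX", "IA32_MCG_EDX",
    "IA32_MCG_ESI", "IA32_MCG_EDI", "IA32_MCG_EBP", "IA32_MCG_ESP",
    "IA32_MCG_EFLAGS", "IA32_MCG_EIP", "IA32_MCG_MISC"
};
#define MCG_EXT_NAMED (sizeof mcg_ext_names / sizeof mcg_ext_names[0])

/* MSR address for a named extended register, or 0 if the name is unknown. */
static uint32_t mcg_ext_addr_by_name(const char *name)
{
    size_t i;
    for (i = 0; i < MCG_EXT_NAMED; i++) {
        if (strcmp(name, mcg_ext_names[i]) == 0)
            return MCG_EXT_BASE + (uint32_t)i;
    }
    return 0;
}

/* Models the read/write-to-zero rule: only an all-zero value may be
 * written; any other value would raise #GP on real hardware. */
static int mcg_ext_write_ok(uint64_t value)
{
    return value == 0;
}
```

Before reading any of these registers, software should still confirm MCG_EXT_P and compare the register's position against MCG_EXT_CNT.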
14.3.3 Mapping of the Pentium Processor Machine-Check Errors
to the Machine-Check Architecture
The Pentium processor reports machine-check errors using two registers: P5_MC_TYPE and
P5_MC_ADDR. The Pentium 4, Intel Xeon, and P6 family processors map these registers to the
IA32_MCi_STATUS and IA32_MCi_ADDR in the error-reporting register bank. This bank
reports on the same type of external bus errors reported in P5_MC_TYPE and P5_MC_ADDR.
The information in these registers can then be accessed in two ways:
• By reading the IA32_MCi_STATUS and IA32_MCi_ADDR registers as part of a general
machine-check exception handler written for Pentium 4 and P6 family processors.
• By reading the P5_MC_TYPE and P5_MC_ADDR registers using the RDMSR
instruction.
The second capability permits a machine-check exception handler written to run on a Pentium
processor to be run on a Pentium 4, Intel Xeon, or P6 family processor. There is a limitation in
that information returned by the Pentium 4, Intel Xeon, and P6 family processors is encoded
differently than information returned by the Pentium processor. To run a Pentium processor
machine-check exception handler on a Pentium 4, Intel Xeon, or P6 family processor; the
handler must be written to interpret P5_MC_TYPE encodings correctly.
Table 14-1. Extended Machine Check State MSRs (Contd.)
MSR               Address  Description
IA32_MCG_EFLAGS   188H     State of the EFLAGS register at the time of the machine-check error.
IA32_MCG_EIP      189H     State of the EIP register at the time of the machine-check error.
IA32_MCG_MISC     18AH     When set, indicates that a page assist or page fault occurred during DS normal operation.
14.4 MACHINE-CHECK AVAILABILITY
The machine-check architecture and machine-check exception (#MC) are model-specific
features. Software can execute the CPUID instruction to determine whether a processor implements
these features. Following the execution of the CPUID instruction, the settings of the MCA
flag (bit 14) and MCE flag (bit 7) in EDX indicate whether the processor implements the
machine-check architecture and machine-check exception.
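The two feature flags can be tested with simple masks once EDX is available. In the C sketch below, the EDX value is assumed to come from executing CPUID with EAX set to 1 (the feature-information leaf); the macro and function names are hypothetical, and obtaining EDX is left to the caller so that the example stays portable.

```c
#include <stdint.h>

#define CPUID_EDX_MCE (1u << 7)  /* machine-check exception supported */
#define CPUID_EDX_MCA (1u << 14) /* machine-check architecture supported */

static int cpu_has_mce(uint32_t edx)
{
    return (edx & CPUID_EDX_MCE) != 0;
}

static int cpu_has_mca(uint32_t edx)
{
    return (edx & CPUID_EDX_MCA) != 0;
}
```

A processor may report MCE without MCA, in which case only the machine-check exception is available.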
14.5 MACHINE-CHECK INITIALIZATION
To use the processor's machine-check architecture, software must initialize the processor to activate
the machine-check exception and the error-reporting mechanism.
Example 14-1 gives pseudocode for performing this initialization. This pseudocode checks for
the existence of the machine-check architecture and exception; it then enables machine-check
exception and the error-reporting register banks. The pseudocode shown is compatible with the
Pentium 4, Intel Xeon, P6 family, and Pentium processors.
Following power up or power cycling, IA32_MCi_STATUS registers are not guaranteed to have
valid data until after they are initially cleared to zero by software (as shown in the initialization
pseudocode in Example 14-1). In addition, when using P6 family processors, software must set
MCi_STATUS registers to zero when doing a soft-reset.
Example 14-1. Machine-Check Initialization Pseudocode
Check CPUID Feature Flags for MCE and MCA support
IF CPU supports MCE
THEN
   IF CPU supports MCA
   THEN
      IF (IA32_MCG_CAP.MCG_CTL_P = 1)
      (* IA32_MCG_CTL register is present *)
      THEN
         IA32_MCG_CTL ← FFFFFFFFFFFFFFFFH;
         (* enables all MCA features *)
      FI
      (* Determine number of error-reporting banks supported *)
      COUNT ← IA32_MCG_CAP.Count;
      MAX_BANK_NUMBER ← COUNT - 1;
      IF (Processor Family is 6H)
      THEN
         (* Enable logging of all errors except for MC0_CTL register *)
         FOR error-reporting banks (1 through MAX_BANK_NUMBER)
         DO
            IA32_MCi_CTL ← 0FFFFFFFFFFFFFFFFH;
         OD
         (* Clear all errors *)
         FOR error-reporting banks (0 through MAX_BANK_NUMBER)
         DO
            IA32_MCi_STATUS ← 0;
         OD
      ELSE IF (Processor Family is 0FH) (* any Processor Extended Family *)
      THEN
         (* Enable logging of all errors including MC0_CTL register *)
         FOR error-reporting banks (0 through MAX_BANK_NUMBER)
         DO
            IA32_MCi_CTL ← 0FFFFFFFFFFFFFFFFH;
         OD
         (* BIOS clears all errors only on power-on reset *)
         IF (BIOS detects Power-on reset)
         THEN
            FOR error-reporting banks (0 through MAX_BANK_NUMBER)
            DO
               IA32_MCi_STATUS ← 0;
            OD
         ELSE
            FOR error-reporting banks (0 through MAX_BANK_NUMBER)
            DO
               (Optional for BIOS and OS) Log valid errors
               (OS only) IA32_MCi_STATUS ← 0;
            OD
         FI
      FI
   FI
   Setup the Machine Check Exception (#MC) handler for vector 18 in IDT
   Set the MCE bit (bit 6) in CR4 register to enable Machine-Check Exceptions
FI
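The MSR writes in Example 14-1 can be sketched in C as a pure function that produces the list of writes instead of executing WRMSR, which makes the sequence easy to inspect outside ring 0. This is a simplified illustration under several assumptions, all of them labeled in the comments: the address 17BH for IA32_MCG_CTL and the four-MSRs-per-bank layout starting at 400H are not given in the text above (it defers to Appendix B), the names msr_write, emit, and mca_init_plan are hypothetical, and the CPUID checks, error logging, IDT setup, and CR4.MCE step are omitted.

```c
#include <stdint.h>

#define MSR_MCG_CTL 0x17Bu /* assumed address; see Appendix B */
#define MSR_MC0_CTL 0x400u /* first error-reporting register */
#define ALL_ONES 0xFFFFFFFFFFFFFFFFull

/* One planned WRMSR (hypothetical type). */
typedef struct {
    uint32_t msr;
    uint64_t value;
} msr_write;

/* Append a write if space remains; always return the new count. */
static unsigned emit(msr_write *out, unsigned n, unsigned max,
                     uint32_t msr, uint64_t value)
{
    if (n < max) {
        out[n].msr = msr;
        out[n].value = value;
    }
    return n + 1u;
}

/* Returns the number of writes the pseudocode would perform for the given
 * processor family and raw IA32_MCG_CAP value. Assumes banks are laid out
 * as CTL, STATUS, ADDR, MISC at 400H + 4 * bank. */
static unsigned mca_init_plan(unsigned family, uint64_t mcg_cap,
                              msr_write *out, unsigned max)
{
    unsigned n = 0;
    unsigned count = (unsigned)(mcg_cap & 0xFFu);
    unsigned first_ctl;
    unsigned b;

    if ((mcg_cap >> 8) & 1u) /* MCG_CTL_P: enable all MCA features */
        n = emit(out, n, max, MSR_MCG_CTL, ALL_ONES);

    if (family == 0x6u)
        first_ctl = 1u; /* P6: MC0_CTL is left to the BIOS */
    else if (family == 0xFu)
        first_ctl = 0u;
    else
        return n;

    for (b = first_ctl; b < count; b++) /* enable logging of all errors */
        n = emit(out, n, max, MSR_MC0_CTL + 4u * b, ALL_ONES);
    for (b = 0; b < count; b++) /* clear every status register */
        n = emit(out, n, max, MSR_MC0_CTL + 4u * b + 1u, 0);
    return n;
}
```

For family 0FH the pseudocode distinguishes a power-on reset from other resets, where valid errors may be logged before the status registers are cleared; the sketch collapses both paths into the final clearing loop.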
14.6 INTERPRETING THE MCA ERROR CODES
When the processor detects a machine-check error condition, it writes a 16-bit error code to the
MCA error code field of one of the IA32_MCi_STATUS registers and sets the VAL (valid) flag
in that register. The processor may also write a 16-bit model-specific error code in the
IA32_MCi_STATUS register depending on the implementation of the machine-check architecture
of the processor.
The MCA error codes are architecturally defined for IA-32 processors. However, the specific
IA32_MCi_STATUS register that a code is ‘written to’ is model specific. To determine the cause
of a machine-check exception, the machine-check exception handler must read the VAL flag for
each IA32_MCi_STATUS register. If the flag is set, the machine-check exception handler must
then read the MCA error code field of the register. It is the encoding of the MCA error code field
[15:0] that determines the type of error being reported and not the register bank reporting it.
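The scan described above can be sketched as a loop over status values. In this C example the array is assumed to hold the IA32_MCi_STATUS values of all banks, already read by privileged code; the function name mca_next_valid_bank is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first bank at or after `start` whose VAL flag
 * (bit 63) is set and stores its MCA error code (bits 15-0) in *code.
 * Returns -1 when no further bank holds a valid error. */
static int mca_next_valid_bank(const uint64_t *status, unsigned count,
                               unsigned start, uint16_t *code)
{
    unsigned i;
    assert(status != NULL && code != NULL);
    for (i = start; i < count; i++) {
        if ((status[i] >> 63) & 1u) {
            *code = (uint16_t)(status[i] & 0xFFFFu);
            return (int)i;
        }
    }
    return -1;
}
```

Calling the function repeatedly with start set to one past the previous result visits every bank that reported an error; the returned code, not the bank index, identifies the error type.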
There are two types of MCA error codes: simple error codes and compound error codes.
14.6.1 Simple Error Codes
Table 14-2 shows the simple error codes. These unique codes indicate global error information.
14.6.2 Compound Error Codes
Compound error codes describe errors related to the TLBs, memory, caches, bus and interconnect
logic, and internal timer. A set of sub-fields is common to all compound errors. These
sub-fields describe the type of access, level in the memory hierarchy, and type of request.
Table 14-3 shows the general form of the compound error codes. The interpretation column
indicates the name of a compound error. The name is constructed by substituting mnemonics
from Tables 14-4 through 14-7 for the sub-field names given within curly braces.
Table 14-2. IA32_MCi_STATUS [15:0] Simple Error Code Encoding
Error Code                   Binary Encoding       Meaning
No Error                     0000 0000 0000 0000   No error has been reported to this bank of error-reporting registers.
Unclassified                 0000 0000 0000 0001   This error has not been classified into the MCA error classes.
Microcode ROM Parity Error   0000 0000 0000 0010   Parity error in internal microcode ROM.
External Error               0000 0000 0000 0011   The BINIT# from another processor caused this processor to enter machine check. (1)
FRC Error                    0000 0000 0000 0100   FRC (functional redundancy check) master/slave error.
Internal Unclassified        0000 01xx xxxx xxxx   Internal unclassified errors. (2)
NOTES:
1. BINIT# assertion will cause a machine check exception if the processor (or any processor on the same
external bus) has BINIT# observation enabled during power-on configuration (hardware strapping) and
if machine check exceptions are enabled (by setting CR4.MCE = 1).
2. Internal unclassified errors have not been classified. This is because no additional information is
included in the machine check register.
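The simple error codes of Table 14-2 can be looked up as sketched below. The function name is hypothetical. One choice here is an assumption: the pattern 0000 01xx xxxx xxxx also matches 0000 0100 0000 0000, which Table 14-3 lists as the internal timer code, so this sketch leaves that single value to the compound-code decoder.

```python
SIMPLE_CODES = {
    0x0000: "No Error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM Parity Error",
    0x0003: "External Error",
    0x0004: "FRC Error",
}

INTERNAL_TIMER = 0x0400  # 0000 0100 0000 0000, listed in Table 14-3


def decode_simple(code: int):
    """Return the Table 14-2 name for a 16-bit MCA error code, or None."""
    code &= 0xFFFF
    if code in SIMPLE_CODES:
        return SIMPLE_CODES[code]
    # 0000 01xx xxxx xxxx: the upper six bits are 000001, the rest are don't-care.
    if code >> 10 == 0b000001 and code != INTERNAL_TIMER:
        return "Internal Unclassified"
    return None
```

A result of None means the code is not a simple error code and should be interpreted as a compound code.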
For example, the error code ICACHEL1_RD_ERR is constructed from the form:
{TT}CACHE{LL}_{RRRR}_ERR,
where {TT} is replaced by I, {LL} is replaced by L1, and {RRRR} is replaced by RD.
The 2-bit TT sub-field (Table 14-4) indicates the type of transaction (data, instruction, or
generic). The sub-field applies to the TLB, cache, and interconnect error conditions. Note that
interconnect error conditions are primarily associated with P6 family and Pentium processors,
which utilize an external APIC bus separate from the system bus. The generic type is reported
when the processor cannot determine the transaction type.
The 2-bit LL sub-field (see Table 14-5) indicates the level in the memory hierarchy where the
error occurred (level 0, level 1, level 2, or generic). The LL sub-field also applies to the TLB,
cache, and interconnect error conditions. The Pentium 4, Intel Xeon, and P6 family processors
support two levels in the cache hierarchy and one level in the TLBs. Again, the generic type is
reported when the processor cannot determine the hierarchy level.
Table 14-3. IA32_MCi_STATUS [15:0] Compound Error Code Encoding
Type                          Form                  Interpretation
TLB Errors                    0000 0000 0001 TTLL   {TT}TLB{LL}_ERR
Memory Hierarchy Errors       0000 0001 RRRR TTLL   {TT}CACHE{LL}_{RRRR}_ERR
Bus and Interconnect Errors   0000 1PPT RRRR IILL   BUS{LL}_{PP}_{RRRR}_{II}_{T}_ERR
Internal Timer                0000 0100 0000 0000
Table 14-4. Encoding for TT (Transaction Type) Sub-Field
Transaction Type Mnemonic Binary Encoding
Instruction I 00
Data D 01
Generic G 10
Table 14-5. Level Encoding for LL (Memory Hierarchy Level) Sub-Field
Hierarchy Level Mnemonic Binary Encoding
Level 0 L0 00
Level 1 L1 01
Level 2 L2 10
Generic LG 11
The 4-bit RRRR sub-field (see Table 14-6) indicates the type of action associated with the error.
Actions include read and write operations, prefetches, cache evictions, and snoops. Generic
error is returned when the type of error cannot be determined. Generic read and generic write
are returned when the processor cannot determine the type of instruction or data request that
caused the error. Eviction and snoop requests apply only to the caches. All of the other requests
apply to TLBs, caches and interconnects.
The bus and interconnect errors are defined with the 2-bit PP (participation), 1-bit T (timeout),
and 2-bit II (memory or I/O) sub-fields, in addition to the LL and RRRR sub-fields (see
Table 14-7). The bus error conditions are implementation dependent and related to the type of
bus implemented by the processor. Likewise, the interconnect error conditions are predicated
on a specific implementation-dependent interconnect model that describes the connections
between the different levels of the storage hierarchy. The type of bus is implementation dependent,
and as such is not specified in this document. A bus or interconnect transaction consists
of a request involving an address and a response.
Table 14-6. Encoding of Request (RRRR) Sub-Field
Request Type Mnemonic Binary Encoding
Generic Error ERR 0000
Generic Read RD 0001
Generic Write WR 0010
Data Read DRD 0011
Data Write DWR 0100
Instruction Fetch IRD 0101
Prefetch PREFETCH 0110
Eviction EVICT 0111
Snoop SNOOP 1000
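The TLB and memory hierarchy forms of Table 14-3 can be turned into names by substituting the mnemonics of Tables 14-4 through 14-6, as sketched below. The function name is hypothetical, and returning None for encodings that the tables do not list (for example TT = 11) is a choice made for this sketch, not documented behavior.

```python
TT = {0b00: "I", 0b01: "D", 0b10: "G"}                   # Table 14-4
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}    # Table 14-5
RRRR = {                                                 # Table 14-6
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR",
    0b0011: "DRD", 0b0100: "DWR", 0b0101: "IRD",
    0b0110: "PREFETCH", 0b0111: "EVICT", 0b1000: "SNOOP",
}


def decode_tlb_or_cache(code: int):
    """Name a TLB or memory hierarchy error code, or return None."""
    code &= 0xFFFF
    tt = TT.get((code >> 2) & 0b11)
    ll = LL[code & 0b11]
    if tt is None:
        return None
    if code >> 4 == 0b000000000001:        # 0000 0000 0001 TTLL
        return f"{tt}TLB{ll}_ERR"
    if code >> 8 == 0b00000001:            # 0000 0001 RRRR TTLL
        rrrr = RRRR.get((code >> 4) & 0xF)
        if rrrr is None:
            return None
        return f"{tt}CACHE{ll}_{rrrr}_ERR"
    return None
```

With this sketch, the encoding 0000 0001 0001 0001 (0111H) yields ICACHEL1_RD_ERR, the example name used in Section 14.6.2.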
14.6.3 Machine-Check Error Codes Interpretation
Appendix E, Interpreting Machine-Check Error Codes, provides information on interpreting the
MCA error code, model-specific error code, and other information error code fields. For P6
family processors, information has been included on decoding external bus errors. For Pentium
4 and Intel Xeon processors, information is included on external bus, internal timer, and memory
hierarchy errors.
14.7 GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE
The machine-check architecture and error logging can be used in two different ways:
• To detect machine errors during normal instruction execution, using the machine-check
exception (#MC).
• To periodically check and log machine errors.
To use the machine-check exception, the operating system or executive software must provide
a machine-check exception handler. This handler can be designed specifically for Pentium 4 and
Intel Xeon processors or for P6 family processors. It can also be a portable handler that handles
processor machine-check errors from several generations of IA-32 processors.
A special program or utility is required to log machine errors.
Guidelines for writing a machine-check exception handler or a machine-error logging utility are
given in the following sections.
Table 14-7. Encodings of PP, T, and II Sub-Fields
Sub-Field            Transaction                                         Mnemonic    Binary Encoding
PP (Participation)   Local processor (1) originated request              SRC         00
                     Local processor (1) responded to request            RES         01
                     Local processor (1) observed error as third party   OBS         10
                     Generic                                                         11
T (Time-out)         Request timed out                                   TIMEOUT     1
                     Request did not time out                            NOTIMEOUT   0
II (Memory or I/O)   Memory Access                                       M           00
                     Reserved                                                        01
                     I/O                                                 IO          10
                     Other transaction                                               11
NOTE:
1. Local processor differentiates the processor reporting the error from other system components (including
the APIC, other processors, etc.).
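The bus and interconnect form 0000 1PPT RRRR IILL from Table 14-3 can be split into its sub-fields as sketched below. The function name is hypothetical. Table 14-7 gives no mnemonic for the generic PP encoding or for the reserved and other-transaction II encodings, so this sketch reports None for those sub-fields and builds a name only when every mnemonic is defined.

```python
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}
RRRR = {
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR",
    0b0011: "DRD", 0b0100: "DWR", 0b0101: "IRD",
    0b0110: "PREFETCH", 0b0111: "EVICT", 0b1000: "SNOOP",
}
PP = {0b00: "SRC", 0b01: "RES", 0b10: "OBS", 0b11: None}   # 11 = generic, no mnemonic
T = {0: "NOTIMEOUT", 1: "TIMEOUT"}
II = {0b00: "M", 0b01: None, 0b10: "IO", 0b11: None}       # 01 reserved, 11 other


def decode_bus(code: int):
    """Split a bus/interconnect error code into sub-fields, or return None."""
    code &= 0xFFFF
    if code >> 11 != 0b00001:              # bits 15:11 must be 0000 1
        return None
    parts = {
        "LL": LL[code & 0b11],
        "PP": PP[(code >> 9) & 0b11],
        "RRRR": RRRR.get((code >> 4) & 0xF),
        "II": II[(code >> 2) & 0b11],
        "T": T[(code >> 8) & 1],
    }
    name = None
    if None not in parts.values():
        name = "BUS{LL}_{PP}_{RRRR}_{II}_{T}_ERR".format(**parts)
    parts["name"] = name
    return parts
```

For example, the encoding 0000 1001 0001 0011 (0913H) decodes to BUSLG_SRC_RD_M_TIMEOUT_ERR.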
14.7.1 Machine-Check Exception Handler
The machine-check exception (#MC) corresponds to vector 18. To service machine-check
exceptions, a trap gate must be added to the IDT. The pointer in the trap gate must point to a
machine-check exception handler. Two approaches can be taken to designing the exception
handler:
1. The handler can merely log all the machine status and error information, then call a
debugger or shut down the system.
2. The handler can analyze the reported error information and, in some cases, attempt to
correct the error and restart the processor.
For Pentium 4, Intel Xeon, P6 family, and Pentium processors, virtually no machine-check
conditions can be corrected (they result in abort-type exceptions). The logging of status and
error information is therefore a baseline implementation requirement. See Section 14.7 for more
information on logging errors.
When recovery from a machine-check error may be possible, consider the following when
writing a machine-check exception handler:
• To determine the nature of the error, the handler must read each of the error-reporting
register banks. The count field in the IA32_MCG_CAP register gives the number of register
banks. The first register of register bank 0 is at address 400H.
• The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the error
information in the register is valid. If this flag is clear, the registers in that bank do not
contain valid error information and do not need to be checked.
• To write a portable exception handler, only the MCA error code field in the
IA32_MCi_STATUS register should be checked. See Section 14.6 for information that
can be used to write an algorithm to interpret this field.
• The PCC and OVER flags in each IA32_MCi_STATUS register and the RIPV flag in the
IA32_MCG_STATUS register indicate whether recovery from the error is possible. If PCC or
OVER is set, recovery is not possible; if RIPV is clear, program execution cannot be
restarted reliably. The OVER flag indicates that two or more machine-check errors occurred.
When recovery is not possible, the handler typically records the error information and signals an
abort to the operating system.
• Correctable errors are corrected automatically by the processor. The UC flag in each
IA32_MCi_STATUS register indicates whether the processor automatically corrected an
error: the flag is clear when the error was corrected and set when it was not.
• The RIPV flag in the IA32_MCG_STATUS register indicates whether the program can be
restarted at the instruction indicated by the instruction pointer (the address of the
instruction pushed on the stack when the exception was generated). If this flag is clear, the
processor may still be able to be restarted (for debugging purposes) but not without loss of
program continuity.
• For unrecoverable errors, the EIPV flag in the IA32_MCG_STATUS register indicates
whether the instruction indicated by the instruction pointer pushed on the stack (when the
exception was generated) is related to the error. If the flag is clear, the pushed instruction
may not be related to the error.
• The MCIP flag in the IA32_MCG_STATUS register indicates whether a machine-check
exception was generated. Before returning from the machine-check exception handler,
software should clear this flag so that it can be used reliably by an error logging utility. The
MCIP flag also detects recursion. The machine-check architecture does not support
recursion. When the processor detects machine-check recursion, it enters the shutdown
state.
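The register bookkeeping behind the points above can be sketched as follows. Only the address of the first register of bank 0 (400H) comes from the text; the other values are assumptions drawn from the register layout and should be checked against the register descriptions: four registers per bank in the order CTL, STATUS, ADDR, MISC; the bank count in bits 7:0 of IA32_MCG_CAP; and the individual flag bit positions. The helper names are hypothetical.

```python
MC0_CTL = 0x400        # first register of bank 0 (from the text above)
REGS_PER_BANK = 4      # assumed order: CTL, STATUS, ADDR, MISC

# Assumed flag bit positions.
MCI_STATUS_FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "PCC": 57}
MCG_STATUS_FLAGS = {"RIPV": 0, "EIPV": 1, "MCIP": 2}


def bank_count(mcg_cap: int) -> int:
    """Number of error-reporting banks from the count field of IA32_MCG_CAP."""
    return mcg_cap & 0xFF


def bank_msrs(bank: int) -> dict:
    """MSR addresses of the registers in one error-reporting bank."""
    base = MC0_CTL + REGS_PER_BANK * bank
    return {"CTL": base, "STATUS": base + 1, "ADDR": base + 2, "MISC": base + 3}


def flags(value: int, layout: dict) -> dict:
    """Decode the named single-bit flags of a register value."""
    return {name: bool((value >> bit) & 1) for name, bit in layout.items()}
```

A handler built on these helpers would loop over range(bank_count(...)), read each STATUS register, and skip banks whose VAL flag is clear.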
14.7.2 Enabling BINIT# Drive and BINIT# Observation
For complete operation of the processor's machine-check capabilities, it is essential that the
system BIOS enable BINIT# drive and BINIT# observation. This allows the processor to use
BINIT# to clear internal blocking states and some external blocking states. This also allows the
processor to correctly report a wide range of machine check exceptions.
For example, consider a Pentium III processor that:
• Executes a locked CMPXCHG8B instruction.
• Reports a machine-check exception on the initial data read.
• Fails the comparison operation.
The processor unlocks the bus after completion of the locked sequence by asserting a BINIT#
signal. Without BINIT# drive enabled (UP environment) or BINIT# drive and observation enabled
(MP environment), the machine-check error is logged but the machine-check exception is not taken
(even if MCEs are enabled).
Example 14-2 gives typical steps carried out by a machine-check exception handler.
Example 14-2. Machine-Check Exception Handler Pseudocode
IF CPU supports MCE
    THEN
        IF CPU supports MCA
            THEN
                call errorlogging routine; (* returns restartability *)
            ELSE (* Pentium(R) processor compatible *)
                READ P5_MC_ADDR;
                READ P5_MC_TYPE;
                report RESTARTABILITY to console;
        FI;
FI;
IF error is not restartable
    THEN
        report RESTARTABILITY to console;
        abort system;
FI;
CLEAR MCIP flag in IA32_MCG_STATUS;
14.7.3 Pentium Processor Machine-Check Exception Handling
To make the machine-check exception handler portable to the Pentium 4, Intel Xeon, P6 family,
and Pentium processors, checks can be made (using CPUID) to determine the processor type.
Then based on the processor type, machine-check exceptions can be handled specifically for
Pentium 4, Intel Xeon, P6 family, or Pentium processors.
When machine-check exceptions are enabled for the Pentium processor (MCE flag is set in
control register CR4), the machine-check exception handler uses the RDMSR instruction to read
the error type from the P5_MC_TYPE register and the machine check address from the
P5_MC_ADDR register. The handler then normally reports these register values to the system
console before aborting execution (see Example 14-2).
14.7.4 Logging Correctable Machine-Check Errors
If a machine-check error is correctable, the processor does not generate a machine-check exception
for it. To detect correctable machine-check errors, a utility program must be written that
reads each of the machine-check error-reporting register banks and logs the results in an
accounting file or data structure. This utility can be implemented in either of the following ways.
• A system daemon that polls the register banks on an infrequent basis, such as hourly or
daily.
• A user-initiated application that polls the register banks and records the exceptions. Here,
the actual polling service is provided by an operating-system driver or through the system
call interface.
Example 14-3 gives pseudocode for an error logging utility.
Example 14-3. Machine-Check Error Logging Pseudocode
Assume that execution is restartable;
IF the processor supports MCA
    THEN
        FOR each bank of machine-check registers
            DO
                READ IA32_MCi_STATUS;
                IF VAL flag in IA32_MCi_STATUS = 1
                    THEN
                        IF ADDRV flag in IA32_MCi_STATUS = 1
                            THEN READ IA32_MCi_ADDR;
                        FI;
                        IF MISCV flag in IA32_MCi_STATUS = 1
                            THEN READ IA32_MCi_MISC;
                        FI;
                        IF MCIP flag in IA32_MCG_STATUS = 1
                            (* Machine-check exception is in progress *)
                            AND (PCC flag in IA32_MCi_STATUS = 1
                            OR RIPV flag in IA32_MCG_STATUS = 0)
                            (* execution is not restartable *)
                            THEN
                                RESTARTABILITY = FALSE;
                                return RESTARTABILITY to calling procedure;
                        FI;
                        Save time-stamp counter and processor ID;
                        Set IA32_MCi_STATUS to all 0s;
                        Execute serializing instruction (e.g., CPUID);
                FI;
            OD;
FI;
If the processor supports the machine-check architecture, the utility reads through the banks of
error-reporting registers looking for valid register entries. It then saves the values of the
IA32_MCi_STATUS, IA32_MCi_ADDR, IA32_MCi_MISC and IA32_MCG_STATUS registers
for each bank that is valid. The routine minimizes processing time by recording the raw data
into a system data structure or file, reducing the overhead associated with polling. User utilities
analyze the collected data in an off-line environment.
When the MCIP flag is set in the IA32_MCG_STATUS register, a machine-check exception is
in progress and the machine-check exception handler has called the exception logging routine.
Once the logging process has been completed the exception-handling routine must determine
whether execution can be restarted, which is usually possible when damage has not occurred
(the PCC flag is clear in the IA32_MCi_STATUS register) and when the processor can guarantee
that execution is restartable (the RIPV flag is set in the IA32_MCG_STATUS register). If
execution cannot be restarted, the system is not recoverable and the exception-handling routine
should signal the console appropriately before returning the error status to the Operating System
kernel for subsequent shutdown.
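The logging pass and the restartability decision described above can be modeled as follows. This is a simulation sketch, not driver code: a dictionary stands in for RDMSR and WRMSR, and the MSR addresses (IA32_MCG_CAP at 179H, IA32_MCG_STATUS at 17AH), the four-register bank layout, and the flag bit positions are assumptions taken from the register layout. Execution is treated as not restartable when an exception is in progress and either PCC is set or RIPV is clear, which is the condition stated in the paragraph above.

```python
# Assumed bit positions and MSR addresses; confirm against the register layout.
VAL, MISCV, ADDRV, PCC = 63, 59, 58, 57       # IA32_MCi_STATUS
RIPV, MCIP = 0, 2                             # IA32_MCG_STATUS
IA32_MCG_CAP, IA32_MCG_STATUS, MC0_CTL = 0x179, 0x17A, 0x400


def bit(value: int, n: int) -> int:
    return (value >> n) & 1


def log_machine_check_banks(msrs: dict):
    """Log every valid bank; return (records, restartable).

    msrs maps MSR address to value and is modified in place when a
    bank's status register is cleared.
    """
    records = []
    restartable = True
    mcg_status = msrs[IA32_MCG_STATUS]
    for bank in range(msrs[IA32_MCG_CAP] & 0xFF):
        base = MC0_CTL + 4 * bank
        status = msrs.get(base + 1, 0)
        if not bit(status, VAL):
            continue
        record = {"bank": bank, "status": status, "mcg_status": mcg_status}
        if bit(status, ADDRV):
            record["addr"] = msrs.get(base + 2, 0)
        if bit(status, MISCV):
            record["misc"] = msrs.get(base + 3, 0)
        records.append(record)
        if bit(mcg_status, MCIP) and (bit(status, PCC) or not bit(mcg_status, RIPV)):
            restartable = False
            break              # return to the caller without clearing the bank
        msrs[base + 1] = 0     # clear the bank so the next error can be logged
    return records, restartable
```

A real utility would also record the time-stamp counter and a processor identifier with each record, and would execute a serializing instruction after clearing each status register.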
The machine-check architecture allows buffering of exceptions from a given error-reporting
bank although the Pentium 4, Intel Xeon, and P6 family processors do not implement this
feature. The error logging routine should provide compatibility with future processors by
reading each hardware error-reporting bank's IA32_MCi_STATUS register and then writing 0s
to clear the OVER and VAL flags in this register. The error logging utility should then re-read the
IA32_MCi_STATUS register for the bank to ensure that the valid bit is clear. The processor will
write the next error into the register bank and set the VAL flag.
Additional information that should be stored by the exception-logging routine includes the
processor’s time-stamp counter value, which provides a mechanism to indicate the frequency of
exceptions. A multiprocessing operating system stores the identity of the processor node incurring
the exception using a unique identifier, such as the processor’s APIC ID (see Section 8.8,
Handling Interrupts).
The basic algorithm given in Example 14-3 can be modified to provide more robust recovery
techniques. For example, software has the flexibility to attempt recovery using information
unavailable to the hardware. Specifically, the machine-check exception handler can, after
logging, carefully analyze the error-reporting registers when the error-logging routine reports an
error that does not allow execution to be restarted. These recovery techniques can use external
bus related model-specific information provided with the error report to localize the source of
the error within the system and determine the appropriate recovery strategy.

CHAPTER 13 SYSTEM MANAGEMENT

This chapter describes the two aspects of IA-32 architecture used to manage system resources:
system management mode (SMM) and the thermal monitoring facilities.
SMM provides an alternate operating environment that can be used to monitor and manage
various system resources for more efficient energy usage, to control system hardware, and/or to
run proprietary code. It was introduced into the IA-32 architecture in the Intel386 SL processor
(a mobile specialized version of the Intel386 processor). It is also available in the Pentium 4,
Intel Xeon, P6 family, Pentium, and Intel486 processors (beginning with the enhanced
versions of the Intel486 SL and Intel486 processors). For a detailed description of the hardware
that supports SMM, see the developer’s manual for each of the IA-32 processors.
The thermal monitoring facilities enable monitoring and controlling the core temperature of an
IA-32 processor. These facilities were introduced in the P6 family processors and extended in
the Pentium 4, Intel Xeon and Pentium M processors.
13.1 SYSTEM MANAGEMENT MODE OVERVIEW
SMM is a special-purpose operating mode provided for handling system-wide functions like
power management, system hardware control, or proprietary OEM-designed code. It is intended
for use only by system firmware, not by applications software or general-purpose systems software.
The main benefit of SMM is that it offers a distinct and easily isolated processor environment
that operates transparently to the operating system or executive and software applications.
When SMM is invoked through a system management interrupt (SMI), the processor saves the
current state of the processor (the processor’s context), then switches to a separate operating
environment contained in system management RAM (SMRAM). While in SMM, the processor
executes SMI handler code to perform operations such as powering down unused disk drives or
monitors, executing proprietary code, or placing the whole system in a suspended state. When
the SMI handler has completed its operations, it executes a resume (RSM) instruction. This
instruction causes the processor to reload the saved context of the processor, switch back to
protected or real mode, and resume executing the interrupted application or operating-system
program or task.
The following SMM mechanisms make it transparent to applications programs and operating
systems:
• The only way to enter SMM is by means of an SMI.
• The processor executes SMM code in a separate address space (SMRAM) that can be
made inaccessible from the other operating modes.
• Upon entering SMM, the processor saves the context of the interrupted program or task.
• All interrupts normally handled by the operating system are disabled upon entry into
SMM.
• The RSM instruction can be executed only in SMM.
SMM is similar to real-address mode in that there are no privilege levels or address mapping.
An SMM program can address up to 4 GBytes of memory and can execute all I/O and applicable
system instructions. See Section 13.5, “SMI Handler Execution Environment”, for more information
about the SMM execution environment.
NOTE
The physical address extension (PAE) mechanism available in the P6 family
processors is not supported when a processor is in SMM.
13.2 SYSTEM MANAGEMENT INTERRUPT (SMI)
The only way to enter SMM is by signaling an SMI through the SMI# pin on the processor or
through an SMI message received through the APIC bus. The SMI is a nonmaskable external
interrupt that operates independently from the processor’s interrupt- and exception-handling
mechanism and the local APIC. The SMI takes precedence over an NMI and a maskable interrupt.
SMM is non-reentrant; that is, the SMI is disabled while the processor is in SMM.
NOTE
In the Pentium 4, Intel Xeon, and P6 family processors, when a processor that
is designated as an application processor during an MP initialization
sequence is waiting for a startup IPI (SIPI), it is in a mode where SMIs are
masked. However, if an SMI is received while an application processor is in the
wait for SIPI mode, the SMI will be pended. The processor then responds on
receipt of a SIPI by immediately servicing the pended SMI and going into
SMM before handling the SIPI.
13.3 SWITCHING BETWEEN SMM AND THE OTHER PROCESSOR OPERATING MODES
Figure 2-3 shows how the processor moves between SMM and the other processor operating
modes (protected, real-address, and virtual-8086). Signaling an SMI while the processor is in
real-address, protected, or virtual-8086 modes always causes the processor to switch to SMM.
Upon execution of the RSM instruction, the processor always returns to the mode it was in when
the SMI occurred.
13.3.1 Entering SMM
The processor always handles an SMI on an architecturally defined “interruptible” point in
program execution (which is commonly at an IA-32 architecture instruction boundary). When
the processor receives an SMI, it waits for all instructions to retire and for all stores to complete.
The processor then saves its current context in SMRAM (see Section 13.4, “SMRAM”), enters
SMM, and begins to execute the SMI handler.
Upon entering SMM, the processor signals external hardware that SMM handling has begun.
The signaling mechanism used is implementation dependent. For the P6 family processors, an
SMI acknowledge transaction is generated on the system bus and the multiplexed status signal
EXF4 is asserted each time a bus transaction is generated while the processor is in SMM. For
the Pentium and Intel486 processors, the SMIACT# pin is asserted.
An SMI has a greater priority than debug exceptions and external interrupts. Thus, if an NMI,
maskable hardware interrupt, or a debug exception occurs at an instruction boundary along with
an SMI, only the SMI is handled. Subsequent SMI requests are not acknowledged while the
processor is in SMM. The first SMI interrupt request that occurs while the processor is in SMM
(that is, after SMM has been acknowledged to external hardware) is latched and serviced when
the processor exits SMM with the RSM instruction. The processor will latch only one SMI while
in SMM.
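The latching rule described above can be illustrated with a small state model. This is a teaching sketch only: the class and its attribute names are hypothetical, and it models nothing but the entry, latch, and exit behavior stated in the text (one SMI is latched while in SMM, later ones are not acknowledged, and the latched SMI is serviced when RSM is executed).

```python
class SmiModel:
    """Toy model of SMI handling while the processor is in SMM."""

    def __init__(self):
        self.in_smm = False
        self.latched = False
        self.entries = 0      # number of times the SMI handler was entered

    def _enter(self):
        self.in_smm = True
        self.entries += 1

    def smi(self):
        """Signal an SMI."""
        if self.in_smm:
            # Only one SMI is latched; further requests are not acknowledged.
            self.latched = True
        else:
            self._enter()

    def rsm(self):
        """Execute RSM, which is valid only in SMM."""
        if not self.in_smm:
            raise RuntimeError("RSM executed outside SMM")
        self.in_smm = False
        if self.latched:
            self.latched = False
            self._enter()     # service the latched SMI
```

Signaling three SMIs in a row therefore produces two handler entries in this model: one immediately and one when the first handler executes RSM.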
See Section 13.5, “SMI Handler Execution Environment”, for a detailed description of the
execution environment when in SMM.
13.3.2 Exiting From SMM
The only way to exit SMM is to execute the RSM instruction. The RSM instruction is only available
to the SMI handler; if the processor is not in SMM, attempts to execute the RSM instruction
result in an invalid-opcode exception (#UD) being generated.
The RSM instruction restores the processor’s context by loading the state save image from
SMRAM back into the processor’s registers. The processor then returns an SMIACK transaction
on the system bus and returns program control back to the interrupted program.
Upon successful completion of the RSM instruction, the processor signals external hardware
that SMM has been exited. For the P6 family processors, an SMI acknowledge transaction is
generated on the system bus and the multiplexed status signal EXF4 is no longer generated on
bus cycles. For the Pentium and Intel486 processors, the SMIACT# pin is deasserted.
If the processor detects invalid state information saved in the SMRAM, it enters the shutdown
state and generates a special bus cycle to indicate it has entered shutdown state. Shutdown
happens only in the following situations:
• A reserved bit in control register CR4 is set to 1 on a write to CR4. This error should not
happen unless SMI handler code modifies reserved areas of the SMRAM saved state map
(see Section 13.4.1, “SMRAM State Save Map”). Note that CR4 is saved in the state map
in a reserved location and cannot be read or modified in its saved state.
• An illegal combination of bits is written to control register CR0, in particular PG set to 1
and PE set to 0, or NW set to 1 and CD set to 0.
• (For the Pentium and Intel486 processors only.) The address stored in the SMBASE
register when an RSM instruction is executed is not aligned on a 32-KByte boundary. This
restriction does not apply to the P6 family processors.
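An SMI handler that modifies saved state can guard against two of the conditions above with checks like the following. This is a sketch with hypothetical function names; the CR0 bit positions used (PE = bit 0, NW = bit 29, CD = bit 30, PG = bit 31) are assumptions taken from the CR0 register layout rather than from this section.

```python
# Assumed CR0 bit positions.
CR0_PE = 1 << 0
CR0_NW = 1 << 29
CR0_CD = 1 << 30
CR0_PG = 1 << 31


def cr0_is_illegal(cr0: int) -> bool:
    """True for the CR0 combinations named above that cause shutdown on RSM."""
    pg_without_pe = bool(cr0 & CR0_PG) and not (cr0 & CR0_PE)
    nw_without_cd = bool(cr0 & CR0_NW) and not (cr0 & CR0_CD)
    return pg_without_pe or nw_without_cd


def smbase_is_aligned(smbase: int, boundary: int = 32 * 1024) -> bool:
    """True if SMBASE sits on a 32-KByte boundary (Pentium and Intel486 rule)."""
    return smbase % boundary == 0
```

The default SMBASE of 30000H passes the alignment check; an SMBASE of 31000H would not.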
In the shutdown state, Intel processors stop executing instructions until a RESET#, INIT# or
NMI# is asserted. While Pentium family processors recognize the SMI# signal in shutdown
state, P6 family and Intel486 processors do not. Intel does not support using SMI# to recover
from shutdown states for any processor family; the response of processors in this circumstance
is not well defined. On Pentium 4 and later processors, shutdown will inhibit INTR and A20M
but will not change any of the other inhibits. On these processors, NMIs will be inhibited if no
action is taken in the SMM handler to uninhibit them (see Section 13.8).
If the processor is in the HALT state when the SMI is received, the processor handles the return
from SMM slightly differently (see Section 13.11, “Auto HALT Restart”). Also, the SMBASE
address can be changed on a return from SMM (see Section 13.12, “SMBASE Relocation”).
13.4 SMRAM
While in SMM, the processor executes code and stores data in the SMRAM space. The SMRAM
space is mapped to the physical address space of the processor and can be up to 4 GBytes in size.
The processor uses this space to save the context of the processor and to store the SMI handler
code, data and stack. It can also be used to store system management information (such as the
system configuration and specific information about powered-down devices) and OEM-specific
information.
The default SMRAM size is 64 KBytes beginning at a base physical address in physical memory
called the SMBASE (see Figure 13-1). The SMBASE default value following a hardware reset
is 30000H. The processor looks for the first instruction of the SMI handler at the address
[SMBASE + 8000H]. It stores the processor’s state in the area from [SMBASE + FE00H] to
[SMBASE + FFFFH]. See Section 13.4.1, “SMRAM State Save Map”, for a description of the
mapping of the state save area.
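The address arithmetic above can be written out as a short sketch. The function and key names are hypothetical; the offsets and the default SMBASE are the ones given in the text.

```python
DEFAULT_SMBASE = 0x30000   # SMBASE value following a hardware reset


def smram_layout(smbase: int = DEFAULT_SMBASE) -> dict:
    """Physical addresses of the SMI handler entry point and state save area."""
    return {
        "handler_entry": smbase + 0x8000,
        "state_save_first": smbase + 0xFE00,
        "state_save_last": smbase + 0xFFFF,
    }
```

With the default SMBASE, the handler entry point is at 38000H and the state save area spans 3FE00H through 3FFFFH, which is 512 bytes.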
The system logic is minimally required to decode the physical address range for the SMRAM
from [SMBASE + 8000H] to [SMBASE + FFFFH]. A larger area can be decoded if needed. The
size of this SMRAM can be between 32 KBytes and 4 GBytes.
The location of the SMRAM can be changed by changing the SMBASE value (see Section
13.12, “SMBASE Relocation”). It should be noted that all processors in a multiple-processor
system are initialized with the same SMBASE value (30000H). Initialization software must
sequentially place each processor in SMM and change its SMBASE so that it does not overlap
those of other processors.
The actual physical location of the SMRAM can be in system memory or in a separate RAM
memory. The processor generates an SMI acknowledge transaction (P6 family processors) or
asserts the SMIACT# pin (Pentium and Intel486 processors) when the processor receives an
SMI (see Section 13.3.1, “Entering SMM”).
System logic can use the SMI acknowledge transaction or the assertion of the SMIACT# pin to
decode accesses to the SMRAM and redirect them (if desired) to specific SMRAM memory. If
a separate RAM memory is used for SMRAM, system logic should provide a programmable
method of mapping the SMRAM into system memory space when the processor is not in SMM.
This mechanism will enable start-up procedures to initialize the SMRAM space (that is, load the
SMI handler) before executing the SMI handler during SMM.
13.4.1 SMRAM State Save Map
When an IA-32 processor that does not support Intel EM64T initially enters SMM, it writes its
state to the state save area of the SMRAM. The state save area begins at [SMBASE + 8000H
+ 7FFFH] and extends down to [SMBASE + 8000H + 7E00H]. Table 13-1 shows the state save
map. The offset in column 1 is relative to the SMBASE value plus 8000H. Reserved spaces
should not be used by software.
Some of the registers in the SMRAM state save area (marked YES in column 3) may be read
and changed by the SMI handler, with the changed values restored to the processor registers by
the RSM instruction. Some register images are read-only, and must not be modified (modifying
these registers will result in unpredictable behavior). An SMI handler should not rely on any
values stored in an area that is marked as reserved.
Figure 13-1. SMRAM Usage
[Figure: the SMRAM region runs from SMBASE to SMBASE + FFFFH. The SMI handler entry point is at SMBASE + 8000H, and the state save area starts at SMBASE + FFFFH.]
Table 13-1. SMRAM State Save Map
Offset (Added to SMBASE + 8000H)   Register                                       Writable?
7FFCH                              CR0                                            No
7FF8H                              CR3                                            No
7FF4H                              EFLAGS                                         Yes
7FF0H                              EIP                                            Yes
7FECH                              EDI                                            Yes
7FE8H                              ESI                                            Yes
7FE4H                              EBP                                            Yes
7FE0H                              ESP                                            Yes
7FDCH                              EBX                                            Yes
7FD8H                              EDX                                            Yes
7FD4H                              ECX                                            Yes
7FD0H                              EAX                                            Yes
7FCCH                              DR6                                            No
7FC8H                              DR7                                            No
7FC4H                              TR*                                            No
7FC0H                              Reserved                                       No
7FBCH                              GS*                                            No
7FB8H                              FS*                                            No
7FB4H                              DS*                                            No
7FB0H                              SS*                                            No
7FACH                              CS*                                            No
7FA8H                              ES*                                            No
7FA4H                              I/O State Field, see Section 13.7              No
7FA0H                              I/O Memory Address Field, see Section 13.7     No
7F9FH-7F03H                        Reserved                                       No
7F02H                              Auto HALT Restart Field (Word)                 Yes
7F00H                              I/O Instruction Restart Field (Word)           Yes
7EFCH                              SMM Revision Identifier Field (Doubleword)     No
7EF8H                              SMBASE Field (Doubleword)                      Yes
7EF7H-7E00H                        Reserved                                       No
NOTE:
* The two most significant bytes are reserved.
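Access to the state save map can be sketched as follows, using a byte buffer as a stand-in for the 64-KByte SMRAM image that starts at SMBASE. The function names and the buffer are hypothetical, and only a subset of the Table 13-1 entries is included. The sketch assumes the saved doublewords are stored in little-endian order, as is usual for IA-32 memory operands.

```python
import struct

# Offsets relative to SMBASE + 8000H, taken from Table 13-1 (subset).
SAVE_MAP_32 = {
    "CR0": 0x7FFC, "CR3": 0x7FF8, "EFLAGS": 0x7FF4, "EIP": 0x7FF0,
    "EDI": 0x7FEC, "ESI": 0x7FE8, "EBP": 0x7FE4, "ESP": 0x7FE0,
    "EBX": 0x7FDC, "EDX": 0x7FD8, "ECX": 0x7FD4, "EAX": 0x7FD0,
}
WRITABLE = {"EFLAGS", "EIP", "EDI", "ESI", "EBP", "ESP", "EBX", "EDX", "ECX", "EAX"}


def read_saved(smram, name: str) -> int:
    """Read a saved 32-bit register image from an SMRAM buffer."""
    return struct.unpack_from("<I", smram, 0x8000 + SAVE_MAP_32[name])[0]


def write_saved(smram, name: str, value: int) -> None:
    """Change a saved register image; refuse entries marked read-only."""
    if name not in WRITABLE:
        raise PermissionError(f"{name} is read-only in the state save map")
    struct.pack_into("<I", smram, 0x8000 + SAVE_MAP_32[name], value & 0xFFFFFFFF)
```

Refusing writes to read-only entries mirrors the rule that modifying those images results in unpredictable behavior.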
The following registers are saved (but not readable) and restored upon exiting SMM:
• Control register CR4. (This register is cleared to all 0s while in SMM).
• The hidden segment descriptor information stored in segment registers CS, DS, ES, FS,
GS, and SS.
If an SMI request is issued for the purpose of powering down the processor, the values of all
reserved locations in the SMM state save must be saved to nonvolatile memory.
The following state is not automatically saved and restored following an SMI and the RSM
instruction, respectively:
• Debug registers DR0 through DR3.
• The x87 FPU registers.
• The MTRRs.
• Control register CR2.
• The model-specific registers (for the P6 family and Pentium processors) or test registers
TR3 through TR7 (for the Pentium and Intel486 processors).
• The state of the trap controller.
• The machine-check architecture registers.
• The APIC internal interrupt state (ISR, IRR, etc.).
• The microcode update state.
If an SMI is used to power down the processor, a power-on reset will be required before
returning to SMM, which will reset much of this state back to its default values. An SMI
handler that is going to trigger power down should therefore first read the registers listed above
directly and save them (along with the rest of RAM) to nonvolatile storage. After the power-on
reset, the continuation of the SMI handler should restore these values, along with the rest of the
system's state. Any time the SMI handler changes these registers in the processor, it must also
save and restore them.
NOTES
A small subset of the MSRs (such as the time-stamp counter and
performance-monitoring counters) are not arbitrarily writable and therefore
cannot be saved and restored. SMM-based power-down and restoration
should only be performed with operating systems that do not use or rely on
the values of these registers.
Operating system developers should be aware of this fact and ensure that their
operating-system-assisted power-down and restoration software is immune to
unexpected changes in these register values.
13.4.1.1 SMRAM State Save Map and Intel EM64T
When the processor initially enters SMM, it writes its state to the state save area of the SMRAM.
The state save area on an IA-32 processor that supports Intel EM64T begins at [SMBASE +
8000H + 7FFFH] and extends to [SMBASE + 8000H + 7C00H].
Intel EM64T is supported in an IA-32 processor if the processor reports
CPUID.80000001:EDX[29] = 1. The layout of the SMRAM state save map is shown in Table
13-2.
Table 13-2. SMRAM State Save Map for Intel EM64T
Offset (Added to SMBASE + 8000H)   Register   Writable?
7FF8H CR0 No
7FF0H CR3 No
7FE8H RFLAGS Yes
7FE0H IA32_EFER Yes
7FD8H RIP Yes
7FD0H DR6 No
7FC8H DR7 No
7FC4H TR SEL* No
7FC0H LDTR SEL* No
7FBCH GS SEL* No
7FB8H FS SEL* No
7FB4H DS SEL* No
7FB0H SS SEL* No
7FACH CS SEL* No
7FA8H ES SEL* No
7FA4H IO_MISC No
7F9CH IO_MEM_ADDR No
7F94H RDI Yes
7F8CH RSI Yes
7F84H RBP Yes
7F7CH RSP Yes
7F74H RBX Yes
7F6CH RDX Yes
7F64H RCX Yes
7F5CH RAX Yes
7F54H R8 Yes
7F4CH R9 Yes
7F44H R10 Yes
7F3CH R11 Yes
7F34H R12 Yes
7F2CH R13 Yes
7F24H R14 Yes
7F1CH R15 Yes
7F1BH-7F04H Reserved No
7F02H Auto HALT Restart Field (Word) Yes
7F00H I/O Instruction Restart Field (Word) Yes
7EFCH SMM Revision Identifier Field (Doubleword) No
7EF8H SMBASE Field (Doubleword) Yes
7EF7H - 7EA8H Reserved No
7EA4H LDT Info No
7EA0H LDT Limit No
7E9CH LDT Base (lower 32 bits) No
7E98H IDT Limit No
7E94H IDT Base (lower 32 bits) No
7E90H GDT Limit No
7E8CH GDT Base (lower 32 bits) No
7E8BH - 7E44H Reserved No
7E40H CR4 No
7E3FH - 7DF0H Reserved No
7DE8H IO_EIP Yes
7DE7H - 7DDCH Reserved No
7DD8H IDT Base (Upper 32 bits) No
7DD4H LDT Base (Upper 32 bits) No
7DD0H GDT Base (Upper 32 bits) No
7DCFH - 7C00H Reserved No
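The offsets in Table 13-2 are relative to SMBASE + 8000H, so the address of a saved register is a simple sum. The following is a minimal sketch of that arithmetic, not processor or firmware code. The names `EM64T_FIELDS`, `field_address`, and `read_field` are hypothetical, and the sketch assumes each listed slot can be read as a little-endian 64-bit value (the 8-byte spacing of these entries in Table 13-2 suggests this, but the table does not state field widths).

```python
import struct

STATE_SAVE_BASE = 0x8000  # Table 13-2 offsets are added to SMBASE + 8000H

# A few entries copied from Table 13-2: name -> (offset, writable by handler?)
EM64T_FIELDS = {
    "CR0": (0x7FF8, False),
    "CR3": (0x7FF0, False),
    "RFLAGS": (0x7FE8, True),
    "RIP": (0x7FD8, True),
    "RAX": (0x7F5C, True),
    "R15": (0x7F1C, True),
}


def field_address(smbase, name):
    """Return the address of a state save field for a given SMBASE."""
    offset, _writable = EM64T_FIELDS[name]
    return smbase + STATE_SAVE_BASE + offset


def read_field(smram, smbase, name):
    """Read a field from an SMRAM image whose index 0 corresponds to SMBASE.

    Assumption: the slot is treated as a little-endian 64-bit value.
    """
    start = field_address(smbase, name) - smbase
    (value,) = struct.unpack_from("<Q", smram, start)
    return value
```

With the default SMBASE of 30000H, `field_address(0x30000, "RAX")` evaluates to 3FF5CH. The `writable` column is carried along so that a caller can refuse to modify fields that Table 13-2 marks as not writable.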
13.4.2 SMRAM Caching
An IA-32 processor does not automatically write back and invalidate its caches before entering
SMM or before exiting SMM. Because of this behavior, care must be taken in the placement of
the SMRAM in system memory and in the caching of the SMRAM to prevent cache incoherence
when switching back and forth between SMM and protected mode operation. Any of the
following three methods of locating the SMRAM in system memory will guarantee cache
coherency:
• Place the SMRAM in a dedicated section of system memory that the operating system and
applications are prevented from accessing. Here, the SMRAM can be designated as
cacheable (WB, WT, or WC) for optimum processor performance, without risking cache
incoherence when entering or exiting SMM.
• Place the SMRAM in a section of memory that overlaps an area used by the operating system
(such as the video memory), but designate the SMRAM as uncacheable (UC). This method
prevents cache access when in SMM to maintain cache coherency, but the use of
uncacheable memory reduces the performance of SMM code.
• Place the SMRAM in a section of system memory that overlaps an area used by the operating
system and/or application code, but explicitly flush (write back and invalidate) the caches
upon entering and exiting SMM. This method maintains cache coherency, but
incurs the overhead of two complete cache flushes.
For Pentium 4, Intel Xeon, and P6 family processors, a combination of the first two methods of
locating the SMRAM is recommended. Here the SMRAM is split between an overlapping and
a dedicated region of memory. Upon entering SMM, the SMRAM space that is accessed overlaps
video memory (typically located in low memory). This SMRAM section is designated as
UC memory. The initial SMM code then jumps to a second SMRAM section that is located in a
dedicated region of system memory (typically in high memory). This SMRAM section can be
cached for optimum processor performance.
For systems that explicitly flush the caches upon entering SMM (the third method described
above), the cache flush can be accomplished by asserting the FLUSH# pin at the same time as
the request to enter SMM (generally initiated by asserting the SMI# pin). The priorities of the
FLUSH# and SMI# pins are such that the FLUSH# is serviced first. To guarantee this behavior,
the processor requires that the following constraints on the interaction of FLUSH# and SMI# be
met. In a system where the FLUSH# and SMI# pins are synchronous and the set up and hold
times are met, then the FLUSH# and SMI# pins may be asserted in the same clock. In asynchronous
systems, the FLUSH# pin must be asserted at least one clock before the SMI# pin to guarantee
that the FLUSH# pin is serviced first.
Upon leaving SMM (for systems that explicitly flush the caches), the WBINVD instruction
should be executed prior to leaving SMM to flush the caches.
NOTES
In systems based on the Pentium processor that use the FLUSH# pin to write
back and invalidate cache contents before entering SMM, the processor will
prefetch at least one cache line in between when the Flush Acknowledge
cycle is run and the subsequent recognition of SMI# and the assertion of
SMIACT#.
It is the obligation of the system to ensure that these lines are not cached by
returning KEN# inactive to the Pentium processor.
13.5 SMI HANDLER EXECUTION ENVIRONMENT
After saving the current context of the processor, the processor initializes its core registers to the
values shown in Table 13-3. Upon entering SMM, the PE and PG flags in control register CR0
are cleared, which places the processor in an environment similar to real-address mode. The
differences between the SMM execution environment and the real-address mode execution
environment are as follows:
• The addressable SMRAM address space ranges from 0 to FFFFFFFFH (4 GBytes). (The
physical address extension (enabled with the PAE flag in control register CR4) is not
supported in SMM.)
• The normal 64-KByte segment limit for real-address mode is increased to 4 GBytes.
• The default operand and address sizes are set to 16 bits, which restricts the addressable
SMRAM address space to the 1-MByte real-address mode limit for native real-address-mode
code. However, operand-size and address-size override prefixes can be used to
access the address space beyond 1 MByte.
Table 13-3. Processor Register Initialization in SMM
Register Contents
General-purpose registers Undefined
EFLAGS 00000002H
EIP 00008000H
CS selector SMM Base shifted right 4 bits (default 3000H)
CS base SMM Base (default 30000H)
DS, ES, FS, GS, SS Selectors 0000H
DS, ES, FS, GS, SS Bases 000000000H
DS, ES, FS, GS, SS Limits 0FFFFFFFFH
CR0 PE, EM, TS, and PG flags set to 0; others unmodified
CR4 Cleared to zero
DR6 Undefined
DR7 00000400H
• Near jumps and calls can be made to anywhere in the 4-GByte address space if a 32-bit
operand-size override prefix is used. Due to the real-address-mode style of base-address
formation, a far call or jump cannot transfer control to a segment with a base address of
more than 20 bits (1 MByte). However, since the segment limit in SMM is 4 GBytes,
offsets into a segment that go beyond the 1-MByte limit are allowed when using 32-bit
operand-size override prefixes. Any program control transfer that does not have a 32-bit
operand-size override prefix truncates the EIP value to the 16 low-order bits.
• Data and the stack can be located anywhere in the 4-GByte address space, but can be
accessed only with a 32-bit address-size override if they are located above 1 MByte. As
with the code segment, the base address for a data or stack segment cannot be more than
20 bits.
The value in segment register CS is automatically set to the SMBASE (default 30000H)
shifted 4 bits to the right; that is, 3000H by default. The EIP register is set to 8000H. When the
EIP value is added to the CS value shifted left 4 bits (the SMBASE), the resulting linear address
points to the first instruction of the SMI handler.
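The relationship between SMBASE, the CS selector, and the handler entry point can be checked with a few lines of arithmetic. This is an illustrative sketch only; the function name `smm_entry_state` is hypothetical, and masking the selector to 16 bits is an assumption made here because a segment selector is a 16-bit value.

```python
DEFAULT_SMBASE = 0x30000  # default SMBASE stated in the text
SMI_ENTRY_EIP = 0x8000    # EIP value loaded on entry to SMM


def smm_entry_state(smbase=DEFAULT_SMBASE):
    """Return (CS selector, CS base, linear address of first handler instruction)."""
    # Selector is the SMBASE shifted right 4 bits (assumption: kept to 16 bits).
    cs_selector = (smbase >> 4) & 0xFFFF
    # The CS base is the SMBASE itself (Table 13-3).
    cs_base = smbase
    # First instruction of the SMI handler: CS base + EIP.
    entry_linear = cs_base + SMI_ENTRY_EIP
    return cs_selector, cs_base, entry_linear
```

For the default SMBASE this gives a selector of 3000H, a base of 30000H, and an entry point of 38000H.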
The other segment registers (DS, SS, ES, FS, and GS) are cleared to 0 and their segment limits
are set to 4 GBytes. In this state, the SMRAM address space may be treated as a single flat
4-GByte linear address space. If a segment register is loaded with a 16-bit value, that value is
then shifted left by 4 bits and loaded into the segment base (hidden part of the segment register).
The limits and attributes are not modified.
Maskable hardware interrupts, exceptions, NMI interrupts, SMI interrupts, A20M interrupts,
single-step traps, breakpoint traps, and INIT operations are inhibited when the processor enters
SMM. Maskable hardware interrupts, exceptions, single-step traps, and breakpoint traps can be
enabled in SMM if the SMM execution environment provides and initializes an interrupt table
and the necessary interrupt and exception handlers (see Section 13.6, “Exceptions and Interrupts
Within SMM”).
13.6 EXCEPTIONS AND INTERRUPTS WITHIN SMM
When the processor enters SMM, all hardware interrupts are disabled in the following manner:
• The IF flag in the EFLAGS register is cleared, which inhibits maskable hardware
interrupts from being generated.
• The TF flag in the EFLAGS register is cleared, which disables single-step traps.
• Debug register DR7 is cleared, which disables breakpoint traps. (This action prevents a
debugger from accidentally breaking into an SMM handler if a debug breakpoint is set in
normal address space that overlays code or data in SMRAM.)
• NMI, SMI, and A20M interrupts are blocked by internal SMM logic. (See Section 13.8,
“NMI Handling While in SMM”, for further information about how NMIs are handled in
SMM.)
Software-invoked interrupts and exceptions can still occur, and maskable hardware interrupts
can be enabled by setting the IF flag. Intel recommends that SMM code be written so that it
does not invoke software interrupts (with the INT n, INTO, INT 3, or BOUND instructions) or
generate exceptions.
If the SMM handler requires interrupt and exception handling, an SMM interrupt table and the
necessary exception and interrupt handlers must be created and initialized from within SMM.
Until the interrupt table is correctly initialized (using the LIDT instruction), exceptions and software
interrupts will result in unpredictable processor behavior.
The following restrictions apply when designing SMM interrupt and exception-handling
facilities:
• The interrupt table should be located at linear address 0 and must contain real-address
mode style interrupt vectors (4 bytes containing CS and IP).
• Due to the real-address mode style of base address formation, an interrupt or exception
cannot transfer control to a segment with a base address of more than 20 bits.
• An interrupt or exception cannot transfer control to a segment offset of more than 16 bits
(64 KBytes).
• When an exception or interrupt occurs, only the 16 least-significant bits of the return
address (EIP) are pushed onto the stack. If the offset of the interrupted procedure is greater
than 64 KBytes, it is not possible for the interrupt/exception handler to return control to
that procedure. (One solution to this problem is for a handler to adjust the return address on
the stack.)
• The SMBASE relocation feature affects the way the processor will return from an interrupt
or exception generated while the SMI handler is executing. For example, if the SMBASE
is relocated to above 1 MByte, but the exception handlers are below 1 MByte, a normal
return to the SMI handler is not possible. One solution is to provide the exception handler
with a mechanism for calculating a return address above 1 MByte from the 16-bit return
address on the stack, then use a 32-bit far call to return to the interrupted procedure.
• If an SMI handler needs access to the debug trap facilities, it must ensure that an
SMM-accessible debug handler is available and save the current contents of debug registers DR0
through DR3 (for later restoration). Debug registers DR0 through DR3 and DR7 must then
be initialized with the appropriate values.
• If an SMI handler needs access to the single-step mechanism, it must ensure that an
SMM-accessible single-step handler is available, and then set the TF flag in the EFLAGS
register.
• If the SMI design requires the processor to respond to maskable hardware interrupts or
software-generated interrupts while in SMM, it must ensure that SMM-accessible interrupt
handlers are available and then set the IF flag in the EFLAGS register (using the STI
instruction). Software interrupts are not blocked upon entry to SMM, so they do not need
to be enabled.
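The restrictions above follow from the real-address-mode vector format: each entry is 4 bytes holding a 16-bit offset (IP) followed by a 16-bit segment (CS), and the handler address is formed as CS shifted left 4 bits plus IP. The sketch below models an interrupt table as a byte array to show why a handler cannot sit in a segment whose base needs more than 20 bits or at an offset beyond 64 KBytes. The helper names `make_vector` and `vector_target` are hypothetical.

```python
import struct


def make_vector(cs, ip):
    """Pack one real-address-mode vector: 16-bit IP, then 16-bit CS.

    struct raises struct.error if either value does not fit in 16 bits,
    which mirrors the 16-bit limits described in the list above.
    """
    return struct.pack("<HH", ip, cs)


def vector_target(ivt, n):
    """Return the linear address that vector n of the table transfers control to."""
    ip, cs = struct.unpack_from("<HH", ivt, 4 * n)
    return (cs << 4) + ip
```

For example, storing `make_vector(0x3000, 0x0100)` as vector 3 of a table located at linear address 0 makes `vector_target` return 30100H. The largest reachable target is FFFF0H + FFFFH, which is why handlers for an SMBASE relocated above 1 MByte need the return-address adjustment described above.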
13.7 MANAGING SYNCHRONOUS AND ASYNCHRONOUS SYSTEM MANAGEMENT INTERRUPTS
When coding for a multiprocessor system or a system with Intel HT Technology, it was not
always possible for an SMI handler to distinguish between a synchronous SMI (triggered during
an I/O instruction) and an asynchronous SMI. To facilitate the discrimination of these two
events, incremental state information has been added to the SMM state save map.
Processors that have an SMM revision ID of 30004H or higher have the incremental state information
described below.
13.7.1 I/O State Implementation
Within the extended SMM state save map, a bit (IO_SMI) is provided that is set only when an
SMI is either taken immediately after a successful I/O instruction or is taken after a successful
iteration of a REP I/O instruction (note that “successful” here is from the processor’s point
of view, not necessarily that of the corresponding platform function). When set, the IO_SMI bit
provides a strong indication that the corresponding SMI was synchronous. In this case, the SMM
State Save Map also supplies the port address of the I/O operation. The IO_SMI bit and the I/O
Port Address may be used in conjunction with the information logged by the platform to confirm
that the SMI was indeed synchronous.
Note that the IO_SMI bit by itself is a strong indication, not a guarantee, that the SMI is synchronous.
This is because an asynchronous SMI might coincidentally be taken after an I/O instruction.
In such a case, the IO_SMI bit would still be set in the SMM state save map.
Information characterizing the I/O instruction is saved in two locations in the SMM State Save
Map (Table 13-4). Note that the IO_SMI bit also serves as a valid bit for the rest of the I/O information
fields. The contents of these I/O information fields are not defined when the IO_SMI bit
is not set.
Table 13-4. I/O Instruction Information in the SMM State Save Map
(SMM Rev. ID: 30004H or higher)
I/O State Field (SMRAM offset 7FA4):
Bits 31:16   I/O Port
Bits 15:8    Reserved
Bits 7:4     I/O Type
Bits 3:1     I/O Length
Bit 0        IO_SMI
I/O Memory Address Field (SMRAM offset 7FA0):
Bits 31:0    I/O Memory Address
When IO_SMI is set, the other fields may be interpreted as follows:
• I/O length:
• 001 – Byte
• 010 – Word
• 100 – Dword
• I/O instruction type (Table 13-5)
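The field layout in Table 13-4 and the encodings in Table 13-5 can be combined into a small decoder. The following is a sketch under the assumption that the 32-bit I/O State Field has already been read from the state save map as an integer. The function name `decode_io_state` and the label "unlisted" for encodings that do not appear in the tables are hypothetical.

```python
# I/O length encodings (bits 3:1 of the I/O State Field)
IO_LENGTH = {0b001: "byte", 0b010: "word", 0b100: "dword"}

# I/O instruction type encodings (bits 7:4), copied from Table 13-5
IO_TYPE = {
    0b1001: "IN immediate",
    0b0001: "IN DX",
    0b1000: "OUT immediate",
    0b0000: "OUT DX",
    0b0011: "INS",
    0b0010: "OUTS",
    0b0111: "REP INS",
    0b0110: "REP OUTS",
}


def decode_io_state(value):
    """Decode the I/O State Field; return None when IO_SMI (bit 0) is clear.

    When IO_SMI is clear the other fields are not defined, so nothing is decoded.
    """
    if not value & 1:
        return None
    return {
        "port": (value >> 16) & 0xFFFF,
        "type": IO_TYPE.get((value >> 4) & 0xF, "unlisted"),
        "length": IO_LENGTH.get((value >> 1) & 0x7, "unlisted"),
    }
```

A set IO_SMI bit is only a strong indication that the SMI was synchronous, so a handler would still compare the decoded port against the information logged by the platform before acting on it.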
Table 13-5. I/O Instruction Type Encodings
Instruction       Encoding
IN Immediate      1001
IN DX             0001
OUT Immediate     1000
OUT DX            0000
INS               0011
OUTS              0010
REP INS           0111
REP OUTS          0110
13.8 NMI HANDLING WHILE IN SMM
NMI interrupts are blocked upon entry to the SMI handler. If an NMI request occurs during the
SMI handler, it is latched and serviced after the processor exits SMM. Only one NMI request
will be latched during the SMI handler. If an NMI request is pending when the processor
executes the RSM instruction, the NMI is serviced before the next instruction of the interrupted
code sequence. This assumes that NMIs were not blocked before the SMI occurred. If NMIs
were blocked before the SMI occurred, they are blocked after execution of RSM.
Although NMI requests are blocked when the processor enters SMM, they may be enabled
through software by executing an IRET/IRETD instruction. If the SMM handler requires the use
of NMI interrupts, it should invoke a dummy interrupt service routine for the purpose of
executing an IRET/IRETD instruction. Once an IRET/IRETD instruction is executed, NMI
interrupt requests are serviced in the same “real mode” manner in which they are handled
outside of SMM.
A special case can occur if an SMI handler nests inside an NMI handler and then another NMI
occurs. During NMI interrupt handling, NMI interrupts are disabled, so normally NMI interrupts
are serviced and completed with an IRET instruction one at a time. When the processor
enters SMM while executing an NMI handler, the processor saves the SMRAM state save map
but does not save the attribute to keep NMI interrupts disabled. Potentially, an NMI could be
latched (while in SMM or upon exit) and serviced upon exit of SMM even though the previous
NMI handler has still not completed. One or more NMIs could thus be nested inside the first
NMI handler. The NMI interrupt handler should take this possibility into consideration.
Also, for the Pentium processor, exceptions that invoke a trap or fault handler will enable NMI
interrupts from inside of SMM. This behavior is implementation specific for the Pentium
processor and is not part of the IA-32 architecture.
13.9 SAVING THE X87 FPU STATE WHILE IN SMM
In some instances (for example, prior to powering down system memory when entering a 0-volt
suspend state), it is necessary to save the state of the x87 FPU while in SMM. Care should be
taken when performing this operation to ensure that relevant x87 FPU state information is not
lost. The safest way to perform this task is to place the processor in 32-bit protected mode before
saving the x87 FPU state. The reason for this is as follows.
The FSAVE instruction saves the x87 FPU context in any of four different formats, depending
on which mode the processor is in when FSAVE is executed (see Figures 8-9 through 8-12 in
the IA-32 Intel Architecture Software Developer’s Manual, Volume 1). When in SMM, by
default, the 16-bit real-address mode format is used (shown in Figure 8-12). If an SMI interrupt
occurs while the processor is in a mode other than 16-bit real-address mode, FSAVE and
FRSTOR will be unable to save and restore all the relevant x87 FPU information, and this situation
may result in a malfunction when the interrupted program is resumed. To avoid this
problem, the processor should be in 32-bit protected mode when executing the FSAVE and
FRSTOR instructions.
The following guidelines should be used when going into protected mode from an SMI handler
to save and restore the x87 FPU state:
• Use the CPUID instruction to ensure that the processor contains an x87 FPU.
• Create a 32-bit code segment in SMRAM space that contains procedures or routines to
save and restore the x87 FPU using the FSAVE and FRSTOR instructions, respectively. A
GDT with an appropriate code-segment descriptor (D bit is set to 1) for the 32-bit code
segment must also be placed in SMRAM.
• Write a procedure or routine that can be called by the SMI handler to save and restore the
x87 FPU state. This procedure should do the following:
— Place the processor in 32-bit protected mode as described in Section 9.9.1, “Switching
to Protected Mode”.
— Execute a far JMP to the 32-bit code segment that contains the x87 FPU save and
restore procedures.
— Place the processor back in 16-bit real-address mode before returning to the SMI
handler (see Section 9.9.2, “Switching Back to Real-Address Mode”).
The SMI handler may continue to execute in protected mode after the x87 FPU state has been
saved and return safely to the interrupted program from protected mode. However, it is recommended
that the handler execute primarily in 16- or 32-bit real-address mode.
13.10 SMM REVISION IDENTIFIER
The SMM revision identifier field is used to indicate the version of SMM and the SMM extensions
that are supported by the processor (see Figure 13-2). The SMM revision identifier is
written during SMM entry and can be examined in SMRAM space at offset 7EFCH. The
lower word of the SMM revision identifier refers to the version of the base SMM architecture.
The upper word of the SMM revision identifier refers to the extensions available. If the I/O
instruction restart flag (bit 16) is set, the processor supports the I/O instruction restart (see
Section 13.13, “I/O Instruction Restart”); if the SMBASE relocation flag (bit 17) is set,
SMRAM base address relocation is supported (see Section 13.12, “SMBASE Relocation”).
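The split of the SMM revision identifier into a base version and extension flags can be illustrated with a short decoder. This is a sketch that assumes the doubleword at offset 7EFCH has already been read as an integer; the function name `decode_smm_revision` and the dictionary keys are hypothetical.

```python
IO_RESTART_BIT = 16         # I/O instruction restart flag
SMBASE_RELOCATION_BIT = 17  # SMBASE relocation flag


def decode_smm_revision(rev):
    """Split the SMM revision identifier into base version and extension flags."""
    return {
        "base_version": rev & 0xFFFF,  # lower word: base SMM architecture version
        "io_restart": bool(rev & (1 << IO_RESTART_BIT)),
        "smbase_relocation": bool(rev & (1 << SMBASE_RELOCATION_BIT)),
    }
```

For the revision value 30004H mentioned in Section 13.7, this yields a base version of 4 with both extension flags set.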
13.11 AUTO HALT RESTART
If the processor is in a HALT state (due to the prior execution of a HLT instruction) when it
receives an SMI, the processor records the fact in the auto HALT restart flag in the saved
processor state (see Figure 13-3). (This flag is bit 0 of the word at offset 7F02H in the state save
area of the SMRAM.)
If the processor sets the auto HALT restart flag upon entering SMM (indicating that the SMI
occurred when the processor was in the HALT state), the SMI handler has two options:
• It can leave the auto HALT restart flag set, which instructs the RSM instruction to return
program control to the HLT instruction. This option in effect causes the processor to reenter
the HALT state after handling the SMI. (This is the default operation.)
• It can clear the auto HALT restart flag, which instructs the RSM instruction to return
program control to the instruction following the HLT instruction.
Figure 13-2. SMM Revision Identifier
Register offset 7EFCH. Bits 31:18: Reserved; bit 17: SMBASE Relocation; bit 16: I/O Instruction Restart; bits 15:0: SMM Revision Identifier.
These options are summarized in Table 13-6. Note that if the processor was not in a HALT state
when the SMI was received (the auto HALT restart flag is cleared), setting the flag to 1 will
cause unpredictable behavior when the RSM instruction is executed.
If the HLT instruction is restarted, the processor will generate a memory access to fetch the HLT
instruction (if it is not in the internal cache), and execute a HLT bus transaction. This behavior
results in multiple HLT bus transactions for the same HLT instruction.
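The four combinations summarized in Table 13-6 can be written as a lookup keyed on the flag value at entry to SMM and the value left in place when RSM executes. The sketch below only models the documented outcomes; the names `AUTO_HALT_ACTIONS` and `rsm_action` are hypothetical, and both arguments are assumed to be the word read from offset 7F02H.

```python
# (flag after entry to SMM, flag when exiting SMM) -> action on RSM (Table 13-6)
AUTO_HALT_ACTIONS = {
    (0, 0): "Returns to next instruction in interrupted program or task",
    (0, 1): "Unpredictable",
    (1, 0): "Returns to next instruction after HLT instruction",
    (1, 1): "Returns to HALT state",
}


def rsm_action(word_on_entry, word_on_exit):
    """Look up the RSM behavior from bit 0 of the auto HALT restart word."""
    return AUTO_HALT_ACTIONS[(word_on_entry & 1, word_on_exit & 1)]
```

A handler that wants the interrupted program to continue past the HLT instruction would clear the flag, which corresponds to the (1, 0) entry; the (0, 1) entry is the case the text warns against.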
Figure 13-3. Auto HALT Restart Field
Register offset 7F02H. Bits 15:1: Reserved; bit 0: Auto HALT Restart.
Table 13-6. Auto HALT Restart Flag Values
Value of Flag After Entry to SMM   Value of Flag When Exiting SMM   Action of Processor When Exiting SMM
0   0   Returns to next instruction in interrupted program or task
0   1   Unpredictable
1   0   Returns to next instruction after HLT instruction
1   1   Returns to HALT state
13.11.1 Executing the HLT Instruction in SMM
The HLT instruction should not be executed during SMM, unless interrupts have been enabled
by setting the IF flag in the EFLAGS register. If the processor is halted in SMM, the only event
that can remove the processor from this state is a maskable hardware interrupt or a hardware
reset.
13.12 SMBASE RELOCATION
The default base address for the SMRAM is 30000H. This value is contained in an internal
processor register called the SMBASE register. The operating system or executive can relocate
the SMRAM by setting the SMBASE field in the saved state map (at offset 7EF8H) to a new
value (see Figure 13-4). The RSM instruction reloads the internal SMBASE register with the
value in the SMBASE field each time it exits SMM. All subsequent SMI requests will use the
new SMBASE value to find the starting address for the SMI handler (at SMBASE + 8000H) and
the SMRAM state save area (from SMBASE + FE00H to SMBASE + FFFFH). (The processor
resets the value in its internal SMBASE register to 30000H on a RESET, but does not change it
on an INIT.)
In multiple-processor systems, initialization software must adjust the SMBASE value for each
processor so that the SMRAM state save areas for each processor do not overlap. (For Pentium
and Intel486 processors, the SMBASE values must be aligned on a 32-KByte boundary or the
processor will enter shutdown state during the execution of an RSM instruction.)
If the SMBASE relocation flag in the SMM revision identifier field is set, it indicates the ability
to relocate the SMBASE (see Section 13.10, “SMM Revision Identifier”).
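One way to picture the per-processor adjustment is to assign each processor an SMBASE at a fixed stride and then verify that the resulting state save areas do not overlap. The sketch below is an illustration under stated assumptions, not initialization code. The names `assign_smbases` and `state_save_range` and the default stride are hypothetical. The range SMBASE + FE00H to SMBASE + FFFFH is the one given above, and a processor using the larger Intel EM64T map in Table 13-2 would need a wider range.

```python
STATE_SAVE_FIRST = 0xFE00  # state save area starts at SMBASE + FE00H
STATE_SAVE_LAST = 0xFFFF   # and ends at SMBASE + FFFFH
ALIGN_32K = 0x8000         # alignment required by Pentium and Intel486 processors


def state_save_range(smbase):
    """Return the (first, last) addresses of the state save area for an SMBASE."""
    return smbase + STATE_SAVE_FIRST, smbase + STATE_SAVE_LAST


def assign_smbases(first_smbase, cpu_count, stride=ALIGN_32K, require_32k=True):
    """Assign one SMBASE per processor and reject overlapping state save areas."""
    bases = [first_smbase + i * stride for i in range(cpu_count)]
    if require_32k and any(base % ALIGN_32K for base in bases):
        raise ValueError("SMBASE not on a 32-KByte boundary")
    ranges = [state_save_range(base) for base in bases]
    for i, (lo_i, hi_i) in enumerate(ranges):
        for lo_j, hi_j in ranges[i + 1:]:
            if lo_i <= hi_j and lo_j <= hi_i:
                raise ValueError("state save areas overlap")
    return bases
```

With two processors starting at the default SMBASE, the assigned values are 30000H and 38000H, giving state save areas at 3FE00H through 3FFFFH and 47E00H through 47FFFH. Passing `require_32k=False` models processors that do not impose the 32-KByte alignment.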
13.12.1 Relocating SMRAM to an Address Above 1 MByte
In SMM, the segment base registers can only be updated by changing the value in the segment
registers. The segment registers contain only 16 bits, which allows only 20 bits to be used for a
segment base address (the segment register is shifted left 4 bits to determine the segment base
address). If SMRAM is relocated to an address above 1 MByte, software operating in
real-address mode can no longer initialize the segment registers to point to the SMRAM base
address (SMBASE).
The SMRAM can still be accessed by using 32-bit address-size override prefixes to generate an
offset to the correct address. For example, if the SMBASE has been relocated to FFFFFFH
(immediately below the 16-MByte boundary) and the DS, ES, FS, and GS registers are still
initialized to 0H, data in SMRAM can be accessed by using 32-bit displacement registers, as in
the following example:
mov esi, 00FFxxxxH    ; 64K segment immediately below 16M
mov ax, ds:[esi]
A stack located above the 1-MByte boundary can be accessed in the same manner.
Figure 13-4. SMBASE Relocation Field
Register offset 7EF8H. Bits 31:0: SMM Base.
13.13 I/O INSTRUCTION RESTART
If the I/O instruction restart flag in the SMM revision identifier field is set (see Section 13.10,
“SMM Revision Identifier”), the I/O instruction restart mechanism is present on the processor.
This mechanism allows an interrupted I/O instruction to be re-executed upon returning from
SMM mode. For example, if an I/O instruction is used to access a powered-down I/O device, a
chip set supporting this device can intercept the access and respond by asserting SMI#. This
action invokes the SMI handler to power-up the device. Upon returning from the SMI handler,
the I/O instruction restart mechanism can be used to re-execute the I/O instruction that caused
the SMI.
The I/O instruction restart field (at offset 7F00H in the SMM state-save area, see Figure 13-5)
controls I/O instruction restart. When an RSM instruction is executed, if this field contains the
value FFH, then the EIP register is modified to point to the I/O instruction that received the SMI
request. The processor will then automatically re-execute the I/O instruction that the SMI
trapped. (The processor saves the necessary machine state to ensure that re-execution of the
instruction is handled coherently.)
If the I/O instruction restart field contains the value 00H when the RSM instruction is executed,
then the processor begins program execution with the instruction following the I/O instruction.
(When a repeat prefix is being used, the next instruction may be the next I/O instruction in the
repeat loop.) Not re-executing the interrupted I/O instruction is the default behavior; the
processor automatically initializes the I/O instruction restart field to 00H upon entering SMM.
Table 13-7 summarizes the states of the I/O instruction restart field.
Note that the I/O instruction restart mechanism does not indicate the cause of the SMI. It is the
responsibility of the SMI handler to examine the state of the processor to determine the cause of
the SMI and to determine if an I/O instruction was interrupted and should be restarted upon
exiting SMM. If an SMI interrupt is signaled on a non-I/O instruction boundary, setting the I/O
instruction restart field to FFH prior to executing the RSM instruction will likely result in a
program error.
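Because setting the restart field on a non-I/O boundary is likely to cause a program error, a handler would normally check that an I/O instruction was trapped before writing FFH. The sketch below models that check against a byte-array image of the state save area. It assumes a processor whose SMM revision ID is 30004H or higher, so that the IO_SMI bit (bit 0 of the I/O State Field at offset 7FA4H) is available, and it assumes the image is indexed by the offsets that are added to SMBASE + 8000H. The function name `request_io_restart` is hypothetical.

```python
import struct

IO_STATE_OFFSET = 0x7FA4    # I/O State Field (doubleword); bit 0 is IO_SMI
IO_RESTART_OFFSET = 0x7F00  # I/O Instruction Restart Field (word)


def request_io_restart(state_save, restart):
    """Write FFH to the restart field only if IO_SMI indicates a trapped I/O.

    state_save is a mutable image indexed by offsets relative to SMBASE + 8000H.
    Returns the value written to the restart field.
    """
    (io_state,) = struct.unpack_from("<I", state_save, IO_STATE_OFFSET)
    io_smi = io_state & 1
    value = 0xFF if (restart and io_smi) else 0x00
    struct.pack_into("<H", state_save, IO_RESTART_OFFSET, value)
    return value
```

IO_SMI is a strong indication rather than a guarantee, so a real handler would also consult platform-specific logs before requesting a restart; that step is outside this sketch.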
Figure 13-5. I/O Instruction Restart Field
Register offset 7F00H. Bits 15:0: I/O Instruction Restart Field.
Table 13-7. I/O Instruction Restart Field Values
Value of Flag After Entry to SMM   Value of Flag When Exiting SMM   Action of Processor When Exiting SMM
00H   00H   Does not re-execute trapped I/O instruction.
00H   FFH   Re-executes trapped I/O instruction.
13.13.1 Back-to-Back SMI Interrupts When I/O Instruction Restart Is Being Used
If an SMI interrupt is signaled while the processor is servicing an SMI interrupt that occurred
on an I/O instruction boundary, the processor will service the new SMI request before restarting
the originally interrupted I/O instruction. If the I/O instruction restart field is set to FFH prior to
returning from the second SMI handler, the EIP will point to an address different from the originally
interrupted I/O instruction, which will likely lead to a program error. To avoid this situation,
the SMI handler must be able to recognize the occurrence of back-to-back SMI interrupts
when I/O instruction restart is being used and ensure that the handler sets the I/O instruction
restart field to 00H prior to returning from the second invocation of the SMI handler.
13.14 SMM MULTIPLE-PROCESSOR CONSIDERATIONS
The following should be noted when designing multiple-processor systems:
• Any processor in a multiprocessor system can respond to an SMI.
• Each processor needs its own SMRAM space. This space can be in system memory or in a
separate RAM.
• The SMRAMs for different processors can be overlapped in the same memory space. The
only stipulation is that each processor needs its own state save area and its own dynamic
data storage area. (Also, for the Pentium and Intel486 processors, the SMBASE address
must be located on a 32-KByte boundary.) Code and static data can be shared among
processors. Overlapping SMRAM spaces can be done more efficiently with the P6 family
processors because they do not require that the SMBASE address be on a 32-KByte
boundary.
• The SMI handler will need to initialize the SMBASE for each processor.
• Processors can respond to local SMIs through their SMI# pins or to SMIs received through
the APIC interface. The APIC interface can distribute SMIs to different processors.
• Two or more processors can be executing in SMM at the same time.
• When operating Pentium processors in dual processing (DP) mode, the SMIACT# pin is
driven only by the MRM processor and should be sampled with ADS#. For additional
details, see Chapter 14 of the Pentium Processor Family User’s Manual, Volume 1.
SMM is not re-entrant, because the SMRAM State Save Map is fixed relative to the SMBASE.
If there is a need to support two or more processors in SMM mode at the same time then each
processor should have dedicated SMRAM spaces. This can be done by using the SMBASE
Relocation feature (see Section 13.12, “SMBASE Relocation”).
13.15 ENHANCED INTEL SPEEDSTEP® TECHNOLOGY
Enhanced Intel SpeedStep® Technology was first introduced in the Pentium M processor and is
also available in Pentium 4 and Xeon processors. It can manage processor power consumption
efficiently via performance state transitions. Processor performance states are defined as
discrete operating points associated with different frequencies.
Enhanced Intel SpeedStep Technology differs from previous generations of Intel SpeedStep
Technology in two basic ways:
• Centralization of the control mechanism and software interface in the processor by using
model-specific registers.
• Reduced hardware overhead; this permits more frequent performance state transitions.
Previous generations of the Intel SpeedStep Technology require processors to be a deep sleep
state, holding off bus master transfers for the duration of a performance state transition. Performance
state transitions under the Enhanced Intel SpeedStep Technology are discrete transitions
to a new target frequency.
Support is indicated by CPUID, using ECX feature bit 07. Enhanced Intel SpeedStep Technology
is enabled by setting IA32_MISC_ENABLE MSR, bit 16. On reset, bit 16 of
IA32_MISC_ENABLE MSR is cleared.
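The detection and enable steps above can be sketched as two pure helpers in C. This is only an illustration: the function and macro names are hypothetical, and the actual CPUID execution and RDMSR/WRMSR accesses (which require privilege level 0) are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the text above; names are illustrative. */
#define CPUID1_ECX_EIST   (1u << 7)     /* CPUID, EAX = 1, ECX feature bit 7 */
#define MISC_ENABLE_EIST  (1ull << 16)  /* IA32_MISC_ENABLE, bit 16 */

/* True if the ECX value returned by CPUID with EAX = 1 reports support. */
bool eist_supported(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_EIST) != 0;
}

/* Returns the IA32_MISC_ENABLE value with bit 16 set and all other
 * bits left exactly as they were read. */
uint64_t eist_enable(uint64_t misc_enable)
{
    return misc_enable | MISC_ENABLE_EIST;
}
```

A caller would read IA32_MISC_ENABLE, pass the value through eist_enable(), and write the result back only when eist_supported() returns true.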
13.15.1 Software Interface For Initiating Performance State
Transitions
State transitions are initiated by writing a 16-bit value to the IA32_PERF_CTL register. If a transition
is already in progress, transition to a new value will take effect subsequently.
Reads of IA32_PERF_CTL determine the last targeted operating point. The current operating
point can be read from IA32_PERF_STATUS. IA32_PERF_STATUS is updated dynamically.
The 16-bit encoding that defines valid operating points is model-specific. Applications and
performance tools are not expected to use either IA32_PERF_CTL or IA32_PERF_STATUS
and should treat both as reserved. Performance monitoring tools can access model-specific
events and report the occurrences of state transitions.
13.16 THERMAL MONITORING AND PROTECTION
The IA-32 architecture provides three mechanisms for monitoring temperature and controlling
power consumption of an IA-32 processor:
1. A catastrophic shutdown detector that forces processor execution to stop if the
processor’s core temperature rises above a preset limit.
2. An automatic thermal monitoring mechanism that forces the processor to reduce its
power consumption in order to maintain a predetermined temperature limit.
3. A software controlled clock modulation mechanism that permits the operating system to
implement a power management policy to reduce the power consumption of an IA-32
processor; this is in addition to the reduction offered by the automatic thermal monitoring
mechanism.
The first mechanism is not visible to software. The other two mechanisms are visible to software
using processor feature information returned by executing CPUID with EAX = 1.
The second mechanism, automatic thermal monitoring, provides two modes of operation. One
mode modulates the clock duty cycle; the second mode changes the processor’s frequency. Both
modes are used to control the core temperature of the processor.
The third mechanism modulates the clock duty cycle of the processor. As shown in Figure 13-6,
the phrase ‘duty cycle’ does not refer to the actual duty cycle of the clock signal. Instead it refers
to the time period during which the clock signal is allowed to drive the processor chip. By using
the stop clock mechanism to control how often the processor is clocked, processor power
consumption can be modulated.
13.16.1 Catastrophic Shutdown Detector
P6 family processors introduced a thermal sensor that acts as a catastrophic shutdown detector.
This catastrophic shutdown detector was also implemented in Pentium 4, Intel Xeon and
Pentium M processors. It is always enabled. When processor core temperature reaches a factory
preset level, the sensor trips and processor execution is halted until after the next reset cycle.
13.16.2 Thermal Monitor
Pentium 4, Intel Xeon and Pentium M processors introduced a second temperature sensor that
is factory-calibrated to trip when the processor’s core temperature crosses a level corresponding
to the recommended thermal design envelope. The trip temperature of the second sensor is calibrated
below the temperature assigned to the catastrophic shutdown detector.
13.16.2.1 Thermal Monitor 1
The Pentium 4 processor uses the second temperature sensor in conjunction with a mechanism
called TM1 (Thermal Monitor 1) to control the core temperature of the processor. TM1 controls
the processor’s temperature by modulating the duty cycle of the processor clock. Modulation of
duty cycles is processor model specific. Note that the processor’s STPCLK# pin is not used here;
the stop-clock circuitry is controlled internally.
Support for TM1 is indicated by CPUID.1:EDX.TM[bit 29] = 1.
Figure 13-6. Processor Modulation Through Stop-Clock Mechanism
(Timing diagram: the clock applied to the processor against the stop-clock duty cycle; a 25% duty cycle is shown as an example only.)
TM1 is enabled by setting the thermal-monitor enable flag (bit 3) in IA32_MISC_ENABLE [see
Appendix B, Model-Specific Registers (MSRs)]. Following a power-up or reset, the flag is
cleared, disabling TM1. BIOS is required to enable only one of the automatic thermal monitoring
modes. Operating systems and applications must not disable the operation of these mechanisms.
13.16.2.2 Thermal Monitor 2
An additional automatic thermal protection mechanism, called Thermal Monitor 2 (TM2), was
introduced in the Intel Pentium M processor and also incorporated in newer models of the
Pentium 4 processor family. TM2 controls the core temperature of the processor by reducing the
operating frequency and voltage of the processor and offers a higher performance level for a
given level of power reduction than TM1.
TM2 is triggered by the same temperature sensor as TM1. The mechanism to enable TM2 may
be implemented differently across various IA-32 processor families with different CPUID
signatures in the family encoding value, but will be uniform within an IA-32 processor family.
Support for TM2 is indicated by CPUID.1:ECX.TM2[bit 8] = 1.
On Pentium M processors: TM2 is enabled if the TM_SELECT flag (bit 16) of the
MSR_THERM2_CTL register is set to 1 and bit 3 of the IA32_MISC_ENABLE register is set
to 1.
Following a power-up or reset, the TM_SELECT flag is cleared. BIOS is required to enable
either TM1 or TM2. Operating systems and applications must not disable the mechanisms that
enable TM1 or TM2. If bit 3 of the IA32_MISC_ENABLE register is set and TM_SELECT flag
of the MSR_THERM2_CTL register is cleared, TM1 is enabled.
On Pentium 4 processors: support for TM2 is also reported using ECX bit 8 of the CPUID
instruction, but the interface to enable TM2 is slightly different. For a Pentium 4 processor that
supports TM2, TM2 is enabled by setting bit 13 of the IA32_MISC_ENABLE register to 1.
The target operating frequency and voltage for the TM2 transition after TM2 is triggered is specified
by the value written to MSR_THERM2_CTL, bits 15:0. Following a power-up or reset,
BIOS is required to enable at least one of these two thermal monitoring mechanisms. If both
TM1 and TM2 are supported, BIOS may choose to enable TM2 instead of TM1. Operating
systems and applications must not disable the mechanisms that enable TM1 or TM2; and they
must not alter the value in bits 15:0 of the MSR_THERM2_CTL register.
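The Pentium M rules above (bit 3 of IA32_MISC_ENABLE together with TM_SELECT) can be summarized in a small C sketch. The names are hypothetical, and the case where bit 3 is clear but TM_SELECT is set is not described in the text; treating it as "no monitor enabled" is an assumption of this example.

```c
#include <stdint.h>

#define MISC_ENABLE_TM        (1ull << 3)   /* IA32_MISC_ENABLE, bit 3 */
#define THERM2_CTL_TM_SELECT  (1ull << 16)  /* MSR_THERM2_CTL, bit 16 */

enum thermal_monitor { TM_NONE, TM_1, TM_2 };

/* Decide which automatic thermal monitor is enabled on a Pentium M,
 * given the two MSR values as read by privileged code. */
enum thermal_monitor pentium_m_active_monitor(uint64_t misc_enable,
                                              uint64_t therm2_ctl)
{
    if (!(misc_enable & MISC_ENABLE_TM))
        return TM_NONE;  /* assumption: bit 3 clear means neither is enabled */
    return (therm2_ctl & THERM2_CTL_TM_SELECT) ? TM_2 : TM_1;
}

/* Bits 15:0 of MSR_THERM2_CTL hold the TM2 transition target.
 * Software other than BIOS should only read this field. */
uint16_t tm2_transition_target(uint64_t therm2_ctl)
{
    return (uint16_t)(therm2_ctl & 0xFFFFu);
}
```

Both helpers are read-only by design, since operating systems and applications must not alter these settings.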
Figure 13-7. MSR_THERM2_CTL Register for the Pentium M Processor
(Register layout, bits 31:0: TM_SELECT at bit 16; the remaining bits are labeled Reserved.)
13.16.2.3 Performance State Transitions and Thermal Monitoring
If the thermal control circuitry (TCC) for thermal monitor (TM1/TM2) is active, writes to the
IA32_PERF_CTL will effect a new target operating point as follows:
• If TM1 is enabled and the TCC is engaged, the performance state transition can commence
before the TCC is disengaged.
• If TM2 is enabled and the TCC is engaged, the performance state transition specified by a
write to the IA32_PERF_CTL will commence after the TCC has disengaged.
13.16.2.4 Thermal Status Information
The status of the temperature sensor that triggers the thermal monitor (TM1/TM2) is indicated
through the thermal status flag and thermal status log flag in the IA32_THERM_STATUS MSR
(see Figure 13-9).
The functions of these flags are:
• Thermal Status flag, bit 0 — When set, indicates that the processor core temperature is
currently at the trip temperature of the thermal monitor and that the processor power
consumption is being reduced via either TM1 or TM2, depending on which is enabled.
When clear, the flag indicates that the core temperature is below the thermal monitor trip
temperature. This flag is read only.
• Thermal Status Log flag, bit 1 — When set, indicates that the thermal sensor has tripped
since the last power-up or reset or since the last time that software cleared this flag. This
flag is a sticky bit; once set it remains set until cleared by software or until a power-up or
reset of the processor. The default state is clear.
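The two flags can be decoded from a value read from IA32_THERM_STATUS as in the following C sketch. The helper names are hypothetical, and the RDMSR/WRMSR accesses themselves are assumed to be done elsewhere by privileged code.

```c
#include <stdbool.h>
#include <stdint.h>

#define THERM_STATUS_HOT  (1ull << 0)  /* Thermal Status flag (read only) */
#define THERM_STATUS_LOG  (1ull << 1)  /* Thermal Status Log flag (sticky) */

/* True while the core is at the trip temperature and TM1/TM2 is active. */
bool thermal_is_hot(uint64_t therm_status)
{
    return (therm_status & THERM_STATUS_HOT) != 0;
}

/* True if the sensor has tripped since the flag was last cleared. */
bool thermal_has_tripped(uint64_t therm_status)
{
    return (therm_status & THERM_STATUS_LOG) != 0;
}

/* Value to write back in order to clear the sticky log flag. */
uint64_t thermal_clear_log(uint64_t therm_status)
{
    return therm_status & ~THERM_STATUS_LOG;
}
```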
Figure 13-8. MSR_THERM2_CTL Register for the Pentium 4 Processor Supporting TM2
(Register layout: bits 63:16 Reserved; bits 15:0 TM2 Transition Target.)
Figure 13-9. IA32_THERM_STATUS MSR
(Register layout: bits 63:2 Reserved; bit 1 Thermal Status Log; bit 0 Thermal Status.)
After the second temperature sensor has been tripped, the thermal monitor (TM1/TM2) will
remain engaged for a minimum time period (on the order of 1 ms). The thermal monitor will
remain engaged until the processor core temperature drops below the preset trip temperature of
the temperature sensor, taking hysteresis into account.
While the processor is in a stop-clock state, interrupts will be blocked from interrupting the
processor. This holding off of interrupts increases the interrupt latency, but does not cause interrupts
to be lost. Outstanding interrupts remain pending until clock modulation is complete.
The thermal monitor can be programmed to generate an interrupt to the processor when the
thermal sensor is tripped. The delivery mode, mask and vector for this interrupt can be
programmed through the thermal entry in the local APIC’s LVT (see Section 8.5.1, “Local Vector
Table”). The low-temperature interrupt enable and high-temperature interrupt enable flags in the
IA32_THERM_INTERRUPT MSR (see Figure 13-10) control when the interrupt is generated;
that is, on a transition from a temperature below the trip point to above and/or vice-versa.
• High-Temperature Interrupt Enable flag, bit 0 — Enables an interrupt to be generated
on the transition from a low-temperature to a high-temperature when set; disables the
interrupt when clear (R/W).
• Low-Temperature Interrupt Enable flag, bit 1 — Enables an interrupt to be generated
on the transition from a high-temperature to a low-temperature when set; disables the
interrupt when clear.
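As an illustration of the two enable flags, the following C sketch builds a new IA32_THERM_INTERRUPT value from the current one. The function name is hypothetical; programming the thermal LVT entry and writing the MSR are outside its scope.

```c
#include <stdbool.h>
#include <stdint.h>

#define THERM_INT_HIGH  (1ull << 0)  /* High-Temperature Interrupt Enable */
#define THERM_INT_LOW   (1ull << 1)  /* Low-Temperature Interrupt Enable */

/* Returns 'current' with the two enable flags replaced by the requested
 * settings; all other bits are preserved. */
uint64_t therm_interrupt_set(uint64_t current, bool on_high, bool on_low)
{
    current &= ~(THERM_INT_HIGH | THERM_INT_LOW);
    if (on_high)
        current |= THERM_INT_HIGH;  /* interrupt on low-to-high transition */
    if (on_low)
        current |= THERM_INT_LOW;   /* interrupt on high-to-low transition */
    return current;
}
```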
The thermal monitor interrupt can be masked by the thermal LVT entry. After a power-up or
reset, the low-temperature interrupt enable and high-temperature interrupt enable flags in the
IA32_THERM_INTERRUPT MSR are cleared (interrupts are disabled) and the thermal LVT
entry is set to mask interrupts. This interrupt should be handled either by the operating system
or system management mode (SMM) code.
Note that the operation of the thermal monitoring mechanism has no effect upon the clock rate
of the processor's internal high-resolution timer (time stamp counter).
Figure 13-10. IA32_THERM_INTERRUPT MSR
(Register layout: bits 63:2 Reserved; bit 1 Low-Temperature Interrupt Enable; bit 0 High-Temperature Interrupt Enable.)
13.16.3 Software Controlled Clock Modulation
Pentium 4, Intel Xeon and Pentium M processors also support software-controlled clock modulation.
This provides a means for operating systems to implement a power management policy
to reduce the power consumption of the processor. Here, the stop-clock duty cycle is controlled
by software through the IA32_CLOCK_MODULATION MSR (see Figure 13-11).
The IA32_CLOCK_MODULATION MSR contains the following flag and field used to enable
software-controlled clock modulation and to select the clock modulation duty cycle:
• On-Demand Clock Modulation Enable, bit 4 — Enables on-demand software controlled
clock modulation when set; disables software-controlled clock modulation when clear.
• On-Demand Clock Modulation Duty Cycle, bits 1 through 3 — Selects the on-demand
clock modulation duty cycle (see Table 13-8). This field is only active when the on-demand
clock modulation enable flag is set.
Note that the on-demand clock modulation mechanism (like the thermal monitor) controls the
processor’s stop-clock circuitry internally to modulate the clock signal. The STPCLK# pin is not
used in this mechanism.
Figure 13-11. IA32_CLOCK_MODULATION MSR
(Register layout: bits 63:5 Reserved; bit 4 On-Demand Clock Modulation Enable; bits 3:1 On-Demand Clock Modulation Duty Cycle; bit 0 Reserved.)
Table 13-8. On-Demand Clock Modulation Duty Cycle Field Encoding
Duty Cycle Field Encoding    Duty Cycle
000B                         Reserved
001B                         12.5% (Default)
010B                         25.0%
011B                         37.5%
100B                         50.0%
101B                         62.5%
110B                         75.0%
111B                         87.5%
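The field encoding can be illustrated with a short C sketch that builds an IA32_CLOCK_MODULATION value. The names are hypothetical, and the percentage helper assumes that each encoding step corresponds to one eighth (12.5%) of the full clock.

```c
#include <assert.h>
#include <stdint.h>

#define CLOCK_MOD_ENABLE      (1ull << 4)                    /* bit 4 */
#define CLOCK_MOD_DUTY_SHIFT  1                              /* bits 3:1 */
#define CLOCK_MOD_DUTY_MASK   (0x7ull << CLOCK_MOD_DUTY_SHIFT)

/* Returns 'current' with on-demand modulation enabled and the duty cycle
 * field set to duty_field (1..7; 0 is a reserved encoding). */
uint64_t clock_mod_encode(uint64_t current, unsigned duty_field)
{
    assert(duty_field >= 1 && duty_field <= 7);
    current &= ~(CLOCK_MOD_ENABLE | CLOCK_MOD_DUTY_MASK);
    return current | CLOCK_MOD_ENABLE |
           ((uint64_t)duty_field << CLOCK_MOD_DUTY_SHIFT);
}

/* Duty cycle in tenths of a percent, e.g. field 1 gives 125 (12.5%). */
unsigned clock_mod_duty_permille(unsigned duty_field)
{
    assert(duty_field >= 1 && duty_field <= 7);
    return duty_field * 125u;
}
```

Integer tenths of a percent are used instead of floating point so that the helper remains usable in kernel code that avoids the x87 FPU and SSE state.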
The on-demand clock modulation mechanism can be used to control processor power consumption.
Power management software can write to the IA32_CLOCK_MODULATION MSR to
enable clock modulation and to select a modulation duty cycle. If on-demand clock modulation
and TM1 are both enabled and the thermal status of the processor is hot (bit 0 of the
IA32_THERM_STATUS MSR is set), clock modulation at the duty cycle specified by TM1
takes precedence, regardless of the setting of the on-demand clock modulation duty cycle.
For Hyper-Threading Technology enabled processors, the IA32_CLOCK_MODULATION
register is duplicated for each logical processor. In order for the On-demand clock modulation
feature to work properly, the feature must be enabled on all the logical processors within a physical
processor. If the programmed duty cycle is not identical for all the logical processors, the
processor clock will modulate to the highest duty cycle programmed.
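The rule for logical processors can be modeled as follows. This C sketch is illustrative only: the function name is hypothetical, and returning 0 when some logical processor has not set the enable bit is an assumption standing in for "the feature does not work properly".

```c
#include <stddef.h>
#include <stdint.h>

/* Given the IA32_CLOCK_MODULATION value of each logical processor in one
 * physical processor, return the duty cycle field that takes effect:
 * the highest one programmed, or 0 if any logical processor has not
 * enabled on-demand modulation (bit 4). */
unsigned ht_effective_duty_field(const uint64_t *clock_mod, size_t n)
{
    unsigned highest = 0;
    for (size_t i = 0; i < n; i++) {
        if (!(clock_mod[i] & (1ull << 4)))
            return 0;
        unsigned field = (unsigned)((clock_mod[i] >> 1) & 0x7u);
        if (field > highest)
            highest = field;
    }
    return highest;
}
```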
For the P6 family processors, on-demand clock modulation was implemented through the
chipset, which controlled clock modulation through the processor’s STPCLK# pin.
13.16.4 Detection of Thermal Monitor and Software Controlled
Clock Modulation Facilities
The ACPI flag (bit 22) of the CPUID feature flags indicates the presence of the
IA32_THERM_STATUS, IA32_THERM_INTERRUPT, IA32_CLOCK_MODULATION
MSRs, and the xAPIC thermal LVT entry.
The TM1 flag (bit 29) of the CPUID feature flags indicates the presence of the automatic thermal
monitoring facilities that modulate clock duty cycles.
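A detection routine based on these flags might look like the following C sketch. The structure and function names are hypothetical; the inputs are the EDX and ECX values returned by CPUID with EAX = 1, and the TM2 bit (ECX bit 8) is taken from Section 13.16.2.2.

```c
#include <stdbool.h>
#include <stdint.h>

struct thermal_features {
    bool acpi_msrs;  /* CPUID.1:EDX bit 22: thermal MSRs and thermal LVT entry */
    bool tm1;        /* CPUID.1:EDX bit 29: Thermal Monitor 1 */
    bool tm2;        /* CPUID.1:ECX bit 8:  Thermal Monitor 2 */
};

/* Decode the thermal-related feature flags from CPUID leaf 1. */
struct thermal_features thermal_detect(uint32_t cpuid1_edx, uint32_t cpuid1_ecx)
{
    struct thermal_features f;
    f.acpi_msrs = ((cpuid1_edx >> 22) & 1u) != 0;
    f.tm1       = ((cpuid1_edx >> 29) & 1u) != 0;
    f.tm2       = ((cpuid1_ecx >> 8) & 1u) != 0;
    return f;
}
```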

CHAPTER 12 SSE, SSE2 AND SSE3 SYSTEM PROGRAMMING

This chapter describes features of the streaming SIMD extensions (SSE), streaming SIMD
extensions 2 (SSE2) and streaming SIMD extensions 3 (SSE3) that must be considered when
designing or enhancing an operating system to support the Pentium III, Pentium 4, and Intel
Xeon processors. It covers enabling SSE/SSE2/SSE3 extensions, providing operating system or
executive support for the SSE/SSE2/SSE3 extensions, SIMD floating-point exceptions, exception
handling, and task (context) switching considerations.
12.1 PROVIDING OPERATING SYSTEM SUPPORT FOR
SSE/SSE2/SSE3 EXTENSIONS
To use SSE/SSE2/SSE3 extensions, the operating system or executive must provide support for
initializing the processor to use the extensions, for handling the FXSAVE and FXRSTOR state
saving instructions, and for handling SIMD floating-point exceptions. The following sections
give some guidelines for providing this support in an operating-system or executive. Because
SSE/SSE2/SSE3 extensions share the same state and perform companion operations, these
guidelines apply to all three sets of extensions.
Chapter 11, Programming with the Streaming SIMD Extensions 2 (SSE2), and Chapter 12,
Programming with the Streaming SIMD Extensions 3 (SSE3), in the IA-32 Intel Architecture Software
Developer’s Manual, Volume 1, discuss support for SSE/SSE2/SSE3 extensions from the
point of view of an application program.
12.1.1 Adding Support to an Operating System for
SSE/SSE2/SSE3 Extensions
The following guidelines describe operations that an operating system or executive must
perform to support SSE/SSE2/SSE3 extensions:
1. Check that the processor supports the SSE/SSE2/SSE3 extensions.
2. Check that the processor supports the FXSAVE and FXRSTOR instructions.
3. Provide an initialization for the SSE, SSE2 and SSE3 states.
4. Provide support for the FXSAVE and FXRSTOR instructions.
5. Provide support (if necessary) in non-numeric exception handlers for exceptions generated
by the SSE and SSE2 instructions.
6. Provide an exception handler for the SIMD floating-point exception (#XF).
The following sections describe how to implement each of these guidelines.
12.1.2 Checking for SSE/SSE2/SSE3 Extension Support
If the processor attempts to execute an unsupported SSE/SSE2/SSE3 instruction, the processor
will generate an invalid-opcode exception (#UD).
Before an operating system or executive attempts to use SSE/SSE2/SSE3 extensions, it should
check that support is present on the processor. To make this check, execute CPUID with an argument
of 1 in the EAX register. Make sure:
• CPUID.1:EDX.SSE[bit 25] = 1
• CPUID.1:EDX.SSE2[bit 26] = 1
• CPUID.1:ECX.SSE3[bit 0] = 1
12.1.3 Checking for Support for the FXSAVE and FXRSTOR
Instructions
A separate check must be made to ensure that the processor supports FXSAVE and FXRSTOR.
To make this check, execute CPUID with an argument of 1 in the EAX register. Make sure:
• CPUID.1:EDX.FXSR[bit 24] = 1
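The checks in Sections 12.1.2 and 12.1.3 can be combined into one helper, sketched here in C. The function name and the idea of a single numeric "level" are assumptions of this example (it treats each extension as requiring the previous one); the inputs are the EDX and ECX values returned by CPUID with EAX = 1.

```c
#include <stdint.h>

#define CPUID1_EDX_FXSR  (1u << 24)
#define CPUID1_EDX_SSE   (1u << 25)
#define CPUID1_EDX_SSE2  (1u << 26)
#define CPUID1_ECX_SSE3  (1u << 0)

/* Returns 0 if the extensions cannot be used (no FXSAVE/FXRSTOR or no SSE),
 * otherwise 1, 2 or 3 for SSE, SSE2 or SSE3. */
int sse_level(uint32_t cpuid1_edx, uint32_t cpuid1_ecx)
{
    if (!(cpuid1_edx & CPUID1_EDX_FXSR) || !(cpuid1_edx & CPUID1_EDX_SSE))
        return 0;
    if (!(cpuid1_edx & CPUID1_EDX_SSE2))
        return 1;
    if (!(cpuid1_ecx & CPUID1_ECX_SSE3))
        return 2;
    return 3;
}
```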
12.1.4 Initialization of the SSE/SSE2/SSE3 Extensions
The operating system or executive should carry out the following steps to set up
SSE/SSE2/SSE3 extensions for use by application programs:
1. Set CR4.OSFXSR[bit 9] = 1. Setting this flag assumes that the operating system provides
facilities for saving and restoring SSE/SSE2/SSE3 states using FXSAVE and FXRSTOR
instructions. These instructions are commonly used to save the SSE/SSE2/SSE3 state
during task switches and when invoking the SIMD floating-point exception (#XF) handler
(see Section 12.4, “Saving the SSE/SSE2/SSE3 State on Task or Context Switches” and
Section 12.1.6, “Providing a Handler for the SIMD Floating-Point Exception (#XF)”,
respectively).
If the processor does not support the FXSAVE and FXRSTOR instructions, attempting to
set the OSFXSR flag will cause an exception (#GP) to be generated.
2. Set CR4.OSXMMEXCPT[bit 10] = 1. Setting this flag assumes that the operating system
provides a SIMD floating-point exception (#XF) handler (see Section 12.1.6, “Providing
a Handler for the SIMD Floating-Point Exception (#XF)”).
NOTE
The OSFXSR and OSXMMEXCPT bits in control register CR4 must be set
by the operating system. The processor has no other way of detecting
operating-system support for the FXSAVE and FXRSTOR instructions or for
handling SIMD floating-point exceptions.
3. Clear CR0.EM[bit 2] = 0. This action disables emulation of the x87 FPU, which is required
when executing SSE/SSE2/SSE3 instructions (see Section 2.5, “Control Registers”).
4. Set CR0.MP[bit 1] = 1. This setting is the required setting for all IA-32 processors that
support the SSE/SSE2/SSE3 extensions (see Section 9.2.1, “Configuring the x87 FPU
Environment”).
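The initialization steps can be sketched as two pure functions in C that compute the new control register values. The names are hypothetical, and the privileged MOV to and from CR0 and CR4 is left to the caller. The MP flag is set in this sketch because Table 12-1 lists MP = 1 in every row and its note 2 states that the MP flag should be set.

```c
#include <stdint.h>

#define CR0_MP          (1u << 1)
#define CR0_EM          (1u << 2)
#define CR4_OSFXSR      (1u << 9)
#define CR4_OSXMMEXCPT  (1u << 10)

/* Steps 1 and 2: declare OS support for FXSAVE/FXRSTOR and for the
 * SIMD floating-point exception (#XF) handler. */
uint32_t sse_init_cr4(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/* Steps 3 and 4: clear EM (no emulation) and set MP. */
uint32_t sse_init_cr0(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}
```

Setting CR4.OSFXSR on a processor without FXSAVE/FXRSTOR support raises #GP, so a caller would first perform the CPUID checks of Sections 12.1.2 and 12.1.3.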
Table 12-1 shows the actions of the processor when an SSE/SSE2/SSE3 instruction is executed,
depending on the:
• OSFXSR and OSXMMEXCPT flags in control register CR4
• SSE/SSE2/SSE3 feature flags returned by CPUID
• EM, MP, and TS flags in control register CR0
Table 12-1. Action Taken for Combinations of OSFXSR, OSXMMEXCPT, SSE, SSE2,
SSE3, EM, MP, and TS (see note 1)
CR4 flags             CPUID flags      CR0 flags
OSFXSR  OSXMMEXCPT    SSE/SSE2/SSE3    EM  MP (note 2)  TS   Action
0       X (note 3)    X                X   1            X    #UD exception.
1       X             0                X   1            X    #UD exception.
1       X             1                1   1            X    #UD exception.
1       0             1                0   1            0    Execute instruction; #UD exception if unmasked SIMD floating-point exception is detected.
1       1             1                0   1            0    Execute instruction; #XF exception if unmasked SIMD floating-point exception is detected.
1       X             1                0   1            1    #NM exception.
NOTES:
1. For execution of any SSE/SSE2/SSE3 instruction except the PAUSE, PREFETCHh, SFENCE,
LFENCE, MFENCE, MOVNTI, and CLFLUSH instructions.
2. For processors that support the MMX instructions, the MP flag should be set.
3. X: Don’t care.
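Table 12-1 can be read as a simple decision procedure. The following C sketch encodes its rows; the enumeration and function names are hypothetical, MP is assumed to be 1 as in every row of the table, and the instructions excluded by note 1 are not modeled.

```c
#include <stdbool.h>

enum sse_action {
    SSE_ACT_UD,           /* invalid-opcode exception */
    SSE_ACT_NM,           /* device-not-available exception */
    SSE_ACT_EXEC_UD,      /* execute; #UD on unmasked SIMD FP exception */
    SSE_ACT_EXEC_XF       /* execute; #XF on unmasked SIMD FP exception */
};

/* Action taken when an SSE/SSE2/SSE3 instruction is executed. */
enum sse_action sse_instruction_action(bool osfxsr, bool osxmmexcpt,
                                       bool cpuid_flag, bool em, bool ts)
{
    if (!osfxsr || !cpuid_flag || em)
        return SSE_ACT_UD;                 /* rows 1 to 3 */
    if (ts)
        return SSE_ACT_NM;                 /* row 6 */
    return osxmmexcpt ? SSE_ACT_EXEC_XF    /* row 5 */
                      : SSE_ACT_EXEC_UD;   /* row 4 */
}
```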
The SIMD floating-point exception mask bits (bits 7 through 12), the flush-to-zero flag (bit 15),
the denormals-are-zero flag (bit 6), and the rounding control field (bits 13 and 14) in the
MXCSR register should be left at their default values (the mask bits set to 1; the two flags and
the rounding control field cleared to 0). This permits the application to determine
how these features are to be used.
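The MXCSR fields named above can be extracted as in the following C sketch. The structure and function names are hypothetical. The value 1F80H used in the usage note is the architectural power-up value of MXCSR; it is not stated in the text above, and in it the six mask bits are set.

```c
#include <stdbool.h>
#include <stdint.h>

struct mxcsr_controls {
    unsigned masks;              /* bits 12:7, one mask bit per exception */
    unsigned rounding;           /* bits 14:13, rounding control */
    bool     flush_to_zero;      /* bit 15 */
    bool     denormals_are_zero; /* bit 6 */
};

/* Split an MXCSR value into the control fields discussed in the text. */
struct mxcsr_controls mxcsr_decode(uint32_t mxcsr)
{
    struct mxcsr_controls c;
    c.masks              = (mxcsr >> 7) & 0x3Fu;
    c.rounding           = (mxcsr >> 13) & 0x3u;
    c.flush_to_zero      = ((mxcsr >> 15) & 1u) != 0;
    c.denormals_are_zero = ((mxcsr >> 6) & 1u) != 0;
    return c;
}
```

For example, mxcsr_decode(0x1F80) reports all six exceptions masked, round-to-nearest (rounding field 0), and both flags clear.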
12.1.5 Providing Non-Numeric Exception Handlers for
Exceptions Generated by the SSE/SSE2/SSE3 Instructions
SSE/SSE2/SSE3 instructions can generate the same type of memory access exceptions (such as,
page fault, segment not present, and limit violations) and other non-numeric exceptions as other
IA-32 architecture instructions generate.
Ordinarily, existing exception handlers can handle these and other non-numeric exceptions
without code modification. However, depending on the mechanisms used in existing exception
handlers, some modifications might need to be made.
The SSE/SSE2/SSE3 extensions can generate the non-numeric exceptions listed below:
• Memory Access Exceptions:
— Invalid opcode (#UD).
— Stack-segment fault (#SS).
— General protection (#GP). Executing most SSE/SSE2/SSE3 instructions with an
unaligned 128-bit memory reference generates a general-protection exception. (The
MOVUPS and MOVUPD instructions allow unaligned loads or stores of 128-bit
memory locations, without generating a general-protection exception.) A 128-bit
reference within the stack segment that is not aligned to a 16-byte boundary will also
generate a general-protection exception, instead of a stack-segment fault exception
(#SS).
— Page fault (#PF).
— Alignment check (#AC). When enabled, this type of alignment check operates on
operands that are less than 128-bits in size: 16-bit, 32-bit, and 64-bit. To enable the
generation of alignment check exceptions, do the following:
• Set the AM flag (bit 18 of control register CR0)
• Set the AC flag (bit 18 of the EFLAGS register)
• CPL must be 3.
If alignment check exceptions are enabled, 16-bit, 32-bit, and 64-bit misalignment will
be detected for the MOVUPD and MOVUPS instructions; detection of 128-bit
misalignment is not guaranteed and may vary with implementation.
• System Exceptions:
— Invalid-opcode exception (#UD). This exception is generated when executing
SSE/SSE2/SSE3 instructions under the following conditions:
• SSE/SSE2/SSE3 feature flags returned by CPUID are set to 0. This condition does
not affect the CLFLUSH instruction.
• The CLFSH feature flag returned by the CPUID instruction is set to 0. This
exception condition only pertains to the execution of the CLFLUSH instruction.
• The EM flag (bit 2) in control register CR0 is set to 1, regardless of the value of
TS flag (bit 3) of CR0. This condition does not affect the PAUSE, PREFETCHh,
MOVNTI, SFENCE, LFENCE, MFENCE, and CLFLUSH instructions.
• The OSFXSR flag (bit 9) in control register CR4 is set to 0. This condition does
not affect the PAVGB, PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB,
PMINSW, PMINUB, PMOVMSKB, PMULHUW, PSADBW, PSHUFW,
MASKMOVQ, MOVNTQ, MOVNTI, PAUSE, PREFETCHh, SFENCE,
LFENCE, MFENCE, and CLFLUSH instructions.
• Executing an instruction that causes a SIMD floating-point exception when the
OSXMMEXCPT flag (bit 10) in control register CR4 is set to 0. See Section
12.5.1., “Using the TS Flag to Control the Saving of the x87 FPU, MMX, SSE,
SSE2 and SSE3 State”.
— Device not available (#NM). This exception is generated by executing a
SSE/SSE2/SSE3 instruction when the TS flag (bit 3) of CR0 is set to 1.
Other exceptions can occur indirectly due to faulty execution of the above exceptions.
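The three conditions listed above for enabling alignment check exceptions (#AC) can be expressed as a single predicate. This C sketch uses hypothetical names and takes the register values and the current privilege level as plain arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM     (1u << 18)  /* alignment mask flag in CR0 */
#define EFLAGS_AC  (1u << 18)  /* alignment check flag in EFLAGS */

/* True when alignment checking is in effect: AM set, AC set, CPL = 3. */
bool alignment_check_active(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    assert(cpl <= 3);
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}
```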
12.1.6 Providing a Handler for the SIMD Floating-Point
Exception (#XF)
SSE/SSE2/SSE3 instructions do not generate numeric exceptions on packed integer operations.
They can generate the following numeric (SIMD floating-point) exceptions on packed and
scalar single-precision and double-precision floating-point operations.
• Invalid operation (#I)
• Divide-by-zero (#Z)
• Denormal operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (Precision) (#P)
These SIMD floating-point exceptions (with the exception of the denormal operand exception)
are defined in the IEEE Standard 754 for Binary Floating-Point Arithmetic and represent the
same conditions that cause x87 FPU floating-point error exceptions (#MF) to be generated for
x87 FPU instructions.
Each of these exceptions can be masked, in which case the processor returns a reasonable result
to the destination operand without invoking an exception handler. However, if any of these
exceptions are left unmasked, detection of the exception condition results in a SIMD floating-point
exception (#XF) being generated. See Chapter 5, “Interrupt 19—SIMD Floating-Point
Exception (#XF)”.
To handle unmasked SIMD floating-point exceptions, the operating system or executive must
provide an exception handler. The section titled “SSE and SSE2 SIMD Floating-Point Exceptions”
in Chapter 11 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1,
describes the SIMD floating-point exception classes and gives suggestions for writing an exception
handler to handle them.
To indicate that the operating system provides a handler for SIMD floating-point exceptions
(#XF), the OSXMMEXCPT flag (bit 10) must be set in control register CR4.
12.1.6.1 Numeric Error flag and IGNNE#
SSE/SSE2/SSE3 extensions ignore the NE flag in control register CR0 (that is, they treat it as if it
were always set) and the IGNNE# pin. When an unmasked SIMD floating-point exception is
detected, it is always reported by generating a SIMD floating-point exception (#XF).
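One way an #XF handler can find out which condition occurred is to compare the MXCSR exception flags with the mask bits. The following C sketch shows that comparison. The function name is hypothetical, and the flag positions (bits 0 through 5, in the same order as the mask bits 7 through 12) come from the MXCSR register definition rather than from the text above.

```c
#include <stdint.h>

/* Exception flag bits in MXCSR; the matching mask bit is 7 positions higher. */
#define MXCSR_IE  (1u << 0)  /* invalid operation */
#define MXCSR_DE  (1u << 1)  /* denormal operand */
#define MXCSR_ZE  (1u << 2)  /* divide-by-zero */
#define MXCSR_OE  (1u << 3)  /* numeric overflow */
#define MXCSR_UE  (1u << 4)  /* numeric underflow */
#define MXCSR_PE  (1u << 5)  /* inexact result (precision) */

/* Returns the set of exception flags that are set and not masked. */
unsigned mxcsr_unmasked_pending(uint32_t mxcsr)
{
    unsigned flags = mxcsr & 0x3Fu;
    unsigned masks = (mxcsr >> 7) & 0x3Fu;
    return flags & ~masks;
}
```

Because the flags are sticky, flags left over from earlier masked exceptions may also be set; filtering by the mask bits keeps only those that can have caused the exception.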
12.2 EMULATION OF SSE/SSE2/SSE3 EXTENSIONS
The IA-32 architecture does not support emulation of the SSE/SSE2/SSE3 instructions, as it
does for x87 FPU instructions. The EM flag in control register CR0 (provided to invoke emulation
of x87 FPU instructions) cannot be used to invoke emulation of SSE/SSE2/SSE3 instructions.
If an SSE/SSE2/SSE3 instruction is executed when the EM flag is set, an invalid opcode
exception (#UD) is generated (see Table 12-1).
12.3 SAVING AND RESTORING THE SSE/SSE2/SSE3 STATE
The SSE/SSE2/SSE3 state consists of the state of the XMM and MXCSR registers. The recommended
method of saving and restoring this state follows:
• Execute an FXSAVE instruction to save the state of the XMM and MXCSR registers to
memory.
• Execute an FXRSTOR instruction to restore the state of the XMM and MXCSR registers
from the image saved in memory by the FXSAVE instruction.
This save and restore method is required for operating systems (see Section 12.5, “Designing
OS Facilities for AUTOMATICALLY Saving x87 FPU, MMX, and SSE/SSE2/SSE3 state on
Task or Context Switches”).
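For reference, the memory image used by FXSAVE and FXRSTOR can be described with a C structure. This sketch is based on the 512-byte, 16-byte-aligned layout given in the FXSAVE instruction description (32-bit mode), not on the text above; the structure and member names are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* FXSAVE/FXRSTOR memory image; must be aligned on a 16-byte boundary. */
struct fxsave_area {
    alignas(16) uint16_t fcw;   /* x87 control word */
    uint16_t fsw;               /* x87 status word */
    uint8_t  ftw;               /* abridged x87 tag word */
    uint8_t  reserved0;
    uint16_t fop;               /* last x87 opcode */
    uint32_t fpu_ip;            /* x87 instruction pointer offset */
    uint16_t fpu_cs;
    uint16_t reserved1;
    uint32_t fpu_dp;            /* x87 data pointer offset */
    uint16_t fpu_ds;
    uint16_t reserved2;
    uint32_t mxcsr;             /* byte offset 24 */
    uint32_t mxcsr_mask;
    uint8_t  st_mm[8][16];      /* ST0-ST7 / MM0-MM7, 16 bytes each */
    uint8_t  xmm[8][16];        /* XMM0-XMM7, starting at byte offset 160 */
    uint8_t  reserved3[224];    /* pads the image to 512 bytes */
};
```

An operating system would typically embed one such area in each task's saved context, so that the x87 FPU, MMX, XMM and MXCSR state is saved and restored as a unit.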
In some cases, applications can save only the XMM and MXCSR registers, in the following way:
• Execute eight MOVDQA (or MOVDQU) instructions to save the contents of the XMM0 through XMM7
registers to memory.
• Execute a STMXCSR instruction to save the state of the MXCSR register to memory.
In some cases, applications can restore only the XMM and MXCSR registers, in the following
way:
• Execute eight MOVDQA (or MOVDQU) instructions to read the saved contents of XMM registers from
memory into the XMM0 through XMM7 registers.
• Execute a LDMXCSR instruction to restore the state of the MXCSR register from memory.
12.4 SAVING THE SSE/SSE2/SSE3 STATE ON TASK
OR CONTEXT SWITCHES
When switching from one task or context to another, it is often necessary to save the
SSE/SSE2/SSE3 state. The FXSAVE and FXRSTOR instructions provide a simple method for
saving and restoring this state (as described in Section 12.3, “Saving and Restoring the
SSE/SSE2/SSE3 State”). These instructions offer the added benefit of saving the x87 FPU and
MMX state as well. Guidelines for writing such procedures are in Section 12.5, “Designing OS
Facilities for AUTOMATICALLY Saving x87 FPU, MMX, and SSE/SSE2/SSE3 state on Task
or Context Switches”.
12.5 DESIGNING OS FACILITIES FOR AUTOMATICALLY SAVING
X87 FPU, MMX, AND SSE/SSE2/SSE3 STATE ON TASK OR
CONTEXT SWITCHES
The x87 FPU/MMX/SSE/SSE2/SSE3 state consists of the state of the x87 FPU, MMX, XMM,
and MXCSR registers. The FXSAVE and FXRSTOR instructions provide a fast method of
saving and restoring this state. If task or context switching facilities are already implemented in
an operating system or executive and they use FSAVE/FNSAVE and FRSTOR to save the x87
FPU and MMX state, these facilities can also be extended to save and restore the
SSE/SSE2/SSE3 state by substituting FXSAVE and FXRSTOR for FSAVE/FNSAVE and
FRSTOR.
In cases where task or context switching facilities must be written from scratch, several
approaches can be taken for using the FXSAVE and FXRSTOR instructions to save and restore
the x87 FPU/MMX/SSE/SSE2/SSE3 state:
• The operating system can require that applications intended to be run as tasks take
responsibility for saving the state of the x87 FPU, MMX, XMM, and MXCSR registers
prior to a task suspension during a task switch and for restoring the registers when the task
is resumed. This approach is appropriate for cooperative multitasking operating systems,
where the application has control over (or is able to determine) when a task switch is about
to occur and can save state prior to the task switch.
• The operating system can take the responsibility for automatically saving the x87 FPU,
MMX, XMM, and MXCSR registers as part of the task switch process (using an FXSAVE
instruction) and automatically restoring the state of the registers when a suspended task is
resumed (using an FXRSTOR instruction). Here, the x87 FPU/MMX/SSE/SSE2/SSE3
state must be saved as part of the task state. This approach is appropriate for preemptive
multitasking operating systems, where the application cannot know when it is going to be
preempted and cannot prepare in advance for task switching. Here, the operating system is
responsible for saving and restoring the task and the x87 FPU/MMX/SSE/SSE2/SSE3
state when necessary.
• The operating system can take the responsibility for saving the x87 FPU, MMX, XMM,
and MXCSR registers as part of the task switch process, but delay the saving of the MMX
and x87 FPU state until an x87 FPU, MMX, or SSE/SSE2/SSE3 instruction is actually
executed by the new task. Using this approach, the x87 FPU/MMX/SSE/SSE2/SSE3 state
is saved only if an x87 FPU/MMX/SSE/SSE2/SSE3 instruction needs to be executed in the
new task. (See Section 12.5.1., “Using the TS Flag to Control the Saving of the x87 FPU,
MMX, SSE, SSE2 and SSE3 State”, for more information on this technique.)
12.5.1. Using the TS Flag to Control the Saving of the
x87 FPU, MMX, SSE, SSE2 and SSE3 State
Saving the x87 FPU/MMX/SSE/SSE2/SSE3 state using FXSAVE requires processor overhead.
If the new task does not access the x87 FPU, MMX, XMM, or MXCSR registers, avoid overhead
by not automatically saving the state on a task switch.
The TS flag in control register CR0 is provided to allow the operating system to delay saving
the x87 FPU/MMX/SSE/SSE2/SSE3 state until an instruction that actually accesses this state is
encountered in a new task. When the TS flag is set, the processor monitors the instruction stream
for an x87 FPU/MMX/SSE/SSE2/SSE3 instruction. When the processor detects one of these
instructions, it raises a device-not-available exception (#NM) prior to executing the instruction.
The device-not-available exception handler can then be used to save the x87
FPU/MMX/SSE/SSE2/SSE3 state for the previous task (using an FXSAVE instruction) and load
the x87 FPU/MMX/SSE/SSE2/SSE3 state for the current task (using an FXRSTOR instruction).
If the task never encounters an x87 FPU/MMX/SSE/SSE2/SSE3 instruction, the device-not-available
exception will not be raised and a task state will not be saved unnecessarily.
The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0)
or implicitly (using the IA-32 architecture’s native task switching mechanism). When the native
task switching mechanism is used, the processor automatically sets the TS flag on a task switch.
After the device-not-available handler has saved the x87 FPU/MMX/SSE/SSE2/SSE3 state, it
should execute the CLTS instruction to clear the TS flag.
Figure 12-1 gives an example of an operating system that implements x87
FPU/MMX/SSE/SSE2/SSE3 state saving using the TS flag. In this example, task A is the
currently running task and task B is the new task. The operating system maintains a save area
for the x87 FPU/MMX/SSE/SSE2/SSE3 state for each task and defines a variable
(x87FPU_MMX_SSE_SSE2_SSE3_StateOwner) that indicates the task that “owns” the state. In this
example, task A is the current owner.
On a task switch, the operating system task switching code must execute the following pseudocode
to set the TS flag according to the current owner of the x87 FPU/MMX/SSE/SSE2/SSE3
state. If the new task (task B in this example) is not the current owner of this state, the TS flag
is set to 1; otherwise, it is set to 0.
IF Task_Being_Switched_To ≠ x87FPU_MMX_SSE_SSE2_SSE3_StateOwner
THEN
CR0.TS ← 1;
ELSE
CR0.TS ← 0;
FI;
If a new task attempts to access an x87 FPU, MMX, XMM, or MXCSR register while the TS
flag is set to 1, a device-not-available exception (#NM) is generated. The device-not-available
exception handler executes the following pseudo-code.
FXSAVE “To x87FPU/MMX/SSE/SSE2/SSE3 State Save Area for Current
x87FPU_MMX_SSE_SSE2_SSE3_StateOwner”;
FXRSTOR “x87FPU/MMX/SSE/SSE2/SSE3 State From Current Task’s
x87FPU/MMX/SSE/SSE2/SSE3 State Save Area”;
x87FPU_MMX_SSE_SSE2_SSE3_StateOwner ← Current_Task;
CR0.TS ← 0;
This exception handler code performs the following tasks:
• Saves the x87 FPU, MMX, XMM, and MXCSR registers in the state save area for the
current owner of the x87 FPU/MMX/SSE/SSE2/SSE3 state.
• Restores the x87 FPU, MMX, XMM, and MXCSR registers from the new task’s save area
for the x87 FPU/MMX/SSE/SSE2/SSE3 state.
• Updates the current x87 FPU/MMX/SSE/SSE2/SSE3 state owner to be the current task.
• Clears the TS flag.

CHAPTER 12 SSE, SSE2 AND SSE3 SYSTEM PROGRAMMING

This chapter describes features of the streaming SIMD extensions (SSE), streaming SIMD
extensions 2 (SSE2) and streaming SIMD extensions 3 (SSE3) that must be considered when
designing or enhancing an operating system to support the Pentium III, Pentium 4, and Intel
Xeon processors. It covers enabling SSE/SSE2/SSE3 extensions, providing operating system or
executive support for the SSE/SSE2/SSE3 extensions, SIMD floating-point exceptions, exception
handling, and task (context) switching considerations.
12.1 PROVIDING OPERATING SYSTEM SUPPORT FOR
SSE/SSE2/SSE3 EXTENSIONS
To use SSE/SSE2/SSE3 extensions, the operating system or executive must provide support for
initializing the processor to use the extensions, for handling the FXSAVE and FXRSTOR state
saving instructions, and for handling SIMD floating-point exceptions. The following sections
give some guidelines for providing this support in an operating-system or executive. Because
SSE/SSE2/SSE3 extensions share the same state and perform companion operations, these
guidelines apply to all three sets of extensions.
Chapter 11, “Programming with the Streaming SIMD Extensions 2 (SSE2)”, and Chapter 12,
“Programming with the Streaming SIMD Extensions 3 (SSE3)”, in the IA-32 Intel Architecture Software
Developer’s Manual, Volume 1, discuss support for SSE/SSE2/SSE3 extensions from the
point of view of an application program.
12.1.1 Adding Support to an Operating System for
SSE/SSE2/SSE3 Extensions
The following guidelines describe operations that an operating system or executive must
perform to support SSE/SSE2/SSE3 extensions:
1. Check that the processor supports the SSE/SSE2/SSE3 extensions.
2. Check that the processor supports the FXSAVE and FXRSTOR instructions.
3. Provide an initialization for the SSE, SSE2 and SSE3 states.
4. Provide support for the FXSAVE and FXRSTOR instructions.
5. Provide support (if necessary) in non-numeric exception handlers for exceptions generated
by the SSE, SSE2 and SSE3 instructions.
6. Provide an exception handler for the SIMD floating-point exception (#XF).
The following sections describe how to implement each of these guidelines.
12.1.2 Checking for SSE/SSE2/SSE3 Extension Support
If the processor attempts to execute an unsupported SSE/SSE2/SSE3 instruction, the processor
will generate an invalid-opcode exception (#UD).
Before an operating system or executive attempts to use SSE/SSE2/SSE3 extensions, it should
check that support is present on the processor. To make this check, execute CPUID with an argument
of 1 in the EAX register. Make sure:
• CPUID.1:EDX.SSE[bit 25] = 1
• CPUID.1:EDX.SSE2[bit 26] = 1
• CPUID.1:ECX.SSE3[bit 0] = 1
12.1.3 Checking for Support for the FXSAVE and FXRSTOR
Instructions
A separate check must be made to ensure that the processor supports FXSAVE and FXRSTOR.
To make this check, execute CPUID with an argument of 1 in the EAX register. Make sure:
• CPUID.1:EDX.FXSR[bit 24] = 1
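The feature checks in Sections 12.1.2 and 12.1.3 can be sketched in C as plain bit tests. This is a minimal sketch, not a definitive implementation: the helper names are hypothetical, and the functions take the EDX and ECX values that CPUID returned for EAX = 1 as arguments rather than executing CPUID themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature-flag bit positions for CPUID with EAX = 1, as listed in the text. */
#define CPUID1_EDX_FXSR  (1u << 24)
#define CPUID1_EDX_SSE   (1u << 25)
#define CPUID1_EDX_SSE2  (1u << 26)
#define CPUID1_ECX_SSE3  (1u << 0)

/* Hypothetical helper: true only if SSE, SSE2 and SSE3 are all reported. */
static bool cpu_has_sse_family(uint32_t edx, uint32_t ecx)
{
    return (edx & CPUID1_EDX_SSE) != 0 &&
           (edx & CPUID1_EDX_SSE2) != 0 &&
           (ecx & CPUID1_ECX_SSE3) != 0;
}

/* Hypothetical helper: true if FXSAVE and FXRSTOR are reported. */
static bool cpu_has_fxsr(uint32_t edx)
{
    return (edx & CPUID1_EDX_FXSR) != 0;
}
```

Keeping the CPUID call outside these helpers lets the same tests run on register values captured elsewhere, for example from a boot log.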
12.1.4 Initialization of the SSE/SSE2/SSE3 Extensions
The operating system or executive should carry out the following steps to set up
SSE/SSE2/SSE3 extensions for use by application programs:
1. Set CR4.OSFXSR[bit 9] = 1. Setting this flag assumes that the operating system provides
facilities for saving and restoring SSE/SSE2/SSE3 states using FXSAVE and FXRSTOR
instructions. These instructions are commonly used to save the SSE/SSE2/SSE3 state
during task switches and when invoking the SIMD floating-point exception (#XF) handler
(see Section 12.4, “Saving the SSE/SSE2/SSE3 State on Task or Context Switches” and
Section 12.1.6, “Providing an Handler for the SIMD Floating-Point Exception (#XF)”,
respectively).
If the processor does not support the FXSAVE and FXRSTOR instructions, attempting to
set the OSFXSR flag will cause an exception (#GP) to be generated.
2. Set CR4.OSXMMEXCPT[bit 10] = 1. Setting this flag assumes that the operating system
provides an SIMD floating-point exception (#XF) handler (see Section 12.1.6, “Providing
an Handler for the SIMD Floating-Point Exception (#XF)”).
NOTE
The OSFXSR and OSXMMEXCPT bits in control register CR4 must be set
by the operating system. The processor has no other way of detecting
operating-system support for the FXSAVE and FXRSTOR instructions or for
handling SIMD floating-point exceptions.
3. Clear CR0.EM[bit 2] = 0. This action disables emulation of the x87 FPU, which is required
when executing SSE/SSE2/SSE3 instructions (see Section 2.5, “Control Registers”).
4. Set CR0.MP[bit 1] = 1. This setting is the required setting for all IA-32 processors that
support the SSE/SSE2/SSE3 extensions (see Section 9.2.1, “Configuring the x87 FPU
Environment”).
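The register settings from the steps above can be sketched as pure functions on the CR0 and CR4 values. This is a sketch under stated assumptions: the function names are hypothetical, MP is set to 1 as Table 12-1 and its note 2 require, and the privileged MOV instructions that actually read and write the control registers are left out.

```c
#include <stdint.h>

#define CR0_MP          (1u << 1)   /* monitor coprocessor */
#define CR0_EM          (1u << 2)   /* x87 FPU emulation */
#define CR4_OSFXSR      (1u << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  (1u << 10)  /* OS provides an #XF handler */

/* Hypothetical helper: CR4 value with steps 1 and 2 applied. */
static uint32_t cr4_enable_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/* Hypothetical helper: CR0 value with EM cleared and MP set (steps 3 and 4). */
static uint32_t cr0_enable_sse(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}
```

All other bits pass through unchanged, so the results can be written back to the control registers without disturbing unrelated settings.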
Table 12-1 shows the actions of the processor when an SSE/SSE2/SSE3 instruction is executed,
depending on the:
• OSFXSR and OSXMMEXCPT flags in control register CR4
• SSE/SSE2/SSE3 feature flags returned by CPUID
• EM, MP, and TS flags in control register CR0
Table 12-1. Action Taken for Combinations of OSFXSR, OSXMMEXCPT, SSE, SSE2,
SSE3, EM, MP, and TS (1)

CR4 Flags             CPUID Flags       CR0 Flags
OSFXSR  OSXMMEXCPT    SSE, SSE2, SSE3   EM  MP (2)  TS   Action
0       X (3)         X                 X   1       X    #UD exception.
1       X             0                 X   1       X    #UD exception.
1       X             1                 1   1       X    #UD exception.
1       0             1                 0   1       0    Execute instruction; #UD exception if
                                                         unmasked SIMD floating-point exception
                                                         is detected.
1       1             1                 0   1       0    Execute instruction; #XF exception if
                                                         unmasked SIMD floating-point exception
                                                         is detected.
1       X             1                 0   1       1    #NM exception.

NOTES:
1. For execution of any SSE/SSE2/SSE3 instruction except the PAUSE, PREFETCHh, SFENCE,
LFENCE, MFENCE, MOVNTI, and CLFLUSH instructions.
2. For processors that support the MMX instructions, the MP flag should be set.
3. X: Don’t care.
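The rows of Table 12-1 can be read as a priority-ordered decision, sketched below. The type and function names are hypothetical, MP is assumed to be 1 as in every row of the table, and the instructions excluded by note 1 are not modeled.

```c
#include <stdbool.h>

/* Possible outcomes from Table 12-1. */
typedef enum {
    ACT_UD,          /* #UD exception */
    ACT_NM,          /* #NM exception */
    ACT_EXECUTE_UD,  /* execute; #UD on an unmasked SIMD FP exception */
    ACT_EXECUTE_XF   /* execute; #XF on an unmasked SIMD FP exception */
} sse_action;

/* Hypothetical snapshot of the flags the table depends on (MP assumed 1). */
struct sse_env {
    bool osfxsr;      /* CR4.OSFXSR */
    bool osxmmexcpt;  /* CR4.OSXMMEXCPT */
    bool cpuid_sse;   /* CPUID feature flag for the instruction's extension */
    bool em;          /* CR0.EM */
    bool ts;          /* CR0.TS */
};

static sse_action sse_action_for(struct sse_env e)
{
    /* Rows 1 to 3: any of these conditions yields #UD regardless of TS. */
    if (!e.osfxsr || !e.cpuid_sse || e.em)
        return ACT_UD;
    /* Row 6: TS set yields #NM. */
    if (e.ts)
        return ACT_NM;
    /* Rows 4 and 5: the instruction executes; OSXMMEXCPT selects the vector. */
    return e.osxmmexcpt ? ACT_EXECUTE_XF : ACT_EXECUTE_UD;
}
```

The #UD tests come before the TS test because the table marks TS as don't-care in all three #UD rows.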
The SIMD floating-point exception mask bits (bits 7 through 12), the flush-to-zero flag (bit 15),
the denormals-are-zero flag (bit 6), and the rounding control field (bits 13 and 14) in the
MXCSR register should be left at their default values: the mask bits set to 1 (all exceptions
masked) and the other fields cleared to 0, which corresponds to the reset value 1F80H. This
permits the application to determine how these features are to be used.
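The MXCSR fields named above can be extracted with shifts and masks, as in the following sketch. The accessor names are hypothetical, and the reset value 1F80H is an assumption taken from the MXCSR register description rather than from this section.

```c
#include <stdint.h>

#define MXCSR_RESET_VALUE 0x1F80u  /* assumed reset value: all six mask bits set */

/* Exception mask bits, bits 7 through 12. */
static unsigned mxcsr_exception_masks(uint32_t mxcsr) { return (mxcsr >> 7) & 0x3Fu; }

/* Denormals-are-zero flag, bit 6. */
static unsigned mxcsr_daz(uint32_t mxcsr) { return (mxcsr >> 6) & 0x1u; }

/* Rounding control field, bits 13 and 14. */
static unsigned mxcsr_rounding_control(uint32_t mxcsr) { return (mxcsr >> 13) & 0x3u; }

/* Flush-to-zero flag, bit 15. */
static unsigned mxcsr_flush_to_zero(uint32_t mxcsr) { return (mxcsr >> 15) & 0x1u; }
```

Applied to 1F80H, the accessors report all six exceptions masked, round-to-nearest (00B), and both the flush-to-zero and denormals-are-zero modes off.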
12.1.5 Providing Non-Numeric Exception Handlers for
Exceptions Generated by the SSE/SSE2/SSE3 Instructions
SSE/SSE2/SSE3 instructions can generate the same types of memory access exceptions (such as
page faults, segment-not-present faults, and limit violations) and other non-numeric exceptions
as other IA-32 architecture instructions generate.
Ordinarily, existing exception handlers can handle these and other non-numeric exceptions
without code modification. However, depending on the mechanisms used in existing exception
handlers, some modifications might need to be made.
The SSE/SSE2/SSE3 extensions can generate the non-numeric exceptions listed below:
• Memory Access Exceptions:
— Invalid opcode (#UD).
— Stack-segment fault (#SS).
— General protection (#GP). Executing most SSE/SSE2/SSE3 instructions with an
unaligned 128-bit memory reference generates a general-protection exception. (The
MOVUPS and MOVUPD instructions allow unaligned loads or stores of 128-bit
memory locations, without generating a general-protection exception.) A 128-bit
reference within the stack segment that is not aligned to a 16-byte boundary will also
generate a general-protection exception, instead of a stack-segment fault exception
(#SS).
— Page fault (#PF).
— Alignment check (#AC). When enabled, this type of alignment check operates on
operands that are less than 128 bits in size: 16-bit, 32-bit, and 64-bit. To enable the
generation of alignment check exceptions, do the following:
• Set the AM flag (bit 18 of control register CR0)
• Set the AC flag (bit 18 of the EFLAGS register)
• CPL must be 3.
If alignment check exceptions are enabled, 16-bit, 32-bit, and 64-bit misalignment will
be detected for the MOVUPD and MOVUPS instructions; detection of 128-bit
misalignment is not guaranteed and may vary with implementation.
• System Exceptions:
— Invalid-opcode exception (#UD). This exception is generated when executing
SSE/SSE2/SSE3 instructions under the following conditions:
• SSE/SSE2/SSE3 feature flags returned by CPUID are set to 0. This condition does
not affect the CLFLUSH instruction.
• The CLFSH feature flag returned by the CPUID instruction is set to 0. This
exception condition only pertains to the execution of the CLFLUSH instruction.
• The EM flag (bit 2) in control register CR0 is set to 1, regardless of the value of
the TS flag (bit 3) of CR0. This condition does not affect the PAUSE, PREFETCHh,
MOVNTI, SFENCE, LFENCE, MFENCE, and CLFLUSH instructions.
• The OSFXSR flag (bit 9) in control register CR4 is set to 0. This condition does
not affect the PAVGB, PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB,
PMINSW, PMINUB, PMOVMSKB, PMULHUW, PSADBW, PSHUFW,
MASKMOVQ, MOVNTQ, MOVNTI, PAUSE, PREFETCHh, SFENCE,
LFENCE, MFENCE, and CLFLUSH instructions.
• Executing an instruction that causes a SIMD floating-point exception when the
OSXMMEXCPT flag (bit 10) in control register CR4 is set to 0. See Section
12.5.1., “Using the TS Flag to Control the Saving of the x87 FPU, MMX, SSE,
SSE2 and SSE3 State”.
— Device not available (#NM). This exception is generated by executing a
SSE/SSE2/SSE3 instruction when the TS flag (bit 3) of CR0 is set to 1.
Other exceptions can occur indirectly due to faulty execution of the handlers for the above exceptions.
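The two alignment rules in this section, the 16-byte requirement for most 128-bit memory references and the three conditions that enable alignment checking, can be sketched as predicates. The function names are hypothetical, and the sketch only evaluates the conditions; it does not model how the processor delivers the resulting #GP or #AC exception.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM     (1u << 18)  /* alignment mask flag in CR0 */
#define EFLAGS_AC  (1u << 18)  /* alignment check flag in EFLAGS */

/* Hypothetical helper: alignment checking is active only when AM and AC
 * are both set and the code runs at privilege level 3. */
static bool alignment_check_enabled(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}

/* Hypothetical helper: a 128-bit reference is aligned when its address is a
 * multiple of 16; most SSE/SSE2/SSE3 instructions raise #GP otherwise. */
static bool is_aligned_16(uintptr_t address)
{
    return (address & 0xFu) == 0;
}
```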
12.1.6 Providing an Handler for the SIMD Floating-Point
Exception (#XF)
SSE/SSE2/SSE3 instructions do not generate numeric exceptions on packed integer operations.
They can generate the following numeric (SIMD floating-point) exceptions on packed and
scalar single-precision and double-precision floating-point operations.
• Invalid operation (#I)
• Divide-by-zero (#Z)
• Denormal operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (Precision) (#P)
These SIMD floating-point exceptions (with the exception of the denormal operand exception)
are defined in the IEEE Standard 754 for Binary Floating-Point Arithmetic and represent the
same conditions that cause x87 FPU floating-point error exceptions (#MF) to be generated for
x87 FPU instructions.
Each of these exceptions can be masked, in which case the processor returns a reasonable result
to the destination operand without invoking an exception handler. However, if any of these
exceptions are left unmasked, detection of the exception condition results in a SIMD floating-point
exception (#XF) being generated. See Chapter 5, “Interrupt 19—SIMD Floating-Point
Exception (#XF)”.
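An #XF handler typically needs to know which flagged exceptions are unmasked. The sketch below computes that set from an MXCSR value, assuming the usual layout in which the exception flags occupy bits 0 through 5 (IE, DE, ZE, OE, UE, PE) and the corresponding masks occupy bits 7 through 12. The function name is hypothetical.

```c
#include <stdint.h>

#define MXCSR_FLAG_BITS   0x003Fu  /* IE, DE, ZE, OE, UE, PE in bits 0 to 5 */
#define MXCSR_MASK_SHIFT  7        /* IM, DM, ZM, OM, UM, PM in bits 7 to 12 */

/* Hypothetical helper: returns the exception flags that are set in MXCSR
 * and whose mask bit is clear, using the same bit order as the flags. */
static uint32_t mxcsr_unmasked_flags(uint32_t mxcsr)
{
    uint32_t flags = mxcsr & MXCSR_FLAG_BITS;
    uint32_t masks = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAG_BITS;
    return flags & ~masks;
}
```

A nonzero result identifies the conditions the handler must deal with; bit 2 set, for example, indicates an unmasked divide-by-zero.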
To handle unmasked SIMD floating-point exceptions, the operating system or executive must
provide an exception handler. The section titled “SSE and SSE2 SIMD Floating-Point Exceptions”
in Chapter 11 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1,
describes the SIMD floating-point exception classes and gives suggestions for writing an exception
handler to handle them.
To indicate that the operating system provides a handler for SIMD floating-point exceptions
(#XF), the OSXMMEXCPT flag (bit 10) must be set in control register CR4.
12.1.6.1 Numeric Error flag and IGNNE#
SSE/SSE2/SSE3 extensions ignore the NE flag in control register CR0 (that is, treat it as if it
were always set) and the IGNNE# pin. When an unmasked SIMD floating-point exception is
detected, it is always reported by generating a SIMD floating-point exception (#XF).
12.2 EMULATION OF SSE/SSE2/SSE3 EXTENSIONS
The IA-32 architecture does not support emulation of the SSE/SSE2/SSE3 instructions, as it
does for x87 FPU instructions. The EM flag in control register CR0 (provided to invoke emulation
of x87 FPU instructions) cannot be used to invoke emulation of SSE/SSE2/SSE3 instructions.
If an SSE/SSE2/SSE3 instruction is executed when the EM flag is set, an invalid opcode
exception (#UD) is generated (see Table 12-1).
12.3 SAVING AND RESTORING THE SSE/SSE2/SSE3 STATE
The SSE/SSE2/SSE3 state consists of the state of the XMM and MXCSR registers. The recommended
method of saving and restoring this state follows:
• Execute an FXSAVE instruction to save the state of the XMM and MXCSR registers to
memory.
• Execute an FXRSTOR instruction to restore the state of the XMM and MXCSR registers
from the image saved in memory by the FXSAVE instruction.
This save and restore method is required for operating systems (see Section 12.5, “Designing
OS Facilities for AUTOMATICALLY Saving x87 FPU, MMX, and SSE/SSE2/SSE3 state on
Task or Context Switches”).
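For reference, the memory image written by FXSAVE can be described with a C structure. The layout below is not given in this chapter; it is a sketch of the 512-byte, 32-bit-mode format as documented in the FXSAVE instruction reference and should be checked against it. The structure and field names are hypothetical, and the save area must additionally be aligned on a 16-byte boundary when used with the instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the FXSAVE image (32-bit mode); offsets in comments. */
struct fxsave_area {
    uint16_t fcw;             /*   0: x87 FPU control word */
    uint16_t fsw;             /*   2: x87 FPU status word */
    uint8_t  ftw;             /*   4: abridged x87 FPU tag word */
    uint8_t  reserved0;       /*   5 */
    uint16_t fop;             /*   6: x87 FPU opcode */
    uint32_t fip;             /*   8: instruction pointer offset */
    uint16_t fcs;             /*  12: instruction pointer selector */
    uint16_t reserved1;       /*  14 */
    uint32_t fdp;             /*  16: data pointer offset */
    uint16_t fds;             /*  20: data pointer selector */
    uint16_t reserved2;       /*  22 */
    uint32_t mxcsr;           /*  24: MXCSR register */
    uint32_t mxcsr_mask;      /*  28: MXCSR mask */
    uint8_t  st_mm[8][16];    /*  32: ST0/MM0 through ST7/MM7, 16 bytes each */
    uint8_t  xmm[8][16];      /* 160: XMM0 through XMM7 */
    uint8_t  reserved3[224];  /* 288: reserved up to byte 511 */
};

/* Compile-time checks that the sketch matches the assumed offsets. */
_Static_assert(sizeof(struct fxsave_area) == 512, "FXSAVE image is 512 bytes");
_Static_assert(offsetof(struct fxsave_area, mxcsr) == 24, "MXCSR at offset 24");
_Static_assert(offsetof(struct fxsave_area, xmm) == 160, "XMM0 at offset 160");
```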
In some cases, applications can save only the XMM and MXCSR registers in the following way:
• Execute eight MOVDQA or MOVDQU instructions to save the contents of the XMM0 through XMM7
registers to memory.
• Execute a STMXCSR instruction to save the state of the MXCSR register to memory.
In some cases, applications can restore only the XMM and MXCSR registers in the following
way:
• Execute eight MOVDQA or MOVDQU instructions to read the saved contents of XMM registers from
memory into the XMM0 through XMM7 registers.
• Execute a LDMXCSR instruction to restore the state of the MXCSR register from memory.
12.4 SAVING THE SSE/SSE2/SSE3 STATE ON TASK
OR CONTEXT SWITCHES
When switching from one task or context to another, it is often necessary to save the
SSE/SSE2/SSE3 state. The FXSAVE and FXRSTOR instructions provide a simple method for
saving and restoring this state (as described in Section 12.3, “Saving and Restoring the
SSE/SSE2/SSE3 State”). These instructions offer the added benefit of saving the x87 FPU and
MMX state as well. Guidelines for writing such procedures are in Section 12.5, “Designing OS
Facilities for AUTOMATICALLY Saving x87 FPU, MMX, and SSE/SSE2/SSE3 state on Task
or Context Switches”.
12.5 DESIGNING OS FACILITIES FOR AUTOMATICALLY SAVING
X87 FPU, MMX, AND SSE/SSE2/SSE3 STATE ON TASK OR
CONTEXT SWITCHES
The x87 FPU/MMX/SSE/SSE2/SSE3 state consists of the state of the x87 FPU, MMX, XMM,
and MXCSR registers. The FXSAVE and FXRSTOR instructions provide a fast method of
saving and restoring this state. If task or context switching facilities are already implemented in
an operating system or executive and they use FSAVE/FNSAVE and FRSTOR to save the x87
FPU and MMX state, these facilities can also be extended to save and restore the
SSE/SSE2/SSE3 state by substituting FXSAVE and FXRSTOR for FSAVE/FNSAVE and
FRSTOR.
In cases where task or context switching facilities must be written from scratch, several
approaches can be taken for using the FXSAVE and FXRSTOR instructions to save and restore
the x87 FPU/MMX/SSE/SSE2/SSE3 state:
• The operating system can require that applications intended to be run as tasks take
responsibility for saving the state of the x87 FPU, MMX, XMM, and MXCSR registers
prior to a task suspension during a task switch and for restoring the registers when the task
is resumed. This approach is appropriate for cooperative multitasking operating systems,
where the application has control over (or is able to determine) when a task switch is about
to occur and can save state prior to the task switch.
• The operating system can take the responsibility for automatically saving the x87 FPU,
MMX, XMM, and MXCSR registers as part of the task switch process (using an FXSAVE
instruction) and automatically restoring the state of the registers when a suspended task is
resumed (using an FXRSTOR instruction). Here, the x87 FPU/MMX/SSE/SSE2/SSE3
state must be saved as part of the task state. This approach is appropriate for preemptive
multitasking operating systems, where the application cannot know when it is going to be
preempted and cannot prepare in advance for task switching. Here, the operating system is
responsible for saving and restoring the task and the x87 FPU/MMX/SSE/SSE2/SSE3
state when necessary.
• The operating system can take the responsibility for saving the x87 FPU, MMX, XMM,
and MXCSR registers as part of the task switch process, but delay the saving of the MMX
and x87 FPU state until an x87 FPU, MMX, or SSE/SSE2/SSE3 instruction is actually
executed by the new task. Using this approach, the x87 FPU/MMX/SSE/SSE2/SSE3 state
is saved only if an x87 FPU/MMX/SSE/SSE2/SSE3 instruction needs to be executed in the
new task. (See Section 12.5.1., “Using the TS Flag to Control the Saving of the x87 FPU,
MMX, SSE, SSE2 and SSE3 State”, for more information on this technique.)
12.5.1. Using the TS Flag to Control the Saving of the
x87 FPU, MMX, SSE, SSE2 and SSE3 State
Saving the x87 FPU/MMX/SSE/SSE2/SSE3 state using FXSAVE requires processor overhead.
If the new task does not access the x87 FPU, MMX, XMM, or MXCSR registers, this overhead
can be avoided by not automatically saving the state on a task switch.
The TS flag in control register CR0 is provided to allow the operating system to delay saving
the x87 FPU/MMX/SSE/SSE2/SSE3 state until an instruction that actually accesses this state is
encountered in a new task. When the TS flag is set, the processor monitors the instruction stream
for an x87 FPU/MMX/SSE/SSE2/SSE3 instruction. When the processor detects one of these
instructions, it raises a device-not-available exception (#NM) prior to executing the instruction.
The device-not-available exception handler can then be used to save the x87
FPU/MMX/SSE/SSE2/SSE3 state for the previous task (using an FXSAVE instruction) and load
the x87 FPU/MMX/SSE/SSE2/SSE3 state for the current task (using an FXRSTOR instruction).
If the task never encounters an x87 FPU/MMX/SSE/SSE2/SSE3 instruction, the device-not-available
exception will not be raised and a task state will not be saved unnecessarily.
The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0)
or implicitly (using the IA-32 architecture’s native task switching mechanism). When the native
task switching mechanism is used, the processor automatically sets the TS flag on a task switch.
After the device-not-available handler has saved the x87 FPU/MMX/SSE/SSE2/SSE3 state, it
should execute the CLTS instruction to clear the TS flag.
Figure 12-1 gives an example of an operating system that implements x87
FPU/MMX/SSE/SSE2/SSE3 state saving using the TS flag. In this example, task A is the
currently running task and task B is the new task. The operating system maintains a save area
for the x87 FPU/MMX/SSE/SSE2/SSE3 state for each task and defines a variable
(x87FPU_MMX_SSE_SSE2_SSE3_StateOwner) that indicates the task that “owns” the state. In this
example, task A is the current owner.
On a task switch, the operating system task switching code must execute the following pseudocode
to set the TS flag according to the current owner of the x87 FPU/MMX/SSE/SSE2/SSE3
state. If the new task (task B in this example) is not the current owner of this state, the TS flag
is set to 1; otherwise, it is set to 0.
IF Task_Being_Switched_To ≠ x87FPU_MMX_SSE_SSE2_SSE3_StateOwner
THEN
CR0.TS ← 1;
ELSE
CR0.TS ← 0;
FI;
If a new task attempts to access an x87 FPU, MMX, XMM, or MXCSR register while the TS
flag is set to 1, a device-not-available exception (#NM) is generated. The device-not-available
exception handler executes the following pseudo-code.
FXSAVE “To x87FPU/MMX/SSE/SSE2/SSE3 State Save Area for Current
x87FPU_MMX_SSE_SSE2_SSE3_StateOwner”;
FXRSTOR “x87FPU/MMX/SSE/SSE2/SSE3 State From Current Task’s
x87FPU/MMX/SSE/SSE2/SSE3 State Save Area”;
x87FPU_MMX_SSE_SSE2_SSE3_StateOwner ← Current_Task;
CR0.TS ← 0;
This exception handler code performs the following tasks:
• Saves the x87 FPU, MMX, XMM, and MXCSR registers in the state save area for the
current owner of the x87 FPU/MMX/SSE/SSE2/SSE3 state.
• Restores the x87 FPU, MMX, XMM, and MXCSR registers from the new task’s save area
for the x87 FPU/MMX/SSE/SSE2/SSE3 state.
• Updates the current x87 FPU/MMX/SSE/SSE2/SSE3 state owner to be the current task.
• Clears the TS flag.
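The ownership scheme described in this section can be exercised with a small user-space model. This is a sketch, not operating-system code: the structure and function names are hypothetical, a single integer stands in for the whole register state, a boolean stands in for CR0.TS, and array assignments stand in for the FXSAVE and FXRSTOR instructions that the text prescribes.

```c
#include <stdbool.h>

#define NUM_TASKS 4

/* Hypothetical model of the lazy state-saving bookkeeping. */
struct lazy_fpu {
    int  owner;                 /* x87FPU_MMX_SSE_SSE2_SSE3_StateOwner */
    int  current;               /* task that is currently running */
    bool ts;                    /* models CR0.TS */
    int  live_state;            /* stands in for the register contents */
    int  save_area[NUM_TASKS];  /* stands in for the per-task save areas */
    int  saves;                 /* number of simulated FXSAVE executions */
};

/* Task-switch code: set TS unless the new task already owns the state. */
static void task_switch(struct lazy_fpu *f, int next)
{
    f->current = next;
    f->ts = (next != f->owner);
}

/* Device-not-available (#NM) handler. */
static void device_not_available(struct lazy_fpu *f)
{
    f->save_area[f->owner] = f->live_state;    /* FXSAVE for the old owner */
    f->saves++;
    f->live_state = f->save_area[f->current];  /* FXRSTOR for the current task */
    f->owner = f->current;                     /* update the state owner */
    f->ts = false;                             /* CLTS */
}

/* Models the current task executing an instruction that touches the state. */
static void use_fpu(struct lazy_fpu *f, int value)
{
    if (f->ts)
        device_not_available(f);
    f->live_state = value;
}
```

In this model, switching from task 0 to task 1 and back without any call to use_fpu leaves the save counter at zero, which is the saving that the TS flag technique is meant to provide.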

CHAPTER 11 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING

This chapter describes those features of the Intel® MMX™ technology that must be considered
when designing or enhancing an operating system to support MMX technology. It covers MMX
instruction set emulation, the MMX state, aliasing of MMX registers, saving MMX state, task
and context switching considerations, exception handling, and debugging.
11.1 EMULATION OF THE MMX INSTRUCTION SET
The IA-32 architecture does not support emulation of the MMX instructions, as it does for x87
FPU instructions. The EM flag in control register CR0 (provided to invoke emulation of x87
FPU instructions) cannot be used for MMX instruction emulation. If an MMX instruction is
executed when the EM flag is set, an invalid opcode exception (#UD) is generated. Table 11-1
shows the interaction of the EM, MP, and TS flags in control register CR0 when executing
MMX instructions.
11.2 THE MMX STATE AND MMX REGISTER ALIASING
The MMX state consists of eight 64-bit registers (MM0 through MM7). These registers are
aliased to the low 64-bits (bits 0 through 63) of floating-point registers R0 through R7 (see
Figure 11-1). Note that the MMX registers are mapped to the physical locations of the floating-point
registers (R0 through R7), not to the relative locations of the registers in the floating-point
register stack (ST0 through ST7). As a result, the MMX register mapping is fixed and is not
affected by the value in the Top Of Stack (TOS) field in the floating-point status word (bits 11
through 13).
Table 11-1. Action Taken By MMX Instructions for Different Combinations
of EM, MP and TS

CR0 Flags
EM   MP*   TS   Action
0    1     0    Execute.
0    1     1    #NM exception.
1    1     0    #UD exception.
1    1     1    #UD exception.

NOTE:
* For processors that support the MMX instructions, the MP flag should be set.
When a value is written into an MMX register using an MMX instruction, the value also appears
in the corresponding floating-point register in bits 0 through 63. Likewise, when a floating-point
value is written into a floating-point register by an x87 FPU instruction, the low 64 bits of that
value also appear in the corresponding MMX register.
The execution of MMX instructions has several side effects on the x87 FPU state contained in
the floating-point registers, the x87 FPU tag word, and the x87 FPU status word. These side
effects are as follows:
• When an MMX instruction writes a value into an MMX register, at the same time, bits 64
through 79 of the corresponding floating-point register are set to all 1s.
• When an MMX instruction (other than the EMMS instruction) is executed, each of the tag
fields in the x87 FPU tag word is set to 00B (valid). (See also Section 11.2.1, “Effect of
MMX, x87 FPU, FXSAVE, and FXRSTOR Instructions on the x87 FPU Tag Word”.)
• When the EMMS instruction is executed, each tag field in the x87 FPU tag word is set to
11B (empty).
• Each time an MMX instruction is executed, the TOS value is set to 000B.
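The side effects listed above can be illustrated with a small model of the aliased register file. This is a sketch: the structure and function names are hypothetical, each 80-bit register is split into a 64-bit low part and a 16-bit high part, and only the effects named in the list are modeled.

```c
#include <stdint.h>

/* One 80-bit floating-point register: bits 0 to 63 and bits 64 to 79. */
struct x87_reg {
    uint64_t low64;
    uint16_t high16;
};

/* Hypothetical model of the x87 FPU state touched by MMX instructions. */
struct x87_state {
    struct x87_reg r[8];   /* physical registers R0 through R7 */
    uint16_t tag_word;     /* two bits per register; 11B means empty */
    uint16_t status_word;  /* TOS field in bits 11 through 13 */
};

#define FSW_TOS_MASK 0x3800u

/* Write to MMX register MMn: aliases R[n], sets bits 64 to 79 to all 1s,
 * marks every tag valid (00B) and resets TOS to 000B. */
static void mmx_write(struct x87_state *s, unsigned n, uint64_t value)
{
    s->r[n & 7].low64 = value;
    s->r[n & 7].high16 = 0xFFFF;
    s->tag_word = 0x0000;
    s->status_word = (uint16_t)(s->status_word & ~FSW_TOS_MASK);
}

/* EMMS: marks every tag empty (11B) and resets TOS to 000B. */
static void mmx_emms(struct x87_state *s)
{
    s->tag_word = 0xFFFF;
    s->status_word = (uint16_t)(s->status_word & ~FSW_TOS_MASK);
}
```

The register index is the physical register number, not a stack-relative one, which is why TOS plays no part in selecting the register.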
Figure 11-1. Mapping of MMX Registers to Floating-Point Registers
[Figure: MMX registers MM0 through MM7 (bits 0 through 63) overlay bits 0 through 63 of the
80-bit floating-point registers R0 through R7. All fields of the x87 FPU tag register are shown
as 00, and the TOS field (bits 11 through 13) of the x87 FPU status register is shown as 000
(TOS = 0).]
Execution of MMX instructions does not affect the other bits in the x87 FPU status word (bits
0 through 10 and bits 14 and 15) or the contents of the other x87 FPU registers that comprise the
x87 FPU state (the x87 FPU control word, instruction pointer, data pointer, or opcode registers).
Table 11-2 summarizes the effects of the MMX instructions on the x87 FPU state.
11.2.1 Effect of MMX, x87 FPU, FXSAVE, and FXRSTOR
Instructions on the x87 FPU Tag Word
Table 11-3 summarizes the effect of MMX and x87 FPU instructions and the FXSAVE and
FXRSTOR instructions on the tags in the x87 FPU tag word and the corresponding tags in an
image of the tag word stored in memory.
The values in the fields of the x87 FPU tag word do not affect the contents of the MMX registers
or the execution of MMX instructions. However, the MMX instructions do modify the contents
of the x87 FPU tag word, as is described in Section 11.2, “The MMX State and MMX Register
Aliasing”. These modifications may affect the operation of the x87 FPU when executing x87
FPU instructions, if the x87 FPU state is not initialized or restored prior to beginning x87 FPU
instruction execution.
Note that the FSAVE, FXSAVE, and FSTENV instructions (which save x87 FPU state information)
read the x87 FPU tag register and contents of each of the floating-point registers, determine
the actual tag values for each register (empty, nonzero, zero, or special), and store the updated
tag word in memory. After executing these instructions, all the tags in the x87 FPU tag word are
set to empty (11B). Likewise, the EMMS instruction clears MMX state from the MMX/floating-point
registers by setting all the tags in the x87 FPU tag word to 11B.
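The tag values that the save instructions derive from register contents can be sketched as a classification function. The rules used here are an assumption based on the conventional x87 encoding (they are not spelled out in this section): an all-ones exponent or a missing integer bit is special, an all-zero register is zero, and anything else is valid. The names are hypothetical.

```c
#include <stdint.h>

enum {
    TAG_VALID   = 0,  /* 00B: valid, nonzero */
    TAG_ZERO    = 1,  /* 01B: zero */
    TAG_SPECIAL = 2,  /* 10B: NaN, infinity, denormal or unnormal */
    TAG_EMPTY   = 3   /* 11B: empty */
};

/* Hypothetical helper: classify one 80-bit register.
 * significand is bits 0 to 63; sign_exp is bits 64 to 79. */
static unsigned x87_tag_for(uint64_t significand, uint16_t sign_exp, int is_empty)
{
    unsigned exponent = sign_exp & 0x7FFFu;

    if (is_empty)
        return TAG_EMPTY;
    if (exponent == 0x7FFFu)
        return TAG_SPECIAL;                       /* infinity or NaN */
    if (exponent == 0)
        return significand == 0 ? TAG_ZERO        /* true zero */
                                : TAG_SPECIAL;    /* denormal */
    return (significand >> 63) ? TAG_VALID        /* normal number */
                               : TAG_SPECIAL;     /* unnormal */
}
```

A register last written by an MMX instruction has bits 64 through 79 set to all 1s, so under these rules it is classified as special when its tag is later recomputed.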
Table 11-2. Effects of MMX Instructions on x87 FPU State

MMX Instruction   x87 FPU Tag Word    TOS Field of   Other x87   Bits 64 Through 79   Bits 0 Through 63
Type                                  x87 FPU        FPU         of x87 FPU Data      of x87 FPU Data
                                      Status Word    Registers   Registers            Registers
Read from MMX     All tags set to     000B           Unchanged   Unchanged            Unchanged
register          00B (Valid)
Write to MMX      All tags set to     000B           Unchanged   Set to all 1s        Overwritten with
register          00B (Valid)                                                         MMX data
EMMS              All fields set to   000B           Unchanged   Unchanged            Unchanged
                  11B (Empty)
11.3 SAVING AND RESTORING THE MMX STATE AND
REGISTERS
Because the MMX registers are aliased to the x87 FPU data registers, the MMX state can be
saved to memory and restored from memory as follows:
• Execute an FSAVE, FNSAVE, or FXSAVE instruction to save the MMX state to memory.
(The FXSAVE instruction also saves the state of the XMM and MXCSR registers.)
• Execute an FRSTOR or FXRSTOR instruction to restore the MMX state from memory.
(The FXRSTOR instruction also restores the state of the XMM and MXCSR registers.)
The save and restore methods described above are required for operating systems (see Section
11.4, “Saving MMX State on Task or Context Switches”). Applications can in some cases save
and restore only the MMX registers in the following way:
• Execute eight MOVQ instructions to save the contents of the MM0 through MM7
registers to memory. An EMMS instruction may then (optionally) be executed to clear the
MMX state in the x87 FPU.
• Execute eight MOVQ instructions to read the saved contents of MMX registers from
memory into the MM0 through MM7 registers.
Table 11-3. Effect of the MMX, x87 FPU, and FXSAVE/FXRSTOR Instructions on the
x87 FPU Tag Word

Instruction type: MMX
Instruction: All (except EMMS)
x87 FPU tag word: All tags are set to 00B (valid).
Image of x87 FPU tag word stored in memory: Not affected.

Instruction type: MMX
Instruction: EMMS
x87 FPU tag word: All tags are set to 11B (empty).
Image of x87 FPU tag word stored in memory: Not affected.

Instruction type: x87 FPU
Instruction: All (except FSAVE, FSTENV, FRSTOR, FLDENV)
x87 FPU tag word: Tag for modified floating-point register is set to 00B or 11B.
Image of x87 FPU tag word stored in memory: Not affected.

Instruction type: x87 FPU and FXSAVE
Instruction: FSAVE, FSTENV, FXSAVE
x87 FPU tag word: Tags and register values are read and interpreted; then all tags are set
to 11B.
Image of x87 FPU tag word stored in memory: Tags are set according to the actual values in
the floating-point registers; that is, empty registers are marked 11B and valid registers are
marked 00B (nonzero), 01B (zero), or 10B (special).

Instruction type: x87 FPU and FXRSTOR
Instruction: FRSTOR, FLDENV, FXRSTOR
x87 FPU tag word: All tags marked 11B in memory are set to 11B; all other tags are set
according to the value in the corresponding floating-point register: 00B (nonzero), 01B (zero),
or 10B (special).
Image of x87 FPU tag word stored in memory: Tags are read and interpreted, but not modified.
NOTE
The IA-32 architecture does not support scanning the x87 FPU tag word and
then only saving valid entries.
11.4 SAVING MMX STATE ON TASK OR CONTEXT SWITCHES
When switching from one task or context to another, it is often necessary to save the MMX state.
As a general rule, if the existing task switching code for an operating system includes facilities
for saving the state of the x87 FPU, these facilities can also be relied upon to save the MMX
state, without rewriting the task switch code. This reliance is possible because the MMX state
is aliased to the x87 FPU state (see Section 11.2, “The MMX State and MMX Register
Aliasing”).
With the introduction of the FXSAVE and FXRSTOR instructions and of SSE/SSE2/SSE3
extensions to the IA-32 architecture, it is possible (and more efficient) to create state saving
facilities in the operating system or executive that save the x87 FPU/MMX/SSE/SSE2/SSE3
state in one operation. Section 12.5, “Designing OS Facilities for AUTOMATICALLY Saving
x87 FPU, MMX, and SSE/SSE2/SSE3 state on Task or Context Switches” describes how to
design such facilities. The techniques described in that section can be adapted to saving only the
MMX and x87 FPU state if needed.
11.5. EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING MMX
INSTRUCTIONS
MMX instructions do not generate x87 FPU floating-point exceptions, nor do they affect the
processor’s status flags in the EFLAGS register or the x87 FPU status word. The following
exceptions can be generated during the execution of an MMX instruction:
• Exceptions during memory accesses:
— Stack-segment fault (#SS).
— General protection (#GP).
— Page fault (#PF).
— Alignment check (#AC), if alignment checking is enabled.
• System exceptions:
— Invalid Opcode (#UD), if the EM flag in control register CR0 is set when an MMX
instruction is executed (see Section 11.1, “Emulation of the MMX Instruction Set”).
— Device not available (#NM), if an MMX instruction is executed when the TS flag in
control register CR0 is set. (See Section 12.5.1., “Using the TS Flag to Control the
Saving of the x87 FPU, MMX, SSE, SSE2 and SSE3 State”.)
• Floating-point error (#MF). (See Section 11.5.1, “Effect of MMX Instructions on Pending
x87 Floating-Point Exceptions”.)
• Other exceptions can occur indirectly due to the faulty execution of the exception handlers
for the above exceptions.
11.5.1 Effect of MMX Instructions on Pending x87 Floating-Point
Exceptions
If an x87 FPU floating-point exception is pending and the processor encounters an MMX
instruction, the processor generates an x87 FPU floating-point error (#MF) prior to executing the
MMX instruction, to allow the pending exception to be handled by the x87 FPU floating-point
error exception handler. While this exception handler is executing, the x87 FPU state is maintained
and is visible to the handler. Upon returning from the exception handler, the MMX
instruction is executed, which will alter the x87 FPU state, as described in Section 11.2, “The
MMX State and MMX Register Aliasing”.
11.6 DEBUGGING MMX CODE
The debug facilities of the IA-32 architecture operate in the same manner when executing MMX
instructions as when executing other IA-32 architecture instructions.
To correctly interpret the contents of the MMX or x87 FPU registers from the FSAVE/FNSAVE
or FXSAVE image in memory, a debugger needs to take account of the relationship between the
x87 FPU registers’ logical locations relative to TOS and the MMX registers’ physical locations.
In the x87 FPU context, STn refers to an x87 FPU register at location n relative to the TOS.
However, the tags in the x87 FPU tag word are associated with the physical locations of the x87
FPU registers (R0 through R7). The MMX registers always refer to the physical locations of the
registers (with MM0 through MM7 being mapped to R0 through R7). Figure 11-2 shows this
relationship. Here, the inner circle refers to the physical location of the x87 FPU and MMX
registers. The outer circle refers to the x87 FPU registers’ location relative to the current TOS.
When the TOS equals 0 (case A in Figure 11-2), ST0 points to the physical location R0 on the
floating-point stack. MM0 maps to ST0, MM1 maps to ST1, and so on.
When the TOS equals 2 (case B in Figure 11-2), ST0 points to the physical location R2. MM0
maps to ST6, MM1 maps to ST7, MM2 maps to ST0, and so on.
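The two cases above reduce to modular arithmetic on the 3-bit TOS value: MMn always names physical register Rn, while STi names physical register (TOS + i) mod 8. The following is a minimal sketch of that relationship as a debugger might apply it. The function names are hypothetical and are not part of any real debugger interface.

```python
def physical_for_st(st: int, tos: int) -> int:
    """Physical register number Rn that STi refers to for a given TOS."""
    if not (0 <= st <= 7 and 0 <= tos <= 7):
        raise ValueError("st and tos must be in the range 0..7")
    return (tos + st) % 8


def st_index_for_mmx(mm: int, tos: int) -> int:
    """Index i such that STi aliases MMn (physical register Rn) for a given TOS."""
    if not (0 <= mm <= 7 and 0 <= tos <= 7):
        raise ValueError("mm and tos must be in the range 0..7")
    return (mm - tos) % 8


if __name__ == "__main__":
    # Print the mapping for the two cases discussed in the text.
    for tos in (0, 2):
        pairs = ", ".join(f"MM{n}->ST{st_index_for_mmx(n, tos)}" for n in range(8))
        print(f"TOS={tos}: {pairs}")
```

With TOS equal to 2, the sketch maps MM0 to ST6, MM1 to ST7, and MM2 to ST0, which agrees with case B.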
Figure 11-2. Mapping of MMX Registers to x87 FPU Data Register Stack
[Figure: two circular diagrams of the registers MM0 through MM7. Case A: TOS=0, with ST0 at physical location R0. Case B: TOS=2, with ST0 at physical location R2. Arrows mark the direction of an x87 FPU “push” and an x87 FPU “pop”.]
Outer circle = x87 FPU data register’s logical location relative to TOS
Inner circle = x87 FPU tags = MMX register’s location = FP register’s physical location

CHAPTER 10 MEMORY CACHE CONTROL

This chapter describes the IA-32 architecture’s memory cache and cache control mechanisms, the
TLBs, and the store buffer. It also describes the memory type range registers (MTRRs) found in
the P6 family processors and how they are used to control caching of physical memory locations.
10.1 INTERNAL CACHES, TLBS, AND BUFFERS
The IA-32 architecture supports caches, translation lookaside buffers (TLBs), and a store buffer
for temporary on-chip (and external) storage of instructions and data. (Figure 10-1 shows the
arrangement of caches, TLBs, and the store buffer for the Pentium 4 and Intel Xeon processors.)
Table 10-1 shows the characteristics of these caches and buffers for the Pentium 4, Intel Xeon,
P6 family, and Pentium processors. The sizes and characteristics of these units are machine
specific and may change in future versions of the processor. The CPUID instruction returns
the sizes and characteristics of the caches and buffers for the processor on which the instruction
is executed (see “CPUID—CPU Identification” in Chapter 3 of the IA-32 Intel Architecture Software
Developer’s Manual, Volume 2).
Figure 10-1. Cache Structure of the Pentium 4 and Intel Xeon Processors
[Figure: block diagram showing the System Bus (External), Physical Memory, Bus Interface Unit, L3 Cache†, L2 Cache, Data Cache Unit (L1), Store Buffer, Data TLBs, Instruction TLBs, Instruction Decoder, and Trace Cache.]
† Intel Xeon processors only
Table 10-1. Characteristics of the Caches, TLBs, Store Buffer, and Write Combining Buffer in IA-32 processors

Cache or Buffer, followed by its Characteristics

Trace Cache†
- Pentium 4 and Intel Xeon processors: 12 Kμops, 8-way set associative.
- Pentium M processor: not implemented.
- P6 family and Pentium processors: not implemented.

L1 Instruction Cache
- Pentium 4 and Intel Xeon processors: not implemented.
- Pentium M processor: 32-KByte, 8-way set associative.
- P6 family and Pentium processors: 8- or 16-KByte, 4-way set associative, 32-byte cache line size; 2-way set associative for earlier Pentium processors.

L1 Data Cache
- Pentium 4 and Intel Xeon processors: 8-KByte, 4-way set associative, 64-byte cache line size.
- Pentium 4 and Intel Xeon processors: 16-KByte, 8-way set associative, 64-byte cache line size.
- Pentium M processor: 32-KByte, 8-way set associative, 64-byte cache line size.
- P6 family processors: 16-KByte, 4-way set associative, 32-byte cache line size; 8-KByte, 2-way set associative for earlier P6 family processors.
- Pentium processors: 16-KByte, 4-way set associative, 32-byte cache line size; 8-KByte, 2-way set associative for earlier Pentium processors.

L2 Unified Cache
- Pentium 4 and Intel Xeon processors: 256, 512, 1024, or 2048-KByte, 8-way set associative, 64-byte cache line size, 128-byte sector size.
- Pentium M processor: 1 or 2-MByte, 8-way set associative, 64-byte cache line size.
- P6 family processors: 128-KByte, 256-KByte, 512-KByte, 1-MByte, or 2-MByte, 4-way set associative, 32-byte cache line size.
- Pentium processor (external optional): System specific, typically 256- or 512-KByte, 4-way set associative, 32-byte cache line size.

L3 Unified Cache
- Intel Xeon processors: 512-KByte, 1-MByte, 2-MByte, or 4-MByte, 8-way set associative, 64-byte cache line size, 128-byte sector size.

Instruction TLB (4-KByte Pages)
- Pentium 4 and Intel Xeon processors: 128 entries, 4-way set associative.
- Pentium M processor: 128 entries, 4-way set associative.
- P6 family processors: 32 entries, 4-way set associative.
- Pentium processor: 32 entries, 4-way set associative; fully set associative for Pentium processors with MMX technology.

Data TLB (4-KByte Pages)
- Pentium 4 and Intel Xeon processors: 64 entries, fully set associative; shared with large page data TLBs.
- Pentium M processor: 128 entries, 4-way set associative.
- Pentium and P6 family processors: 64 entries, 4-way set associative; fully set associative for Pentium processors with MMX technology.

Instruction TLB (Large Pages)
- Pentium 4 and Intel Xeon processors: large pages are fragmented.
- Pentium M processor: 2 entries, fully associative.
- P6 family processors: 2 entries, fully associative.
- Pentium processor: Uses same TLB as used for 4-KByte pages.

Data TLB (Large Pages)
- Pentium 4 and Intel Xeon processors: 64 entries, fully set associative; shared with small page data TLBs.
- Pentium M processor: 8 entries, fully associative.
- P6 family processors: 8 entries, 4-way set associative.
- Pentium processor: 8 entries, 4-way set associative; uses same TLB as used for 4-KByte pages in Pentium processors with MMX technology.
The IA-32 processors implement four types of caches: the trace cache, the level 1 (L1) cache,
the level 2 (L2) cache, and the level 3 (L3) cache (see Figure 10-1). The use of these caches
differs among the Pentium 4, Intel Xeon, P6 family, and Pentium processors, as follows:
• Pentium 4 and Intel Xeon processors — The trace cache caches decoded instructions
(μops) from the instruction decoder, and the L1 cache contains only data. The L2 and L3
caches are unified data and instruction caches that are located on the processor chip. (The
L3 cache is only implemented on Intel Xeon processors.)
• P6 family processors — The L1 cache is divided into two sections: one dedicated to
caching IA-32 architecture instructions (pre-decoded instructions) and one to caching data.
The L2 cache is a unified data and instruction cache that is located on the processor chip.
The P6 family processors do not implement a trace cache.
• Pentium processors — The L1 cache has the same structure as on the P6 family
processors (and a trace cache is not implemented). The L2 cache is a unified data and
instruction cache that is external to the processor chip on earlier Pentium processors and
implemented on the processor chip in later Pentium processors. For Pentium processors
where the L2 cache is external to the processor, access to the cache is through the system
bus.
The cache lines for the L1 and L2 caches in the Pentium 4 and the L1, L2, and L3 caches in the
Intel Xeon processors are 64 bytes wide. The processor always reads a cache line from system
memory beginning on a 64-byte boundary. (A 64-byte aligned cache line begins at an address
with its 6 least-significant bits clear.) A cache line can be filled from memory with an 8-transfer
burst transaction. The caches do not support partially-filled cache lines, so caching even a single
doubleword requires caching an entire line.
The L1 and L2 cache lines in the P6 family and Pentium processors are 32 bytes wide, with
cache line reads from system memory beginning on a 32-byte boundary (5 least-significant bits
of a memory address clear). A cache line can be filled from memory with a 4-transfer burst transaction.
Partially-filled cache lines are not supported.
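The alignment and burst figures in the two paragraphs above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the 8-byte width of one bus transfer is an assumption chosen because it reproduces the 8-transfer and 4-transfer bursts quoted in the text.

```python
def line_base(addr: int, line_size: int) -> int:
    """Address of the first byte of the cache line that contains addr."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a power of two")
    # Clearing the low log2(line_size) bits yields the aligned line address.
    return addr & ~(line_size - 1)


def burst_transfers(line_size: int, bus_bytes: int = 8) -> int:
    """Transfers needed to fill one line (bus_bytes is an assumed bus width)."""
    return line_size // bus_bytes


if __name__ == "__main__":
    addr = 0x1237F
    print(hex(line_base(addr, 64)), burst_transfers(64))  # 64-byte lines
    print(hex(line_base(addr, 32)), burst_transfers(32))  # 32-byte lines
```

Because partially-filled lines are not supported, reading a single doubleword at 0x1237F still brings in the whole line that starts at the aligned address.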
Table 10-1. Characteristics of the Caches, TLBs, Store Buffer, and Write Combining Buffer in IA-32 processors (Contd.)

Store Buffer
- Pentium 4 and Intel Xeon processors: 24 entries.
- Pentium M processor: 16 entries.
- P6 family processors: 12 entries.
- Pentium processor: 2 buffers, 1 entry each (Pentium processors with MMX technology have 4 buffers for 4 entries).

Write Combining (WC) Buffer
- Pentium 4 and Intel Xeon processors: 6 or 8 entries.
- Pentium M processor: 6 entries.
- P6 family processors: 4 entries.

NOTES:
† Introduced to the IA-32 architecture in the Pentium 4 and Intel Xeon processors.
The trace cache in the Pentium 4 and Intel Xeon processors is an integral part of the Intel
NetBurst microarchitecture and is available in all execution modes: protected mode, system
management mode (SMM), and real-address mode. The L1, L2, and L3 caches are also available
in all execution modes; however, use of them must be handled carefully in SMM (see Section
13.4.2, “SMRAM Caching”).
The TLBs store the most recently used page-directory and page-table entries. They speed up
memory accesses when paging is enabled by reducing the number of memory accesses that are
required to read the page tables stored in system memory. The TLBs are divided into four
groups: instruction TLBs for 4-KByte pages, data TLBs for 4-KByte pages; instruction TLBs
for large pages (2-MByte or 4-MByte pages), and data TLBs for large pages. The TLBs are
normally active only in protected mode with paging enabled. When paging is disabled or the
processor is in real-address mode, the TLBs maintain their contents until explicitly or implicitly
flushed (see Section 10.9, “Invalidating the Translation Lookaside Buffers (TLBs)”).
The store buffer is associated with the processor’s instruction execution units. It allows writes to
system memory and/or the internal caches to be saved and in some cases combined to optimize
the processor’s bus accesses. The store buffer is always enabled in all execution modes.
The processor’s caches are for the most part transparent to software. When enabled, instructions
and data flow through these caches without the need for explicit software control. However,
knowledge of the behavior of these caches may be useful in optimizing software performance.
For example, knowledge of cache dimensions and replacement algorithms gives an indication
of how large a data structure can be operated on at once without causing cache thrashing.
In multiprocessor systems, maintenance of cache consistency may, in rare circumstances,
require intervention by system software. For these rare cases, the processor provides privileged
cache control instructions for use in flushing caches and forcing memory ordering.
The Pentium III, Pentium 4, and Intel Xeon processors introduced several instructions that software
can use to improve the performance of the L1, L2, and L3 caches, including the
PREFETCHh and CLFLUSH instructions and the non-temporal move instructions (MOVNTI,
MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD). The use of these instructions is
discussed in Section 10.5.5, “Cache Management Instructions”.
10.2 CACHING TERMINOLOGY
The IA-32 architecture (beginning with the Pentium processor) uses the MESI (modified, exclusive,
shared, invalid) cache protocol to maintain consistency with internal caches and caches in
other processors (see Section 10.4, “Cache Control Protocol”).
When the processor recognizes that an operand being read from memory is cacheable, the
processor reads an entire cache line into the appropriate cache (L1, L2, L3, or all). This operation
is called a cache line fill. If the memory location containing that operand is still cached the next
time the processor attempts to access the operand, the processor can read the operand from the
cache instead of going back to memory. This operation is called a cache hit.
When the processor attempts to write an operand to a cacheable area of memory, it first checks
if a cache line for that memory location exists in the cache. If a valid cache line does exist, the
processor (depending on the write policy currently in force) can write the operand into the cache
instead of writing it out to system memory. This operation is called a write hit. If a write misses
the cache (that is, a valid cache line is not present for the area of memory being written to), the
processor performs a cache line fill (write allocation). Then it writes the operand into the cache
line and (depending on the write policy currently in force) can also write it out to memory. If the
operand is to be written out to memory, it is written first into the store buffer, and then written
from the store buffer to memory when the system bus is available. (Note that for the Pentium
processor, write misses do not result in a cache line fill; they always result in a write to memory.
For this processor, only read misses result in cache line fills.)
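The difference between a write hit, a write miss with write allocation, and the Pentium processor's write miss without a line fill can be shown with a toy model. The sketch below is a deliberate simplification under stated assumptions: the cache is just a set of line addresses with no capacity limit, the write policy (write-through or write-back) is ignored, and the function name and `write_allocate` parameter are hypothetical.

```python
def write_operand(cached_lines: set, addr: int, line_size: int = 64,
                  write_allocate: bool = True) -> str:
    """Classify one write and update the set of cached line addresses."""
    base = addr & ~(line_size - 1)  # aligned address of the line holding addr
    if base in cached_lines:
        return "write hit"
    if write_allocate:
        # Write miss: fill the line first, then write the operand into it.
        cached_lines.add(base)
        return "write miss: cache line fill (write allocation)"
    # Pentium-style behavior from the text: no fill, the write goes to memory.
    return "write miss: written to memory, no cache line fill"


if __name__ == "__main__":
    lines = set()
    print(write_operand(lines, 0x1004))   # miss, line 0x1000 is filled
    print(write_operand(lines, 0x1030))   # hit, same 64-byte line
    pentium = set()
    print(write_operand(pentium, 0x1004, line_size=32, write_allocate=False))
```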
When operating in an MP system, IA-32 processors (beginning with the Intel486 processor)
have the ability to snoop other processors’ accesses to system memory and to their internal
caches. They use this snooping ability to keep their internal caches consistent both with system
memory and with the caches in other processors on the bus. For example, in the Pentium and P6
family processors, if through snooping one processor detects that another processor intends to
write to a memory location that it currently has cached in shared state, the snooping processor
will invalidate its cache line, forcing it to perform a cache line fill the next time it accesses the
same memory location.
Beginning with the P6 family processors, if a processor detects (through snooping) that another
processor is trying to access a memory location that it has modified in its cache, but has not yet
written back to system memory, the snooping processor will signal the other processor (by
means of the HITM# signal) that the cache line is held in modified state and will perform an
implicit write-back of the modified data. The implicit write-back is transferred directly to the
initial requesting processor and snooped by the memory controller to assure that system memory
has been updated. Here, the processor with the valid data may pass the data to the other processors
without actually writing it to system memory; however, it is the responsibility of the
memory controller to snoop this operation and update memory.
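The two snooping cases described above can be summarized as a small decision function. This is only a sketch of the behavior stated in the text: the function name and the returned strings are hypothetical, and cases the section does not describe are reported as such rather than guessed.

```python
def snoop_action(local_state: str, remote_is_write: bool) -> str:
    """Action taken by the snooping processor for a line it holds.

    local_state is the state of the line in the snooping processor's cache:
    "M" (modified) or "S" (shared). Other states are not covered here.
    """
    if local_state == "M":
        # P6 family and later: signal HITM# and supply the modified data.
        return "assert HITM# and perform implicit write-back"
    if local_state == "S" and remote_is_write:
        # The next local access to this location causes a cache line fill.
        return "invalidate local cache line"
    return "not described in this section"


if __name__ == "__main__":
    print(snoop_action("S", remote_is_write=True))
    print(snoop_action("M", remote_is_write=False))
```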
10.3 METHODS OF CACHING AVAILABLE
The processor allows any area of system memory to be cached in the L1, L2, and L3 caches. For
individual pages or regions of system memory, it allows the type of caching (also called
memory type) to be specified (see Section 10.5). Memory types currently defined for the IA-32
architecture are as follows (see Table 10-2):
• Strong Uncacheable (UC) —System memory locations are not cached. All reads and
writes appear on the system bus and are executed in program order without reordering. No
speculative memory accesses, page-table walks, or prefetches of speculated branch targets
are made. This type of cache-control is useful for memory-mapped I/O devices. When
used with normal RAM, it greatly reduces processor performance.
NOTE
The behavior of FP and SSE/SSE2 operations on operands in UC memory is
implementation dependent. In some implementations, accesses to UC
memory may occur more than once. To ensure predictable behavior, use loads
and stores of general purpose registers to access UC memory that may have
read or write side effects.
• Uncacheable (UC-) — Has same characteristics as the strong uncacheable (UC) memory
type, except that this memory type can be overridden by programming the MTRRs for the
WC memory type. This memory type is available in the Pentium 4, Intel Xeon, and
Pentium III processors and can only be selected through the PAT.
• Write Combining (WC) — System memory locations are not cached (as with
uncacheable memory) and coherency is not enforced by the processor’s bus coherency
protocol. Speculative reads are allowed. Writes may be delayed and combined in the write
combining buffer (WC buffer) to reduce memory accesses. If the WC buffer is partially
filled, the writes may be delayed until the next occurrence of a serializing event, such as
an SFENCE or MFENCE instruction, CPUID execution, a read or write to uncached
memory, an interrupt occurrence, or a LOCK instruction execution. This type of cache-control
is appropriate for video frame buffers, where the order of writes is unimportant as
long as the writes update memory so they can be seen on the graphics display. See Section
10.3.1, “Buffering of Write Combining Memory Locations”, for more information about
caching the WC memory type. This memory type is available in the Pentium Pro and
Pentium II processors by programming the MTRRs or in the Pentium III, Pentium 4, and
Intel Xeon processors by programming the MTRRs or by selecting it through the PAT.
• Write-through (WT) — Writes and reads to and from system memory are cached. Reads
come from cache lines on cache hits; read misses cause cache fills. Speculative reads are
allowed. All writes are written to a cache line (when possible) and through to system
memory. When writing through to memory, invalid cache lines are never filled, and valid
cache lines are either filled or invalidated. Write combining is allowed. This type of cache-control
is appropriate for frame buffers or when there are devices on the system bus that
access system memory, but do not perform snooping of memory accesses. It enforces
coherency between caches in the processors and system memory.

Table 10-2. Memory Types and Their Properties
Memory Type and           Cacheable        Writeback   Allows        Memory Ordering Model
Mnemonic                                   Cacheable   Speculative
                                                       Reads
Strong Uncacheable (UC)   No               No          No            Strong Ordering
Uncacheable (UC-)         No               No          No            Strong Ordering. Can only be
                                                                     selected through the PAT. Can be
                                                                     overridden by WC in MTRRs.
Write Combining (WC)      No               No          Yes           Weak Ordering. Available by
                                                                     programming MTRRs or by
                                                                     selecting it through the PAT.
Write Through (WT)        Yes              No          Yes           Speculative Processor Ordering.
Write Back (WB)           Yes              Yes         Yes           Speculative Processor Ordering.
Write Protected (WP)      Yes for reads;   No          Yes           Speculative Processor Ordering.
                          no for writes                              Available by programming
                                                                     MTRRs.

• Write-back (WB) — Writes and reads to and from system memory are cached. Reads
come from cache lines on cache hits; read misses cause cache fills. Speculative reads are
allowed. Write misses cause cache line fills (in the Pentium 4, Intel Xeon, and P6 family
processors), and writes are performed entirely in the cache, when possible. Write
combining is allowed. The write-back memory type reduces bus traffic by eliminating
many unnecessary writes to system memory. Writes to a cache line are not immediately
forwarded to system memory; instead, they are accumulated in the cache. The modified
cache lines are written to system memory later, when a write-back operation is performed.
Write-back operations are triggered when cache lines need to be deallocated, such as when
new cache lines are being allocated in a cache that is already full. They also are triggered
by the mechanisms used to maintain cache consistency. This type of cache-control
provides the best performance, but it requires that all devices that access system memory
on the system bus be able to snoop memory accesses to ensure system memory and cache
coherency.
• Write protected (WP) — Reads come from cache lines when possible, and read misses
cause cache fills. Writes are propagated to the system bus and cause corresponding cache
lines on all processors on the bus to be invalidated. Speculative reads are allowed. This
memory type is available in the Pentium 4, Intel Xeon, and P6 family processors by
programming the MTRRs (see Table 10-6).
Table 10-3 shows which of these caching methods are available in the Pentium, P6 Family,
Pentium 4, and Intel Xeon processors.
Table 10-3. Methods of Caching Available in Pentium 4, Intel Xeon, P6 Family, and Pentium Processors
Memory Type               Pentium 4 and Intel   P6 Family    Pentium
                          Xeon Processors       Processors   Processor
Strong Uncacheable (UC)   Yes                   Yes          Yes
Uncacheable (UC-)         Yes                   Yes*         No
Write Combining (WC)      Yes                   Yes          No
Write Through (WT)        Yes                   Yes          Yes
Write Back (WB)           Yes                   Yes          Yes
Write Protected (WP)      Yes                   Yes          No
NOTE:
* Introduced in the Pentium III processor; not available in the Pentium Pro or Pentium II processors
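For tools that need to reason about memory types, the properties in Table 10-2 and the Pentium processor column of Table 10-3 can be captured as plain data. The sketch below is one possible encoding, not an interface defined by the architecture: the `MemoryType` class, its field names, and the helper function are all hypothetical.

```python
from typing import List, NamedTuple


class MemoryType(NamedTuple):
    cacheable: str              # "no", "yes", or "reads only"
    writeback_cacheable: bool
    speculative_reads: bool
    ordering: str               # memory ordering model


# Values transcribed from Table 10-2.
MEMORY_TYPES = {
    "UC":  MemoryType("no", False, False, "strong"),
    "UC-": MemoryType("no", False, False, "strong"),
    "WC":  MemoryType("no", False, True, "weak"),
    "WT":  MemoryType("yes", False, True, "speculative processor"),
    "WB":  MemoryType("yes", True, True, "speculative processor"),
    "WP":  MemoryType("reads only", False, True, "speculative processor"),
}

# Memory types marked "Yes" for the Pentium processor in Table 10-3.
PENTIUM_TYPES = {"UC", "WT", "WB"}


def types_allowing_speculative_reads() -> List[str]:
    """Mnemonics of the memory types that allow speculative reads."""
    return [name for name, t in MEMORY_TYPES.items() if t.speculative_reads]


if __name__ == "__main__":
    print(types_allowing_speculative_reads())
    print(sorted(set(MEMORY_TYPES) - PENTIUM_TYPES))  # not on the Pentium processor
```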
10.3.1 Buffering of Write Combining Memory Locations
Writes to the WC memory type are not cached in the typical sense of the word cached. They are
retained in an internal write combining buffer (WC buffer) that is separate from the internal L1,
L2, and L3 caches and the store buffer. The WC buffer is not snooped and thus does not provide
data coherency. Buffering of writes to WC memory is done to allow software a small window
of time to supply more modified data to the WC buffer while remaining as non-intrusive to software
as possible. The buffering of writes to WC memory also causes data to be collapsed; that
is, multiple writes to the same memory location will leave the last data written in the location
and the other writes will be lost.
The size and structure of the WC buffer is not architecturally defined. For the Pentium 4 and
Intel Xeon processors, the WC buffer is made up of several 64-byte WC buffers. For the P6
family processors, the WC buffer is made up of several 32-byte WC buffers.
When software begins writing to WC memory, the processor begins filling the WC buffers one
at a time. When one or more WC buffers have been filled, the processor has the option of evicting
the buffers to system memory. The protocol for evicting the WC buffers is implementation
dependent and should not be relied on by software for system memory coherency. When using
the WC memory type, software must be sensitive to the fact that the writing of data to system
memory is being delayed and must deliberately empty the WC buffers when system memory
coherency is required.
Once the processor has started to evict data from the WC buffer into system memory, it will
make a bus-transaction style decision based on how much of the buffer contains valid data. If
the buffer is full (that is, all bytes are valid), the processor will execute a burst-write transaction
on the bus that will result in all 32 bytes (P6 family processors) or 64 bytes (Pentium 4
and Intel Xeon processors) being transmitted on the data bus in a single burst transaction. If one
or more of the WC buffer’s bytes are invalid (that is, have not been written by software),
then the processor will transmit the data to memory using “partial write” transactions (one chunk
at a time, where a “chunk” is 8 bytes).
This will result in a maximum of 4 partial write transactions (for P6 family processors) or 8
partial write transactions (for the Pentium 4 and Intel Xeon processors) for one WC buffer of
data sent to memory.
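The eviction decision described above can be sketched as follows. The model takes one valid flag per byte of a WC buffer and returns the bus transactions that would result. Two points are assumptions rather than statements from the text: chunks that contain no valid bytes are taken to produce no transaction, and the transactions are listed in chunk order even though the processor does not guarantee any order. The function name is hypothetical.

```python
from typing import List

CHUNK_BYTES = 8  # a "chunk" is 8 bytes


def wc_eviction_transactions(valid: List[bool]) -> List[str]:
    """Bus transactions for evicting one WC buffer (32 or 64 byte flags)."""
    if len(valid) not in (32, 64):
        raise ValueError("a WC buffer is 32 bytes (P6) or 64 bytes (Pentium 4)")
    if all(valid):
        # Completely full buffer: a single burst-write transaction.
        return [f"burst write of {len(valid)} bytes"]
    transactions = []
    for start in range(0, len(valid), CHUNK_BYTES):
        # Assumption: a chunk with no valid bytes needs no partial write.
        if any(valid[start:start + CHUNK_BYTES]):
            transactions.append(f"partial write of chunk {start // CHUNK_BYTES}")
    return transactions


if __name__ == "__main__":
    print(wc_eviction_transactions([True] * 64))
    print(len(wc_eviction_transactions([True] * 63 + [False])))
```

A single missing byte therefore turns one burst into as many as 8 partial writes on a 64-byte buffer, which is why software writing to WC memory benefits from filling whole buffers.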
The WC memory type is weakly ordered by definition. Once the eviction of a WC buffer has
started, the data is subject to the weak ordering semantics of its definition. Ordering is not maintained
between the successive allocation/deallocation of WC buffers (for example, writes to WC
buffer 1 followed by writes to WC buffer 2 may appear as buffer 2 followed by buffer 1 on the
system bus). When a WC buffer is evicted to memory as partial writes there is no guaranteed
ordering between successive partial writes (for example, a partial write for chunk 2 may appear
on the bus before the partial write for chunk 1 or vice versa).
The only elements of WC propagation to the system bus that are guaranteed are those provided
by transaction atomicity. For example, with a P6 family processor, a completely full WC buffer
will always be propagated as a single 32-byte burst transaction using any chunk order. In a WC
buffer eviction where the data will be evicted as partials, all data contained in the same chunk
(0 mod 8 aligned) will be propagated simultaneously. Likewise, with a Pentium 4 or Intel Xeon
processor, a full WC buffer will always be propagated as a single burst transaction, using any
chunk order within a transaction. For partial buffer propagations, all data contained in the same
chunk will be propagated simultaneously.
10.3.2 Choosing a Memory Type
The simplest system memory model does not use memory-mapped I/O with read or write side
effects, does not include a frame buffer, and uses the write-back memory type for all memory.
An I/O agent can perform direct memory access (DMA) to write-back memory and the cache
protocol maintains cache coherency.
A system can use strong uncacheable memory for other memory-mapped I/O, and should
always use strong uncacheable memory for memory-mapped I/O with read side effects.
Dual-ported memory can be considered a write side effect, making relatively prompt writes
desirable, because those writes cannot be observed at the other port until they reach the memory
agent. A system can use strong uncacheable, uncacheable, write-through, or write-combining
memory for frame buffers or dual-ported memory that contains pixel values displayed on a
screen. Frame buffer memory is typically large (a few megabytes) and is usually written more
than it is read by the processor. Using strong uncacheable memory for a frame buffer generates
very large amounts of bus traffic, because operations on the entire buffer are implemented using
partial writes rather than line writes. Using write-through memory for a frame buffer can
displace almost all other useful cached lines in the processor's L2 and L3 caches and L1 data
cache. Therefore, systems should use write-combining memory for frame buffers whenever
possible.
Software can use page-level cache control to assign appropriate effective memory types when
software will not access data structures in ways that benefit from write-back caching. For
example, software may read a large data structure once and not access the structure again until
the structure is rewritten by another agent. Such a large data structure should be marked as
uncacheable, or reading it will evict cached lines that the processor will be referencing again.
A similar example would be a write-only data structure that is written to (to export the data to
another agent), but never read by software. Such a structure can be marked as uncacheable,
because software never reads the values that it writes (though as uncacheable memory, it will be
written using partial writes, while as write-back memory, it will be written using line writes,
which may not occur until the other agent reads the structure and triggers implicit write-backs).
On the Pentium III, Pentium 4, and Intel Xeon processors, new instructions are provided that
give software greater control over the caching, prefetching, and the write-back characteristics of
data. These instructions allow software to use weakly ordered or processor ordered memory
types to improve processor performance and, when necessary, to force strong ordering on
memory reads and/or writes. They also allow software greater control over the caching of data.
For a description of these instructions and their intended use, see Section 10.5.5, “Cache
Management Instructions”.
10.4 CACHE CONTROL PROTOCOL
The following section describes the cache control protocol currently defined for the IA-32 architecture.
This protocol is used by the Pentium 4, Intel Xeon, P6 family, and Pentium processors.
In the L1 data cache and in the L2 and L3 unified caches, the MESI (modified, exclusive, shared,
invalid) cache protocol maintains consistency with caches of other processors. The L1 data
cache and the L2 and L3 unified caches have two MESI status flags per cache line. Each line
can thus be marked as being in one of the states defined in Table 10-4. In general, the operation
of the MESI protocol is transparent to programs.
The L1 instruction cache in P6 family processors implements only the “SI” part of the MESI
protocol, because the instruction cache is not writable. The instruction cache monitors changes
in the data cache to maintain consistency between the caches when instructions are modified.
See Section 10.6, “Self-Modifying Code”, for more information on the implications of caching
instructions.
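The four line states listed in Table 10-4 can be written down as a lookup table, which makes it easy to ask whether a write to a line can complete without a bus access. The sketch below is illustrative: the class, field, and function names are hypothetical, and the encoding of the two MESI status flags is not represented because it is not specified here.

```python
from typing import NamedTuple


class LineState(NamedTuple):
    valid: bool            # is this cache line valid?
    memory_copy: str       # state of the copy in system memory
    other_copies: str      # do copies exist in caches of other processors?
    write_effect: str      # what a write to this line does


# Values transcribed from Table 10-4.
MESI_STATES = {
    "M": LineState(True, "out of date", "no", "does not go to the system bus"),
    "E": LineState(True, "valid", "no", "does not go to the system bus"),
    "S": LineState(True, "valid", "maybe",
                   "processor gains exclusive ownership of the line"),
    "I": LineState(False, "not applicable", "maybe",
                   "goes directly to the system bus"),
}


def write_stays_off_bus(state: str) -> bool:
    """True only for the states where the table says the write avoids the bus.

    For "S" the table only says that ownership must be gained; treating that
    as requiring bus activity is an assumption of this sketch.
    """
    return MESI_STATES[state].write_effect == "does not go to the system bus"


if __name__ == "__main__":
    for name in "MESI":
        print(name, write_stays_off_bus(name))
```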
10.5 CACHE CONTROL
The IA-32 architecture provides a variety of mechanisms for controlling the caching of data and
instructions and for controlling the ordering of reads and writes between the processor, the
caches, and memory. These mechanisms can be divided into two groups:
• Cache control registers and bits — The IA-32 architecture defines several dedicated
registers and various bits within control registers and page- and directory-table entries that
control the caching of system memory locations in the L1, L2, and L3 caches. These
mechanisms control the caching of virtual memory pages and of regions of physical
memory.
Table 10-4. MESI Cache Line States
Cache Line State            M (Modified)      E (Exclusive)     S (Shared)        I (Invalid)
This cache line is valid?   Yes               Yes               Yes               No
The memory copy is…         Out of date       Valid             Valid             —
Copies exist in caches of   No                No                Maybe             Maybe
other processors?
A write to this line …      Does not go to    Does not go to    Causes the        Goes directly to
                            the system bus.   the system bus.   processor to      the system bus.
                                                                gain exclusive
                                                                ownership of the
                                                                line.
• Cache control and memory ordering instructions — The IA-32 architecture provides
several instructions that control the caching of data, the ordering of memory reads and
writes, and the prefetching of data. These instructions allow software to control the
caching of specific data structures, to control memory coherency for specific locations in
memory, and to force strong memory ordering at specific locations in a program.
The following sections describe these two groups of cache control mechanisms.
10.5.1 Cache Control Registers and Bits
The current IA-32 architecture provides the following cache-control registers and bits for use in
enabling and/or restricting caching to various pages or regions in memory (see Figure 10-2):
• CD flag, bit 30 of control register CR0 — Controls caching of system memory locations
(see Section 2.5, “Control Registers”). If the CD flag is clear, caching is enabled for the
whole of system memory, but may be restricted for individual pages or regions of memory
by other cache-control mechanisms. When the CD flag is set, caching is restricted in the
processor’s caches (cache hierarchy) for the Pentium 4, Intel Xeon, and P6 family
processors and prevented for the Pentium processor (see note below). With the CD flag set,
however, the caches will still respond to snoop traffic. Caches should be explicitly flushed
to ensure memory coherency. For highest processor performance, both the CD and the NW
flags in control register CR0 should be cleared. Table 10-5 shows the interaction of the CD
and NW flags.
The effect of setting the CD flag is somewhat different for the Pentium 4, Intel Xeon,
and P6 family processors than for the Pentium processor (see Table 10-5). To ensure
memory coherency after the CD flag is set, the caches should be explicitly flushed (see
Section 10.5.3, “Preventing Caching”). Setting the CD flag for the Pentium 4, Intel
Xeon, and P6 family processors modifies cache line fill and update behaviour. Also for
the Pentium 4, Intel Xeon, and P6 family processors, setting the CD flag does not force
strict ordering of memory accesses unless the MTRRs are disabled and/or all memory is
referenced as uncached (see Section 7.2.4, “Strengthening or Weakening the Memory
Ordering Model”).
Figure 10-2. Cache-Control Registers and Bits Available in IA-32 Processors
[Block diagram. Labels recoverable from the figure:]
- CR0: the CD and NW flags control overall caching of system memory.
- CR3: the PCD and PWT flags control caching of the page directory.
- CR4: the PGE flag enables global pages designated with the G flag.
- Page-directory or page-table entry: the PCD and PWT flags control page-level caching;
  the G flag (note 1) controls page-level flushing of TLBs.
- PAT (note 4): controls caching of virtual memory pages.
- MTRRs (note 3): control caching of selected regions of physical memory
  (addresses 0 through FFFFFFFFH, note 2).
- IA32_MISC_ENABLE MSR: 3rd Level Cache Disable.
- Other blocks shown: TLBs, store buffer, physical memory.
NOTES:
1. G flag only available in Pentium 4, Intel Xeon, and P6 family processors.
2. The maximum physical address size is reported by CPUID leaf function 80000008H. The
maximum physical address size of FFFFFFFFFH applies only if 36-bit physical addressing is used.
3. MTRRs available only in Pentium 4 and P6 family processors; similar control available in
Pentium processor with the KEN# and WB/WT# pins.
4. PAT available only in Pentium III and Pentium 4 processors.
Table 10-5. Cache Operating Modes
Columns: CD, NW, Caching and Read/Write Policy, L1, L2/L3 (note 1). The L1 and L2/L3 entries
are given in brackets after each policy item.

CD = 0, NW = 0: Normal Cache Mode. Highest performance cache operation.
- Read hits access the cache; read misses may cause replacement. [L1: Yes; L2/L3: Yes]
- Write hits update the cache. [L1: Yes; L2/L3: Yes]
- Only writes to shared lines and write misses update system memory. [L1: Yes; L2/L3: Yes]
- Write misses cause cache line fills. [L1: Yes; L2/L3: Yes]
- Write hits can change shared lines to modified under control of the MTRRs and with
  associated read invalidation cycle. [L1: Yes; L2/L3: Yes]
- (Pentium processor only.) Write misses do not cause cache line fills. [L1: Yes; L2/L3: no entry]
- (Pentium processor only.) Write hits can change shared lines to exclusive under control of
  WB/WT#. [L1: Yes; L2/L3: no entry]
- Invalidation is allowed. [L1: Yes; L2/L3: Yes]
- External snoop traffic is supported. [L1: Yes; L2/L3: Yes]

CD = 0, NW = 1: Invalid setting.
Generates a general-protection exception (#GP) with an error code of 0. [L1: NA; L2/L3: NA]

CD = 1, NW = 0: No-fill Cache Mode. Memory coherency is maintained.
- (Pentium 4 and Intel Xeon processors.) State of processor after a power up or reset.
  [L1: Yes; L2/L3: Yes]
- Read hits access the cache; read misses do not cause replacement (see Pentium 4 and Intel
  Xeon processors reference below). [L1: Yes; L2/L3: Yes]
- Write hits update the cache. [L1: Yes; L2/L3: Yes]
- Only writes to shared lines and write misses update system memory. [L1: Yes; L2/L3: Yes]
- Write misses access memory. [L1: Yes; L2/L3: Yes]
- Write hits can change shared lines to exclusive under control of the MTRRs and with
  associated read invalidation cycle. [L1: Yes; L2/L3: Yes]
- (Pentium processor only.) Write hits can change shared lines to exclusive under control of
  the WB/WT#. [L1: Yes; L2/L3: no entry]
- (Pentium 4, Intel Xeon, and P6 family processors only.) Strict memory ordering is not
  enforced unless the MTRRs are disabled and/or all memory is referenced as uncached (see
  Section 7.2.4, “Strengthening or Weakening the Memory Ordering Model”). [L1: Yes; L2/L3: Yes]
- Invalidation is allowed. [L1: Yes; L2/L3: Yes]
- External snoop traffic is supported. [L1: Yes; L2/L3: Yes]

CD = 1, NW = 1: Memory coherency is not maintained (note 2).
- (P6 family and Pentium processors.) State of the processor after a power up or reset.
  [L1: Yes; L2/L3: Yes]
- Read hits access the cache; read misses do not cause replacement. [L1: Yes; L2/L3: Yes]
- Write hits update the cache and change exclusive lines to modified. [L1: Yes; L2/L3: Yes]
- Shared lines remain shared after write hit. [L1: Yes; L2/L3: Yes]
- Write misses access memory. [L1: Yes; L2/L3: Yes]
- Invalidation is inhibited when snooping; but is allowed with INVD and WBINVD
  instructions. [L1: Yes; L2/L3: Yes]
- External snoop traffic is supported. [L1: No; L2/L3: Yes]

NOTES:
1. The L2/L3 column in this table is definitive for the Pentium 4, Intel Xeon, and P6 family processors. It is
intended to represent what could be implemented in a system based on a Pentium processor with an
external, platform specific, write-back L2 cache.
2. The Pentium 4 and Intel Xeon processors do not support this mode; setting the CD and NW bits to 1
selects the no-fill cache mode.
• NW flag, bit 29 of control register CR0 — Controls the write policy for system memory
locations (see Section 2.5, “Control Registers”). If the NW and CD flags are clear, write-back
is enabled for the whole of system memory, but may be restricted for individual pages
or regions of memory by other cache-control mechanisms. Table 10-5 shows how the other
combinations of CD and NW flags affect caching.
NOTES
For the Pentium 4 and Intel Xeon processors, the NW flag is a don’t care flag;
that is, when the CD flag is set, the processor uses the no-fill cache mode,
regardless of the setting of the NW flag.
For the Pentium processor, when the L1 cache is disabled (the CD and NW
flags in control register CR0 are set), external snoops are accepted in DP
(dual-processor) systems and inhibited in uniprocessor systems.
When snoops are inhibited, address parity is not checked and APCHK# is not
asserted for a corrupt address; however, when snoops are accepted, address
parity is checked and APCHK# is asserted for corrupt addresses.
• PCD flag in the page-directory and page-table entries — Controls caching for
individual page tables and pages, respectively (see Section 3.7.6, “Page-Directory and
Page-Table Entries”). This flag only has effect when paging is enabled and the CD flag in
control register CR0 is clear. The PCD flag enables caching of the page table or page when
clear and prevents caching when set.
• PWT flag in the page-directory and page-table entries — Controls the write policy for
individual page tables and pages, respectively (see Section 3.7.6, “Page-Directory and
Page-Table Entries”). This flag only has effect when paging is enabled and the NW flag in
control register CR0 is clear. The PWT flag enables write-back caching of the page table or
page when clear and write-through caching when set.
• PCD and PWT flags in control register CR3 — Control the global caching and write
policy for the page directory (see Section 2.5, “Control Registers”). The PCD flag enables
caching of the page directory when clear and prevents caching when set. The PWT flag
enables write-back caching of the page directory when clear and write-through caching
when set. These flags do not affect the caching and write policy for individual page tables.
These flags only have effect when paging is enabled and the CD flag in control register
CR0 is clear.
• G (global) flag in the page-directory and page-table entries (introduced to the IA-32
architecture in the P6 family processors) — Controls the flushing of TLB entries for
individual pages. See Section 3.12, “Translation Lookaside Buffers (TLBs)”, for more
information about this flag.
• PGE (page global enable) flag in control register CR4 — Enables the establishment of
global pages with the G flag. See Section 3.12, “Translation Lookaside Buffers (TLBs)”,
for more information about this flag.
• Memory type range registers (MTRRs) (introduced in P6 family processors) —
Control the type of caching used in specific regions of physical memory. Any of the
caching types described in Section 10.3, “Methods of Caching Available”, can be selected.
See Section 10.11, “Memory Type Range Registers (MTRRs)”, for a detailed description
of the MTRRs.
• Page Attribute Table (PAT) MSR (introduced in the Pentium III processor) — Extends
the memory typing capabilities of the processor to permit memory types to be assigned on
a page-by-page basis (see Section 10.12, “Page Attribute Table (PAT)”).
• Third-Level Cache Disable flag, bit 6 of the IA32_MISC_ENABLE MSR (introduced
in the Intel Xeon processors) — Allows the L3 cache to be disabled and enabled,
independently of the L1 and L2 caches.
• KEN# and WB/WT# pins (Pentium processor) — Allow external hardware to control
the caching method used for specific areas of memory. They perform similar (but not
identical) functions to the MTRRs in the P6 family processors.
• PCD and PWT pins (Pentium processor) — These pins (which are associated with the
PCD and PWT flags in control register CR3 and in the page-directory and page-table
entries) permit caching in an external L2 cache to be controlled on a page-by-page basis,
consistent with the control exercised on the L1 cache of these processors. The Pentium 4,
Intel Xeon, and P6 family processors do not provide these pins because the L2 cache is
internal to the chip package.
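As an illustration of how the CD and NW flags combine, the following C sketch classifies a CR0 image into the four rows of Table 10-5. The bit positions (CD = bit 30, NW = bit 29) come from the text; the enum names, the function name, and the nw_is_dont_care parameter are hypothetical and only model the documented behaviour. The function inspects a value and does not read the real CR0 register, which requires privileged code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the text: CD is bit 30 and NW is bit 29 of CR0. */
#define CR0_NW (UINT32_C(1) << 29)
#define CR0_CD (UINT32_C(1) << 30)

/* Hypothetical names for the four rows of Table 10-5. */
typedef enum {
    CACHE_MODE_NORMAL,       /* CD=0, NW=0 */
    CACHE_MODE_INVALID,      /* CD=0, NW=1: loading this into CR0 raises #GP(0) */
    CACHE_MODE_NO_FILL,      /* CD=1, NW=0 */
    CACHE_MODE_NOT_COHERENT  /* CD=1, NW=1 (P6 family and Pentium processors) */
} cache_mode;

/* Classify a CR0 image. A nonzero nw_is_dont_care models the Pentium 4 and
 * Intel Xeon behaviour, where CD=1 selects no-fill mode regardless of NW. */
cache_mode cr0_cache_mode(uint32_t cr0, int nw_is_dont_care)
{
    int cd = (cr0 & CR0_CD) != 0;
    int nw = (cr0 & CR0_NW) != 0;

    assert(nw_is_dont_care == 0 || nw_is_dont_care == 1);

    if (!cd)
        return nw ? CACHE_MODE_INVALID : CACHE_MODE_NORMAL;
    if (!nw || nw_is_dont_care)
        return CACHE_MODE_NO_FILL;
    return CACHE_MODE_NOT_COHERENT;
}
```

For example, cr0_cache_mode(CR0_CD | CR0_NW, 1) yields CACHE_MODE_NO_FILL, matching note 2 of Table 10-5, while the same image with nw_is_dont_care set to 0 yields CACHE_MODE_NOT_COHERENT.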
10.5.2 Precedence of Cache Controls
The cache control flags and MTRRs operate hierarchically for restricting caching. That is, if the
CD flag is set, caching is prevented globally (see Table 10-5). If the CD flag is clear, the page-level
cache control flags and/or the MTRRs can be used to restrict caching. If there is an overlap
of page-level and MTRR caching controls, the mechanism that prevents caching has precedence.
For example, if an MTRR makes a region of system memory uncachable, a page-level
caching control cannot be used to enable caching for a page in that region. The converse is also
true; that is, if a page-level caching control designates a page as uncachable, an MTRR cannot
be used to make the page cacheable.
In cases where there is an overlap in the assignment of the write-back and write-through caching
policies to a page and a region of memory, the write-through policy takes precedence. The write-combining
policy (which can only be assigned through an MTRR or the PAT) takes precedence
over either write-through or write-back.
The selection of memory types at the page level varies depending on whether PAT is being used
to select memory types for pages, as described in the following sections.
The third-level cache disable flag (bit 6 of the IA32_MISC_ENABLE MSR) takes precedence over
the CD flag, MTRRs, and PAT for the L3 cache. That is, when the third-level cache disable flag
is set (cache disabled), the other cache controls have no effect on the L3 cache; when the flag is
clear (enabled), the cache controls have the same effect on the L3 cache as they have on the L1
and L2 caches.
10.5.2.1 Selecting Memory Types for Pentium Pro and Pentium II Processors
The Pentium Pro and Pentium II processors do not support the PAT. Here, the effective memory
type for a page is selected with the MTRRs and the PCD and PWT bits in the page-table or page-directory
entry for the page. Table 10-6 describes the mapping of MTRR memory types and
page-level caching attributes to effective memory types, when normal caching is in effect (the
CD and NW flags in control register CR0 are clear). Combinations that appear in gray are implementation-defined
for the Pentium Pro and Pentium II processors. System designers are encouraged
to avoid these implementation-defined combinations.
When normal caching is in effect, the effective memory type shown in Table 10-6 is determined
using the following rules:
1. If the PCD and PWT attributes for the page are both 0, then the effective memory type is
identical to the MTRR-defined memory type.
2. If the PCD flag is set, then the effective memory type is UC.
3. If the PCD flag is clear and the PWT flag is set, the effective memory type is WT for the
WB memory type and the MTRR-defined memory type for all other memory types.
4. Setting the PCD and PWT flags to opposite values is considered model-specific for the WP
and WC memory types and architecturally-defined for the WB, WT, and UC memory
types.

Table 10-6. Effective Page-Level Memory Type for Pentium Pro and Pentium II Processors

MTRR Memory Type (note 1)    PCD Value    PWT Value    Effective Memory Type
UC                           X            X            UC
WC                           0            0            WC
WC                           0            1            WC
WC                           1            0            WC
WC                           1            1            UC
WT                           0            X            WT
WT                           1            X            UC
WP                           0            0            WP
WP                           0            1            WP
WP                           1            0            WC
WP                           1            1            UC
WB                           0            0            WB
WB                           0            1            WT
WB                           1            X            UC

NOTE:
1. These effective memory types also apply to the Pentium 4, Intel Xeon, and Pentium III processors
when the PAT bit is not used (set to 0) in page-table and page-directory entries.
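The mapping in Table 10-6 can be sketched as a small C lookup function. This is only an illustration: the enum and function names are hypothetical, the enum values are arbitrary labels rather than MTRR encodings, and the rows the manual treats as implementation-defined are reproduced exactly as listed in the table instead of being derived from the rules.

```c
#include <assert.h>

/* Memory types named in the text. The enum values are arbitrary labels,
 * not the MTRR encodings. */
typedef enum { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB } mem_type;

/* Effective page-level memory type per Table 10-6 (PAT not in use).
 * pcd and pwt are the flag values (0 or 1) from the page-table or
 * page-directory entry. */
mem_type effective_type_no_pat(mem_type mtrr, int pcd, int pwt)
{
    assert((pcd == 0 || pcd == 1) && (pwt == 0 || pwt == 1));

    switch (mtrr) {
    case MT_UC:
        return MT_UC;                         /* UC, X, X -> UC */
    case MT_WC:
        return (pcd && pwt) ? MT_UC : MT_WC;  /* only PCD=1, PWT=1 -> UC */
    case MT_WT:
        return pcd ? MT_UC : MT_WT;
    case MT_WP:
        if (!pcd)
            return MT_WP;
        return pwt ? MT_UC : MT_WC;           /* 1,0 -> WC; 1,1 -> UC */
    case MT_WB:
        if (pcd)
            return MT_UC;
        return pwt ? MT_WT : MT_WB;           /* rule 3: PWT turns WB into WT */
    }
    return MT_UC; /* not reached for valid mem_type values */
}
```

Calling effective_type_no_pat(MT_WB, 0, 1) returns MT_WT, which is the write-through-over-write-back precedence described in Section 10.5.2.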
10.5.2.2 Selecting Memory Types for Pentium 4, Intel Xeon, and Pentium III Processors
The Pentium 4, Intel Xeon, and Pentium III processors use the PAT to select effective page-level
memory types. Here, a memory type for a page is selected by the MTRRs and the value in a PAT
entry that is selected with the PAT, PCD and PWT bits in a page-table or page-directory entry
(see Section 10.12.3, “Selecting a Memory Type from the PAT”). Table 10-7 describes the
mapping of MTRR memory types and PAT entry types to effective memory types, when normal
caching is in effect (the CD and NW flags in control register CR0 are clear). The combinations
shown in gray are implementation-defined for the Pentium 4, Intel Xeon, and Pentium III processors.
System designers are encouraged to avoid the implementation-defined combinations.
Table 10-7. Effective Page-Level Memory Types for Pentium III, Pentium 4, and Intel Xeon Processors

MTRR Memory Type    PAT Entry Value    Effective Memory Type
UC                  UC                 UC (note 1)
UC                  UC-                UC (note 1)
UC                  WC                 WC
UC                  WT                 UC (note 1)
UC                  WB                 UC (note 1)
UC                  WP                 UC (note 1)
WC                  UC                 UC (note 2)
WC                  UC-                WC
WC                  WC                 WC
WC                  WT                 UC (notes 2, 3)
WC                  WB                 WC
WC                  WP                 UC (notes 2, 3)
WT                  UC                 UC (note 2)
WT                  UC-                UC (note 2)
WT                  WC                 WC
WT                  WT                 WT
WT                  WB                 WT
WT                  WP                 WP (note 3)
WB                  UC                 UC (note 2)
WB                  UC-                UC (note 2)
WB                  WC                 WC
WB                  WT                 WT
WB                  WB                 WB
WB                  WP                 WP
WP                  UC                 UC (note 2)
WP                  UC-                WC (note 3)
WP                  WC                 WC
WP                  WT                 WT (note 3)
WP                  WB                 WP
WP                  WP                 WP

NOTES:
1. The UC attribute comes from the MTRRs and the processors are not required to snoop their caches
since the data could never have been cached. This attribute is preferred for performance reasons.
2. The UC attribute came from the page-table or page-directory entry and processors are required to check
their caches because the data may be cached due to page aliasing, which is not recommended.
3. These combinations were specified as “undefined” in previous editions of the IA-32 Intel Architecture
Software Developer’s Manual. However, all processors that support both the PAT and the MTRRs determine
the effective page-level memory types for these combinations as given.

10.5.2.3 Writing Values Across Pages with Different Memory Types
If two adjoining pages in memory have different memory types, and a word or longer operand
is written to a memory location that crosses the page boundary between those two pages, the
operand might be written to memory twice. This action does not present a problem for writes to
actual memory; however, if a device is mapped to the memory space assigned to the pages, the
device might malfunction.
10.5.3 Preventing Caching
To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills,
perform the following steps:
1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag
to 0.)
2. Flush all caches using the WBINVD instruction.
3. Disable the MTRRs and set the default memory type to uncached or set all MTRRs for the
uncached memory type (see the discussion of the TYPE field and the E
flag in Section 10.11.2.1, “IA32_MTRR_DEF_TYPE MSR”).
The caches must be flushed (step 2) after the CD flag is set to ensure system memory coherency.
If the caches are not flushed, cache hits on reads will still occur and data will be read from valid
cache lines.
NOTES
Setting the CD flag in control register CR0 modifies the processor’s caching
behaviour as indicated in Table 10-5, but it does not force the effective
memory type for all physical memory to be UC nor does it force strict
memory ordering. To force the UC memory type and strict memory ordering
on all of physical memory, either the MTRRs must all be programmed for the
UC memory type or they must be disabled.
For the Pentium 4 and Intel Xeon processors, after the sequence of steps
given above has been executed, the cache lines containing the code executed between
the end of the WBINVD instruction and the point at which the MTRRs have actually
been disabled may be retained in the cache hierarchy. Here, to remove code
from the cache completely, a second WBINVD instruction must be executed
after the MTRRs have been disabled.
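The register values involved in the steps above can be sketched in C as pure functions on register images. This is a rough illustration, not a working cache-disable routine: actually loading CR0, executing WBINVD, and writing the MSR require privileged code that is not shown. The function names are hypothetical, and the IA32_MTRR_DEF_TYPE layout used here (TYPE in bits 7:0, E flag in bit 11) is an assumption taken from the description of that register in Section 10.11.2.1, which lies outside this section.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 bit positions from Section 10.5.1. */
#define CR0_NW (UINT32_C(1) << 29)
#define CR0_CD (UINT32_C(1) << 30)

/* Assumed IA32_MTRR_DEF_TYPE layout: TYPE in bits 7:0, E (MTRR enable)
 * in bit 11. */
#define MTRR_DEF_TYPE_MASK UINT64_C(0xFF)
#define MTRR_DEF_E         (UINT64_C(1) << 11)

/* Step 1: CR0 image for the no-fill cache mode (CD = 1, NW = 0).
 * All other CR0 bits are preserved. */
uint32_t cr0_enter_no_fill(uint32_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* Step 2 (not modelled here): execute WBINVD after CR0 has been loaded. */

/* Step 3: IA32_MTRR_DEF_TYPE image with the MTRRs disabled (E = 0) and
 * the default memory type set to UC (encoding 00H). */
uint64_t mtrr_def_type_disable(uint64_t def_type)
{
    uint64_t result = def_type & ~(MTRR_DEF_E | MTRR_DEF_TYPE_MASK);
    assert((result & MTRR_DEF_E) == 0);
    return result;
}
```

On the Pentium 4 and Intel Xeon processors, the note above implies a second WBINVD after step 3 if the code executed in between must also leave the cache hierarchy.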
10.5.4 Disabling and Enabling the L3 Cache
The third-level cache disable flag (bit 6 of the IA32_MISC_ENABLE MSR) allows the L3 cache
to be disabled and enabled, independently of the L1 and L2 caches. Prior to using this control to
disable or enable the L3 cache, software should disable and flush all the processor caches, as
described earlier in Section 10.5.3, “Preventing Caching”, to prevent loss of information
stored in the L3 cache. After the L3 cache has been disabled or enabled, caching for the whole
processor can be restored.
10.5.5 Cache Management Instructions
The IA-32 architecture provides several instructions for managing the L1, L2, and L3 caches. The
INVD and WBINVD instructions are system instructions that operate on the L1, L2,
and L3 caches as a whole. The PREFETCHh and CLFLUSH instructions and the non-temporal
move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD), which
were introduced in SSE/SSE2 extensions, offer more granular control over caching.
The INVD and WBINVD instructions are used to invalidate the contents of the L1, L2, and L3
caches. The INVD instruction invalidates all internal cache entries, then generates a special-function
bus cycle that indicates that external caches also should be invalidated. The INVD
instruction should be used with care. It does not force a write-back of modified cache lines;
therefore, data stored in the caches and not written back to system memory will be lost. Unless
there is a specific requirement or benefit to invalidating the caches without writing back the
modified lines (such as, during testing or fault recovery where cache coherency with main
memory is not a concern), software should use the WBINVD instruction.
The WBINVD instruction first writes back any modified lines in all the internal caches, then
invalidates the contents of the L1, L2, and L3 caches. It ensures that cache coherency with
main memory is maintained regardless of the write policy in effect (that is, write-through or
write-back). Following this operation, the WBINVD instruction generates one (P6 family
processors) or two (Pentium and Intel486 processors) special-function bus cycles to indicate to
external cache controllers that write-back of modified data followed by invalidation of external
caches should occur.
The PREFETCHh instructions allow a program to suggest to the processor that a cache line from
a specified location in system memory be prefetched into the cache hierarchy (see Section 10.8,
“Explicit Caching”).
The CLFLUSH instruction allows selected cache lines to be flushed from the cache hierarchy. This instruction
gives a program the ability to explicitly free up cache space, when it is known that a cached
section of system memory will not be accessed in the near future.
The non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and
MOVNTPD) allow data to be moved from the processor’s registers directly into system memory
without being also written into the L1, L2, and/or L3 caches. These instructions can be used to
prevent cache pollution when operating on data that is going to be modified only once before
being stored back into system memory. These instructions operate on data in the general-purpose,
MMX, and XMM registers.
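From C, the non-temporal move and CLFLUSH instructions are usually reached through compiler intrinsics rather than inline assembly. The sketch below assumes a compiler that provides the SSE2 header emmintrin.h (with _mm_stream_si32, _mm_clflush, and _mm_sfence) and defines __SSE2__ when SSE2 code generation is enabled; when it is not, the code falls back to ordinary stores. The function names fill_streaming and flush_line are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32, _mm_clflush; also provides _mm_sfence */
#endif

/* Fill dst[0..n) with value. With SSE2 available this uses non-temporal
 * stores (MOVNTI), so the written data need not be placed in the caches. */
void fill_streaming(int32_t *dst, size_t n, int32_t value)
{
    assert(dst != NULL || n == 0);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32((int *)&dst[i], value);
    _mm_sfence();  /* order the streaming stores before later stores */
#else
    for (size_t i = 0; i < n; i++)
        dst[i] = value;  /* portable fallback: ordinary cached stores */
#endif
}

/* Flush the cache line containing p from the cache hierarchy (CLFLUSH). */
void flush_line(const void *p)
{
#if defined(__SSE2__)
    _mm_clflush(p);
#else
    (void)p;  /* no portable equivalent; do nothing */
#endif
}
```

A typical use would be to write a large output buffer once with fill_streaming so that it does not displace more useful data from the caches. INVD and WBINVD have no such user-level wrappers here because they are system instructions.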
10.5.6 L1 Data Cache Context Mode
L1 data cache context mode is a feature of IA-32 processors that support Hyper-Threading Technology.
When CPUID.1:ECX[bit 10] = 1, the processor supports setting L1 data cache context
mode using the L1 data cache context mode flag (IA32_MISC_ENABLE[bit 24]). Selectable
modes are adaptive mode (default) and shared mode.
The BIOS is responsible for configuring the L1 data cache context mode.
10.5.6.1 Adaptive Mode
Adaptive mode facilitates L1 data cache sharing between logical processors. When running in
adaptive mode, the L1 data cache is shared across logical processors in the same core if:
• CR3 control registers for logical processors sharing the cache are identical.
• The same paging mode is used by logical processors sharing the cache.
In this situation, the entire L1 data cache is available to each logical processor (instead of being
competitively shared).
If CR3 values are different for the logical processors sharing an L1 data cache or the logical
processors use different paging modes, processors compete for cache resources. This reduces
the effective size of the cache for each logical processor. Aliasing of the cache is not allowed
(which prevents data thrashing).
10.5.6.2 Shared Mode
In shared mode, the L1 data cache is competitively shared between logical processors. This is
true even if the logical processors use identical CR3 registers and paging modes.
In shared mode, linear addresses in the L1 data cache can be aliased, meaning that one linear
address in the cache can point to different physical locations. The mechanism for resolving
aliasing can lead to thrashing. For this reason, IA32_MISC_ENABLE[bit 24] = 0 is the
preferred configuration for IA-32 processors that support Hyper-Threading Technology.
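The two IA32_MISC_ENABLE controls discussed in Sections 10.5.4 and 10.5.6 can be decoded from a register image as in the following C sketch. The bit positions (bit 6, bit 24, and CPUID.1:ECX bit 10) are taken from the text. That bit 24 = 1 selects shared mode is an inference from the statement that bit 24 = 0 is the preferred configuration and that adaptive mode is the default. All names are hypothetical, and the functions operate on values already read by privileged code (RDMSR) or by CPUID.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the text. */
#define MISC_ENABLE_L3_DISABLE  (UINT64_C(1) << 6)   /* third-level cache disable */
#define MISC_ENABLE_L1D_CONTEXT (UINT64_C(1) << 24)  /* L1 data cache context mode */
#define CPUID1_ECX_L1D_CONTEXT  (UINT32_C(1) << 10)  /* CPUID.1:ECX[bit 10] */

typedef enum {
    L1D_MODE_UNSUPPORTED,  /* processor does not support selecting the mode */
    L1D_MODE_ADAPTIVE,     /* bit 24 = 0 (default, preferred) */
    L1D_MODE_SHARED        /* bit 24 = 1 (inferred from the text) */
} l1d_mode;

/* Decode the L1 data cache context mode from CPUID.1:ECX and an
 * IA32_MISC_ENABLE image. */
l1d_mode l1d_context_mode(uint32_t cpuid1_ecx, uint64_t misc_enable)
{
    if (!(cpuid1_ecx & CPUID1_ECX_L1D_CONTEXT))
        return L1D_MODE_UNSUPPORTED;
    return (misc_enable & MISC_ENABLE_L1D_CONTEXT) ? L1D_MODE_SHARED
                                                   : L1D_MODE_ADAPTIVE;
}

/* Nonzero when the third-level cache disable flag is set. */
int l3_cache_disabled(uint64_t misc_enable)
{
    return (misc_enable & MISC_ENABLE_L3_DISABLE) != 0;
}
```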
10.6 SELF-MODIFYING CODE
A write to a memory location in a code segment that is currently cached in the processor causes
the associated cache line (or lines) to be invalidated. This check is based on the physical address
of the instruction. In addition, the P6 family and Pentium processors check whether a write to a
code segment may modify an instruction that has been prefetched for execution. If the write
affects a prefetched instruction, the prefetch queue is invalidated. This latter check is based on
the linear address of the instruction. For the Pentium 4 and Intel Xeon processors, a write or a
snoop of an instruction in a code segment, where the target instruction is already decoded and
resident in the trace cache, invalidates the entire trace cache. The latter behavior means that
programs that self-modify code can cause severe degradation of performance when run on the
Pentium 4 and Intel Xeon processors.
In practice, the check on linear addresses should not create compatibility problems among IA-32
processors. Applications that include self-modifying code use the same linear address for modifying
and fetching the instruction. Systems software, such as a debugger, that might possibly
modify an instruction using a different linear address than that used to fetch the instruction, will
execute a serializing operation, such as a CPUID instruction, before the modified instruction is
executed, which will automatically resynchronize the instruction cache and prefetch queue. (See
Section 7.1.3, “Handling Self- and Cross-Modifying Code”, for more information about the use
of self-modifying code.)
For Intel486 processors, a write to an instruction in the cache will modify it in both the cache
and memory, but if the instruction was prefetched before the write, the old version of the instruction
could be the one executed. To prevent the old instruction from being executed, flush the
instruction prefetch unit by coding a jump instruction immediately after any write that modifies
an instruction.
10.7 IMPLICIT CACHING (PENTIUM 4, INTEL XEON, AND P6 FAMILY PROCESSORS)
Implicit caching occurs when a memory element is made potentially cacheable, although the
element may never have been accessed in the normal von Neumann sequence. Implicit caching
occurs on the Pentium 4, Intel Xeon, and P6 family processors due to aggressive prefetching,
branch prediction, and TLB miss handling. Implicit caching is an extension of the behavior of
existing Intel386, Intel486, and Pentium processor systems, since software running on these
processor families also has not been able to deterministically predict the behavior of instruction
prefetch.
To avoid problems related to implicit caching, the operating system must explicitly invalidate
the cache when changes are made to cacheable data that the cache coherency mechanism does
not automatically handle. This includes writes to dual-ported or physically aliased memory
boards that are not detected by the snooping mechanisms of the processor, and changes to page-table
entries in memory.
The code in Example 10-1 shows the effect of implicit caching on page-table entries. The linear
address F000H points to physical location B000H (the page-table entry for F000H contains the
value B000H), and the page-table entry for linear address F000H is PTE_F000.
Example 10-1. Effect of Implicit Caching on Page-Table Entries
mov EAX, CR3          ; Invalidate the TLB
mov CR3, EAX          ; by copying CR3 to itself
mov PTE_F000, A000H   ; Change F000H to point to A000H
mov EBX, [F000H]      ; Load from linear address F000H
Because of speculative execution in the Pentium 4, Intel Xeon, and P6 family processors, the
last MOV instruction performed would place the value at physical location B000H into EBX,
rather than the value at the new physical address A000H. This situation is remedied by placing
a TLB invalidation between the store that changes the page-table entry and the load that
follows it.
10.8 EXPLICIT CACHING
The Pentium III processor introduced four new instructions, the PREFETCHh instructions, that
provide software with explicit control over the caching of data. These instructions provide
“hints” to the processor that the data requested by a PREFETCHh instruction should be read into
the cache hierarchy now or as soon as possible, in anticipation of its use. The instructions provide
different variations of the hint that allow selection of the cache level into which data will be read.
The PREFETCHh instructions can help reduce the long latency typically associated with
reading data from memory and thus help prevent processor “stalls.” However, these instructions
should be used judiciously. Overuse can lead to resource conflicts and hence reduce the performance
of an application. Also, these instructions should only be used to prefetch data from
memory; they should not be used to prefetch instructions. For more detailed information on the
proper use of the prefetch instruction, refer to Chapter 6, “Optimizing Cache Usage for the Intel
Pentium 4 Processors”, in the Pentium 4 Processor Optimization Reference Manual (see
Section 1.4, “Related Literature”, for the document order number).
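As a minimal sketch of explicit prefetching from C, the loop below issues a prefetch hint a fixed distance ahead of the element being summed. It assumes a compiler that provides xmmintrin.h with _mm_prefetch and _MM_HINT_T0 and defines __SSE__; without that, the hint is simply omitted. The function name and the PREFETCH_DISTANCE value are hypothetical, and a real distance would have to be tuned by measurement, since the text warns that overuse can reduce performance.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */
#endif

/* Hypothetical tuning value: elements between prefetch hints, and how far
 * ahead of the current element each hint points. */
#define PREFETCH_DISTANCE 16

/* Sum data[0..n), hinting the processor to fetch data that will be needed
 * PREFETCH_DISTANCE elements later. The hint never changes the result. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;

    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE__)
        /* One hint per PREFETCH_DISTANCE elements, and only for addresses
         * that are still inside the array. */
        if ((i % PREFETCH_DISTANCE) == 0 && i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)&data[i + PREFETCH_DISTANCE], _MM_HINT_T0);
#endif
        total += data[i];
    }
    return total;
}
```

Because the instruction is only a hint, the function returns the same sum whether or not the prefetch is honoured; only the timing can differ.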
10.9 INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS)
The processor updates its address translation caches (TLBs) transparently to software. Several
mechanisms are available, however, that allow software and hardware to invalidate the TLBs
either explicitly or as a side effect of another operation.
The INVLPG instruction invalidates the TLB for a specific page. This instruction is the most
efficient in cases where software only needs to invalidate a specific page, because it improves
performance over invalidating the whole TLB. This instruction is not affected by the state of the
G flag in a page-directory or page-table entry.
The following operations invalidate all TLB entries except global entries. (A global entry is one
for which the G (global) flag is set in its corresponding page-directory or page-table entry. The
global flag was introduced into the IA-32 architecture in the P6 family processors; see Section
10.5, “Cache Control”.)
• Writing to control register CR3.
• A task switch that changes control register CR3.
The following operations invalidate all TLB entries, irrespective of the setting of the G flag:
• Asserting or de-asserting the FLUSH# pin.
• (Pentium 4, Intel Xeon, and P6 family processors only.) Writing to an MTRR (with a
WRMSR instruction).
• Writing to control register CR0 to modify the PG or PE flag.
• (Pentium 4, Intel Xeon, and P6 family processors only.) Writing to control register CR4 to
modify the PSE, PGE, or PAE flag.
See Section 3.12, “Translation Lookaside Buffers (TLBs)”, for additional information about the
TLBs.
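The invalidation rules listed in this section can be summarized as a small decision table. The C sketch below is only a model of the text for illustration: the enum and function names are hypothetical, and it does not execute any of the operations it names.

```c
#include <assert.h>

/* Operations named in this section that invalidate TLB entries. */
typedef enum {
    TLB_OP_INVLPG,
    TLB_OP_WRITE_CR3,
    TLB_OP_TASK_SWITCH_CR3,   /* task switch that changes CR3 */
    TLB_OP_FLUSH_PIN,         /* asserting or de-asserting FLUSH# */
    TLB_OP_WRITE_MTRR,        /* Pentium 4, Intel Xeon, and P6 family only */
    TLB_OP_CR0_PG_PE,         /* modifying the PG or PE flag in CR0 */
    TLB_OP_CR4_PSE_PGE_PAE    /* Pentium 4, Intel Xeon, and P6 family only */
} tlb_op;

typedef enum {
    TLB_SCOPE_ONE_PAGE,    /* the entry for one page, regardless of its G flag */
    TLB_SCOPE_NON_GLOBAL,  /* all entries except global ones */
    TLB_SCOPE_ALL          /* all entries, irrespective of the G flag */
} tlb_scope;

/* Return which TLB entries an operation invalidates, per the text. */
tlb_scope tlb_invalidation_scope(tlb_op op)
{
    switch (op) {
    case TLB_OP_INVLPG:
        return TLB_SCOPE_ONE_PAGE;
    case TLB_OP_WRITE_CR3:
    case TLB_OP_TASK_SWITCH_CR3:
        return TLB_SCOPE_NON_GLOBAL;
    case TLB_OP_FLUSH_PIN:
    case TLB_OP_WRITE_MTRR:
    case TLB_OP_CR0_PG_PE:
    case TLB_OP_CR4_PSE_PGE_PAE:
        return TLB_SCOPE_ALL;
    }
    assert(!"unknown tlb_op");
    return TLB_SCOPE_ALL;
}
```

The table makes one practical consequence visible: reloading CR3 does not remove global entries, so changing the mapping of a global page calls for INVLPG or one of the operations with TLB_SCOPE_ALL.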
10.10 STORE BUFFER
IA-32 processors temporarily store each write (store) to memory in a store buffer. The store
buffer improves processor performance by allowing the processor to continue executing instructions
without having to wait until a write to memory and/or to a cache is complete. It also allows
writes to be delayed for more efficient use of memory-access bus cycles.
In general, the existence of the store buffer is transparent to software, even in systems that use
multiple processors. The processor ensures that write operations are always carried out in
program order. It also ensures that the contents of the store buffer are always drained to memory
in the following situations:
• When an exception or interrupt is generated.
• (Pentium 4, Intel Xeon, and P6 family processors only) When a serializing instruction is
executed.
• When an I/O instruction is executed.
• When a LOCK operation is performed.
• (Pentium 4, Intel Xeon, and P6 family processors only) When a BINIT operation is
performed.
• (Pentium III, Pentium 4, and Intel Xeon processors only) When using an SFENCE
instruction to order stores.
• (Pentium 4 and Intel Xeon processors only) When using an MFENCE instruction to order
stores.
The discussion of write ordering in Section 7.2, “Memory Ordering”, gives a detailed description
of the operation of the store buffer.
10.11 MEMORY TYPE RANGE REGISTERS (MTRRS)
The following section pertains only to the Pentium 4, Intel Xeon, and P6 family processors.
The memory type range registers (MTRRs) provide a mechanism for associating the memory
types (see Section 10.3, “Methods of Caching Available”) with physical-address ranges in
system memory. They allow the processor to optimize operations for different types of memory
such as RAM, ROM, frame-buffer memory, and memory-mapped I/O devices. They also
simplify system hardware design by eliminating the memory control pins used for this function
on earlier IA-32 processors and the external logic needed to drive them.
The MTRR mechanism allows up to 96 memory ranges to be defined in physical memory, and
it defines a set of model-specific registers (MSRs) for specifying the type of memory that is
contained in each range. Table 10-8 shows the memory types that can be specified and their
properties; Figure 10-3 shows the mapping of physical memory with MTRRs. See Section 10.3,
“Methods of Caching Available”, for a more detailed description of each memory type.
Following a hardware reset, a Pentium 4, Intel Xeon, or P6 family processor disables all the
fixed and variable MTRRs, which in effect makes all of physical memory uncachable. Initialization
software should then set the MTRRs to a specific, system-defined memory map. Typically,
the BIOS (basic input/output system) software configures the MTRRs. The operating
system or executive is then free to modify the memory map using the normal page-level cacheability
attributes.
In a multiprocessor system, different Pentium 4, Intel Xeon, or P6 family processors MUST use
the identical MTRR memory map so that software has a consistent view of memory, independent
of the processor executing a program.
Table 10-8. Memory Types That Can Be Encoded in MTRRs
Memory Type and Mnemonic    Encoding in MTRR
Uncacheable (UC)            00H
Write Combining (WC)        01H
Reserved*                   02H
Reserved*                   03H
Write-through (WT)          04H
Write-protected (WP)        05H
Writeback (WB)              06H
Reserved*                   07H through FFH
NOTE:
* Use of these encodings results in a general-protection exception (#GP).
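The encodings in Table 10-8 can be captured in a small lookup, sketched below in Python. The function name mtrr_type_name is hypothetical, and the Python exception only stands in for the #GP that the hardware raises on a reserved encoding.

```python
# Memory type encodings from Table 10-8.
MTRR_TYPES = {0x00: "UC", 0x01: "WC", 0x04: "WT", 0x05: "WP", 0x06: "WB"}

def mtrr_type_name(encoding):
    """Return the mnemonic for an 8-bit MTRR type encoding."""
    if not 0 <= encoding <= 0xFF:
        raise ValueError("encoding must fit in 8 bits")
    if encoding not in MTRR_TYPES:
        # Reserved encodings cause #GP on hardware; modelled here as ValueError.
        raise ValueError(f"reserved MTRR type encoding {encoding:02X}H")
    return MTRR_TYPES[encoding]
```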
10.11.1 MTRR Feature Identification
The availability of the MTRR feature is model-specific. Software can determine if MTRRs are
supported on a processor by executing the CPUID instruction and reading the state of the MTRR
flag (bit 12) in the feature information register (EDX).
If the MTRR flag is set (indicating that the processor implements MTRRs), additional information
about MTRRs can be obtained from the 64-bit IA32_MTRRCAP MSR (named MTRRcap
MSR for the P6 family processors). The IA32_MTRRCAP MSR is a read-only MSR that can
be read with the RDMSR instruction. Figure 10-4 shows the contents of the IA32_MTRRCAP
MSR. The functions of the flags and field in this register are as follows:
• VCNT (variable range registers count) field, bits 0 through 7 — Indicates the number
of variable ranges implemented on the processor. The Pentium 4, Intel Xeon, and P6
family processors have eight pairs of MTRRs for setting up eight variable ranges.
• FIX (fixed range registers supported) flag, bit 8 — Fixed range MTRRs
(IA32_MTRR_FIX64K_00000 through IA32_MTRR_FIX4K_0F8000) are supported
when set; no fixed range registers are supported when clear.
Figure 10-3. Mapping Physical Memory With MTRRs
[Figure: map of physical memory from address 0 to FFFFFFFFH]
0H through 7FFFFH            512 KBytes   8 fixed ranges (64 KBytes each)
80000H through BFFFFH        256 KBytes   16 fixed ranges (16 KBytes each)
C0000H through FFFFFH        256 KBytes   64 fixed ranges (4 KBytes each)
100000H through FFFFFFFFH                 8 variable ranges (from 4 KBytes to maximum size of physical memory)
Address ranges not mapped by an MTRR are set to a default type.
• WC (write combining) flag, bit 10 — The write-combining (WC) memory type is
supported when set; the WC type is not supported when clear.
Bit 9 and bits 11 through 63 in the IA32_MTRRCAP MSR are reserved. If software attempts to
write to the IA32_MTRRCAP MSR, a general-protection exception (#GP) is generated.
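The field layout described above can be illustrated with a short Python sketch that decodes a value as it might be returned by RDMSR. The function name decode_mtrrcap and the sample value are made up for illustration; only the bit positions come from the text.

```python
def decode_mtrrcap(value):
    """Split an IA32_MTRRCAP value into the fields described in the text."""
    return {
        "vcnt": value & 0xFF,             # bits 0 through 7: number of variable ranges
        "fix": bool((value >> 8) & 1),    # bit 8: fixed range MTRRs supported
        "wc": bool((value >> 10) & 1),    # bit 10: write-combining type supported
    }

# Sample value only: eight variable ranges, FIX and WC flags set.
print(decode_mtrrcap(0x508))
```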
For the Pentium 4, Intel Xeon, and P6 family processors, the IA32_MTRRCAP MSR always
contai

CHAPTER 9 PROCESSOR MANAGEMENT AND INITIALIZATION

This chapter describes the facilities provided for managing processor wide functions and for
initializing the processor. The subjects covered include: processor initialization, x87 FPU
initialization, processor configuration, feature determination, mode switching, the MSRs (in the
Pentium, P6 family, Pentium 4, and Intel Xeon processors), and the MTRRs (in the P6 family,
Pentium 4, and Intel Xeon processors).
9.1 INITIALIZATION OVERVIEW
Following power-up or an assertion of the RESET# pin, each processor on the system bus
performs a hardware initialization of the processor (known as a hardware reset) and an optional
built-in self-test (BIST). A hardware reset sets each processor’s registers to a known state and
places the processor in real-address mode. It also invalidates the internal caches, translation
lookaside buffers (TLBs) and the branch target buffer (BTB). At this point, the action taken
depends on the processor family:
• Pentium 4 and Intel Xeon processors — All the processors on the system bus (including
a single processor in a uniprocessor system) execute the multiple processor (MP) initialization
protocol. The processor that is selected through this protocol as the bootstrap
processor (BSP) then immediately starts executing software-initialization code in the
current code segment beginning at the offset in the EIP register. The application (non-BSP)
processors (APs) go into a Wait For Startup IPI (SIPI) state while the BSP is executing
initialization code. See Section 7.5, “Multiple-Processor (MP) Initialization”, for more
details. Note that in a uniprocessor system, the single Pentium 4 or Intel Xeon processor
automatically becomes the BSP.
• P6 family processors — The action taken is the same as for the Pentium 4 and Intel Xeon
processors (as described in the previous paragraph).
• Pentium processors — In either a single- or dual-processor system, a single Pentium
processor is always pre-designated as the primary processor. Following a reset, the primary
processor behaves as follows in both single- and dual-processor systems. Using the dual-processor
(DP) ready initialization protocol, the primary processor immediately starts
executing software-initialization code in the current code segment beginning at the offset
in the EIP register. The secondary processor (if there is one) goes into a halt state.
• Intel486 processor — The primary processor (or single processor in a uniprocessor
system) immediately starts executing software-initialization code in the current code
segment beginning at the offset in the EIP register. (The Intel486 does not automatically
execute a DP or MP initialization protocol to determine which processor is the primary
processor.)
The software-initialization code performs all system-specific initialization of the BSP or
primary processor and the system logic.
At this point, for MP (or DP) systems, the BSP (or primary) processor wakes up each AP (or
secondary) processor to enable those processors to execute self-configuration code.
When all processors are initialized, configured, and synchronized, the BSP or primary processor
begins executing an initial operating-system or executive task.
The x87 FPU is also initialized to a known state during hardware reset. x87 FPU software initialization
code can then be executed to perform operations such as setting the precision of the x87
FPU and the exception masks. No special initialization of the x87 FPU is required to switch
operating modes.
Asserting the INIT# pin on the processor invokes a similar response to a hardware reset. The
major difference is that during an INIT, the internal caches, MSRs, MTRRs, and x87 FPU state
are left unchanged (although the TLBs and BTB are invalidated, as with a hardware reset). An
INIT provides a method for switching from protected to real-address mode while maintaining
the contents of the internal caches.
9.1.1 Processor State After Reset
Table 9-1 shows the state of the flags and other registers following power-up for the Pentium 4,
Intel Xeon, P6 family, and Pentium processors. The state of control register CR0 is 60000010H
(see Figure 9-1). This places the processor in real-address mode with paging disabled.
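To see why 60000010H corresponds to real-address mode with paging disabled, the value can be decoded flag by flag. The following Python sketch assumes the standard CR0 bit positions; the helper name set_cr0_flags is hypothetical.

```python
# CR0 flag bit positions (bit 4 is not used as a flag and reads as 1 after reset).
CR0_FLAGS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}

def set_cr0_flags(cr0):
    """Return the sorted names of the CR0 flags that are set in the given value."""
    return sorted(name for name, bit in CR0_FLAGS.items() if (cr0 >> bit) & 1)

# After reset only CD and NW are set; PE and PG are clear.
print(set_cr0_flags(0x60000010))
```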
9.1.2 Processor Built-In Self-Test (BIST)
Hardware may request that the BIST be performed at power-up. The EAX register is cleared
(0H) if the processor passes the BIST. A nonzero value in the EAX register after the BIST indicates
that a processor fault was detected. If the BIST is not requested, the contents of the EAX
register after a hardware reset are 0H.
The overhead for performing a BIST varies between processor families. For example, the BIST
takes approximately 30 million processor clock periods to execute on the Pentium 4 processor.
(This clock count is model-specific, and Intel reserves the right to change the exact number of
periods, for any of the IA-32 processors, without notification.)
Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT
Register                Pentium 4 and Intel Xeon    P6 Family Processor         Pentium Processor
                        Processor

EFLAGS(1)               00000002H                   00000002H                   00000002H

EIP                     0000FFF0H                   0000FFF0H                   0000FFF0H

CR0                     60000010H(2)                60000010H(2)                60000010H(2)

CR2, CR3, CR4           00000000H                   00000000H                   00000000H

CS                      Selector = F000H            Selector = F000H            Selector = F000H
                        Base = FFFF0000H            Base = FFFF0000H            Base = FFFF0000H
                        Limit = FFFFH               Limit = FFFFH               Limit = FFFFH
                        AR = Present, R/W,          AR = Present, R/W,          AR = Present, R/W,
                        Accessed                    Accessed                    Accessed

SS, DS, ES, FS, GS      Selector = 0000H            Selector = 0000H            Selector = 0000H
                        Base = 00000000H            Base = 00000000H            Base = 00000000H
                        Limit = FFFFH               Limit = FFFFH               Limit = FFFFH
                        AR = Present, R/W,          AR = Present, R/W,          AR = Present, R/W,
                        Accessed                    Accessed                    Accessed

EDX                     00000FxxH                   000n06xxH(3)                000005xxH

EAX                     0(4)                        0(4)                        0(4)

EBX, ECX, ESI, EDI,     00000000H                   00000000H                   00000000H
EBP, ESP

ST0 through ST7(5)      Pwr up or Reset: +0.0       Pwr up or Reset: +0.0       Pwr up or Reset: +0.0
                        FINIT/FNINIT: Unchanged     FINIT/FNINIT: Unchanged     FINIT/FNINIT: Unchanged

x87 FPU Control Word(5) Pwr up or Reset: 0040H      Pwr up or Reset: 0040H      Pwr up or Reset: 0040H
                        FINIT/FNINIT: 037FH         FINIT/FNINIT: 037FH         FINIT/FNINIT: 037FH

x87 FPU Status Word(5)  Pwr up or Reset: 0000H      Pwr up or Reset: 0000H      Pwr up or Reset: 0000H
                        FINIT/FNINIT: 0000H         FINIT/FNINIT: 0000H         FINIT/FNINIT: 0000H

x87 FPU Tag Word(5)     Pwr up or Reset: 5555H      Pwr up or Reset: 5555H      Pwr up or Reset: 5555H
                        FINIT/FNINIT: FFFFH         FINIT/FNINIT: FFFFH         FINIT/FNINIT: FFFFH

x87 FPU Data Operand    Pwr up or Reset: 0000H      Pwr up or Reset: 0000H      Pwr up or Reset: 0000H
and CS Seg.             FINIT/FNINIT: 0000H         FINIT/FNINIT: 0000H         FINIT/FNINIT: 0000H
Selectors(5)

x87 FPU Data Operand    Pwr up or Reset:            Pwr up or Reset:            Pwr up or Reset:
and Inst. Pointers(5)   00000000H                   00000000H                   00000000H
                        FINIT/FNINIT: 00000000H     FINIT/FNINIT: 00000000H     FINIT/FNINIT: 00000000H

MM0 through MM7(5)      Pwr up or Reset:            Pentium II and Pentium III  Pentium with MMX
                        0000000000000000H           Processors only:            Technology only:
                        INIT or FINIT/FNINIT:       Pwr up or Reset:            Pwr up or Reset:
                        Unchanged                   0000000000000000H           0000000000000000H
                                                    INIT or FINIT/FNINIT:       INIT or FINIT/FNINIT:
                                                    Unchanged                   Unchanged

XMM0 through XMM7       Pwr up or Reset:            Pentium III processor       NA
                        0000000000000000H           only:
                        INIT: Unchanged             Pwr up or Reset:
                                                    0000000000000000H
                                                    INIT: Unchanged

MXCSR                   Pwr up or Reset: 1F80H      Pentium III processor only: NA
                        INIT: Unchanged             Pwr up or Reset: 1F80H
                                                    INIT: Unchanged

GDTR, IDTR              Base = 00000000H            Base = 00000000H            Base = 00000000H
                        Limit = FFFFH               Limit = FFFFH               Limit = FFFFH
                        AR = Present, R/W           AR = Present, R/W           AR = Present, R/W

LDTR, Task Register     Selector = 0000H            Selector = 0000H            Selector = 0000H
                        Base = 00000000H            Base = 00000000H            Base = 00000000H
                        Limit = FFFFH               Limit = FFFFH               Limit = FFFFH
                        AR = Present, R/W           AR = Present, R/W           AR = Present, R/W

DR0, DR1, DR2, DR3      00000000H                   00000000H                   00000000H

DR6                     FFFF0FF0H                   FFFF0FF0H                   FFFF0FF0H

DR7                     00000400H                   00000400H                   00000400H

Time-Stamp Counter      Power up or Reset: 0H       Power up or Reset: 0H       Power up or Reset: 0H
                        INIT: Unchanged             INIT: Unchanged             INIT: Unchanged

Perf. Counters and      Power up or Reset: 0H       Power up or Reset: 0H       Power up or Reset: 0H
Event Select            INIT: Unchanged             INIT: Unchanged             INIT: Unchanged

All Other MSRs          Pwr up or Reset:            Pwr up or Reset:            Pwr up or Reset:
                        Undefined                   Undefined                   Undefined
                        INIT: Unchanged             INIT: Unchanged             INIT: Unchanged

Data and Code Cache,    Invalid                     Invalid                     Invalid
TLBs

Fixed MTRRs             Pwr up or Reset: Disabled   Pwr up or Reset: Disabled   Not Implemented
                        INIT: Unchanged             INIT: Unchanged

Variable MTRRs          Pwr up or Reset: Disabled   Pwr up or Reset: Disabled   Not Implemented
                        INIT: Unchanged             INIT: Unchanged

Machine-Check           Pwr up or Reset:            Pwr up or Reset:            Not Implemented
Architecture            Undefined                   Undefined
                        INIT: Unchanged             INIT: Unchanged

APIC                    Pwr up or Reset: Enabled    Pwr up or Reset: Enabled    Pwr up or Reset: Enabled
                        INIT: Unchanged             INIT: Unchanged             INIT: Unchanged
NOTES:
1. The 10 most-significant bits of the EFLAGS register are undefined following a reset. Software should
not depend on the states of any of these bits.
2. The CD and NW flags are unchanged, bit 4 is set to 1, all other bits are cleared.
3. Where “n” is the Extended Model Value for the respective processor.
4. If Built-In Self-Test (BIST) is invoked on power up or reset, EAX is 0 only if all tests passed. (BIST cannot
be invoked during an INIT.)
5. The state of the x87 FPU and MMX registers is not changed by the execution of an INIT.
9.1.3 Model and Stepping Information
Following a hardware reset, the EDX register contains component identification and revision
information (see Figure 9-2). For example, the model, family, and processor type returned for
the first processor in the Intel Pentium 4 family is as follows: model (0000B), family (1111B),
and processor type (00B).
The stepping ID field contains a unique identifier for the processor’s stepping ID or revision
level. The extended family and extended model fields were added to the IA-32 architecture in
the Pentium 4 processors.
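As an illustration of the fields named above, the following Python sketch extracts the stepping ID, model, family, and processor type from an EDX reset value. The bit positions assumed here (stepping 3:0, model 7:4, family 11:8, type 13:12) are the conventional ones; the extended fields are left out, and the function name decode_version is hypothetical.

```python
def decode_version(edx):
    """Extract the basic version fields from the EDX value left after reset."""
    return {
        "stepping": edx & 0xF,        # bits 3:0
        "model": (edx >> 4) & 0xF,    # bits 7:4
        "family": (edx >> 8) & 0xF,   # bits 11:8
        "type": (edx >> 12) & 0x3,    # bits 13:12
    }

# 00000FxxH with an arbitrary stepping of AH: family 1111B, model 0000B, type 00B.
print(decode_version(0x00000F0A))
```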
Figure 9-1. Contents of CR0 Register after Reset
[Figure: CR0 bit layout with the state of each flag after reset]
PG (bit 31)   Paging disabled: 0
CD (bit 30)   Caching disabled: 1
NW (bit 29)   Not write-through disabled: 1
AM (bit 18)   Alignment check disabled: 0
WP (bit 16)   Write-protect disabled: 0
NE (bit 5)    External x87 FPU error reporting: 0
Bit 4         (Not used): 1
TS (bit 3)    No task switch: 0
EM (bit 2)    x87 FPU instructions not trapped: 0
MP (bit 1)    WAIT/FWAIT instructions not trapped: 0
PE (bit 0)    Real-address mode: 0
All other bits are reserved.
Figure 9-2. Version Information in the EDX Register after Reset
[Figure: EDX bit layout. Fields from high-order to low-order bits: Reserved, Extended Family,
Extended Model, Reserved, Processor Type (bits 13-12), Family (bits 11-8; 1111B for the
Pentium 4 Processor Family), Model (bits 7-4; beginning with 0000B), Stepping ID (bits 3-0)]
9.1.4 First Instruction Executed
The first instruction that is fetched and executed following a hardware reset is located at physical
address FFFFFFF0H. This address is 16 bytes below the processor’s uppermost physical
address. The EPROM containing the software-initialization code must be located at this address.
The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in
real-address mode. The processor is initialized to this starting address as follows. The CS
register has two parts: the visible segment selector part and the hidden base address part. In real-address
mode, the base address is normally formed by shifting the 16-bit segment selector value
4 bits to the left to produce a 20-bit base address. However, during a hardware reset, the segment
selector in the CS register is loaded with F000H and the base address is loaded with
FFFF0000H. The starting address is thus formed by adding the base address to the value in the
EIP register (that is, FFFF0000H + FFF0H = FFFFFFF0H).
The first time the CS register is loaded with a new value after a hardware reset, the processor
will follow the normal rule for address translation in real-address mode (that is, [CS base
address = CS segment selector * 16]). To ensure that the base address in the CS register remains
unchanged until the EPROM based software-initialization code is completed, the code must not
contain a far jump or far call or allow an interrupt to occur (which would cause the CS selector
value to be changed).
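The address arithmetic in this section can be checked with a few lines of Python. This is only a sketch of the calculation described in the text; the function names are illustrative.

```python
def linear_address(cs_base, eip):
    """Add the hidden CS base to EIP, wrapping at 32 bits."""
    return (cs_base + eip) & 0xFFFFFFFF

def real_mode_cs_base(selector):
    """Normal real-address-mode rule: base = 16-bit selector shifted left 4 bits."""
    return (selector & 0xFFFF) << 4

# At reset the hidden base is FFFF0000H even though the selector is F000H.
print(hex(linear_address(0xFFFF0000, 0xFFF0)))

# After the first far jump or far call reloads CS, the base follows the normal rule.
print(hex(real_mode_cs_base(0xF000)))
```

The second result, F0000H, lies below 1 MByte, which is why the initialization code must avoid reloading CS until it no longer depends on the reset-time base address.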
9.2 X87 FPU INITIALIZATION
Software-initialization code can determine whether the processor contains an x87 FPU by
using the CPUID instruction. The code must then initialize the x87 FPU and set flags in control
register CR0 to reflect the state of the x87 FPU environment.
A hardware reset places the x87 FPU in the state shown in Table 9-1. This state is different from
the state the x87 FPU is placed in following the execution of an FINIT or FNINIT instruction
(also shown in Table 9-1). If the x87 FPU is to be used, the software-initialization code should
execute an FINIT/FNINIT instruction following a hardware reset. These instructions tag all
data registers as empty, set all the exception mask bits (so that all floating-point exceptions
are masked), set the TOP-of-stack value to 0, and select
the default rounding and precision controls setting (round to nearest and 64-bit precision).
If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed.
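The difference between the two control-word values in Table 9-1 (0040H after power-up or reset, 037FH after FINIT/FNINIT) can be made concrete by decoding them. The Python sketch below assumes the usual x87 control-word layout (exception masks in bits 0 through 5, precision control in bits 8 and 9, rounding control in bits 10 and 11); the function name is hypothetical.

```python
PRECISION = {0b00: "24-bit", 0b10: "53-bit", 0b11: "64-bit"}
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}

def decode_fpu_control_word(cw):
    """Decode the exception masks, precision control, and rounding control fields."""
    return {
        "masks": cw & 0x3F,  # one mask bit per floating-point exception
        "precision": PRECISION.get((cw >> 8) & 0b11, "reserved"),
        "rounding": ROUNDING[(cw >> 10) & 0b11],
    }

print(decode_fpu_control_word(0x037F))  # state after FINIT/FNINIT
print(decode_fpu_control_word(0x0040))  # state after power-up or reset
```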
9.2.1 Configuring the x87 FPU Environment
Initialization code must load the appropriate values into the MP, EM, and NE flags of control
register CR0. These bits are cleared on hardware reset of the processor. Table 9-2 shows the
suggested settings for these flags, depending on the IA-32 processor being initialized. Initialization
code can test for the type of processor present before setting or clearing these flags.
The EM flag determines whether floating-point instructions are executed by the x87 FPU (EM
is cleared) or a device-not-available exception (#NM) is generated for all floating-point instructions
so that an exception handler can emulate the floating-point operation (EM = 1). Ordinarily,
the EM flag is cleared when an x87 FPU or math coprocessor is present and set if they are not
present. If the EM flag is set and no x87 FPU, math coprocessor, or floating-point emulator is
present, the processor will hang when a floating-point instruction is executed.
The MP flag determines whether WAIT/FWAIT instructions react to the setting of the TS flag.
If the MP flag is clear, WAIT/FWAIT instructions ignore the setting of the TS flag; if the MP
flag is set, they will generate a device-not-available exception (#NM) if the TS flag is set. Generally,
the MP flag should be set for processors with an integrated x87 FPU and clear for processors
without an integrated x87 FPU and without a math coprocessor present. However, an
operating system can choose to save the floating-point context at every context switch, in which
case there would be no need to set the MP bit.
Table 2-1 shows the actions taken for floating-point and WAIT/FWAIT instructions based on the
settings of the EM, MP, and TS flags.
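The interaction of the EM, MP, and TS flags can be summarized as two small decision functions. This Python sketch follows the rules stated above for EM and MP; the rule that a floating-point instruction also raises #NM when TS is set with EM clear is an assumption based on the usual reading of Table 2-1, which is not reproduced here.

```python
def fp_instruction_action(em, ts):
    """Action for an x87 floating-point instruction."""
    # EM set: always trap so that a handler can emulate the instruction.
    # TS set (assumed from Table 2-1): trap so that the x87 context can be switched.
    return "#NM" if em or ts else "execute"

def wait_action(mp, ts):
    """Action for a WAIT/FWAIT instruction: TS is honored only when MP is set."""
    return "#NM" if mp and ts else "execute"
```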
The NE flag determines whether unmasked floating-point exceptions are handled by generating
a floating-point error exception internally (NE is set, native mode) or through an external interrupt
(NE is cleared). In systems where an external interrupt controller is used to invoke numeric
exception handlers (such as MS-DOS-based systems), the NE bit should be cleared.
9.2.2 Setting the Processor for x87 FPU Software Emulation
Setting the EM flag causes the processor to generate a device-not-available exception (#NM)
and trap to a software exception handler whenever it encounters a floating-point instruction.
(Table 9-2 shows when it is appropriate to use this flag.) Setting this flag has two functions:
• It allows x87 FPU code to run on an IA-32 processor that neither has an integrated x87
FPU nor is connected to an external math coprocessor, by using a floating-point emulator.
• It allows floating-point code to be executed using a special or nonstandard floating-point
emulator, selected for a particular application, regardless of whether an x87 FPU or math
coprocessor is present.
Table 9-2. Recommended Settings of EM and MP Flags on IA-32 Processors
EM   MP   NE        IA-32 processor
1    0    1         Intel486™ SX, Intel386™ DX, and Intel386™ SX processors
                    only, without the presence of a math coprocessor.
0    1    1 or 0*   Pentium 4, Intel Xeon, P6 family, Pentium, Intel486™ DX, and
                    Intel 487 SX processors, and Intel386 DX and Intel386 SX
                    processors when a companion math coprocessor is present.
NOTE:
* The setting of the NE flag depends on the operating system being used.
To emulate floating-point instructions, the EM, MP, and NE flags in control register CR0 should
be set as shown in Table 9-3.
Regardless of the value of the EM bit, the Intel486 SX processor generates a device-not-available
exception (#NM) upon encountering any floating-point instruction.
9.3 CACHE ENABLING
The IA-32 processors (beginning with the Intel486 processor) contain internal instruction and
data caches. These caches are enabled by clearing the CD and NW flags in control register CR0.
(They are set during a hardware reset.) Because all internal cache lines are invalid following
reset initialization, it is not necessary to invalidate the cache before enabling caching. Any
external caches may require initialization and invalidation using a system-specific initialization
and invalidation code sequence.
Depending on the hardware and operating system or executive requirements, additional configuration
of the processor’s caching facilities will probably be required. Beginning with the
Intel486 processor, page-level caching can be controlled with the PCD and PWT flags in page-directory
and page-table entries. Beginning with the P6 family processors, the memory type
range registers (MTRRs) control the caching characteristics of the regions of physical memory.
(For the Intel486 and Pentium processors, external hardware can be used to control the caching
characteristics of regions of physical memory.) See Chapter 10, Memory Cache Control, for
detailed information on configuration of the caching facilities in the Pentium 4, Intel Xeon, and
P6 family processors and system memory.
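Enabling the internal caches amounts to clearing two bits in CR0. The following Python sketch shows the bit manipulation on a CR0 image; it models only the value that would be written, and the names are illustrative.

```python
CR0_NW = 1 << 29  # not write-through
CR0_CD = 1 << 30  # cache disable

def enable_caches(cr0):
    """Return a CR0 image with the CD and NW flags cleared."""
    return cr0 & ~(CR0_CD | CR0_NW) & 0xFFFFFFFF

# Starting from the reset value 60000010H, only bit 4 remains set.
print(hex(enable_caches(0x60000010)))
```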
Table 9-3. Software Emulation Settings of EM, MP, and NE Flags
CR0 Bit    Value
EM         1
MP         0
NE         1
9.4 MODEL-SPECIFIC REGISTERS (MSRS)
The Pentium 4, Intel Xeon, P6 family, and Pentium processors contain model-specific registers
(MSRs). These registers are by definition implementation specific; that is, they are not guaranteed
to be supported on future IA-32 processors and/or to have the same functions. The MSRs
are provided to control a variety of hardware- and software-related features, including:
• The performance-monitoring counters (see Section 15.9, “Performance Monitoring
Overview”).
• (Pentium 4, Intel Xeon, and P6 family processors only.) Debug extensions (see Section
15.4, “Last Branch Recording Overview”).
• (Pentium 4, Intel Xeon, and P6 family processors only.) The machine-check exception
capability and its accompanying machine-check architecture (see Chapter 14,
Machine-Check Architecture).
• (Pentium 4, Intel Xeon, and P6 family processors only.) The MTRRs (see Section 10.11,
“Memory Type Range Registers (MTRRs)”).
The MSRs can be read and written to using the RDMSR and WRMSR instructions, respectively.
When performing software initialization of a Pentium 4, Intel Xeon, P6 family, or Pentium
processor, many of the MSRs will need to be initialized to set up things like performance-monitoring
events, run-time machine checks, and memory types for physical memory.
The list of available performance-monitoring counters for the Pentium 4, Intel Xeon, P6 family,
and Pentium processors is given in Appendix A, Performance-Monitoring Events, and the list
of available MSRs for the Pentium 4, Intel Xeon, P6 family, and Pentium processors is given in
Appendix B, Model-Specific Registers (MSRs). The references earlier in this section show
where the functions of the various groups of MSRs are described in this manual.
9.5 MEMORY TYPE RANGE REGISTERS (MTRRS)
Memory type range registers (MTRRs) were introduced into the IA-32 architecture with the
Pentium Pro processor. They allow the type of caching (or no caching) to be specified in system
memory for selected physical address ranges. They allow memory accesses to be optimized for
various types of memory such as RAM, ROM, frame buffer memory, and memory-mapped I/O
devices.
Initializing the MTRRs is normally handled by the software initialization code or
BIOS and is not an operating system or executive function. At the very least, all the MTRRs
must be cleared to 0, which selects the uncached (UC) memory type. See Section 10.11,
“Memory Type Range Registers (MTRRs)”, for detailed information on the MTRRs.
9.6 INITIALIZING SSE/SSE2/SSE3 EXTENSIONS
For processors that contain SSE/SSE2/SSE3 extensions, steps must be taken when initializing
the processor to allow execution of these instructions.
1. Check the CPUID feature flags for the presence of the SSE/SSE2/SSE3 extensions
(respectively: EDX bits 25 and 26, ECX bit 0) and support for the FXSAVE and
FXRSTOR instructions (EDX bit 24). Also check for support for the CLFLUSH
instruction (EDX bit 19). The CPUID feature flags are loaded in the EDX and ECX
registers when the CPUID instruction is executed with a 1 in the EAX register.
2. Set the OSFXSR flag (bit 9 in control register CR4) to indicate that the operating system
supports saving and restoring the SSE/SSE2/SSE3 execution environment (XMM and
MXCSR registers) with the FXSAVE and FXRSTOR instructions, respectively. See
Section 2.5, “Control Registers”, for a description of the OSFXSR flag.
3. Set the OSXMMEXCPT flag (bit 10 in control register CR4) to indicate that the operating
system supports the handling of SSE/SSE2/SSE3 SIMD floating-point exceptions (#XF).
See Section 2.5, “Control Registers”, for a description of the OSXMMEXCPT flag.
4. Set the mask bits and flags in the MXCSR register according to the mode of operation
desired for SSE/SSE2/SSE3 SIMD floating-point instructions. See “MXCSR Control and
Status Register” in Chapter 10 of the IA-32 Intel Architecture Software Developer’s
Manual, Volume 1 for a detailed description of the bits and flags in the MXCSR register.
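Steps 1 through 3 above can be sketched as bit tests on the CPUID feature flags followed by an update of a CR4 image. The Python below only manipulates integers and does not execute CPUID or write CR4; the function names are hypothetical, and the decision to require both FXSR and SSE before setting the CR4 flags is a simplifying assumption.

```python
# CPUID (EAX = 1) feature flag bits named in step 1.
EDX_CLFSH, EDX_FXSR, EDX_SSE, EDX_SSE2 = 1 << 19, 1 << 24, 1 << 25, 1 << 26
ECX_SSE3 = 1 << 0
# CR4 flags named in steps 2 and 3.
CR4_OSFXSR, CR4_OSXMMEXCPT = 1 << 9, 1 << 10

def sse_features(edx, ecx):
    """Report which of the checked features the feature flags advertise."""
    return {
        "clflush": bool(edx & EDX_CLFSH),
        "fxsr": bool(edx & EDX_FXSR),
        "sse": bool(edx & EDX_SSE),
        "sse2": bool(edx & EDX_SSE2),
        "sse3": bool(ecx & ECX_SSE3),
    }

def sse_cr4_image(cr4, edx, ecx):
    """Set OSFXSR and OSXMMEXCPT if FXSAVE/FXRSTOR and SSE are both present."""
    features = sse_features(edx, ecx)
    if features["fxsr"] and features["sse"]:
        cr4 |= CR4_OSFXSR | CR4_OSXMMEXCPT
    return cr4
```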
9.7 SOFTWARE INITIALIZATION FOR REAL-ADDRESS MODE
OPERATION
Following a hardware reset (either through a power-up or the assertion of the RESET# pin) the
processor is placed in real-address mode and begins executing software initialization code from
physical address FFFFFFF0H. Software initialization code must first set up the necessary data
structures for handling basic system functions, such as a real-mode IDT for handling interrupts
and exceptions. If the processor is to remain in real-address mode, software must then load additional
operating-system or executive code modules and data structures to allow reliable execution
of application programs in real-address mode.
If the processor is going to operate in protected mode, software must load the necessary data
structures to operate in protected mode and then switch to protected mode. The protected-mode
data structures that must be loaded are described in Section 9.8, “Software Initialization for
Protected-Mode Operation”.
9.7.1 Real-Address Mode IDT
In real-address mode, the only system data structure that must be loaded into memory is the IDT
(also called the “interrupt vector table”). By default, the address of the base of the IDT is physical
address 0H. This address can be changed by using the LIDT instruction to change the base
address value in the IDTR. Software initialization code needs to load interrupt- and exception-handler
pointers into the IDT before interrupts can be enabled.
The actual interrupt- and exception-handler code can be contained either in EPROM or RAM;
however, the code must be located within the 1-MByte addressable range of the processor in
real-address mode. If the handler code is to be stored in RAM, it must be loaded along with the
IDT.
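Loading a handler pointer into the real-address mode IDT can be pictured as writing a 4-byte entry at a fixed position. The Python sketch below assumes the conventional real-mode layout (4 bytes per vector: a 16-bit offset followed by a 16-bit segment, little-endian), which this section does not spell out; the function names are illustrative.

```python
import struct

def ivt_entry_address(vector, idt_base=0):
    """Physical address of the entry for a vector, with the IDT base defaulting to 0H."""
    return idt_base + 4 * vector

def pack_ivt_entry(segment, offset):
    """Encode a far pointer as stored in the table: offset first, then segment."""
    return struct.pack("<HH", offset, segment)

# Example: the NMI handler pointer (vector 2) lives at physical address 8.
print(ivt_entry_address(2), pack_ivt_entry(0xF000, 0x1234).hex())
```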
9.7.2 NMI Interrupt Handling
The NMI interrupt is always enabled (except when multiple NMIs are nested). If the IDT and
the NMI interrupt handler need to be loaded into RAM, there will be a period of time following
hardware reset when an NMI interrupt cannot be handled. During this time, hardware must
provide a mechanism to prevent an NMI interrupt from halting code execution until the IDT and
the necessary NMI handler software are loaded. Here are two examples of how NMIs can be
handled during the initial stages of processor initialization:
• A simple IDT and NMI interrupt handler can be provided in EPROM. This allows an NMI
interrupt to be handled immediately after reset initialization.
• The system hardware can provide a mechanism to enable and disable NMIs by passing the
NMI# signal through an AND gate controlled by a flag in an I/O port. Hardware can clear
the flag when the processor is reset, and software can set the flag when it is ready to handle
NMI interrupts.
9.8 SOFTWARE INITIALIZATION FOR PROTECTED-MODE
OPERATION
The processor is placed in real-address mode following a hardware reset. At this point in the
initialization process, some basic data structures and code modules must be loaded into physical
memory to support further initialization of the processor, as described in Section 9.7, “Software
Initialization for Real-Address Mode Operation”. Before the processor can be switched to
protected mode, the software initialization code must load a minimum number of protected
mode data structures and code modules into memory to support reliable operation of the
processor in protected mode. These data structures include the following:
• An IDT.
• A GDT.
• A TSS.
• (Optional) An LDT.
• If paging is to be used, at least one page directory and one page table.
• A code segment that contains the code to be executed when the processor switches to
protected mode.
• One or more code modules that contain the necessary interrupt and exception handlers.
Software initialization code must also initialize the following system registers before the
processor can be switched to protected mode:
• The GDTR.
• (Optional.) The IDTR. This register can also be initialized immediately after switching to
protected mode, prior to enabling interrupts.
• Control registers CR1 through CR4.
• (Pentium 4, Intel Xeon, and P6 family processors only.) The memory type range registers
(MTRRs).
With these data structures, code modules, and system registers initialized, the processor can be
switched to protected mode by loading control register CR0 with a value that sets the PE flag
(bit 0).
9.8.1 Protected-Mode System Data Structures
The contents of the protected-mode system data structures loaded into memory during software
initialization depend largely on the type of memory management the protected-mode operating-system
or executive is going to support: flat, flat with paging, segmented, or segmented with
paging.
To implement a flat memory model without paging, software initialization code must at a
minimum load a GDT with one code and one data-segment descriptor. A null descriptor in the
first GDT entry is also required. The stack can be placed in a normal read/write data segment,
so no dedicated descriptor for the stack is required. A flat memory model with paging also
requires a page directory and at least one page table (unless all pages are 4 MBytes in which case
only a page directory is required). See Section 9.8.3, “Initializing Paging”.
Before the GDT can be used, the base address and limit for the GDT must be loaded into the
GDTR register using an LGDT instruction.
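A minimal flat-model GDT of the kind described above (null descriptor, one code segment, one data segment) can be assembled as follows. This Python sketch assumes the standard 8-byte segment-descriptor layout and common access-byte values (9AH for execute/read code, 92H for read/write data, with 4-KByte granularity and 32-bit default size); none of these specific values are given in this section, and the helper names are hypothetical.

```python
import struct

def pack_descriptor(base, limit, access, flags):
    """Pack a segment descriptor into its 64-bit form (limit is the 20-bit field)."""
    return (
        (limit & 0xFFFF)
        | ((base & 0xFFFFFF) << 16)
        | ((access & 0xFF) << 40)
        | (((limit >> 16) & 0xF) << 48)
        | ((flags & 0xF) << 52)
        | (((base >> 24) & 0xFF) << 56)
    )

FLAT_CODE = pack_descriptor(0, 0xFFFFF, 0x9A, 0xC)  # base 0, 4-GByte limit
FLAT_DATA = pack_descriptor(0, 0xFFFFF, 0x92, 0xC)
gdt = struct.pack("<3Q", 0, FLAT_CODE, FLAT_DATA)   # entry 0 is the null descriptor

def gdtr_operand(gdt_base, gdt_bytes):
    """Build the 6-byte LGDT operand: 16-bit limit (size - 1), then 32-bit base."""
    return struct.pack("<HI", len(gdt_bytes) - 1, gdt_base)
```

The data descriptor doubles as the stack segment here, in line with the remark that no dedicated stack descriptor is required.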
A multi-segmented model may require additional segments for the operating system, as well as
segments and LDTs for each application program. LDTs require segment descriptors in the
GDT. Some operating systems allocate new segments and LDTs as they are needed. This
provides maximum flexibility for handling a dynamic programming environment. However,
many operating systems use a single LDT for all tasks, allocating GDT entries in advance. An
embedded system, such as a process controller, might pre-allocate a fixed number of segments
and LDTs for a fixed number of application programs. This would be a simple and efficient way
to structure the software environment of a real-time system.
9.8.2 Initializing Protected-Mode Exceptions and Interrupts
Software initialization code must at a minimum load a protected-mode IDT with a gate descriptor
for each exception vector that the processor can generate. If interrupt or trap gates are used, the
gate descriptors can all point to the same code segment, which contains the necessary exception
handlers. If task gates are used, one TSS and accompanying code, data, and task segments are
required for each exception handler called with a task gate.
If hardware allows interrupts to be generated, gate descriptors must be provided in the IDT for
one or more interrupt handlers.
Before the IDT can be used, the base address and limit for the IDT must be loaded into the IDTR
register using an LIDT instruction. This operation is typically carried out immediately after
switching to protected mode.
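As a rough illustration of an IDT whose gates all point into one code segment, the sketch below packs 32-bit interrupt gates on a host machine. The function names make_gate and build_idt, the handler offsets, and the selector value are hypothetical; only the 8-byte gate layout and the type values (EH for an interrupt gate, FH for a trap gate) come from the architecture.

```python
import struct

INTERRUPT_GATE_32 = 0xE   # type field of a 32-bit interrupt gate
TRAP_GATE_32 = 0xF        # type field of a 32-bit trap gate

def make_gate(offset, selector, gate_type, dpl=0, present=True):
    """Pack a 32-bit interrupt or trap gate into its 8-byte form."""
    type_attr = (0x80 if present else 0) | ((dpl & 3) << 5) | (gate_type & 0xF)
    return struct.pack(
        "<HHBBH",
        offset & 0xFFFF,          # handler offset 15:0
        selector & 0xFFFF,        # code-segment selector
        0,                        # reserved byte
        type_attr,                # P, DPL, type
        (offset >> 16) & 0xFFFF,  # handler offset 31:16
    )

def build_idt(handler_offsets, code_selector):
    """Build an IDT image whose gates all reference one code segment.

    Returns the table bytes and the limit value to place in the IDTR.
    """
    idt = b"".join(make_gate(off, code_selector, INTERRUPT_GATE_32)
                   for off in handler_offsets)
    return idt, len(idt) - 1
```

For example, build_idt([0x1000 + 16 * i for i in range(32)], 0x08) yields a 256-byte table and an IDTR limit of 255.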
9.8.3 Initializing Paging
Paging is controlled by the PG flag in control register CR0. When this flag is clear (its state
following a hardware reset), the paging mechanism is turned off; when it is set, paging is enabled.
Before setting the PG flag, the following data structures and registers must be initialized:
• Software must load at least one page directory and one page table into physical memory.
The page table can be eliminated if the page directory contains a directory entry pointing to
itself (here, the page directory and page table reside in the same page), or if only 4-MByte
pages are used.
• Control register CR3 (also called the PDBR register) is loaded with the physical base
address of the page directory.
• (Optional) Software may provide one set of code and data descriptors in the GDT or in an
LDT for supervisor mode and another set for user mode.
With this paging initialization complete, paging is enabled and the processor is switched to
protected mode at the same time by loading control register CR0 with an image in which the PG
and PE flags are set. (Paging cannot be enabled before the processor is switched to protected
mode.)
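The case in which only 4-MByte pages are used, so that no page table is needed, can be modeled with a short host-side sketch. The names below are assumptions made for this example. Note also that 4-MByte pages depend on the page-size extension being enabled (CR4.PSE = 1), which this sketch does not model.

```python
import struct

PDE_PRESENT = 1 << 0
PDE_WRITABLE = 1 << 1
PDE_PAGE_SIZE = 1 << 7    # PS flag: the entry maps a 4-MByte page directly

CR0_PE = 1 << 0           # protection enable
CR0_PG = 1 << 31          # paging enable

def identity_page_directory():
    """Return a 4-KByte page directory that identity-maps all 4 GBytes."""
    entries = [(i << 22) | PDE_PAGE_SIZE | PDE_WRITABLE | PDE_PRESENT
               for i in range(1024)]
    return struct.pack("<1024I", *entries)

def cr0_image(old_cr0):
    """CR0 value that sets PE and PG together in a single MOV CR0."""
    return old_cr0 | CR0_PE | CR0_PG
```

The physical address of the returned directory image is what would be loaded into CR3 before the CR0 image is written.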
9.8.4 Initializing Multitasking
If the multitasking mechanism is not going to be used and changes between privilege levels are
not allowed, it is not necessary to load a TSS into memory or to initialize the task register.
If the multitasking mechanism is going to be used and/or changes between privilege levels are
allowed, software initialization code must load at least one TSS and an accompanying TSS
descriptor. (A TSS is required to change privilege levels because pointers to the privilege-level
0, 1, and 2 stack segments and the stack pointers for these stacks are obtained from the TSS.)
TSS descriptors must not be marked as busy when they are created; they should be marked busy
by the processor only as a side-effect of performing a task switch. As with descriptors for LDTs,
TSS descriptors reside in the GDT.
After the processor has switched to protected mode, the LTR instruction can be used to load a
segment selector for a TSS descriptor into the task register. This instruction marks the TSS
descriptor as busy, but does not perform a task switch. The processor can, however, use the TSS
to locate pointers to privilege-level 0, 1, and 2 stacks. The segment selector for the TSS must be
loaded before software performs its first task switch in protected mode, because a task switch
copies the current task state into the TSS.
After the LTR instruction has been executed, further operations on the task register are
performed by task switching. As with other segments and LDTs, TSSs and TSS descriptors can
be either pre-allocated or allocated as needed.
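The busy-flag behavior can be illustrated with a small host-side model. It is a sketch under stated assumptions: the GDT is represented as a bytearray, the function names are hypothetical, and a Python exception stands in for the fault the processor raises when LTR is given a descriptor that is not an available TSS.

```python
TSS_AVAILABLE_32 = 0x9   # type field: available 32-bit TSS
TSS_BUSY_32 = 0xB        # type field: busy 32-bit TSS
BUSY_BIT = 0x2           # bit 1 of the type field

def tss_access_byte(busy=False, dpl=0):
    """Access byte for a TSS descriptor (P = 1, S = 0 for a system segment)."""
    tss_type = TSS_BUSY_32 if busy else TSS_AVAILABLE_32
    return 0x80 | ((dpl & 3) << 5) | tss_type

def ltr(gdt, selector):
    """Model LTR: mark the selected TSS descriptor busy, with no task switch."""
    if selector & 0x4:
        raise ValueError("TSS descriptors must reside in the GDT (TI = 0)")
    access_offset = (selector & ~0x7) + 5      # access byte is byte 5
    if (gdt[access_offset] & 0xF) != TSS_AVAILABLE_32:
        raise ValueError("descriptor is not an available TSS")
    gdt[access_offset] |= BUSY_BIT
    return selector                            # value now held in the task register
```

Initialization code should therefore create the descriptor with tss_access_byte() (89H) and leave the transition to 8BH to the processor.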
9.8.5 Initializing IA-32e Mode
On IA-32 processors that support Intel EM64T, the IA32_EFER MSR is cleared on system reset.
The operating system must be in protected mode with paging enabled before attempting to
initialize IA-32e mode. IA-32e mode operation also requires physical-address extensions with
four levels of enhanced paging structures (see Section 3.10, “PAE-Enabled Paging in IA-32e
Mode”).
Operating systems should follow this sequence to initialize IA-32e mode:
1. Starting from protected mode, disable paging by setting CR0.PG = 0. Use the MOV CR0
instruction to disable paging (the instruction must be located in an identity-mapped page).
2. Enable physical-address extensions (PAE) by setting CR4.PAE = 1. Failure to enable PAE
will result in a #GP fault when an attempt is made to initialize IA-32e mode.
3. Load CR3 with the physical base address of the Level 4 page map table (PML4).
4. Enable IA-32e mode by setting IA32_EFER.LME = 1.
5. Enable paging by setting CR0.PG = 1. This causes the processor to set the
IA32_EFER.LMA bit to 1. The MOV CR0 instruction that enables paging and the
following instructions must be located in an identity-mapped page (until such time that a
branch to non-identity mapped pages can be effected).
64-bit mode paging tables must be located in the first 4 GBytes of physical-address space prior
to activating IA-32e mode. This is necessary because the MOV CR3 instruction used to initialize
the page-directory base must be executed in legacy mode prior to activating IA-32e mode
(setting CR0.PG = 1 to enable paging). Because MOV CR3 is executed in protected mode, only
the lower 32 bits of the register are written, limiting the table location to the low 4 GBytes of
memory. Software can relocate the page tables anywhere in physical memory after IA-32e mode
is activated.
The processor performs 64-bit mode consistency checks whenever software attempts to modify
any of the enable bits directly involved in activating IA-32e mode (IA32_EFER.LME, CR0.PG,
and CR4.PAE). It will generate a general protection fault (#GP) if consistency checks fail. 64-bit
mode consistency checks ensure that the processor does not enter an undefined mode or state
with unpredictable behavior.
64-bit mode consistency checks fail in the following circumstances:
• An attempt is made to enable or disable IA-32e mode while paging is enabled.
• IA-32e mode is enabled and an attempt is made to enable paging prior to enabling
physical-address extensions (PAE).
• IA-32e mode is active and an attempt is made to disable physical-address extensions
(PAE).
• The current CS has the L-bit set when an attempt is made to activate IA-32e mode.
• The TR contains a 16-bit TSS.
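The activation sequence and the first three consistency checks listed above can be expressed as a small state model. This is a simplified sketch, not a description of processor internals: the class and method names are hypothetical, a Python exception stands in for #GP, and the CS.L and TR checks are deliberately left out.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the #GP raised when a consistency check fails."""

class ModeModel:
    def __init__(self):
        self.pg = True     # CR0.PG: start in paged protected mode
        self.pae = False   # CR4.PAE
        self.lme = False   # IA32_EFER.LME
        self.lma = False   # IA32_EFER.LMA, maintained by the processor
        self.cr3 = 0

    def set_pg(self, value):
        if value and self.lme and not self.pae:
            raise GeneralProtectionFault("paging enabled with LME = 1 before PAE")
        self.pg = value
        self.lma = self.pg and self.lme

    def set_pae(self, value):
        if self.lma and not value:
            raise GeneralProtectionFault("PAE cleared while IA-32e mode is active")
        self.pae = value

    def set_lme(self, value):
        if self.pg and value != self.lme:
            raise GeneralProtectionFault("LME changed while paging is enabled")
        self.lme = value

    def load_cr3(self, physical_address):
        # Before IA-32e mode is active only the low 32 bits are written.
        if not self.lma and physical_address > 0xFFFFFFFF:
            raise ValueError("PML4 must lie in the first 4 GBytes")
        self.cr3 = physical_address

def enter_ia32e(cpu, pml4_address):
    """Apply steps 1 through 5 in order and report whether LMA was set."""
    cpu.set_pg(False)
    cpu.set_pae(True)
    cpu.load_cr3(pml4_address)
    cpu.set_lme(True)
    cpu.set_pg(True)
    return cpu.lma
```

Calling set_lme(True) on a fresh ModeModel, where paging is still enabled, raises the stand-in fault, which is why step 1 disables paging first.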
9.8.5.1 IA-32e Mode System Data Structures
After activating IA-32e mode, the system-descriptor-table registers (GDTR, LDTR, IDTR, TR)
continue to reference legacy protected-mode descriptor tables. Tables referenced by the descriptors
all reside in the lower 4 GBytes of linear-address space. After activating IA-32e mode,
64-bit operating-systems should use the LGDT, LLDT, LIDT, and LTR instructions to load the
system-descriptor-table registers with references to 64-bit descriptor tables.
9.8.5.2 IA-32e Mode Interrupts and Exceptions
Software must not allow exceptions or interrupts to occur between the time IA-32e mode is activated
and the update of the interrupt-descriptor-table register (IDTR) that establishes references
to a 64-bit interrupt-descriptor table (IDT). This is because the IDT remains in legacy form
immediately after IA-32e mode is activated.
If an interrupt or exception occurs prior to updating the IDTR, a legacy 32-bit interrupt gate will
be referenced and interpreted as a 64-bit interrupt gate with unpredictable results. External interrupts
can be disabled by using the CLI instruction.
Non-maskable interrupts (NMI) must be disabled using external hardware.
9.8.5.3 64-bit Mode and Compatibility Mode Operation
IA-32e mode uses two code segment-descriptor bits (CS.L and CS.D, see Figure 3-8) to control
the operating modes after IA-32e mode is initialized. If CS.L = 1 and CS.D = 0, the processor
is running in 64-bit mode. With this encoding, the default operand size is 32 bits and default
address size is 64 bits. Using instruction prefixes, operand size can be changed to 64 bits or 16
bits; address size can be changed to 32 bits.
When IA-32e mode is active and CS.L = 0, the processor operates in compatibility mode. In this
mode, CS.D controls default operand and address sizes exactly as it does in the legacy IA-32
architecture. Setting CS.D = 1 specifies default operand and address size as 32 bits. Clearing
CS.D to 0 specifies default operand and address size as 16 bits (the CS.L = 1, CS.D = 1 bit
combination is reserved).
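The CS.L and CS.D encodings described above reduce to a small decision function. The function name and the returned labels are choices made for this sketch; treating a segment outside IA-32e mode as "legacy" and ignoring CS.L there is likewise an assumption of the example.

```python
def code_segment_defaults(ia32e_active, cs_l, cs_d):
    """Return (mode, default operand size, default address size), sizes in bits."""
    if ia32e_active and cs_l:
        if cs_d:
            raise ValueError("CS.L = 1 with CS.D = 1 is reserved")
        return ("64-bit", 32, 64)
    size = 32 if cs_d else 16
    mode = "compatibility" if ia32e_active else "legacy"
    return (mode, size, size)
```

With IA-32e mode active, (CS.L, CS.D) = (1, 0) gives 64-bit mode with 32-bit operands and 64-bit addresses, while (0, 1) and (0, 0) give 32-bit and 16-bit compatibility-mode segments respectively.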
Compatibility mode execution is selected on a code-segment basis. This mode allows legacy
applications to coexist with 64-bit applications running in 64-bit mode. An operating system
running in IA-32e mode can execute existing 16-bit and 32-bit applications by clearing their
code-segment descriptor’s CS.L bit to 0.
In compatibility mode, the following system-level mechanisms continue to operate using the
IA-32e-mode architectural semantics:
• Linear-to-physical address translation uses the 64-bit mode extended page-translation
mechanism.
• Interrupts and exceptions are handled using the 64-bit mode mechanisms.
• System calls (calls through call gates and SYSENTER/SYSEXIT) are handled using the
IA-32e mode mechanisms.
9.8.5.4 Switching Out of IA-32e Mode Operation
To return from IA-32e mode to paged-protected mode operation, operating systems must use
the following sequence:
1. Switch to compatibility mode.
2. Deactivate IA-32e mode by setting CR0.PG = 0. This causes the processor to set
IA32_EFER.LMA = 0. The MOV CR0 instruction used to disable paging and subsequent
instructions must be located in an identity-mapped page.
3. Load CR3 with the physical base address of the legacy page directory.
4. Disable IA-32e mode by setting IA32_EFER.LME = 0.
5. Enable legacy paged-protected mode by setting CR0.PG = 1.
6. A branch instruction must follow the MOV CR0 that enables paging. Both the MOV CR0
and the branch instruction must be located in an identity-mapped page.
9.9 MODE SWITCHING
To use the processor in protected mode after hardware or software reset, a mode switch must be
performed from real-address mode. Once in protected mode, software generally does not need
to return to real-address mode. To run software written for real-address mode (8086 mode),
it is generally more convenient to use virtual-8086 mode than to switch back to
real-address mode.
9.9.1 Switching to Protected Mode
Before switching to protected mode from real mode, a minimum set of system data structures
and code modules must be loaded into memory, as described in Section 9.8, “Software Initialization
for Protected-Mode Operation”. Once these tables are created, software initialization
code can switch into protected mode.
Protected mode is entered by executing a MOV CR0 instruction that sets the PE flag in the CR0
register. (In the same instruction, the PG flag in register CR0 can be set to enable paging.)
Execution in protected mode begins with a CPL of 0.
The 32-bit IA-32 processors have slightly different requirements for switching to protected
mode. To ensure upward and downward code compatibility with all 32-bit IA-32 processors,
it is recommended that the following steps be performed:
1. Disable interrupts. A CLI instruction disables maskable hardware interrupts. NMI
interrupts can be disabled with external circuitry. (Software must guarantee that no
exceptions or interrupts are generated during the mode switching operation.)
2. Execute the LGDT instruction to load the GDTR register with the base address of the
GDT.
3. Execute a MOV CR0 instruction that sets the PE flag (and optionally the PG flag) in
control register CR0.
4. Immediately following the MOV CR0 instruction, execute a far JMP or far CALL
instruction. (This operation is typically a far jump or call to the next instruction in the
instruction stream.)
The JMP or CALL instruction immediately after the MOV CR0 instruction changes the
flow of execution and serializes the processor.
If paging is enabled, the code for the MOV CR0 instruction and the JMP or CALL
instruction must come from a page that is identity mapped (that is, the linear address before
the jump is the same as the physical address after paging and protected mode are enabled).
The target instruction for the JMP or CALL instruction does not need to be identity
mapped.
5. If a local descriptor table is going to be used, execute the LLDT instruction to load the
segment selector for the LDT in the LDTR register.
6. Execute the LTR instruction to load the task register with a segment selector to the initial
protected-mode task or to a writable area of memory that can be used to store TSS
information on a task switch.
7. After entering protected mode, the segment registers continue to hold the contents they had
in real-address mode. The JMP or CALL instruction in step 4 resets the CS register.
Perform one of the following operations to update the contents of the remaining segment
registers.
— Reload segment registers DS, SS, ES, FS, and GS. If the ES, FS, and/or GS registers
are not going to be used, load them with a null selector.
— Perform a JMP or CALL instruction to a new task, which automatically resets the
values of the segment registers and branches to a new code segment.
8. Execute the LIDT instruction to load the IDTR register with the address and limit of the
protected-mode IDT.
9. Execute the STI instruction to enable maskable hardware interrupts and perform the
necessary hardware operation to enable NMI interrupts.
Random failures can occur if other instructions exist between steps 3 and 4 above. Failures will
be readily seen in some situations, such as when instructions that reference memory are inserted
between steps 3 and 4 while in system management mode.
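The values loaded by the far JMP, LLDT, LTR, and segment-register reloads in the steps above are segment selectors. A minimal sketch of how a selector is composed follows; the function names are hypothetical, while the field layout (index in bits 15:3, table indicator in bit 2, RPL in bits 1:0) is architectural.

```python
def make_selector(index, table_indicator=0, rpl=0):
    """Compose a selector from a descriptor index, TI (0 = GDT, 1 = LDT), and RPL."""
    if not 0 <= index < 8192:
        raise ValueError("descriptor index must fit in 13 bits")
    return (index << 3) | ((table_indicator & 1) << 2) | (rpl & 3)

def split_selector(selector):
    """Return (index, table indicator, RPL) for a 16-bit selector."""
    return selector >> 3, (selector >> 2) & 1, selector & 3

def is_null_selector(selector):
    """A selector that refers to GDT entry 0 is a null selector."""
    index, table_indicator, _ = split_selector(selector)
    return index == 0 and table_indicator == 0
```

make_selector(1) gives 8, which matches LINEAR_SEL (1*SIZE(DESC)) in STARTUP.ASM, and make_selector(10) gives 80, the value that listing places in BX before LTR.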
9.9.2 Switching Back to Real-Address Mode
The processor switches from protected mode back to real-address mode if software clears the
PE bit in the CR0 register with a MOV CR0 instruction. A procedure that re-enters real-address
mode should perform the following steps:
1. Disable interrupts. A CLI instruction disables maskable hardware interrupts. NMI
interrupts can be disabled with external circuitry.
2. If paging is enabled, perform the following operations:
— Transfer program control to linear addresses that are identity mapped to physical
addresses (that is, linear addresses equal physical addresses).
— Ensure that the GDT and IDT are in identity mapped pages.
— Clear the PG bit in the CR0 register.
— Move 0H into the CR3 register to flush the TLB.
3. Transfer program control to a readable segment that has a limit of 64 KBytes (FFFFH).
This operation loads the CS register with the segment limit required in real-address mode.
4. Load segment registers SS, DS, ES, FS, and GS with a selector for a descriptor containing
the following values, which are appropriate for real-address mode:
— Limit = 64 KBytes (0FFFFH)
— Byte granular (G = 0)
— Expand up (E = 0)
— Writable (W = 1)
— Present (P = 1)
— Base = any value
The segment registers must be loaded with non-null segment selectors or the segment
registers will be unusable in real-address mode. Note that if the segment registers are not
reloaded, execution continues using the descriptor attributes loaded during protected
mode.
5. Execute an LIDT instruction to point to a real-address mode interrupt table that is within
the 1-MByte real-address mode address range.
6. Clear the PE flag in the CR0 register to switch to real-address mode.
7. Execute a far JMP instruction to jump to a real-address mode program. This operation
flushes the instruction queue and loads the appropriate base and access rights values in the
CS register.
8. Load the SS, DS, ES, FS, and GS registers as needed by the real-address mode code. If any
of the registers are not going to be used in real-address mode, write 0s to them.
9. Execute the STI instruction to enable maskable hardware interrupts and perform the
necessary hardware operation to enable NMI interrupts.
NOTE
All the code that is executed in steps 1 through 9 must be in a single page and
the linear addresses in that page must be identity mapped to physical
addresses.
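The descriptor attributes required in step 4 can be checked mechanically before the selector is loaded. The sketch below decodes an 8-byte descriptor and tests it against those values; the function names and the dictionary keys are assumptions of this example.

```python
import struct

def decode_descriptor(raw):
    """Decode the fields of an 8-byte segment descriptor."""
    limit_lo, base_lo, base_mid, access, gran, base_hi = struct.unpack("<HHBBBB", raw)
    return {
        "limit": limit_lo | ((gran & 0x0F) << 16),
        "base": base_lo | (base_mid << 16) | (base_hi << 24),
        "granular_4k": bool(gran & 0x80),     # G flag
        "present": bool(access & 0x80),       # P flag
        "is_code": bool(access & 0x08),       # type bit 3
        "expand_down": bool(access & 0x04),   # E flag (data segments)
        "writable": bool(access & 0x02),      # W flag (data segments)
    }

def suits_real_mode_data(raw):
    """True if the descriptor has the attributes listed in step 4 (any base)."""
    d = decode_descriptor(raw)
    return (d["limit"] == 0xFFFF and not d["granular_4k"]
            and not d["is_code"] and not d["expand_down"]
            and d["writable"] and d["present"])
```

A descriptor with bytes FF FF 00 00 00 92 00 00 passes the check, whereas the flat 4-GByte data descriptor FF FF 00 00 00 92 CF 00 does not, because its G flag is set and its limit is FFFFFH.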
9.10 INITIALIZATION AND MODE SWITCHING EXAMPLE
This section provides an initialization and mode switching example that can be incorporated into
an application. This code was originally written to initialize the Intel386 processor, but it will
execute successfully on the Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors.
The code in this example is intended to reside in EPROM and to run following a hardware reset
of the processor. The function of the code is to do the following:
• Establish a basic real-address mode operating environment.
• Load the necessary protected-mode system data structures into RAM.
• Load the system registers with the necessary pointers to the data structures and the
appropriate flag settings for protected-mode operation.
• Switch the processor to protected mode.
Figure 9-3 shows the physical memory layout for the processor following a hardware reset and
the starting point of this example. The EPROM that contains the initialization code resides at the
upper end of the processor’s physical memory address range, starting at address FFFFFFFFH
and going down from there. The address of the first instruction to be executed is at FFFFFFF0H,
the default starting address for the processor following a hardware reset.
The main steps carried out in this example are summarized in Table 9-4. The source listing for
the example (with the filename STARTUP.ASM) is given in Example 9-1. The line numbers
given in Table 9-4 refer to the source listing.
The following are some additional notes concerning this example:
• When the processor is switched into protected mode, the original code segment base-address
value of FFFF0000H (located in the hidden part of the CS register) is retained and
execution continues from the current offset in the EIP register. The processor will thus
continue to execute code in the EPROM until a far jump or call is made to a new code
segment, at which time, the base address in the CS register will be changed.
• Maskable hardware interrupts are disabled after a hardware reset and should remain
disabled until the necessary interrupt handlers have been installed. The NMI interrupt is
not disabled following a reset. The NMI# pin must thus be inhibited from being asserted
until an NMI handler has been loaded and made available to the processor.
• The use of a temporary GDT allows simple transfer of tables from the EPROM to
anywhere in the RAM area. A GDT entry is constructed with its base pointing to address 0
and a limit of 4 GBytes. When the DS and ES registers are loaded with this descriptor, the
temporary GDT is no longer needed and can be replaced by the application GDT.
• This code loads one TSS and no LDTs. If more TSSs exist in the application, they must be
loaded into RAM. If there are LDTs they may be loaded as well.
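The address arithmetic behind the reset state is simple enough to verify directly. The sketch below uses only values stated in this section; the variable and function names are chosen for illustration.

```python
def linear_address(segment_base, offset):
    """Linear address = segment base + offset, wrapped to 32 bits."""
    return (segment_base + offset) & 0xFFFFFFFF

CS_BASE_AFTER_RESET = 0xFFFF0000   # hidden base of CS after reset
EIP_AFTER_RESET = 0x0000FFF0
EPROM_SIZE = 64 * 1024             # 64-KByte EPROM at the top of memory

reset_vector = linear_address(CS_BASE_AFTER_RESET, EIP_AFTER_RESET)
eprom_start = 0x100000000 - EPROM_SIZE
bytes_after_reset_vector = 0x100000000 - reset_vector
```

Only 16 bytes remain between the reset vector and the top of the address space, which is why the first instruction executed is a jump to the entry code elsewhere in the EPROM.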
Figure 9-3. Processor State After Reset
[Figure: physical memory map from 0 to FFFF FFFFH after reset. The 64K EPROM occupies
FFFF 0000H through FFFF FFFFH. CS.BASE = FFFF 0000H and EIP = 0000 FFF0H, so the first
instruction is fetched from [CS.BASE+EIP] = FFFF FFF0H. DS.BASE = 0H, ES.BASE = 0H,
SS.BASE = 0H, and ESP = 0H, so SP, DS, SS, and ES all reference address 0.]

Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing

STARTUP.ASM Line Numbers
From   To     Description
157    157    Jump (short) to the entry code in the EPROM
162    169    Construct a temporary GDT in RAM with one entry:
                0 - null
                1 - R/W data segment, base = 0, limit = 4 GBytes
171    172    Load the GDTR to point to the temporary GDT
174    177    Load CR0 with PE flag set to switch to protected mode
179    181    Jump near to clear real mode instruction queue
184    186    Load DS, ES registers with GDT[1] descriptor, so both point to the
              entire physical memory space
188    195    Perform specific board initialization that is imposed by the new
              protected mode
9.10.1 Assembler Usage
In this example, the Intel assembler ASM386 and build tools BLD386 are used to assemble and
build the initialization code module. The following assumptions are used when using the Intel
ASM386 and BLD386 tools.
• The ASM386 will generate the right operand size opcodes according to the code-segment
attribute. The attribute is assigned either by the ASM386 invocation controls or in the
code-segment definition.
• If a code segment that is going to run in real-address mode is defined, it must be set to a
USE 16 attribute. If a 32-bit operand is used in an instruction in this code segment (for
example, MOV EAX, EBX), the assembler automatically generates an operand prefix for
the instruction that forces the processor to execute a 32-bit operation, even though its
default code-segment attribute is 16-bit.
• Intel's ASM386 assembler allows specific use of the 16- or 32-bit instructions, for
example, LGDTW, LGDTD, IRETD. If the generic instruction LGDT is used, the default-segment
attribute will be used to generate the right opcode.
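The effect of the operand-size prefix mentioned above can be shown by encoding the MOV EAX, EBX example by hand. This sketch uses the 89 /r form of MOV; an assembler is free to choose the equivalent 8B /r form instead, so the exact bytes ASM386 emits are not claimed here. The function and table names are hypothetical.

```python
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
OPERAND_SIZE_PREFIX = 0x66

def encode_mov_r32_r32(dst, src, use16_segment):
    """Encode MOV dst, src for 32-bit registers using opcode 89 /r.

    In a USE16 segment the 66H prefix is needed to select 32-bit operands;
    in a USE32 segment the same operation needs no prefix.
    """
    modrm = 0xC0 | (REG32[src] << 3) | REG32[dst]   # mod = 11, reg = src, r/m = dst
    body = bytes([0x89, modrm])
    if use16_segment:
        return bytes([OPERAND_SIZE_PREFIX]) + body
    return body
```

Under these assumptions MOV EAX, EBX is 66 89 D8 in a USE16 segment and 89 D8 in a USE32 segment. The DB 66H lines in STARTUP.ASM apply the same prefix by hand to LGDT and LIDT.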
Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing (Contd.)

STARTUP.ASM Line Numbers
From   To     Description
196    218    Copy the application's GDT from ROM into RAM
220    238    Copy the application's IDT from ROM into RAM
241    243    Load application's GDTR
244    245    Load application's IDTR
247    261    Copy the application's TSS from ROM into RAM
263    267    Update TSS descriptor and other aliases in GDT (GDT alias or IDT alias)
277    277    Load the task register (without task switch) using LTR instruction
282    286    Load SS, ESP with the value found in the application's TSS
287    287    Push EFLAGS value found in the application's TSS
288    288    Push CS value found in the application's TSS
289    289    Push EIP value found in the application's TSS
290    293    Load DS, ES with the value found in the application's TSS
296    296    Perform IRET; pop the above values and enter the application code
9.10.2 STARTUP.ASM Listing
Example 9-1 provides high-level sample code designed to move the processor into protected
mode. This listing does not include any opcode and offset information.
Example 9-1. STARTUP.ASM
MS-DOS* 5.0(045-N) 386(TM) MACRO ASSEMBLER STARTUP 09:44:51 08/19/92
PAGE 1
MS-DOS 5.0(045-N) 386(TM) MACRO ASSEMBLER V4.0, ASSEMBLY OF MODULE
STARTUP
OBJECT MODULE PLACED IN startup.obj
ASSEMBLER INVOKED BY: f:\386tools\ASM386.EXE startup.a58 pw (132 )
LINE SOURCE
1 NAME STARTUP
2
3 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
4 ;
5 ; ASSUMPTIONS:
6 ;
7 ; 1. The bottom 64K of memory is ram, and can be used for
8 ; scratch space by this module.
9 ;
10 ; 2. The system has sufficient free usable ram to copy the
11 ; initial GDT, IDT, and TSS
12 ;
13 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
14
15 ; configuration data - must match with build definition
16
17 CS_BASE EQU 0FFFF0000H
18
19 ; CS_BASE is the linear address of the segment STARTUP_CODE
20 ; - this is specified in the build language file
21
22 RAM_START EQU 400H
23
24 ; RAM_START is the start of free, usable ram in the linear
25 ; memory space. The GDT, IDT, and initial TSS will be
26 ; copied above this space, and a small data segment will be
27 ; discarded at this linear address. The 32-bit word at
28 ; RAM_START will contain the linear address of the first
29 ; free byte above the copied tables - this may be useful if
30 ; a memory manager is used.
31
32 TSS_INDEX EQU 10
33
34 ; TSS_INDEX is the index of the TSS of the first task to
35 ; run after startup
36
37
38 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
39
40 ; ------------------------- STRUCTURES and EQU ---------------
41 ; structures for system data
42
43 ; TSS structure
44 TASK_STATE STRUC
45 link DW ?
46 link_h DW ?
47 ESP0 DD ?
48 SS0 DW ?
49 SS0_h DW ?
50 ESP1 DD ?
51 SS1 DW ?
52 SS1_h DW ?
53 ESP2 DD ?
54 SS2 DW ?
55 SS2_h DW ?
56 CR3_reg DD ?
57 EIP_reg DD ?
58 EFLAGS_reg DD ?
59 EAX_reg DD ?
60 ECX_reg DD ?
61 EDX_reg DD ?
62 EBX_reg DD ?
63 ESP_reg DD ?
64 EBP_reg DD ?
65 ESI_reg DD ?
66 EDI_reg DD ?
67 ES_reg DW ?
68 ES_h DW ?
69 CS_reg DW ?
70 CS_h DW ?
71 SS_reg DW ?
72 SS_h DW ?
73 DS_reg DW ?
74 DS_h DW ?
75 FS_reg DW ?
76 FS_h DW ?
77 GS_reg DW ?
78 GS_h DW ?
79 LDT_reg DW ?
80 LDT_h DW ?
81 TRAP_reg DW ?
82 IO_map_base DW ?
83 TASK_STATE ENDS
84
85 ; basic structure of a descriptor
86 DESC STRUC
87 lim_0_15 DW ?
88 bas_0_15 DW ?
89 bas_16_23 DB ?
90 access DB ?
91 gran DB ?
92 bas_24_31 DB ?
93 DESC ENDS
94
95 ; structure for use with LGDT and LIDT instructions
96 TABLE_REG STRUC
97 table_lim DW ?
98 table_linear DD ?
99 TABLE_REG ENDS
100
101 ; offset of GDT and IDT descriptors in builder generated GDT
102 GDT_DESC_OFF EQU 1*SIZE(DESC)
103 IDT_DESC_OFF EQU 2*SIZE(DESC)
104
105 ; equates for building temporary GDT in RAM
106 LINEAR_SEL EQU 1*SIZE (DESC)
107 LINEAR_PROTO_LO EQU 00000FFFFH ; LINEAR_ALIAS
108 LINEAR_PROTO_HI EQU 000CF9200H
109
110 ; Protection Enable Bit in CR0
111 PE_BIT EQU 1B
112
113 ; ------------------------------------------------------------
114
115 ; ------------------------- DATA SEGMENT----------------------
116
117 ; Initially, this data segment starts at linear 0, according
118 ; to the processor’s power-up state.
119
120 STARTUP_DATA SEGMENT RW
121
122 free_mem_linear_base LABEL DWORD
123 TEMP_GDT LABEL BYTE ; must be first in segment
124 TEMP_GDT_NULL_DESC DESC <>
125 TEMP_GDT_LINEAR_DESC DESC <>
126
127 ; scratch areas for LGDT and LIDT instructions
128 TEMP_GDT_SCRATCH TABLE_REG <>
129 APP_GDT_RAM TABLE_REG <>
130 APP_IDT_RAM TABLE_REG <>
131 ; align end_data
132 fill DW ?
133
134 ; last thing in this segment - should be on a dword boundary
135 end_data LABEL BYTE
136
137 STARTUP_DATA ENDS
138 ; ------------------------------------------------------------
139
140
141 ; ------------------------- CODE SEGMENT----------------------
142 STARTUP_CODE SEGMENT ER PUBLIC USE16
143
144 ; filled in by builder
145 PUBLIC GDT_EPROM
146 GDT_EPROM TABLE_REG <>
147
148 ; filled in by builder
149 PUBLIC IDT_EPROM
150 IDT_EPROM TABLE_REG <>
151
152 ; entry point into startup code - the bootstrap will vector
153 ; here with a near JMP generated by the builder. This
154 ; label must be in the top 64K of linear memory.
155
156 PUBLIC STARTUP
157 STARTUP:
158
159 ; DS,ES address the bottom 64K of flat linear memory
160 ASSUME DS:STARTUP_DATA, ES:STARTUP_DATA
161 ; See Figure 9-4
162 ; load GDTR with temporary GDT
163 LEA EBX,TEMP_GDT ; build the TEMP_GDT in low ram,
164 MOV DWORD PTR [EBX],0 ; where we can address
165 MOV DWORD PTR [EBX]+4,0
166 MOV DWORD PTR [EBX]+8, LINEAR_PROTO_LO
167 MOV DWORD PTR [EBX]+12, LINEAR_PROTO_HI
168 MOV TEMP_GDT_scratch.table_linear,EBX
169 MOV TEMP_GDT_scratch.table_lim,15
170
171 DB 66H ; execute a 32 bit LGDT
172 LGDT TEMP_GDT_scratch
173
174 ; enter protected mode
175 MOV EBX,CR0
176 OR EBX,PE_BIT
177 MOV CR0,EBX
178
179 ; clear prefetch queue
180 JMP CLEAR_LABEL
181 CLEAR_LABEL:
182
183 ; make DS and ES address 4G of linear memory
184 MOV CX,LINEAR_SEL
185 MOV DS,CX
186 MOV ES,CX
187
188 ; do board specific initialization
189 ;
190 ;
191 ; ......
192 ;
193
194
195 ; See Figure 9-5
196 ; copy EPROM GDT to ram at:
197 ; RAM_START + size (STARTUP_DATA)
198 MOV EAX,RAM_START
199 ADD EAX,OFFSET (end_data)
200 MOV EBX,RAM_START
201 MOV ECX, CS_BASE
202 ADD ECX, OFFSET (GDT_EPROM)
203 MOV ESI, [ECX].table_linear
204 MOV EDI,EAX
205 MOVZX ECX, [ECX].table_lim
206 MOV APP_GDT_ram[EBX].table_lim,CX
207 INC ECX
208 MOV EDX,EAX
209 MOV APP_GDT_ram[EBX].table_linear,EAX
210 ADD EAX,ECX
211 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
212
213 ; fixup GDT base in descriptor
214 MOV ECX,EDX
215 MOV [EDX].bas_0_15+GDT_DESC_OFF,CX
216 ROR ECX,16
217 MOV [EDX].bas_16_23+GDT_DESC_OFF,CL
218 MOV [EDX].bas_24_31+GDT_DESC_OFF,CH
219
220 ; copy EPROM IDT to ram at:
221 ; RAM_START+size(STARTUP_DATA)+SIZE (EPROM GDT)
222 MOV ECX, CS_BASE
223 ADD ECX, OFFSET (IDT_EPROM)
224 MOV ESI, [ECX].table_linear
225 MOV EDI,EAX
226 MOVZX ECX, [ECX].table_lim
227 MOV APP_IDT_ram[EBX].table_lim,CX
228 INC ECX
229 MOV APP_IDT_ram[EBX].table_linear,EAX
230 MOV EBX,EAX
231 ADD EAX,ECX
232 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
233
234 ; fixup IDT pointer in GDT
235 MOV [EDX].bas_0_15+IDT_DESC_OFF,BX
236 ROR EBX,16
237 MOV [EDX].bas_16_23+IDT_DESC_OFF,BL
238 MOV [EDX].bas_24_31+IDT_DESC_OFF,BH
239
240 ; load GDTR and IDTR
241 MOV EBX,RAM_START
242 DB 66H ; execute a 32 bit LGDT
243 LGDT APP_GDT_ram[EBX]
244 DB 66H ; execute a 32 bit LIDT
245 LIDT APP_IDT_ram[EBX]
246
247 ; move the TSS
248 MOV EDI,EAX
249 MOV EBX,TSS_INDEX*SIZE(DESC)
250 MOV ECX,GDT_DESC_OFF ;build linear address for TSS
251 MOV GS,CX
252 MOV DH,GS:[EBX].bas_24_31
253 MOV DL,GS:[EBX].bas_16_23
254 ROL EDX,16
255 MOV DX,GS:[EBX].bas_0_15
256 MOV ESI,EDX
257 LSL ECX,EBX
258 INC ECX
259 MOV EDX,EAX
260 ADD EAX,ECX
261 REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI]
262
263 ; fixup TSS pointer
264 MOV GS:[EBX].bas_0_15,DX
265 ROL EDX,16
266 MOV GS:[EBX].bas_24_31,DH
267 MOV GS:[EBX].bas_16_23,DL
268 ROL EDX,16
269 ;save start of free ram at linear location RAM_START
270 MOV free_mem_linear_base+RAM_START,EAX
271
272 ;assume no LDT used in the initial task - if necessary,
273 ;code to move the LDT could be added, and should resemble
274 ;that used to move the TSS
275
276 ; load task register
277 LTR BX ; No task switch, only descriptor loading
278 ; See Figure 9-6
279 ; load minimal set of registers necessary to simulate task
280 ; switch
281
282
283 MOV AX,[EDX].SS_reg ; start loading registers
284 MOV EDI,[EDX].ESP_reg
285 MOV SS,AX
286 MOV ESP,EDI ; stack now valid
287 PUSH DWORD PTR [EDX].EFLAGS_reg
288 PUSH DWORD PTR [EDX].CS_reg
289 PUSH DWORD PTR [EDX].EIP_reg
290 MOV AX,[EDX].DS_reg
291 MOV BX,[EDX].ES_reg
292 MOV DS,AX ; DS and ES no longer linear memory
293 MOV ES,BX
294
295 ; simulate far jump to initial task
296 IRETD
297
298 STARTUP_CODE ENDS
*** WARNING #377 IN 298, (PASS 2) SEGMENT CONTAINS PRIVILEGED
INSTRUCTION(S)
299
300 END STARTUP, DS:STARTUP_DATA, SS:STARTUP_DATA
301
302
ASSEMBLY COMPLETE, 1 WARNING, NO ERRORS.
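The descriptor fixups in the listing (lines 213-218, 234-238, and 263-267) scatter a 32-bit linear address across the bas_0_15, bas_16_23, and bas_24_31 fields of a DESC structure. The following host-side sketch mirrors that operation on a bytearray; it is an illustration, not a translation of the assembly, and the function names are assumptions.

```python
import struct

# Field order follows the DESC structure in STARTUP.ASM:
# lim_0_15, bas_0_15, bas_16_23, access, gran, bas_24_31
DESC_FORMAT = "<HHBBBB"
DESC_SIZE = struct.calcsize(DESC_FORMAT)

def fixup_base(table, descriptor_offset, new_base):
    """Rewrite only the three base fields of one descriptor in a bytearray."""
    lim, _, _, access, gran, _ = struct.unpack_from(DESC_FORMAT, table, descriptor_offset)
    struct.pack_into(
        DESC_FORMAT, table, descriptor_offset,
        lim,
        new_base & 0xFFFF,           # bas_0_15
        (new_base >> 16) & 0xFF,     # bas_16_23
        access,
        gran,
        (new_base >> 24) & 0xFF,     # bas_24_31
    )

def read_base(table, descriptor_offset):
    """Reassemble the 32-bit base from its three fields."""
    _, lo, mid, _, _, hi = struct.unpack_from(DESC_FORMAT, table, descriptor_offset)
    return lo | (mid << 16) | (hi << 24)
```

With a table copied to RAM, fixup_base(table, 1 * DESC_SIZE, ram_address) corresponds to updating the GDT alias at GDT_DESC_OFF, and an offset of 2 * DESC_SIZE corresponds to the IDT alias at IDT_DESC_OFF.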
Figure 9-4. Constructing Temporary GDT and Switching to Protected Mode (Lines
162-172 of List File)
[Figure: physical memory map from 0 to FFFF FFFFH. The EPROM at FFFF 0000H contains
START: [CS.BASE+EIP], which performs these steps: jump near start, construct TEMP_GDT,
LGDT, move to protected mode. Low RAM holds TEMP_GDT with entries GDT[0] and GDT[1]
(Base=0, Limit=4G) and GDT_SCRATCH with its Base and Limit fields. After the switch,
DS, ES = GDT[1], covering 4 GB.]
Figure 9-5. Moving the GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List File)
[Figure: physical memory map from 0 to FFFF FFFFH. The TSS, IDT, and GDT in ROM are
copied to GDT RAM, IDT RAM, and TSS RAM above RAM_START. Steps shown: move the GDT,
IDT, TSS from ROM to RAM; fix aliases; LTR.]
Figure 9-6. Task Switching (Lines 282-296 of List File)
[Figure: memory map showing GDT RAM, IDT RAM, and TSS RAM above RAM_START, with the
GDT alias and IDT alias entries in the GDT. The TSS RAM fields DS, EIP, EFLAGS, CS, SS,
ES, and ESP feed these steps: SS = TSS.SS; ESP = TSS.ESP; PUSH TSS.EFLAG; PUSH TSS.CS;
PUSH TSS.EIP; ES = TSS.ES; DS = TSS.DS; IRET.]
9.10.3 MAIN.ASM Source Code
The file MAIN.ASM shown in Example 9-2 defines the data and stack segments for this application
and can be substituted with the main module task written in a high-level language that is
invoked by the IRET instruction executed by STARTUP.ASM.
Example 9-2. MAIN.ASM
NAME main_module
data SEGMENT RW
dw 1000 dup(?)
DATA ENDS
stack stackseg 800
CODE SEGMENT ER use32 PUBLIC
main_start:
nop
nop
nop
CODE ENDS
END main_start, ds:data, ss:stack
9.10.4 Supporting Files
The batch file shown in Example 9-3 can be used to assemble the source code files
STARTUP.ASM and MAIN.ASM and build the final application.
Example 9-3. Batch File to Assemble and Build the Application
ASM386 STARTUP.ASM
ASM386 MAIN.ASM
BLD386 STARTUP.OBJ, MAIN.OBJ buildfile(EPROM.BLD) bootstrap(STARTUP) Bootload
BLD386 performs several operations in this example:
• It allocates physical memory locations to segments and tables.
• It generates tables using the build file and the input files.
• It links object files and resolves references.
• It generates a boot-loadable file to be programmed into the EPROM.
Example 9-4 shows the build file used as an input to BLD386 to perform the above functions.
Example 9-4. Build File
INIT_BLD_EXAMPLE;
SEGMENT
*SEGMENTS(DPL = 0)
, startup.startup_code(BASE = 0FFFF0000H)
;
TASK
BOOT_TASK(OBJECT = startup, INITIAL,DPL = 0,
NOT INTENABLED)
, PROTECTED_MODE_TASK(OBJECT = main_module,DPL = 0,
NOT INTENABLED)
;
TABLE
GDT (
LOCATION = GDT_EPROM
, ENTRY = (
10: PROTECTED_MODE_TASK
, startup.startup_code
, startup.startup_data
, main_module.data
, main_module.code
, main_module.stack
)
),
IDT (
LOCATION = IDT_EPROM
);
MEMORY
(
RESERVE = (0..3FFFH
-- Area for the GDT, IDT, TSS copied from ROM
, 60000H..0FFFEFFFFH)
, RANGE = (ROM_AREA = ROM (0FFFF0000H..0FFFFFFFFH))
-- Eprom size 64K
, RANGE = (RAM_AREA = RAM (4000H..05FFFFH))
);
END
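The MEMORY section of the build file partitions the address space into reserved areas, a RAM range, and a ROM range. As a quick sanity check, the Python sketch below takes the four ranges exactly as written in Example 9-4 and verifies that they cover 0 through 0FFFFFFFFH with no gaps or overlaps. The helper names `check_tiling` and `size_of` are hypothetical; BLD386 itself is not involved.

```python
# Ranges copied from the MEMORY section of the build file (inclusive bounds).
REGIONS = [
    ("RESERVE_LOW", 0x0000_0000, 0x0000_3FFF),   # area for the GDT, IDT, TSS copies
    ("RAM_AREA",    0x0000_4000, 0x0005_FFFF),
    ("RESERVE_MID", 0x0006_0000, 0xFFFE_FFFF),
    ("ROM_AREA",    0xFFFF_0000, 0xFFFF_FFFF),
]

def check_tiling(regions, top=0xFFFF_FFFF):
    """Return True if the regions cover 0..top with no gaps or overlaps."""
    expected = 0
    for _name, start, end in sorted(regions, key=lambda r: r[1]):
        if start != expected or end < start:
            return False
        expected = end + 1
    return expected == top + 1

def size_of(name, regions=REGIONS):
    """Return the size in bytes of the named region."""
    for region_name, start, end in regions:
        if region_name == name:
            return end - start + 1
    raise KeyError(name)

print(check_tiling(REGIONS))
print(size_of("ROM_AREA") // 1024, "KBytes of ROM")
```

The ROM range works out to 64 KBytes, matching the "Eprom size 64K" comment in the build file, and the low reserved area is 16 KBytes.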
Table 9-5 shows the relationship of each build item with an ASM source file.
9.11 MICROCODE UPDATE FACILITIES
The Pentium 4, Intel Xeon, and P6 family processors have the capability to correct errata by
loading an Intel-supplied data block into the processor. The data block is called a microcode
update. This section describes the mechanisms the BIOS needs to provide in order to use this
feature during system initialization. It also describes a specification that permits the incorporation
of future updates into a system BIOS.
Intel considers the release of a microcode update for a silicon revision to be the equivalent of a
processor stepping and completes a full-stepping level validation for releases of microcode
updates.
A microcode update is used to correct errata in the processor. The BIOS, which has an update
loader, is responsible for loading the update on processors during system initialization (Figure
9-7). There are two steps to this process: the first is to incorporate the necessary update data
blocks into the BIOS; the second is to load update data blocks into the processor.
Table 9-5. Relationship Between BLD Item and ASM Source File
Item | ASM386 and Startup.A58 | BLD386 Controls and BLD file | Effect
Bootstrap | public startup; startup: | bootstrap start(startup) | Near jump at 0FFFFFFF0H to start.
GDT location | public GDT_EP