A description of the microarchitecture of the Pentium Pro processor

The Pentium Pro incorporated a new microarchitecture, a departure from that of the original Pentium. It has a decoupled, 14-stage superpipelined architecture which uses an instruction pool.

The Pentium Pro (P6) featured many advanced concepts not found in the Pentium, although it wasn't the first or only x86 processor to implement them (see the NexGen Nx586 or Cyrix designs).

The Pentium Pro pipeline had extra decode stages to dynamically translate IA-32 instructions into buffered micro-operation sequences, which could then be analysed, reordered, and renamed in order to detect parallelizable operations that may be issued to more than one execution unit at once. The Pentium Pro thus featured out-of-order execution, including speculative execution, via register renaming.
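The effect of register renaming can be illustrated with a small sketch. It is a toy model, not the Pentium Pro's actual hardware mechanism: the function name `rename`, the physical register names `p0`, `p1`, ... and the three-operand instruction tuples are all assumptions made for illustration. Each write to an architectural register is given a fresh physical register, so a later instruction that reuses a register name no longer has to wait for earlier readers or writers of that name.

```python
from itertools import count


def rename(instrs):
    """Rename (dest, src1, src2) tuples onto fresh physical registers.

    Hypothetical model: physical registers are named p0, p1, ... and are
    never recycled, which real hardware cannot afford to do.
    """
    fresh = count()
    mapping = {}  # architectural register -> current physical register

    def lookup(reg):
        # A register read before any write gets a physical register too.
        if reg not in mapping:
            mapping[reg] = f"p{next(fresh)}"
        return mapping[reg]

    renamed = []
    for dest, src1, src2 in instrs:
        s1, s2 = lookup(src1), lookup(src2)  # read sources first
        mapping[dest] = f"p{next(fresh)}"    # every write gets a new register
        renamed.append((mapping[dest], s1, s2))
    return renamed


program = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("edx", "eax", "ebx"),  # reads eax: a true dependency on the first line
    ("eax", "esi", "edi"),  # rewrites eax: only a name conflict
]
for line in rename(program):
    print(line)
```

In the renamed program the third instruction writes a different physical register from the first, so it can execute without waiting for the second instruction to read the old value, while the true dependency between the first two instructions is preserved.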

There are three instruction decoders. The decoders are not equal in capability: one is a general decoder, while the other two are simple decoders that can handle only simple instructions. This restricts the Pentium Pro's ability to decode multiple instructions simultaneously, limiting superscalar execution. The micro-ops are RISC-like; that is, they encode an operation, two sources, and a destination.

The general decoder can generate up to four micro-ops per cycle, whereas the simple decoders can generate one micro-op each per cycle. Thus, x86 instructions that operate on memory (e.g. adding a register to a memory location) can only be processed by the general decoder, as they require more than one micro-op. The simple decoders are limited to instructions that can be translated into one micro-op. Instructions that require more than four micro-ops are translated with the assistance of a sequencer, which generates the required micro-ops over multiple clock cycles. In each clock cycle, up to five micro-ops can be dispatched to five execution units.
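The decoding restriction described above can be sketched as a simple in-order model. This is a rough illustration under stated assumptions, not a cycle-accurate description of the hardware: the function name `decode_cycles` is hypothetical, instructions are represented only by their micro-op counts, and the sequencer is assumed to emit four micro-ops per cycle while nothing else decodes.

```python
def decode_cycles(uop_counts):
    """Estimate cycles to decode instructions given in program order.

    Model assumptions: each cycle the general decoder takes the next
    instruction (up to 4 micro-ops), then up to two following
    instructions go to the simple decoders if each is exactly 1 micro-op.
    """
    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        cycles += 1
        first = uop_counts[i]
        i += 1
        if first > 4:
            # Assumed sequencer behaviour: 4 micro-ops per cycle, and no
            # other instruction decodes alongside it.
            cycles += (first + 3) // 4 - 1
            continue
        for _ in range(2):  # the two simple decoders
            if i < n and uop_counts[i] == 1:
                i += 1
            else:
                break  # decoding is in order, so stop at the first misfit
    return cycles


print(decode_cycles([2, 1, 1, 2, 1, 1]))  # multi-uop instructions lead each group
print(decode_cycles([2, 2, 2]))           # every instruction needs the general decoder
```

Under this model, code that interleaves one multi-micro-op instruction with two single-micro-op instructions decodes three instructions per cycle, while a run of multi-micro-op instructions decodes only one per cycle.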

The Pentium Pro has a total of six execution units: two integer units, one floating-point unit (FPU), a load unit, a store-address unit, and a store-data unit. One of the integer units shares the same ports as the FPU, and therefore the Pentium Pro can only dispatch one integer micro-op and one floating-point micro-op, or two integer micro-ops, per cycle, in addition to micro-ops for the other three execution units. Of the two integer units, only one has the full complement of functions such as a barrel shifter, multiplier and divider.
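The dispatch constraint described above can be written down as a small check. This is a sketch of the rule as stated in the text, assuming the three remaining units are the load, store-address and store-data units; the function name `can_dispatch` and the micro-op kind labels are hypothetical.

```python
from collections import Counter


def can_dispatch(uops):
    """Return True if the given micro-op kinds could dispatch in one cycle.

    Kinds (hypothetical labels): "int_simple", "int_complex", "fp",
    "load", "store_addr", "store_data".
    """
    c = Counter(uops)
    ints = c["int_simple"] + c["int_complex"]
    return (
        c["fp"] <= 1
        and c["int_complex"] <= 1   # only one integer unit is fully featured
        and ints + c["fp"] <= 2     # the FPU shares ports with an integer unit
        and c["load"] <= 1
        and c["store_addr"] <= 1
        and c["store_data"] <= 1
    )


print(can_dispatch(["int_simple", "int_simple", "load", "store_addr", "store_data"]))
print(can_dispatch(["fp", "int_simple", "int_simple"]))
```

The first call describes a full five-micro-op cycle and is accepted; the second asks for a floating-point micro-op together with two integer micro-ops and is rejected.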

The second integer unit, which shares paths with the FPU, does not have these facilities and is limited to simple operations such as add, subtract, and the calculation of branch target addresses. The FPU executes floating-point operations. Addition and multiplication are pipelined and have a latency of three and five cycles, respectively.

Division and square-root are not pipelined and are executed in separate units that share the FPU's ports. Division and square root have a latency of 18-36 and 29-69 cycles, respectively.

The lower figure in each range applies to single-precision (32-bit) floating-point numbers and the higher figure to extended-precision (80-bit) numbers. Division and square root can operate simultaneously with adds and multiplies, blocking them only when the result has to be stored in the reorder buffer (ROB).
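The practical difference between pipelined and non-pipelined operations can be shown with some simple arithmetic. The sketch below uses the latencies quoted above, but the function names are hypothetical and the issue interval of one cycle for pipelined operations is an idealising assumption, not a figure from the text.

```python
def pipelined_cycles(n_ops, latency, issue_interval=1):
    """Cycles for n independent operations on a pipelined unit.

    Assumption: a new operation can start every `issue_interval` cycles.
    """
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * issue_interval


def unpipelined_cycles(n_ops, latency):
    """Cycles for n operations when each must finish before the next starts."""
    return n_ops * latency


ADD_LATENCY = 3             # from the text
DIV_LATENCY_SINGLE = 18     # from the text, single precision
DIV_LATENCY_EXTENDED = 36   # from the text, extended precision

print(pipelined_cycles(10, ADD_LATENCY))
print(unpipelined_cycles(10, DIV_LATENCY_SINGLE))
print(unpipelined_cycles(10, DIV_LATENCY_EXTENDED))
```

Under these assumptions, ten independent additions overlap and finish in a little more than the latency of one, while ten divisions cost ten full latencies.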

After the microprocessor was released, a bug was discovered in the floating-point unit, commonly called the "Pentium Pro and Pentium II FPU bug" and referred to by Intel as the "flag erratum". The bug occurs under some circumstances during floating-point-to-integer conversion when the floating-point number won't fit into the smaller integer format, causing the FPU to deviate from its documented behaviour.

The bug is considered to be minor and occurs under such special circumstances that very few, if any, software programs are affected. The Pentium Pro P6 microarchitecture was used in one form or another by Intel for more than a decade. The design's various traits would continue after that in the derivative core called "Banias" in the Pentium M and Intel Core (Yonah), which itself would evolve into the Core microarchitecture (Core 2 processor) in 2006 and onward.

The Pentium Pro's weaker performance on legacy code, together with the high cost of Pentium Pro systems, caused a rather lackluster reception among PC enthusiasts at the time. The performance issues on legacy code were later partially mitigated by Intel with the Pentium II.

Methods to circumvent slow writes to video memory included setting VESA drawing to system memory instead of video memory in games such as Quake, and later on utilities such as FASTVID emerged, which could double performance in certain games by enabling the write-combining features of the CPU. However, the Pentium Pro's lack of an MMX implementation reduces performance in multimedia applications that made use of those instructions.

At the time, manufacturing technology did not feasibly allow a large L2 cache to be integrated into the processor core. Intel instead placed the L2 die(s) separately in the package, which still allowed the cache to run at the same clock speed as the CPU core. Additionally, unlike most motherboard-based cache schemes that shared the main system bus with the CPU, the Pentium Pro's cache had its own back-side bus (an arrangement Intel called the dual independent bus).

Because of this, the CPU could read main memory and cache concurrently, greatly reducing a traditional bottleneck.

The cache was also "non-blocking", meaning that the processor could issue more than one cache request at a time (up to four), reducing cache-miss penalties. These properties combined to produce an L2 cache that was immensely faster than the motherboard-based caches of older processors.
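The benefit of a non-blocking cache can be estimated with a simple overlap model. This is an idealised sketch rather than a description of the real memory system: the function name `miss_service_time` and the 50-cycle miss latency are hypothetical, and the model assumes that independent misses overlap perfectly up to the limit on outstanding requests.

```python
import math


def miss_service_time(misses, miss_latency, max_outstanding):
    """Cycles to service independent cache misses.

    Assumption: up to `max_outstanding` misses overlap completely, so
    misses are serviced in batches that each cost one miss latency.
    """
    if misses == 0:
        return 0
    batches = math.ceil(misses / max_outstanding)
    return batches * miss_latency


MISS_LATENCY = 50  # hypothetical figure, in cycles

print(miss_service_time(8, MISS_LATENCY, max_outstanding=1))  # blocking cache
print(miss_service_time(8, MISS_LATENCY, max_outstanding=4))  # up to four requests
```

With four requests in flight, the eight misses in this example are serviced in two batches instead of eight, which is the sense in which the miss penalty is reduced.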

In multiprocessor configurations, the Pentium Pro's integrated cache greatly improved performance in comparison to architectures in which each CPU shared a central cache.

However, this far faster L2 cache did come with some complications. The Pentium Pro's "on-package cache" arrangement was unique. The processor and the cache were on separate dies in the same package and connected closely by a full-speed bus.

The two or three dies had to be bonded together early in the production process, before testing was possible. This meant that a single, tiny flaw in either die made it necessary to discard the entire assembly, which was one of the reasons for the Pentium Pro's relatively low production yield and high cost.
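The yield penalty of bonding untested dies together follows from multiplying the individual die yields. The sketch below illustrates the arithmetic only: the function name `package_yield` and the yield figures are hypothetical, and defects on different dies are assumed to be independent.

```python
import math


def package_yield(die_yields):
    """Fraction of packages in which every die is good.

    Assumption: die defects are independent, so the yields multiply.
    """
    return math.prod(die_yields)


CPU_YIELD = 0.8    # hypothetical
CACHE_YIELD = 0.9  # hypothetical

two_die = package_yield([CPU_YIELD, CACHE_YIELD])
three_die = package_yield([CPU_YIELD, CACHE_YIELD, CACHE_YIELD])
print(round(two_die, 3), round(three_die, 3))
```

Each additional die lowers the fraction of good packages, and because testing happens only after bonding, a good CPU die is lost whenever it is paired with a bad cache die.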

The chip was popular in symmetric multiprocessing configurations, with dual and quad SMP server and workstation setups being commonplace.