A Tour of the Pentium Pro Processor Microarchitecture

Introduction

One of the Pentium Pro processor's primary goals was to significantly exceed the
performance of the 100MHz Pentium processor while being manufactured on the same
semiconductor process. Using the same process as a volume production processor
practically assured that the Pentium Pro processor would be manufacturable, but
it meant that Intel had to focus on an improved microarchitecture for ALL of the
performance gains. This guided tour describes how multiple architectural
techniques - some proven in mainframe computers, some proposed in academia and
some we innovated ourselves - were carefully interwoven, modified, enhanced,
tuned and implemented to produce the Pentium Pro microprocessor. This unique
combination of architectural features, which Intel describes as Dynamic
Execution, enabled the first Pentium Pro processor silicon to exceed the
original performance goal.

Building from an already high platform

The Pentium processor set an impressive performance standard with its pipelined,
superscalar microarchitecture. Its pipelined implementation uses five stages to
extract high throughput from the silicon. The Pentium Pro processor moves to a
decoupled, 12-stage, superpipelined implementation, doing less work per
pipestage in exchange for more stages. The Pentium Pro processor reduced its
pipestage time by 33 percent compared with a Pentium processor, which means the
Pentium Pro processor can run at a clock speed up to about 50 percent higher
(1 / (1 - 0.33) is roughly 1.5) than a Pentium processor and still be equally
easy to produce from a semiconductor manufacturing process (i.e., transistor
speed) perspective.
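
The relation between pipestage time and clock rate can be checked with a few
lines of arithmetic. The sketch below assumes that clock frequency is simply the
reciprocal of pipestage time; the function names are illustrative and do not
come from any Intel material.

```python
def clock_speedup(stage_time_reduction: float) -> float:
    """Clock-rate multiplier when each pipestage gets shorter by the given fraction."""
    # Assumption: frequency = 1 / stage time, so shrinking the stage time
    # by a fraction r multiplies the frequency by 1 / (1 - r).
    return 1.0 / (1.0 - stage_time_reduction)


def stage_time_reduction(clock_increase: float) -> float:
    """Fractional cut in pipestage time needed for the given fractional clock increase."""
    return 1.0 - 1.0 / (1.0 + clock_increase)


if __name__ == "__main__":
    print(f"33% shorter stage -> {clock_speedup(0.33):.2f}x clock")
    print(f"33% faster clock  -> {stage_time_reduction(0.33):.0%} shorter stage")
```

Under this assumption, a 33 percent shorter pipestage allows a clock roughly 1.5
times as fast, while a 33 percent faster clock needs only a 25 percent shorter
pipestage.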

The Pentium processor's superscalar microarchitecture, with its ability to
execute two instructions per clock, would be difficult to exceed without a new
approach. The new approach used by the Pentium Pro processor removes the
constraint of linear instruction sequencing between the traditional "fetch" and
"execute" phases, and opens up a wide instruction window using an instruction
pool. This approach allows the "execute" phase of the Pentium Pro processor to
have much more visibility into the program's instruction stream so that better
scheduling may take place. It requires the instruction "fetch/decode" phase of
the Pentium Pro processor to be much more intelligent in terms of predicting
program flow. Optimized scheduling requires the fundamental "execute" phase to
be replaced by decoupled "dispatch/execute" and "retire" phases. This allows
instructions to be started in any order but always be completed in the original
program order. The Pentium Pro processor is implemented as three independent
engines coupled with an instruction pool as shown in Figure 1 below.
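
The idea of starting instructions in any order while completing them in program
order can be illustrated with a toy model. This is only a sketch under strong
assumptions: unlimited execution units, no branches, only true register
dependences, and latencies of at least one cycle. The Instr class, the simulate
function and the four-instruction program are hypothetical and are not taken
from the Pentium Pro design.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str        # register written
    srcs: tuple      # registers read
    latency: int     # cycles from dispatch until the result is available (>= 1)


def simulate(program):
    """Return (completion order, retirement order) for a toy instruction pool."""
    # True dependences: for each source register, the latest earlier writer.
    last_writer, producers = {}, []
    for i, ins in enumerate(program):
        producers.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i

    done_at = {}                  # instruction index -> cycle its result is ready
    completed, retired = [], []
    cycle = 0
    while len(retired) < len(program):
        # Dispatch/execute: start anything in the pool whose inputs are ready.
        for i, ins in enumerate(program):
            if i in done_at:
                continue
            if all(done_at.get(p, float("inf")) <= cycle for p in producers[i]):
                done_at[i] = cycle + ins.latency
        cycle += 1
        completed += [program[i].name for i in sorted(done_at) if done_at[i] == cycle]
        # Retire: strictly in original program order.
        while (len(retired) < len(program)
               and done_at.get(len(retired), float("inf")) <= cycle):
            retired.append(program[len(retired)].name)
    return completed, retired


# Hypothetical program: I1 is a slow load, I2 needs its result, I3 and I4 do not.
PROGRAM = [
    Instr("I1", "r1", ("r0",), 10),
    Instr("I2", "r2", ("r1", "r2"), 1),
    Instr("I3", "r5", ("r5",), 1),
    Instr("I4", "r6", ("r6", "r3"), 1),
]

if __name__ == "__main__":
    completed, retired = simulate(PROGRAM)
    print("results ready:", completed)
    print("retired:      ", retired)
```

In this model the independent instructions I3 and I4 produce their results while
I1 is still outstanding, yet nothing retires ahead of I1, so the visible order
of completion matches the original program.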

What is the fundamental problem to solve?

Before starting our tour of how the Pentium Pro processor achieves its high
performance, it is important to note why this three-independent-engine approach
was taken. A fundamental fact of today's microprocessor implementations must be
appreciated: most CPU cores are not fully utilized. Consider the code fragment
in Figure 2 below:

The first instruction in this example is a load of r1 that, at run time, causes
a cache miss. A traditional CPU core must wait for its bus interface unit to
read this data from main memory and return it before moving on to instruction 2.
The CPU stalls while waiting for this data and is thus underutilized.
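
How much a blocking core is underutilized in this situation can be estimated
with a simple count. The sketch below assumes that every instruction needs one
cycle of real work and that any extra latency is pure stall time; the 10-cycle
miss latency and the function name are hypothetical values chosen for
illustration, not measurements.

```python
def blocking_core_utilization(instr_latencies):
    """Fraction of cycles in which a blocking in-order core does useful work.

    Each instruction is assumed to need one cycle of real work; any latency
    beyond that cycle is time the core spends stalled.
    """
    total_cycles = sum(instr_latencies)
    busy_cycles = len(instr_latencies)
    return busy_cycles / total_cycles


if __name__ == "__main__":
    # Hypothetical fragment: a load that misses (10 cycles), then three
    # single-cycle instructions.
    latencies = [10, 1, 1, 1]
    print(f"utilization: {blocking_core_utilization(latencies):.0%}")
```

With these assumed numbers the core is busy for only 4 of 13 cycles, and the
figure worsens as the miss latency grows.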

While CPU speeds have increased 10-fold over the past 10 years, the speed of
main memory devices has increased by only 60 percent. This increasing memory
latency, relative to the CPU core speed, is a fundamental problem that the
Pentium Pro processor set out to solve. One approach would be to place the
burden of this problem onto the chipset, but a high-performance CPU that needs
very high-speed, specialized support components is not a good solution for a
volume production system.
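
The size of this gap follows directly from the two growth figures quoted above.
As a rough sketch, assuming memory latency measured in core clocks scales with
the ratio of the two speedups (the function name and the 4-clock starting point
are hypothetical):

```python
def relative_latency_growth(cpu_speedup: float, memory_speedup: float) -> float:
    """Factor by which memory latency grows when counted in CPU core clocks."""
    return cpu_speedup / memory_speedup


if __name__ == "__main__":
    growth = relative_latency_growth(10.0, 1.6)   # 10-fold CPU, 60 percent memory
    print(f"memory latency in core clocks grows {growth:.2f}x")
    # Hypothetical example: an access that once cost 4 core clocks.
    print(f"4 clocks then -> {4 * growth:.0f} clocks now")
```

Under that assumption, a memory access costs about 6.25 times as many core
clocks as it did a decade earlier.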

A brute-force approach to this problem is, of course, increasing the size of the
L2 cache to re...
