CPU - How a Computer Works (2015)

12. CPU

3D_CPU.jpg

Central Processing Unit

PIII_2.jpg

The central processing unit (CPU) is the heart of any computer. The CPU performs many tasks on data, and most data passes through it at some stage.

The CPU performs calculations on data, routes data to memory and organizes data. The chances are your computer will contain a CPU from either Intel or AMD.

Intel’s Pentium processors are amongst the most widely used.

Different CPU speeds are available. Higher-speed CPUs process data faster, so the computer performs tasks faster.

Clock

mb_clock_cpu.jpg

The CPU is attached to the motherboard. A quartz crystal clock generates timing pulses that are fed into the CPU and other microchips. These timing pulses keep the CPU data processing in step with all the other microchips on the motherboard. Because of the huge amount of heat generated by the CPU, a fan (not shown) keeps it cool.

Intel Pentium

PIII_2.jpg

Here we describe the internal operation of an Intel Pentium processor.

The Intel Pentium CPU microprocessor uses millions of transistors on its two silicon circuits.

One circuit is the main CPU. The second is a memory cache, named L2. Both silicon circuits are embedded in one package with connection leads underneath.

The CPU and memory cache are 64 bits wide. Data bits move around the Pentium at up to 100 MHz.

Each data bit's movement is controlled by a clock pulse, so all movements happen at the same time.

The timing cycle ensures data moves around at the same speed.
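
As a rough worked example of the figures above, the sketch below converts the 100 MHz clock into a cycle time and estimates the peak transfer rate of a 64-bit path. It assumes one full 64-bit transfer on every clock pulse, which is an idealisation rather than a measured figure, and the function names are purely illustrative.

```python
def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def peak_bytes_per_second(clock_hz, width_bits):
    """Upper bound on throughput, ASSUMING one full-width transfer per clock pulse."""
    return clock_hz * width_bits // 8


CLOCK_HZ = 100_000_000   # 100 MHz, from the text
WIDTH_BITS = 64          # 64-bit data path, from the text

print(cycle_time_ns(CLOCK_HZ))                      # 10.0
print(peak_bytes_per_second(CLOCK_HZ, WIDTH_BITS))  # 800000000
```

So each clock pulse lasts 10 nanoseconds, and under the stated assumption the path could move at most 800 million bytes per second.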

pentium_dia0.jpg

Internal Pentium Diagram

BIU

pentium_dia0.jpg

Idle

pentium_dia1.jpg

Data is read into the BIU and placed in the L2 cache

pentium_dia2.jpg

BIU copies data to L1 caches

When data bits reach the CPU, they enter the CPU's bus interface unit (BIU). Once the BIU receives information, it makes a copy of it.

One copy is sent to the L2 memory cache, the other to the L1 memory caches on the main CPU silicon circuit.

There are several L1 memory caches on the main CPU's silicon circuit, ranging in size from 8 to 16 KB.

The BIU sends code to the L1 instruction cache, or I-cache. Data is sent to the Data cache (D-cache) to be used by the code.

BTB

pentium_dia3.jpg

The Fetch/Decode Unit pulls data from the L1 I-Cache; the BTB also has this data

The fetch/decode unit pulls instruction code from the I-cache. At the same time, the branch target buffer (BTB) compares each instruction with a record in a separate memory buffer to see if it has been used before.

The BTB is searching for any code that involves branching, because at a branch the program could follow separate paths.

If the BTB finds a branch-type instruction, it predicts where the program will go. The BTB does this from past experience; its predictions are over 90 percent accurate.
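
One common way to predict branches "from past experience" is a table of two-bit saturating counters, one per branch address. The sketch below shows that general technique only; it is not the Pentium's actual BTB design, and the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Toy predictor: one 2-bit saturating counter per branch address (an assumed scheme)."""

    def __init__(self):
        self.counters = {}   # branch address -> counter in 0..3
        self.targets = {}    # branch address -> most recent taken target

    def predict(self, address):
        """Return the predicted target, or None for 'not taken' or an unseen branch."""
        if self.counters.get(address, 0) >= 2:
            return self.targets.get(address)
        return None

    def update(self, address, taken, target=None):
        """Record the real outcome so that later predictions improve."""
        counter = self.counters.get(address, 0)
        if taken:
            self.counters[address] = min(counter + 1, 3)
            self.targets[address] = target
        else:
            self.counters[address] = max(counter - 1, 0)


btb = BranchTargetBuffer()
hits = 0
# A loop branch at address 0x40 that jumps back to 0x10 nine times, then falls through.
outcomes = [True] * 9 + [False]
for taken in outcomes:
    predicted_taken = btb.predict(0x40) is not None
    hits += predicted_taken == taken
    btb.update(0x40, taken, target=0x10)
print(hits, "of", len(outcomes), "predicted correctly")
```

The predictor misses while it is still learning the loop and again when the loop finally exits, which is why longer-running loops push the accuracy higher.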

Reorder Buffer (ROB)

pentium_dia4.jpg

Data is sent to the ROB

pentium_dia5.jpg

The Dispatch/Execute unit checks and executes code

Three decoders working in parallel break up the larger instructions into uops (µops), which are smaller 274-bit micro-operations.

The dispatch/execute unit processes a uop faster than a single higher-level instruction.

The decode unit sends all uops to the Reorder Buffer (ROB), also called the instruction pool. The Reorder Buffer contains two arithmetic logic units (ALUs).

The ALUs handle all integer calculations. Their buffer holds the uops in the order the BTB predicted they would be needed.

Dispatch/Execute

pentium_dia5.jpg

The ALUs use a circular buffer with a head and a tail that mark the beginning and end of the line of uops. The dispatch/execute unit checks each uop in the buffer to see whether all the information needed to process it is there.

If the code is valid, the dispatch/execute unit carries it out. When a uop requires data bits from memory, the execute unit skips over it.

The CPU looks for the information in the L1 memory cache. If the data bits are not there, the L2 memory cache is checked.

Memory Cache

pentium_dia5.jpg

If neither memory cache holds the data bits, they are retrieved from the main on-board memory. This slows down data processing, because retrieving data from the on-chip caches is much faster than going to on-board memory.
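
The lookup order described here (L1, then L2, then main memory) can be sketched as follows. Plain dictionaries stand in for the caches, with no size limit, and copying a value into the faster levels after a miss is an assumption made for illustration; the function name is hypothetical.

```python
def load(address, l1, l2, main_memory):
    """Look for an address in L1, then L2, then main memory.

    Returns (value, where_found). On a miss the value is copied into
    the faster levels (an assumed policy) so the next access is quicker.
    """
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        l1[address] = l2[address]
        return l2[address], "L2"
    value = main_memory[address]   # slowest path: on-board memory
    l2[address] = value
    l1[address] = value
    return value, "memory"


l1, l2 = {}, {0x20: 7}
main_memory = {0x10: 42, 0x20: 7}
print(load(0x10, l1, l2, main_memory))  # (42, 'memory')
print(load(0x10, l1, l2, main_memory))  # (42, 'L1')
print(load(0x20, l1, l2, main_memory))  # (7, 'L2')
```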

While data bits are being fetched from memory, the execute unit continues inspecting each uop in the buffer.

When a valid uop has all the required information, the unit processes it and stores the result in the uop itself.

The code is then marked as complete.
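
The skip-and-continue behaviour described above can be sketched as a single pass over the buffer. This is a simplified model, not the processor's real logic: the `Uop` fields, the `execute_pass` function and the operand names are all hypothetical, and a plain list stands in for the circular buffer.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    needs: tuple          # names of the operands this uop reads
    op: object            # function applied to the operand values
    result: object = None
    done: bool = False


def execute_pass(buffer, available):
    """Walk the buffer once; run every uop whose operands are ready, skip the rest."""
    executed = []
    for uop in buffer:
        if uop.done or not all(n in available for n in uop.needs):
            continue                      # already complete, or still waiting on data
        uop.result = uop.op(*(available[n] for n in uop.needs))
        uop.done = True                   # result stored in the uop, marked complete
        available[uop.name] = uop.result  # later uops may read this result
        executed.append(uop.name)
    return executed


buffer = [
    Uop("t1", ("a", "m"), lambda a, m: a + m),   # waits for m from memory
    Uop("t2", ("a", "b"), lambda a, b: a * b),   # ready immediately
    Uop("t3", ("t2",), lambda t2: t2 - 1),       # ready once t2 has run
]
available = {"a": 2, "b": 3}
print(execute_pass(buffer, available))   # ['t2', 't3']
available["m"] = 10                      # the memory fetch completes
print(execute_pass(buffer, available))   # ['t1']
```

The first pass skips `t1` because its data has not arrived, yet still completes the two uops behind it; a later pass picks up `t1` once the data is there.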

Floating-Point Math Unit

pentium_dia5.jpg

The Dispatch/Execute unit checks and executes code

pentium_dia5.jpg

The floating-point math unit (FPMU) performs calculations

The execute unit moves on to the next uop in line. This method of processing is called speculative execution, because the order of uops in the circular buffer is based on the BTB's branch predictions.

When the end of the buffer is reached, the execution unit starts at the beginning (head) again and checks all the uops to see whether any have received the data bits they need to be executed. If a floating-point number is found during instruction processing, the ALUs pass it to the floating-point math unit, which contains circuits optimised to process floating-point numbers quickly.

JEU

pentium_dia9.jpg

The JEU in action

Delayed uop instructions are processed and the execute unit compares the results with those predicted by the BTB.

If a prediction was wrong, the jump execution unit (JEU) moves the end marker from the last uop in line to the uop that was predicted incorrectly.

All uops behind the end marker can be ignored and overwritten by new uops.
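
A minimal sketch of this recovery step is shown below. It assumes the end marker is moved to just after the mispredicted branch, uses a flat list in place of the circular buffer, and invents the function name and uop labels for illustration.

```python
def squash_after(buffer, tail, mispredicted_index):
    """Move the tail (end) marker back to just after the mispredicted branch.

    Entries past the new tail stay in place but are now dead:
    they are overwritten as new uops are written at the tail.
    """
    assert 0 <= mispredicted_index < tail <= len(buffer)
    return mispredicted_index + 1


buffer = ["load", "cmp", "branch", "guess1", "guess2", "guess3"]
tail = len(buffer)                       # all six slots hold live uops
tail = squash_after(buffer, tail, buffer.index("branch"))
live = buffer[:tail]                     # ['load', 'cmp', 'branch']

# New uops from the correct path overwrite the dead slots.
for uop in ["fix1", "fix2"]:
    buffer[tail] = uop
    tail += 1
print(buffer[:tail])   # ['load', 'cmp', 'branch', 'fix1', 'fix2']
```

Nothing is erased when the marker moves; the wrongly guessed uops simply fall outside the live region and are replaced as new work arrives.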

Retirement Unit

pentium_dia10.jpg

Data is sent to the retirement unit

pentium_dia11.jpg

The retirement unit sends data to the store buffer

pentium_dia12.jpg

The completed data is sent back to the BIU

pentium_dia1.jpg

Data is placed onto the CPU's system bus

The BTB is informed that its prediction was wrong, and that information becomes part of its future predictions. During these processes, the retirement unit checks the circular buffer to see whether the head uop has been carried out.

If it hasn't, the retirement unit keeps checking until it has. The retirement unit then checks the second and third uops.

If they have also been executed, all three results are sent to the store buffer.

When they arrive there, the retirement unit checks them again before they are sent to the main on-board memory.
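
In-order retirement from the head of the buffer can be sketched as below. The sketch retires as many of the first three uops as have finished, stopping at the first unfinished one; the text describes the case where all three are complete. The dictionary layout and the `retire` function are assumptions for illustration.

```python
from collections import deque


def retire(buffer, store_buffer, width=3):
    """Retire up to `width` finished uops from the head, in order.

    Stops at the first uop that has not been carried out yet, so
    results always reach the store buffer in program order.
    """
    retired = 0
    while buffer and retired < width and buffer[0]["done"]:
        store_buffer.append(buffer.popleft()["result"])
        retired += 1
    return retired


buffer = deque([
    {"name": "u1", "done": True,  "result": 5},
    {"name": "u2", "done": True,  "result": 8},
    {"name": "u3", "done": False, "result": None},
    {"name": "u4", "done": True,  "result": 1},
])
store_buffer = []
print(retire(buffer, store_buffer), store_buffer)   # 2 [5, 8]
buffer[0]["done"], buffer[0]["result"] = True, 13   # u3 finishes later
print(retire(buffer, store_buffer), store_buffer)   # 2 [5, 8, 13, 1]
```

Although `u4` finished before `u3`, its result waits behind `u3`, so the out-of-order execution stays invisible to the rest of the system.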