How a Computer Works (2015)
12. CPU
Central Processing Unit
The central processing unit (CPU) is the heart of any computer. The CPU performs many tasks on data, and most data will pass through the CPU at some stage.
The CPU performs calculations on data, routes data to memory and organizes data. The chances are your computer will contain a CPU from either Intel or AMD.
Intel’s Pentium processors are amongst the most widely used.
CPUs are available at different speeds. Higher-speed CPUs process data faster, so the computer performs tasks faster.
Clock
The CPU is attached to the motherboard. A quartz crystal clock generates timing pulses that are fed into the CPU and other microchips. These timing pulses keep the CPU data processing in step with all the other microchips on the motherboard. Because of the huge amount of heat generated by the CPU, a fan (not shown) keeps it cool.
Intel Pentium
Here we describe the internal operation of an Intel Pentium processor.
The Intel Pentium CPU microprocessor uses millions of transistors on its two silicon circuits.
One circuit is the main CPU. The second is a memory cache, named L2. Both silicon circuits are embedded in one package with connection leads underneath.
The CPU and memory cache are 64 bits wide. Data bits move around the Pentium at up to 100 MHz.
Each data bit's movement is controlled by a clock pulse, so all movements happen at the same time.
The timing cycle ensures data moves around at the same speed.
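As a rough worked example of the figures above, the sketch below derives the clock period and a peak transfer rate for a 64-bit path clocked at 100 MHz. It assumes one full-width transfer per clock pulse, which the text does not state, and the function names are illustrative only.

```python
def clock_period_ns(frequency_hz: float) -> float:
    """Time between two clock pulses, in nanoseconds."""
    return 1e9 / frequency_hz


def peak_bytes_per_second(width_bits: int, frequency_hz: float) -> float:
    """Upper bound on throughput, assuming one full-width transfer per pulse."""
    return (width_bits / 8) * frequency_hz


FREQUENCY_HZ = 100_000_000   # 100 MHz, from the text
WIDTH_BITS = 64              # 64-bit data path, from the text

period = clock_period_ns(FREQUENCY_HZ)
peak = peak_bytes_per_second(WIDTH_BITS, FREQUENCY_HZ)
print(f"clock period: {period:.1f} ns")
print(f"peak rate: {peak / 1e6:.0f} MB/s")
```

Under that assumption, the clock ticks every 10 ns and the path can move at most 800 MB per second. Real throughput would be lower whenever a transfer takes more than one pulse.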
Internal Pentium Diagram
BIU
Data is read into the BIU and placed in the L2 cache
BIU copies data to L1 caches
When data bits reach the CPU, they pass to the CPU's bus interface unit (BIU). Once the BIU receives information, it makes a copy of it.
One copy is sent to the L2 memory cache, the other to the L1 memory caches on the main CPU silicon circuit.
There are several L1 memory caches on the main CPU's silicon circuit, ranging in size from 8 to 16 KB.
The BIU sends code to the L1 instruction cache, or I-cache. Data is sent to the data cache (D-cache) to be used by the code.
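The routing described above can be pictured with a small model. This is only a sketch: the class, its field names and the is_code flag are hypothetical, and it ignores cache sizes and eviction entirely.

```python
from dataclasses import dataclass, field


@dataclass
class BusInterfaceUnit:
    """Toy BIU: every incoming item is copied to L2 and to one L1 cache."""
    l2_cache: dict = field(default_factory=dict)
    l1_icache: dict = field(default_factory=dict)   # instruction code
    l1_dcache: dict = field(default_factory=dict)   # data used by the code

    def receive(self, address: int, value: str, is_code: bool) -> None:
        self.l2_cache[address] = value              # one copy goes to L2
        target = self.l1_icache if is_code else self.l1_dcache
        target[address] = value                     # the other copy goes to L1


biu = BusInterfaceUnit()
biu.receive(0x100, "ADD", is_code=True)    # code lands in the I-cache
biu.receive(0x200, "42", is_code=False)    # data lands in the D-cache
print(biu.l1_icache, biu.l1_dcache, biu.l2_cache)
```

How the real hardware tells code from data is not covered in the text; the flag here is just a stand-in for that decision.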
BTB
The Fetch/Decode Unit pulls data from the L1 I-Cache; the BTB also has this data
The fetch/decode unit pulls instruction code from the I-cache. At the same time, the branch target buffer (BTB) compares each instruction code with a record in a separate memory buffer to see if it has been used before.
The BTB is searching for any code which involves branching. This is because the program code could follow separate paths.
If the BTB finds a branch-type instruction, it predicts where the program will go. The BTB does this from past experience; its predictions are over 90 percent accurate.
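To illustrate prediction "from past experience", here is a minimal sketch that uses a two-bit saturating counter per branch address. That is a common textbook scheme swapped in for illustration; the text does not say how the Pentium's BTB actually makes its predictions, and the class, addresses and branch pattern below are all made up.

```python
from typing import Optional


class BranchTargetBuffer:
    """Toy BTB: remembers, per branch address, a 2-bit counter and a target."""

    def __init__(self):
        self.entries = {}   # branch address -> (counter 0..3, target address)

    def predict(self, address: int) -> Optional[int]:
        """Return the predicted target, or None to predict 'not taken'."""
        counter, target = self.entries.get(address, (0, 0))
        return target if counter >= 2 else None

    def update(self, address: int, taken: bool, target: int) -> None:
        """Record the real outcome so later predictions improve."""
        counter, old_target = self.entries.get(address, (0, target))
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
            target = old_target              # keep the last known target
        self.entries[address] = (counter, target)


btb = BranchTargetBuffer()
outcomes = ([True] * 9 + [False]) * 10   # a loop branch: taken 9 times, then exits
correct = 0
for taken in outcomes:
    predicted_taken = btb.predict(0x40) is not None
    correct += predicted_taken == taken
    btb.update(0x40, taken, target=0x10)
print(f"{correct} of {len(outcomes)} predictions correct")
```

On this regular pattern the toy predictor is mostly right after a short warm-up, which is the general idea behind the accuracy figure quoted above. Real code is less regular, and real predictors are more elaborate.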
Reorder Buffer (ROB)
Data is sent to the ROB
The Dispatch/Execute unit checks and executes code
Three decoders working in parallel break up the larger instructions into uops (mu-ops), which are smaller, 274-bit micro-operations.
The dispatch/execute unit processes uops faster than it could a single higher-level instruction.
The decode unit sends all uops to the Reorder Buffer (ROB), also called the instruction pool. The Reorder Buffer contains two arithmetic logic units (ALUs).
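The idea of breaking a larger instruction into uops can be sketched as a table lookup. The instruction names and the uops they expand to are hypothetical and are not real Pentium encodings; the loop only mimics three decoders by taking instructions in groups of three.

```python
# Hypothetical decode table: one complex instruction -> several simple uops.
DECODE_TABLE = {
    "ADD_MEM_REG": ["LOAD tmp, [mem]", "ADD tmp, reg", "STORE [mem], tmp"],
    "INC_REG": ["ADD reg, 1"],
}


def decode(instructions, decoders=3):
    """Decode in groups of `decoders`, mimicking decoders working side by side."""
    uops = []
    for start in range(0, len(instructions), decoders):
        group = instructions[start:start + decoders]
        for instruction in group:          # in hardware these run in parallel
            uops.extend(DECODE_TABLE[instruction])
    return uops


program = ["ADD_MEM_REG", "INC_REG", "INC_REG", "ADD_MEM_REG"]
uops = decode(program)
for uop in uops:
    print(uop)
```

Each uop does one simple thing (load, add or store), which is what lets the later units handle them quickly and independently.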
The ALUs handle all integer number calculations and contain the uops in the order the BTB predicted.
Dispatch/Execute
The ALUs use a circular buffer with a head and tail to mark the beginning and end of the line of uops. From here, the dispatch/execute unit checks each uop in the buffer to see whether all the information needed to process it is there.
If the code is valid, the dispatch/execute unit carries out the code. When a uop requires data bits from memory, the execute unit skips over it.
The CPU looks for the information in the L1 memory cache. If the data bits are not there, the L2 memory cache is checked.
Memory Cache
If neither memory cache holds the data bits, they are retrieved from the main on-board memory. Going to the on-board memory slows down data processing, as retrieving data from the on-chip caches is much faster.
While data bits are being fetched from memory, the execute unit continues inspecting each uop in the buffer.
When a valid micro-op has all the required information, the unit processes it and stores the results in the uop itself.
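The search order L1, then L2, then main memory can be sketched as follows. The cycle costs are invented purely to show the relative slowdown, and copying a value into the caches after a miss is an assumption the text does not spell out.

```python
# Illustrative access costs in clock cycles; these numbers are assumptions.
L1_COST, L2_COST, MEMORY_COST = 1, 10, 100


def read(address, l1, l2, memory):
    """Return (value, cycles) for a read that searches L1, then L2, then memory."""
    if address in l1:
        return l1[address], L1_COST
    if address in l2:
        l1[address] = l2[address]                  # keep a copy closer to the CPU
        return l1[address], L1_COST + L2_COST
    value = memory[address]                        # slowest path
    l2[address] = value
    l1[address] = value
    return value, L1_COST + L2_COST + MEMORY_COST


memory = {0x10: 7, 0x20: 9}
l1, l2 = {}, {0x20: 9}

first = read(0x10, l1, l2, memory)    # found only in main memory
second = read(0x10, l1, l2, memory)   # now served from L1
third = read(0x20, l1, l2, memory)    # found in L2
print(first, second, third)
```

The first read pays for all three lookups, while the repeat read of the same address is served from L1 at a fraction of the cost.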
The code is then marked as complete.
Floating-Point Math Unit
The floating point maths unit (FPMU) performs calculations
The execute unit moves on to the next uop in line. This method of processing is called speculative execution, as the order of uops in the circular buffer is based on the BTB's branch predictions.
When the end of the buffer is reached, the execution unit starts at the beginning (head) again and checks all the uops to see whether any have received the data bits they need to be executed. During instruction processing, if a floating-point number is found, the ALUs pass it to the floating-point math unit, which contains circuits optimised to process floating-point numbers quickly.
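The scan-skip-wrap behaviour of the execute unit can be sketched with a short loop. A plain list stands in for the circular buffer, restarting the inner loop models wrapping from the end back to the head, and the ready_at field is a made-up way of saying when a uop's data arrives. Dependencies between uops are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    ready_at: int          # pass number on which its data has arrived (assumed)
    done: bool = False


def run(buffer):
    """Scan the buffer repeatedly, executing every uop whose data is available."""
    order = []
    passes = 0
    while not all(uop.done for uop in buffer):
        for uop in buffer:                     # head to tail, then wrap around
            if uop.done or uop.ready_at > passes:
                continue                       # skip: finished or still waiting
            uop.done = True                    # "execute" and mark as complete
            order.append(uop.name)
        passes += 1
    return order


buffer = [Uop("load", ready_at=2), Uop("add", ready_at=0),
          Uop("mul", ready_at=1), Uop("sub", ready_at=0)]
order = run(buffer)
print(order)
```

The uop at the head is the last to finish because its data arrives late, yet the unit stays busy with the uops behind it in the meantime.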
JEU
The JEU unit in action
Delayed uop instructions are processed and the execute unit compares the results with those predicted by the BTB.
If a prediction turns out to be wrong, the jump execution unit (JEU) moves the end marker from the last uop in line to the uop that was predicted incorrectly.
All uops behind the end marker can be ignored and overwritten by new uops.
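Moving the end marker can be pictured with a small ring buffer. This sketch assumes the marker ends up just past the branch whose prediction failed, so that everything queued after it is abandoned; the class and the uop names are hypothetical.

```python
class CircularBuffer:
    """Fixed-size ring of uop slots with head (oldest) and tail (next free) markers."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0
        self.tail = 0
        self.count = 0

    def push(self, uop):
        if self.count == len(self.slots):
            raise OverflowError("buffer full")
        self.slots[self.tail] = uop
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1

    def flush_after(self, branch):
        """Move the end marker back so it sits just past the given branch uop."""
        position, kept = self.head, 0
        while kept < self.count:
            kept += 1
            if self.slots[position] == branch:
                break
            position = (position + 1) % len(self.slots)
        else:
            raise ValueError("branch not in buffer")
        self.tail = (position + 1) % len(self.slots)
        self.count = kept

    def live(self):
        """Uops between head and tail, oldest first."""
        return [self.slots[(self.head + i) % len(self.slots)]
                for i in range(self.count)]


rob = CircularBuffer(8)
for uop in ["cmp", "jump_if_zero", "guess_1", "guess_2"]:
    rob.push(uop)
rob.flush_after("jump_if_zero")   # the guess turned out to be wrong
rob.push("correct_1")             # overwrites the slot that held guess_1
print(rob.live())
```

Nothing is erased when the marker moves. The stale uops simply sit outside the live region until new uops overwrite them, which is why the recovery is cheap.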
Retirement Unit
Data is sent to the retirement unit
The retirement unit sends data to the store buffer
The completed data is sent back to the BIU
Data is placed onto the CPU's system bus
The BTB is informed that its prediction was wrong, and that information becomes part of its future predictions. During these processes, the retirement unit checks the circular buffer to see whether the head uop has been carried out.
If it hasn't, the retirement unit keeps checking until it has. The retirement unit then checks the second and third uops.
If they have also been executed, all three results are sent to the store buffer.
When they arrive there, the retirement unit checks them again before they are sent to the main on-board memory.
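In-order retirement can be sketched as below. The limit of three uops per pass follows the description above; retiring fewer than three when a later uop is still waiting is an assumption, since the text only covers the case where all three are complete. The dictionary layout is illustrative.

```python
from collections import deque


def retire(buffer, store_buffer, width=3):
    """Move up to `width` finished uops from the head of `buffer`, in order."""
    retired = 0
    while buffer and retired < width and buffer[0]["done"]:
        store_buffer.append(buffer.popleft()["result"])
        retired += 1
    return retired


buffer = deque([
    {"result": 1, "done": True},
    {"result": 2, "done": True},
    {"result": 3, "done": False},   # still waiting, so it blocks later uops
    {"result": 4, "done": True},
])
store_buffer = []
first_pass = retire(buffer, store_buffer)    # retires results 1 and 2
buffer[0]["done"] = True                     # the delayed uop finishes
second_pass = retire(buffer, store_buffer)   # retires results 3 and 4
print(store_buffer)
```

Although the uops may have executed out of order, their results reach the store buffer in the original program order, because only the head of the buffer is ever allowed to retire.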