Performance Issues
Longest delay determines clock period
- Critical path: load instruction
- Instruction memory → register file → ALU → data memory → register file
Not feasible to vary period for different instructions
Violates design principle
- Making the common case fast
We will improve performance by pipelining
MIPS Pipeline
Pipeline: an implementation technique in which multiple instructions are overlapped in execution
Five stages, one step per stage
- IF: Instruction fetch from memory
- ID: Instruction decode & register read
- EX: Execute operation or calculate address
- MEM: Access memory operand
- WB: Write result back to register
Performance: the slowest stage determines the clock period
Speedup:
- If all stages are balanced (all take the same time)
- $\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
- If the stages are not balanced
- Time between instructions equals the time of the slowest stage, so the speedup is less than the number of stages
Speedup comes from increased throughput; the latency of an individual instruction does not decrease (see the worked example below).
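A worked example with illustrative numbers: suppose a single-cycle implementation takes 800 ps per instruction, and the five stage delays are 200, 100, 200, 200, and 100 ps.
- The pipelined clock period is set by the slowest stage: 200 ps, not $800/5 = 160$ ps
- Speedup over a long sequence of instructions $\approx 800/200 = 4$, less than the ideal factor of 5 because the stages are unbalanced
- The latency of one instruction actually grows to $5 \times 200 = 1000$ ps; only throughput improves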
Pipelining and ISA Design
Compared with CISC instruction sets such as x86:
- MIPS ISA designed for pipelining
- All instructions are 32-bits
- Easier to fetch and decode in one cycle
- Few and regular instruction formats
- Can decode and read registers in one step
- Load/store addressing
- Can calculate the address in the 3rd stage and access memory in the 4th stage (see the sketch after this list)
- Alignment of memory operands
- Memory access takes only one cycle
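As a concrete sketch, here is how one load instruction maps onto the five stages; the registers and offset are chosen arbitrarily for illustration:
lw $t0, 8($s1)
- IF: fetch the 32-bit instruction from instruction memory
- ID: decode it and read base register $s1
- EX: compute the address $s1 + 8
- MEM: read the aligned word from data memory
- WB: write the loaded value into $t0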
Hazards
Situations that prevent starting the next instruction in the next cycle
Structure hazards
- A required resource is busy
- e.g., a data read/write (RAM) conflicting with an instruction fetch (ROM) in the same cycle
- Mitigated by separate instruction/data memories or caches
Data hazards
- Need to wait for a previous instruction to complete its data read/write
- e.g., reading a register immediately after the instruction that writes it
- Mitigated by forwarding and code scheduling
Control hazards
- The decision about which instruction to fetch next depends on a previous instruction
- e.g., waiting for a branch/jump outcome to be computed
- Mitigated by branch prediction
Structure Hazards
- Conflict for use of a resource
- If the MIPS pipeline had only one memory (instructions and data together), then
- A load/store requires a data access
- The instruction fetch in the same cycle would have to stall (see the sketch below)
- Hence, pipelined datapaths require separate instruction/data memories
- Or separate instruction/data caches
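A sketch of the conflict if instructions and data shared one memory (the instructions and cycle numbers are illustrative):
lw  $t0, 0($s0)     # its MEM stage (cycle 4) reads data from the shared memory
add $t1, $t2, $t3   # no memory operand
sub $t4, $t5, $t6   # no memory operand
and $t7, $t8, $t9   # its IF stage also falls in cycle 4 → the fetch must stall for one cycle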
Data Hazards
An instruction depends on completion of a data access by a previous instruction
add $s0, $t0, $t1
sub $t2, $s0, $t3
Reading $s0 immediately after it is written causes a data hazard, inserting two bubbles (assuming the register file is written in the first half of a cycle and read in the second half); see the timing sketch below.
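A cycle-by-cycle sketch of the example above, without forwarding (write-then-read of the register file in the same cycle assumed):
add $s0, $t0, $t1   # IF  ID  EX  MEM WB              ($s0 written back in cycle 5)
sub $t2, $s0, $t3   #     IF  --  --  ID  EX  MEM WB  (ID waits until cycle 5 to read $s0: two bubbles)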

Control Hazards
Branch determines flow of control
Fetching next instruction depends on branch outcome

The pipeline can’t always fetch the correct instruction
- It may still be working on the ID stage of the branch
In MIPS pipeline
- Need to compare registers and compute target early in the pipeline
- Add hardware to do it in ID stage
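A small example of the dependence (the label and registers are illustrative):
beq $s0, $s1, Else   # outcome known only after the registers are compared
add $t0, $t1, $t2    # fetch this only if the branch is not taken
...
Else: sub $t0, $t1, $t2   # fetch from here if the branch is taken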
Mitigating Hazards
Forwarding (aka Bypassing) [for data hazards]
Forwarding helps resolve data hazards
Core idea: use a result as soon as it is computed
- Don’t wait for it to be stored in a register
- Requires extra connections in the datapath
- Add a bypass path from the output of the EX stage back to the ALU input for a following instruction
- i.e., extra wiring carries the value directly between pipeline stages

Can’t always avoid stalls by forwarding
- If value not computed when needed
- Can’t forward backward in time
Stalls caused by loads from memory cannot be avoided entirely: the loaded value is available only after the MEM stage (not after EX), so forwarding removes just one of the two bubbles, leaving a one-cycle load-use stall (see the sketch below).
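A sketch of the load-use case (registers are illustrative): even with forwarding out of MEM, one bubble remains.
lw  $s0, 0($t1)     # IF  ID  EX  MEM WB          ($s0 available only at the end of MEM, cycle 4)
sub $t2, $s0, $t3   #     IF  ID  --  EX  MEM WB  (EX delayed one cycle, then $s0 is forwarded)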

Code Scheduling to Avoid Stalls [for data hazards]
- Reorder code to avoid using a load result in the immediately following instruction (avoid the “load then immediately use” pattern)
In other words, try not to consume a value in the instruction right after the load that produces it (example below).
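A sketch in the spirit of the classic textbook example (register and offset choices are illustrative). Before scheduling, there are two load-use stalls:
lw  $t1, 0($t0)
lw  $t2, 4($t0)
add $t3, $t1, $t2   # stall: $t2 was loaded by the previous instruction
sw  $t3, 12($t0)
lw  $t4, 8($t0)
add $t5, $t1, $t4   # stall: $t4 was loaded by the previous instruction
sw  $t5, 16($t0)
After moving the third lw up to fill the load-use slot, both stalls disappear:
lw  $t1, 0($t0)
lw  $t2, 4($t0)
lw  $t4, 8($t0)
add $t3, $t1, $t2   # $t2 has been ready for a cycle
sw  $t3, 12($t0)
add $t5, $t1, $t4   # $t4 is ready
sw  $t5, 16($t0)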

Branch Prediction
Avoids stalls probabilistically; it cannot eliminate them completely
- Longer pipelines can’t readily determine branch outcome early
- Stall penalty becomes unacceptable
- Predict outcome of branch
- Stall only if prediction is wrong
- In MIPS pipeline
- Can predict branches not taken
- Fetch instruction after branch, with no delay
More-Realistic Branch Prediction
Static branch prediction
The compiler predicts the outcome before run time, based on the structure of the code
- Based on typical branch behavior
- Example: loop and if-statement branches
- Predict backward branches taken
- Predict forward branches not taken
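Two illustrative snippets for these heuristics (labels and registers are made up):
Loop: ...
      bne  $t0, $t1, Loop    # backward branch closing a loop → predict taken
      beq  $s0, $zero, Skip  # forward branch around an if-body → predict not taken, keep fetching the fall-through
      addi $t2, $t2, 1
Skip: ...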
Dynamic branch prediction
Hardware-based prediction: track each branch’s history and assume it will repeat
- Hardware measures actual branch behavior
- Records the recent history of each branch
- Assume future behavior will continue the trend
- When wrong, stall while re-fetching, and update history
Summary
- Pipelining improves performance by increasing instruction throughput
- Executes multiple instructions in parallel
- Each instruction has the same latency
- Subject to hazards
- Structure, data, control
- Instruction set design affects complexity of pipeline implementation