Unlike pipelining alone, exploiting more parallelism requires additional hardware resources.
Instruction-Level Parallelism (ILP)
- Instruction-level parallelism: parallelism among instructions
- Pipelining is one form of ILP, because the pipeline executes multiple instructions in parallel
- To increase ILP
- Deeper pipeline
- Less work per stage => shorter clock cycle
- Multiple issue (start multiple instructions in one clock)
- Replicate pipeline stages => multiple pipelines
- Start multiple instructions per clock cycle
- CPI (cycles per instruction) < 1, so we use Instructions Per Cycle (IPC) instead
- E.g., for a 4 GHz 4-way multiple-issue processor, the peak rate is 4 GHz × 4 = 16 BIPS (billion instructions per second), i.e., peak CPI = 0.25 and peak IPC = 4; dependencies reduce this in practice.
Two key responsibilities of multiple issue
- Packaging instructions into issue slots
- How many instructions can be issued
- Which instructions should be issued
- Dealing with data and control hazards
Multiple Issue
- Static multiple issue – decision made by compiler
- Compiler groups instructions to be issued together
- Packages them into “issue slots”
- Compiler detects and avoids hazards
- Dynamic multiple issue – decision made by processor
- CPU examines instruction stream and chooses instructions to issue each cycle
- Compiler can help by reordering instructions
- CPU resolves hazards using advanced techniques at runtime
Static Multiple Issue
- Compiler groups instructions into “issue packets”
- Group of instructions that can be issued on a single cycle
- Determined by pipeline resources required
- Think of an issue packet as a very long instruction
- Specifies multiple concurrent operations
- => Very Long Instruction Word (VLIW)
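As a rough illustration (the braces and slot layout are just notation for this sketch, not a real encoding), a two-operation issue packet can be viewed as one long instruction that specifies two independent operations, e.g. an ALU operation and a memory operation issued together:

  { addu $t0, $s1, $s2  |  lw $t1, 0($s3) }    one VLIW-style packet: two concurrent operations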
Dynamic Multiple Issue
- The decision is made by the processor during execution
- also called “Superscalar” processors
- CPU decides whether to issue 0, 1, 2, … instructions each cycle
- Avoiding structural and data hazards
- No need for compiler scheduling
- Though it may still help
- Code semantics ensured by the CPU
Why do we need dynamic multiple issue?
- Not all stalls are predictable
- e.g., cache misses
- Can’t always schedule around branches
- Branch outcome is dynamically determined
- Different implementations of an ISA have different latencies and hazards
MIPS with Static Dual Issue
Two-issue packets
- Divide instructions into two types (based on whether they access memory):
- Type 1: ALU or branch instructions
- Type 2: load or store instructions
- In each cycle, issue one Type 1 and one Type 2 instruction simultaneously
- The pairing must avoid data hazards
More instructions executing in parallel
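As a sketch of the pairing rule (registers and the particular grouping are illustrative), a short sequence maps onto the two slots like this, with a nop filling a slot when no instruction of the right type is available:

  cycle   ALU/branch slot           load/store slot
    1     addu $t0, $s0, $s1        lw   $t2, 0($s2)
    2     addu $t3, $s4, $s5        sw   $t0, 4($s2)      ($t0 forwarded from cycle 1)
    3     nop                       sw   $t3, 8($s2)      (no ALU/branch instruction left to pair)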
EX data hazard
Forwarding cannot be used to avoid the stalls caused by dependences between the instructions in a packet.
- Forwarding avoided stalls with single-issue
- Now can’t use ALU result in load/store in same packet
Load-use hazard
- Still one cycle of use latency, but now two instructions (the whole next packet) are affected
More aggressive scheduling is required to realize the improvement from multiple issue; see the sketch below.
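A minimal sketch of the two hazards above under dual issue (registers are illustrative; real code would try to fill the empty slots with independent work):

  EX data hazard within a packet – the store may not use the ALU result produced in the same cycle:
    addu $t0, $s0, $s1  |  sw $t0, 0($s2)      not allowed in one packet; the sw must move to the next packet

  Load-use hazard – no instruction in the next packet may use the loaded value:
    cycle 1:  lw   $t2, 0($s3)    |  nop
    cycle 2:  nop                 |  nop        $t2 still not available to this packet
    cycle 3:  addu $t4, $t2, $t5  |  nop        earliest packet that can use $t2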
Scheduling Static Multiple Issue
Compiler must remove some/all hazards
Reorder instructions into issue packets
No dependencies within a packet
Possibly some dependencies between packets
- Varies between ISAs; compiler must know!
Pad with nop if necessary
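For instance, here is a sketch of scheduling a simple loop (it adds the value in $s2 to each array element indexed by $s1; the registers and the particular schedule are illustrative) into dual-issue packets:

  Original loop:
  Loop: lw   $t0, 0($s1)         load array element
        addu $t0, $t0, $s2       add scalar in $s2
        sw   $t0, 0($s1)         store result
        addi $s1, $s1, -4        decrement pointer
        bne  $s1, $zero, Loop    branch if not done

  One possible dual-issue schedule (5 instructions in 4 cycles, IPC = 1.25):
        ALU/branch slot           load/store slot        cycle
  Loop: nop                       lw   $t0, 0($s1)         1
        addi $s1, $s1, -4         nop                      2
        addu $t0, $t0, $s2        nop                      3
        bne  $s1, $zero, Loop     sw   $t0, 4($s1)         4    (offset adjusted because $s1 was already decremented)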

Loop Unrolling
- Replicate loop body to expose more parallelism
- Reduces loop-control overhead
- Use different registers per replication
- Called “register renaming”
- Avoid loop-carried “anti-dependencies”
- Store followed by a load of the same register
- Aka “name dependence” – Reuse of a register name
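Continuing the loop sketch above, unrolling it four times and renaming the temporaries to $t0–$t3 exposes enough independent work to fill most slots (offsets assume 4-byte elements; this is one possible schedule, not the only one):

        ALU/branch slot           load/store slot        cycle
  Loop: addi $s1, $s1, -16        lw   $t0, 0($s1)         1
        nop                       lw   $t1, 12($s1)        2
        addu $t0, $t0, $s2        lw   $t2, 8($s1)         3
        addu $t1, $t1, $s2        lw   $t3, 4($s1)         4
        addu $t2, $t2, $s2        sw   $t0, 16($s1)        5
        addu $t3, $t3, $s2        sw   $t1, 12($s1)        6
        nop                       sw   $t2, 8($s1)         7
        bne  $s1, $zero, Loop     sw   $t3, 4($s1)         8

  14 instructions in 8 cycles, IPC = 1.75. The renamed registers $t0–$t3 remove the name dependences between the four copies of the loop body (the lw in cycle 1 still sees the old $s1, since an ALU result is not visible within its own packet; later offsets are adjusted for the new $s1).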

Dynamic Pipeline Scheduling
Hardware support for reordering instruction execution
Allows the CPU to execute instructions out of order to avoid stalls
- But results are committed to registers in program order
Example
lw   $t0, 20($s2)     # load
addu $t1, $t0, $t2    # depends on the lw result
sub  $s4, $s4, $t3    # independent of lw/addu
slti $t5, $s4, 20     # depends only on sub
The CPU can start sub while addu is waiting for lw.
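One possible ordering for this sequence (a sketch; the actual timing depends on the implementation and on how long the lw takes):

  program (issue) order:    lw, addu, sub, slti
  execution (start) order:  lw, sub, slti, addu     addu waits until the lw result is available
  commit order:             lw, addu, sub, slti     results written to registers in program order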

Speculation
- “Guess” what to do with an instruction
- Start operation as soon as possible
- Check whether guess was right
- If so, complete the operation
- If not, roll-back and do the right thing
- Common to static and dynamic multiple issue
- Examples:
- Speculate on branch outcome
- Roll back if path taken is different
- Speculate on a load
- Roll back if the loaded location is updated (e.g., by an intervening store)
Compiler/Hardware Speculation
- Compiler can reorder instructions
- e.g., move load before branch
- Can include “fix-up” instructions to recover from incorrect guess
- Hardware can look ahead for instructions to execute
- Buffer results until it determines they are actually needed
- Flush buffers on incorrect speculation
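A minimal sketch of the compiler-speculation case (the label and registers are illustrative): the load is hoisted above the branch so it can start earlier; the compiler must ensure $t0 is not live on the taken path (or emit fix-up code), and must consider whether the speculative load could fault (next slide):

  Original code – the load waits for the branch:
        beq  $s0, $zero, Skip
        lw   $t0, 0($s1)
  Skip: ...

  After compiler speculation – the load is hoisted above the branch:
        lw   $t0, 0($s1)          speculative: result unused if the branch is taken
        beq  $s0, $zero, Skip
  Skip: ...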
Speculation and Exceptions
- What if exception occurs on a speculatively executed instruction?
- e.g., speculative load before null-pointer check
- Static speculation
- Can add ISA support for deferring exceptions (the exception is defined and deferred at the ISA/assembly level)
- Dynamic speculation
- Can buffer exceptions until instruction completion (which may not occur)
Does Multiple Issue Work?
- Yes, but not as much as we’d like
- Programs have real dependencies that limit ILP
- Some dependencies are hard to eliminate
- e.g., pointer aliasing
- Some parallelism is hard to expose
- Limited window size during instruction issue
- Memory delays and limited bandwidth
- Hard to keep pipelines full
- Speculation can help if done well
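As a small illustration of the pointer-aliasing limit (registers are illustrative): the compiler would like to hoist the load above the store to hide its latency, but if $s1 and $s2 might point to the same word, reordering them would change the program's result:

  sw  $t0, 0($s1)        store through one pointer
  lw  $t1, 0($s2)        load through another pointer; may read the value just stored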
Power Efficiency (Power Wall)
- The complexity of dynamic scheduling and speculation requires significant power
- Multiple simpler cores may be better
Fallacies
- Pipelining is easy (!)
- The basic idea is easy
- The devil is in the details
- e.g., detecting data hazards
- Pipelining is independent of technology
- So why haven’t we always done pipelining?
- More transistors make more advanced techniques feasible
- Pipeline-related ISA design needs to take account of technology trends
- e.g., predicated instructions
Pitfalls
- Poor ISA design can make pipelining harder
- e.g., complex instruction sets (VAX, IA-32)
- Significant overhead to make pipelining work
- IA-32 micro-op approach
- e.g., complex addressing modes
- Register update side effects, memory indirection
- e.g., delayed branches
- Advanced pipelines have long delay slots
Concluding Remarks
ISA influences design of datapath and control
Datapath and control influence design of ISA
Pipelining improves instruction throughput using parallelism
- More instructions completed per second
- Latency for each instruction not reduced
Hazards: structural, data, control
Multiple issue and dynamic scheduling (ILP)
- Dependencies limit achievable parallelism
- Complexity leads to the power wall