Measuring Cache Performance
Components of CPU time
- Program execution cycles 【execution】
- Includes cache hit time
- Memory stall cycles 【IO】
- Mainly from cache misses
With simplifying assumptions:
$\text{Memory stall cycles} = \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}$
!> Note that every instruction makes an instruction-memory access (100%), while data-memory accesses occur only for the fraction of instructions that are loads/stores.
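As a concrete instance of this formula, here is a small C sketch. The workload numbers (base CPI = 2, 100-cycle miss penalty, 2% I-cache and 4% D-cache miss rates, 36% loads/stores) are assumed values, chosen so the result matches the 3.44 and 5.44 figures quoted in the Performance Summary below.

```c
#include <stdio.h>

/* Memory stall cycles per instruction =
 *   (accesses/instruction) x miss rate x miss penalty,
 * summed over instruction fetches and data accesses.   */
int main(void) {
    double base_cpi     = 2.0;   /* assumed: CPI with a perfect cache */
    double miss_penalty = 100.0; /* assumed: cycles per miss          */
    double icache_miss  = 0.02;  /* assumed: I-cache miss rate        */
    double dcache_miss  = 0.04;  /* assumed: D-cache miss rate        */
    double ld_st_frac   = 0.36;  /* assumed: fraction of loads/stores */

    /* Every instruction makes one instruction fetch (100% access rate);
     * only loads/stores access data memory.                            */
    double i_stalls = 1.0        * icache_miss * miss_penalty;
    double d_stalls = ld_st_frac * dcache_miss * miss_penalty;

    double stall_cpi  = i_stalls + d_stalls;   /* 3.44 */
    double actual_cpi = base_cpi + stall_cpi;  /* 5.44 */

    printf("stall cycles/instr = %.2f, actual CPI = %.2f\n",
           stall_cpi, actual_cpi);
    printf("fraction of time stalled = %.0f%%\n",
           100.0 * stall_cpi / actual_cpi);    /* ~63% */
    return 0;
}
```

Instruction fetches contribute 0.02 × 100 = 2 stall cycles per instruction and data accesses 0.36 × 0.04 × 100 = 1.44, giving the 3.44 used below.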
Average Access Time
- Hit time is also important for performance
- Average Memory Access Time (AMAT)
- $AMAT = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$
Example:
- CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
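Working this example through the formula, with all values taken from the bullet above: $AMAT = 1 + 0.05 \times 20 = 2$ cycles, i.e. 2 ns per access with the 1 ns clock.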
Performance Summary
- When CPU performance increases
- Miss penalty becomes more significant
- Base CPI = 2, memory stall cycles/instruction = 3.44: stalls are 3.44/5.44 = 63% of total time
- Base CPI = 1, memory stall cycles/instruction = 3.44: stalls are 3.44/4.44 = 77% of total time
- Decreasing base CPI
- Greater proportion of time spent on memory stalls
- Increasing clock rate
- Memory stalls account for more CPU cycles
- Can’t neglect cache behavior when evaluating system performance
Direct Mapped Cache
- Location determined by address
- Direct mapped: only one choice
- Capacity of the cache may not be fully exploited
- Miss rate can be high due to conflict misses, even while other entries sit unused
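A minimal C sketch of how a direct-mapped cache derives the one possible location from an address; the 64-byte blocks and 1024-block capacity are assumed values for illustration.

```c
#include <stdio.h>

/* Assumed illustrative geometry: 64-byte blocks, 1024-block cache. */
#define BLOCK_BYTES 64u
#define NUM_BLOCKS  1024u

int main(void) {
    unsigned addr = 0x12345678u;                    /* arbitrary example address   */

    unsigned block_addr = addr / BLOCK_BYTES;       /* drop the byte offset        */
    unsigned offset     = addr % BLOCK_BYTES;       /* byte within the block       */
    unsigned index      = block_addr % NUM_BLOCKS;  /* the one possible cache slot */
    unsigned tag        = block_addr / NUM_BLOCKS;  /* stored to identify the block */

    printf("tag = 0x%x, index = %u, block offset = %u\n", tag, index, offset);
    return 0;
}
```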
Associative Cache Example

Associative Caches
- Fully associative
- Allow a given block to go in any cache entry
- Requires all entries to be searched at once
- Comparator per entry (expensive)
- n-way set associative
- Each set contains n entries
- Block number determines which set
- (Block number) modulo (#Sets in cache)
- Search all entries in a given set at once
- n comparators (less expensive)
- Trade off:
- Increased associativity decreases miss rate
- But with diminishing returns
Example: compare the hit/miss behavior of the same block reference stream under different associativities (see the sketch below).
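Since the worked example itself does not appear in these notes, the sketch below counts misses for one hypothetical stream of block-number references (0, 8, 0, 6, 8) on a 4-block cache organized as direct-mapped, 2-way set associative, and fully associative, with LRU replacement inside each set. The reference stream and cache size are assumed values for illustration.

```c
#include <stdio.h>

#define NUM_BLOCKS 4  /* assumed: tiny 4-block cache for illustration */

/* Count misses for a block-number reference stream on a cache with
 * NUM_BLOCKS entries split into `num_sets` sets (ways = NUM_BLOCKS/num_sets),
 * using LRU replacement within each set.                                     */
static int count_misses(const int *refs, int n, int num_sets) {
    int ways = NUM_BLOCKS / num_sets;
    int tag[NUM_BLOCKS];        /* tag[set*ways + way] = block number, -1 = empty */
    int last_use[NUM_BLOCKS];
    int time = 0, misses = 0;

    for (int i = 0; i < NUM_BLOCKS; i++) { tag[i] = -1; last_use[i] = -1; }

    for (int r = 0; r < n; r++) {
        int set = refs[r] % num_sets;      /* (block number) modulo (#sets) */
        int hit = 0, victim = 0;
        for (int w = 0; w < ways; w++) {
            int slot = set * ways + w;
            if (tag[slot] == refs[r]) { hit = 1; last_use[slot] = time; break; }
            if (last_use[slot] < last_use[set * ways + victim]) victim = w;
        }
        if (!hit) {                        /* miss: fill/evict the LRU way */
            int slot = set * ways + victim;
            tag[slot] = refs[r];
            last_use[slot] = time;
            misses++;
        }
        time++;
    }
    return misses;
}

int main(void) {
    int refs[] = {0, 8, 0, 6, 8};          /* assumed reference stream */
    int n = (int)(sizeof refs / sizeof refs[0]);

    printf("direct mapped     : %d misses\n", count_misses(refs, n, 4));
    printf("2-way set assoc.  : %d misses\n", count_misses(refs, n, 2));
    printf("fully associative : %d misses\n", count_misses(refs, n, 1));
    return 0;
}
```

Under these assumptions the direct-mapped cache misses on all five references, the 2-way cache on four, and the fully associative cache on three, showing how added associativity removes conflict misses.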


Replacement Policy
- Direct mapped: no choice
- Set associative
- Prefer non-valid entry, if there is one
- Otherwise, choose among entries in the set
- Least-recently used (LRU)
- Choose the one unused for the longest time
- Simple for 2-way, manageable for 4-way, too hard beyond that
- Random:
- Gives approximately the same performance as LRU for high associativity
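For 2-way set associativity, "simple" means LRU needs only one bit per set: whenever one way is referenced, the other way becomes the least-recently-used candidate. A minimal sketch with illustrative structure and function names:

```c
#include <stdbool.h>
#include <stdio.h>

/* Per-set state for a 2-way set-associative cache (illustrative sketch). */
struct set2 {
    int  tag[2];
    bool valid[2];
    int  lru;                 /* index (0 or 1) of the least-recently-used way */
};

/* After a hit or fill in `way`, the OTHER way becomes least recently used. */
static void touch(struct set2 *s, int way) { s->lru = 1 - way; }

/* On a miss, prefer a non-valid way; otherwise evict the LRU way. */
static int pick_victim(const struct set2 *s) {
    if (!s->valid[0]) return 0;
    if (!s->valid[1]) return 1;
    return s->lru;
}

int main(void) {
    struct set2 s = { {0, 0}, {false, false}, 0 };

    int v = pick_victim(&s);      /* -> way 0 (empty way preferred) */
    s.tag[v] = 7; s.valid[v] = true; touch(&s, v);

    v = pick_victim(&s);          /* -> way 1 (the other empty way) */
    s.tag[v] = 9; s.valid[v] = true; touch(&s, v);

    printf("next victim (LRU) = way %d\n", pick_victim(&s));  /* way 0 */
    return 0;
}
```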
Multilevel Caches
- Primary cache attached to CPU
- Small, but fast
- Level-2 cache services misses from primary cache
- Larger, slower, but still faster than main memory
- Main memory services L-2 cache misses
- Some high-end systems include L-3 cache
Example:
- Baseline: base CPI = 1, 4 GHz clock (0.25 ns cycle), primary cache miss rate = 2%/instruction, main memory access time = 100 ns
- With the primary cache alone: miss penalty = 100 ns / 0.25 ns = 400 cycles, so CPI = 1 + 0.02 × 400 = 9
- Now add L-2 cache
- Access time = 5ns
- Global miss rate to main memory = 0.5%
- Primary miss with L-2 hit
- Penalty = 5ns/0.25ns = 20 cycles
- Primary miss with L-2 miss
- Extra penalty = 400 cycles
- CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4
- Performance ratio = 9/3.4 = 2.6
Some problem variants state that the L1 hit time determines the clock cycle; in that case, simply use the L1 hit time as the clock cycle time.
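A small C sketch that reproduces the arithmetic of this example, using the baseline values listed above:

```c
#include <stdio.h>

int main(void) {
    double base_cpi     = 1.0;
    double clock_ns     = 0.25;    /* 4 GHz clock                        */
    double l1_miss_rate = 0.02;    /* primary cache misses / instruction */
    double l2_hit_ns    = 5.0;     /* L-2 access time                    */
    double l2_miss_rate = 0.005;   /* global miss rate to main memory    */
    double mem_ns       = 100.0;   /* main memory access time            */

    double mem_penalty = mem_ns    / clock_ns;  /* 400 cycles */
    double l2_penalty  = l2_hit_ns / clock_ns;  /*  20 cycles */

    /* Single level of cache: every primary miss goes to main memory. */
    double cpi_l1_only = base_cpi + l1_miss_rate * mem_penalty;   /* 9.0 */

    /* Two levels: every primary miss pays the L-2 penalty; the fraction
     * that also misses in L-2 pays the extra main-memory penalty.      */
    double cpi_l1_l2 = base_cpi + l1_miss_rate * l2_penalty
                                + l2_miss_rate * mem_penalty;     /* 3.4 */

    printf("CPI (L1 only) = %.1f\n", cpi_l1_only);
    printf("CPI (L1+L2)   = %.1f\n", cpi_l1_l2);
    printf("speedup       = %.1f\n", cpi_l1_only / cpi_l1_l2);    /* 2.6 */
    return 0;
}
```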
Multilevel Cache Considerations:
- Primary cache
- Focus on minimal hit time 【!> determines the upper limit of the clock frequency】
- L-2 cache
- Focus on low miss rate to avoid main memory access
- Hit time has less overall impact
- Results
- L-1 cache usually smaller than a single cache
- L-1 block size smaller than L-2 block size
Software Optimization via Blocking
- Goal: maximize accesses to data before it is replaced
- Consider inner loops of DGEMM:
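The loop nests themselves are not reproduced in these notes; below is a sketch of the kind of code meant here: a plain column-major DGEMM, then a blocked version that works on BLOCKSIZE × BLOCKSIZE sub-matrices so each tile is reused while it is still in the cache. The function names and tile size are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

#define BLOCKSIZE 32   /* assumed block (tile) size */

/* Unblocked: C = C + A*B for column-major n x n matrices.
 * For large n, the data fetched in one pass over B and C is evicted
 * before it can be reused.                                           */
void dgemm(int n, double *A, double *B, double *C)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double cij = C[i + j * n];
            for (int k = 0; k < n; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

/* One BLOCKSIZE x BLOCKSIZE tile of the computation. */
static void do_block(int n, int si, int sj, int sk,
                     double *A, double *B, double *C)
{
    for (int i = si; i < si + BLOCKSIZE; ++i)
        for (int j = sj; j < sj + BLOCKSIZE; ++j) {
            double cij = C[i + j * n];
            for (int k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

/* Blocked: iterate over tiles so the three sub-matrices touched by
 * do_block stay resident in the cache and are reused many times
 * before eviction. Assumes n is a multiple of BLOCKSIZE.            */
void dgemm_blocked(int n, double *A, double *B, double *C)
{
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, sj, sk, A, B, C);
}

int main(void)
{
    int n = 2 * BLOCKSIZE;                        /* small test matrices */
    double *A = calloc((size_t)n * n, sizeof *A);
    double *B = calloc((size_t)n * n, sizeof *B);
    double *C = calloc((size_t)n * n, sizeof *C);
    double *D = calloc((size_t)n * n, sizeof *D);

    for (int i = 0; i < n * n; i++) { A[i] = i % 7; B[i] = i % 5; }

    dgemm(n, A, B, C);
    dgemm_blocked(n, A, B, D);

    double diff = 0;
    for (int i = 0; i < n * n; i++) diff += (C[i] - D[i]) * (C[i] - D[i]);
    printf("results match: %s\n", diff == 0 ? "yes" : "no");

    free(A); free(B); free(C); free(D);
    return 0;
}
```

With the assumed BLOCKSIZE of 32, the three tiles touched by do_block occupy only 3 × 32 × 32 × 8 B = 24 KiB, small enough to stay resident in a first- or second-level cache.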
Dependability Measures
- Reliability: mean time to failure (MTTF)
- Service interruption: mean time to repair (MTTR)
- Mean time between failures
- MTBF = MTTF + MTTR
- Availability = MTTF / (MTTF + MTTR)
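As a worked illustration with made-up numbers: a component with MTTF = 1,000,000 hours and MTTR = 24 hours has $Availability = \frac{1000000}{1000000 + 24} \approx 0.99998$, i.e. about 99.998%.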
Improving Availability
- Increase MTTF: fault avoidance, fault tolerance, fault forecasting
- Reduce MTTR: fault detection, fault diagnosis and fault repair
The Hamming SEC Code
- Hamming distance
- Number of bits that are different between two bit patterns
- Minimum distance = 2 provides
- single bit error detection
- parity code
- Minimum distance = 3 provides
- single error correction
- 2 bit error detection
Encoding SEC (Single Error Correction)
To calculate Hamming code:
- Number bits from 1 on the left 【numbering starts at 1】
- All bit positions that are a power of 2 are parity bits 【where the parity bits sit】
- Each parity bit checks certain data bits: the parity bit at position $2^j$ covers every position whose index has bit $j$ set 【this coverage is what makes correction possible】
!> The coverage pattern (which positions each parity bit checks) is the key to encoding 【how to encode】

Decoding SEC
- Value of parity bits indicates which bits are in error
- Use numbering from encoding procedure
- E.g.
- Parity bits = 0000 indicates no error
- Parity bits = 1010 indicates bit 10 was flipped
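A C sketch of both procedures for an 8-bit data word (12-bit code word), following the numbering rule above: the parity bit at position $2^j$ covers every position whose index has bit $j$ set, and the recomputed parities, read as a binary number (the syndrome), name the flipped position. Positions are 1-based to match the slides; the example data word is arbitrary.

```c
#include <stdio.h>

#define DATA_BITS 8
#define CODE_BITS 12   /* 8 data bits + 4 parity bits (positions 1, 2, 4, 8) */

/* code[1..CODE_BITS]: 1-based bit positions, as in the slides. */

/* Place data bits in the non-power-of-two positions, then set each parity
 * bit (at position 2^j) to the XOR of the other positions it covers.      */
static void sec_encode(const int data[DATA_BITS], int code[CODE_BITS + 1])
{
    int d = 0;
    for (int pos = 1; pos <= CODE_BITS; pos++) {
        int is_parity = (pos & (pos - 1)) == 0;     /* power of two? */
        code[pos] = is_parity ? 0 : data[d++];
    }
    for (int p = 1; p <= CODE_BITS; p <<= 1) {      /* p = 1, 2, 4, 8 */
        int parity = 0;
        for (int pos = 1; pos <= CODE_BITS; pos++)
            if ((pos & p) && pos != p)
                parity ^= code[pos];
        code[p] = parity;
    }
}

/* Recompute each parity over its covered positions (including itself);
 * the failing checks, read as a binary number, give the position of a
 * single flipped bit (0 means no error).                                */
static int sec_syndrome(const int code[CODE_BITS + 1])
{
    int syndrome = 0;
    for (int p = 1; p <= CODE_BITS; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= CODE_BITS; pos++)
            if (pos & p)
                parity ^= code[pos];
        if (parity) syndrome |= p;
    }
    return syndrome;
}

int main(void)
{
    int data[DATA_BITS] = {1, 0, 0, 1, 1, 0, 1, 0};  /* arbitrary example */
    int code[CODE_BITS + 1];

    sec_encode(data, code);
    printf("syndrome (no error)   = %d\n", sec_syndrome(code));  /* 0           */

    code[10] ^= 1;                                   /* flip bit 10             */
    printf("syndrome (bit 10 bad) = %d\n", sec_syndrome(code));  /* 10 = 1010b  */
    code[sec_syndrome(code)] ^= 1;                   /* correct the flipped bit */
    return 0;
}
```

Flipping bit 10 makes the checks for parity positions 2 and 8 fail, so the syndrome reads 1010 in binary = 10, matching the decoding example above.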
SEC/DED Code
- Add an additional parity bit for the whole word ($p_n$)
- Make Hamming distance = 4
- Decoding:
- Let H = SEC parity bits
- H even, $p_n$ even, no error
- H odd, $p_n$ odd, correctable single bit error
- H even, $p_n$ odd, error in $p_n$ bit
- H odd, $p_n$ even, double error occurred
Note: ECC DRAM uses SEC/DED with 8 bits protecting each 64 bits
Summary
- Cache Performance
- Mainly depends on miss rate and miss penalty
- To improve cache performance:
- Fully associative cache
- Set-associative cache
- Replacement policy
- Multilevel cache
- Dependability
- MTTF, MTTR, reliability, availability
- Hamming code: SEC/DED code