What Is a Cache?
Published on 2022-05-03
Category: Misc
A cache is a small but very fast type of storage that sits in front of slower memory. It holds copies of a program's recently used instructions and data so they can be accessed again quickly.
How Does Cache Improve Performance Over Regular Storage?
Cache uses temporal and spatial locality to improve performance.
Temporal Locality
Temporal locality means that recently used memory addresses are likely to be used again soon. For example, when you visit www.google.com, your browser caches the page's resources because you are likely to visit it again; on the next visit, the page loads faster.
Spatial Locality
Spatial locality means that when a memory address is used, nearby addresses are likely to be used soon. This is why caches transfer whole blocks rather than individual bytes: when a program reads one element of an array, the neighboring elements brought in with the same block are very likely to be read next.
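To make the two kinds of locality concrete, here is a minimal C sketch (the array size and number of passes are arbitrary): the sequential walk over `data` benefits from spatial locality, while repeating the passes and reusing `sum` benefits from temporal locality.

```c
#include <stdio.h>

#define N 1024

int main(void) {
    int data[N];

    /* Spatial locality: consecutive elements share cache lines, so a
       sequential pass mostly hits once the first element of each line
       has been fetched. */
    for (int i = 0; i < N; i++)
        data[i] = i;

    /* Temporal locality: `sum`, the loop variables, and the whole array
       are reused on every pass, so later passes find them already cached. */
    long sum = 0;
    for (int pass = 0; pass < 4; pass++)
        for (int i = 0; i < N; i++)
            sum += data[i];

    printf("sum = %ld\n", sum);
    return 0;
}
```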
Levels of Cache
Modern CPUs typically have three levels of cache:
L1 Cache
This is the cache closest to the CPU's execution units. It is split into a data cache and an instruction cache to avoid structural hazards in the pipeline. Each core has its own L1 caches, typically up to 64KB each with about 4ns of latency.
L2 Cache
The L2 cache sits slightly farther from the execution units than L1 but offers more storage, and it usually unifies data and instructions in a single cache. Each core typically has its own L2 cache of up to 8MB with about 10ns of latency.
L3 Cache
The L3 cache is the last and largest level. It is shared by all cores and sits farthest from the execution units, so it is the slowest cache to access but offers the most capacity: up to 64MB with around 20ns of latency.
Comparison with Other Types of Memory
- Register Memory: Holds up to 2KB with 1ns latency.
- Main Memory (RAM): Holds up to 4GB per rank with 100ns latency.
- SSD Storage: 1GB or more with 5ms latency.
- HDD Storage: 1GB or more with 10ms latency.
Cache Hits and Misses
Blocks/Cache Lines
The minimum unit of data transferred between levels of the memory hierarchy is called a block or cache line. On an access, each level of cache is checked for the block that contains the requested address.
Cache Hit
A cache hit occurs when the requested data is found in the cache. The hit rate is calculated as:
Hit Rate = Hits / Total References
Cache Miss
A cache miss happens when the requested data is not found in the cache. The miss rate is calculated as:
Miss Rate = Misses / Total References
There's also a "miss penalty," which is the additional time required to fetch the block from the next level of cache or from main memory when a miss occurs:
Miss Penalty = Access Time + Transfer Time
Cache misses can cause pipeline stalls, impacting overall performance.
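These formulas are straightforward to express in code. The sketch below simply plugs made-up counter values into them; every number is a placeholder, not a measurement.

```c
#include <stdio.h>

int main(void) {
    /* Illustrative counters only. */
    long hits = 950, misses = 50;
    long total = hits + misses;

    double hit_rate  = (double)hits   / total;   /* Hit Rate  = Hits / Total References   */
    double miss_rate = (double)misses / total;   /* Miss Rate = Misses / Total References */

    /* Miss Penalty = Access Time + Transfer Time (in nanoseconds here) */
    double access_time_ns = 100.0, transfer_time_ns = 20.0;
    double miss_penalty_ns = access_time_ns + transfer_time_ns;

    printf("hit rate     = %.2f\n", hit_rate);
    printf("miss rate    = %.2f\n", miss_rate);
    printf("miss penalty = %.0f ns\n", miss_penalty_ns);
    return 0;
}
```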
Cache Placement Policies
There are three main types of cache placement policies:
- Direct Mapped
- Fully Associative
- Set Associative
Direct Mapped
In a direct-mapped cache, each memory block maps to exactly one cache line. The line index can be computed quickly from the memory address, so hits and misses are determined fast. However, the miss rate is higher when the cache is small or when certain access patterns cause frequent collisions on the same line.
A direct-mapped cache is essentially a set-associative cache with one way.
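To sketch how the mapping works, the toy C example below assumes a 16KB direct-mapped cache with 64-byte lines (the geometry is purely illustrative). It extracts the line index and tag from an address with shifts and masks, and shows that two addresses exactly one cache-size apart land on the same line, which is the collision case mentioned above.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy direct-mapped geometry, assumed for illustration:
   64-byte lines x 256 lines = a 16KB cache. */
#define LINE_BITS  6                      /* 2^6 = 64-byte line */
#define INDEX_BITS 8                      /* 2^8 = 256 lines    */
#define NUM_LINES  (1u << INDEX_BITS)

static uint32_t line_index(uint32_t addr) {
    return (addr >> LINE_BITS) & (NUM_LINES - 1);   /* picks exactly one line */
}

static uint32_t tag(uint32_t addr) {
    return addr >> (LINE_BITS + INDEX_BITS);        /* identifies which block occupies it */
}

int main(void) {
    /* Two addresses exactly one cache-size (16KB) apart map to the same
       line but have different tags, so alternating accesses to them
       would keep evicting each other. */
    uint32_t a = 0x0000A040;
    uint32_t b = a + (NUM_LINES << LINE_BITS);
    printf("a: index=%u tag=%u\n", (unsigned)line_index(a), (unsigned)tag(a));
    printf("b: index=%u tag=%u\n", (unsigned)line_index(b), (unsigned)tag(b));
    return 0;
}
```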
Fully Associative
In a fully associative cache, a memory block can be placed in any cache line. This flexibility results in fewer misses but requires more time to compare tags for each slot, even on cache hits. Control logic becomes more complex due to the need to check all possible locations.
A fully associative cache is a set-associative cache with one set.
Set Associative
Set-associative cache combines aspects of direct mapping and full associativity. Blocks are mapped to a set of lines within the cache, reducing conflicts and miss rates compared to direct mapping while being faster than a fully associative cache. Determining the optimal number of ways can be challenging.
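Here is a rough sketch of a set-associative lookup, assuming a 4-way, 64-set toy geometry: the address selects a single set, and the tag is then compared against every way in that set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed toy geometry: 4-way set-associative, 64 sets, 64-byte lines. */
#define WAYS      4
#define SETS      64
#define LINE_BITS 6

struct line { bool valid; uint32_t tag; };
static struct line cache[SETS][WAYS];

/* Look up an address: the address selects one set, then the tag is
   compared against every way in that set. */
static bool lookup(uint32_t addr) {
    uint32_t set = (addr >> LINE_BITS) % SETS;
    uint32_t tag = addr >> LINE_BITS;     /* the full block number serves as the tag here */
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;                  /* hit in one of the WAYS candidate lines */
    return false;                         /* miss: any of the WAYS lines may be replaced */
}

int main(void) {
    uint32_t addr = 0x1F40;
    /* Pretend the block for addr already sits in way 2 of its set. */
    cache[(addr >> LINE_BITS) % SETS][2] = (struct line){ true, addr >> LINE_BITS };
    printf("0x1F40 -> %s\n", lookup(addr)   ? "hit" : "miss");
    printf("0x2000 -> %s\n", lookup(0x2000) ? "hit" : "miss");
    return 0;
}
```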
Cache Replacement Policies
Common replacement policies include:
- First-In, First-Out (FIFO)
- Least Recently Used (LRU)
- Most Recently Used (MRU)
Modern processors typically use LRU or a close approximation of it.
Least Recently Used (LRU) Policy
LRU replaces the cache line that has not been used for the longest period. It relies on tracking the access history of cache lines. While effective, LRU can be complex to implement in hardware, so approximations like Not Recently Used (NRU) or Clock algorithms are often used.
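The sketch below implements true LRU for a single set of a 4-way cache by time-stamping each way on every access; as noted above, real hardware usually tracks recency more cheaply.

```c
#include <stdint.h>
#include <stdio.h>

/* One set of a 4-way cache; each way remembers when it was last used. */
#define WAYS 4

struct way { uint32_t tag; uint64_t last_used; int valid; };

static uint64_t now;   /* logical clock, incremented on every access */

static void access_set(struct way set[WAYS], uint32_t tag) {
    now++;
    /* Hit: refresh the timestamp of the matching way. */
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag) { set[w].last_used = now; return; }

    /* Miss: the victim is any invalid way, or else the least recently used one. */
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (!set[w].valid ||
            (set[victim].valid && set[w].last_used < set[victim].last_used))
            victim = w;
    if (set[victim].valid)
        printf("miss on tag %u -> evict tag %u (LRU)\n",
               (unsigned)tag, (unsigned)set[victim].tag);
    else
        printf("miss on tag %u -> fill empty way %d\n", (unsigned)tag, victim);
    set[victim] = (struct way){ tag, now, 1 };
}

int main(void) {
    struct way set[WAYS] = {0};
    uint32_t trace[] = {1, 2, 3, 4, 1, 5};   /* tag 5 evicts tag 2, the LRU line */
    for (unsigned i = 0; i < sizeof trace / sizeof *trace; i++)
        access_set(set, trace[i]);
    return 0;
}
```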
The Three C's of Cache Misses
Compulsory Misses
Also known as cold start or first-reference misses, these occur when a cache line is accessed for the first time. The miss rate for compulsory misses is typically low.
Capacity Misses
Capacity misses happen when the cache cannot contain all the blocks needed during program execution, leading to blocks being discarded and reloaded. Increasing cache size can reduce capacity misses.
Conflict Misses
Conflict misses occur when multiple blocks compete for the same cache line or set, especially in direct-mapped or low-associativity caches. Increasing the number of ways in a set-associative cache can reduce conflict misses.
Write Policies
Writes make up a relatively small share of data-cache traffic (roughly 21%), so caches are optimized primarily for reads and writes are handled "on the side."
Write Hit Policies
When a write operation hits data already in the cache:
- Write-Back: Data is written only to the cache, with the write to lower-level memory deferred until the cache line is evicted.
- Write-Through: Data is written to both the cache and lower-level memory simultaneously.
Write Miss Policies
When a write operation misses the cache:
- Write-Allocate: The cache allocates space for the missed cache line, and the write is performed there.
- No-Write-Allocate: The write bypasses the cache and is written directly to lower-level memory.
Write-Back and Write-Allocate
Pros:
- Fewer DRAM writes
- Faster write speed
- Reduced DRAM bandwidth and power consumption
Cons:
- Cache and DRAM can become inconsistent
- Write-allocate can pollute the cache with written data, evicting other lines and potentially increasing the miss rate
Write-Through and No-Write-Allocate
Pros:
- Easy to implement
- Cache does not get polluted with write data
- Cache and DRAM remain consistent
Cons:
- Increased DRAM bandwidth and power usage
- Slower write speed
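To make the trade-offs concrete, the sketch below contrasts the two pairings for a single cache line. The in_cache and dirty flags stand in for real tag and state bits, and the printed messages only indicate which memory gets written and when.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch of the two common pairings for a single cache line. */
static bool in_cache = false, dirty = false;

static void write_back_write_allocate(uint32_t addr, uint32_t value) {
    if (!in_cache) {                               /* write miss: allocate the line */
        printf("fetch block 0x%x into cache\n", (unsigned)addr);
        in_cache = true;
    }
    printf("write %u into the cache only\n", (unsigned)value);
    dirty = true;                                  /* DRAM is updated later, on eviction */
}

static void write_through_no_allocate(uint32_t addr, uint32_t value) {
    if (in_cache)
        printf("update cached copy of 0x%x\n", (unsigned)addr);
    printf("write %u straight to DRAM\n", (unsigned)value);  /* DRAM stays up to date */
}

static void evict(void) {
    if (in_cache && dirty)
        printf("write back dirty line to DRAM\n"); /* the deferred write happens here */
    in_cache = dirty = false;
}

int main(void) {
    write_back_write_allocate(0x1000, 42);   /* miss -> allocate, write to cache, mark dirty */
    evict();                                 /* dirty line is written back */
    write_through_no_allocate(0x1000, 43);   /* miss -> bypass the cache, write DRAM directly */
    return 0;
}
```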
Write Buffer Flush Policy
Write buffers let the processor continue without waiting for DRAM, improving effective write speed and reducing pressure on DRAM bandwidth. This makes them a natural companion to the write-through and no-write-allocate policies.
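As a rough model, a write buffer can be treated as a small FIFO: the processor enqueues a store and keeps executing, and buffered entries drain to DRAM later. The buffer size and the drain-on-full behavior below are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* A write buffer modeled as a small FIFO: the CPU enqueues a store and
   moves on; buffered entries drain to DRAM in the background. */
#define WBUF_SIZE 8

static struct { uint32_t addr, value; } wbuf[WBUF_SIZE];
static int head, count;

static void drain_one(void) {
    printf("DRAM <- 0x%x = %u\n", (unsigned)wbuf[head].addr, (unsigned)wbuf[head].value);
    head = (head + 1) % WBUF_SIZE;
    count--;
}

static void cpu_store(uint32_t addr, uint32_t value) {
    if (count == WBUF_SIZE) {       /* buffer full: the CPU would stall until one entry drains */
        printf("buffer full, stalling: ");
        drain_one();
    }
    int tail = (head + count) % WBUF_SIZE;
    wbuf[tail].addr = addr;
    wbuf[tail].value = value;
    count++;                        /* the store is "done" from the CPU's point of view */
}

int main(void) {
    for (uint32_t i = 0; i < 10; i++)
        cpu_store(0x2000 + 4 * i, i);   /* 10 stores overflow the 8-entry buffer twice */
    while (count > 0)
        drain_one();                    /* remaining entries drain to DRAM later */
    return 0;
}
```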
Conclusion
Understanding cache memory and its associated policies is crucial for optimizing system performance. Cache leverages temporal and spatial locality to speed up data access, and various levels and policies help balance speed and storage capacity. By effectively managing cache hits and misses, placement, and write strategies, we can significantly enhance computing efficiency.