What Is a Cache?
Published on 2022-05-03
Category: Misc
A cache is a small but very fast type of storage that sits in front of slower memory. It holds copies of a program's recently used instructions and data so they can be accessed again quickly.
How Does Cache Improve Performance Over Regular Storage?
Cache uses temporal and spatial locality to improve performance.
Temporal Locality
Temporal locality means that recently used memory addresses are likely to be used again soon. For example, when you visit www.google.com, your browser caches the page's resources because you are likely to visit it again; on the next visit, the page loads faster.
Spatial Locality
Spatial locality means that when a memory address is used, nearby addresses are likely to be used soon. This is why caches transfer whole blocks rather than individual bytes: when a program reads one element of an array, the neighboring elements brought in with the same block are very likely to be read next.
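To make the two kinds of locality concrete, here is a minimal C sketch (the array size and number of passes are arbitrary): the sequential walk over `data` benefits from spatial locality, while repeating the passes and reusing `sum` benefits from temporal locality.

```c
#include <stdio.h>

#define N 1024

int main(void) {
    int data[N];

    /* Spatial locality: consecutive elements share cache lines, so a
       sequential pass mostly hits once the first element of each line
       has been fetched. */
    for (int i = 0; i < N; i++)
        data[i] = i;

    /* Temporal locality: `sum`, the loop variables, and the whole array
       are reused on every pass, so later passes find them already cached. */
    long sum = 0;
    for (int pass = 0; pass < 4; pass++)
        for (int i = 0; i < N; i++)
            sum += data[i];

    printf("sum = %ld\n", sum);
    return 0;
}
```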
Levels of Cache
Modern CPUs typically have three levels of cache:
L1 Cache
This is the cache closest to the CPU's execution units. It is split into a data cache and an instruction cache to avoid structural hazards in the pipeline. Each core has its own L1 caches, typically up to 64KB each with about 4ns of latency.
L2 Cache
The L2 cache sits slightly farther from the execution units than L1 but offers more storage, and it usually unifies data and instructions in a single cache. Each core typically has its own L2 cache of up to 8MB with about 10ns of latency.
L3 Cache
The L3 cache is the last and largest level. It is shared by all cores and sits farthest from the execution units, so it is the slowest cache to access but offers the most capacity: up to 64MB with around 20ns of latency.
Comparison with Other Types of Memory
- Register Memory: Holds up to 2KB with 1ns latency.
- Main Memory (RAM): Holds up to 4GB per rank with 100ns latency.
- SSD Storage: 1GB or more with 5ms latency.
- HDD Storage: 1GB or more with 10ms latency.
Cache Hits and Misses
Blocks/Cache Lines
The minimum unit of data transferred between levels of the memory hierarchy is called a block or cache line. On an access, each level of cache is checked for the block that contains the requested address.
Cache Hit
A cache hit occurs when the requested data is found in the cache. The hit rate is calculated as:
Hit Rate = Hits / Total References
Cache Miss
A cache miss happens when the requested data is not found in the cache. The miss rate is calculated as:
Miss Rate = Misses / Total References
There's also a "miss penalty," which is the additional time required to fetch the block from the next level of cache or from main memory when a miss occurs:
Miss Penalty = Access Time + Transfer Time
Cache misses can cause pipeline stalls, impacting overall performance.
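These formulas are straightforward to express in code. The sketch below simply plugs made-up counter values into them; every number is a placeholder, not a measurement.

```c
#include <stdio.h>

int main(void) {
    /* Illustrative counters only. */
    long hits = 950, misses = 50;
    long total = hits + misses;

    double hit_rate  = (double)hits   / total;   /* Hit Rate  = Hits / Total References   */
    double miss_rate = (double)misses / total;   /* Miss Rate = Misses / Total References */

    /* Miss Penalty = Access Time + Transfer Time (in nanoseconds here) */
    double access_time_ns = 100.0, transfer_time_ns = 20.0;
    double miss_penalty_ns = access_time_ns + transfer_time_ns;

    printf("hit rate     = %.2f\n", hit_rate);
    printf("miss rate    = %.2f\n", miss_rate);
    printf("miss penalty = %.0f ns\n", miss_penalty_ns);
    return 0;
}
```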
Cache Placement Policies
There are three main types of cache placement policies:
- Direct Mapped
- Fully Associative
- Set Associative
Direct Mapped
In a direct-mapped cache, each memory block maps to exactly one cache line. The line index can be computed quickly from the memory address, so hits and misses are determined fast. However, the miss rate is higher when the cache is small or when certain access patterns cause frequent collisions on the same line.
A direct-mapped cache is essentially a set-associative cache with one way.
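To sketch how the mapping works, the toy C example below assumes a 16KB direct-mapped cache with 64-byte lines (the geometry is purely illustrative). It extracts the line index and tag from an address with shifts and masks, and shows that two addresses exactly one cache-size apart land on the same line, which is the collision case mentioned above.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy direct-mapped geometry, assumed for illustration:
   64-byte lines x 256 lines = a 16KB cache. */
#define LINE_BITS  6                      /* 2^6 = 64-byte line */
#define INDEX_BITS 8                      /* 2^8 = 256 lines    */
#define NUM_LINES  (1u << INDEX_BITS)

static uint32_t line_index(uint32_t addr) {
    return (addr >> LINE_BITS) & (NUM_LINES - 1);   /* picks exactly one line */
}

static uint32_t tag(uint32_t addr) {
    return addr >> (LINE_BITS + INDEX_BITS);        /* identifies which block occupies it */
}

int main(void) {
    /* Two addresses exactly one cache-size (16KB) apart map to the same
       line but have different tags, so alternating accesses to them
       would keep evicting each other. */
    uint32_t a = 0x0000A040;
    uint32_t b = a + (NUM_LINES << LINE_BITS);
    printf("a: index=%u tag=%u\n", (unsigned)line_index(a), (unsigned)tag(a));
    printf("b: index=%u tag=%u\n", (unsigned)line_index(b), (unsigned)tag(b));
    return 0;
}
```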
Fully Associative
In a fully associative cache, a memory block can be placed in any cache line. This flexibility results in fewer misses but requires more time to compare tags for each slot, even on cache hits. Control logic becomes more complex due to the need to check all possible locations.
A fully associative cache is a set-associative cache with one set.
Set Associative
Set-associative cache combines aspects of direct mapping and full associativity. Blocks are mapped to a set of lines within the cache, reducing conflicts and miss rates compared to direct mapping while being faster than a fully associative cache. Determining the optimal number of ways can be challenging.
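Here is a rough sketch of a set-associative lookup, assuming a 4-way, 64-set toy geometry: the address selects a single set, and the tag is then compared against every way in that set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed toy geometry: 4-way set-associative, 64 sets, 64-byte lines. */
#define WAYS      4
#define SETS      64
#define LINE_BITS 6

struct line { bool valid; uint32_t tag; };
static struct line cache[SETS][WAYS];

/* Look up an address: the address selects one set, then the tag is
   compared against every way in that set. */
static bool lookup(uint32_t addr) {
    uint32_t set = (addr >> LINE_BITS) % SETS;
    uint32_t tag = addr >> LINE_BITS;     /* the full block number serves as the tag here */
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;                  /* hit in one of the WAYS candidate lines */
    return false;                         /* miss: any of the WAYS lines may be replaced */
}

int main(void) {
    uint32_t addr = 0x1F40;
    /* Pretend the block for addr already sits in way 2 of its set. */
    cache[(addr >> LINE_BITS) % SETS][2] = (struct line){ true, addr >> LINE_BITS };
    printf("0x1F40 -> %s\n", lookup(addr)   ? "hit" : "miss");
    printf("0x2000 -> %s\n", lookup(0x2000) ? "hit" : "miss");
    return 0;
}
```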
Cache Replacement Policies
Common replacement policies include:
- First-In, First-Out (FIFO)
- Least Recently Used (LRU)
- Most Recently Used (MRU)
Modern processors typically use LRU or a close approximation of it.
Least Recently Used (LRU) Policy
LRU replaces the cache line that has not been used for the longest period. It relies on tracking the access history of cache lines. While effective, LRU can be complex to implement in hardware, so approximations like Not Recently Used (NRU) or Clock algorithms are often used.
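The sketch below implements true LRU for a single set of a 4-way cache by time-stamping each way on every access; as noted above, real hardware usually tracks recency more cheaply.

```c
#include <stdint.h>
#include <stdio.h>

/* One set of a 4-way cache; each way remembers when it was last used. */
#define WAYS 4

struct way { uint32_t tag; uint64_t last_used; int valid; };

static uint64_t now;   /* logical clock, incremented on every access */

static void access_set(struct way set[WAYS], uint32_t tag) {
    now++;
    /* Hit: refresh the timestamp of the matching way. */
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag) { set[w].last_used = now; return; }

    /* Miss: the victim is any invalid way, or else the least recently used one. */
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (!set[w].valid ||
            (set[victim].valid && set[w].last_used < set[victim].last_used))
            victim = w;
    if (set[victim].valid)
        printf("miss on tag %u -> evict tag %u (LRU)\n",
               (unsigned)tag, (unsigned)set[victim].tag);
    else
        printf("miss on tag %u -> fill empty way %d\n", (unsigned)tag, victim);
    set[victim] = (struct way){ tag, now, 1 };
}

int main(void) {
    struct way set[WAYS] = {0};
    uint32_t trace[] = {1, 2, 3, 4, 1, 5};   /* tag 5 evicts tag 2, the LRU line */
    for (unsigned i = 0; i < sizeof trace / sizeof *trace; i++)
        access_set(set, trace[i]);
    return 0;
}
```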
The Three C's of Cache Misses
Compulsory Misses
Also known as cold start or first-reference misses, these occur when a cache line is accessed for the first time. The miss rate for compulsory misses is typically low.
Capacity Misses
Capacity misses happen when the cache cannot contain all the blocks needed during program execution, leading to blocks being discarded and reloaded. Increasing cache size can reduce capacity misses.
Conflict Misses
Conflict misses occur when multiple blocks compete for the same cache line or set, especially in direct-mapped or low-associativity caches. Increasing the number of ways in a set-associative cache can reduce conflict misses.
Write Policies
Writes make up a relatively small share of data-cache traffic (roughly 21%), so caches are optimized primarily for reads and writes are handled "on the side."
Write Hit Policies
When a write operation hits data already in the cache:
- Write-Back: Data is written only to the cache, with the write to lower-level memory deferred until the cache line is evicted.
- Write-Through: Data is written to both the cache and lower-level memory simultaneously.
Write Miss Policies
When a write operation misses the cache:
- Write-Allocate: The cache allocates space for the missed cache line, and the write is performed there.
- No-Write-Allocate: The write bypasses the cache and is written directly to lower-level memory.
Write-Back and Write-Allocate
Pros:
- Fewer DRAM writes
- Faster write speed
- Reduced DRAM bandwidth and power consumption
Cons:
- Cache and DRAM can become inconsistent
- Write-allocate can pollute the cache with written data, evicting other lines and potentially increasing the miss rate
Write-Through and No-Write-Allocate
Pros:
- Easy to implement
- Cache does not get polluted with write data
- Cache and DRAM remain consistent
Cons:
- Increased DRAM bandwidth and power usage
- Slower write speed
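To make the trade-offs concrete, the sketch below contrasts the two pairings for a single cache line. The in_cache and dirty flags stand in for real tag and state bits, and the printed messages only indicate which memory gets written and when.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch of the two common pairings for a single cache line. */
static bool in_cache = false, dirty = false;

static void write_back_write_allocate(uint32_t addr, uint32_t value) {
    if (!in_cache) {                               /* write miss: allocate the line */
        printf("fetch block 0x%x into cache\n", (unsigned)addr);
        in_cache = true;
    }
    printf("write %u into the cache only\n", (unsigned)value);
    dirty = true;                                  /* DRAM is updated later, on eviction */
}

static void write_through_no_allocate(uint32_t addr, uint32_t value) {
    if (in_cache)
        printf("update cached copy of 0x%x\n", (unsigned)addr);
    printf("write %u straight to DRAM\n", (unsigned)value);  /* DRAM stays up to date */
}

static void evict(void) {
    if (in_cache && dirty)
        printf("write back dirty line to DRAM\n"); /* the deferred write happens here */
    in_cache = dirty = false;
}

int main(void) {
    write_back_write_allocate(0x1000, 42);   /* miss -> allocate, write to cache, mark dirty */
    evict();                                 /* dirty line is written back */
    write_through_no_allocate(0x1000, 43);   /* miss -> bypass the cache, write DRAM directly */
    return 0;
}
```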
Write Buffer Flush Policy
Write buffers let the processor continue without waiting for DRAM, improving effective write speed and reducing pressure on DRAM bandwidth. This makes them a natural companion to the write-through and no-write-allocate policies.
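As a rough model, a write buffer can be treated as a small FIFO: the processor enqueues a store and keeps executing, and buffered entries drain to DRAM later. The buffer size and the drain-on-full behavior below are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* A write buffer modeled as a small FIFO: the CPU enqueues a store and
   moves on; buffered entries drain to DRAM in the background. */
#define WBUF_SIZE 8

static struct { uint32_t addr, value; } wbuf[WBUF_SIZE];
static int head, count;

static void drain_one(void) {
    printf("DRAM <- 0x%x = %u\n", (unsigned)wbuf[head].addr, (unsigned)wbuf[head].value);
    head = (head + 1) % WBUF_SIZE;
    count--;
}

static void cpu_store(uint32_t addr, uint32_t value) {
    if (count == WBUF_SIZE) {       /* buffer full: the CPU would stall until one entry drains */
        printf("buffer full, stalling: ");
        drain_one();
    }
    int tail = (head + count) % WBUF_SIZE;
    wbuf[tail].addr = addr;
    wbuf[tail].value = value;
    count++;                        /* the store is "done" from the CPU's point of view */
}

int main(void) {
    for (uint32_t i = 0; i < 10; i++)
        cpu_store(0x2000 + 4 * i, i);   /* 10 stores overflow the 8-entry buffer twice */
    while (count > 0)
        drain_one();                    /* remaining entries drain to DRAM later */
    return 0;
}
```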
Conclusion
Understanding cache memory and its associated policies is crucial for optimizing system performance. Cache leverages temporal and spatial locality to speed up data access, and various levels and policies help balance speed and storage capacity. By effectively managing cache hits and misses, placement, and write strategies, we can significantly enhance computing efficiency.