Computer Organization and Architecture
Cache Memory
Outline
- Key terms
- Computer Memory System Overview
- Cache Memory Principles
- Elements of Cache Design
- Pentium 4 Cache Organization
- ARM Cache Organization
Key terms
- access time
- associative mapping
- cache hit
- cache line
- cache memory
- cache miss
- data cache
- direct access
- direct mapping
- high-performance computing
- hit ratio
- instruction cache
- L1 cache
- L2 cache
- L3 cache
- locality
- logical cache
- memory hierarchy
- multilevel cache
- physical cache
- random access
- replacement algorithm
- set-associative mapping
- spatial locality
- split cache
- tag
- temporal locality
- unified cache
- virtual cache
- write back
- write once
- write through
Computer Memory System Overview
Memory
- In a Von Neumann structured computer, memory does not distinguish between instructions and data
- The performance of memory is constantly improving, but it still falls far short of CPU speed
  - This leads to a growing performance gap between the two, which seriously affects the overall performance of the computer
Characteristics of Memory Systems
Location
- Location refers to whether the memory is inside or outside the computer
- Internal memory
  - Registers and cache in the CPU
  - Internal cache and main memory
- External memory
  - The processor must access these stores through an I/O controller, which is why they are called external memory

Capacity Unit
- Bit: 0 or 1
- Byte: 8 bits
- Word size
  - The natural unit of organisation
  - Not a fixed length; now it is generally 16, 32, or 64 bits
  - Most registers are one word
  - Integers are usually stored in one word
  - Instruction length is generally the same as the word size
  - Bus width
- Capacity: the amount of information that can be stored
  - Determined by the number of addressable units and the unit size
  - Addressable unit: a storage unit that can be uniquely addressed in memory
  - !!! $Capacity = Addressable\ unit\ number \times Unit\ size$ (see the worked example below)
  - May be counted as a number of words or a number of bytes
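For example (values assumed purely for illustration): a memory with $2^{16}$ addressable units, each one byte (8 bits) wide, has $Capacity = 2^{16} \times 8\ bits = 524288\ bits = 64\ KB$.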
Unit of transfer
- Internal
  - Usually governed by the data bus width, i.e. a word
- External
  - Usually a block, which is much larger than a word
- Addressable unit
  - The smallest location which can be uniquely addressed
  - A word internally
  - A cluster on external disks
Access methods !!!
- Sequential
  - Start at the beginning and read through in order
  - Access time depends on the location of the data and the previous location
  - e.g. tape
- Direct
  - Individual blocks have a unique address
  - Access is by jumping to the vicinity plus a sequential search
  - Access time depends on the location and the previous location
  - e.g. disk
- Random
  - Individual addresses identify locations exactly
  - Access time is independent of location or previous access
  - e.g. RAM
- Associative
  - Data is located by comparing the desired contents with a portion of the store
  - Access time is independent of location or previous access
  - e.g. cache
Performance !!!
- Access time
  - Time between presenting the address and getting the valid data
  - For RAM: the time to address the unit and perform the transfer
  - For non-RAM: the time to position the R/W head over the desired location
- Memory cycle time
  - Applies to RAM
  - Time may be required for the memory to "recover" before the next access
  - $Memory\ Cycle\ time = Access\ time + additional\ time\ required\ before\ next\ access$
- Transfer rate
  - The rate at which data can be moved (worked example below)
  - !!! For RAM: $Rate = \frac{1}{Cycle\ time}$
  - !!! For non-RAM: $Rate = \frac{N}{T_N - T_A}$, where
    - $N$: number of bits
    - $T_N$: time to write or read $N$ bits
    - $T_A$: average access time (time to reach the location)
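As an illustrative calculation (values assumed): a RAM with a cycle time of $100\ ns$ achieves $Rate = \frac{1}{100\ ns} = 10^7$ words per second. A non-RAM device reading $N = 32768$ bits with $T_N = 4.1\ ms$ and $T_A = 0.1\ ms$ achieves $Rate = \frac{32768}{(4.1 - 0.1) \times 10^{-3}\ s} \approx 8.2 \times 10^6\ bits/s$.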
Physical types
- Semiconductor: RAM
- Magnetic: disk & tape
- Optical: CD & DVD
- Others
  - Bubble memory
  - Hologram
Physical characteristics
- Decay or not
  - Decays: flash disk
  - No decay: hard disk
- Volatile or not
  - Volatile: main memory
  - Non-volatile: hard disk
- Erasable or not
  - Erasable: hard disk
  - Non-erasable: CD-ROM
- Power consumption
The Memory Hierarchy
Design Objectives
- The major design objectives of any memory system
  - To provide adequate storage capacity
  - At an acceptable level of performance
  - At a reasonable cost
- Design method: balance cost, capacity, and performance
  - Faster access time -> greater cost per bit
  - Greater capacity -> smaller cost per bit
  - Greater capacity -> slower access time
The hierarchy
- Use a memory hierarchy
  - Each level is characterized by its size, access time, and cost per bit
  - The different levels of memory interact with one another
- Goal $\rightarrow$ the memory closest to the CPU should roughly match the CPU's speed
(Diagram: the memory hierarchy)
Cache Memory Principles
- What is cache?
  - A high-speed memory, faster than general RAM
  - Built with expensive but faster SRAM technology
- Why use cache?
  - It would be possible to build a computer using only static RAM; it would be very fast, but would cost a very large amount
  - A hierarchy is the solution
Locality of Reference
- Most programs are highly sequential; the next instruction usually comes from the next memory location
- Data is usually structured, and the items in these structures are normally stored in contiguous memory locations
- Short loops are a common program structure, especially at the innermost levels of nested loops, which means the same small set of instructions is used over and over (compare the sketch below)
(Example: good spatial locality vs. bad spatial locality)
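A minimal sketch of the contrast, added here for illustration (not from the original slides). In a C-style row-major array, `arr[i][j]` and `arr[i][j+1]` are adjacent in memory; in Python the effect is muted because lists store references, but the access pattern is the same:

```python
# Good vs. bad spatial locality: the only difference is loop order.
N = 256
arr = [[1] * N for _ in range(N)]

# Good spatial locality: the inner loop walks along a row, so successive
# accesses touch adjacent elements and each fetched cache line is fully used.
total = 0
for i in range(N):
    for j in range(N):
        total += arr[i][j]

# Bad spatial locality: the inner loop walks down a column, jumping a whole
# row between accesses, so most of each fetched cache line is wasted.
total = 0
for j in range(N):
    for i in range(N):
        total += arr[i][j]
```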
Cache Memory Principles
- The cache stores a small part of main memory's data
- If the data required by the CPU is in the cache, the CPU reads it from the cache directly, without accessing main memory
- If the required data is not in the cache, the CPU reads main memory and updates the cache
Characteristics
- Small capacity
- Fast access: operates at nearly the speed of the processor
- Sits between normal main memory and the CPU
- May be located on the CPU chip or module
- Very expensive

Multi-level cache

Typical cache organization

Elements of Cache Design
- Cache addresses
- Cache size
- Mapping function
- Replacement algorithm
- Write policy
- Line size
- Number of caches
Virtual address
- A memory management technology
  - Allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available
- A memory expansion technology
  - It does not change the size of the physical memory, but it makes the way programs access memory more flexible
- A hardware memory management unit (MMU) translates each virtual address into a physical address
Logical cache and physical cache
- Logical cache: sits between the processor and the virtual memory management unit
- Physical cache: sits between the MMU and main memory

Logical cache
- The processor accesses the cache directly, without going through the MMU (Memory Management Unit)
- Virtual addresses use the same address space for different applications (logical addresses)
- The cache must be flushed on each context switch
Physical cache
- A physical cache stores data using main memory physical addresses
- The logical address is translated into a physical address by the MMU, and then the cache is accessed
Cache size
- A larger cache is more expensive
- A larger cache is faster (up to a point)
  - Checking the cache for data takes time, so beyond some size the lookup itself slows access down
Cache hits and misses
- Cache hit
  - The data requested by the CPU is in the cache, so the cache can pass the contents to the CPU
  - Hit ratio: the fraction of memory accesses found in the upper level
  - $Hit\ Time = Access\ Time + Time\ to\ determine\ hit\ or\ miss\ (scan\ the\ cache)$
  - $Hit\ Ratio = \frac{N_{hit}}{N_{total}} = \frac{N_{hit}}{N_{hit} + N_{miss}}$
- Cache miss
  - The requested data is not in the cache; it must be read from memory
  - $Miss\ Ratio = 1 - Hit\ Ratio$
  - $Miss\ Penalty = Time\ to\ replace\ a\ block\ in\ the\ cache + Time\ to\ deliver\ the\ missed\ data\ to\ the\ processor$
- Cache performance is often measured in terms of the hit (or miss) ratio
- $T_{average} = Hit\ Ratio \times T_{hit} + Miss\ Ratio \times T_{miss} = Hit\ Ratio \times T_{hit} + (1 - Hit\ Ratio) \times T_{miss}$ (worked example below)
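For instance (numbers assumed purely for illustration): with $Hit\ Ratio = 0.95$, $T_{hit} = 2\ ns$, and $T_{miss} = 60\ ns$, $T_{average} = 0.95 \times 2\ ns + 0.05 \times 60\ ns = 1.9\ ns + 3.0\ ns = 4.9\ ns$, so the average access time stays far closer to the cache's speed than to main memory's.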
Cache mapping !!!
Direct mapping
- Each block of main memory maps to only one cache line
  - If a block is in the cache, it must be in one specific place
- The address is in two parts
  - The least significant w bits identify a unique word
  - The most significant s bits specify one memory block
  - The MSBs are further split into a cache line field of r bits and a tag of s-r bits (most significant)
Characteristics of direct mapping
- Fixed location for a given block
- Frequent replacement operations, low hit rate
- Thrashing ("jitter")
  - If a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high (see the address-split sketch below)
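A minimal sketch (my own illustration, parameters assumed) of the address split for a direct-mapped cache with $2^r$ lines and $2^w$ words per line:

```python
def split_direct(addr: int, r: int, w: int):
    word = addr & ((1 << w) - 1)          # least significant w bits
    line = (addr >> w) & ((1 << r) - 1)   # next r bits select the cache line
    tag = addr >> (w + r)                 # remaining MSBs stored as the tag
    return tag, line, word

# Example: 16-bit address, 2**8 = 256 lines, 4 words per line (w = 2).
tag, line, word = split_direct(0xABCD, r=8, w=2)
print(tag, line, word)  # two addresses with the same line bits but
                        # different tags compete for one line (thrashing)
```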
Associative mapping
- A main memory block can load into any line of the cache
- The memory address is interpreted as a tag and a word
  - The tag uniquely identifies a block of memory
  - Every line's tag is examined for a match
- Mapping is quite flexible
- Cache searching gets complex
Characteristics of associative mapping
- Advantage
  - Flexibility
- Disadvantages
  - Large tag memory
  - Complex circuitry is required to examine the tags of all cache lines in parallel
- Replacement algorithms are important to maximize the hit ratio (a lookup sketch follows)
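A minimal sketch (my own illustration) of a fully associative lookup. Real hardware compares all tags simultaneously; the loop below stands in for that parallel comparison:

```python
def split_assoc(addr: int, w: int):
    """Interpret an address as (tag, word): there is no line field at all."""
    return addr >> w, addr & ((1 << w) - 1)

def lookup(cache_tags: list, addr: int, w: int):
    tag, word = split_assoc(addr, w)
    for i, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:       # any line may hold any block
            return ("hit", i, word)
    return ("miss", None, word)

print(lookup([5, 99, 42], 42 << 2, w=2))  # ('hit', 2, 0)
```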
Set Associative Mapping
- The cache is divided into a number of sets
- Each set contains a number of lines
- A given block maps to any line in a given set
- Use the set field to determine which cache set to look in
- Compare the tag field to see if we have a hit (address split sketched below)
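A minimal sketch (parameters assumed) of the address split for a set-associative cache with $2^d$ sets and $2^w$ words per line; the block may sit in any of the k lines of its set:

```python
def split_set_assoc(addr: int, d: int, w: int):
    word = addr & ((1 << w) - 1)
    set_idx = (addr >> w) & ((1 << d) - 1)  # set field, not a line field
    tag = addr >> (w + d)
    return tag, set_idx, word

# 2-way example: blocks that collided in a direct-mapped cache can now
# coexist, because each set holds more than one line.
print(split_set_assoc(0xABCD, d=7, w=2))
```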
Replacement algorithms
For direct mapping
- When a new block from memory must be written into the cache, the one line it maps to has to be replaced; no choice is possible
For associative and set-associative mapping
- If there is an empty line among the candidate lines, write to it directly
- If there is no empty line among the candidates, an algorithm must decide which line to overwrite
Replacement strategies
- Least recently used (LRU): the usage time of each cache line must be recorded
- First in first out (FIFO): replace the block that has been in the cache longest
- Least frequently used (LFU): replace the block which has had the fewest hits
- Random (an LRU simulation follows)
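A minimal sketch (my own illustration): a k-way set-associative cache with LRU replacement, simulated with an ordered list per set; the most recently used tag is kept at the end of each list:

```python
class SetAssocLRU:
    def __init__(self, num_sets: int, ways: int, w: int):
        self.ways, self.w, self.num_sets = ways, w, num_sets
        self.sets = [[] for _ in range(num_sets)]  # up to `ways` tags per set

    def access(self, addr: int) -> bool:
        block = addr >> self.w
        set_idx = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[set_idx]
        if tag in lines:
            lines.remove(tag)
            lines.append(tag)          # move to the MRU position
            return True                # hit
        if len(lines) == self.ways:
            lines.pop(0)               # evict the LRU tag
        lines.append(tag)
        return False                   # miss

cache = SetAssocLRU(num_sets=4, ways=2, w=2)
trace = [0x00, 0x40, 0x00, 0x80, 0x40]   # blocks 0, 16, 0, 32, 16 -> set 0
print([cache.access(a) for a in trace])  # [False, False, True, False, True]
```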
Write policy
- Problems arise when the contents of the cache are updated:
  - There may be multiple CPUs in the system, each with its own cache
  - I/O may address main memory directly
  - Both bring data consistency problems
Write through
- All writes go to main memory as well as to the cache
- Multiple CPUs can monitor main memory traffic to keep their local caches up to date
- Generates lots of memory traffic
- Slows down writes
Write back
- Updates are initially made in the cache only
- An update (dirty) bit for the cache slot is set when an update occurs
- If a block is to be replaced, it is written to main memory only if the update bit is set
- I/O must access main memory through the cache
- In multi-CPU architectures there is a cache consistency problem when memory is shared
- DMA may also cause consistency problems (the two policies are contrasted in the sketch below)
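A minimal sketch (my own illustration) contrasting the two policies for a single cache line with a dirty bit:

```python
class Line:
    def __init__(self):
        self.data, self.dirty = None, False

def write(line: Line, value, memory: dict, addr: int, write_through: bool):
    line.data = value
    if write_through:
        memory[addr] = value      # every write also goes to main memory
    else:
        line.dirty = True         # write back: defer the memory update

def evict(line: Line, memory: dict, addr: int):
    if line.dirty:                # write back pays its memory cost here
        memory[addr] = line.data
        line.dirty = False

mem = {0: 0}
line = Line()
write(line, 7, mem, 0, write_through=False)
print(mem[0])         # 0: main memory is stale until eviction
evict(line, mem, 0)
print(mem[0])         # 7: the dirty line is flushed on replacement
```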
Cache Coherency
- Bus watching with write through
  - Each cache controller monitors the bus for writes by others to memory it holds
- Hardware transparency
  - Hardware is used to ensure that all changes made to main memory through one cache are reflected in all caches at the same time
- Noncacheable memory
  - Part of main memory is shared by multiple processors
  - The shared data in that memory is never copied into any cache
  - All accesses to the shared main memory are therefore cache misses
Multilevel Caches
- High logic density enables caches on chip
- Common to use both on and off chip cache
Unified and Split Caches
- Split cache: an architecture using two L1 (Level 1) caches, one for data and the other for instructions
  - When instructions are needed, the instruction cache is accessed
  - When data is needed, the data cache is accessed
- Advantages of a unified cache
  - Easier to design and implement
  - Automatically balances the load between instruction and data fetches
  - Higher hit rate
- Advantages of a split cache
  - Eliminates cache contention between the instruction fetch/decode unit and the execution unit
  - Important for pipelining
Pentium 4 Cache Organization
(Diagram: Pentium 4 cache organization)
ARM Cache Organization
- Small FIFO write buffer
  - Enhances memory write performance
  - Sits between the cache and main memory
  - Much smaller than the cache
- Operation
  - Data is put in the write buffer at processor clock speed
  - The processor continues execution
  - The buffer is written out to external memory in parallel until it is empty
  - If the buffer is full, the processor stalls (sketch below)
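A minimal sketch (my own illustration): a bounded FIFO write buffer where the processor enqueues writes at full speed and only stalls when the buffer is full, while draining to memory proceeds independently:

```python
from collections import deque

class WriteBuffer:
    def __init__(self, capacity: int = 4):
        self.q = deque()
        self.capacity = capacity

    def put(self, addr: int, value: int) -> bool:
        """Returns False (processor must stall) if the buffer is full."""
        if len(self.q) == self.capacity:
            return False
        self.q.append((addr, value))
        return True

    def drain_one(self, memory: dict):
        """Performed by the memory interface, in parallel with the CPU."""
        if self.q:
            addr, value = self.q.popleft()  # oldest write goes first (FIFO)
            memory[addr] = value

mem = {}
wb = WriteBuffer(capacity=2)
print(wb.put(0, 1), wb.put(1, 2), wb.put(2, 3))  # True True False -> stall
wb.drain_one(mem)
print(wb.put(2, 3), mem)                         # True {0: 1}
```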