
Computer Organization and Architecture

Cache Memory

Outline

  • Key terms
  • Computer Memory System Overview
  • Cache Memory Principles
  • Elements of Cache Design
  • Pentium 4 Cache Organization
  • ARM Cache Organization

Key terms

  • access time

  • associative mapping

  • cache hit

  • cache line

  • cache memory

  • cache miss

  • data cache

  • direct access

  • direct mapping

  • high-performance computing

  • hit ratio

  • instruction cache

  • L1 cache

  • L2 cache

  • L3 cache

  • locality

  • logical cache

  • memory hierarchy

  • multilevel cache

  • physical cache

  • random access

  • replacement algorithm

  • set-associative mapping

  • spatial locality

  • split cache

  • tag

  • temporal locality

  • unified cache

  • virtual cache

  • write back

  • write once

  • write through

Computer Memory System Overview

Memory

  • In a von Neumann computer, memory does not distinguish between instructions and data

  • The performance of memory is constantly improving, but it still falls far short of CPU speed

  • This leads to a growing performance gap between the two, which seriously affects the overall performance of the computer

Characteristics of Memory Systems

Location
  • Location refers to whether the memory is internal or external to the computer

  • Internal memory

    • Registers and cache inside the CPU
    • Internal cache and main memory
  • External memory

    • The processor accesses these stores through I/O controllers, which is why they are called external memory

/img/Computer Organization and Architecture/chapter4-1.png
Capacity Unit
  • Bit: 0 or 1

  • Byte: 8 bits

  • Word size

    • The natural unit of organisation

    • Not a fixed length

    • Typically 16, 32, or 64 bits today

    • Most registers hold one word

    • Integers are usually stored in one word

    • Instruction length is generally equal to the word length

    • Often matches the bus width

Capacity
  • Capacity: the amount of information that can be stored

    • Number of addressable units

    • Unit size

  • Addressable unit: a storage unit that can be uniquely addressed in memory

  • ! ! ! $Capacity=Number\ of\ addressable\ units \times Unit\ size\newline$

    • Number of words
    • Number of bytes
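As a quick check of the formula (with illustrative numbers, not from the lecture): a byte-addressable memory with $2^{20}$ addressable units has

$Capacity=2^{20} \times 8\ bits=2^{23}\ bits=1\ MiB\newline$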
Unit of transfer
  • Internal

    • Usually governed by the data bus width (typically a word)
  • External

    • Usually a block which is much larger than a word
  • Addressable unit

    • Smallest location which can be uniquely addressed

    • Word internally

    • Cluster on external disks

Access methods ! ! !
  • Sequential

    • Start at the beginning and read through in order

    • Access time depends on location of data and previous location

    • e.g. tape

  • Direct

    • Individual blocks have unique address

    • Access is by jumping to the vicinity plus a sequential search

    • Access time depends on location and previous location

  • Random

    • Individual addresses identify locations exactly

    • Access time is independent of location or previous access

    • e.g. RAM

  • Associative

    • Data is located by a comparison with contents of a portion of the store

    • Access time is independent of location or previous access

    • e.g. cache

Performance ! ! !
  • Access time

    • Time between presenting the address and getting the valid data

    • For RAM: time to address the unit and perform the transfer

    • For non-RAM: time to position the R/W head over the desired location

  • Memory Cycle time

    • Applied to RAM

    • Time may be required for the memory to “recover” before next access

    • $Memory\ Cycle\ time=access\ time+additional\ time\ required\ before\ next\ access\newline$

  • Transfer Rate

    • Rate at which data can be moved

    • ! ! ! For RAM: $Rate = \frac{1}{Cycle\ time}\newline$

    • ! ! ! For non-RAM: $Rate=\frac{N}{T_N-T_A}\newline$

      • $N: number\ of\ bits\newline$

      • $T_N:time\ to\ write\ or\ read\ N\ bits\newline$

      • $T_A: average\ access\ time\ (time\ to\ position\ at\ the\ location)\newline$
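A quick worked example with illustrative numbers (not from the lecture): a device with average access time $T_A=10\ ms$ delivers $N=4096$ bits in a total time $T_N=10.5\ ms$, so

$Rate=\frac{N}{T_N-T_A}=\frac{4096\ bits}{0.5\ ms}\approx 8.2\ Mbit/s\newline$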

Physical types
  • Semiconductor

    • RAM
  • Magnetic

    • Disk & Tape
  • Optical

    • CD & DVD
  • Others

    • Bubble

    • Hologram

Physical characteristics
  • Decay or not

    • Decays: flash memory

    • Does not decay: hard disk

  • Volatile or not

    • Volatile: main memory (RAM)

    • Non-volatile: hard disk

  • Erasable or not

    • Erasable: hard disk

    • Non-erasable: CD-ROM

  • Power consumption

The Memory Hierarchy

Design Objectives
  • Major design objective of any memory system

    • To provide adequate storage capacity

    • An acceptable level of performance

    • At a reasonable cost

Design methods
  • Balance cost, capacity, and performance
    • Faster access time -> greater cost per bit
    • Greater capacity -> smaller cost per bit
    • Greater capacity -> slower access time
The hierarchy
  • Use memory hierarchy

    • Each level is characterized by its size, access time, and cost per bit
    • Different levels of memory interact in some way
  • Goal $\rightarrow$ make the speed of the memory closest to the CPU roughly match the CPU's speed

Diagram

Cache Memory Principles

  • What is a cache?

    • A high-speed memory, faster than general RAM

    • Built with expensive but faster SRAM technology

  • Why use a cache?

    • It would be possible to build a computer using only static RAM; it would be very fast, but it would cost a very large amount
  • A memory hierarchy is the solution

Locality of Reference

  • Most programs are highly sequential; the next instruction usually comes from the next memory location
  • Data is usually structured, and the data in these structures is normally stored in contiguous memory locations
  • Short loops are a common program structure, especially at the innermost levels of nested loops. This means that the same small set of instructions is used over and over; the two functions below illustrate good and bad spatial locality

Good spatial locality

#include <array>
#include <cstddef>

// Row-major traversal: the inner loop walks consecutive ints, so each
// cache line brought into the cache is fully used before moving on.
template <std::size_t M, std::size_t N>
int fun1(const std::array<std::array<int, N>, M>& arr) {
	int sum = 0;
	for (std::size_t i = 0; i < M; i++)
		for (std::size_t j = 0; j < N; j++)
			sum += arr[i][j];
	return sum;
}

Bad spatial locality

#include <array>
#include <cstddef>

// Column-major traversal: the inner loop strides by N ints per access,
// touching a different cache line almost every time when N is large.
template <std::size_t M, std::size_t N>
int fun2(const std::array<std::array<int, N>, M>& arr) {
	int sum = 0;
	for (std::size_t j = 0; j < N; j++)
		for (std::size_t i = 0; i < M; i++)
			sum += arr[i][j];
	return sum;
}
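Both functions compute the same sum, but C++ stores std::array data in row-major order, so fun1 streams through memory sequentially while fun2 jumps N ints between consecutive accesses; for arrays much larger than the cache, the row-major version typically runs several times faster.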

Cache Memory Principles

  • The cache stores a small part of the data in memory

    • If the data required by the CPU is in the cache, the CPU reads it directly from the cache without accessing memory

    • If the required data is not in the cache, it is read from memory and the cache is updated

/img/Computer Organization and Architecture/chapter4-4.png

Characteristic

  • Small capacity

  • Fast access - Operate at nearly the speed of the processor

  • Sits between normal main memory and CPU

  • May be located on CPU chip or module

  • Very expensive

/img/Computer Organization and Architecture/chapter4-2.png

Multi-level cache

/img/Computer Organization and Architecture/chapter4-3.png

Typical cache organization

/img/Computer Organization and Architecture/chapter4-5.png

Elements of Cache Design

  • Cache Addresses

  • Cache Size

  • Mapping Function

  • Replacement Algorithm

  • Write Policy

  • Line Size

  • Number of Caches

Virtual address

  • A memory management technique

    • Allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available
  • A memory expansion technique

    • It does not change the size of physical memory, but it makes the way programs access memory more flexible
  • A hardware memory management unit (MMU) translates each virtual address into a physical address


Logical cache and Physical cache

  • Logical cache: between the processor and the MMU

  • Physical cache: between the MMU and main memory

/img/Computer Organization and Architecture/chapter4-6.png
Logical cache
  • The processor accesses the cache directly, without going through the MMU (memory management unit)

  • Different applications use the same virtual (logical) address space

  • The cache must be flushed on each context switch

Physical cache
  • A physical cache stores data using main memory physical addresses

  • The logical address is first translated into a physical address by the MMU, and then the cache is accessed

Cache size

  • A larger cache is more expensive
  • A larger cache is faster (up to a point)
  • Checking the cache for data takes time, so very large caches tend to be slightly slower than small ones

Cache hits and misses

  • Cache hits

    • The data requested by the CPU is in the cache, so the cache can pass the contents to the CPU

    • Hit ratio: the fraction of memory accesses found in the upper level

    • $Hit\ Time=Access\ Time+Time\ to\ determine\ hit\ or\ miss\ (scan\ the\ cache)\newline$

    • $Hit\ Ratio=\frac{N_{hit}}{N_{hit}+N_{miss}}\newline$

  • Cache miss

    • The requested data is not in the cache; it must be read from memory
    • $Miss\ Ratio=1-Hit\ Ratio\newline$
    • $Miss\ Penalty=Time\ to\ replace\ a\ block\ in\ the\ cache+Time\ to\ deliver\ the\ missed\ data\ to\ the\ processor\newline$
  • Cache performance is often measured in terms of the hit ratio

  • $T_{average}=Hit\ Ratio \times T_{hit}+Miss\ Ratio \times T_{miss}=Hit\ Ratio \times T_{hit}+(1-Hit\ Ratio) \times T_{miss}\newline$
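For example, with an illustrative hit ratio of 0.95, $T_{hit}=2\ ns$, and $T_{miss}=60\ ns$ (numbers chosen only to show the arithmetic):

$T_{average}=0.95 \times 2\ ns+0.05 \times 60\ ns=4.9\ ns\newline$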

Cache mapping ! ! !

Direct mapping
  • Each block of main memory maps to only one cache line

    • If a block is in cache, it must be in one specific place
  • Address is in two parts

    • Least Significant w bits identify unique word

    • Most Significant s bits specify one memory block

  • The s MSBs are split into a cache line field of r bits and a tag of s-r bits (the most significant part); the sketch below shows the split
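A minimal sketch of this address split in C++, with hypothetical field widths (w = 2 word bits and r = 10 line bits on a 32-bit address; these parameters are illustrative, not from the lecture):

#include <cstdint>

// Hypothetical widths: 4-byte words (w = 2) and 1024 cache lines (r = 10).
constexpr std::uint32_t kWordBits = 2;
constexpr std::uint32_t kLineBits = 10;

struct Fields { std::uint32_t tag, line, word; };

// Split an address into tag, line, and word fields.
Fields split(std::uint32_t addr) {
	Fields f;
	f.word = addr & ((1u << kWordBits) - 1);                 // least significant w bits
	f.line = (addr >> kWordBits) & ((1u << kLineBits) - 1);  // next r bits select the line
	f.tag  = addr >> (kWordBits + kLineBits);                // remaining s-r bits
	return f;
}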

Characteristics of direct mapping

  • Fixed location for a given block
  • Frequent replacement operations can lead to a low hit rate
  • Thrashing (jitter)
    • If a program repeatedly accesses two blocks that map to the same line, cache misses are very high
Associative mapping
  • A main memory block can be loaded into any line of the cache

    • Memory address is interpreted as tag and word

    • Tag uniquely identifies block of memory

    • Every line’s tag is examined for a match

  • Mapping is quite flexible

  • Cache searching gets complex

Characteristics of associative mapping

  • Advantage

    • Flexibility
  • Disadvantage

    • Large tag memory

    • The complex circuitry required to examine the tags of all cache lines in parallel

  • Replacement algorithms are important to maximize the hit ratio

Set Associative Mapping
  • The cache is divided into a number of sets

    • Each set contains a number of lines

    • A given block maps to any line in a given set

  • Use set field to determine cache set to look in

  • Compare tag field to see if we have a hit
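A minimal lookup sketch for a hypothetical 2-way set-associative cache (all parameters invented for illustration):

#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kWordBits = 2;  // 4-byte words
constexpr std::uint32_t kSetBits  = 9;  // 2^9 = 512 sets
constexpr std::size_t   kWays     = 2;  // lines per set

struct Line { bool valid = false; std::uint32_t tag = 0; };
std::array<std::array<Line, kWays>, (1u << kSetBits)> cache;

// Use the set field to pick one set, then compare the tag
// against every line in that set.
bool isHit(std::uint32_t addr) {
	std::uint32_t set = (addr >> kWordBits) & ((1u << kSetBits) - 1);
	std::uint32_t tag = addr >> (kWordBits + kSetBits);
	for (const Line& line : cache[set])
		if (line.valid && line.tag == tag)
			return true;   // hit: the block is resident in this set
	return false;          // miss: fetch from memory and pick a victim line
}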

Replacement algorithms

For direct mapping
  • When a new memory block must be brought into the cache, there is no choice: it can only replace the block in the single line it maps to
For associative and set-associative mapping
  • If one of the candidate lines is empty, the block is written there directly

  • If no candidate line is empty, an algorithm must decide which line to overwrite

Replacement strategies
  • Least recently used (LRU)

    • the usage time of each cache line must be recorded (see the sketch after this list)
  • First in, first out (FIFO)

    • replace the block that has been in the cache longest
  • Least frequently used (LFU)

    • replace the block that has had the fewest hits
  • Random
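A minimal LRU bookkeeping sketch (illustrative only; a global counter stands in for the recorded usage time):

#include <array>
#include <cstddef>
#include <cstdint>

struct Line {
	bool valid = false;
	std::uint32_t tag = 0;
	std::uint64_t lastUsed = 0;  // stamp refreshed on every access to this line
};

std::uint64_t clockTick = 0;  // monotonically increasing "time", incremented per access

// Choose the victim line in a set: prefer an empty (invalid) line,
// otherwise evict the least recently used one (smallest stamp).
template <std::size_t Ways>
std::size_t chooseVictim(const std::array<Line, Ways>& set) {
	std::size_t victim = 0;
	for (std::size_t i = 0; i < Ways; i++) {
		if (!set[i].valid)
			return i;  // empty line: no eviction needed
		if (set[i].lastUsed < set[victim].lastUsed)
			victim = i;
	}
	return victim;
}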

Write policy

  • Contents in the cache are updated, but main memory may still hold the old value
  • There may be multiple CPUs in the system, each with its own cache
  • I/O may address main memory directly
  • All of these bring data consistency problems
Write through
  • All writes go to main memory as well as cache

  • Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date

  • Lots of traffic

  • Slows down writes
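A minimal write-through sketch in C++ (illustrative; the single cache slot and the writeToMemory stub are assumptions, not from the lecture):

#include <cstdint>

// Stub standing in for a real write to main memory.
void writeToMemory(std::uint32_t addr, std::uint32_t value) {
	(void)addr;
	(void)value;
}

struct Slot { bool valid = false; std::uint32_t tag = 0; std::uint32_t data = 0; };

// Write-through: every write goes to main memory, and the cached copy,
// if present, is updated as well, so it never goes stale.
void writeThrough(Slot& slot, std::uint32_t addr, std::uint32_t value) {
	writeToMemory(addr, value);      // always update main memory (lots of traffic)
	std::uint32_t tag = addr >> 2;   // hypothetical: 4-byte words, one slot
	if (slot.valid && slot.tag == tag)
		slot.data = value;           // keep the cached copy consistent
}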

Write back
  • Updates initially made in cache only

  • Update bit for cache slot is set when update occurs

  • If block is to be replaced, write to main memory only if update bit is set

  • I/O must access main memory through cache
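A minimal write-back sketch (illustrative; the CacheLine layout and the writeBackToMemory stub are assumptions):

#include <cstdint>

struct CacheLine {
	bool valid = false;
	bool dirty = false;  // the update bit: set when the cached copy is modified
	std::uint32_t tag = 0;
	std::uint32_t data = 0;
};

// Stub standing in for a real write of the block to main memory.
void writeBackToMemory(std::uint32_t tag, std::uint32_t data) {
	(void)tag;
	(void)data;
}

// A write hit updates the cache only and sets the update bit.
void writeHit(CacheLine& line, std::uint32_t value) {
	line.data = value;
	line.dirty = true;
}

// On replacement, the old block goes to main memory only if the
// update bit is set; clean blocks are simply overwritten.
void replaceLine(CacheLine& line, std::uint32_t newTag, std::uint32_t newData) {
	if (line.valid && line.dirty)
		writeBackToMemory(line.tag, line.data);
	line = {true, false, newTag, newData};
}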


  • In multi-CPU architectures, there is a cache coherency problem when memory is shared

  • DMA may also cause consistency problems


Cache Coherency

  • Bus watching with write through

    • Each cache controller monitors the address lines for writes to main memory by other bus masters; if another master writes to a location that is also held in the cache, that cache entry is invalidated

  • Hardware transparency

    • Additional hardware is used to ensure that all updates to main memory made through one cache are reflected in all caches
  • Noncacheable memory

    • Part of the main memory is shared by multiple processors

    • All accesses to the shared main memory are cache misses

    • The shared data in memory is never copied into the cache

Multilevel Caches

  • High logic density enables caches on chip
  • Common to use both on-chip and off-chip caches

Unified and Split Caches

  • Split cache -> an organization that uses two L1 (level 1) caches, one for storing data and the other for storing instructions

    • When instructions are needed, access the instruction cache

    • Access the data cache when data is needed


  • Advantages of unified cache

    • Easier to design and implement

    • Balances load of instruction and data fetch

    • Higher hit rate

  • Advantages of split cache

    • Eliminates cache contention between instruction fetch/decode unit and execution unit

    • Important in pipelining

Pentium 4 Cache Organization

/img/Computer Organization and Architecture/chapter4-7.png

ARM Cache Organization

  • Small FIFO write buffer

    • Enhances memory write performance

    • Between cache and main memory

    • Much smaller than the cache

    • Data put in write buffer at processor clock speed

    • Processor continues execution

    • External writes proceed in parallel until the buffer is empty

    • If the buffer is full, the processor stalls

/img/Computer Organization and Architecture/chapter4-8.png
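A minimal FIFO write-buffer sketch (illustrative; the four-entry depth and the drain routine are assumptions, not ARM specifics):

#include <cstddef>
#include <cstdint>
#include <queue>

struct PendingWrite { std::uint32_t addr; std::uint32_t data; };

constexpr std::size_t kBufferDepth = 4;  // assumed small FIFO depth
std::queue<PendingWrite> writeBuffer;

// Called at processor speed: returns false (processor stalls) when full.
bool postWrite(std::uint32_t addr, std::uint32_t data) {
	if (writeBuffer.size() >= kBufferDepth)
		return false;                 // buffer full: the processor must stall
	writeBuffer.push({addr, data});   // entry queued at processor clock speed
	return true;                      // processor continues executing
}

// Runs in parallel with the processor, draining entries to external memory.
void drainOne() {
	if (!writeBuffer.empty()) {
		// writeToExternalMemory(writeBuffer.front());  // hypothetical helper
		writeBuffer.pop();
	}
}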