
Computer Organization and Architecture

Processor Structure and Function

Outline

  • Processor Organization

  • Register Organization

  • Instruction Cycle

  • Instruction Pipelining

Processor Organization

  • A CPU must be able to

    • Fetch instruction from memory

    • Decode the instruction to determine what action to perform

    • Fetch data

    • Process data

    • Write data

/img/Computer Organization and Architecture/chapter12-1.png
  • The CPU must be able to store some data temporarily in order to process it

  • The CPU needs to remember the location of the next instruction so that, when the current instruction completes, it can find the next one

  • Instructions and data must be held during processing

  • Besides the ALU and CU, the CPU needs a set of storage elements: the registers

  • The CPU is connected to the other components of the computer by the system bus, which consists of the control bus, the data bus, and the address bus

/img/Computer Organization and Architecture/chapter12-2.png
  • The CPU's internal bus connects the ALU, the registers, and the CU, carrying data between the registers and the ALU

  • The control unit directs the registers, the internal bus, and the ALU so that each component performs the processing the instruction requires

  • The ALU itself contains smaller components, such as status flags, a shifter, a complementer, and the arithmetic and Boolean logic

Register Organization

  • CPU must have some working space (temporary storage)

    • Called registers
  • Number and function vary between processor designs

  • One of the major design decisions

  • Top level of memory hierarchy


  • Registers in the CPU fall into two types

    • User-visible registers

      • Reduce accesses to main memory and improve instruction-processing efficiency when their use is optimized

    • Control and status registers

      • Used by the control unit

      • Control the operation of the CPU and, via the privileged operating system, the execution of programs

User-visible registers

  • User-visible registers can be divided into four categories according to their purpose

    • General purpose: assigned to various uses

    • Data: hold data only

    • Address: used for particular addressing modes

    • Condition codes: also called the flag register; holds flags set by operation results

General purpose registers
  • True general purpose

    • Registers and opcodes are orthogonal in the instruction set
    • Any register can be used with any opcode
  • Restricted general-purpose registers

    • Dedicated to floating-point or stack operations
  • In some cases, general registers can be used for addressing

    • Register indirect addressing
    • Displacement addressing

  • Data registers

    • Can only be used to hold data, not for addressing
    • Example: the accumulator
  • Address registers

    • Used for specific addressing modes

    • Segment pointer

    • Index register

    • Stack pointer


General or Specialized

  • Whether registers are general or special affects the design of instruction sets

  • Specialized

    • The opcode can implicitly refer to a particular register or register group

    • Smaller instructions

    • Less flexibility

  • General purpose

    • Increase flexibility and programmer options

    • Increase instruction size & complexity

  • More registers require more bits to specify a register in the instruction

  • Fewer registers mean more memory accesses

  • Too many registers do not reduce memory references noticeably and take up processor real estate

    • Between 8 and 32 is appropriate
  • RISC designs use large register files, exploiting many more registers


  • Address registers: large enough to hold a full address

  • Data registers: large enough to hold a full word

  • Sometimes two data registers are combined to hold double-length data

    • In C, for example, a long (double-word) integer may occupy two machine words

Condition code registers

  • Also called flag registers; at least partially visible to the user

  • After an operation, the CPU sets the condition bits according to the result

    • An arithmetic operation may produce a positive, negative, zero, or overflow result

    • These conditions are recorded as bits in the condition code register

  • Programs are allowed to read the condition codes, e.g., to decide conditional branches

  • Condition codes usually cannot be modified directly by the program


Control & status registers

  • Four registers are essential to instruction execution

    • Program Counter (PC)

    • Instruction Register (IR)

    • Memory Address Register (MAR)

    • Memory Buffer Register (MBR)

  • Not all processors have an MAR and MBR, but the system still needs registers that play the same roles


Program status word

  • The PSW contains status information

  • The flags include

    • Sign, zero, carry, equal, overflow

    • Interrupt enable/disable

    • Supervisor: indicates whether the CPU is executing in supervisor or user mode

  • Supervisor mode

    • Not available to user programs

    • Used by the operating system (e.g., during a system call)

    • Certain privileged instructions can be executed only in supervisor mode


Other status and control registers

  • Additional status and control registers may include

    • A pointer to the process control block

    • Interrupt vector register in vector interrupt computer

    • System stack pointer

    • Page table pointer register in virtual memory

    • I/O operation related registers

  • Design issues for control and status registers

    • Support for the operating system is needed
    • Deciding which information resides in registers and which in memory

Instruction Cycle

  • The instruction cycle includes a fetch cycle and an execute cycle

  • In the execute cycle, the instruction is first decoded to determine the operation type

  • If the instruction has operands, the operand specifiers in the instruction are examined

    • Immediate

    • Register

    • Direct addressing: one memory access

    • Indirect addressing: may require additional memory accesses

      • Hence also called the “indirect cycle”

Instruction cycle with indirect

  • The instruction cycle includes the fetch and execute cycles, and may also include indirect and interrupt cycles

  • After fetch, decoding determines whether any operand requires indirect addressing; if so, the processor enters the indirect cycle

  • After the current instruction completes, the processor checks for interrupts; if one is pending, it enters the interrupt cycle

/img/Computer Organization and Architecture/chapter12-3.png
Instruction Cycle (with Interrupts - State Diagram)
  • In the instruction cycle, the instruction is fetched first and then decoded

  • If operands are involved, their addresses are calculated and the operands are fetched

  • The data operation is then performed; if the result must be stored to memory, the destination address is calculated and the result is stored

  • After the instruction completes, the processor checks for interrupts. If there is none, execution continues with the next instruction; if there is, the interrupt is handled according to the interrupt-handling rules

/img/Computer Organization and Architecture/chapter12-4.png
Indirect cycle
  • With indirect addressing, the operand address must itself be obtained from memory, so fetching an operand may involve several memory accesses

  • Indirect cycles can occur both when fetching operands and when storing results


Data flow (instruction fetch)

  • Depends on CPU design

  • Fetch in general

    • PC contains address of next instruction

    • Address moved to MAR

    • Address placed on address bus

    • Control unit requests memory read

    • Result placed on data bus, copied to MBR, then to IR

    • Meanwhile PC incremented by 1

/img/Computer Organization and Architecture/chapter12-5.png
Data flow (fetch diagram)
  • At the start, the address of the next instruction is in the PC

  • The address is moved to the MAR

  • The address is placed on the address bus

  • The control unit issues a memory read

  • Memory places the data, i.e., the instruction, on the data bus

  • The MBR latches the data bus contents and passes the instruction to the IR

  • Meanwhile, the control unit increments the PC to point to the next instruction (see the sketch below)
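
The register transfers above can also be written out in code. Here is a minimal Python sketch of the fetch cycle, assuming a simple word-addressed memory modeled as a list; the register names PC, MAR, MBR, and IR follow the text, and the memory contents are made up for illustration.

```python
# Minimal sketch of the fetch data flow (illustrative, not a real ISA).
memory = [0] * 256
memory[0] = 0x1234        # an instruction word at address 0

PC = 0                    # PC holds the address of the next instruction

MAR = PC                  # address moved to MAR and placed on the address bus
MBR = memory[MAR]         # control unit requests a read; data bus -> MBR
IR = MBR                  # instruction copied from MBR to IR
PC = PC + 1               # meanwhile, PC is incremented

print(f"IR = {IR:#06x}, PC = {PC}")   # IR = 0x1234, PC = 1
```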


  • The IR is examined

  • If there is no indirect addressing, the execute cycle begins

  • If indirect addressing is used, an indirect cycle is performed (continued in the sketch below)

    • Rightmost N bits of MBR transferred to MAR

    • Control unit requests memory read

    • Result (address of operand) moved to MBR
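
Continuing the fetch sketch above, the indirect cycle reuses the MAR and MBR; the address-field width N and the memory contents are assumptions for illustration.

```python
# Indirect cycle, continuing the fetch sketch (illustrative).
N = 8                          # assumed width of the instruction's address field
memory[0x34] = 0x0040          # the word at 0x34 holds the operand's address

MAR = MBR & ((1 << N) - 1)     # rightmost N bits of MBR moved to MAR (0x34 here)
MBR = memory[MAR]              # memory read; MBR now holds the operand's address
```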


Data flow (execute)

  • May take many forms

  • Depends on instruction being executed

  • May include

    • Memory read/write

    • Input/Output

    • Register transfers

    • ALU operations


Data flow (interrupt)

  • Simple and predictable

  • Current PC saved to allow resumption after interrupt

    • Contents of PC copied to MBR

    • Special memory location (e.g. stack pointer) loaded to MAR

    • MBR written to memory

  • PC loaded with address of interrupt handling routine

  • The first instruction of the interrupt handler is then fetched (see the sketch below)
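
The same register-transfer style describes interrupt entry. A minimal sketch, continuing the fetch example; the save location SP and handler address HANDLER are assumptions.

```python
# Interrupt entry (illustrative; addresses are assumptions).
SP = 0xF0                 # assumed memory location for saving the old PC
HANDLER = 0x80            # assumed address of the interrupt-handling routine

MBR = PC                  # contents of PC copied to MBR
MAR = SP                  # special memory location loaded into MAR
memory[MAR] = MBR         # MBR written to memory: the old PC is saved
PC = HANDLER              # PC loaded with the handler's address
# The next fetch cycle then brings in the handler's first instruction.
```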

Instruction Pipelining

Why need pipeline?

  • The growth of computer applications requires continuous improvement in processing capacity

  • Advances in integrated circuits, clock frequency, registers, cache, etc. have reduced instruction processing time and improved processing capability

  • It is increasingly difficult to gain performance by relying on hardware alone

  • The target is the execution efficiency of instructions

  • Better organization is needed to improve the efficiency of instruction execution


What is a pipeline?

  • The idea is borrowed from the factory assembly line

    • Divide the execution of an instruction into several stages

    • Different stages of multiple instructions can then be processed in parallel

  • Although the execution time of each instruction is not shortened, the execution time of a group of instructions is, thanks to this parallelism

  • This is the basic idea of the instruction pipeline

Prefetch

  • Without pipelining, the next instruction is fetched only after the current instruction finishes executing

  • With pipelining, more than one instruction is in the pipeline, at different stages

  • How to fetch instructions becomes an issue

    • Fetching accesses main memory

    • Execution usually does not access memory

    • So fetch the next instruction during execution of the current one

  • This is called instruction prefetch


Advantage

  • While one instruction executes, a new instruction has already entered the pipeline

  • When the current instruction finishes, the next one can execute immediately

    • The next instruction has already been fetched

    • The fetch time is saved

  • Fetching requires a memory access

    • On a cache hit, the instruction is taken directly from the cache

    • On a cache miss, main memory is accessed

  • In practice, the instruction cycle is divided into finer stages: more pipeline stages mean more overlap and more efficient instruction execution


Which instruction is Prefetched?

  • Which instruction is appropriate for prefetching?

  • The instruction following the current one?

    • If execution is sequential, no problem

    • If there is a branch, the next instruction depends on the branch condition

    • Hard to predict

  • Does a misprediction in prefetching affect correctness?

    • No, prefetched data at a “mis-predicted” address is simply not used

    • There is no need for state recovery


Basic characteristics

  • In modern systems, prefetching is usually done at cache-block granularity

  • Prefetching is a technique that can reduce both

    • Miss rate

    • Miss latency

  • Prefetching can be done by

    • Hardware
    • Compiler
    • Programmer

Prefetching: the four questions

  • What

    • What addresses to prefetch
  • When

    • When to initiate a prefetch request
  • Where

    • Where to place the prefetched data
  • How

    • Software, hardware, execution-based, cooperative
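
To make the four questions concrete, here is a hedged sketch of one possible answer set: a next-line hardware prefetcher. The cache interface (`cache`, `fetch_into_cache`) is hypothetical, and real prefetchers are far more elaborate.

```python
# Sketch of a next-line prefetcher (hypothetical cache interface).
BLOCK = 64                       # assumed cache-block size in bytes

cache = set()                    # block-aligned addresses currently cached

def fetch_into_cache(block_addr):
    cache.add(block_addr)        # stand-in for issuing a memory request

def access(addr):
    block = addr - addr % BLOCK
    if block not in cache:       # demand miss: fetch the missing block
        fetch_into_cache(block)
    # What: the next sequential block.  When: on every access.
    # Where: the same cache.       How: a fixed hardware rule.
    fetch_into_cache(block + BLOCK)

access(0)     # miss: fetches block 0 and prefetches block 64
access(64)    # hit, thanks to the earlier prefetch
```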

Two stage instruction pipeline

/img/Computer Organization and Architecture/chapter12-6.png
  • The simplest instruction processing is serial: fetch, execute, fetch, execute, which is inefficient

  • With a two-stage pipeline, the next instruction is fetched while the current instruction executes

  • If, after the current instruction completes, the prefetched instruction is not the one needed, a new fetch is required

  • Fetch and execute overlap in time, which saves time

  • But because fetch and execute take different amounts of time, execution speed does not double

/img/Computer Organization and Architecture/chapter12-7.png
Instruction pipelining
  • Execution with a two-stage pipeline

  • The execute stage of one instruction overlaps in time with the fetch stage of the next

  • The overall execution time of each instruction is not shortened, but the execution time of a group of instructions is

  • If fetch and execute took equal time, pipelined execution would take half the serial time, doubling performance

Improved performance

  • But not doubled

    • Fetch is usually shorter than execution

    • Instruction execution is complex and time-consuming

    • The execution time limits the improvement

  • Jump or branch instructions

    • Mean the prefetched instructions may not be the ones required

    • The actual next instruction is fetched once the outcome is known

  • Adding more stages improves performance further

Improve concurrency

  • Goal: More concurrency → Higher instruction throughput

  • Method: When an instruction is using some resources in its processing phase, process other instructions on idle resources

    • Fetch next instruction when an instruction is being decoded

    • Decode an instruction when an instruction is being executed

    • Execute the next instruction when current instruction is accessing memory

    • When an instruction is writing its result into the register file, access data memory for the next instruction

Summary

  • Analogy: “Assembly line processing” of instructions

  • Pipeline the execution of multiple instructions

    • Divide the instruction processing cycle into distinct “stages” of processing

    • Ensure there are enough hardware resources to process one instruction in each stage

    • Process a different instruction in each stage

    • Instructions are executed in program order

  • Benefit: Increases instruction processing throughput

/img/Computer Organization and Architecture/chapter12-8.png
Execution of four independent adds
  • A pipeline of add instructions

  • Each instruction is divided into 4 stages, each taking time t: fetch, decode, execute, write result

  • With a 4-stage pipeline of fully independent stages, n instructions need $nt+3t$ time (e.g., 100 instructions take 103t instead of 400t)

  • That is roughly $\frac {1}{4}$ of the serial time

In practice

/img/Computer Organization and Architecture/chapter12-9.png
  • Drying a load takes 2 time units, so with 4 loads fully serial processing takes 20 time units

  • With a pipeline, there is time spent waiting for the dryer

  • The 4 loads take 11 time units

  • In the limit the speedup approaches $2.5\times$ the non-pipelined case: serially each load takes 5 units, while in steady state the pipeline finishes one load every 2 units

  • The slowest stage determines the throughput of the whole system

How
/img/Computer Organization and Architecture/chapter12-10.png
  • The dryer becomes the bottleneck of the whole system

  • Add resources: install a second dryer

  • When the next load finishes washing, it need not wait for the previous load to dry; it uses the other dryer

  • Adding resources at the critical stage restores the overall throughput

  • The cost is the extra resources

Goal

  • Increase instruction throughput with little increase in cost

    • Process instructions in the order required by the program

    • Hardware cost cannot be increased too much

    • Instruction throughput can be greatly increased

An ideal pipeline

  • Repetition of identical operations

    • Same operation, different operation objects

    • Automobiles of the same model can be produced on one assembly line

    • Different operations require different steps, which affects the operation of the pipeline

    • The production of automobiles and motorcycles requires different steps and cannot be put on the same assembly line


  • Operating objects are independent of each other

    • There is no dependency between each operation object

    • For example, there is no relationship between cars produced on the assembly line

    • Operating objects with sequential dependencies affect each other during parallel operations


  • A complete operation can be decomposed into several sub-operations

    • Each sub-operation takes the same time

    • Each sub-operation requires independent resources and shares none

    • If sub-operations take different amounts of time, some must wait

    • Resource sharing leads to resource contention


  • For instruction pipeline design, we divide the execution of an instruction into six stages

    • Fetch instruction (FI)

    • Decode instruction (DI)

    • Calculate operands (CO)

    • Fetch operands (FO)

    • Execute instruction (EI)

    • Write result (WO)

  • Overlap these operations

Timing of pipeline

/img/Computer Organization and Architecture/chapter12-11.png
  • Execution of an ideal instruction pipeline

  • Instruction execution is divided into 6 stages that share no resources

  • Executing in pipelined fashion, from the sixth time unit onward one instruction completes in every time unit

  • With enough instructions, execution becomes 6 times as efficient (see the sketch below)
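
The figure's schedule can be reproduced in a few lines of Python: instruction i occupies stage s during time unit i + s, so n instructions on a k-stage pipeline finish in n + k - 1 units. The function below is purely illustrative.

```python
# Sketch: timing of an ideal k-stage instruction pipeline.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]    # the six stages from the text

def print_timing(n):
    k = len(STAGES)
    for i in range(n):
        row = ["--"] * (n + k - 1)
        for s, name in enumerate(STAGES):
            row[i + s] = name                    # instruction i in stage s at time i+s
        print(f"I{i + 1}: " + " ".join(row))
    print(f"{n} instructions finish in {n + k - 1} time units")

print_timing(4)   # 4 + 6 - 1 = 9 time units
```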

Summary

  • The total execution time for each individual instruction is not changed by pipelining

    • It still takes an instruction cycle to make it all the way through the processor
  • Pipelining doesn’t speed up instruction execution time

  • It does speed up program execution time by increasing the number of instructions finished per unit time

Branch in a pipeline

/img/Computer Organization and Architecture/chapter12-12.png
  • Instructions 1 and 2 execute normally

  • In time slot 8, instruction 3 must branch to instruction 15

  • The work already done for instructions 4-7 is discarded

  • Fetching must restart at instruction 15

  • In time slots 9 through 12 no instruction completes; this lost time is called the branch penalty

  • The more branches there are, the larger the branch penalty and the lower the instruction throughput of the whole program


Six stage instruction pipeline

/img/Computer Organization and Architecture/chapter12-13.png
  • The first step is instruction fetch, followed by decoding and operand address calculation

  • At that point, determine whether the instruction is an unconditional branch; if so, update the PC, flush the pipeline, and resume fetching

  • If there is no branch, execute normally: fetch the operands, execute the instruction, and write the result

  • Then check whether a branch is taken or an interrupt is pending; if so, as with an unconditional branch, update the PC, flush the pipeline, and continue with the new instruction stream

Other factors

  • Data transfer between stages takes time

  • In theory, the more stages, the higher the instruction throughput

    • But the more stages, the more complex the control between them

    • Latch delays and buffering between stages take time

    • A reasonable design balances these factors


Speedup factors with instruction pipelining

  • Suppose n instructions are executed on a k-stage pipeline. The speedup of pipelined execution over non-pipelined execution is defined as

$$ S_k=\frac {nk}{k+n-1} $$

  • As the number of instructions grows, the speedup approaches the number of pipeline stages k

  • The more instructions, the closer the speedup gets to its theoretical limit; and as the number of stages grows, the speedup increases ever more slowly

  • More pipeline stages bring greater potential speedup, but also bring problems: the pipeline must be flushed on branches, and inter-stage delays must be considered
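
For example, substituting sample values into the definition above shows both trends:

$$ S_6\big|_{n=10}=\frac{10\times 6}{6+10-1}=4, \qquad S_6\big|_{n=100}=\frac{100\times 6}{6+100-1}\approx 5.7, \qquad \lim_{n\to\infty} S_k = k $$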

Analysis of instruction pipeline

  • What are the characteristics of an ideal pipeline?
    • Repetition of identical operations

    • Operating objects are independent of each other

    • A complete operation can be decomposed into several sub operations


Identical operations … NOT!

  • different instructions → not all need the same stages

  • Forcing different instructions to go through the same pipe stages

  • Some pipeline stages are idle

  • Leading to a waste of time, called external fragmentation


Independent operations … NOT!

  • Instructions are not independent of each other

  • Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results

  • Pipeline stalls frequently due to branch

  • Poor operation of the pipeline


Uniform sub-operations … NOT!

  • different pipeline stages → not the same latency

  • Need to force each stage to be controlled by the same clock

  • Some pipe stages are too fast but all take the same clock cycle time

  • This wasted time is called internal fragmentation


Issues in pipeline design

  • Reasonably divide the stages of instructions

    • How many stages is the instruction cycle divided into?

    • What is done in each stage?

  • Handling exceptions, interrupts

  • Keeping the pipeline correct, moving, and full

    • Data dependences

    • Control dependences

    • Resource conflict

    • Long-latency (or multi-cycle) operations


Causes of pipeline stalls

  • Pipeline stall: A condition when the pipeline stops moving

  • Causes of stall

    • Resource contention
    • Dependences between instructions, including data dependence and control dependence
    • Long-latency (multi-cycle) operations

Dependences and Their Types

  • Also called “hazards”; the stalls they cause are called “pipeline bubbles”

  • Dependences dictate ordering requirements between instructions

  • Two types

    • Data dependence

    • Control dependence

  • Resource contention is sometimes called resource dependence

  • When a dependence occurs, the pipeline may have to stall; this situation is called a pipeline hazard

Resource hazards

  • Two (or more) instructions in pipeline need same resource

    • Executed in serial rather than parallel for part of pipeline

    • Also called structural hazard

  • It is caused by an unreasonable structure or insufficient resources

    • For example, two instructions needing the same register or memory port
  • The solution is generally to increase available resources, such as adding a dryer in the previous example

Example

/img/Computer Organization and Architecture/chapter12-14.png
  • In clock cycle 3, $I_1$ needs to read memory to fetch an operand, while $I_3$ needs to fetch its instruction

  • Both instructions need to read memory, so a resource conflict occurs

  • $I_3$ must wait one clock cycle and performs its fetch in clock cycle 4

  • One clock cycle is wasted because of the resource conflict

  • If there is only one ALU, instruction execution can also conflict

Handling resource contention

  • Solution 1: Eliminate the cause of contention

    • Duplicate the resource or increase its throughput

    • E.g., use separate instruction and data memories (caches)

    • E.g., use multiple ports for memory structures

  • Solution 2: Detect the resource contention and stall one

    • Need to decide which one to stop

Data hazards

  • A conflict in access to an operand

    • E.g., both instructions access a particular memory or register operand
  • If the two instructions execute strictly serially, i.e., one begins only after the previous one finishes, there is no problem

  • In a pipeline, the operand could be read or updated out of order, producing a result different from strict sequential execution

  • A data hazard is caused by conflicting accesses to the same operand location


Types of data hazard

  • Types of data dependences (illustrated in the snippet below):

    • Read after write

      • Called “true dependence”
    • Write after read

      • Called “anti dependence”
    • Write after write

      • Called “output dependence”
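
As a hedged illustration (the register-style names are arbitrary), the three types look like this in straight-line code; the sections that follow examine each one:

```python
# Illustrative register-style assignments; names are arbitrary.
r1, r2 = 10, 20

r3 = r1 + r2    # writes r3
r4 = r3 + 1     # RAW: reads r3 after the write above (true dependence)

r5 = r4 + r1    # reads r1
r1 = 0          # WAR: writes r1 after the read above (anti dependence)

r6 = r2 * 2     # writes r6
r6 = r2 * 3     # WAW: second write to r6 (output dependence)
```
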
True dependency
  • Read after write (RAW), or true dependency

    • An instruction modifies a register or memory location

    • A succeeding instruction reads the data in that location

    • A hazard occurs if the read takes place before the write completes

    • The succeeding instruction needs the modified data

    • In a pipeline, the read may instead return the value from before the write

/img/Computer Organization and Architecture/chapter12-15.png
  • The first instruction writes $r_3$

  • The second instruction reads $r_3$

  • The second instruction must wait until the first has executed and written $r_3$ before it can fetch its operand; otherwise the $r_3$ it reads is not the intended value

Anti dependence
  • Write after read (WAR), or anti-dependency

  • An instruction reads a register or memory location

  • A succeeding instruction writes to that location

  • A hazard occurs if the write completes before the read takes place

  • The first instruction would then read incorrect data

/img/Computer Organization and Architecture/chapter12-16.png
  • The first instruction reads $r_1$

  • The second instruction writes $r_1$

  • If the second instruction executes first, the result is incorrect

  • This situation arises in superscalar processors

Output dependence
  • Write after write (WAW), or output dependency

    • Two instructions both write to the same location

    • A hazard occurs if the writes take place in the reverse of the intended order

    • The value that should remain is the one written by the second instruction

    • In the pipeline, the value actually left behind is the one written by the first instruction

    • The memory location or register then holds the wrong data

/img/Computer Organization and Architecture/chapter12-17.png
  • The first instruction writes $r_3$

  • The third instruction also writes $r_3$

  • If the third instruction executes first, the result is again incorrect

  • This situation arises in superscalar processors

How?

  • True dependences always need to be obeyed because they constitute true dependence on a value

  • Anti and output dependences exist due to limited number of architectural registers

    • They are dependence on a name, not a value
  • Without special hardware and specific avoidance algorithms, results in inefficient pipeline usage

/img/Computer Organization and Architecture/chapter12-18.png
Data hazard diagram
  • In clock cycle 5, the add instruction writes EAX

  • In clock cycle 4, the subtract already needs EAX

  • If the second instruction did not wait, the EAX it fetched would still be the old value, not the result of the add

  • So the subtract instruction must stall for 2 clock cycles, fetching its operand in clock cycle 6

  • 2 clock cycles are wasted

Methods of handling

  • True dependences are more interesting

    • There is an actual dependence between data values, so waiting is required
  • Anti and output dependences are easier to handle

    • They arise only from writes

    • Use more registers

    • Using different registers eliminates the false dependence

  • Some fundamental ways of handling true dependences

    • Detect and wait until value is available in register file

    • Detect and eliminate the dependence at the software level

      • Register renaming

      • Discussed later

    • Predict the needed value(s), execute “speculatively”, and verify
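
As a sketch of the renaming idea (continuing the register-style snippet from the data-hazard section, with arbitrary names), directing each new write to a fresh register removes the anti and output dependences, while the true dependence necessarily remains:

```python
# Register renaming sketch: false dependences disappear, RAW stays.
r1, r2 = 10, 20

r3 = r1 + r2     # RAW on r3 below is a true dependence: it must be obeyed
r4 = r3 + 1

r5 = r4 + r1
r1b = 0          # was "r1 = 0": renamed, so the WAR on r1 is gone

r6a = r2 * 2     # was "r6 = r2 * 2"
r6b = r2 * 3     # was "r6 = r2 * 3": renamed, so the WAW on r6 is gone
# Later readers of r1 / r6 are redirected to r1b / r6b.
```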

Control dependence

  • Also called a “control hazard” or “branch hazard”

  • A special case of data dependence: the dependence is on the PC

  • A hazard occurs when the pipeline makes a wrong judgment about a branch

  • It brings instructions into the pipeline that must subsequently be discarded

  • The pipeline cannot run at full load

Solutions
  • Multiple Streams

  • Prefetch Branch Target

  • Loop buffer

  • Branch prediction

  • Delayed branching

Multiple streams
  • Have two pipelines

    • Prefetch each branch path into a separate pipeline
  • Finally, choose the pipeline to use according to the branch outcome

  • Shortcoming

    • Leads to bus & register contention

    • Multiple branches lead to further pipelines being needed

Prefetch branch target
  • The branch target is prefetched in addition to the instructions following the branch
  • After prefetching, the target is fetched and decoded but not executed
  • Keep the target until the branch executes
  • This saves the fetch and decode time
  • Used by IBM 360/91
Loop buffer
  • A very fast memory

  • Holds the n most recently fetched instructions, in order

  • When a branch is about to occur, first check whether the branch target is already in the buffer

  • Works very well for small loops and short jumps

Branch prediction
  • There are two types of branch prediction

    • Static branch prediction: the prediction does not depend on execution history

    • Dynamic branch prediction: the prediction depends on execution history

  • Static branching

    • Predict never taken

    • Predict always taken

    • Predict by Opcode

  • Dynamic branching

    • Taken/not taken switch

    • Branch history table


Static branch prediction

  • Predict never taken

    • Assume that jump will not happen

    • Always fetch next instruction

    • 68020 & VAX 11/780

  • Predict always taken

    • Assume that jump will happen

    • Always fetch target instruction

  • “Predict always taken” is the most used

  • Predict by Opcode

    • Some instructions are more likely to result in a jump than others

    • Can get up to 75% success


Dynamic branch prediction

  • Record the history of conditional branch instructions in a program

  • Taken/not taken switch: one or more bits record the recent history of the instruction

    • The prediction is based on these bits

    • I.e., based on previous history

    • Good for loops
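
A common form of the taken/not-taken switch is a 2-bit saturating counter per branch. Below is a minimal sketch; the encoding (0-1 predict not taken, 2-3 predict taken) is the usual one, but details vary between designs.

```python
# 2-bit saturating-counter branch predictor (minimal sketch).
counter = 2                          # 0,1 -> predict not taken; 2,3 -> predict taken

def predict():
    return counter >= 2              # True means "predict taken"

def update(taken):
    global counter
    counter = min(counter + 1, 3) if taken else max(counter - 1, 0)

# A loop branch: taken many times, then not taken once at loop exit.
for outcome in [True, True, True, False, True]:
    print(f"predict taken? {predict()}  actual: {outcome}")
    update(outcome)                  # only the loop exit mispredicts
```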


Branch history table

  • Even when a branch is predicted taken, the target address is only available after decoding the instruction

    • A waiting time is required

    • How can efficiency be improved?

  • A storage area called the branch target buffer is designed

    • Also called the branch history table

    • It records information related to branches: the branch instruction address, history bits, and target address information

  • Target address information

    • Can be the target instruction itself

      • Use this instruction directly

      • Less time

      • Takes up more space

    • Can be the target instruction's address

      • Less space

      • More time

    • Whether to save time or space depends on the specific situation

Branch history table

/img/Computer Organization and Architecture/chapter12-19.png
  • After a predicted branch, instruction prefetch first consults the branch history table

    • If there is a match, the prediction follows the recorded state: fetch from the target address or from the next sequential address

    • If there is no match, the next sequential instruction is fetched

  • When the branch instruction executes, the state bits in the table are updated according to whether the branch was actually taken

  • If a conditional branch instruction is not in the table, it is added, replacing an existing entry. Many replacement policies are possible, similar to cache replacement strategies

  • The branch history table maintains itself dynamically (see the sketch below)
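
The lookup-and-update flow just described can be sketched with a dictionary keyed by the branch instruction's address. A real table has a fixed number of entries and a replacement policy, both omitted here; the 2-bit state encoding matches the predictor sketch above.

```python
# Branch history table sketch: {branch address: [state bits, target address]}.
btb = {}

def predict_next(pc):
    """Consult the table during prefetch: target or next sequential address."""
    if pc in btb:
        state, target = btb[pc]
        return target if state >= 2 else pc + 1
    return pc + 1                        # no match: fetch sequentially

def resolve(pc, taken, target):
    """Update (or insert, replacing some entry in a real table) after execution."""
    state, _ = btb.get(pc, [2, target])
    state = min(state + 1, 3) if taken else max(state - 1, 0)
    btb[pc] = [state, target]
```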


Correlation-based prediction

  • A branch history table performs well on loop statements

  • In more complex structures, the outcome of a branch correlates with the outcomes of related branches

  • A method called correlation-based branch prediction was proposed

    • Keep a global branch history table
    • Predict by combining the global history with the current branch instruction

Delayed Branch

  • A method of instruction rearrangement

  • Delayed branching analyzes the effect of the branch and finds instructions that are needed regardless of its outcome, so that no unwanted instruction is prefetched

    • Such an instruction is placed, and executed, immediately after the branch instruction

    • Executing it keeps the pipeline full, so no clock cycle is wasted waiting