Computer Systems What the actual bits represent depends on the - - PowerPoint PPT Presentation


1 of 87

Computer Systems

Seminar 3

Petru Eles, petel@ida.liu.se, tf: 281396

A few slides are based on material from the course book as well as from the book “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron.

3 of 87

Binary Representation

Information is represented as a sequence of binary digits: Bits

What the actual bits represent depends on the context:

Numerical value (integer, floating point, fixed point)

Sequence of characters (text)

Executable instruction

Depending on the context, operations performed are:

Logical computation (context: logic), with 1: true and 0: false. Operations: And, Or, Exclusive-Or (Xor), Not

Numerical computation (context: numbers). Operations: Addition, Subtraction, Multiplication, Division

5 of 87

Logical Computation: Boolean Algebra

And          Or           Xor          Not
& | 0 1      | | 0 1      ^ | 0 1      ~ |
0 | 0 0      0 | 0 1      0 | 0 1      0 | 1
1 | 0 1      1 | 1 1      1 | 1 0      1 | 0

It applies similarly to bit vectors (operations apply bitwise):

  11100101      11100101      11100101
& 01101101    | 01101101    ^ 01101101    ~ 01101101
  --------      --------      --------      --------
  01100101      11101101      10001000      10010010
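The bit-vector examples on this slide can be checked with a few lines of Python. This is only a sketch: the helper name `bits` and the 8-bit mask are assumptions made for illustration (Python integers have no fixed width, so the result of Not has to be masked to 8 bits).

```python
# Bit vectors from the slide, written as binary literals.
a = 0b11100101
b = 0b01101101
MASK = 0xFF  # assumption: we work with 8-bit vectors


def bits(value: int) -> str:
    """Format an integer as an 8-bit binary string."""
    return format(value & MASK, "08b")


and_result = bits(a & b)   # bitwise And
or_result = bits(a | b)    # bitwise Or
xor_result = bits(a ^ b)   # bitwise Xor
not_result = bits(~b)      # bitwise Not, masked to 8 bits

print(and_result, or_result, xor_result, not_result)
```

Each operator works on all eight bit positions independently, which is exactly what "operations apply bitwise" means.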

6 of 87

Arithmetical Computation

Adding two one-bit numbers

A  B  Σ  Cout
0  0  0  0
0  1  1  0
1  0  1  0
1  1  0  1

7 of 87

How is this Done in Computers?

Logic values are represented by voltage levels:

High voltage (e.g. 3.3 V): 1

Low voltage (e.g. 0 V): 0

At the output of a circuit we can have the following signal; this circuit produces the sequence 0, 1, 0 (or false, true, false):

[Figure: output voltage over time: low, then high, then low.]

9 of 87

The Basic Building Block: The Transistor

[Figure: transistor circuit with input Vin, Source, and output Vout.]

H: high voltage level (1, true)
L: low voltage level (0, false)

Vin  Source  Vout
H    H       L
L    H       H

Observe! This implements logic Not from Vin to Vout!

~ |
0 | 1
1 | 0

Such a circuit is called a Not gate (also inverter).

10 of 87

Gates for Boolean Operations

Gates are electronic devices that perform Boolean operations.

Gates are built as small electronic circuits based on transistors;

Gates are the basic building blocks out of which VLSI (Very Large Scale Integration) circuits are built; today’s computers are implemented as VLSI circuits, with up to billions of transistors on a chip.

And               Or                Xor               Not
In1  In2  Out     In1  In2  Out     In1  In2  Out     In  Out
0    0    0       0    0    0       0    0    0       0   1
0    1    0       0    1    1       0    1    1       1   0
1    0    0       1    0    1       1    0    1
1    1    1       1    1    1       1    1    0

11 of 87

Gates for Boolean Operations

Any logical function can be implemented as a combination of such gates.

13 of 87

Implementing Arithmetical Computation

The one-bit adder:

A  B  Sum  Cout
0  0  0    0
0  1  1    0
1  0  1    0
1  1  0    1

This is just a truth table capturing a logical function; thus, it can be implemented with a combination of logical gates!

Sum = A Xor B
Cout = A And B

Here is the circuit:

[Figure: circuit with inputs A and B and outputs Sum and Cout.]

This is called a Half Adder (it does not consider an input carry).

14 of 87

Implementing Arithmetical Computation

The Full Adder (adds two bits and an input carry):

Cin  A  B  Sum  Cout
0    0  0  0    0
0    0  1  1    0
0    1  0  1    0
0    1  1  0    1
1    0  0  1    0
1    0  1  0    1
1    1  0  0    1
1    1  1  1    1

16 of 87

Implementing Arithmetical Computation

A four-bit adder adds two four-bit numbers and an input carry.

By further cascading full adders, one can build 8, 16, 32, 64, ... bit adders.

In a similar way, circuits for other arithmetic operations can be implemented.
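The cascading idea can be sketched in Python using only the gate operations And, Or and Xor. The function names are hypothetical, and building the full adder from two half adders plus an Or gate is one common construction, assumed here because the slides do not spell out the internal circuit.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry_out) for two one-bit inputs: Sum = A Xor B, Cout = A And B."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Add two bits and an input carry (assumed construction: two half adders and an Or gate)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2


def ripple_add(x: int, y: int, width: int = 4, cin: int = 0) -> tuple[int, int]:
    """Add two width-bit numbers by cascading full adders, least significant bit first."""
    result, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # carry is the output carry of the last stage


print(ripple_add(0b0101, 0b0011))
```

Changing `width` to 8, 16, 32 or 64 corresponds to cascading more full adders.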


17 of 87

How to Store a BIT?

Circuits like those shown in the previous slides are called “combinatorial”: they produce an output that depends only on the input; the output is maintained as long as that input is applied.

What about a circuit that is able to store a bit? You can write 1 or 0 to the circuit, and the output will keep the value even after the input has disappeared.

18 of 87

Flip-Flops

The circuit below has two inputs: one (called S) for setting it to 1 (H), the other (called R) for setting it to 0 (L).

When an input 1 arrives at S, the output becomes 1; it stays 1 until an input 1 arrives at R.

Once an input 1 arrives at R, the output switches to 0, and stays so until a 1 arrives at S.

[Figure: flip-flop circuit with inputs S and R and output Out.]
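The set/reset behaviour can be modelled with a short behavioural sketch in Python. It does not model the gates or voltages of the actual circuit. The class name, the initial state 0, and the decision to reject S = R = 1 are assumptions made for illustration.

```python
class SRFlipFlop:
    """Behavioural model: a 1 on S sets the output to 1, a 1 on R resets it to 0."""

    def __init__(self) -> None:
        self.out = 0  # assumed initial state

    def apply(self, s: int, r: int) -> int:
        if s and r:
            # Assumption: this sketch does not define the S = R = 1 case.
            raise ValueError("S and R must not both be 1")
        if s:
            self.out = 1
        elif r:
            self.out = 0
        # With S = R = 0 the stored value is kept unchanged.
        return self.out


ff = SRFlipFlop()
trace = [ff.apply(1, 0), ff.apply(0, 0), ff.apply(0, 1), ff.apply(0, 0)]
print(trace)
```

The second and fourth entries of `trace` show the memory effect: the input has gone back to 0, but the output keeps the last value written.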

19 of 87

Setting the Flip-Flop to 1

[Figure sequence, slides 19 to 21: signal values in the flip-flop circuit while it is set to 1.]

The input has changed to 0, but the output still remembers 1!

22 of 87

Setting the Flip-Flop to 0

[Figure sequence, slides 22 to 24: signal values in the flip-flop circuit while it is set to 0.]

The input has changed to 0, but the output still remembers 0!


25 of 87

Flip-Flops

One flip-flop can store one bit. Using groups of several flip-flops, arbitrarily long sequences of bits can be stored. This is a basic technique for storing data in computers, e.g. in registers.

27 of 87

Let’s Go Over to Computers

We have seen how data (logical and numerical) is represented in a computer.

We have seen that it is possible to construct circuits that are able to operate on data and perform logical and arithmetical operations.

We have seen that circuits can be built which are able to store data.

Now, let’s see how a computer is built and works!

28 of 87

What is a Computer/Computer-System?

A computer is a data processing machine which is operated automatically under the control of a list of instructions (called a program) stored in its main memory.

Computers today are extremely complex and are built of many interconnected components; in addition to actual data processing, they have to perform tasks such as communicating with other computers and devices and interacting with the user and the environment. Therefore we speak about Computer Systems.


29 of 87

Computer Systems

[Figure: block diagram of a computer system. Components shown: CPU (ALU, Control Unit, Registers, IR, PC, bus interface, internal CPU bus), system bus, bridge, memory bus, main memory, I/O bus, USB controller (mouse, keyboard), graphics adapter (display), disk controller (disk).]

30 of 87

Computer Systems

The CPU (Central Processing Unit)

This is the heart of the system; it is the engine that interprets the instructions and executes them (with the help of other components of the computer system).

31 of 87

Computer Systems

The Main Memory

A temporary storage that holds both instructions (the program) and data.

32 of 87

Computer Systems

The CPU together with the Main Memory form the core computer; this is the minimal structure capable of storing and executing programs.

The rest of the computer system deals with communication, Input/Output, long-term storage, and interaction with the environment.


33 of 87

Computer Systems

Buses

Buses are the physical infrastructure (electrical wiring) over which bytes travel between components of the computer system.

34 of 87

Computer Systems

Input/Output Devices

They connect the computer to the external world. Connection is via controllers/adapters.

35 of 87

Computer Systems

Disk drive

The disk drive is a special device used as long-term storage for data and programs. Such storage is also called Secondary Memory.

On modern computers the secondary memory is often implemented as a solid state disk (SSD) on flash memory.

36 of 87

How Does a Computer Work?

All computers in use, simple or complicated, big or small, cheap or expensive, work according to the same basic concept, known as the von Neumann architecture:

Data and instructions are both stored in the main memory (stored program concept);

The content of the memory is addressable by location (without regard to what is stored in that location);

Instructions are executed sequentially (from one instruction to the next, in order of their location in memory) unless the order is explicitly modified.


37 of 87

A Simple Computer Architecture

The basic organization (architecture):

Central processing unit (CPU), which contains:

   Control unit (CU) that coordinates the execution of instructions;

   Arithmetic/logic unit (ALU) that performs arithmetic and logic operations;

   A set of registers.

Main memory.

38 of 87

A Simple Computer Architecture

Register Organization

The set of registers within the CPU represents the top level of the memory hierarchy inside the computer system:

User visible registers: can be accessed by programs, for storing data.

Control and Status registers: used by the Control Unit to control the operation of the CPU; not directly accessible by the programmer.

39 of 87

A Simple Computer Architecture

User Visible Registers

A set of registers which can be used without restrictions as operands for any operation and as address registers; these are the so-called general-purpose registers.

40 of 87

A Simple Computer Architecture

Control and Status Registers

Program Counter (PC): holds the address of the instruction to be fetched and executed.

Instruction Register (IR): holds the last instruction fetched.

Program Status Word (PSW): Condition Code Flags + other bits defining the status of the CPU.

...

42 of 87

A Simple Computer Architecture

Arithmetic Logic Unit (ALU)

Performs arithmetic and logic operations. There might be several of them in a CPU. ALUs are different, depending on the data type they operate on: integer ALU, floating point ALU, etc.

Control Unit

The control unit generates the appropriate signals such that all other components of the CPU and the computer system, together, execute the current instruction.

The current instruction to execute is stored in the instruction register (IR); it is the instruction whose memory address is stored in the program counter (PC).

46 of 87

Machine Instructions

A CPU can only execute machine instructions.

Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.

A machine instruction is represented as a sequence of bits (binary digits). These bits are organized into fields that define:

What has to be done (the operation code).

To whom the operation applies (source operands).

0 0 0 0 1 0 1 1 1 0 0 0 1 0 1 1

Fields, from left to right: opcode, operand 1 (memory), operand 2 (register)

47 of 87

Machine Instructions

A CPU can only execute machine instructions.

Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.

A machine instruction is represented as a sequence of bits (binary digits). These bits are organized into fields that define:

What has to be done (the operation code).

To whom the operation applies (source operands).

Where the result goes (destination operand); in this example CPU it is assumed that the result of the operation is stored in the same place where the second operand was stored, so no additional field is needed.

0 0 0 1 1 0 1 1 1 0 0 0 1 0 1 1

Fields, from left to right: opcode, operand 1 (memory), operand 2 (register)

48 of 87

Machine Instructions

A CPU can only execute machine instructions.

Each computer has a set of specific machine instructions which its CPU is able to recognize and execute.

The number of bits, the number and length of the fields, and their order are particular to each computer; this defines the instruction format of that computer.

0 0 0 0 1 0 1 1 1 0 0 0 1 0 1 1

Fields, from left to right: opcode, operand 1 (memory), operand 2 (register)
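To make the idea of an instruction format concrete, here is a small Python sketch that splits the 16-bit example word into its three fields. The field widths (5-bit opcode, 8-bit memory operand, 3-bit register number) are an assumption: the slides do not state them, but this split is consistent with the addresses and register numbers used in the example program that follows. The function name `decode` is hypothetical.

```python
def decode(word: str) -> dict:
    """Split a 16-bit instruction, given as a string of 0s and 1s, into the assumed fields."""
    bits = word.replace(" ", "")
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected 16 binary digits")
    return {
        "opcode": bits[0:5],      # assumed width: 5 bits
        "operand1": bits[5:13],   # assumed width: 8 bits (memory address)
        "operand2": bits[13:16],  # assumed width: 3 bits (register number)
    }


fields = decode("0 0 0 0 1 0 1 1 1 0 0 0 1 0 1 1")
print(fields)
```

A different processor would need different slice boundaries; that is what "instruction format" refers to.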


49 of 87

Types of Machine Instructions

Machine instructions are of four types:

Data transfer between memory and CPU registers

Arithmetic and logic operations

Program control (test and branch); these are the instructions that change the flow of instruction execution by jumping to an instruction different from the one following the current instruction in memory.

I/O transfer

You see, each machine instruction does a very simple thing! But many machine instructions, together, perform the big thing!

50 of 87

Instruction Execution

Let’s imagine you write in a program the following instruction: Z := (Y + X) * 3; The instruction will be executed by the CPU as a sequence of four machine instructions!

51 of 87

Instruction Execution

Let’s imagine you write in a program the following instruction: Z := (Y + X) * 3;

Content of the memory (the first column is the memory address at which the instruction/data is stored):

Instructions

Address    Content            Fields                      Effect
00001000   0000101110001010   Move, addr of Y, Reg 2      Move value of Y to Reg 2
00001001   0001101110000010   Add, addr of X, Reg 2       Add value of X to Reg 2 (result kept in Reg 2)
00001010   0010100000011010   Mul, value “3”, Reg 2       Multiply Reg 2 with 3 (result kept in Reg 2)
00001011   0001001110010010   Move, addr of Z, Reg 2      Store Reg 2 at address of Z
...

Data

Address    Content            Variable
01110000   0000000000001011   X (value of X: 11)
01110001   0000000000000011   Y (value of Y: 3)
01110010   0000000000101010   Z (final value of Z: 42)
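The four machine instructions can be replayed with a tiny simulator. This is a sketch under stated assumptions, not the processor from the slides: the field widths (5-bit opcode, 8-bit operand 1, 3-bit register number) and the opcode names `LOAD`, `STORE`, `ADD`, `MUL_IMM` are inferred from the bit patterns of the example, and overflow of 16-bit values is ignored.

```python
# Opcode values read off the four example instructions; the names are hypothetical.
LOAD, STORE, ADD, MUL_IMM = 0b00001, 0b00010, 0b00011, 0b00101


def run(memory: dict, pc: int, count: int):
    """Execute `count` instructions starting at address `pc`.

    Returns the final program counter and the register contents.
    """
    registers = {}
    for _ in range(count):
        ir = memory[pc]               # fetch into the instruction register
        pc += 1                       # point to the next instruction
        opcode = ir >> 11             # decode: top 5 bits
        operand1 = (ir >> 3) & 0xFF   # decode: middle 8 bits
        reg = ir & 0b111              # decode: low 3 bits
        if opcode == LOAD:            # register := memory[address]
            registers[reg] = memory[operand1]
        elif opcode == ADD:           # register := register + memory[address]
            registers[reg] += memory[operand1]
        elif opcode == MUL_IMM:       # register := register * constant
            registers[reg] *= operand1
        elif opcode == STORE:         # memory[address] := register
            memory[operand1] = registers[reg]
        else:
            raise ValueError(f"unknown opcode {opcode:05b}")
    return pc, registers


memory = {
    0b00001000: 0b0000101110001010,  # Move value of Y to Reg 2
    0b00001001: 0b0001101110000010,  # Add value of X to Reg 2
    0b00001010: 0b0010100000011010,  # Multiply Reg 2 with 3
    0b00001011: 0b0001001110010010,  # Store Reg 2 at address of Z
    0b01110000: 11,                  # X
    0b01110001: 3,                   # Y
    0b01110010: 0,                   # Z (placeholder, overwritten by the store)
}
final_pc, registers = run(memory, pc=0b00001000, count=4)
print(memory[0b01110010], format(final_pc, "08b"))  # prints: 42 00001100
```

Instructions and data live in the same `memory` dictionary, which mirrors the stored program concept: only the program counter decides what is treated as an instruction.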

52 of 87

Let’s Follow the Instruction Execution

[Figure sequence, slides 52 to 60: the CPU (control unit, ALU, Instruction Register, Register R2, Program Counter) and the main memory holding the four instructions and the data X, Y, Z. X stays 0000000000001011 and Y stays 0000000000000011 throughout. The values shown at each step are:]

Step                                   Program Counter   Instruction Register   Register R2        Z in memory
Before the first instruction           00001000          xxxxxxxxxxxxxxxx       xxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxx
Now the first instruction is fetched   00001000          0000101110001010       xxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxx
After the first instruction            00001001          0000101110001010       0000000000000011   xxxxxxxxxxxxxxxx
Now the second instruction is fetched  00001001          0001101110000010       0000000000000011   xxxxxxxxxxxxxxxx
After the second instruction           00001010          0001101110000010       0000000000001110   xxxxxxxxxxxxxxxx
Now the third instruction is fetched   00001010          0010100000011010       0000000000001110   xxxxxxxxxxxxxxxx
After the third instruction            00001011          0010100000011010       0000000000101010   xxxxxxxxxxxxxxxx
Now the fourth instruction is fetched  00001011          0001001110010010       0000000000101010   xxxxxxxxxxxxxxxx
After the fourth and last instruction  00001100          0001001110010010       0000000000101010   0000000000101010

62 of 87

Compilers

We have written in our program (high-level language, e.g. C, C++, Java):

Z := (Y + X) * 3;

What the computer executes is (machine instructions for the particular processor that runs the program):

0000101110001010
0001101110000010
0010100000011010
0001001110010010

Who brings us from our program to the machine instructions?

A compiler is a program that translates programs written in a high-level language into machine code to be executed on a certain processor.

[Figure: HL-language program (Z=(Y+X)*3) → Compiler → program in machine code (01001000 10110001 00111001)]

63 of 87

The Machine Cycle

From the previous example you have seen that many things have to be done to execute a simple machine instruction:

Fetch instruction

Decode instruction

Fetch operand(s)

Execute instruction

Each instruction is performed as a sequence of steps; the steps corresponding to the execution of one instruction are referred to together as a machine cycle. The number and nature of steps in the machine cycle differ from processor to processor.

66 of 87

The Quest for Speed

Running faster (more instructions per time unit) has been a permanent goal of computer designers. Two main factors contribute to high performance of modern processors:

1. Fast circuit technology: smaller and faster switching transistors, allowing the processor to run at higher frequency.

2. Architectural features such as:

   Smart memory hierarchies

   Pipelining

   Superscalar architectures

   Several instructions are executed in parallel.

67 of 87

Memory System

One of the most crucial aspects in designing efficient computer architectures is the memory system.

What do we need? We need memory that fits very large programs and works at a speed comparable to that of the microprocessors.

Main problem:

Processors work at a high clock rate and they need large memories;

Memories are much slower than microprocessors, yet executing a single instruction requires several memory accesses (to fetch the instruction and its operands); it doesn’t help that the processor is fast if the memory is orders of magnitude slower.

68 of 87

The CPU-Memory Gap

[Figure: the CPU-memory gap: (main memory) memory access time (ns) compared with CPU cycle time (ns).]

69 of 87

Memory Hierarchies

Fast memories are more expensive per byte and cannot be very large (main memory is much smaller than SSD or Disk)

It is possible to build memory structures that are as fast as the CPU, but they are very expensive and small.

72 of 87

Memory Hierarchies

The good news:

It is possible to build a composite memory system which combines small, fast memories (from the top of the hierarchy) and large slow memories (from the middle and bottom of the hierarchy) and which behaves (most of the time) like a large fast memory.

How can this work?

The answer is: Locality

73 of 87

The Principle of Locality

During execution of a program, memory references by the processor, for both instructions and data, tend to cluster: once an area of the program is entered, there are repeated references to a small set of instructions (loop, subroutine) and data (components of a data structure, local variables or parameters on the stack).

Temporal locality (locality in time): If an item is referenced, it will tend to be referenced again soon.

Spatial locality (locality in space): If an item is referenced, items whose addresses are close by will tend to be referenced soon.

74 of 87

Cache Memory

A cache memory is a small, very fast memory that retains copies of recently used information (instructions and data). It operates transparently to the programmer, automatically deciding which values to keep and which to overwrite.

Due to the property of locality, most of the time the instruction or data required by the CPU will be available in the top-level cache. If not, it is loaded from the lower-level cache; once loaded, the information is written into the top-level cache, where it replaces some existing information in order to make space.

Which information is replaced when new information has to be written?

Information that has not been used by the CPU for a long time (and, thus, is less likely to be needed in the future) is overwritten.

The above procedure is repeated at each level of the hierarchy.
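The replacement idea can be sketched in Python as a least-recently-used (LRU) policy. LRU is one common way to approximate "not used for a long time"; real caches use various policies and operate on blocks of addresses, so the class below (its name, the `backing` dictionary standing in for the slower level, and the single-address granularity) is an illustrative assumption only.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny cache in front of a slower `backing` store, with LRU replacement."""

    def __init__(self, capacity: int, backing: dict) -> None:
        self.capacity = capacity
        self.backing = backing      # stands in for the lower, slower level
        self.lines = OrderedDict()  # address -> value, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)     # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # overwrite the least recently used entry
            self.lines[address] = self.backing[address]
        return self.lines[address]


backing = {address: address * 10 for address in range(16)}
cache = LRUCache(capacity=2, backing=backing)
values = [cache.read(a) for a in (0, 1, 0, 2)]
print(values, cache.hits, cache.misses, list(cache.lines))
```

In this run the second read of address 0 is a hit (temporal locality), and when address 2 arrives it replaces address 1, the entry that has gone unused for the longest time.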

75 of 87

The Quest for Speed

Running faster (more instructions per time unit) has been a permanent goal of computer designers. Two main factors contribute to high performance of modern processors:

1. Fast circuit technology: smaller and faster switching transistors, allowing the processor to run at higher frequency.

2. Architectural features such as:

   Smart memory hierarchies

   Pipelining

   Superscalar architectures

   Several instructions are executed in parallel.

76 of 87

Pipelining

We remember the machine cycle: Fetch instruction, Decode, Fetch operand, Execute instruction.

Each step in the machine cycle is performed by a separate piece of hardware (Stage 1 to Stage 4). If a new instruction is fetched only after the previous one has passed through all four stages, an instruction takes time T and one result comes out every T time units.

The CPU works like a pipeline (assembly line): once a stage has finished with an instruction, it hands it over to the next stage and takes over a new instruction.

With all four stages busy on different instructions, one result comes out every T/4 time units!!!
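The timing argument can be worked through with a short calculation in Python. It is a sketch that assumes an idealized pipeline: all stages take the same time, and there are no stalls or dependencies between instructions. The function names are hypothetical.

```python
def sequential_time(n_instructions: int, stages: int, stage_time: int) -> int:
    """Total time when each instruction passes through all stages before the next one starts."""
    return n_instructions * stages * stage_time


def pipelined_time(n_instructions: int, stages: int, stage_time: int) -> int:
    """Total time in an ideal pipeline: fill it once, then one result per stage time."""
    if n_instructions == 0:
        return 0
    return (stages + n_instructions - 1) * stage_time


# Four stages of 1 time unit each, so T = 4 time units per instruction.
print(sequential_time(100, 4, 1), pipelined_time(100, 4, 1))  # prints: 400 103
```

For long instruction sequences the pipelined time approaches one stage time per instruction, i.e. T/4 with four equal stages; the first result still takes the full time T to appear.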

79 of 87

Superscalar Architectures

You can imagine a superscalar processor as composed of several pipelines running together.

As opposed to simple pipelined computers, superscalars fetch several instructions and produce several results simultaneously.

[Figure: several instructions in, at the same time; several results out, at the same time.]

80 of 87

The Quest for Speed

Running faster (more instructions per time unit) has been a permanent goal of computer designers. Two main factors contribute to high performance of modern processors:

1. Fast circuit technology: smaller and faster switching transistors, allowing the processor to run at higher frequency.

   This has been a primary source of performance improvement over the years. Processors running at higher and higher frequencies allowed for a continuous increase in speed. That doesn’t work any more!!!

2. Architectural features such as:

   Smart memory hierarchies

   Pipelining

   Superscalar architectures


81 of 87

The Power Wall

We have reached the limit due to the temperature produced by the high power consumption! Further increase of the frequency is impossible! This is the main challenge today! New ways have to be explored in order to deliver performance!

82 of 87

Multicore Chips

Multicore chips: Several processors on the same chip.

This is the only way to increase chip performance without excessive increase in power consumption:

Instead of increasing processor frequency, use several processors and run them in parallel, each at lower frequency.

83 of 87

Intel Core Duo

Composed of two Pentium M superscalar processors.

Each of the two processor cores has its own 32-KB L1 I Cache and 32-KB L1 D Cache.

Both cores share a 2 MB L2 Cache.

The Main Memory is off chip.

84 of 87

Intel Core i7

Composed of four x86 SMT (simultaneous multithreading) processors.

Each of the four processor cores has its own 32-KB L1 I Cache, 32-KB L1 D Cache, and 256 KB L2 Cache.

All cores share an 8 MB L3 Cache.

The Main Memory is off chip.


85 of 87

ARM11 MPcore

Composed of four ARM11 processor cores.

Each of the four ARM11 processor cores has its own 32-KB L1 I Cache and 32-KB L1 D Cache.

The chip also contains a cache coherence unit.

The Main Memory is off chip.

86 of 87

Intel's Single-Chip Cloud Computer (SCC)

Composed of 48 P54C Pentium cores

87 of 87

Where Are We?

Topic                                  Area
Bits, Bytes, Words, Representations    Information
Arithmetical & Logical Computations    Computation
Transistors, Circuits & Processors     Hardware
Logical Gates & Networks               Hardware
Computer Architectures                 Hardware
Machine Language                       Programming
Assembly Language                      Programming

[Figure labels: Abstraction; Electronics, Digital Electronics, Signals & Systems, Information Theory, Discrete Mathematics, Computer Systems, Compiler Design]