CSE141: Introduction to Computer Architecture, Hung-Wei Tseng



slide-1
SLIDE 1

CSE141: Introduction to Computer Architecture

Hung-Wei Tseng

slide-2
SLIDE 2
  • What’s your name?
  • What’s the most exciting thing you did so far this summer?
  • What’s the most interesting computer science topic for you?

2

CSE141: Let’s say something!

slide-3
SLIDE 3

3

slide-4
SLIDE 4

We want faster and faster computers!

4

slide-5
SLIDE 5

5

What is “architecture”

  • "Architecture." Merriam-Webster.com. Merriam-Webster, n.d. Dec 18 2018. <http://www.merriam-webster.com/dictionary/architecture>.
slide-6
SLIDE 6

6

Computer architecture? Architecture

the art or science of building computers

slide-7
SLIDE 7
  • What is inside computers?
  • Von Neumann architecture
  • Current state of CPU architectures
  • Why is CSE141 important?
  • What’s in this class?

7

Outline

slide-8
SLIDE 8

What is inside computers?

8

slide-9
SLIDE 9

9

[Figure: computers drawn as combinations of CPU and DRAM blocks]

Modern computers

The same spirit, but different implementations

slide-10
SLIDE 10

Desktop PC

10

PCIe

Memory CPU Socket

SATA

PCI

I/O connectors

slide-11
SLIDE 11

11

Server

CPU Socket

PCIe

Memory Memory Memory Memory CPU Socket CPU Socket CPU Socket

slide-12
SLIDE 12

12

Macbook Pro w/ Retina

System Hub Memory CPU Connectors SSD Slot

slide-13
SLIDE 13

13

iPhone

Sim Card CPU + DRAM

Peripherals Peripherals

slide-14
SLIDE 14

14

PS4

CPU+ GPU Memory Memory

Memory

Connectors

Peripherals

slide-15
SLIDE 15

15

Processor/memory is everywhere!

slide-16
SLIDE 16

16

Processors/CPUs for computers

slide-17
SLIDE 17

17

Memory/storage for computers

slide-18
SLIDE 18

The computer is now like a small network

18

SATA SSD HDD Wireless NIC NIC

Processor

DRAM processor-memory bus GPU Accelerator NVMe SSD FPGA/ASIC

slide-19
SLIDE 19

In the beginning …

19

slide-20
SLIDE 20

20

slide-21
SLIDE 21

Difference engine

21

1822: English mathematician Charles Babbage conceives of a steam-driven calculating machine that would be able to compute tables of numbers.

slide-22
SLIDE 22

ENIAC

22

ENIAC (Electronic Numerical Integrator And Computer) was the first electronic general-purpose computer. It was Turing-complete, digital, and could solve "a large class of numerical problems" through reprogramming.

Reprogramming it meant changing the physical hardware configuration.

slide-23
SLIDE 23

23

https://az-eandt-live-legacy.azureedge.net/news/2013/apr/images/640_edsac-web.jpg

slide-24
SLIDE 24

Von Neumann Architecture & modern computers

24

slide-25
SLIDE 25

Von Neumann architecture

25

[Figure: CPU connected to memory]

The CPU is a dominant factor of performance since we rely heavily on it to execute programs.

By pointing the “PC” to different parts of your memory, we can perform different functions!

slide-26
SLIDE 26

Memory

26

You can only store 0 or 1 in each memory cell

slide-27
SLIDE 27
  • Assume that we have 4 bits
  • Example binary arithmetic

27

Representing a positive number

Decimal  Binary    Decimal  Binary
0        0000      4        0100
1        0001      5        0101
2        0010      6        0110
3        0011      7        0111

3 + 2 = 5
    0 0 1 1
  + 0 0 1 0
  ---------
    0 1 0 1   (carry 1 out of bit 1)

3 + 3 = 6
    0 0 1 1
  + 0 0 1 1
  ---------
    0 1 1 0   (carry 1 out of bit 0 and out of bit 1)
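The two additions above can be checked bit by bit. The following is a minimal sketch, assuming a 4-bit unsigned format as on this slide; `Add4Result`, `add4`, and `to_bits4` are hypothetical names chosen for illustration, not part of the course material.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Result of adding two 4-bit unsigned values one bit at a time.
struct Add4Result {
    std::uint8_t sum;      // low 4 bits of the result
    std::uint8_t carries;  // bit i is set if adding bit position i produced a carry
    bool carry_out;        // carry out of bit 3 (the result did not fit in 4 bits)
};

// Hypothetical helper: ripple-carry addition of two 4-bit values.
Add4Result add4(std::uint8_t a, std::uint8_t b) {
    assert(a < 16 && b < 16);  // inputs must fit in 4 bits
    Add4Result r{0, 0, false};
    unsigned carry = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned x = (a >> i) & 1u;
        unsigned y = (b >> i) & 1u;
        unsigned total = x + y + carry;  // 0, 1, 2, or 3
        r.sum |= static_cast<std::uint8_t>((total & 1u) << i);
        carry = total >> 1;
        if (carry) r.carries |= static_cast<std::uint8_t>(1u << i);
    }
    r.carry_out = (carry != 0);
    return r;
}

// Hypothetical helper: format the low 4 bits as text such as "0101".
std::string to_bits4(std::uint8_t v) {
    std::string s;
    for (int i = 3; i >= 0; --i) s += ((v >> i) & 1u) ? '1' : '0';
    return s;
}
```

Calling `add4(3, 2)` and passing the `sum` field to `to_bits4` reproduces the first worked example; the `carries` field records which bit positions generated a carry.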

slide-28
SLIDE 28
  • Guidelines
  • Obvious representation of 0, 1, 2, ......
  • Efficient usage of number space
  • Equal coverage of positive and negative numbers
  • Easy hardware design
  • 1's complement + 1 = 2's complement
  • Invert every bit, then + 1
  • -1 = b'1110 + b'1 = b'1111

28

2’s complement

Decimal  Binary    Decimal  Binary
 0       0000      -1       1111
 1       0001      -2       1110
 2       0010      -3       1101
 3       0011      -4       1100
 4       0100      -5       1011
 5       0101      -6       1010
 6       0110      -7       1001
 7       0111      -8       1000

Does not waste 1111 anymore
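The "invert every bit, then + 1" rule and the table above can be sketched in a few lines. This is an illustration under the slide's 4-bit assumption; `negate4` and `decode4` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: negate a 4-bit pattern by inverting every bit and adding 1.
std::uint8_t negate4(std::uint8_t bits) {
    assert(bits < 16);  // input must fit in 4 bits
    return static_cast<std::uint8_t>((~bits + 1u) & 0xFu);
}

// Hypothetical helper: read a 4-bit pattern as a signed 2's complement value.
// Patterns with the top bit set represent (pattern - 16).
int decode4(std::uint8_t bits) {
    assert(bits < 16);
    return (bits & 0x8u) ? static_cast<int>(bits) - 16 : static_cast<int>(bits);
}
```

For instance, `negate4(0x1)` yields the pattern 1111, and `decode4(0x8)` yields -8. Note that `negate4(0x8)` returns 1000 again, because +8 has no 4-bit representation.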

slide-29
SLIDE 29
  • Do we need a separate procedure/hardware for adding positive and negative numbers?

29

Evaluating 2’s complement

  • 3 + 2 = 5

  • 3 + (-2) = 1

  • A. No. The same procedure applies
  • B. No. The same “procedure” applies but it changes overflow detection
  • C. Yes, and we need a new procedure
  • D. Yes, and we need a new procedure and a new hardware
  • E. None of the above


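3 + 2 = 5
    0 0 1 1
  + 0 0 1 0
  ---------
    0 1 0 1

3 + (-2) = 1
    0 0 1 1
  + 1 1 1 0
  ---------
  1 0 0 0 1   (the carry out of the top bit is discarded: 0 0 0 1 = 1)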
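One way to experiment with the question on this slide is to feed 2's complement patterns into an ordinary 4-bit adder and look at the result. The sketch below assumes 4-bit operands; `SignedAdd4` and `add4_signed` are hypothetical names, and the overflow rule used (both operands share a sign that the result does not) is the standard signed-overflow check rather than something stated on the slide.

```cpp
#include <cassert>
#include <cstdint>

struct SignedAdd4 {
    std::uint8_t bits;  // 4-bit result pattern
    bool overflow;      // true if the signed result does not fit in 4 bits
};

// Hypothetical helper: add two 4-bit 2's complement patterns with a plain adder.
SignedAdd4 add4_signed(std::uint8_t a, std::uint8_t b) {
    assert(a < 16 && b < 16);  // inputs must fit in 4 bits
    // Same addition as for unsigned numbers; the carry out of bit 3 is dropped.
    const std::uint8_t sum = static_cast<std::uint8_t>((a + b) & 0xF);
    const bool sign_a = (a & 0x8u) != 0;
    const bool sign_b = (b & 0x8u) != 0;
    const bool sign_sum = (sum & 0x8u) != 0;
    // Signed overflow: operands have the same sign, but the result's sign differs.
    const bool overflow = (sign_a == sign_b) && (sign_sum != sign_a);
    return {sum, overflow};
}
```

With this sketch, `add4_signed(0x3, 0xE)` (that is, 3 + (-2)) gives the pattern 0001 without overflow, while `add4_signed(0x7, 0x1)` reports overflow because 8 is not representable in 4 signed bits.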

slide-30
SLIDE 30

Full ALU

30

[Figure: full ALU. Control inputs: Binvert, CI (carry in), and Operation; operations listed: ADD, SUB, AND, OR, SLT. The sign bit is the adder output from bit 31.]

slide-31
SLIDE 31
  • S: sign bit
  • Actual Exponent: E - 127
  • Mantissa
  • Normalized binary significand
  • Hidden integer bit: 1
  • The actual mantissa is 1.M

31

IEEE 754 standard floating point

S E (8 bits) M (23 bits)

  • 0 01111110 10000000000000000000000

b'01111110 = 126
126 - 127 = -1
b'1.1 = 1 + 1 × 2^-1 = 1.5

1.5 × 2^-1 = 0.75
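The decoding steps above (sign, exponent minus 127, hidden integer bit) can be sketched as a small function. This is an illustrative sketch that handles only normalized single-precision values; `decode_float32` is a hypothetical name, and zeros, denormals, infinities, and NaNs are deliberately left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: decode a normalized IEEE 754 single-precision bit pattern.
double decode_float32(std::uint32_t bits) {
    const std::uint32_t s = bits >> 31;            // S: sign bit
    const std::uint32_t e = (bits >> 23) & 0xFFu;  // E: 8-bit biased exponent
    const std::uint32_t m = bits & 0x7FFFFFu;      // M: 23-bit mantissa field
    assert(e != 0 && e != 255);                    // this sketch covers normalized values only
    // Hidden integer bit: the actual mantissa is 1.M (M is scaled by 2^23).
    const double mantissa = 1.0 + static_cast<double>(m) / 8388608.0;
    // Actual exponent is E - 127.
    const double value = std::ldexp(mantissa, static_cast<int>(e) - 127);
    return s ? -value : value;
}
```

The slide's example, 0 01111110 10000000000000000000000, is the 32-bit pattern 0x3F400000, so `decode_float32(0x3F400000u)` returns 0.75.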

slide-32
SLIDE 32

FP add hardware

32

slide-33
SLIDE 33

Encoding an R-format instruction

33

add $v0, $a1, $a2

R-format

opcode    rs        rt        rd        shift amount  function
(6 bits)  (5 bits)  (5 bits)  (5 bits)  (5 bits)      (6 bits)
31-26     25-21     20-16     15-11     10-6          5-0

Encoded fields ($a1 is register 5, $a2 is register 6, $v0 is register 2):

opcode    rs        rt        rd        shift amount  function
000000    00101     00110     00010     00000         100000
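The field packing for `add $v0, $a1, $a2` can be reproduced with shifts and ORs. The sketch below assumes the standard MIPS R-format field widths (6/5/5/5/5/6) and register numbers ($v0 = 2, $a1 = 5, $a2 = 6); `encode_r_format` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack the six R-format fields into one 32-bit word.
std::uint32_t encode_r_format(std::uint32_t opcode, std::uint32_t rs,
                              std::uint32_t rt, std::uint32_t rd,
                              std::uint32_t shamt, std::uint32_t funct) {
    // Each field must fit in its width: 6, 5, 5, 5, 5, and 6 bits.
    assert(opcode < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}
```

For the instruction on this slide, `encode_r_format(0, 5, 6, 2, 0, 0x20)` produces 0x00A61020, where 0x20 (binary 100000) is the function code for add.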

slide-34
SLIDE 34

Dive into the circuit a bit

34

Processor PC

$0 $at $v0 $ra

........

registers ALU

Register file, 4-bit ALU, memory: each of them has a different structure and therefore different timing

slide-35
SLIDE 35

Register file 4-bit ALU

  • A hardware signal defines when data for any specific component is ready to be used by others
  • Think about the clock in real life!
  • We use edge-triggered clocking
  • Values stored in the sequential logic are updated only on a clock edge

35

Clock — synchronizing hardware components

clock cycle

memory

slide-36
SLIDE 36

Don’t worry — We are not going to talk about the previous 10 slides in detail

36

slide-37
SLIDE 37
  • Instruction fetch: where?

  • Decode:
  • What’s the instruction?
  • Where are the operands?
  • Execute
  • Memory access
  • Where is my data?
  • Write back
  • Where to put the result
  • Determine the next PC

37

How the CPU handles instructions

Processor PC

instruction memory

120007a30: 0f00bb27 ldah gp,15(t12)
120007a34: 509cbd23 lda gp,-25520(gp)
120007a38: 00005d24 ldah t1,0(gp)
120007a3c: 0000bd24 ldah t4,0(gp)
120007a40: 2ca422a0 ldl t0,-23508(t1)
120007a44: 130020e4 beq t0,120007a94
120007a48: 00003d24 ldah t0,0(gp)
120007a4c: 2ca4e2b3 stl zero,-23508(t1)

data memory

800bf9000: 00c2e800 12773376
800bf9004: 00000008 8
800bf9008: 00c2f000 12775424
800bf900c: 00000008 8
800bf9010: 00c2f800 12777472
800bf9014: 00000008 8
800bf9018: 00c30000 12779520
800bf901c: 00000008 8

$0 $at $v0 $ra

........

registers ALU

Hardware used by the stages above, in order: program counter & instruction memory, registers, ALUs, data memory, registers
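The steps listed on this slide (fetch, decode, execute, memory access, write back, determine the next PC) can be sketched as a loop over a toy machine. Everything here is an assumption made for illustration: the five-operation instruction set, the `Instr` and `Machine` types, and the `run` and `sample_program` functions are hypothetical and are not the MIPS or Alpha encodings shown in the deck.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction set, invented for this sketch.
enum class Op { LoadImm, Add, Store, BranchIfZero, Halt };

struct Instr { Op op; int a; int b; int c; };

struct Machine {
    std::size_t pc = 0;                              // program counter
    int regs[4] = {0, 0, 0, 0};                      // register file
    std::vector<int> data = std::vector<int>(8, 0);  // data memory
};

void run(Machine& m, const std::vector<Instr>& program) {
    while (m.pc < program.size()) {
        const Instr& in = program[m.pc];   // instruction fetch: PC selects the instruction
        std::size_t next_pc = m.pc + 1;    // default next PC
        switch (in.op) {                   // decode: which instruction, which operands
        case Op::LoadImm:                  // write back an immediate value
            m.regs[in.a] = in.b;
            break;
        case Op::Add:                      // execute in the ALU, then write back
            m.regs[in.a] = m.regs[in.b] + m.regs[in.c];
            break;
        case Op::Store:                    // memory access
            m.data[static_cast<std::size_t>(in.b)] = m.regs[in.a];
            break;
        case Op::BranchIfZero:             // determine the next PC
            if (m.regs[in.a] == 0) next_pc = static_cast<std::size_t>(in.b);
            break;
        case Op::Halt:
            return;
        }
        m.pc = next_pc;
    }
}

// Hypothetical sample program: compute 3 + 2 and store the result at data[4].
std::vector<Instr> sample_program() {
    return {
        {Op::LoadImm, 0, 3, 0},
        {Op::LoadImm, 1, 2, 0},
        {Op::Add, 2, 0, 1},
        {Op::Store, 2, 4, 0},
        {Op::Halt, 0, 0, 0},
    };
}
```

Running `run` on a fresh `Machine` with `sample_program()` leaves 5 in register 2 and in `data[4]`. Pointing the PC at a different instruction sequence makes the same hardware loop perform a different function.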

slide-38
SLIDE 38

Current state of CPU architectures

38

slide-39
SLIDE 39

Moore’s Law

39

  • The number of transistors we can build in a fixed area of silicon doubles every 12 ~ 24 months.

(1) Moore, G. E. (1965), 'Cramming more components onto integrated circuits', Electronics 38 (8).

[Figure: transistor count (log scale, 1 to 10,000,000,000) by year, 1970 to 2015]

Moore’s Law is the most important driver for historic CPU performance gains

slide-40
SLIDE 40

40

CPU performance scaled well before 2002

  • 52%/year
slide-41
SLIDE 41

41

The slowdown of CPU scaling

[Figure: SPECRate (axis from 10 to 90) over time, Sep-06 to Sep-17]

5x in 67 months
1.5x in 67 months

slide-42
SLIDE 42

Power limits CPU performance scaling

42

[Figure: Transistor Performance vs. Leakage. Lower leakage power (0.001x to 1x) against higher transistor performance (switching speed) for the 65 nm, 45 nm, 32 nm, 22 nm, and 14 nm processes.]

Static power under different process technologies

  • Power = Active Power + Leakage Power
  • Active Power = aCV^2f
  • a: switches per cycle
  • C: capacitance
  • V: voltage
  • f: frequency, usually linear with V
  • Static power
  • Roughly a fixed percentage of active power
  • Increasing as frequency goes higher
  • To limit static power, you cannot lower your Vdd and Vt
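The active power formula above can be explored numerically. In the sketch below, `active_power` and `power_ratio_after_scaling` are hypothetical helper names, and the nominal values for a, C, V, and f are made-up numbers chosen only to make the arithmetic easy to follow.

```cpp
#include <cassert>

// Active power = a * C * V^2 * f, using the slide's symbols.
double active_power(double a, double c, double v, double f) {
    assert(a >= 0.0 && c >= 0.0 && f >= 0.0);
    return a * c * v * v * f;
}

// Hypothetical helper: ratio of active power after scaling both V and f by k,
// following the slide's note that f is usually linear with V.
double power_ratio_after_scaling(double k) {
    // Illustrative nominal values (assumptions, not measurements).
    const double a = 0.5;   // switches per cycle
    const double c = 1e-9;  // capacitance in farads
    const double v = 1.0;   // voltage in volts
    const double f = 2e9;   // frequency in hertz
    return active_power(a, c, k * v, k * f) / active_power(a, c, v, f);
}
```

Because V appears squared and f scales with V, scaling both by k changes active power by roughly k^3; for example, `power_ratio_after_scaling(0.8)` is about 0.512.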
slide-43
SLIDE 43

43

Leakage power becomes increasingly significant

slide-44
SLIDE 44

44

slide-45
SLIDE 45
  • Multi-processors
  • Two must be faster than one!
  • “Theoretically,” they double the performance without changing the clock speed.
  • Seems simple, but...
  • Speeding up a single CPU makes everything faster!
  • An application’s performance doubled every 18 months with no effort on the programmer’s part.
  • Getting performance out of a multiprocessor is difficult.
  • Parallelizing code is difficult and takes (lots of) work
  • There aren’t that many threads. Threads incur overhead, too.
  • Remember or look forward to CSE141
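Why two processors only "theoretically" double performance can be illustrated with Amdahl's Law, which the course outline lists under measuring performance. This is a rough sketch: `amdahl_speedup` is a hypothetical helper name, the parallel fraction is an assumed input, and thread overhead is ignored.

```cpp
#include <cassert>

// Amdahl's Law: speedup = 1 / ((1 - p) + p / n),
// where p is the fraction of execution time that can run in parallel
// and n is the number of processors.
double amdahl_speedup(double parallel_fraction, unsigned processors) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(processors >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors);
}
```

Only a fully parallel program (`amdahl_speedup(1.0, 2)`) reaches a speedup of 2 on two processors; if half of the work is serial, `amdahl_speedup(0.5, 2)` is only about 1.33.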

45

The rise of parallelism

slide-46
SLIDE 46

46

  • Intel P4 (2000): 1 core
  • AMD Athlon 64 X2 (2005): 2 cores
  • Intel Nehalem (2010): 4 cores
  • Nvidia Tegra 3 (2011): 5 cores
  • SPARC T3 (2010): 16 cores
  • AMD Zambezi (2011): 16 cores

slide-47
SLIDE 47

Computers become heterogeneous

47

GPU TPU CPU DRAM SSD H.D.D. Network Interface FPGA

Additional data movement

slide-48
SLIDE 48

Why this course & what’s in this class

48

slide-49
SLIDE 49

49

Connecting to the physical world…

Physics/Materials Devices Micro-architecture Instruction Set Architectures Processors

slide-50
SLIDE 50

Why this course?

50

JVM

Instruction Set Architecture
Operating Systems
Compilers
Runtime Virtual Machines
Programming Languages
Programmers/Users

slide-51
SLIDE 51
  • Understand how CPUs/GPUs run programs
  • Understand why CPU/system performance (and other metrics) varies
  • Understand why program performance varies
  • Understand how to write efficient programs!

51

Why this course?

slide-52
SLIDE 52

52

Live demo!

if (option)
    std::sort(data, data + arraySize);   // optionally sort the array first

for (unsigned c = 0; c < arraySize * 10000; ++c) {
    if (data[c % arraySize] >= rand())   // compare each element with a random threshold
        sum++;
}

If option is set to 1 -> O(n log n); otherwise, O(n)
slide-53
SLIDE 53
  • Instruction set architectures
  • MIPS
  • ISAs and the compiler
  • Measuring performance
  • Amdahl’s Law
  • Performance measurement
  • Metrics
  • Processor design
  • Basic design
  • Pipelining
  • Dealing with hazards
  • Speculation and control
  • Improving ILP
  • The memory system
  • Memory technologies
  • Caching
  • Introduction to multiprocessors: pthread programming

  • GPU architecture

53

What’s in the class

slide-54
SLIDE 54

Date | Topic | Readings | Due
2019/08/05 (8a) | Introduction & ISA | |
2019/08/05 (9:30a) | ISA | 2.1-2.7, 2.10, 2.8, 2.12, 2.13, 2.14 and 2.17 |
2019/08/07 (8a) | Performance Evaluation | 1.5-1.10 | Reading quizzes for 1.5-1.10 due before class
2019/08/07 (9:30a) | Performance Evaluation (II) | |
2019/08/12 (8a) | Performance (III) and Single-cycle Processor Design | 4.1-4.4 | Homework 1 due before class; Reading quizzes for 4.1-4.9 due before class
2019/08/12 (9:30a) | Pipeline processor | 4.5-4.9 |
2019/08/14 (8a) | Pipeline Processor (II) | | Reading quizzes for 4.5-4.9 due before class
2019/08/14 (9:30a) | Pipeline (III) | |
2019/08/19 (8a) | Branch Prediction | | Homework 2 due before class; Reading quizzes for 5.1-5.4 due before class
2019/08/19 (9:30a) | Midterm Review | |
2019/08/21 (8a) | Midterm | |
2019/08/21 (9:30a) | Memory and caching | 5.1-5.4 |
2019/08/26 (8a) | Memory and caching | 5.1-5.4 | Homework 3 due; Reading quizzes for 5.1-5.4 due before class
2019/08/26 (9:30a) | Memory and caching | 5.8 |
2019/08/28 (8a) | Memory and caching | | Reading quizzes for 5.6 and 5.7 due before class
2019/08/28 (9:30a) | Virtual Memory | 5.6 and 5.7 |
2019/09/04 (8a) | Modern Processor Design | 4.10 | Homework 4 due before class; Reading quizzes for 6.4-6.5, 5.10 due before class
2019/09/04 (9:30a) | Introduction to multithreaded processors | 6.4-6.5 |
2019/09/04 (2p) | Final Review | |
2019/9/7 | Final | |

Tentative Schedule

54

You need to complete the reading of H&P
Check due dates here
Subject to change

slide-55
SLIDE 55

Logistics

55

slide-56
SLIDE 56
  • Lectures: MW 8a-9:20a, 9:30a-10:50a, PCH 120
  • Course webpage: http://cseweb.ucsd.edu/classes/su19_2/cse141-a/
  • Discussion on piazza: https://piazza.com/class/jyvn9cs1nxx4m5
  • We do podcasting: https://podcast.ucsd.edu/watch/s219/cse141_a00


56

Course resources

slide-57
SLIDE 57
  • Hung-Wei Tseng
  • https://intra.engr.ucr.edu/~htseng/
  • Office hour: MW 1:30p-2:30p @ EBU3B 3236
  • E-mail: htseng+CSE141@eng.ucsd.edu

57

Instructor

slide-58
SLIDE 58
  • Harish Prasanth
  • Office hours: TuTh 12p-1p @ EBU3B B260A
  • E-mail: hgajendr@eng.ucsd.edu

58

Teaching Assistant

slide-59
SLIDE 59
  • Login/discussion in TritonEd and piazza.
  • Read the text before class!
  • Computer Organization and Design: The Hardware/Software Interface (5th Edition) -- previous editions are not supported
  • I’m not going to cover everything in class, but you are responsible for all the assigned text.

  • Reading quizzes in TritonEd (15%)
  • Come to class (10%)
  • I will cover things not in the book.
  • 10% from clickers
  • Homework throughout the course. (15%)
  • Help to practice the concepts from each topic
  • Midterm (25%)
  • Cumulative final (35%)

59

Your tasks

slide-60
SLIDE 60
  • Questions about the assigned reading on
  • Complete before class begins
  • There is one due Wednesday!
  • Be sure to check the website for the reading quizzes!

60

Reading quizzes

slide-61
SLIDE 61

Most lectures today …

61

Me

slide-62
SLIDE 62

Me You

I expect the lecture to be…

62

slide-63
SLIDE 63
  • I’ll bring in activities to ENGAGE you in exploring your understanding of the material
  • Let you practice
  • Bring out misconceptions
  • Let us LEARN from each other about difficult parts.
  • You will GET CREDIT for your efforts to learn in class
  • By answering questions with a clicker (iClicker)
  • Answer 80% of the clicker questions in class, get 9% of your final grade
  • Process: Individual Think/Answer, Group Discussion, Group Answer

63

Peer instruction

slide-64
SLIDE 64

[Screenshot: TritonEd Tools page for CSE 141, which includes the i>clicker Student Registration entry for registering your i>clicker Remote ID]

  • Set your channel to “CA”
  • Register your i-clicker
  • You can do this through TritonEd

64

Set up your i-clicker

slide-65
SLIDE 65
  • Assigned on Wednesdays, due on Mondays, before the lecture
  • The best way to prepare for the exams
  • Submit through TritonEd.

65

Assignments

slide-66
SLIDE 66
  • You can see your grades on TritonEd.
  • Errors in grading
  • If you feel there has been an error in how an assignment or test was graded, you have one week from when the assignment is returned to bring it to our attention. You must submit (via email to the instructor and the appropriate TAs) a written description of the problem. Neither I nor the TAs will discuss regrades without receiving an email from you about it first.
  • For arithmetic errors (adding up points, etc.), you do not need to submit anything in writing, but the one-week limit still applies.

66

Grading

slide-67
SLIDE 67
  • Don’t cheat.
  • Cheating on a test will get you an F in the class and no option to drop, and a visit with your college dean.
  • Cheating on homework means you don’t have to turn them in any more, but you don’t get points either. You will also take at least a 25% penalty on the exam grades.
  • Copying solutions off the internet or from a solutions manual is cheating
  • They are incorrect sometimes
  • Review the UCSD student handbook
  • When in doubt, ask.

67

Academic Honesty

slide-68
SLIDE 68

68

You