CS 356 Unit 0 Class Introduction Basic Hardware Organization 0.2 - - PowerPoint PPT Presentation


slide-1
SLIDE 1

0.1

CS 356 Unit 0

Class Introduction Basic Hardware Organization

slide-2
SLIDE 2

0.2

What is This Course About?

  • Introduction to Computer Systems

– a.k.a. Computer Organization or Architecture

  • Filling in the "systems" details

– How is software generated (compilers, libraries) and executed (OS, etc.)?
– How does computer hardware work and how does it execute the software I write?

  • Lays a foundation for future CS courses

– CS 350 (Operating Systems), ITP/CS 439 (Compilers), CS 353/EE 450 (Networks), EE 457 (Computer Architecture)

slide-3
SLIDE 3

0.3

Today's Digital Environment

[Figure: stack of abstraction layers, listed in the order given on the slide: Voltage / Currents, Transistors / Circuits, Digital Logic, Processor / Memory / GPU / FPGAs, Assembly / Machine Code, OS / Libraries, C++ / Java / Python, Algorithms, Networks, Applications]

Our Focus in CS 356

slide-4
SLIDE 4

0.4

Why is System Knowledge Important?

  • Increase productivity

– Debugging
– Build/compilation

  • High-level language abstractions break down at certain points
  • Improve performance

– Take advantage of hardware features
– Avoid pitfalls presented by the hardware

  • Basis of understanding security and exploits
slide-5
SLIDE 5

0.5

What Will You Learn

  • Binary representation systems
  • Assembly
  • Processor organization
  • Memory subsystems (caching, virtual memory)

  • Compiler optimization and linking
slide-6
SLIDE 6

0.6

Administration + Syllabus

  • Course Website: usc-cs356.github.io (Install Course VM)
  • Textbook

Computer Systems: A Programmer’s Perspective, Bryant and O’Hallaron, 2015

  • Grading:

– 30 points for assignments (5 assignments, equally weighted)
– 40 points for midterms (25 for best MT, 15 for worst)
– 30 points for final

  • Piazza
  • Expectations for getting help

– Not allowed to search online! We know some code is available online (those caught using or even referencing online code will be submitted to SJACS to be assigned an F)
– Acknowledge TA/CP help with comments in your code
– Don’t discuss solutions with other students

slide-7
SLIDE 7

0.7

ABSTRACTIONS & REALITY

slide-8
SLIDE 8

0.8

Abstraction vs. Reality

  • Abstraction is good until reality intervenes

– Bugs can result
– It is important to understand underlying HW implementations
– Sometimes abstractions don't provide the control or performance you need
slide-9
SLIDE 9

0.9

Reality 1

  • ints are not integers and floats aren't reals
  • Is x² >= 0?

– Floats: Yes
– Ints: Not always (32-bit ints can overflow)

  • 40,000 * 40,000 = 1,600,000,000
  • 50,000 * 50,000 = -1,794,967,296
  • Is (x+y)+z = x+(y+z)?

– Ints: Yes
– Floats: Not always

  • (1e20 + -1e20) + 3.14 = 3.14
  • 1e20 + (-1e20 + 3.14) = 0 (the 3.14 is lost to rounding)
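The two surprises above can be reproduced with a minimal sketch. Python integers never overflow, so this example assumes a hypothetical helper, `mul_int32`, that uses `ctypes.c_int32` to keep only the low 32 bits, which is what typical two's-complement hardware does with a 32-bit `int`. (In C, signed overflow is formally undefined behavior, so the wraparound is what usually happens rather than a guarantee.) Python floats are IEEE 754 doubles, so the floating-point half needs no emulation.

```python
import ctypes


def mul_int32(a: int, b: int) -> int:
    """Multiply and keep only the low 32 bits, interpreted as a signed value.

    Hypothetical helper: emulates 32-bit two's-complement wraparound.
    """
    return ctypes.c_int32(a * b).value


print(mul_int32(40_000, 40_000))  # product fits in 32 bits
print(mul_int32(50_000, 50_000))  # product exceeds 2**31 - 1 and wraps negative

# Floating-point addition is not associative: 3.14 is far smaller than the
# spacing between adjacent doubles near 1e20, so it vanishes when added first.
left = (1e20 + -1e20) + 3.14
right = 1e20 + (-1e20 + 3.14)
print(left, right)
```

The grouping matters only because the intermediate result `-1e20 + 3.14` must be rounded to a representable double before the next addition happens.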
slide-10
SLIDE 10

0.10

Reality 1: Examples

slide-11
SLIDE 11

0.11

Reality 1: Examples

slide-12
SLIDE 12

0.12

Reality 2

  • Knowing some assembly is critical
  • You'll probably never write much (any?) code in assembly, as compilers are often better than humans at optimizing code
  • But knowing assembly is critical when

– Tracking down some bugs
– Taking advantage of certain HW features that a compiler may not be able to use
– Implementing system software (OS/compilers/libraries)
– Understanding security and vulnerabilities

slide-13
SLIDE 13

0.13

Reality 2: Example

slide-14
SLIDE 14

0.14

Reality 3

  • Memory matters!

– Memory is not infinite
– Memory can impact performance more than computation for many applications
– Source of many bugs both for single-threaded and especially parallel programs
– Source of many security vulnerabilities

slide-15
SLIDE 15

0.15

Reality 4

  • There's more to performance than asymptotic complexity

– Constant factors matter!
– Even operation counts do not predict performance

  • How long an instruction takes to execute is not deterministic…it depends on what other instructions have been executed before it

– Understanding how to optimize for the processor organization and memory can lead to up to an order of magnitude performance increase
slide-16
SLIDE 16

0.16

COMPUTER ORGANIZATION AND ARCHITECTURE

Drivers and Trends

slide-17
SLIDE 17

0.17

Computer Components

  • Processor

– Executes the program and performs all the operations

  • Main Memory

– Stores data and program (instructions)
– Different forms:

  • RAM = read and write but volatile (loses values when power is off)
  • ROM = read-only but non-volatile (maintains values when power is off)

– Significantly slower than the processor

  • Input / Output Devices

– Generate and consume data from the system
– MUCH, MUCH slower than the processor

[Figure: block diagram of a computer. The processor (arithmetic + logic + control circuitry) reads instructions and operates on data; memory (RAM) holds the program (instructions) and data (operands); input devices, output devices, and a disk drive connect to the system. Recipe analogy labels: "Combine 2c. Flour", "Mix in 3 eggs"]

slide-18
SLIDE 18

0.18

Architecture Issues

  • Fundamentally, computer architecture is all about the different ways of answering the question: “What do we do with the ever-increasing number of transistors available to us?”
  • Goal of a computer architect is to take increasing transistor budgets of a chip (i.e. Moore’s Law) and produce an equivalent increase in computational ability

slide-19
SLIDE 19

0.19

Moore’s Law, Computer Architecture & Real-Estate Planning

  • Moore’s Law = Number of transistors able to be fabricated on a chip grows exponentially with time
  • Computer architects decide, “What should we do with all of this capability?”
  • Similarly real-estate developers ask, “How do we make best use of the land area given to us?”

USC University Park Development Master Plan

http://re.usc.edu/docs/University%20Park%20Development%20Project.pdf
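The exponential growth stated in the Moore's Law bullet on this slide can be written as a short formula. This is a sketch under stated assumptions: $N_0$, $t_0$, and the doubling period $T$ are symbols introduced here for illustration and do not appear on the slide. $T$ is commonly quoted as roughly two years.

```latex
% N(t): transistors per chip at time t
% N_0 : transistors per chip at reference time t_0 (assumed symbol)
% T   : doubling period (assumed symbol, commonly quoted as about 2 years)
N(t) = N_0 \cdot 2^{(t - t_0)/T}
```

Under this model, ten doubling periods multiply the transistor budget by $2^{10} = 1024$.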

slide-20
SLIDE 20

0.20

Transistor Physics

  • Cross-section of transistors on an IC
  • Moore’s Law is founded on our ability to keep shrinking transistor sizes

– Gate/channel width shrinks
– Gate oxide shrinks

  • Transistor feature size is referred to as the implementation “technology node”

slide-21
SLIDE 21

0.21

Technology Nodes

slide-22
SLIDE 22

0.22

Growth of Transistors on Chip

slide-23
SLIDE 23

0.23

Implications of Moore’s Law

  • What should we do with all these transistors?

– Put additional simple cores on a chip
– Use transistors to make cores execute instructions faster
– Use transistors for more on-chip cache memory

  • Cache is an on-chip memory used to store data the processor is likely to need
  • Faster than main-memory (RAM) which is on a separate chip and much larger (thus slower)

slide-24
SLIDE 24

0.24

Memory Wall Problem

  • Processor performance is increasing much faster than memory performance

Processor-Memory Performance Gap: processor performance improves about 55%/year, memory performance about 7%/year (Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2003)
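To see how quickly the two growth rates diverge, here is a minimal sketch that compounds them year over year. It assumes the 55%/year figure refers to processor performance and the 7%/year figure to memory performance. The function name `performance_gap` is hypothetical.

```python
def performance_gap(years: int, cpu_rate: float = 0.55, mem_rate: float = 0.07) -> float:
    """Ratio of processor speedup to memory speedup after `years` of compounding.

    Assumes both improve at a constant annual rate from a common starting point.
    """
    return ((1 + cpu_rate) / (1 + mem_rate)) ** years


for years in (1, 5, 10):
    print(years, round(performance_gap(years), 1))
```

Each year the processor pulls ahead by a factor of 1.55 / 1.07, so after a decade the gap is roughly a factor of forty under these assumed rates.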

slide-25
SLIDE 25

0.25

Cache Example

  • Small, fast, on-chip memory to store copies of recently-used data
  • When processor attempts to access data it will check the cache first

– If the cache has the desired data, it can supply it quickly
– If the cache does not have the data, it must go to the main memory (RAM) to access it
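The check-the-cache-first behavior described above can be sketched as a small simulation. This is an illustrative model, not how any particular processor is built: the class `TinyCache`, its least-recently-used replacement policy, and the 16-word `ram` list are all assumptions made for the example.

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> cached value, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, ram: list, addr: int) -> int:
        if addr in self.lines:
            # Hit: the cache supplies the data quickly.
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        # Miss: go to main memory, then keep a copy for next time.
        self.misses += 1
        value = ram[addr]
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value


ram = list(range(100, 116))  # 16 words standing in for main memory
cache = TinyCache(capacity=4)
for addr in [0, 1, 2, 0, 1, 2, 9, 0]:
    cache.read(ram, addr)
print(cache.hits, cache.misses)
```

The first touch of each address misses and every repeat hits, which is why programs that reuse recently accessed data benefit most from a cache.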

[Figure: two diagrams of a processor with an on-chip cache connected to RAM over the system bus, one labeled "Cache has desired data" and one labeled "Cache does not have desired data"]

slide-26
SLIDE 26

0.26

Reality 3 & 4 Example

slide-27
SLIDE 27

0.27

Pentium 4

L2 Cache L1 Data L1 Instruc.

slide-28
SLIDE 28

0.28

Increase in Clock Frequency

slide-29
SLIDE 29

0.29

Intel Nehalem Quad Core

slide-30
SLIDE 30

0.30

Progression to Parallel Systems

  • If power begins to limit clock frequency, how can we continue to achieve more and more operations per second?

– By running several processor cores in parallel at lower frequencies
– Two cores @ 2 GHz vs. 1 core @ 4 GHz yield the same theoretical maximum ops./sec.

  • For various applications like graphics and computationally intensive workloads this is taken to an extreme by GPUs
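The two-cores-versus-one-core comparison above is simple arithmetic, sketched here. The function `peak_ops_per_sec` and its `ops_per_cycle` parameter are hypothetical, and the model assumes every core completes the same number of operations each clock cycle.

```python
def peak_ops_per_sec(cores: int, freq_hz: float, ops_per_cycle: int = 1) -> float:
    """Theoretical peak throughput: cores x clock frequency x operations per cycle."""
    return cores * freq_hz * ops_per_cycle


one_fast = peak_ops_per_sec(cores=1, freq_hz=4e9)  # 1 core @ 4 GHz
two_slow = peak_ops_per_sec(cores=2, freq_hz=2e9)  # 2 cores @ 2 GHz
print(one_fast == two_slow)
```

The equality is a theoretical maximum only: the two-core system reaches it when the work can actually be split evenly across both cores.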

slide-31
SLIDE 31

0.31

GPU Chip Layout

  • 2560 Small Cores
  • Upwards of 7.2 billion transistors
  • 8.2 TFLOPS
  • 320 Gbytes/sec

Photo: http://www.theregister.co.uk/2010/01/19/nvidia_gf100/ Source: NVIDIA

slide-32
SLIDE 32

0.32

Intel Haswell Quad Core

slide-33
SLIDE 33

0.33

8th Gen Coffee-Lake Hex-Core Intel Processor

https://www.researchgate.net/figure/Die-Map-of-a-Hexa-Core-Coffee-Lake-Processor_fig6_332543387