CDC 6600: The World's First Supercomputer



slide-1
SLIDE 1

CDC 6600

The World's First Supercomputer

slide-2
SLIDE 2

Control Data Corporation

  • 1957 - Disgruntled employees of Sperry Rand found CDC

  • 1958 - Seymour Cray joins
  • 1959 - Seymour builds “little character” -- transistor-only machine

  • 1960 - Delivers first commercial transistor-based machine to the US Navy.

slide-3
SLIDE 3

Control Data Corporation

  • 1960 - Cray begins work on a 50x faster machine and realizes germanium is too slow. He switches to silicon transistors from Fairchild.

  • 5 year plan: “to produce the largest machine in the world”

  • 1 year plan: “be one-fifth of the way”
  • 1962 - Cray demands his own lab and complete artistic freedom.

slide-4
SLIDE 4

Guiding Principles: Simplicity and Speed

  • Machines of the day had one CPU for everything

  • It was big and slow.
  • The ISA was complex -- it did everything.
slide-5
SLIDE 5

The Case for Simplicity

  • The CDC 6600 aims for parallel execution of instructions

  • Complex ISAs are hard to decode
  • This slows down every instruction.
  • Decode can be as complex as execution!
  • If the machine issues instructions in parallel, decode must be faster than

slide-6
SLIDE 6

The Case for Simplicity

  • Complex ISAs mean long instructions
  • Lower code density
  • Poor fetch bandwidth would also impede parallelism

  • Complex ISAs mean big, slow, expensive ALUs

slide-7
SLIDE 7

Implementing simplicity

  • Simplify the ISA
  • Eliminate I/O and other ops from ISA
  • Just loads, stores, math, logic, and control.
  • Use a load/store ISA with (mostly) general-purpose registers
  • Relegate other operations to peripheral processors.

slide-8
SLIDE 8

Instruction Encoding

slide-9
SLIDE 9

Implementing Speed: Parallelism

  • 10 MHz clock (minor cycles)
  • Build 10 functional units
  • add, multiply (x2), divide, long add, shift, boolean, increment (x2), branch.
  • They are specialized, so they are small and cheap
  • Centralized scoreboard issues instructions whenever possible, up to 1 per minor cycle
  • No renaming -- destination register names limit parallelism
slide-10
SLIDE 10

Adapted from Arvind and Asanovic’s MIT Course 6.823

I1  DIVD   f6, f6, f4
I2  LD     f2, 45(r3)
I3  MULTD  f0, f2, f4
I4  DIVD   f8, f6, f2
I5  SUBD   f10, f0, f6
I6  ADDD   f6, f8, f2

Functional Unit Status: Int(1) Add(1) Mult(3) Div(4) WB

Registers Reserved for Writes

slide-22
SLIDE 22

Adapted from Arvind and Asanovic’s MIT Course 6.823

I1  DIVD   f6, f6, f4
I2  LD     f2, 45(r3)
I3  MULTD  f0, f2, f4
I4  DIVD   f8, f6, f2
I5  SUBD   f10, f0, f6
I6  ADDD   f6, f8, f2

Functional Unit Status and Registers Reserved for Writes

Time  Issued  Int(1)  Add(1)  Mult(3)  Div(4)  WB   Reserved for Writes  Completes
t0    I1                               f6           f6
t1    I2      f2                       f6           f6, f2
t2                                     f6      f2   f6, f2               I2
t3    I3                      f0       f6           f6, f0
t4                            f0               f6   f6, f0               I1
t5    I4                      f0       f8           f0, f8
t6                                     f8      f0   f0, f8               I3
t7    I5              f10              f8           f8, f10
t8                                     f8      f10  f8, f10              I5
t9                                             f8   f8                   I4
t10   I6              f6                            f6
t11                                            f6   f6                   I6
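The issue and write-back times in this example can be reproduced with a small simulation. This is a minimal sketch of the simplified model the table appears to use, not the actual CDC 6600 scoreboard: it assumes in-order issue of at most one instruction per cycle, fully pipelined units with the latencies from the column headers, one write-back per cycle, and an instruction stalling while any of its source or destination registers is still reserved for a write. The names `Instr`, `PROGRAM`, and `simulate` are made up for illustration.

```python
from dataclasses import dataclass

# Cycles spent in the unit before write-back, read off the column headers.
LATENCY = {"Int": 1, "Add": 1, "Mult": 3, "Div": 4}


@dataclass
class Instr:
    name: str
    unit: str
    dst: str
    srcs: tuple


PROGRAM = [
    Instr("I1", "Div", "f6", ("f6", "f4")),
    Instr("I2", "Int", "f2", ("r3",)),
    Instr("I3", "Mult", "f0", ("f2", "f4")),
    Instr("I4", "Div", "f8", ("f6", "f2")),
    Instr("I5", "Add", "f10", ("f0", "f6")),
    Instr("I6", "Add", "f6", ("f8", "f2")),
]


def simulate(program):
    """Return {name: (issue_cycle, writeback_cycle)} under the assumed model."""
    free_at = {}      # register -> first cycle it is no longer reserved
    wb_slots = set()  # cycles whose single write-back slot is already taken
    schedule = {}
    cycle = 0
    for ins in program:
        while True:
            wb = cycle + LATENCY[ins.unit]
            regs = ins.srcs + (ins.dst,)
            # Stall on RAW (a source is reserved) or WAW (the destination is).
            hazard = any(free_at.get(r, 0) > cycle for r in regs)
            if not hazard and wb not in wb_slots:
                break
            cycle += 1
        free_at[ins.dst] = wb + 1  # reserved through its write-back cycle
        wb_slots.add(wb)
        schedule[ins.name] = (cycle, wb)
        cycle += 1  # at most one issue per cycle
    return schedule


if __name__ == "__main__":
    for name, (issue, wb) in simulate(PROGRAM).items():
        print(f"{name}: issue t{issue}, write-back t{wb}")
```

Under these assumptions the instructions issue at t0, t1, t3, t5, t7, and t10 but write back in the order I2, I1, I3, I5, I4, I6: issue is in order, completion is not.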

slide-23
SLIDE 23

Implementing Speed: Instruction Fetch

  • Keep 32 recently executed instructions around in a “stack”

  • Primitive I-Cache or trace cache.
  • Fetch from there inside tight loops.
  • This is a response to a shift in the balance point between CPU and memory.

  • CPU is now faster.
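The benefit for tight loops can be illustrated with a toy model. This sketch assumes a simple first-in, first-out buffer of the 32 most recently fetched instruction addresses; the real hardware's organization and branch rules are not described here, and `count_memory_fetches` is a made-up name.

```python
from collections import deque


def count_memory_fetches(trace, capacity=32):
    """Count instruction fetches that must go to main memory.

    `trace` is a sequence of instruction addresses in execution order.
    Assumption: the stack is a plain FIFO of `capacity` addresses.
    """
    stack = deque(maxlen=capacity)  # oldest entry falls out automatically
    fetches = 0
    for addr in trace:
        if addr not in stack:
            fetches += 1        # miss: fetch from memory and remember it
            stack.append(addr)
    return fetches


if __name__ == "__main__":
    tight_loop = list(range(8)) * 100   # 8-instruction loop, 100 iterations
    big_loop = list(range(40)) * 3      # 40-instruction loop, 3 iterations
    print(count_memory_fetches(tight_loop), "of", len(tight_loop))
    print(count_memory_fetches(big_loop), "of", len(big_loop))
```

A loop that fits in the stack touches memory only on its first iteration; in this FIFO model, a loop larger than the stack gets no benefit at all.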
slide-24
SLIDE 24

CDC6600 CPU

slide-25
SLIDE 25

Memory System

  • Main memory
  • 32 banks of 4096 60-bit words = 960 KB
  • 5 memory busses (“trunks”)
  • Bound tightly to address registers
  • 41 MB/s of bandwidth
  • Simple segment-based memory translation
  • Support for relocation
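The 960 KB figure follows directly from the bank geometry. The sketch below checks the arithmetic, counting in 8-bit bytes as a modern convenience (the machine itself used 60-bit words). The `bank_of` helper is an assumption added for illustration: it shows low-order interleaving, where consecutive addresses land in different banks, which the slide does not state.

```python
BANKS = 32
WORDS_PER_BANK = 4096
BITS_PER_WORD = 60

total_words = BANKS * WORDS_PER_BANK             # 131,072 words
total_bytes = total_words * BITS_PER_WORD // 8   # expressed in 8-bit bytes
total_kb = total_bytes // 1024                   # 960


def bank_of(address: int) -> int:
    """Bank holding `address`, ASSUMING low-order interleaving."""
    return address % BANKS


if __name__ == "__main__":
    print(f"{total_words} words = {total_bytes} bytes = {total_kb} KB")
    print("banks for addresses 0..3:", [bank_of(a) for a in range(4)])
```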
slide-26
SLIDE 26

The Peripheral Processor(s)

  • Effectively OS co-processors -- They handle I/O
  • They do all the complex stuff that the CDC banished from the CPU

  • They can be slow
slide-27
SLIDE 27

Fine-grain threading

  • There are 10 virtual peripheral processors
  • There is only one physical peripheral processor
  • A “barrel and slot” system provides virtualization
  • The barrel holds 10 sets of registers
  • It rotates one step each minor cycle (10 MHz)
  • Each VPP advances 1 step each major cycle (1 MHz).

  • This is “vertical” multithreading.
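The barrel-and-slot scheme can be sketched as a round-robin over register sets. This is a toy model only: `Context` and its single `pc` field are hypothetical stand-ins for a peripheral processor's register set, and "advancing one step" is reduced to incrementing that counter.

```python
from dataclasses import dataclass

NUM_CONTEXTS = 10  # register sets held by the barrel


@dataclass
class Context:
    pc: int = 0  # hypothetical per-processor state


def run(minor_cycles):
    """Rotate the barrel once per minor cycle; return contexts and slot order."""
    barrel = [Context() for _ in range(NUM_CONTEXTS)]
    order = []
    for cycle in range(minor_cycles):
        slot = cycle % NUM_CONTEXTS  # register set currently in the slot
        barrel[slot].pc += 1         # the shared hardware advances it one step
        order.append(slot)
    return barrel, order


if __name__ == "__main__":
    barrel, order = run(20)  # 20 minor cycles = 2 major cycles
    print("slot order:", order)
    print("steps per virtual processor:", [c.pc for c in barrel])
```

Each virtual processor gets the shared hardware once every 10 minor cycles, so it advances at one tenth of the 10 MHz minor-cycle rate, which is the 1 MHz major-cycle rate.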
slide-28
SLIDE 28

Specs

  • >400,000 transistors (compare to 8008)
  • 750 sq. ft. (my first house in SD was smaller)

  • 5 tons
  • 150 kW
slide-29
SLIDE 29

CDC 6600

It came with its own chair: “The standard chair that came with the machine had orange vinyl covering and wooden armrests”

Cooling

Wrong chair!

Line printers

Card readers

slide-30
SLIDE 30

Memory module

6in? (the paper is unclear)

slide-31
SLIDE 31

Logic Module

Test ports, main connector, heat spreader, freon

slide-32
SLIDE 32

IBM’s Dismay

  • Thomas Watson Jr., IBM CEO, August 1963:

“Last week, Control Data ... announced the 6600 system. I understand that in the laboratory developing the system there are only 34 people including the janitor. Of these, 14 are engineers and 4 are programmers... Contrasting this modest effort with our vast development activities, I fail to understand why we have lost our industry leadership position by letting someone else offer the world's most powerful computer.”

To which Cray replied: “It seems like Mr. Watson has answered his own question.”

slide-33
SLIDE 33

In Context

  • The CDC 6600 invents three things:
  • RISC design
  • Out-of-order execution
  • Multithreading