Chapter 8 (partial coverage): Interfacing Processors and Peripherals - PowerPoint PPT Presentation



SLIDE 1

Chapter 8 (partial coverage)

SLIDE 2

Interfacing Processors and Peripherals

  • I/O Design affected by many factors (expandability, resilience)
  • Performance is complex:

— access latency
— throughput
— connection between devices and the system
— the memory hierarchy
— the operating system

[Figure: a typical I/O system - processor (with interrupts) and cache on a memory-I/O bus, with main memory and I/O controllers for disks, graphics output, and the network]

SLIDE 3

I/O Devices

  • Very diverse devices

— behavior (i.e., input vs. output)
— partner (who is at the other end?)
— data rate

SLIDE 4

I/O Example: Disk Drives

Example

Sector size: 512 bytes
Average seek time: 6 ms
RPM: 10,000
Transfer rate: 50 MB/sec
Controller overhead: 0.2 ms
What’s the average time to read a sector for this disk?

  • To access data:

— seek: position head over the proper track (3 to 14 ms avg.)
— rotational latency: wait for the desired sector (0.5 rotation / RPM, on average)
— transfer: grab the data (one or more sectors) at 30 to 80 MB/sec

For the example disk, average read time = seek + rotational latency + transfer + controller overhead:

6.0 ms + 0.5/10,000 RPM + 0.5 KB/(50 MB/sec) + 0.2 ms = 6.0 + 3.0 + 0.01 + 0.2 = 9.2 ms
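As a quick check, the arithmetic above can be reproduced in a few lines of Python (unit conventions follow the slide: decimal KB and MB):

```python
# Average time to read one 512-byte sector, using the example's figures:
# 6 ms seek, 10,000 RPM, 50 MB/sec transfer, 0.2 ms controller overhead.
seek_ms = 6.0
rpm = 10_000
transfer_rate_mb_s = 50.0
controller_ms = 0.2
sector_kb = 0.5                       # 512 bytes, treated as 0.5 KB

# Half a rotation on average: 0.5 / RPM minutes, converted to ms.
rotational_ms = 0.5 / rpm * 60 * 1000
# 0.5 KB at 50 MB/sec, converted to ms.
transfer_ms = sector_kb / 1000 / transfer_rate_mb_s * 1000

total_ms = seek_ms + rotational_ms + transfer_ms + controller_ms
print(f"{total_ms:.2f} ms")           # matches the 9.2 ms worked on the slide
```

Note that rotational latency (3 ms) dominates everything except the seek; the actual data transfer is almost free by comparison.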

SLIDE 5

RAID: Redundant Arrays of Inexpensive Disks

  • Improve both the performance and the availability of disk storage

– Performance:

  • Replace a large, expensive disk with many small, inexpensive disks.
  • Performance is improved because many disks can operate at once.

– Availability:

  • Improve availability by adding redundancy at low cost.
SLIDE 6

RAID 0: No Redundancy

  • Characteristics

– Striping: allocation of logically sequential blocks to separate disks
– Striping data over multiple disks automatically spreads accesses across several disks, while appearing to software as a single large disk.

  • Advantages

– achieve higher performance than a single disk can deliver.

  • Disadvantages

– No fault tolerance, since there is no redundancy.
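The striping idea can be sketched in a few lines; the round-robin block-to-disk mapping and the 4-disk array are illustrative assumptions, not a particular product's layout:

```python
# RAID 0 striping sketch: map a logical block number to (disk, block-on-disk).
def stripe(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Logically sequential blocks go to successive disks, round-robin."""
    return logical_block % num_disks, logical_block // num_disks

# Blocks 0..3 of a 4-disk array land on four different disks, so a large
# sequential access keeps all four disks busy at once.
layout = [stripe(b, num_disks=4) for b in range(4)]
print(layout)   # → [(0, 0), (1, 0), (2, 0), (3, 0)]
```

Because the blocks sit on different disks, they can be transferred in parallel; but if any one disk fails, every fourth block of the logical disk is gone.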

SLIDE 7
RAID 1: Mirroring

  • Characteristic

– Mirroring: write identical data to multiple disks to increase data availability
– Whenever data are written to one disk, those data are also written to a redundant disk, so there are always multiple copies of the information.
– If a disk fails, the system just goes to the “mirrored” disk to get the data.

  • Advantage

– Simplest design.

  • Disadvantage

– Most expensive since it requires the most disks (100% redundancy).

SLIDE 8

RAID 2: Error Detecting and Correcting Code


  • Characteristic

– Uses a Hamming code as the error-correcting code on redundant disks
– It is no longer used in practice, so we won’t discuss it further.

SLIDE 9
RAID 3: Bit-Interleaved Parity

  • Characteristic

– Protection group: the group of disks that share a common check disk, which stores parity used to restore lost data on a failure.
– Each data block is striped at the byte level across all the disks in the protection group (so each read/write goes to all disks in the group).
– The parity disk is updated on each write.

  • Advantage

– Cost of redundancy is 1/N (N: number of disks in a protection group)
– Very high read/write transfer rate.

  • Disadvantage

– Needs a fairly complex disk controller to keep all disks synchronized.
– Cannot do multiple small reads/writes in parallel.

SLIDE 10
RAID 4: Block-Interleaved Parity

  • Characteristic

– Data are striped at the block level across all disks in the protection group.
– Parity is updated on each write.

  • Advantage

– Same ratio of redundancy (1/N) as RAID 3
– High data transfer rate for large read/write accesses.
– Allows multiple small reads to occur in parallel.

  • Disadvantage

– Poor small-write performance because of the bottleneck on the parity disk.

SLIDE 11
RAID 5: Distributed Block-Interleaved Parity

  • Characteristic

– Same as RAID 4, except that parity information is spread throughout all the disks.

  • Advantage

– Low redundancy (1/N, same as RAID 3 and RAID 4)
– Good for large reads/writes
– Allows multiple small reads/writes in parallel (no parity-write bottleneck).

  • Disadvantage

– The disk controller is complex to design.
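RAID 3, 4, and 5 all rely on the same parity trick: the check information is the bytewise XOR of the data blocks, so any single failed disk can be rebuilt from the survivors. A minimal sketch (the disk contents below are made-up values):

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-sized blocks - the parity that RAID 3/4/5 store."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]   # three data disks
parity = xor_blocks(data)                        # stored on the check disk

# Disk 1 fails: XOR the surviving data disks with the parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

This is why the redundancy cost is only 1/N: one parity block protects N data blocks, as long as at most one disk in the protection group fails.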

SLIDE 12

I/O Example: Buses

  • What is a Bus?
  • Types of buses:

— processor-memory
— backplane
— I/O

  • Synchronous Bus vs. Asynchronous Bus
  • Bus Arbitration:

— daisy chain arbitration
— centralized arbitration
— collision detection

SLIDE 13

A Bus Is:

  • shared communication link (one or more wires)
  • single set of wires used to connect multiple subsystems

  • A bus is also a fundamental tool for composing large, complex systems

– a systematic means of abstraction

SLIDE 14

Advantages of Bus

  • Versatility: By defining a single scheme, devices can be added easily
  • Cost-effective: A single set of wires is shared in multiple ways

Disadvantages of Bus

  • It creates a communication bottleneck

– The bandwidth of that bus can limit the maximum I/O throughput


Challenge of bus design

  • The maximum bus speed is largely limited by:

– The length of the bus
– The number of devices on the bus

  • The need to support a range of devices with:

– Widely varying latencies
– Widely varying data transfer rates

  • It is difficult to run many parallel wires at high speed

– the industry is in transition from parallel buses to high-speed serial point-to-point interconnections (or networks).

SLIDE 15

Types of Buses

  • Processor-Memory Bus (design specific)

– A bus that connects processor and memory
– Short and high speed
– The bandwidth only needs to match the memory system, so as to maximize memory-to-processor bandwidth

  • I/O Bus (industry standard)

– Usually lengthy and slower
– The bandwidth needs to match a wide range of I/O devices
– Does not connect to memory directly, but to the processor-memory bus or backplane bus
– e.g., IDE bus, SCSI bus, USB bus

  • Backplane Bus

– Allows processors, memory, and I/O devices to coexist
– Cost advantage: one bus for all components
– Often standardized, e.g., PCI

SLIDE 16

A Two-Bus System

[Figure: a two-bus system - processor and memory on a processor-memory bus, with bus adaptors connecting several I/O buses to it]

  • I/O buses tap into the processor-memory bus via bus adaptors to speed-match between bus types:

– Processor-memory bus: mainly for processor-memory traffic
– I/O buses: provide expansion slots for I/O devices

  • Apple Macintosh-II

– NuBus: processor, memory, and a few selected I/O devices
– SCSI bus: the rest of the I/O devices

SLIDE 17

A Three-Bus System (+ backside cache)

[Figure: a three-bus system - processor with an L2 cache on a backside cache bus, a processor-memory bus, and a backplane bus attached through a bus adaptor, with I/O buses hanging off the backplane bus]

  • A small number of backplane buses tap into the processor-memory bus

– The processor-memory bus focuses on traffic to/from memory
– I/O buses are connected to the backplane bus

  • Advantage:

– loading on the processor bus is reduced, and the buses can run at different speeds


SLIDE 18
Synchronous and Asynchronous Bus

  • Synchronous Bus:

– Includes a clock in the control lines
– A fixed protocol for communication that is relative to the clock
– Advantage: involves very little logic and can run very fast
– Disadvantages:

  • Every device on the bus must run at the same clock rate
  • To avoid clock skew, buses cannot be long if they are fast

– Processor-memory buses are often synchronous

  • Asynchronous Bus:

– It is not clocked
– It can accommodate a wide range of devices
– It can be lengthened without worrying about clock skew
– It requires a handshaking protocol
– Example: USB 2.0

SLIDE 19
Arbitration: Obtaining Access to the Bus

  • One of the most important issues in bus design:

– How is the bus reserved by a device that wishes to use it?

[Figure: bus master and bus slave - the master initiates requests; data can go either way]

  • Chaos is avoided by a master-slave arrangement:

– Only the bus master can control access to the bus:

  • It initiates and controls all bus requests

– A slave responds to read and write requests

  • The simplest system:

– The processor is the only bus master
– All bus requests must be controlled by the processor
– Major drawback: the processor is involved in every transaction

SLIDE 20

Bus Arbitration

  • Multiple Potential Bus Masters: the Need for Arbitration
  • Bus arbitration schemes usually try to balance two factors:

– Bus priority: the highest-priority device should be serviced first
– Fairness: even the lowest-priority device should never be completely locked out from the bus

  • Bus arbitration schemes:

– Daisy chain arbitration (not very fair)

– Centralized, parallel arbitration (e.g., PCI)
– Distributed arbitration by collision detection (e.g., Ethernet)

  • Each device just “goes for it”.
  • Detect whether a collision occurred.
  • If a collision is found, wait some time and try again.
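The "wait some time" step is usually randomized and grows with each collision, in the style of Ethernet's exponential backoff. A sketch, where the slot-time granularity and the cap are simplified assumptions:

```python
import random

def backoff_slots(collisions: int, cap: int = 10) -> int:
    """After the n-th collision, wait a random number of slot times
    drawn uniformly from [0, 2^n - 1] (capped so the range stays bounded)."""
    return random.randrange(2 ** min(collisions, cap))

# Each successive collision doubles the range of possible waits, thinning
# out retries until one device wins the bus.
for n in range(1, 4):
    print(f"collision {n}: wait up to {2**n - 1} slots")
```

Randomness is what breaks the tie: two devices that collided are unlikely to pick the same wait again, so no central arbiter is needed.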
SLIDE 21

The Daisy Chain Bus Arbitrations Scheme

[Figure: daisy chain arbitration - the arbiter’s grant line passes from Device 1 (highest priority) through Device 2 to Device N (lowest priority); the request and release lines are shared (wired-OR)]

  • Advantage:

– simple

  • Disadvantages:

– Cannot assure fairness: a low-priority device may be locked out indefinitely
– The use of the daisy chain grant signal also limits the bus speed

SLIDE 22
Bus Arbitration Schemes: The Daisy Chain

  • When a device wants to use the bus during the next cycle, it activates the bus request line

– note that multiple devices can request at once

  • The arbiter then activates bus grant, which is passed along through the daisy chain to the first device with an active request
  • That device then activates select acknowledge, indicating that it has been selected to use the bus when it’s next available for a data transfer
  • All devices then remove their requests
  • The arbiter removes the grant
  • If (or when) the bus is not busy, the selected device will transfer data (on a separate set of wires) and de-activate select acknowledge to indicate that arbitration can start again
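The grant-propagation step above can be modeled in a few lines. Treating the chain as a list with index 0 as the highest-priority device is an assumption for illustration:

```python
# Daisy-chain grant sketch: the grant enters at the highest-priority end of
# the chain and is captured by the first device with an active request.
def daisy_chain_grant(requests):
    """Return the index of the device that captures the grant (None if idle)."""
    for device, requesting in enumerate(requests):
        if requesting:
            return device        # this device asserts select acknowledge
    return None                  # no requests: the grant is never captured

# Devices 1 and 3 both request; device 1 wins because it sits earlier in
# the chain - and it will keep winning every round, which is exactly the
# fairness problem noted above.
print(daisy_chain_grant([False, True, False, True]))   # → 1
```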

SLIDE 23

Bus Arbitration Scheme: Centralized Parallel Arbitration

[Figure: centralized parallel arbitration - each device drives its own request line to the bus arbiter, which drives the grant line]

  • Used in essentially all processor-memory buses and in high-speed I/O buses
  • Requires a bus arbiter
SLIDE 24

Interfacing I/O devices

  • Giving commands to I/O devices

– Memory-mapped I/O

  • Portions of address space are assigned to I/O devices.
  • Reads and writes to these addresses are interpreted as commands to the I/O device.

– I/O instructions

  • A dedicated instruction used to give a command to an I/O device

  • Communicating with the processor

– Polling

  • The process of periodically checking the status of an I/O device to determine the need to service the device

  • Disadvantage: it wastes a lot of processor time

– Interrupt-driven I/O

  • An I/O scheme that uses interrupts to notify the processor when an I/O device needs attention.
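The cost of polling is easy to see in code. In this sketch the device's status and data registers are simulated with a small class; the register layout and names are invented for illustration:

```python
# Toy device with a "status register" (ready) and a "data register" (read).
class FakeDevice:
    def __init__(self, words):
        self._fifo = list(words)
    @property
    def ready(self):             # status register: is data available?
        return bool(self._fifo)
    def read(self):              # data register: consume one word
        return self._fifo.pop(0)

def poll_all(dev):
    """Polling: the processor itself loops, checking the status register.
    Every check of dev.ready here is processor time spent on the device;
    interrupt-driven I/O would let the CPU do other work until notified."""
    out = []
    while dev.ready:
        out.append(dev.read())
    return out

print(poll_all(FakeDevice([1, 2, 3])))   # → [1, 2, 3]
```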

SLIDE 25

What is DMA (Direct Memory Access)?

  • Transferring data between a Device and Memory

– Without DMA, the processor handles every data transfer between a device and memory
– DMA (Direct Memory Access)

  • Gives an external device the ability to write memory directly, without involving the processor
  • Much lower overhead than having the processor request one word at a time
  • The interrupt mechanism is used to notify the processor only when the I/O transfer completes or an error occurs

  • Issue: Cache coherence

– What if a device writes data that is currently in the processor’s cache?
– Solutions: flush the cache on every I/O operation (expensive), or have hardware invalidate the affected cache lines

SLIDE 26

DMA Transfer

  • DMAC (DMA Controller)

– A specialized controller for DMA.
– The bus master during a DMA transfer: it reads and writes directly between itself and memory.

  • DMA transfer steps:

– The processor sets up the DMAC by supplying:

  • Identity of the device (I/O device address)
  • Operation to perform (I/O→memory or memory→I/O)
  • Memory address that is the source or destination of the data to be transferred
  • Number of bytes to transfer

– The DMAC starts the operation on the device and arbitrates for the bus

I/O → Memory:

  1. Check the device to see if data is ready; if yes, go to 2, else go to 1.
  2. Read data into the DMAC’s register.
  3. Write data to memory at M-addr.
     (During 2–3: increment M-addr; decrement Count.)
  4. If Count = 0, finish DMA and return control to the CPU; else go to 1.

Memory → I/O:

  1. Read data from M-addr into the DMAC’s register; increment M-addr.
  2. Check if the I/O device is ready; if yes, go to 3, else go to 2.
  3. Write the DMAC’s register data to the I/O device; decrement Count.
  4. If Count = 0, finish DMA and return control to the CPU; else go to 1.
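The I/O → Memory loop can be modeled directly. Here plain Python lists stand in for the device and memory, and the "interrupt" is just a return value; everything else follows the numbered steps:

```python
# Toy model of the I/O→Memory DMA loop: the DMAC latches each word from the
# device, writes it to memory at M-addr, bumps M-addr, decrements Count,
# and signals the CPU when Count reaches zero.
def dma_io_to_mem(device_words, memory, m_addr, count):
    src = iter(device_words)
    while count > 0:
        word = next(src)         # steps 1-2: device data ready, latch into DMAC
        memory[m_addr] = word    # step 3: write DMAC register to M-addr
        m_addr += 1              # increment M-addr
        count -= 1               # decrement Count
    return "interrupt: DMA complete"   # step 4: return control to the CPU

memory = [0] * 8
status = dma_io_to_mem([10, 20, 30], memory, m_addr=2, count=3)
print(memory)    # → [0, 0, 10, 20, 30, 0, 0, 0]
```

The CPU's only involvement is the setup before the call and the "interrupt" afterward; the per-word bookkeeping all happens inside the DMAC loop.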

SLIDE 27

I/O Bus Standards

  • Today we have two dominant bus standards:


SLIDE 28

Pentium 4

  • I/O Options

[Figure: Pentium 4 I/O organization - the processor connects over the system bus (800 MHz, 6.4 GB/sec) to the 82875P memory controller hub (north bridge), which serves main memory (DDR 400 DIMMs, 3.2 GB/sec each), AGP 8X graphics output (2.1 GB/sec), and CSA 1 Gbit Ethernet (0.266 GB/sec); a 266 MB/sec link leads to the 82801EB I/O controller hub (south bridge), which connects Serial ATA disks (150 MB/sec), Parallel ATA CD/DVD and tape drives (100 MB/sec), AC/97 stereo surround sound (1 MB/sec), USB 2.0 (60 MB/sec), the PCI bus (132 MB/sec), and 10/100 Mbit Ethernet (20 MB/sec)]

SLIDE 29

Fallacies and Pitfalls

  • Fallacy: the rated mean time to failure of disks is 1,200,000 hours, so disks practically never fail.

  • Fallacy: magnetic disk storage is on its last legs and will soon be replaced.
  • Fallacy: A 100 MB/sec bus can transfer 100 MB/sec.
  • Pitfall: moving functions from the CPU to the I/O processor, expecting to improve performance without analysis.