Chapter 8 (partial coverage)
Interfacing Processors and Peripherals
- I/O Design affected by many factors (expandability, resilience)
- Performance is complex:
— access latency
— throughput
— connection between devices and the system
— the memory hierarchy
— the operating system
[Figure: typical I/O organization — processor (with cache) and main memory on the memory–I/O bus, with I/O controllers for disks, graphics output, and the network; devices raise processor interrupts]
I/O Devices
- Very diverse devices
— behavior (i.e., input vs. output)
— partner (who is at the other end?)
— data rate
I/O Example: Disk Drives
Example
— Sector size: 512 bytes
— Average seek time: 6 ms
— RPM: 10,000
— Transfer rate: 50 MB/sec
— Controller overhead: 0.2 ms
— What's the average time to read a sector from this disk?
- To access data:
— seek: position the head over the proper track (3 to 14 ms avg.)
— rotational latency: wait for the desired sector (0.5 rotation / RPM on average)
— transfer: read the data (one or more sectors), 30 to 80 MB/sec

Average read time = seek + rotational latency + transfer + controller overhead
= 6.0 ms + 0.5 / 10,000 RPM + 0.5 KB / 50 MB/sec + 0.2 ms
= 6.0 + 3.0 + 0.01 + 0.2 = 9.21 ms
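The access-time arithmetic above is easy to check programmatically; here is a small sketch (the function name and parameters are mine, not from the slides):

```python
def avg_sector_read_ms(seek_ms, rpm, sector_kb, transfer_mb_s, overhead_ms):
    """Average sector read time = seek + rotational latency + transfer + controller overhead."""
    rotational_ms = 0.5 / rpm * 60 * 1000                    # half a rotation, converted to ms
    transfer_ms = (sector_kb / 1024) / transfer_mb_s * 1000  # sector size / transfer rate
    return seek_ms + rotational_ms + transfer_ms + overhead_ms

# The example disk: 6 ms seek, 10,000 RPM, 0.5 KB sector, 50 MB/sec, 0.2 ms overhead.
t = avg_sector_read_ms(6.0, 10_000, 0.5, 50, 0.2)
print(f"{t:.2f} ms")  # 9.21 ms
```

Note that the seek term dominates unless seeks are avoided (e.g., by sequential access).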
RAID: Redundant Arrays of Inexpensive Disks
- Improve both the performance and availability of disk storage
– Performance: replace one large, expensive disk with many small, inexpensive disks; performance improves because many disks can operate at once.
– Availability: improve availability by adding redundancy at low cost.
RAID 0: No Redundancy
- Characteristics
– Striping: allocation of logically sequential blocks to separate disks
– Striping data over multiple disks automatically spreads accesses across several disks, while appearing to software as a single large disk
- Advantages
– Achieves higher performance than a single disk can deliver
- Disadvantages
– No fault tolerance, since there is no redundancy
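The round-robin striping described above can be sketched in a few lines (names are illustrative):

```python
def stripe(logical_block, num_disks):
    """RAID 0: map a logical block number to (disk index, block offset on that disk).
    Logically sequential blocks land on successive disks, so large accesses
    naturally spread across all disks."""
    return logical_block % num_disks, logical_block // num_disks

# Eight logical blocks over four disks: blocks 0-3 hit disks 0-3, then it wraps.
layout = [stripe(b, num_disks=4) for b in range(8)]
print(layout)  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```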
RAID 1: Mirroring

- Characteristic
– Mirroring: write identical data to multiple disks to increase data availability
– Whenever data are written to one disk, those data are also written to a redundant disk, so there are always multiple copies of the information
– If a disk fails, the system simply goes to the "mirrored" disk to get the data
- Advantage
– Simplest design.
- Disadvantage
– Most expensive since it requires the most disks (100% redundancy).
RAID 2: Error Detecting and Correcting Code
- Characteristic
– Uses a Hamming code as the error-correcting code on redundant disks
– No longer used in practice, so we won't discuss it further
RAID 3: Bit-Interleaved Parity

- Characteristic
– Protection group: the group of disks that share a common check disk, which stores parity used to restore lost data on a failure
– Each data block is striped at the byte level across all the disks in the protection group (so each read/write goes to all disks in the group)
– The parity disk is updated on each write
- Advantage
– Cost of redundancy is 1/N (N: number of disks in a protection group)
– Very high read/write transfer rate
- Disadvantage
– Needs a fairly complex disk controller to keep all disks synchronized
– Cannot do multiple small reads/writes in parallel
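The parity disk holds the bitwise XOR of the data disks, which is what makes single-disk recovery possible: XOR-ing the surviving disks with the parity block reproduces the lost block. A minimal sketch (helper names are mine):

```python
from functools import reduce

def xor_blocks(blocks):
    """Bitwise XOR of equal-sized byte blocks (the parity computation)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x0f\xf0", b"\xaa\x55", b"\x12\x34"]  # one block per data disk
parity = xor_blocks(data)                        # stored on the check disk

# Disk 1 fails: rebuild its block from the surviving disks plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```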
RAID 4: Block-Interleaved Parity

- Characteristic
– Data are striped at the block level across all disks in the protection group
– Parity is updated on each write
- Advantage
– Same ratio of redundancy (1/N) as RAID 3
– High data transfer rate for large read/write accesses
– Allows multiple small reads to occur in parallel
- Disadvantage
– Poor small-write performance because of the bottleneck on the parity disk
RAID 5: Distributed Block-Interleaved Parity

- Characteristic
– Same as RAID 4, except that parity information is spread across all the disks
- Advantage
– Low redundancy (1/N, same as RAID 3 and RAID 4)
– Good for large reads/writes
– Allows multiple small reads/writes in parallel (no parity-write bottleneck)
- Disadvantage
– Complex disk controller design
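The parity-disk bottleneck in RAID 4 (and its removal in RAID 5) comes from the small-write update rule: instead of re-reading the whole stripe, the controller computes new parity = old parity XOR old data XOR new data. A sketch of that identity (helper names are mine):

```python
from functools import reduce

def xor_blocks(blocks):
    """Bitwise XOR of equal-sized byte blocks (full parity recomputation)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def small_write_parity(old_parity, old_data, new_data):
    """RAID 4/5 small write: new parity = old parity XOR old data XOR new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

stripe = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
p_old = xor_blocks(stripe)

new_block = b"\xff\x00"
p_new = small_write_parity(p_old, stripe[1], new_block)
stripe[1] = new_block
assert p_new == xor_blocks(stripe)  # matches a full recomputation of the parity
```

Only the old data block and old parity block need to be read, but in RAID 4 every such update still hits the single parity disk.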
I/O Example: Buses
- What is a Bus?
- Types of buses:
— processor-memory
— backplane
— I/O
- Synchronous Bus vs. Asynchronous Bus
- Bus Arbitration:
— daisy chain arbitration
— centralized arbitration
— collision detection
A Bus Is:
- shared communication link (one or more wires)
- single set of wires used to connect multiple subsystems
- A bus is also a fundamental tool for composing large, complex systems
– a systematic means of abstraction
Advantages of Bus
- Versatility: By defining a single scheme, devices can be added easily
- Cost-effective: A single set of wires is shared in multiple ways
Disadvantages of Bus
- It creates a communication bottleneck
– The bandwidth of that bus can limit the maximum I/O throughput
Buses
Challenge of bus design
- The maximum bus speed is largely limited by:
– The length of the bus – The number of devices on the bus
- The need to support a range of devices with:
– Widely varying latencies and widely varying data transfer rates
- It is difficult to run many parallel wires at high speed
– the industry is in transition from parallel buses to high-speed serial point-to-point interconnections (or networks).
Types of Buses
- Processor-Memory Bus (design specific)
– A bus that connects the processor and memory
– Short and high speed
– The bandwidth only needs to match the memory system, so as to maximize memory-to-processor bandwidth
- I/O Bus (industry standard)
– Usually lengthy and slower
– The bandwidth needs to match a wide range of I/O devices
– Does not connect to memory directly, but to the processor-memory bus or the backplane bus
– e.g., IDE bus, SCSI bus, USB
- Backplane Bus
– Allows processors, memory, and I/O devices to coexist on one bus
– Cost advantage: one bus for all components
– Often standardized, e.g., PCI
A Two-Bus System
[Figure: a two-bus system — processor and memory on the processor-memory bus; bus adaptors connect I/O buses to it]
- I/O buses tap into the processor-memory bus via bus adaptors to speed-match between bus types:
– Processor-memory bus: mainly for processor-memory traffic
– I/O buses: provide expansion slots for I/O devices
- Apple Macintosh-II
– NuBus: processor, memory, and a few selected I/O devices
– SCSI bus: the rest of the I/O devices
A Three-Bus System (+ backside cache)
[Figure: a three-bus system — processor with an L2 cache on a backside cache bus; the processor-memory bus connects to memory; a bus adaptor links it to a backplane bus, and further bus adaptors link the backplane bus to I/O buses]
- A small number of backplane buses tap into the processor-memory bus
– The processor-memory bus focuses on traffic to/from memory
– I/O buses are connected to the backplane bus
- Advantage:
– Loading on the processor bus is reduced, and the buses can run at different speeds
Synchronous and Asynchronous Bus

- Synchronous Bus:
– Includes a clock in the control lines
– A fixed protocol for communication, defined relative to the clock
– Advantage: involves very little logic and can run very fast
– Disadvantages:
- Every device on the bus must run at the same clock rate
- To avoid clock skew, synchronous buses cannot be long if they are fast
– Processor-memory buses are often synchronous
- Asynchronous Bus:
– It is not clocked
– It can accommodate a wide range of devices
– It can be lengthened without worrying about clock skew
– It requires a handshaking protocol
– Example: USB 2.0
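The handshaking protocol can be illustrated as a toy sequential trace of a four-phase Req/Ack handshake (this is one common form; real buses differ in detail, and all names here are mine):

```python
def handshake_transfer(words):
    """Toy four-phase handshake: for each word, the sender asserts Req with the
    data valid, the receiver latches the data and asserts Ack, the sender then
    drops Req, and the receiver drops Ack, completing one transfer."""
    received, trace = [], []
    for word in words:
        trace.append("Req=1")   # sender: data valid on the bus, assert Req
        received.append(word)   # receiver: latch the data
        trace.append("Ack=1")   # receiver: assert Ack
        trace.append("Req=0")   # sender: sees Ack, deassert Req
        trace.append("Ack=0")   # receiver: sees Req low, deassert Ack
    return received, trace

data, trace = handshake_transfer([10, 20])
assert data == [10, 20] and len(trace) == 8  # four signal edges per word
```

Because each phase waits on the other side's previous edge, no shared clock is needed, at the cost of extra signaling per transfer.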
Arbitration: Obtaining Access to the Bus

[Figure: bus master and bus slave — the master initiates requests; data can go either way]

- One of the most important issues in bus design:
– How is the bus reserved by a device that wishes to use it?
- Chaos is avoided by a master-slave arrangement:
– Only the bus master can control access to the bus:
- It initiates and controls all bus requests
– A slave responds to read and write requests
- The simplest system:
– The processor is the only bus master
– All bus requests must be controlled by the processor
– Major drawback: the processor is involved in every transaction
Bus Arbitration
- Multiple Potential Bus Masters: the Need for Arbitration
- Bus arbitration schemes usually try to balance two factors:
– Bus priority: the highest-priority device should be serviced first
– Fairness: even the lowest-priority device should never be completely locked out from the bus
- Bus arbitration schemes:
– Daisy chain arbitration (not very fair)
– Centralized, parallel arbitration (e.g., PCI)
– Distributed arbitration by collision detection (e.g., Ethernet)
- Each device just "goes for it".
- Detect whether any collision occurred.
- If a collision is found, wait some time and try again.
The Daisy Chain Bus Arbitration Scheme
[Figure: daisy chain arbitration — the arbiter's Grant line passes from Device 1 (highest priority) through Device 2 to Device N (lowest priority); the Release and Request lines are shared]
- Advantage:
– simple
- Disadvantages:
– Cannot assure fairness: a low-priority device may be locked out indefinitely
– The use of the daisy chain grant signal also limits the bus speed
- when a device wants to use the bus during the next cycle, it activates the (wired-OR) bus request line
– note that multiple devices can request at once
- the arbiter then activates bus grant, which is passed along the daisy chain to the first device with an active request
- that device then activates select acknowledge, indicating that it has been selected to use the bus when it is next available for a data transfer
- all devices then remove their requests
- the arbiter removes the grant
- if (or when) the bus is not busy, the selected device transfers data (on a separate set of wires) and de-activates select acknowledge to indicate that arbitration can start again
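The grant-passing steps above amount to a simple priority scan; a sketch (index 0 is the device closest to the arbiter, i.e., highest priority):

```python
def daisy_chain_grant(requests):
    """Pass the grant down the chain; the first requesting device absorbs it.
    requests[i] is True if device i has activated the shared request line.
    Returns the index of the selected device, or None if nobody is requesting."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:   # this device keeps the grant and raises select acknowledge
            return device
        # otherwise the grant signal is forwarded to the next device in the chain
    return None         # request line inactive: the arbiter never issued a grant

# Device 1 wins even though device 2 requested at the same time:
print(daisy_chain_grant([False, True, True]))  # 1
```

The unfairness is visible directly: device 0 can starve everyone behind it simply by requesting continuously.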
Bus Arbitration Scheme: Centralized Parallel Arbitration
[Figure: centralized parallel arbitration — each of Device 1 through Device N has its own Req line to the bus arbiter, which returns a Grant]
- Used in essentially all processor-memory buses and in high-speed I/O buses
- Requires a bus arbiter
Interfacing I/O devices
- Giving commands to I/O devices
– Memory-mapped I/O
- Portions of address space are assigned to I/O devices.
- Reads and writes to these addresses are interpreted as commands to the I/O device.
– I/O instructions
- A dedicated instruction used to give a command to an I/O device
- Communicating with the processor
– Polling
- The process of periodically checking the status of an I/O device to determine whether the device needs service
- Disadvantage: it wastes a lot of processor time
– Interrupt-driven I/O
- An I/O scheme that uses interrupts to notify the processor when an I/O device needs attention.
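The cost of polling shows up clearly in a polling loop: every status check that finds the device not ready is pure wasted processor time. A sketch (function and parameter names are mine):

```python
def poll_device(is_ready, service, max_polls=1000):
    """Polling: repeatedly check the device's status until it needs service.
    Returns (result, wasted_polls), where wasted_polls counts the checks that
    found the device not ready -- time an interrupt-driven scheme would have
    spent on useful work instead."""
    wasted = 0
    for _ in range(max_polls):
        if is_ready():
            return service(), wasted
        wasted += 1
    raise TimeoutError("device never became ready")

# Toy device that becomes ready on the fourth status check:
state = {"checks": 0}
def ready():
    state["checks"] += 1
    return state["checks"] >= 4

result, wasted = poll_device(ready, lambda: "data")
assert result == "data" and wasted == 3
```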
What is DMA (Direct Memory Access)?
- Transferring data between a device and memory
– The processor can handle the data transfer between the device and memory, or
– DMA (Direct Memory Access) can be used instead
- DMA gives an external device the ability to write memory directly, without involving the processor
- Much lower overhead than having the processor request one word at a time
- The interrupt mechanism is used to notify the processor only when the I/O transfer completes or an error occurs.
- Issue: Cache coherence
– What if a device writes data that is currently in the processor's cache?
– Solutions: flush the cache on every I/O operation (expensive), or have hardware invalidate the affected cache lines
DMA Transfer
- DMAC (DMA Controller)
– A specialized controller for DMA
– The bus master during a DMA transfer: it reads/writes directly between itself and memory
- DMA transfer steps:
– Processor setup DMAC by supplying
- Identity of the device (I/O device address)
- Operation to perform (I/O → memory or memory → I/O)
- memory address that is the source or destination of data to be transferred
- Number of bytes to transfer
– DMAC starts operation on the device and arbitrates for the bus
I/O → Memory
- 1. Check the device to see if data is ready; if yes, go to 2, else go to 1.
- 2. Read data into the DMAC's register.
- 3. Write data to memory at M-addrs.
- (During 2-3: increment M-addrs; decrement Count.)
- 4. If Count = 0, finish DMA and return control to the CPU; else go to 1.

Memory → I/O
- 1. Read data from M-addrs into the DMAC's register; increment M-addrs.
- 2. Check whether the I/O device is ready; if yes, go to 3, else go to 2.
- 3. Write the DMAC's register data to the I/O device; decrement Count.
- 4. If Count = 0, finish DMA and return control to the CPU; else go to 1.
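The I/O → memory steps can be rendered as runnable code; assuming a device whose data is always ready, the loop collapses to the following sketch (names are mine):

```python
def dma_read(device_words, memory, m_addr):
    """Sketch of the I/O -> memory DMA loop: copy each word from the device
    into memory starting at m_addr, incrementing M-addrs and decrementing
    Count, and hand control back (here: return) once Count reaches zero."""
    count = len(device_words)     # Count, as set up by the processor
    for word in device_words:     # step 1: device data is ready
        reg = word                # step 2: read data into the DMAC's register
        memory[m_addr] = reg      # step 3: write the register to memory[M-addrs]
        m_addr += 1               #         increment M-addrs
        count -= 1                #         decrement Count
    return m_addr                 # step 4: Count == 0 -> DMA done, "interrupt" the CPU

memory = [0] * 8
end = dma_read([11, 22, 33], memory, m_addr=2)
assert memory == [0, 0, 11, 22, 33, 0, 0, 0] and end == 5
```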
I/O Bus Standards
- Today we have two dominant bus standards:
Pentium 4

- I/O Options

[Figure: Pentium 4 I/O organization — the processor connects over the system bus (800 MHz, 6.4 GB/sec) to the 82875P memory controller hub (north bridge), which serves main memory DIMMs (two DDR 400 channels, 3.2 GB/sec each), AGP 8X graphics output (2.1 GB/sec), CSA 1 Gbit Ethernet (0.266 GB/sec), and a 266 MB/sec link to the 82801EB I/O controller hub (south bridge); the south bridge connects Serial ATA disks (150 MB/sec), Parallel ATA CD/DVD and tape (100 MB/sec, 20 MB/sec), the PCI bus (132 MB/sec), AC/97 surround-sound stereo (1 MB/sec), USB 2.0 (60 MB/sec), and 10/100 Mbit Ethernet]
Fallacies and Pitfalls
- Fallacy: the rated mean time to failure of disks is 1,200,000 hours, so disks practically never fail.
- Fallacy: magnetic disk storage is on its last legs and will soon be replaced.
- Fallacy: A 100 MB/sec bus can transfer 100 MB/sec.
- Pitfall: moving functions from the CPU to the I/O processor, expecting to improve performance without analysis.