Big Data Processing Technologies
Chentao Wu, Associate Professor
Dept. of Computer Science and Engineering
wuct@cs.sjtu.edu.cn
Schedule
lec1: Introduction on big data and cloud computing
lec2: Introduction on data storage
lec3: Data
Contents
Memory hierarchy (toward L0: smaller, faster, and costlier per byte; toward L5: larger, slower, and cheaper per byte)
L0: registers. CPU registers hold words retrieved from L1 cache.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (tapes, distributed file systems, Web servers)
Figure: Disk drive components: spindle, arm, actuator, platters, electronics, SCSI connector. Image courtesy of Seagate Technology.
Magnetic Transition
Figure: Disk geometry (one surface): spindle, surface, tracks, track k, sectors, gaps.
Figure: Disk geometry (multiple platters): platters 0 to 2 on one spindle, surfaces 0 to 5, cylinder k.
Figure: Drive internals: read/write head, platter (upper and lower surface), cylinder, track, sector, arm, actuator.
Figure: Disk access sequence: after BLUE read; seek for RED; rotational latency; after RED read.
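The seek and rotational-latency steps in the access sequence above can be turned into a rough service-time estimate. The sketch below is illustrative only: the function name, the parameter values (9 ms average seek, 7200 RPM, 400 sectors per track), and the assumption that average rotational latency is half a revolution are not taken from the slides.

```python
def avg_access_time_ms(avg_seek_ms, rpm, sectors_per_track):
    """Estimate the average time to read one sector, in milliseconds.

    Assumed model: access time = seek + rotational latency + transfer.
    """
    ms_per_rotation = 60_000 / rpm
    # On average the target sector is half a revolution away from the head.
    avg_rotational_latency_ms = ms_per_rotation / 2
    # Time for a single sector to pass under the head.
    transfer_ms = ms_per_rotation / sectors_per_track
    return avg_seek_ms + avg_rotational_latency_ms + transfer_ms


# Hypothetical drive parameters, chosen only for illustration.
t = avg_access_time_ms(avg_seek_ms=9.0, rpm=7200, sectors_per_track=400)
print(f"estimated access time: {t:.2f} ms")
```

With numbers like these, seek time and rotational latency dominate and the transfer time for one sector is comparatively negligible.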
Contents
Figure: RAID array components: host, RAID controller, hard disks, logical arrays (RAID sets).
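Figure: Striping: a strip is the portion of data on one disk; a stripe is the set of aligned strips across the disks of the RAID set.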
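One way to picture the stripe and strip layout is as a mapping from a logical block number to a disk and a block offset on that disk. The following is a minimal sketch under assumed conventions (round-robin placement, a hypothetical `locate_block` helper, and a configurable strip size); real controllers may lay data out differently.

```python
def locate_block(logical_block, num_disks, blocks_per_strip=1):
    """Map a logical block to (disk index, block offset on that disk).

    Assumes strips are assigned to disks round-robin, stripe by stripe.
    """
    # Which strip the block falls in, and its position inside that strip.
    strip_index, offset_in_strip = divmod(logical_block, blocks_per_strip)
    # Which stripe the strip belongs to, and which disk holds it.
    stripe, disk = divmod(strip_index, num_disks)
    return disk, stripe * blocks_per_strip + offset_in_strip


# Four disks, one block per strip: consecutive blocks rotate across disks.
for block in range(6):
    print(block, locate_block(block, num_disks=4))
```

Because consecutive strips land on different disks, a large sequential request can be served by several disks in parallel.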
Figure: Mirroring: the host writes Block 0; the RAID controller stores Block 0 on two disks.
Parity example (host, RAID controller, four data drives and one parity drive):

D1  D2  D3  D4  P
4   6   1   7   18

Actual parity calculation is a bitwise XOR operation.

Regeneration of data when drive D3 fails:

D1  D2  D3  D4  P
4   6   ?   7   18

4 + 6 + ? + 7 = 18
? = 18 - 4 - 6 - 7
? = 1
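The slide uses addition to make the idea easy to follow, but notes that real parity is a bitwise XOR. The sketch below repeats the same example with XOR; the helper names `parity` and `rebuild` and the one-byte strips are assumptions for illustration. Note that the XOR parity of 4, 6, 1, 7 is 4, not 18 as in the additive illustration.

```python
from functools import reduce
from operator import xor


def parity(strips):
    """XOR equal-length byte strings together, byte position by byte position."""
    return bytes(reduce(xor, column) for column in zip(*strips))


def rebuild(surviving_strips, parity_strip):
    """Regenerate the single missing strip from the survivors and the parity."""
    # XOR is its own inverse, so XOR-ing everything that is left
    # cancels the surviving data and leaves the lost strip.
    return parity(surviving_strips + [parity_strip])


# One-byte strips holding the values from the slide: D1=4, D2=6, D3=1, D4=7.
d1, d2, d3, d4 = (bytes([v]) for v in (4, 6, 1, 7))
p = parity([d1, d2, d3, d4])

# Drive D3 fails; recover its contents from D1, D2, D4 and P.
recovered = rebuild([d1, d2, d4], p)
print(p[0], recovered[0])
```

XOR is used instead of addition because it never overflows and the same operation serves for both computing parity and regenerating lost data.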
A PCI-bus-based, IDE/ATA hard disk RAID controller, supporting levels 0, 1, and 01.
Figure: Hot spare: RAID controller, hot spare, failed disk, replace failed disk.