SLIDE 3 Memory Latency and Short Vectors
Vector Instruction Sequence:
  VLD  v1
  VMUL v2,v1,r1
  VLD  v3
  VMUL v4,v3,r2

[Figure: Instruction Execution in Time for the sequence above, with Memory Latency marked. Panels: Cray-style, Decoupled Pipeline. Rows in each panel: Addr.Gen., Data Bus, Multiplier. Events: VLD v1 Address, VLD v1 Data, VMUL v2,v1,r1, VLD v3 Address, VLD v3 Data, VMUL v4,v3,r2. Annotations: Mem Idle, Mult Idle, Enqueue VMUL, Enqueue VLD data.]
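
The contrast between the Cray-style and decoupled timelines can be sketched with a small cycle-count model. This is only an illustration under stated assumptions, not the slide's own numbers: each unit handles one element per cycle, a VMUL is chained to the first element returned by its VLD, the Cray-style machine cannot issue the next VLD until the previous VMUL has issued, and the decoupled machine lets address generation run ahead. The function names and the values of `vl` and `latency` are hypothetical.

```python
def cray_style_cycles(num_pairs, vl, latency):
    """Finish cycle for num_pairs of (VLD, VMUL) with in-order, blocking issue."""
    issue = 0       # earliest cycle the next instruction may issue
    addr_free = 0   # cycle at which the address generator is next free
    mul_free = 0    # cycle at which the multiplier is next free
    finish = 0
    for _ in range(num_pairs):
        addr_start = max(issue, addr_free)
        addr_free = addr_start + vl
        data_start = addr_start + latency       # first element returns
        mul_start = max(data_start, mul_free)   # VMUL stalls until data arrives
        mul_free = mul_start + vl
        issue = mul_start                       # next VLD waits behind the VMUL
        finish = mul_free
    return finish


def decoupled_cycles(num_pairs, vl, latency):
    """Finish cycle when the VMULs are enqueued and address generation runs ahead."""
    addr_free = 0
    data_free = 0
    mul_free = 0
    finish = 0
    for _ in range(num_pairs):
        addr_start = addr_free                  # never waits on the multiplier
        addr_free = addr_start + vl
        data_start = max(addr_start + latency, data_free)
        data_free = data_start + vl
        mul_start = max(data_start, mul_free)
        mul_free = mul_start + vl
        finish = mul_free
    return finish


if __name__ == "__main__":
    # Assumed values: short vectors (8 elements) and a 50-cycle memory latency.
    print(cray_style_cycles(2, vl=8, latency=50))
    print(decoupled_cycles(2, vl=8, latency=50))
```

In this model the Cray-style sequence pays the memory latency once per load, while the decoupled sequence pays it once overall, which is why the gap grows as vectors get shorter relative to the latency.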
Decoupled Pipeline Issues
Latencies:
Decoupling hides memory latency in most cases but exposes latency in others.
Exceptions:
- IEEE Floating-Point
- Page Faults for Virtual Memory
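[Figure: decoupled pipeline organization. Instruction Queues connect the Scalar Pipe (stages F D X M W) to the Vector Load Pipe (stages A and W shown) and the Vector Arithmetic Pipe (stages R X X X W). Memory Latency is marked on the figure.]
Figure labels:
- Scalar Unit Reads of Vector Unit State
- Scatter/Gather Indices
- Load/Store Masks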
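
One reading of the "Scatter/Gather Indices" label is that an indexed load whose indices were themselves just loaded cannot have its addresses generated ahead of time, so the memory latency is exposed twice. The sketch below illustrates that reading only. It assumes one element per cycle per unit and chaining on the first returned element; the function names and parameter values are hypothetical.

```python
def independent_loads_cycles(vl, latency):
    """Two unit-stride loads: the second address stream follows the first directly."""
    first_data_start = latency
    second_addr_start = vl
    second_data_start = max(second_addr_start + latency, first_data_start + vl)
    return second_data_start + vl


def dependent_gather_cycles(vl, latency):
    """Load an index vector, then gather through it: addresses wait for the indices."""
    index_data_start = latency                  # indices begin to return
    gather_addr_start = index_data_start        # cannot run ahead of its indices
    gather_data_start = gather_addr_start + latency
    return gather_data_start + vl


if __name__ == "__main__":
    # Assumed values: 8-element vectors and a 50-cycle memory latency.
    print(independent_loads_cycles(vl=8, latency=50))
    print(dependent_gather_cycles(vl=8, latency=50))
```

The same reasoning would apply to the other labels if they work the same way: a scalar read of vector state, or a load/store mask computed by the arithmetic pipe, forces one unit to wait for a result from another.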