Lecture 12: SIMD machines & data parallelism; dependency analysis for automatic vectorization and parallelization of serial programs, Part 1
SIMD: Single Instruction, Multiple Data
Parallelism through simultaneous operations on different data
Fine grain parallelism
- Systolic arrays
- Parallel SIMD machines
– 10k+ processors
- Vector/Pipeline units
Systolic Array
- Network of “processors” with memory around the edges
– Performance comes from completing all computations before storing results back to memory
- Often hardware implementations solving a single, fixed problem
– Special topologies
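The dataflow idea above can be illustrated with a small serial simulation. This is a hedged sketch, not a description of any specific machine: the weight-stationary scheme, the cell layout, and all names are assumptions of mine. Weights sit still, data marches one cell per tick, and results emerge without intermediate stores to the surrounding memory. (The final summation is written as a single step here; a real systolic design would pipeline the partial sums through the cells as well.)

```python
# Serial simulation of a weight-stationary 1-D systolic array computing
# a convolution (FIR filter). Illustrative sketch only: the scheme and
# the names are assumptions, not taken from the lecture.

def systolic_conv(xs, ws):
    """Cell j holds weight ws[j]; data shifts one cell right per tick,
    so at tick t cell j holds xs[t - j], and the cells' products sum to
    y[t] = sum_j ws[j] * xs[t - j]."""
    m = len(ws)
    held = [0] * m                       # data value sitting in each cell
    out = []
    for t in range(len(xs) + m - 1):     # extra ticks drain the pipeline
        # systolic step: each cell passes its value to its right neighbour
        held = [xs[t] if t < len(xs) else 0] + held[:-1]
        # all cells multiply in lock step; the products are then summed
        out.append(sum(w * d for w, d in zip(ws, held)))
    return out

print(systolic_conv([1, 2, 3], [1, 1]))   # → [1, 3, 5, 3]
```

Note how no element of `xs` is read more than once from outside the array: reuse happens entirely inside the network of cells, which is the point of the design.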
SIMD Machine
- Front End
– Normal von Neumann machine
– Runs the application program
- Processor array
– Synchronous: the same operation at the same time, or idle
– Extends the FPU's instruction set
– Small memory per processor
– Smart memory
– I/O
- Example
– ILLIAC IV, IBM GF11, MasPar, CM200 (Bellman, 16k)
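The lock-step execution model — every processing element performs the broadcast operation or sits idle — can be sketched serially. The names here (`simd_step`, the activity mask) are illustrative assumptions of mine; this models the semantics, not any of the machines listed.

```python
# Sketch of SIMD lock-step execution with a per-element activity mask.
# All names are illustrative assumptions, not from the slides.

def simd_step(op, data, mask):
    """One broadcast instruction: every active element applies `op`;
    masked-off elements idle and keep their value."""
    return [op(x) if active else x for x, active in zip(data, mask)]

# A conditional update becomes two broadcast steps:
data = [3, -2, 5, -7]
mask = [x < 0 for x in data]                 # step 1: broadcast the test
data = simd_step(lambda x: 1, data, mask)    # step 2: masked assignment
print(data)                                  # → [3, 1, 5, 1]
```

The cost model follows directly: elements that fail the test contribute idle cycles, which is why divergent conditionals hurt SIMD utilization.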
Data Parallel Programming
- Idea: update the elements of an array at the same time
- Divides the work between the programmer and the compiler
- The programmer solves the problem in their own model
– Concentrates on structure and concepts at a high level
– Collective operations on large data structures
– Keeps data in large arrays with mapping information
- The compiler maps the program on a physical machine
– Fills in all the details (gladly receives hints from the user)
– Optimizes computations and communications
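This division of labour amounts to a contract: the programmer states one collective operation over whole arrays and says nothing about the order between elements, and that unspecified order is exactly what leaves the compiler free to map the work onto processors. A minimal serial sketch of those semantics (the helper name is an assumption of mine):

```python
# Serial sketch of a whole-array collective operation. No element's
# result depends on another's, so a data-parallel compiler may
# evaluate them in any order, or all at once.
from operator import add

def elementwise(op, a, b):
    """Collective `a op b` over whole arrays, element by element."""
    return [op(x, y) for x, y in zip(a, b)]

c = elementwise(add, [1, 2, 3], [10, 20, 30])   # whole-array A + B
print(c)                                        # → [11, 22, 33]
```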
Building Blocks in Data Parallel Programming
- The user controls the placing of data on processors
– Minimize communication and keep all processors busy
- Operations on whole arrays
– Apply one operation on each element in the array in parallel
- Methods to access parts of an array
– Operations can be applied to just these parts
– Example: element < 0 ⇒ element := 1
- Reduction operations on arrays
– produces a result from a combination of many array elements: sum, max, min, ...
- Shift operations along the axes of multidimensional arrays
- Scan-operations
– prefix/suffix-operations
- Generalized communication
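The remaining building blocks above — reduction, shift along an axis, scan, and generalized communication (a gather) — can each be sketched serially in a few lines. The function names are mine, not from any particular data-parallel language; real languages provide these as intrinsics.

```python
# Hedged serial sketches of data-parallel building blocks; names are
# illustrative assumptions, not from any particular language.
from itertools import accumulate

def reduce_sum(a):
    """Reduction: combine many array elements into one result."""
    return sum(a)

def shift(a, k, fill=0):
    """Shift k places along the (single) axis, filling at the boundary."""
    if k >= 0:
        return [fill] * k + a[:len(a) - k]
    return a[-k:] + [fill] * (-k)

def prefix_sum(a):
    """Scan: element i becomes the sum of a[0..i] (a prefix operation)."""
    return list(accumulate(a))

def gather(a, idx):
    """Generalized communication: element i fetches a[idx[i]]."""
    return [a[i] for i in idx]

print(shift([1, 2, 3, 4], 1))           # → [0, 1, 2, 3]
print(prefix_sum([1, 2, 3, 4]))         # → [1, 3, 6, 10]
print(gather([10, 20, 30], [2, 0, 1]))  # → [30, 10, 20]
```

Each of these has an efficient parallel implementation (reductions and scans in logarithmic depth, shifts as nearest-neighbour communication), which is why they are the vocabulary the compiler is asked to map onto the machine.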