Parallel Algorithms: Theory and Practice
Race
CS260 – Lecture 9* Yan Gu
Theory Practice
Why is parallelism hard? Non-determinism!!
Scheduling is unknown
Relative ordering for operations is unknown
Hard to debug
Therac-25 radiation therapy machine: killed 3 people and seriously injured many more (between 1985 and 1987).
https://en.wikipedia.org/wiki/Therac-25
North American Blackout of 2003: left 50 million people without power for up to a week.
https://en.wikipedia.org/wiki/Northeast_blackout_of_2003
Race bugs are notoriously difficult to discover by conventional testing!
Definition: a determinacy race occurs when two logically parallel instructions access the same memory location and at least one of the instructions performs a write.
direct_reduce(A, n) {
  parallel_for (i = 0; i < n; i++)
    sum = sum + 1;
  return sum;
}
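direct_reduce(A, n) {
  parallel_for (i = 0; i < 2; i++)
    sum = sum + 1;
  return sum;
}
With two iterations, the loop expands into the following instructions (r0 and r1 are the registers used by iteration 0 and iteration 1):
sum = 0
  iteration 0        iteration 1
  r0 = sum           r1 = sum
  r0 += 1            r1 += 1
  sum = r0           sum = r1
return sum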
A possible execution order that matches serial execution (each number is the step at which the instruction runs):
sum = 0
  iteration 0        iteration 1
  1: r0 = sum        4: r1 = sum
  2: r0 += 1         5: r1 += 1
  3: sum = r0        6: sum = r1
return sum           (returns 2)
A racy execution order, in which iteration 1 reads sum before iteration 0 writes it back:
sum = 0
  iteration 0        iteration 1
  1: r0 = sum        3: r1 = sum
  2: r0 += 1         4: r1 += 1
  5: sum = r0        6: sum = r1
return sum           (returns 1: one increment is lost)
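The effect of different execution orders on the two-iteration loop can be checked by replaying the six instructions step by step. The following is a minimal C++ sketch, not code from the lecture: the function name replay and the 0 to 5 instruction numbering are assumptions made for illustration.

```cpp
#include <array>

// Replays the six instructions of the two-iteration loop.
// step_of_instruction[k] is the step (1..6) at which instruction k runs,
// with this hypothetical numbering of the instructions:
//   0: r0 = sum   1: r0 += 1   2: sum = r0   (iteration 0)
//   3: r1 = sum   4: r1 += 1   5: sum = r1   (iteration 1)
int replay(const std::array<int, 6>& step_of_instruction) {
    int sum = 0, r0 = 0, r1 = 0;
    for (int step = 1; step <= 6; ++step) {
        for (int instr = 0; instr < 6; ++instr) {
            if (step_of_instruction[instr] != step) continue;
            switch (instr) {
                case 0: r0 = sum; break;
                case 1: r0 += 1;  break;
                case 2: sum = r0; break;
                case 3: r1 = sum; break;
                case 4: r1 += 1;  break;
                case 5: sum = r1; break;
            }
        }
    }
    return sum;  // value seen by "return sum"
}
```

Replaying the order {1, 2, 3, 4, 5, 6} returns 2, while {1, 2, 5, 3, 4, 6} returns 1, because iteration 1 reads sum before iteration 0 has written its result back.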
Suppose instruction A and instruction B both access a location x, and suppose that A∥B (A is parallel to B).

A      B      Race Type
Read   Read   No race
Read   Write  Read race
Write  Read   Read race
Write  Write  Write race

Two sections of code are independent if they have no determinacy races between them.
Iterations of a parallel_for loop should be independent.
Between two in_parallel tasks, the code of the spawned child should be independent of the code of the parent, including code executed by additional spawned or called children.
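One way to meet both rules is to have each task write only to its own result. The sketch below is an illustration under assumptions, not the lecture's code: it assumes the goal is to sum the elements of A, the name reduce is hypothetical, and the two recursive calls run one after the other here so the example needs no parallel runtime.

```cpp
#include <cstddef>
#include <vector>

// Divide-and-conquer sum over A[lo, hi).
// The two recursive calls read disjoint halves of A and each writes only its
// own local result, so there is no determinacy race between them. They are the
// calls that could be issued as two in_parallel tasks.
long reduce(const std::vector<long>& A, std::size_t lo, std::size_t hi) {
    if (hi == lo) return 0;          // empty range
    if (hi - lo == 1) return A[lo];  // single element
    std::size_t mid = lo + (hi - lo) / 2;
    long left  = reduce(A, lo, mid);  // task 1: reads A[lo, mid), writes `left`
    long right = reduce(A, mid, hi);  // task 2: reads A[mid, hi), writes `right`
    return left + right;              // combine only after both tasks finish
}
```

Unlike direct_reduce, no shared variable is written by two logically parallel instructions, so the result does not depend on the schedule.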
Scheduling is still unknown
Relative ordering for operations is still unknown
However, the computed value of each instruction is deterministic! This is easy to debug.
Question: given a DAG, show all the races
False sharing: nasty related effect
but can be inefficient
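For the question of listing all races in a DAG, one brute-force approach is to treat two instructions as logically parallel when neither can reach the other, then report every parallel pair that touches the same location with at least one write. The following is a hedged C++ sketch: the types Instr and Race and the function find_races are hypothetical names, and the cubic-time transitive closure is chosen for clarity rather than efficiency.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Instr {
    std::string location;  // memory location the instruction accesses
    bool is_write;         // true for a write, false for a read
};

struct Race {
    std::size_t a, b;      // indices of the two racing instructions
    std::string type;      // "read race" or "write race"
};

// edges[u] lists the direct successors of instruction u in the DAG.
std::vector<Race> find_races(const std::vector<Instr>& instrs,
                             const std::vector<std::vector<std::size_t>>& edges) {
    const std::size_t n = instrs.size();
    // reach[u][v] is true when there is a path from u to v.
    std::vector<std::vector<bool>> reach(n, std::vector<bool>(n, false));
    for (std::size_t u = 0; u < n; ++u)
        for (std::size_t v : edges[u]) reach[u][v] = true;
    // Transitive closure, Floyd-Warshall style.
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t u = 0; u < n; ++u)
            if (reach[u][k])
                for (std::size_t v = 0; v < n; ++v)
                    if (reach[k][v]) reach[u][v] = true;

    std::vector<Race> races;
    for (std::size_t a = 0; a < n; ++a) {
        for (std::size_t b = a + 1; b < n; ++b) {
            bool parallel  = !reach[a][b] && !reach[b][a];
            bool same_loc  = instrs[a].location == instrs[b].location;
            bool any_write = instrs[a].is_write || instrs[b].is_write;
            if (parallel && same_loc && any_write) {
                bool both_write = instrs[a].is_write && instrs[b].is_write;
                races.push_back(Race{a, b, both_write ? "write race" : "read race"});
            }
        }
    }
    return races;
}
```

Applied to the accesses to sum in the two-iteration direct_reduce loop, this reports two read races (each iteration's read against the other's write) and one write race (the two writes back to sum).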
struct { char a, b; } x;
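The struct above presumably illustrates false sharing: x.a and x.b are distinct locations, so updating them in parallel is not a determinacy race by the definition given earlier, yet the two bytes are adjacent and will almost always sit in the same cache line. The C++ sketch below compares that layout with a padded one. The 64-byte line size and the names Packed, Padded, kCacheLine and same_line are assumptions for illustration; the real line size depends on the hardware.

```cpp
#include <cstddef>

// Assumed cache-line size in bytes (common, but hardware-dependent).
constexpr std::size_t kCacheLine = 64;

// Layout from the slide: a and b are adjacent bytes.
struct Packed { char a, b; };

// Hypothetical padded layout: each field starts on its own cache line.
struct Padded {
    alignas(kCacheLine) char a;
    alignas(kCacheLine) char b;
};

// True when two byte offsets fall in the same cache line, assuming the
// enclosing object starts on a cache-line boundary.
constexpr bool same_line(std::size_t off1, std::size_t off2) {
    return off1 / kCacheLine == off2 / kCacheLine;
}
```

Under these assumptions, same_line(offsetof(Packed, a), offsetof(Packed, b)) is true and the same check on Padded is false. Padding trades memory for fewer cache-line transfers between cores.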