iThreads: A Threading Library for Parallel Incremental Computation
Paper Reading Group
Pramod Bhatotia, Pedro Fonseca, Umut A. Acar, Björn B. Brandenburg, Rodrigo Rodrigues
Presented by: Maksym Planeta
09.07.2015

Table of Contents
- Motivation
- Details
- Evaluation
- Conclusion

Goals
Make incremental computations easy to use:
- Convenient for user
- Multithreaded environment
- Legacy support
- Use existing OS facilities
- Language independent
- Generic program model
- No programmer intervention
- Low overhead

Workflow
1. Initial run
2. Build Concurrent Dynamic Dependence Graph (CDDG)
3. Specify input changes
4. Incremental run uses change propagation
5. Update CDDG

    $ LD_PRELOAD=iThreads.so                  // preload iThreads
    $ ./<program executable> <input-file>     // initial run
    $ emacs <input-file>                      // input modified
    $ echo "<off> <len>" >> changes.txt       // specify changes
    $ ./<program executable> <input-file>     // incremental run

Figure 1. How to run an executable using iThreads

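The `<off> <len>` entries in changes.txt describe changed byte ranges of the input, while the evaluation slides count modified pages. A minimal sketch of how a byte range could be mapped to the pages it overlaps is given here. The helper name `changed_pages`, the `page_range` struct, and the idea that the mapping works exactly this way are assumptions for illustration, not taken from iThreads.

```c
#include <stddef.h>

/* Hypothetical helper, not part of iThreads: half-open page range [first, end). */
struct page_range {
    size_t first; /* index of the first page touched by the change */
    size_t end;   /* one past the last page touched by the change */
};

/* Map a changed byte range (off, len) to the pages it overlaps.
 * page_size must be non-zero; a zero-length change touches no page. */
struct page_range changed_pages(size_t off, size_t len, size_t page_size)
{
    struct page_range r;
    r.first = off / page_size;
    r.end = (len == 0) ? r.first : (off + len - 1) / page_size + 1;
    return r;
}
```

For example, with 4096-byte pages a 10-byte change at offset 4090 straddles a page boundary, so the range covers pages 0 and 1.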
System model
- Memory model
- Release consistency
- Synchronization model
- pthreads API
- Deterministic behavior

Thunk
- Unit of sequential execution
- Surrounded by synchronization operations
- State
- Read and write sets
- Causally ordered (vector clocks)
- Thunk recomputed ⇒ all thunks in the thread recomputed

[Figure 4. State transition for thunks during incremental run. Labels recoverable from the figure: unresolved, pending, enabled, invalid, resolved-valid ("reused and applied memoized effects"), resolved-invalid ("re-executed and modified dirty set"); transitions numbered 1 to 5.]

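The causal ordering of thunks via vector clocks can be illustrated with the standard vector-clock comparison. This is a generic sketch, not the iThreads code; the function names and the fixed-length `unsigned` array representation are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if clock a happened before clock b: every component of a
 * is <= the matching component of b, and at least one is strictly smaller. */
bool vc_happens_before(const unsigned *a, const unsigned *b, size_t n)
{
    bool strictly_less = false;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i])
            return false;
        if (a[i] < b[i])
            strictly_less = true;
    }
    return strictly_less;
}

/* Two thunks are concurrent when neither clock precedes the other
 * (identical clocks are also reported as concurrent by this definition). */
bool vc_concurrent(const unsigned *a, const unsigned *b, size_t n)
{
    return !vc_happens_before(a, b, n) && !vc_happens_before(b, a, n);
}
```

Thunks whose clocks are ordered must be considered in that order during an incremental run; concurrent thunks have no such constraint.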
Example

    Thread 1 (T1)                      Thread 2 (T2)

    /* T1.a */
    lock();       read  = {y}
    z = ++y;      write = {y, z}
    unlock();
                                       /* T2.a */
                                       lock();        read  = {x}
                                       x++;           write = {x}
                                       unlock();

                                       /* T2.b */
                                       lock();        read  = {x, z}
                                       y = 2*x + z;   write = {y}
                                       unlock();

Figure 2. An example of shared-memory multithreading (T1.a is ordered before T2.a, and T2.a before T2.b)

    Case  Input      Thread schedule            Reused            Recomputed
    A     x, y*, z   T1.a → T2.a → T2.b         T2.a              T1.a, T2.b
    B     x, y, z    (T2.a → T2.b → T1.a)*      T2.a              T1.a, T2.b
    C     x, y, z    T1.a → T2.a → T2.b         T1.a, T1.b, T2.a  (none)

Figure 3. For the incremental run, some cases with changed input or thread schedule (changes are marked with *)

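Case A of Figure 3 can be reproduced with a small sketch of change propagation over read and write sets. It is a simplification under stated assumptions: variables are encoded as bits, thunks are visited in the recorded schedule order, and the sketch handles neither schedule changes (Case B) nor the rule that recomputing one thunk forces the thread's other thunks to be recomputed. The names `thunk` and `propagate` are hypothetical.

```c
#include <stddef.h>

/* Shared variables of Figure 2 encoded as bits (the encoding is an assumption). */
enum { VAR_X = 1u << 0, VAR_Y = 1u << 1, VAR_Z = 1u << 2 };

struct thunk {
    const char *name;
    unsigned read_set;
    unsigned write_set;
    int recomputed; /* set by propagate(): 1 = re-executed, 0 = reused */
};

/* Walk thunks in schedule order. A thunk whose read set overlaps the dirty
 * set is recomputed and its write set becomes dirty; otherwise it is reused.
 * Returns the number of recomputed thunks. */
size_t propagate(struct thunk *thunks, size_t n, unsigned dirty)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        thunks[i].recomputed = (thunks[i].read_set & dirty) != 0;
        if (thunks[i].recomputed) {
            dirty |= thunks[i].write_set;
            count++;
        }
    }
    return count;
}
```

With the schedule T1.a → T2.a → T2.b and an initial dirty set of {y}, T1.a is recomputed and dirties z, T2.a reads only x and is reused, and T2.b reads z and is recomputed, which matches row A.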
Architecture

[Figure 5. iThreads implementation architecture. Shaded boxes represent the main components of the system. Layers shown: Application; iThreads library (Recorder / Replayer, Memoizer, CDDG, Memory subsystem, OS support); OS.]

Implementation
- Dthreads
- Separate address spaces for threads
- Page read/write protection
- Byte-level delta

[Figure 6. Overview of the RC model implementation. Thread-1 and Thread-2 each execute thunks in a private address space; at each synchronization point their writes are committed to the shared address space (shared memory commit).]

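The byte-level delta step can be sketched as a comparison between a thread's private copy of a page and a snapshot of that page taken before the thunk ran. This is an illustrative sketch of the general twin-and-diff idea, assuming a snapshot (`twin`) is available; the function name and signature are hypothetical and not taken from Dthreads or iThreads.

```c
#include <stddef.h>

/* Copy to `shared` only the bytes that the thread changed in `working`
 * relative to `twin`, the snapshot taken before the thunk ran.
 * Returns the number of bytes committed. */
size_t commit_byte_delta(unsigned char *shared, const unsigned char *working,
                         const unsigned char *twin, size_t len)
{
    size_t committed = 0;
    for (size_t i = 0; i < len; i++) {
        if (working[i] != twin[i]) {
            shared[i] = working[i];
            committed++;
        }
    }
    return committed;
}
```

Committing only the changed bytes lets two threads that wrote to different bytes of the same page both have their updates preserved in the shared copy.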
Metrics
- Time: runtime of the slowest thread
- Work: sum of the total runtime of all threads

Benchmarks: PARSEC and Phoenix

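The two metrics can be written down directly from their definitions. The sketch below assumes per-thread runtimes are already available as an array of non-negative values; the function names are illustrative.

```c
#include <stddef.h>

/* Time: runtime of the slowest thread (0.0 for an empty array). */
double time_metric(const double *runtimes, size_t n)
{
    double slowest = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (runtimes[i] > slowest)
            slowest = runtimes[i];
    }
    return slowest;
}

/* Work: sum of the runtimes of all threads. */
double work_metric(const double *runtimes, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += runtimes[i];
    return total;
}
```

A run can therefore improve in work (less total computation) without improving in time if the slowest thread still has to be re-executed.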
[Figure 7. Performance gains of iThreads with respect to pthreads for the incremental run. Two panels, work speedup and time speedup, per benchmark for 12, 24, 48, and 64 threads. Benchmarks: Histogram, Linear_reg, Kmeans, Matrix_mul, Swaptions, Blackscholes, String_match, PCA, Canneal, Word_count, Reverse_index.]

Single modified page

[Figure 8. Performance gains of iThreads with respect to Dthreads for the incremental run. Two panels, work speedup and time speedup, per benchmark for 12, 24, 48, and 64 threads.]

Single modified page, different input sizes

[Figure 9. Scalability with data (work and time speedups). Panels for Histogram, Linear-reg., and String-match, each with small (S), medium (M), and large (L) inputs, plotted against normalized input size.]

Single modified page, different work amount

[Figure 10. Scalability with work. Normalized total work versus normalized computation size (1X, 2X, 4X, 8X, 16X) for pthreads and iThreads on Blackscholes and Swaptions.]

Several modified pages

[Figure 11. Scalability with input change compared to pthreads for 64 threads. Two panels, work speedup and time speedup, per benchmark for 2, 4, 8, 16, 32, and 64 dirty pages.]

Overhead of iThreads system data

    Application    Input size   Memoized state         CDDG
    Histogram      230400       347 (0.15%)            57 (0.02%)
    Linear-reg.    132436       192 (0.14%)            33 (0.02%)
    Kmeans         586          1145 (195.39%)         27 (4.61%)
    Matrix-mul.    41609        4162 (10.00%)          64 (0.15%)
    Swaptions      143          1473 (1030.07%)        1 (0.70%)
    Blackscholes   155          201 (129.68%)          1 (0.65%)
    String match   132436       128 (0.10%)            33 (0.02%)
    PCA            140625       3777 (2.69%)           43 (0.03%)
    Canneal        9            15381 (170900.00%)     4 (44.44%)
    Word count     12811        10191 (79.55%)         24 (0.19%)
    Rev-index      359          260679 (72612.53%)     64 (17.83%)

Table 1. Space overheads in pages and input percentage

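The percentages in Table 1 appear to be the size of the stored data, in pages, relative to the input size, in pages. The worked check below uses the Histogram and Kmeans rows; the symbol names are introduced here for illustration only.

```latex
\[
\text{overhead}_{\%} = \frac{P_{\text{stored}}}{P_{\text{input}}} \times 100
\]
\[
\text{Histogram, memoized state: } \frac{347}{230400} \times 100 \approx 0.15\%
\qquad
\text{Kmeans, memoized state: } \frac{1145}{586} \times 100 \approx 195.39\%
\]
```

Values above 100% mean that the memoized state occupies more pages than the input itself, which is what happens for applications with small inputs such as Canneal and Rev-index.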
Initial run overhead

[Figure 12. Performance overheads of iThreads with respect to pthreads for the initial run. Two panels, work overhead and time overhead, per benchmark for 12, 24, 48, and 64 threads.]