Parallelization of Utility Programs Based on Behavior Phase Analysis



SLIDE 1

Parallelization of Utility Programs Based on Behavior Phase Analysis

Xipeng Shen Chen Ding

Department of Computer Science University of Rochester

SLIDE 2

Motivation

- Multi-core processors are coming to personal computers
- Many programs, especially those run on past personal computers, are sequential
- Automatic parallelization is the path of least resistance

SLIDE 3

Utility Programs

- A class of dynamic programs that take a group of requests and serve them one by one
- Examples
  - Compilers, interpreters, compression tools, transcoding utilities, ...
  - GNU C compiler (GCC): the compilation of a function is a phase

SLIDE 4

Challenges

- Dynamic data (access)
  - Dynamically allocated data structures
  - One or more levels of indirection
- Complex control flow
  - Input-dependent execution paths
  - Many (recursive) function calls
- More difficult to analyze and parallelize than scientific programs

SLIDE 5

Opportunities

- Different phase instances operate on different data, and thus have few data dependences between them
- We recently found a way to detect the phase boundaries
- Can we automatically parallelize those programs at the phase level?

SLIDE 6

Overview

- Objective: a preliminary check of the feasibility of parallelizing utility programs at the phase level without special hardware support
- Technology
  - Phase detection
  - Dependence detection
  - Program transformation
- Evaluation
- Summary

SLIDE 7

Behavior Phase Detection

- Key idea: active profiling
  - Use regular input to trigger repetitive behavior
  - Filter the dynamic basic block trace based on frequency and recurring distance
  - Use real input to verify phase boundaries

*Refer to Shen et al., TR 848, CS, U of Rochester, 2004
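
The filtering step can be illustrated with a minimal sketch. It is not the algorithm from TR 848: the function name `is_marker_candidate`, the exact-count test, and the `min_gap` threshold are all assumptions made for illustration. The idea is that a regular input with a known number of identical requests should make a phase-marker block execute once per request, at widely spaced points in the trace.

```c
#include <stddef.h>

/* Hypothetical filter (not the TR 848 algorithm): returns 1 if basic block
   `id` looks like a phase-marker candidate in `trace`, i.e. it executes
   exactly `expected` times (one per request in the regular input) and any
   two consecutive executions are at least `min_gap` trace entries apart.
   Returns 0 otherwise. */
int is_marker_candidate(const int *trace, size_t n, int id,
                        size_t expected, size_t min_gap)
{
    size_t count = 0;   /* executions of `id` seen so far */
    size_t last = 0;    /* trace index of the previous execution */

    for (size_t i = 0; i < n; i++) {
        if (trace[i] != id)
            continue;
        if (count > 0 && i - last < min_gap)
            return 0;   /* recurs too quickly: likely an inner-loop block */
        last = i;
        count++;
    }
    return count == expected;
}
```

For a trace such as `7 1 2 2 7 1 2 2 7 1 2 2` produced by three identical requests, block 7 passes with `expected = 3` and `min_gap = 4`, while block 2 is rejected because it recurs at distance 1.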

SLIDE 8

Phase-based Parallelization

- Process-based parallelization
  - Separate address space
- Each process executes one or a group of phase instances
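
A minimal sketch of the process-based scheme, assuming a POSIX system with `fork` and `wait`. The names `phase_fn`, `demo_phase`, and `run_phases_in_processes` are hypothetical and do not come from the talk; the sketch runs one phase instance per process and ignores result collection beyond the exit status.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical phase body: serves one request (for example, compiles one
   function) and returns 0 on success. */
typedef int (*phase_fn)(int request_id);

/* Stand-in phase used for illustration only. */
int demo_phase(int request_id)
{
    volatile long sum = 0;
    for (int i = 0; i <= request_id; i++)
        sum += i;
    return 0;
}

/* Run each phase instance in its own child process, so every instance works
   in a separate address space. Returns the number of instances that exited
   successfully, or -1 if not all processes could be created. */
int run_phases_in_processes(phase_fn phase, int num_requests)
{
    int started = 0;
    int succeeded = 0;

    for (int i = 0; i < num_requests; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                       /* fork failed: stop creating children */
        if (pid == 0)                    /* child: run one phase, then exit */
            _exit(phase(i) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
        started++;
    }

    /* Parent: wait for every child that was started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == EXIT_SUCCESS)
            succeeded++;
    }
    return started == num_requests ? succeeded : -1;
}
```

Calling `run_phases_in_processes(demo_phase, 4)` starts four children and reports how many finished cleanly. Because each child has its own copy of memory, writes made by one phase instance are invisible to the others.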

SLIDE 9

Phase-Dependence Detection

- Trace memory accesses in profiling runs
- Detect different kinds of dependences
  - Anti- and output dependences can be ignored because each process has a separate address space
  - Classify flow dependences into removable and non-removable types
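
The classification of cross-phase dependences can be sketched as follows. This is an illustrative, quadratic-time sketch under assumed data structures (`access_t`, `dep_counts_t`) and an assumed function name; it is not the authors' tool, and a real profiler would use a per-address table instead of rescanning the trace.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a (hypothetical) memory-access trace, in execution order. */
typedef struct {
    int       phase;     /* index of the phase instance issuing the access */
    uintptr_t addr;      /* accessed memory address */
    int       is_write;  /* 1 for a store, 0 for a load */
} access_t;

typedef struct {
    size_t flow;    /* read after write */
    size_t anti;    /* write after read */
    size_t output;  /* write after write */
} dep_counts_t;

/* Count dependences between different phase instances. For each access,
   scan backwards over earlier accesses to the same address, stopping at the
   most recent write, since that write hides everything older. */
dep_counts_t count_phase_dependences(const access_t *trace, size_t n)
{
    dep_counts_t c = {0, 0, 0};

    for (size_t j = 0; j < n; j++) {
        for (size_t i = j; i-- > 0; ) {
            if (trace[i].addr != trace[j].addr)
                continue;
            int cross = trace[i].phase != trace[j].phase;
            if (trace[i].is_write) {
                if (cross) {
                    if (trace[j].is_write)
                        c.output++;      /* write after write */
                    else
                        c.flow++;        /* read after write */
                }
                break;
            }
            if (cross && trace[j].is_write)
                c.anti++;                /* write after an earlier read */
        }
    }
    return c;
}
```

With separate address spaces, only the `flow` count matters for correctness; the `anti` and `output` counts are reported here just to show the distinction.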

SLIDE 10

Flow Dependence

- Removable flow dependence
  - Memory reuse
  - Implicit initialization
  - Byte operations

SLIDE 11

Memory Reuses

Two objects are allocated to the same memory location in different parts of the execution.

SLIDE 12

Implicit Initialization

    NODE* xlevel(NODE* expr) {
        if (++xltrace < TDEPTH) { ... }
        --xltrace;
    }

*Code fragment from SPEC2K/LI
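
One way to read the fragment, stated here as an assumption and not as the authors' explanation: `xltrace` is incremented on entry and decremented on exit, so it holds the same value at the end of every phase that it held at the start. A later phase that reads it depends on the earlier phase in the trace, but a fresh copy holding the initial value would serve equally well. The sketch below uses a stand-in function `eval_nested`, and the values chosen for `TDEPTH` and the initial `xltrace` are assumed.

```c
#define TDEPTH 200          /* assumed depth limit for this sketch */

static int xltrace = -1;    /* assumed initial value of the shared counter */

/* Stand-in for a recursive evaluator call with a balanced increment and
   decrement of the shared counter. */
void eval_nested(int depth)
{
    if (++xltrace < TDEPTH) {
        if (depth > 0)
            eval_nested(depth - 1);
    }
    --xltrace;
}

/* One phase instance: serve one request, then report the counter value. */
int counter_after_phase(int depth)
{
    eval_nested(depth);
    return xltrace;
}
```

Whatever nesting depth a request reaches, `counter_after_phase` returns the initial value, which is why a process that starts the next phase with its own copy of the counter observes the same state.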

SLIDE 13

Byte Operation

    char *buf;
    ...
    buf[i] = 0;   // byte operation

*Code fragment from SPEC2K/Parser

    lda   s4, -28416(gp)   // load array base address
    addq  s4, s0, s4       // shift to the target array element
    ldq_u v0, 0(s4)        // load a quadword from the current element
    mskbl v0, s4, v0       // set the target byte to 0 by masking
    stq_u v0, 0(s4)        // store the new quadword to the array
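
The assembly shows why a byte store can appear as a flow dependence in an instruction-level trace: the quadword load reads the seven neighbouring bytes, which an earlier phase may have written, even though they are stored back unchanged. The following sketch emulates that read-modify-write in portable C. The function name is hypothetical, and the sketch assumes `buf` is 8-byte aligned with a length that is a multiple of 8.

```c
#include <string.h>

/* Emulate a byte store carried out with quadword instructions: load the
   enclosing 8-byte word, clear the target byte, store the word back.
   Assumes the enclosing quadword lies entirely inside `buf`. */
void store_zero_byte_via_quadword(unsigned char *buf, size_t index)
{
    size_t base = index & ~(size_t)7;   /* offset of the enclosing quadword */
    unsigned char quad[8];

    memcpy(quad, buf + base, 8);        /* like ldq_u: reads all 8 bytes */
    quad[index - base] = 0;             /* like mskbl: clear the target byte */
    memcpy(buf + base, quad, 8);        /* like stq_u: writes all 8 bytes */
}
```

Only the byte at `index` changes; the reads and writes of the other seven bytes carry no new information between phases.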

SLIDE 14

Program Transformation

- We parallelize programs by hand at phase boundaries, based on the information provided by the automatic tool
- A fully automatic tool would include automatic parallelization with run-time support to guarantee correctness and to roll back when necessary
  - Currently being studied

SLIDE 15

Evaluation (4-CPU Xeon 2 GHz)

[Figure: speedup (times) versus number of processes (1, 2, 4, 8) for Gzip and Parser]

SLIDE 16

Evaluation (16-CPU Sunfire Sparc V9 1.2 GHz)

[Figure: speedup (times) versus number of processes (1, 2, 4, 8, 16, 32) for Gzip and Parser]

SLIDE 17

Summary

- A preliminary exploration of coarse-grain parallelization of utility programs based on behavior phases
- A fully automatic system remains future work

SLIDE 18

The End

Thanks!