A Preliminary Study On the Vectorization of Multimedia Applications for Multimedia Extensions. Gang Ren (University of Illinois), Peng Wu (IBM T.J. Watson Research), David Padua (University of Illinois). Presented by Gang Ren.


SLIDE 1

A Preliminary Study On the Vectorization of Multimedia Applications for Multimedia Extensions

Gang Ren (University of Illinois)
Peng Wu (IBM T.J. Watson Research)
David Padua (University of Illinois)

Presented by Gang Ren

SLIDE 2

A Preliminary Study On the Vectorization of Multimedia Applications for Multimedia Extensions LCPC ‘03

Multimedia Extensions (MME)

Additions to accelerate multimedia applications

  • For general-purpose processors:
      • MAX (HP), VIS (Sun), AltiVec (Motorola/IBM/Apple), SSE (Intel)
  • For special-purpose processors:
      • PS2 (SONY), Graphics Processing Unit (NVIDIA)

Use SIMD architecture

Intel SSE2

[Figure: Intel SSE2 vector unit with 128-bit data paths and a register file (xmm0~xmm7). Each 128-bit register holds 16 chars, 8 shorts, 4 integers, 4 singles, or 2 doubles.]
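The element counts listed for a 128-bit register follow directly from the element size. A minimal sketch, assuming the usual sizes of 1, 2, 4, 4 and 8 bytes for char, short, int, float and double; `VECTOR_BITS` and `lanes_for_size` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Width of one SSE2 xmm register in bits, as stated on the slide. */
#define VECTOR_BITS 128

/* Hypothetical helper: number of elements of the given byte size
   that fit in one 128-bit register. */
static size_t lanes_for_size(size_t elem_bytes)
{
    assert(elem_bytes > 0);
    return VECTOR_BITS / (elem_bytes * CHAR_BIT);
}
```

Calling `lanes_for_size(sizeof(short))` yields the 8 lanes listed for shorts under these assumptions.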

SLIDE 3

Programming Multimedia Extensions

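Scalar C:

    int a[16], b[16], c[16];
    for (i = 0; i < 16; i++)
        c[i] = a[i] + b[i];

Vector data types and intrinsics (AltiVec style):

    vector int a[4], b[4], c[4];
    for (i = 0; i < 4; i++)
        c[i] = vec_add(a[i], b[i]);

SSE2 assembly (one 128-bit step, four ints):

    movdqa xmm0, XMMWORD PTR [eax]
    paddd  xmm0, XMMWORD PTR [edx]
    movdqa XMMWORD PTR [ecx], xmm0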
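The same array addition can also be written with SSE2 compiler intrinsics. This is only a sketch: the helper name `add_int_arrays` is hypothetical, the length is assumed to be a multiple of 4, and unaligned loads are used so that no alignment needs to be assumed. A scalar fallback is compiled when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: c[i] = a[i] + b[i] for n ints, n a multiple of 4. */
static void add_int_arrays(const int *a, const int *b, int *c, size_t n)
{
    assert(n % 4 == 0);
#ifdef __SSE2__
    for (size_t i = 0; i < n; i += 4) {
        /* Unaligned loads and stores: the arrays need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(c + i), _mm_add_epi32(va, vb));
    }
#else
    /* Scalar fallback for targets without SSE2. */
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
#endif
}
```

For the 16-element arrays of the slide, `add_int_arrays(a, b, c, 16)` performs four 128-bit additions on an SSE2 target.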

SLIDE 4

Motivation

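[Figure: Traditional vectorization maps scientific applications onto vector processors; MME vectorization maps multimedia applications onto multimedia extensions. The differences (Diff) between the two application classes and between the two architectures produce the GAP between the two kinds of vectorization.]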

SLIDE 5

Gaps From Architecture

[Figure: the motivation diagram from slide 4 again, highlighting the Diff between vector processors and multimedia extensions.]

SLIDE 6

MME vs. Vector Processor

Differences in memory unit

  • MME: No scatter/gather memory operations
  • MME: Only aligned memory accesses are supported

Differences in ISA

  • MME: Special instructions for media processing
      • Example: Saturated operations
  • MME: Non-uniform support for different element types
      • SSE2: Max/min operations on 16-bit short integers

    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
            s += a[i*8+j] * b[j*8+i];
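In the loop above, `b` is read with a stride of 8 elements, which a vector processor can handle with strided or gather loads but an MME cannot. One possible workaround, shown here only as a sketch and not taken from the slides, is to transpose `b` once so that both operands are read contiguously. The function names and the `int` element type are assumptions.

```c
#include <assert.h>

#define N 8

/* Loop from the slide: b is read in column order (stride N). */
static int dot_strided(const int *a, const int *b)
{
    assert(a && b);
    int s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i*N + j] * b[j*N + i];
    return s;
}

/* Sketch: transpose b once, then read both operands with unit stride. */
static int dot_transposed(const int *a, const int *b)
{
    assert(a && b);
    int bt[N * N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            bt[i*N + j] = b[j*N + i];

    int s = 0;
    for (int k = 0; k < N * N; k++)   /* contiguous accesses only */
        s += a[k] * bt[k];
    return s;
}
```

Both functions compute the same sum; only the second has an inner loop made of contiguous accesses that plain vector loads can cover.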

SLIDE 7

Gaps From Applications

[Figure: the motivation diagram from slide 4 again, highlighting the Diff between scientific applications and multimedia applications.]

SLIDE 8

Berkeley Multimedia Workload

Evolves from MediaBench

12 applications written in C/C++

  • Audio compression: ADPCM, GSM, LAME, mpg123
  • Image/video compression: DjVu, JPEG, MPEG2
  • Graphics: POVray, Mesa, Doom
  • Others: Rsynth, Timidity

Where do the example codes come from?

  • Important loops in core procedures (>10% of total execution time)
SLIDE 9

Where Are Gaps From?

Different programming styles

  • Pointer access
  • Manually unrolled loops

Mismatches between application and language

  • Integer promotion
  • Saturated operation

Different code patterns

  • Bit-wise operations
  • Lookup tables
SLIDE 10

C Language Issues: Integer Promotion

  • Integer promotion
      • Forced by ANSI C semantics (ISO/IEC 9899:1999)
      • All char and short operands are automatically promoted to int before any arithmetic operation
      • Fits traditional scalar architectures well
  • MME supports sub-word level parallelism
      • Integer promotion wastes computation bandwidth
  • How to eliminate unnecessary integer promotion?
      • Some analyses are needed to ensure the same result

    for (i = 0; i < 1024; i++)
        for (j = 0; j < 1024; j++)
            dst[i][j] = src1[i][j] + src2[i][j];
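Why an analysis is needed can be seen in a small sketch, assuming 8-bit unsigned pixels. A plain add whose result is stored back into 8 bits gives the same answer with or without promotion, but an average needs the carry that only the promoted add keeps. All function names here are hypothetical, and `add_lane8` is only a scalar model of an 8-bit vector lane.

```c
#include <assert.h>

/* C semantics: both operands are promoted to int before the add. */
static unsigned char add_wide(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);          /* int add, then truncate */
}

/* Model of an 8-bit lane: the intermediate sum is kept to 8 bits. */
static unsigned char add_lane8(unsigned char a, unsigned char b)
{
    return (unsigned char)((a + b) & 0xFF);
}

/* Average: the carry out of bit 7 is needed, so promotion matters. */
static unsigned char avg_wide(unsigned char a, unsigned char b)
{
    return (unsigned char)((a + b) >> 1);
}

static unsigned char avg_lane8(unsigned char a, unsigned char b)
{
    return (unsigned char)(add_lane8(a, b) >> 1);
}

/* Self-check: the plain add is safe to narrow, the average is not. */
static void check_promotion(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            assert(add_wide((unsigned char)a, (unsigned char)b) ==
                   add_lane8((unsigned char)a, (unsigned char)b));
    assert(avg_wide(200, 100) == 150);
    assert(avg_lane8(200, 100) == 22);
}
```

The loop on the slide corresponds to the first case, provided `dst` has the same 8-bit type as the sources.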

SLIDE 11

C Language Issues: Saturated Operations

    /* From BMW/GSM */
    ltmp = a + b;
    if (((unsigned)ltmp - MIN_WORD) > (MAX_WORD - MIN_WORD)) {
        if (ltmp > 0) ltmp = MAX_WORD;
        else ltmp = MIN_WORD;
    }

    for (i = 0; i < 1024; i++)
        for (j = 0; j < 1024; j++) {
            dst[i][j] = src1[i][j] + src2[i][j];
            if (dst[i][j] > 255) dst[i][j] = 255;
            if (dst[i][j] < 0)   dst[i][j] = 0;
        }
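Both snippets spell out a saturated add in plain C because the language has no such operator. A minimal scalar sketch of the two operations follows, assuming that `MIN_WORD` and `MAX_WORD` are the signed 16-bit limits and that the pixel loop works on unsigned 8-bit data. The function names are hypothetical. SSE2 provides per-lane saturating adds (for example `paddsw` and `paddusb`) that a vectorizer could target once it recognizes these patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating signed 16-bit add (the GSM pattern, assuming 16-bit limits). */
static int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* cannot overflow in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Saturating unsigned 8-bit add (the pixel loop's clamp to 255). */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}
```

Writing the clamp as a single small function keeps the pattern in one place, which may also make it easier for a compiler to recognize.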

SLIDE 12

Code Pattern: Lookup Tables

To implement saturated operations

To replace expensive math function calls

    /* From BMW/Lame */
    if (init == 0)
        for (i = 0; i < LUTABSIZE; i++)
            lutab[i] = pow(...);
    ...
    for (i = 0; i < l_end; i++) {
        temp = ...;
        if (temp < 1000.0) {
            ix[i] = lutab[(int)(temp*10)];
        }
    }
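The first use, a lookup table that implements saturation, can be sketched as follows. The table layout, sizes and names are assumptions for illustration and are not taken from the BMW sources. Because the table index depends on the data, each lookup is effectively a gather, which multimedia extensions do not provide.

```c
#include <assert.h>

/* Hypothetical layout: index (v + CLIP_OFFSET) covers v in [-256, 511]. */
#define CLIP_OFFSET 256
#define CLIP_SIZE   768

static unsigned char clip_table[CLIP_SIZE];

/* Fill the table once: values below 0 map to 0, values above 255 map to 255. */
static void init_clip_table(void)
{
    for (int v = -CLIP_OFFSET; v < CLIP_SIZE - CLIP_OFFSET; v++)
        clip_table[v + CLIP_OFFSET] =
            (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Saturated add of two pixels with one table lookup instead of branches. */
static unsigned char add_clipped(unsigned char a, unsigned char b)
{
    int sum = a + b;                        /* 0..510, inside the table */
    assert(sum + CLIP_OFFSET < CLIP_SIZE);
    return clip_table[sum + CLIP_OFFSET];
}
```

After a single call to `init_clip_table()`, `add_clipped(a, b)` returns the clamped sum without any comparison in the hot loop.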

SLIDE 13

Some Related Work

  • Compilation based on traditional vectorization
      • Cheong and Lam’s optimizer for VIS (Sun)
      • Krall and Lelait’s traditional vectorizer for VIS
      • Sreraman and Govindarajan’s vectorizer for MMX (Intel)
      • Bik et al.’s intra-register vectorization for the Intel architecture
  • Other compilation techniques
      • Krall and Lelait’s “Vectorization by loop unrolling”
      • Larsen and Amarasinghe’s “Superword level parallelism”
      • Fisher and Dietz’s “SIMD-within-a-register”
  • Product compilers
      • VAST/AltiVec, CodePlay/VectorC, Intel compiler, …
SLIDE 14

Conclusions

Gaps exist between traditional vectorization and compilation for multimedia extensions

  • From differences between the two architectures
  • From different programming styles, mismatches with language semantics, and different code patterns

Additional compiler techniques need to be developed or extended to bridge these gaps

SLIDE 15

Future Work

Our first step to unleash the power of MMEs

  • Manual vectorization to see how far we can go
  • Implement our vectorizer on SUIF
  • Propose new techniques to bridge the gaps

Extend application domain

  • Traditional applications: SPECfp, SPECint
  • Applications for embedded systems

SLIDE 16

Thank You