MP3 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

MP3 System Design and Verification

Presenter: 林士生
Advisor: 周哲民
VLSI/CAD Group, Institute of Electrical Engineering, National Cheng Kung University

slide-2
SLIDE 2

2

Outline

MPEG1 Audio Layer III Encoding
MP3 Encoder Design
MPEG1 Audio Layer III Decoding
MP3 Decoder Design
Synthesis Result and Verification

slide-3
SLIDE 3

3

Introduction to MPEG Audio Layer III Encoding

[Figure: encoder block diagram. For each channel (channel 0, channel 1), PCM samples from the A/D converter pass through pre-processing into the poly phase filter bank (PolyIn, subband samples si), then the MDCT (MDCTin, MDCTout, window chosen by win_type), reordering (MDCToutR), and aliasing reduction (MDCToutRA). The psychoacoustic model supplies win_type and the SMR. Stereo processing, the scfsi encoder, and the loop of quantization and Huffman encoding follow. The side information encoder and the bitstream formatter produce the frame: Header, CRC, Side information, Main data (gr0 ch0), Main data (gr0 ch1), …]

slide-4
SLIDE 4

4

Poly Phase Analysis Filter Bank

$$s_i = \sum_{k=0}^{63} \sum_{j=0}^{7} M_{ik} \times (C_{k+64j} \times X_{k+64j}), \quad i = 0 \text{ to } 31$$

$$M_{ik} = \cos\left[(2i+1) \times (k-16) \times \frac{\pi}{64}\right]$$

The window C_{k+64j} can be found in the standard; X_{k+64j} is the input sample.

Shift: shift 32 new samples into the 512-sample FIFO buffer Xm.
Multiply X by the C window: for m = 0 to 511, Zm = Cm * Xm.
Calculate the Y vector: for k = 0 to 63, for j = 0 to 7, Yk += Z(k+64j).
Multiply by the MDCT vector (SMDCT): for i = 0 to 31, for k = 0 to 63, si += Mik * Yk.

slide-5
SLIDE 5

5

MDCT

Before MDCT, the subband output must pass through a window function. The window function is determined by the psychoacoustic model. As shown in Fig.3-6, Layer III has 4 MDCT windows: normal window (block_type=0), start window (block_type=1), short window (block_type=2), and stop window (block_type=3).

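Normal window (block_type=0):

$$w_i = \sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right), \quad i = 0 \text{ to } 35$$

Start window (block_type=1):

$$w_i = \begin{cases} \sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right) & i = 0 \text{ to } 17 \\ 1 & i = 18 \text{ to } 23 \\ \sin\left(\frac{\pi}{12}\left(i - 18 + \frac{1}{2}\right)\right) & i = 24 \text{ to } 29 \\ 0 & i = 30 \text{ to } 35 \end{cases}$$

Short window (block_type=2):

$$w_i = \sin\left(\frac{\pi}{12}\left(i + \frac{1}{2}\right)\right), \quad i = 0 \text{ to } 11,\ k = 0 \text{ to } 2$$

Stop window (block_type=3):

$$w_i = \begin{cases} 0 & i = 0 \text{ to } 5 \\ \sin\left(\frac{\pi}{12}\left(i - 6 + \frac{1}{2}\right)\right) & i = 6 \text{ to } 11 \\ 1 & i = 12 \text{ to } 17 \\ \sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right) & i = 18 \text{ to } 35 \end{cases}$$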

slide-6
SLIDE 6

6

MDCT (cont.)

After windowing, the result for a long block is:

$$z_i = s_i w_i, \quad i = 0 \text{ to } 35$$

where s_i is the subband output.

For a short block, the result is:

$$z^{(k)}_i = y^{(k)}_i w_i, \quad i = 0 \text{ to } 11,\ k = 0 \text{ to } 2$$

where

$$y^{(0)}_i = s_{i+6}, \quad y^{(1)}_i = s_{i+12}, \quad y^{(2)}_i = s_{i+18}, \quad i = 0 \text{ to } 11$$

and s_i is the subband output.

The MDCT is calculated by

$$x_i = \sum_{k=0}^{n-1} z_k \cos\left(\frac{\pi}{2n}\left(2k + 1 + \frac{n}{2}\right)(2i+1)\right), \quad i = 0 \text{ to } \frac{n}{2} - 1$$

For a long block, n = 36; for a short block, n = 12.

slide-7
SLIDE 7

7

MDCT (cont.)

[Figure: plots of the window w_i, the subband output s_i, and the windowed result z_i = s_i w_i.]

slide-8
SLIDE 8

8

Reordering

Before reordering, the frequency lines of a long window are ordered first by subband and then by frequency. The frequency lines of a short window are ordered first by subband, then by window, and last by frequency. To increase the efficiency of the Huffman coding, the frequency lines of the short-window case are reordered first by subband, then by frequency, and last by window. Reordering is needed only for short windows (block type = 2).

Long window, each subband sb0, sb1, …, sb31 (no reordering):
    0 1 2 … 5 6 7 … 11 12 … 17
Short window before reordering, each subband sb0, sb1, …, sb31:
    0 1 … 5 | 6 7 … 11 | 12 … 17
Short window after reordering, each subband sb0, sb1, …, sb31:
    0 6 12 1 7 13 … 5 11 17
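The short-window reordering of one subband can be written as a simple index permutation. The sketch below follows the per-subband picture on this slide (18 lines per subband, three windows of six lines each); the function name is hypothetical.

```c
#include <assert.h>

/* Input order: by window, then by frequency (index = 6 * window + freq).
 * Output order: by frequency, then by window (index = 3 * freq + window). */
static void reorder_short_subband(const double in[18], double out[18])
{
    for (int win = 0; win < 3; win++)
        for (int f = 0; f < 6; f++)
            out[3 * f + win] = in[6 * win + f];
}
```

Feeding in the values 0, 1, …, 17 produces 0 6 12 1 7 13 … 5 11 17, the sequence shown for the reordered short window.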

slide-9
SLIDE 9

9

Alias Reduction for Long Block

After the MDCT, alias reduction is applied to long blocks to remove artifacts caused by the overlapping bands of the poly phase filter bank. The calculation has a butterfly form:

$$cs_i = \frac{1}{\sqrt{1 + c_i^2}}, \quad ca_i = \frac{c_i}{\sqrt{1 + c_i^2}}, \quad i = 0 \sim 7$$

where c_i are constants, which can be found in the standard.

$$xra_i = xr_i \times cs_i - xr_{i+k} \times ca_i$$

$$xra_{i+k} = xr_{i+k} \times cs_i + xr_i \times ca_i$$

[Figure: butterfly diagram over spectral index 0 to 17. The inputs xr_i and xr_{i+k} are combined through cs_i and ca_i to give xra_i and xra_{i+k}.]

slide-10
SLIDE 10

10

Scfsi Encoder

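[Flowchart: scfsi decision. Input: xr(i), block type. scfsi = 1 is selected only when every test below answers Y; any N gives scfsi = 0.]

1. All spectrum != 0?
2. No short block?
3. $|en\_tot_0 - en\_tot_1| < en\_tot_{krit}$
4. $\sum_{\text{all scalefactor bands}} |en(sb)_0 - en(sb)_1| < en\_dif_{krit}$
5. For each scfsi band: $\sum_{\text{scfsi band}} |en(sb)_0 - en(sb)_1| < en(scfsi\_band)_{krit}$
6. For each scfsi band: $\sum_{\text{scfsi band}} |xm(sb)_0 - xm(sb)_1| < xm(scfsi\_band)_{krit}$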

slide-11
SLIDE 11

11

Scfsi Encoder

Calculate the total energy of each granule:

$$en\_tot = \log_2\left(\sum_{i=1}^{n} xr(i)^2\right) = \frac{\log\left(\sum_{i=1}^{n} xr(i)^2\right)}{\log 2}$$

where xr(i) are the spectral values.

    for ( temp = 0, i = samp_per_frame2; i--; )
        temp += sq( xr[i] );
    if ( temp )
        en_tot[gr][ch] = log( temp ) / log2;
    else
        en_tot[gr][ch] = 0;

Calculate the energy of each scalefactor band:

$$en(sb) = \log_2\left(\sum_{i=lbl(sb)}^{lbl(sb)+bw(sb)-1} xr(i)^2\right)$$

where lbl(sfb) is the lower boundary of scalefactor band sfb and bw(sfb) is the width of scalefactor band sfb.

    for ( sfb = 21; sfb--; ) {
        start = scalefac_band_long[sfb];
        end   = scalefac_band_long[sfb+1];
        for ( temp = 0.0, i = start; i < end; i++ )
            temp += xr[i] * xr[i];
        if ( !temp )
            en[gr][ch][sfb] = 0.0;
        else
            en[gr][ch][sfb] = log( temp ) / log2;
        if ( l3_xmin->l[gr][ch][sfb] )
            xm[gr][ch][sfb] = log( l3_xmin->l[gr][ch][sfb] ) / log2;
        else
            xm[gr][ch][sfb] = 0.0;
    }

slide-12
SLIDE 12

12

Scfsi Encoder

Default values:

$$en\_tot_{krit} = 10, \quad en\_dif_{krit} = 100$$

$$en(scfsi\_band)_{krit} = 10 \text{ for all bands}, \quad xm(scfsi\_band)_{krit} = 10 \text{ for all bands}$$

Calculate the allowed distortion of each scalefactor band:

$$xmin(sfb) = ratio(sfb) \times \frac{\sum_{i=lbl(sb)}^{lbl(sb)+bw(sb)-1} xr(i)^2}{bw(sb)}$$

    for ( sfb = cod_info->sfb_lmax; sfb--; ) {
        start = scalefac_band_long[sfb];
        end   = scalefac_band_long[sfb+1];
        bw    = end - start;
        for ( en = 0, l = start; l < end; l++ )
            en += xr[gr][ch][l] * xr[gr][ch][l];
        l3_xmin->l[gr][ch][sfb] = ratio->l[gr][ch][sfb] * en / bw;
    }

$$xm(sfb) = \mathrm{int}\{\log_2(xmin(sfb))\}$$

slide-13
SLIDE 13

13

Scale factor length

Long block (block type = 0, 1, 3), sample frequency = 44.1 kHz:

    band    range      scfsi band   scalefactor length
    0       0~3        0            slen1
    1       4~7        0            slen1
    …       …
    5       20~23      0            slen1
    6       24~29      1            slen1
    …       …
    10      52~61      1            slen1
    11      62~73      2            slen2
    …       …
    15      134~161    2            slen2
    16      162~195    3            slen2
    …       …
    20      342~417    3            slen2
    21      418~575    -            all zeros

$$part2\_length = 11 \times slen1 + 10 \times slen2$$

Short block (block type = 2 and mixed_block_flag = 0). Each band covers three windows; where three ranges are listed, there is one per window:

    band    range                        scalefactor length
    0       0~3, 4~7, 8~11               slen1
    …       …
    5       66~89                        slen1
    6       90~119                       slen2
    …       …
    11      318~347, 348~377, 378~407    slen2
    12      408~575                      all zeros

$$part2\_length = (6 \times 3) \times slen1 + (6 \times 3) \times slen2$$

slide-14
SLIDE 14

14

Cont.

    static int slen1_tab[16] = { 0, 0, 0, 0, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4 };
    static int slen2_tab[16] = { 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 3 };
    slen1 = slen1_tab[ gi->scalefac_compress ];
    slen2 = slen2_tab[ gi->scalefac_compress ];

Mixed block (block type = 2 and mixed_block_flag = 1):

[Table: scalefactor band ranges for a mixed block. The long-block bands cover lines 0~3, 4~7, …, 30~35 and use scalefactor length slen1; the short-block bands follow, three windows per band, using slen1 and then slen2; the last band, ending at line 575, is filled with zeros.]

$$part2\_length = (8 + 3 \times 3) \times slen1 + (6 \times 3) \times slen2$$
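The three part2_length formulas can be combined with the slen tables into one small helper. This is an illustrative sketch: the `block_kind` enum and the function name are hypothetical, while the two tables are copied from the slide.

```c
#include <assert.h>

enum block_kind { BLOCK_LONG, BLOCK_SHORT, BLOCK_MIXED };

static const int slen1_tab[16] = { 0, 0, 0, 0, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4 };
static const int slen2_tab[16] = { 0, 1, 2, 3, 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 3 };

/* Number of bits used by the scalefactors of one granule and channel. */
static int part2_length(int scalefac_compress, enum block_kind kind)
{
    assert(scalefac_compress >= 0 && scalefac_compress < 16);
    int slen1 = slen1_tab[scalefac_compress];
    int slen2 = slen2_tab[scalefac_compress];

    switch (kind) {
    case BLOCK_LONG:  return 11 * slen1 + 10 * slen2;
    case BLOCK_SHORT: return (6 * 3) * slen1 + (6 * 3) * slen2;
    case BLOCK_MIXED: return (8 + 3 * 3) * slen1 + (6 * 3) * slen2;
    }
    return 0;
}
```

For example, scalefac_compress = 15 gives slen1 = 4 and slen2 = 3, so a long block spends 11*4 + 10*3 = 74 bits on scalefactors, a short block 126 bits, and a mixed block 122 bits.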

slide-15
SLIDE 15

15

Nonuniform quantization

Quantization

$$ix(i) = \mathrm{nint}\left(\left(\frac{|xr(i)|}{2^{quantizerStepSize/4}}\right)^{0.75} - 0.0946\right)$$

where xr(i) are the spectral values and quantizerStepSize is the quantization step.

The nonuniform quantizer first raises its input to the 0.75 power, which gives a more consistent SMR over the range of quantizer values. Larger input values are therefore automatically coded with less accuracy, and some noise shaping is already built into the quantization process.

slide-16
SLIDE 16

16

Huffman coding

After quantization, the nonzero coefficients tend to fall at the lower frequencies, and long runs of zeros tend to occupy the higher frequencies. The lower frequencies are therefore coded in pairs and, where the values to be coded are very small, in quadruples.

Spectrum partition (lines 0 to 575):
    big_value region: lines 0 to big_value*2, coded in pairs
    count1 region: lines big_value*2 to big_value*2 + count1*4, coded in quadruples
    rzero region: remaining lines up to 575, all zeros

The big_value region is divided into region0, region1, and region2, with boundaries at region0_count, region0_count + region1_count + 1, and big_value*2.

Huffman code format:
    count1 region: Huffman code (hlen bits), sig v (1 bit), sig w (1 bit), sig x (1 bit), sig y (1 bit)
    big_value region, ESC table: Huffman code (hlen[index] bits, index = (x * ylen) + y), ESC x (linbits), sig x (1 bit), ESC y (linbits), sig y (1 bit)
    big_value region, no ESC table: Huffman code (hlen[index] bits, index = (x * ylen) + y), sig x (1 bit), sig y (1 bit)

slide-17
SLIDE 17

17

Count1 Encoder

[Flowchart: count1 encoder.]

1. Start.
2. Find sig v and abs(v), sig w and abs(w), sig x and abs(x), sig y and abs(y).
3. value = v + (w<<1) + (x<<2) + (y<<3)
4. Find the Huffman codes code0 and code1 for value; total bits = sigv + sigw + sigx + sigy + code length.
5. code0len > code1len? N: Huffman code = code0, table select = 0. Y: Huffman code = code1, table select = 1.

slide-18
SLIDE 18

18

Count1 Encoder

    int sum0 = 0, sum1 = 0;
    for ( i = cod_info->big_values << 1, k = 0; k < cod_info->count1; i += 4, k++ ) {
        v = abs( ix[i] );
        w = abs( ix[i+1] );
        x = abs( ix[i+2] );
        y = abs( ix[i+3] );
        p = v + (w << 1) + (x << 2) + (y << 3);
        signbits = 0;
        if ( v != 0 ) signbits++;
        if ( w != 0 ) signbits++;
        if ( x != 0 ) signbits++;
        if ( y != 0 ) signbits++;
        sum0 += signbits;
        sum0 += ht[32].hlen[p];
        sum1 += signbits;
        sum1 += ht[33].hlen[p];
    }
    if ( sum0 < sum1 ) {
        cod_info->count1table_select = 0;
        return sum0;
    } else {
        cod_info->count1table_select = 1;
        return sum1;
    }

slide-19
SLIDE 19

19

Big Value Encoder

[Flowchart: big value encoder.]

1. Start. Find max(ix) for the region.
2. Find the Huffman table that can encode max(ix) and calculate the total bits required, sum0.
3. Calculate the total bits required for the other table, sum1. If sum0 > sum1, use the new table.
4. Repeat until all regions are processed.
5. table > 15 (use ESC table)?
   Y: find sig x, sig y and abs(x), abs(y). If x > 14, ESCx = x-15 and x = 15. If y > 14, ESCy = y-15 and y = 15. Find the Huffman code, then encode ESCx, sigx, ESCy, sigy.
   N: find sig x, sig y and abs(x), abs(y). Find the Huffman code, then encode sigx, sigy.
6. End.

slide-20
SLIDE 20

20

Big Value Huffman Table Select

    max = ix_max( ix, begin, end );
    if ( !max ) return 0;
    choice[0] = 0;
    choice[1] = 0;
    if ( max < 15 ) {
        for ( i = 14; i--; )    /* try tables with no linbits */
            if ( ht[i].xlen > max ) {
                choice[0] = i;
                break;
            }
        sum[0] = count_bit( ix, begin, end, choice[0] );
        switch ( choice[0] ) {
        case 2:
            sum[1] = count_bit( ix, begin, end, 3 );
            if ( sum[1] <= sum[0] ) choice[0] = 3;
            break;
        case 5:
            sum[1] = count_bit( ix, begin, end, 6 );
            if ( sum[1] <= sum[0] ) choice[0] = 6;
            break;
        case 7:
            sum[1] = count_bit( ix, begin, end, 8 );
            if ( sum[1] <= sum[0] ) {
                choice[0] = 8;
                sum[0] = sum[1];
            }
            sum[1] = count_bit( ix, begin, end, 9 );
            if ( sum[1] <= sum[0] ) choice[0] = 9;
            break;
        …………
        }

slide-21
SLIDE 21

21

Big Value Huffman Encoder

Code format with an ESC table: Huffman code, ESC x, sig x, ESC y, sig y
Code format without an ESC table: Huffman code, sig x, sig y

    signx = abs_and_sign( &x );
    signy = abs_and_sign( &y );
    if ( table_select > 15 ) {
        /* ESC table: values above 14 are sent as 15 plus linbits */
        if ( x > 14 ) { linbitsx = x - 15; x = 15; }
        if ( y > 14 ) { linbitsy = y - 15; y = 15; }
        idx = (x * ylen) + y;
        *code  = h->table[idx];
        *cbits = h->hlen[idx];
        if ( x > 14 ) { *ext |= linbitsx; *xbits += linbits; }
        if ( x != 0 ) { *ext <<= 1; *ext |= signx; *xbits += 1; }
        if ( y > 14 ) { *ext <<= linbits; *ext |= linbitsy; *xbits += linbits; }
        if ( y != 0 ) { *ext <<= 1; *ext |= signy; *xbits += 1; }
    } else {
        /* no ESC table: sign bits are appended to the code itself */
        idx = (x * ylen) + y;
        *code   = h->table[idx];
        *cbits += h->hlen[idx];
        if ( x != 0 ) { *code <<= 1; *code |= signx; *cbits += 1; }
        if ( y != 0 ) { *code <<= 1; *code |= signy; *cbits += 1; }
    }

slide-22
SLIDE 22

22

Outline

MPEG1 Audio Layer III Encoding
MP3 Encoder Design
MPEG1 Audio Layer III Decoding
MP3 Decoder Design
Synthesis Result and Verification

slide-23
SLIDE 23

23

Experiment of Encoder Complexity

Hardware: PIII-450, 256MB RAM
Software: Visual C++

slide-24
SLIDE 24

24

Algorithm Simplification of Poly Phase Analysis Filter Bank

The MP3 encoding flow needs two MDCTs: the first is in the poly phase analysis filter bank (SMDCT) and the other is the MDCT itself. The SMDCT is calculated as follows:

$$s_i = \sum_{k=0}^{63} \sum_{j=0}^{7} M_{ik} \times (C_{k+64j} \times X_{k+64j}), \quad i = 0 \text{ to } 31$$

where

$$M_{ik} = \cos\left[(2i+1) \times (k-16) \times \frac{\pi}{64}\right]$$

Symmetry on k direction:

$$M[i][k] = M[i][32-k] \quad \text{for } 0 \le k \le 15$$
$$M[i][k] = -M[i][96-k] \quad \text{for } 33 \le k \le 47$$
$$M[i][k] = 1 \quad \text{for } k = 16$$
$$M[i][k] = 0 \quad \text{for } k = 48$$

Symmetry on i direction:

$$M[i][k] = M[31-i][k] = M[15-i][k] = M[16+i][k] \quad \text{for } k \% 4 = 0$$
$$M[i][k] = M[31-i][k] = -M[15-i][k] = -M[16+i][k] \quad \text{for } k \% 4 = 2$$
$$M[i][k] = -M[31-i][k], \quad M[15-i][k] = -M[16+i][k] \quad \text{else}$$

slide-25
SLIDE 25

25

Algorithm Simplification of Poly Phase Analysis Filter Bank (cont.)

Direct form:

    for ( i = 0; i < 32; i++ )
        for ( k = 0, s[i] = 0; k < 64; k++ )
            s[i] += M[i][k] * y[k];
    // where y[k] is pre-calculated by
    // multiplying input x and C window

We need 64*32=2048 memory locations to store these coefficients and 2048 multiplications to generate 32 outputs. Thus a frame needs (2*1152/32)*2048=147456 multiplications to complete the SMDCT.

Simplified form:

    for ( i = 0; i < 8; i++ ) {
        sum1 = M1[i][ 0]*(y[ 0]+y[32]) + M1[i][ 1]*(y[ 4]+y[28]) + M1[i][ 2]*(y[ 8]+y[24]) + …… M1[i][ 6]*(y[44]-y[52]) + y[16];
        sum2 = M2[i][ 0]*(y[ 2]+y[30]) + …… M2[i][ 7]*(y[46]-y[50]);
        sum3 = M3[i][ 0]*(y[ 1]+y[31]) + …… M3[i][15]*(y[47]-y[49]);
        sum4 = M3[15-i][ 0]*(y[ 1]+y[31]) + …… M3[15-i][15]*(y[47]-y[49]);
        s[   i] = sum1 + sum2 + sum3;
        s[31-i] = sum1 + sum2 - sum3;
        s[15-i] = sum1 - sum2 + sum4;
        s[16+i] = sum1 - sum2 - sum4;
    }

After simplifying, we need only 7*8+8*8+16*16=376 memory locations to store the coefficients, and 376 multiplications to generate 32 outputs. Thus a frame needs (1152*2/32)*376=27072 multiplications.

slide-26
SLIDE 26

26

Algorithm Simplification of MDCT

The same symmetry of the cosine coefficients can be applied to the MDCT. The MDCT is calculated by the following equation:

$$x_i = \sum_{k=0}^{n-1} z_k \cos\left(\frac{\pi}{2n}\left(2k + 1 + \frac{n}{2}\right)(2i+1)\right), \quad i = 0 \text{ to } \frac{n}{2} - 1$$

Define the cosine coefficient:

$$\cos[k][i] = \cos\left(\frac{\pi}{2n}\left(2k + 1 + \frac{n}{2}\right)(2i+1)\right)$$

The following relations can be derived:

$$\cos\left[\tfrac{n}{2} - 1 - k\right][i] = -\cos[k][i] \quad \text{for } 0 \le k \le \tfrac{n}{4}$$

$$\cos[n - 1 - k][i] = \cos\left[\tfrac{n}{2} + k\right][i] \quad \text{for } 0 \le k < \tfrac{n}{4}$$

For a long block, the window function is:

$$w_i = \sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right), \quad i = 0 \text{ to } 35$$

Its symmetry property is:

$$w_i = w_{35-i} \quad \text{for } i = 0 \text{ to } 17$$

slide-27
SLIDE 27

27

Algorithm Simplification of MDCT (cont.)

Direct form:

    for ( i = 0; i < 18; i++ )
        for ( k = 0, x[i] = 0; k < 36; k++ )
            x[i] += win[k] * s[k] * cos[i][k];

We need 18*36*2=1296 multiplications to generate 18 outputs. In other words, a frame needs (1152*2/18)*1296=165888 multiplications for 2 channels.

Simplified form:

    for ( i = 0; i < 18; i++ ) {
        out[i] = w[ 0] * (s[ 0]*cos[i][ 0] + s[35]*cos[i][ 9]) +
                 w[ 1] * (s[ 1]*cos[i][ 1] + s[34]*cos[i][10]) +
                 w[ 2] * (s[ 2]*cos[i][ 2] + s[33]*cos[i][11]) +
                 ……
                 w[ 8] * (s[ 8]*cos[i][ 8] + s[27]*cos[i][17]) -
                 w[ 9] * (s[ 9]*cos[i][ 8] - s[26]*cos[i][17]) -
                 ……
                 w[17] * (s[17]*cos[i][ 0] - s[18]*cos[i][ 9]);
    }

We need only 3*18*18=972 multiplications to generate 18 outputs and 18*18=324 memory locations to store the cosine coefficients. Thus a frame needs (2*1152/18)*972=124416 multiplications to complete the MDCT.
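The per-frame multiplication counts quoted for the SMDCT and the MDCT all follow one pattern: the number of transform calls per frame (2 channels of 1152 samples, divided by the outputs per call) times the multiplications per call. A small helper, with a hypothetical name, makes that arithmetic easy to re-check.

```c
#include <assert.h>

enum { SAMPLES_PER_FRAME = 1152, CHANNELS = 2 };

/* Multiplications per frame for a transform that produces `outputs`
 * values per call at a cost of `mults` multiplications per call. */
static long mults_per_frame(int outputs, long mults)
{
    return (long)(CHANNELS * SAMPLES_PER_FRAME / outputs) * mults;
}
```

For the SMDCT this gives 72 calls per frame, so 72 * 2048 = 147456 multiplications in direct form and 72 * 376 = 27072 after simplification. For the MDCT it gives 128 calls per frame, so 128 * 1296 = 165888 in direct form and 128 * 972 = 124416 after simplification.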

slide-28
SLIDE 28

28

Implementation of Poly Phase Analysis Filter Bank

We can divide the entire poly phase analysis filter bank into two main parts: the shifting and windowing part, and the SMDCT part.

Shifting and windowing part:

Input: 32 PCM samples
Output: Y vector (64 elements)
The 32 input samples are shifted into an X buffer, whose contents are then multiplied by the 512 enwindow coefficients. Computing one Y element requires 10 clocks: 2 clocks of latency and 8 clocks of computation.

SMDCT part:

Input: Y vector (64 elements)
Output: 32 subband outputs
Two Y elements are added and multiplied by an SMDCT coefficient (filter). The result is accumulated into a register sum. The number of iterations depends on the symmetry properties of the SMDCT coefficients and can be 7, 8, or 15. Generating a sum takes (3+t) clocks, where t is the number of iterations and the latency is 3 clocks.

slide-29
SLIDE 29

29

Scheduling of Poly Phase Analysis Filter Bank

[Figure: schedule of the shifting and windowing part. 32 samples are shifted from the ex buffer into the X buffer. In States 1 and 2, X buffer entries are multiplied by the c window (reg1) and accumulated into temp1.]

$$Y_k = \sum_{j=0}^{7} C_{k+64j} \times X_{k+64j}, \quad k = 0 \text{ to } 63$$

slide-30
SLIDE 30

30

Scheduling of Poly Phase Analysis Filter Bank

[Figure: schedule of the SMDCT part. Two Y elements are added (reg2), the result is multiplied by a filter coefficient (reg3), and the product is accumulated into sum.]

$$s_i = \sum_{k=0}^{63} M_{ik} \times Y_k, \quad i = 0 \text{ to } 31$$

slide-31
SLIDE 31

31

Architecture of Poly Phase Analysis Filter bank

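[Figure: datapath of the poly phase analysis filter bank. Exbuffer and Xbuffer feed a multiplier with the enwindow coefficients and an adder (reg1, temp1) to form Y. Pairs of Y elements pass through an adder/subtractor (reg2) and a multiplier with the filter coefficients (reg3) into the accumulators sum1, sum2, sum3, and sum4, which are combined through multiplexers into OUTbuffer.]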

slide-32
SLIDE 32

32

Implementation of MDCT

The MDCT includes the multiplying and accumulating (MA) iteration and the aliasing reduction part.

MA iteration:

Input: 18 subband outputs from the current processing and 18 subband outputs from the previous processing.
Output: 18 MDCT output coefficients.
First iteration: the 36 inputs are multiplied by the cosine coefficients cos_l and accumulated into a register temp1.
Second iteration: after two first iterations, the temp1 register is multiplied by the window function win.
The iteration needs two multipliers and two adders, but their utilization is not 100%. In State 2, only two adders, one multiplier, and stage 1 of the other multiplier are used. In State 3, only one adder, one multiplier, and stage 2 of the other multiplier are used. We can therefore use the idle adder and multiplier to process the aliasing reduction.

slide-33
SLIDE 33

33

Scheduling of MDCT

[Figure: schedule of the MDCT MA iteration over States 1 to 3. MDCTin (si combined with oldsi into 36 inputs) is multiplied by cos and accumulated into temp1 (reg1); temp1 is multiplied by win (reg2) and accumulated into temp2.]

$$x_i = \sum_{k=0}^{n-1} z_k \cos\left(\frac{\pi}{2n}\left(2k + 1 + \frac{n}{2}\right)(2i+1)\right), \quad i = 0 \text{ to } \frac{n}{2} - 1$$

$$z_i = s_i w_i$$

slide-34
SLIDE 34

34

Scheduling of Aliasing Reduction

[Figure: schedule of the aliasing reduction, showing each of its states next to the corresponding state of the MA iteration. xr_i and xr_{i+k} are multiplied by Cs and Ca (reg3, reg4) and combined by the adder (reg5) into xra_i and xra_{i+k}.]

$$xra_i = xr_i \times cs_i - xr_{i+k} \times ca_i$$

$$xra_{i+k} = xr_{i+k} \times cs_i + xr_i \times ca_i$$

Aliasing reduction part: after scheduling, the aliasing reduction requires only one adder and the first stage of one multiplier in State 0, which corresponds to State 3 of the MA iteration. State 1, State 2, and State 3 of the aliasing reduction each require only one stage of a multiplier, and correspond to State 2, State 3, and State 2 of the MA iteration, respectively.

slide-35
SLIDE 35

35

Architecture of MDCT + Aliasing reduction

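[Figure: datapath of the MDCT with aliasing reduction. MDCTin (newout, oldout) and cos_l feed the first multiplier (reg1) and an adder (temp1). MDCTout, the Cs/Ca constants, and win feed the second multiplier (reg2) through multiplexers. reg3, reg4, temp2, and reg5 with a second adder produce MDCTenc.]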

slide-36
SLIDE 36

36

Simplification of Quantization

    inner_loop( ) {
        do {
            ……
            quantizerStepSize++;
            Quantization( quantizerStepSize, xr, ix );
            ……
        } while ( bit_used > bit_available );
    }

    Quantization( quantizerStepSize, xr, ix ) {
        if ( !quantizerStepSize )
            step = 1.0;
        else
            step = pow( 2.0, quantizerStepSize / 4.0 );
        step = 1 / step;
        for ( i = 0; i < 576; i++ ) {
            dbl = fabs( xr[i] ) * step;
            /* each threshold is the input at which nint(dbl^0.75 - 0.0946) steps up */
            if      ( dbl < 0.499996 ) ix[i] = 0;
            else if ( dbl < 1.862955 ) ix[i] = 1;
            else if ( dbl < 3.565282 ) ix[i] = 2;
            else if ( dbl < 5.506396 ) ix[i] = 3;
            else if ( dbl < 7.638304 ) ix[i] = 4;
            else if ( dbl < 9.931741 ) ix[i] = 5;
            else {
                dbl = sqrt( sqrt(dbl) * dbl );    /* dbl^0.75 */
                if ( dbl < 0.0946 )
                    ix[i] = (int)( dbl - 0.5946 );
                else
                    ix[i] = (int)( dbl + 0.4054 );    /* nint(dbl - 0.0946) */
            }
        }
    }

$$ix(i) = \mathrm{nint}\left(\left(\frac{|xr(i)|}{2^{quantizerStepSize/4}}\right)^{0.75} - 0.0946\right)$$

slide-37
SLIDE 37

37

Outline

MPEG1 Audio Layer III Encoding
MP3 Encoder Design
MPEG1 Audio Layer III Decoding
MP3 Decoder Design
Synthesis Result and Verification

slide-38
SLIDE 38

38

Introduction to MPEG Audio Layer III Decoding

[Figure: decoder block diagram. The bitstream passes through synchronization and CRC check into the Header buffer, CRC buffer, and side information buffer (header, side_info, main_data). The header decoder and the side information decoder supply channel and mode information to the scale factor decoder (scale factor buffer) and the Huffman code decoder (decoded value is_i). The inverse quantizer produces the reconstructed value xr_i, followed by reorder, stereo processing, aliasing reduction, the IMDCT (HyperIn, HyperOut, selected by win_type and mix_block_flag), and the poly phase synthesis bank. The resulting PCM samples go through post-processing to the D/A converter as the audio signal.]

slide-39
SLIDE 39

39

Synchronization & CRC Check

[Flowchart: synchronization and CRC check.]

1. Start. Search for the synchronization word (FFF).
2. Get the header into the Header buffer.
3. CRC check bit set? Y: get the 16-bit CRC into the CRC buffer.
4. Number of channels? 1: get 136 bits of side information into the side information buffer. 2: get 256 bits of side information into the side information buffer.
5. Get the main data.
6. End.

slide-40
SLIDE 40

40

Scalefactor Decoder

[Flowchart: scalefactor decoder.]

1. Start. Find slen1 and slen2 (using scalefac_compress).
2. (window switch flag = 1) && (block type = 2)?
   N (long block): (gr = 0) or (scfsi = 0)? Y: get new scalefactors according to the scfsi band, the long block format, and slen1, slen2. N: use the same scalefactors as gr0.
   Y: mix_block_flag = 1? Y (mixed block): get scalefactors according to the mixed block format and slen1, slen2. N (short block): get scalefactors according to the short block format and slen1, slen2.
3. End.

slide-41
SLIDE 41

41

Scalefactor Decoder Algorithm

    if ( window_switching_flag && (block_type == 2) ) {
        if ( mixed_block_flag )
            Function C    /* mixed block */
        else
            Function B    /* short block */
    } else
        Function A        /* long block */

Function A (Long block)

    if ( (scfsi[0] == 0) || (gr == 0) )
        for ( sfb = 0; sfb < 6; sfb++ )
            (*scalefac).l[sfb] = slen[0][scalefac_compress];
    if ( (scfsi[1] == 0) || (gr == 0) )
        for ( sfb = 6; sfb < 11; sfb++ )
            (*scalefac).l[sfb] = slen[0][scalefac_compress];
    if ( (scfsi[2] == 0) || (gr == 0) )
        for ( sfb = 11; sfb < 16; sfb++ )
            (*scalefac).l[sfb] = slen[1][scalefac_compress];
    if ( (scfsi[3] == 0) || (gr == 0) )
        for ( sfb = 16; sfb < 21; sfb++ )
            (*scalefac).l[sfb] = slen[1][scalefac_compress];

Here slen[0][scalefac_compress] and slen[1][scalefac_compress] are the field widths slen1 and slen2 in bits; each scalefactor is the value read from that many bits of the main data.

slide-42
SLIDE 42

42

Scalefactor Decoder Algorithm(cont.)

Function B (Short block)

    for ( sfb = 0; sfb < 6; sfb++ )
        for ( window = 0; window < 3; window++ )
            (*scalefac).s[window][sfb] = slen[0][scalefac_compress];
    for ( sfb = 6; sfb < 12; sfb++ )
        for ( window = 0; window < 3; window++ )
            (*scalefac).s[window][sfb] = slen[1][scalefac_compress];
    for ( sfb = 12, window = 0; window < 3; window++ )
        (*scalefac).s[window][sfb] = 0;

Function C (Mixed block)

    for ( sfb = 0; sfb < 8; sfb++ )
        (*scalefac).l[sfb] = slen[0][scalefac_compress];
    for ( sfb = 3; sfb < 6; sfb++ )
        for ( window = 0; window < 3; window++ )
            (*scalefac).s[window][sfb] = slen[0][scalefac_compress];
    for ( sfb = 6; sfb < 12; sfb++ )
        for ( window = 0; window < 3; window++ )
            (*scalefac).s[window][sfb] = slen[1][scalefac_compress];
    for ( sfb = 12, window = 0; window < 3; window++ )
        (*scalefac).s[window][sfb] = 0;

slide-43
SLIDE 43

43

Huffman Decoder

[Flowchart: Huffman decoder.]

1. Start. (window switch flag = 1) && (block type = 2)? Y: region1start = 36, region2start = 576. N: use region0_count+1 and region1_count+1 to find region1start and region2start.
2. big value region: get the Huffman table from table_select[0~2] and decode x, y. Repeat until the big value region ends.
3. count1 region: get the Huffman table from count1table_select, calculate the count1 length, and decode v, w, x, y. Repeat until the count1 region ends.
4. rzero region: assign 0 until line = 575.
5. End.

slide-44
SLIDE 44

44

Invert Quantization (Q-1)

Invert Quantization

$$xr[i] = A \times 2^{B} \times 2^{-(2 \times subblock\_gain[i])} \times 2^{-(scalefac\_multiplier \times (sf[i] + preflag \times pretab[i]))}$$

$$A = \mathrm{sign}(is[i]) \times |is[i]|^{4/3}$$

$$B = \frac{1}{4} \times (global\_gain - 210)$$

scalefac_multiplier = 0.5 when scalefac_scale = 0
scalefac_multiplier = 1 when scalefac_scale = 1

Long window (block type = 0, 1, 3):

$$xr_i = \mathrm{sign}(is_i) \times |is_i|^{4/3} \times 2^{\frac{1}{4}(global\_gain[gr][ch] - 210)} \times 2^{-(scalefac\_multiplier \times (scalefac\_l[gr][ch][sfb] + preflag[gr][ch] \times pretab[sfb]))}$$

Short window (block type = 2):

$$xr_i = \mathrm{sign}(is_i) \times |is_i|^{4/3} \times 2^{\frac{1}{4}(global\_gain[gr][ch] - 210 - 8 \times subblock\_gain[gr][ch][sfb][window])} \times 2^{-(scalefac\_multiplier \times scalefac\_s[gr][ch][sfb][window])}$$

slide-45
SLIDE 45

45

Invert Quantization(cont.)

$$A = \mathrm{sign}(is[i]) \times |is[i]|^{4/3}$$

$$B = \frac{1}{4} \times (global\_gain - 210)$$

scalefac_multiplier = 0.5 when scalefac_scale = 0
scalefac_multiplier = 1 when scalefac_scale = 1

Long window (block type = 0, 1, 3)
From side information: global_gain, scalefac_scale (gives scalefac_multiplier), preflag (selects pretab)
From main data: sf (scale factor)
Value: subblock_gain = 0

Short window (block type = 2)
From side information: global_gain, subblock_gain, scalefac_scale (gives scalefac_multiplier)
From main data: sf (scale factor)
Value: preflag = 0, pretab = 0

$$xr[i] = A \times 2^{B} \times 2^{-(2 \times subblock\_gain[i])} \times 2^{-(scalefac\_multiplier \times (sf[i] + preflag \times pretab[i]))}$$

slide-46
SLIDE 46

46

Reordering

[Figure: decoder flow per subband. The reconstructed values pass through alias reduction, the IMDCT, and the window for each subband, then into the synthesis filter bank, which outputs PCM samples. For a long window, each subband sb0, sb1, …, sb31 keeps the line order 0 1 2 … 5 6 7 … 11 12 … 17. For a short window, the lines of each subband are restored to the order 0 1 2 … 5 6 7 … 11 12 … 17 before the IMDCT.]

slide-47
SLIDE 47

47

IMDCT

The IMDCT is calculated by the following equation:

$$x_i = \sum_{k=0}^{\frac{n}{2}-1} X_k \cos\left(\frac{\pi}{2n}\left(2i + 1 + \frac{n}{2}\right)(2k+1)\right), \quad i = 0 \text{ to } n-1$$

The window function is the same as for the MDCT. After windowing, the result is:

For a long block:

$$z_i = x_i w_i$$

For a short block:

$$z_i = \begin{cases} 0 & i = 0 \text{ to } 5 \\ y^{(1)}_{i-6} & i = 6 \text{ to } 11 \\ y^{(1)}_{i-6} + y^{(2)}_{i-12} & i = 12 \text{ to } 17 \\ y^{(2)}_{i-12} + y^{(3)}_{i-18} & i = 18 \text{ to } 23 \\ y^{(3)}_{i-18} & i = 24 \text{ to } 29 \\ 0 & i = 30 \text{ to } 35 \end{cases}$$

where

$$y^{(j)}_i = x^{(j)}_i w_i, \quad i = 0 \text{ to } 11,\ j = 1 \text{ to } 3$$

and x^{(j)}_i is the IMDCT result for the jth window.

slide-48
SLIDE 48

48

Poly Phase Synthesis filter bank

$$N_{ik} = \cos\left[(16 + i) \times (2k+1) \times \frac{\pi}{64}\right]$$

$$V_i = \sum_{k=0}^{31} N_{ik} \times S_k, \quad i = 0 \text{ to } 63$$

$$s_k = \sum_{i=0}^{7} \left(V_{k+128i} \times D_{k+64i} + V_{k+96+128i} \times D_{k+32+64i}\right), \quad k = 0 \text{ to } 31$$

The D window can be found in the standard.

Shift: for i = 1023 down to 64, Vi = Vi-64 (shift the 1024-entry FIFO buffer V by 64).
Multiply the 32 input samples by the IMDCT vector (SIMDCT): for i = 0 to 63, for k = 0 to 31, Vi += Nik * Sk.
Build the U vector: for m = 0 to 7, for j = 0 to 31, U(m*64+j) = V(m*128+j) and U(m*64+32+j) = V(m*128+96+j).
Multiply the U vector by the D window: for n = 0 to 511, Wn = Un * Dn.
Calculate 32 samples: for k = 0 to 31, for i = 0 to 15, sk += W(k+32i).

slide-49
SLIDE 49

49

Poly Phase Synthesis filter bank (cont.)

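[Figure: data flow of the poly phase synthesis filter bank. The 32 input samples pass through the SIMDCT into the 1024-entry FIFO. Blocks of 32 entries (0 to 31, 32 to 63, …) are gathered into the U vector (0 to 511), multiplied by the D window to give the W vector (0 to 511), and summed into 32 output samples.]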

slide-50
SLIDE 50

50

Outline

MPEG1 Audio Layer III Encoding
MP3 Encoder Design
MPEG1 Audio Layer III Decoding
MP3 Decoder Design
Synthesis Result and Verification

slide-51
SLIDE 51

51

Experiment of Decoder Complexity

Hardware: PIII-450, 256MB RAM
Software: Visual C++

slide-52
SLIDE 52

52

Algorithm Simplification of IMDCT

The MP3 decoding process requires two IMDCTs: one in the normal IMDCT block, the other in the poly phase synthesis filter bank (SIMDCT). The IMDCT is calculated by the following equation:

$$x_i = \sum_{k=0}^{\frac{n}{2}-1} X_k \cos\left(\frac{\pi}{2n}\left(2i + 1 + \frac{n}{2}\right)(2k+1)\right), \quad i = 0 \text{ to } n-1$$

Define the cosine coefficient:

$$\cos[i][k] = \cos\left(\frac{\pi}{2n}\left(2i + 1 + \frac{n}{2}\right)(2k+1)\right)$$

Symmetry property of the cosine coefficient:

$$\cos[n - 1 - i][k] = \cos\left[\tfrac{n}{2} + i\right][k] \quad \text{for } 0 \le i < \tfrac{n}{4}$$

slide-53
SLIDE 53

53

Algorithm Simplification of IMDCT(cont.)

Long block, direct form:

    for ( i = 0; i < 36; i++ ) {
        sum = 0.0;
        for ( k = 0; k < 18; k++ )
            sum += in[k] * const3[i][k];
        out[i] = sum * win[block_type][i];
    }

Long block, simplified form:

    for ( i = 0; i < 18; i++ ) {
        sum = 0.0;
        for ( k = 0; k < 18; k++ )
            sum += in[k] * cos[i][k];
        out[index[i][0]] = sum * w[block_type][i];
        out[index[i][1]] = sum * w[block_type][i+18];
    }

Short block, direct form:

    for ( j = 0; j < 3; j++ )
        for ( i = 0; i < 12; i++ ) {
            sum = 0.0;    /* restart the accumulation for each output */
            for ( k = 0; k < 6; k++ )
                sum += in[j+3*k] * cos[i][k];
            out[i+j*6+6] += sum * win[block_type][i+j*12];
        }

Short block, simplified form:

    for ( j = 0; j < 3; j++ )
        for ( i = 0; i < 6; i++ ) {
            sum = in[j+ 0] * const1[i][0] +
                  in[j+ 3] * const1[i][1] +
                  in[j+ 6] * const1[i][2] +
                  in[j+ 9] * const1[i][3] +
                  in[j+12] * const1[i][4] +
                  in[j+15] * const1[i][5];
            out[index[j*6+i][0]] += sum * win[2][i];
            out[index[j*6+i][1]] += sum * win[2][i+6];
        }

slide-54
SLIDE 54

54

Algorithm Simplification of Poly Phase Synthesis Filter Bank

Define the cosine coefficient:

$$N[i][k] = \cos\!\left(\frac{(16+i)(2k+1)\pi}{64}\right), \quad i = 0 \text{ to } 63,\ k = 0 \text{ to } 31$$

Symmetry on i direction (for k = 0 to 31):

$$\begin{aligned}
N[i][k] &= -N[32-i][k] && \text{for } 0 \le i \le 15 \\
N[i][k] &= N[96-i][k] && \text{for } 33 \le i \le 47 \\
N[i][k] &= 0 && \text{for } i = 16 \\
N[i][k] &= -1 && \text{for } i = 48
\end{aligned}$$

Symmetry on k direction (for i = 0 to 63):

$$\begin{aligned}
N[i][k] &= N[i][31-k] = N[i][15-k] = N[i][16+k] && \text{if } i \,\%\, 4 = 0 \\
N[i][k] &= N[i][31-k] = -N[i][15-k] = -N[i][16+k] && \text{if } i \,\%\, 4 = 2 \\
N[i][k] &= -N[i][31-k] && \text{else}
\end{aligned}$$

slide-55
SLIDE 55

55

Algorithm Simplification of Poly Phase Synthesis Filter Bank (cont.)

Direct form:

    for (i = 0; i < 64; i++) {
        sum = 0.0;  /* reset for every output */
        for (k = 0; k < 32; k++)
            sum += fsout[ch][ss][k] * N[i][k];
        vbuff[ch][i] = sum;
    }

We need 64*32=2048 memory locations to store the cosine coefficients and 2048 multiplications to generate 64 outputs.

Using the symmetry:

    /* Sum and difference of the 32 inputs. */
    for (i = 0; i < 32; i++) {
        if (i < 16) Tsamp[i] = fsout[ch][ss][i] + fsout[ch][ss][31-i];
        else        Tsamp[i] = fsout[ch][ss][i-16] - fsout[ch][ss][47-i];
    }
    for (i = 0; i < 8; i++) {
        temp1 = Tsamp[i] + Tsamp[15-i];
        temp2 = Tsamp[i] - Tsamp[15-i];
        Tsamp[i]    = temp1;
        Tsamp[15-i] = temp2;
    }

    /* Multiply and accumulate: 16 or 8 products per output. */
    for (i = 0; i < 16; i++) {
        sum = 0.0;  /* reset for every output */
        if (i % 2)      for (j = 0; j < 16; j++) sum += Tsamp[j+16] * N[i*12-4+j];
        else if (i % 4) for (j = 0; j < 8; j++)  sum += Tsamp[j+8]  * N[i*12+j];
        else            for (j = 0; j < 8; j++)  sum += Tsamp[j]    * N[i*12+j];
        vbuff[ch][i+off[ch]] = sum;
    }
    for (i = 16; i < 32; i++) {
        sum = 0.0;  /* reset for every output */
        if ((i % 2) == 0)      for (j = 0; j < 16; j++) sum += Tsamp[j+16] * N[i*12+j];
        else if ((i % 4) == 1) for (j = 0; j < 8; j++)  sum += Tsamp[j+8]  * N[i*12+4+j];
        else                   for (j = 0; j < 8; j++)  sum += Tsamp[j]    * N[i*12+4+j];
        vbuff[ch][i+off[ch]] = sum;
    }

We only need 32*32=1024 memory locations to store these coefficients, and 16*16+16*8=384 multiplications to generate 32 outputs.
slide-56
SLIDE 56

56

Implementation of IMDCT

The IMDCT consists of three main parts: the multiplying and accumulating (MA) part, the windowing part, and the overlap adding part.

MA part:

Input: 18 MDCT inputs
Output: accumulated result
The MA part of the IMDCT is an iteration that multiplies an input by a cosine coefficient and accumulates the result into a register sum. The number of iterations depends on the block type: 18 for a long block, 6 for a short block. Each accumulated result takes (2+t) clocks, where t is the number of iterations and 2 clocks is the latency.

Windowing part & overlap adding part:

Input: accumulated result
Output: 36 MDCT outputs
The accumulated result sum is multiplied by a window coefficient win and accumulated into a register, twice, to generate two results. These results are stored in a temporary outBuffer. The outBuffer is then overlap-added with the previous results in preBuffer to generate the final 18 results.
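The clock count of the MA part and the overlap-add step can be sketched in C as below. This is an illustration under assumptions: `ma_clocks` and `overlap_add` are hypothetical names, the overlap-add follows the usual Layer III scheme (first 18 outputs added to the 18 values saved from the previous granule, last 18 outputs saved for the next one), and the buffer handling of the actual hardware may differ.

```c
/* Clocks for one accumulated result: t multiply-accumulate
 * iterations plus the 2-clock latency. */
int ma_clocks(int t)
{
    return 2 + t;
}

/* Overlap-add for one subband: out[] holds the 36 windowed IMDCT
 * outputs (outBuffer), prev[] the 18 values saved from the previous
 * granule (preBuffer), result[] the final 18 samples. */
void overlap_add(const double out[36], double prev[18], double result[18])
{
    for (int i = 0; i < 18; i++) {
        result[i] = out[i] + prev[i];  /* first half + saved second half */
        prev[i] = out[i + 18];         /* save second half for next time */
    }
}
```

With t = 18 a long-block result takes 20 clocks, and with t = 6 a short-block result takes 8 clocks.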

slide-57
SLIDE 57

57

Scheduling of IMDCT

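[Figure: schedule of the MA part over clock steps 1 and 2. The 18 input samples are read from FSin; each input in is multiplied (x) by a coefficient cos into reg1, and reg1 is added (+) to sum.]

$$x_i = \sum_{k=0}^{n/2-1} X_k \cos\!\left(\frac{\pi}{2n}\left(2i+1+\frac{n}{2}\right)(2k+1)\right), \quad i = 0 \text{ to } n-1$$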

slide-58
SLIDE 58

58

Scheduling of IMDCT

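[Figure: schedule of the windowing part over clock steps 1 and 2. sum is multiplied (x) by a window coefficient wi into temp1; temp1 is added (+) into reg3 and reg4, and both results are written to outBuffer.]

    for (i = 0; i < 18; i++) {
        out[index[i][0]] = sum * w[block_type][i];
        out[index[i][1]] = sum * w[block_type][i+18];
    }

    for (j = 0; j < 3; j++)
        for (i = 0; i < 6; i++) {
            out[index[j*6+i][0]] += sum * win[2][i];
            out[index[j*6+i][1]] += sum * win[2][i+6];
        }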

slide-59
SLIDE 59

59

Architecture of IMDCT

[Figure: IMDCT datapath. Blocks: FSin, imdctin, const, win, one multiplier (X), three adders (+), registers temp1, reg1, sum, reg3 and reg4, outBuffer, preBuffer and FSout, connected through 2-to-1 multiplexers.]

slide-60
SLIDE 60

60

Implementation of Poly Phase Synthesis Filter Bank

The poly phase synthesis filter bank can be divided into three parts: the sum and difference part, the multiplying and accumulating (MA) part, and the windowing part.

Sum and difference part:

Input: 32 inputs from the IMDCT output
Output: 32 sums/differences of the IMDCT output
The calculation flow of the sum/difference part consists of four states.

slide-61
SLIDE 61

61

Scheduling of Poly Phase Synthesis Filter Bank

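[Figure: schedule of the sum and difference part in four states (1 to 4). FSout values are added into Tsamp; Tsamp values are loaded into reg1 and reg2, and their sum and difference are written back to Tsamp.]

    for (i = 0; i < 32; i++) {
        if (i < 16) Tsamp[i] = fsout[ch][ss][i] + fsout[ch][ss][31-i];
        else        Tsamp[i] = fsout[ch][ss][i-16] - fsout[ch][ss][47-i];
    }
    for (i = 0; i < 8; i++) {
        temp1 = Tsamp[i] + Tsamp[15-i];
        temp2 = Tsamp[i] - Tsamp[15-i];
        Tsamp[i]    = temp1;
        Tsamp[15-i] = temp2;
    }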

slide-62
SLIDE 62

62

Implementation of Poly Phase Synthesis Filter Bank (cont.)

MA part:

Input: 32 sums/differences from the sum/difference part
Output: 512-element FIFO
The sum/difference results are multiplied by the cosine coefficients (filter) and accumulated into a FIFO. In the original algorithm the FIFO depth is 1024, but only 512 of those values, taken in an interleaved pattern from the original 1024, are actually needed.

Windowing part:

Input: 512-element FIFO
Output: 32 synthesized PCM results
The FIFO data is multiplied by a window function dewin and accumulated to get the final result.
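The interleaved selection of 512 values out of the 1024-element FIFO follows the U-vector construction of the original algorithm (U[m*64+j] = V[m*128+j], U[m*64+32+j] = V[m*128+96+j]). A small C sketch of that index mapping is given below; the function name `u_to_v` is an assumption, and how the hardware FIFO actually addresses its 512 entries is not stated on the slides.

```c
/* Map an index into the 512-entry U vector to the index of the
 * 1024-entry V FIFO that it reads in the original algorithm. */
int u_to_v(int u)
{
    int m = u / 64;  /* which 64-entry block of U        */
    int r = u % 64;  /* position inside that block       */

    /* First 32 entries come from V[m*128 .. m*128+31],
     * last 32 entries from V[m*128+96 .. m*128+127]. */
    return (r < 32) ? m * 128 + r : m * 128 + 96 + (r - 32);
}
```

In every 128-entry block of V only the first and last 32 entries are read, which is why half of the 1024 entries are never used by the windowing part at any given step.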

slide-63
SLIDE 63

63

Scheduling of Poly Phase Synthesis Filter Bank

[Figure: schedule of the MA part. Each Tsamp value is multiplied (x) by a coefficient N into temp1, and temp1 is added (+) to sum1.]

    for (i = 0; i < 16; i++) {
        sum = 0.0;  /* reset for every output */
        if (i % 2)      for (j = 0; j < 16; j++) sum += Tsamp[j+16] * N[i*12-4+j];
        else if (i % 4) for (j = 0; j < 8; j++)  sum += Tsamp[j+8]  * N[i*12+j];
        else            for (j = 0; j < 8; j++)  sum += Tsamp[j]    * N[i*12+j];
        vbuff[ch][i+off[ch]] = sum;
    }
    for (i = 16; i < 32; i++) {
        sum = 0.0;  /* reset for every output */
        if ((i % 2) == 0)      for (j = 0; j < 16; j++) sum += Tsamp[j+16] * N[i*12+j];
        else if ((i % 4) == 1) for (j = 0; j < 8; j++)  sum += Tsamp[j+8]  * N[i*12+4+j];
        else                   for (j = 0; j < 8; j++)  sum += Tsamp[j]    * N[i*12+4+j];
        vbuff[ch][i+off[ch]] = sum;
    }

slide-64
SLIDE 64

64

Architecture of Poly Phase Synthesis Filter Bank

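[Figure: datapath of the poly phase synthesis filter bank. Blocks: FSout, two add/subtract units (±) with reg1 and reg2, Tsamp, filter coefficients, a multiplier (X) with temp1 and an adder (+) with sum1, vBuffer, dewin, a multiplier (X) with temp2 and sum2, and outBuffer, connected through 2-to-1 multiplexers.]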

slide-65
SLIDE 65

65

Outline

MPEG1 Audio Layer III Encoding
MP3 Encoder Design
MPEG1 Audio Layer III Decoding
MP3 Decoder Design
Synthesis Result and Verification

slide-66
SLIDE 66

66

IO Diagram

Because both use PVCI as the interface protocol, the IO pin diagrams of the encoder and decoder IPs are similar. The pins of each IP can be divided into two parts: the system VCI and the external memory VCI.

[Figure: IO pins of the MP3 encoder/decoder IP, grouped in the diagram as system input, system output, exmem input and exmem output.]

Clock and reset: clk, rst
System VCI: system_VAL, system_ACK, system_addr[12:0], system_Wdata[31:0], system_Rdata[31:0], system_RD, system_BE[3:0]
External memory VCI: exmem_VAL, exmem_ACK, exmem_addr[12:0], exmem_Wdata[31:0], exmem_Rdata[31:0], exmem_RD, exmem_BE[3:0]

slide-67
SLIDE 67

67

Timing Simulation of Encoder

slide-68
SLIDE 68

68

Timing Simulation of Encoder (cont.)

The encoding for a granule can be done in 68653 clocks.

slide-69
SLIDE 69

69

Timing Simulation of Decoder

slide-70
SLIDE 70

70

Timing Simulation of Decoder (cont.)

The decoding for a granule can be done in 47530 clocks.
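The clock counts per granule can be turned into a rough real-time clock requirement. The C sketch below assumes a 44.1 kHz sample rate (not stated on the slides) and uses the fact that an MPEG-1 Layer III granule carries 576 samples per channel; the function name `min_clock_hz` is hypothetical. The slides do not say whether a count covers one channel or both, so the result may need to be doubled for stereo.

```c
/* Minimum clock rate (Hz) needed to process granules in real time.
 * A granule carries 576 samples per channel in MPEG-1 Layer III,
 * so one granule must be finished every 576 / sample_rate seconds. */
double min_clock_hz(long clocks_per_granule, double sample_rate_hz)
{
    const double samples_per_granule = 576.0;
    return clocks_per_granule * sample_rate_hz / samples_per_granule;
}
```

Under these assumptions, min_clock_hz(47530, 44100.0) gives about 3.64 MHz for the decoder and min_clock_hz(68653, 44100.0) about 5.26 MHz for the encoder.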

slide-71
SLIDE 71

71

Resource Usage of Encoder

Resource                              Used      Available   % Used
Number of Slices                      4181      19200       21%
Total Number Slice Registers          853       38400       2%
Total Number 4 input LUTs             7560      38400       19%
  Number used as LUTs                 7530
  Number used as a route-thru         30
Number of bonded IOBs                 155       512         30%
Number of Block RAMs                  42        160         26%
Number of GCLKs                       1         4           25%
Number of GCLKIOBs                    1         4           25%
Total equivalent gate count           609415
Additional JTAG gate count for IOBs   7488
Maximum frequency                     22.908 MHz
Maximum net delay                     24.334 ns

slide-72
SLIDE 72

72

Resource Usage of Decoder

Resource                              Used      Available   % Used
Number of Slices                      4231      12288       34%
Number of Slice Flip Flops            1922      24576       7%
Total Number 4 input LUTs             6723      24576       27%
  Number used as LUTs                 6686
  Number used as a route-thru         37
Number of bonded IOBs                 103       404         25%
Number of Block RAMs                  32        32          100%
Number of GCLKs                       2         4           50%
Number of GCLKIOBs                    2         4           50%
Total equivalent gate count           583704
Additional JTAG gate count for IOBs   5040
Maximum frequency                     22.332 MHz
Maximum net delay                     19.159 ns

slide-73
SLIDE 73

73

FPGA Layout

FPGA layout of encoder (Target: VirtexE-2000 FG680)
FPGA layout of decoder (Target: Virtex-1000 BG560)

slide-74
SLIDE 74

74

Real Verification Platform