July 6, 2012 Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema - - PowerPoint PPT Presentation


SLIDE 1

Applause Identification and its relevance to Archival of Carnatic Music

Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A. Murthy
Computer Science Dept., IIT Madras, India.

July 6, 2012

Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A.Murthy 2nd CompMusic Workshop

SLIDE 2

Outline of the presentation
  • Introduction to the Carnatic music concert
  • Problem definition
  • Feature extraction
      • Spectral flux
      • Spectral entropy
  • Characterising the applause using cumulative sum (CUSUM)
  • Highlights detection using CUSUM
  • Results

SLIDE 3

Carnatic music concert (1)
  • A Carnatic music concert can be 2 to 3 hours long.
  • A concert consists of various pieces.
  • These pieces are compositions interlaced with improvisational aspects such as Raga Alapana, Nereval, Kalpanaswara, Thanam, Sloka and Thani Avarthanam.

SLIDE 4

Carnatic music concert (2)
  • In a concert, the audience applauds the artist at the end of a piece.
  • Sometimes the audience also applauds in between improvisational aspects such as Raga Alapana (vocal), Raga Alapana (violin), after a song, Kalpanaswara, Thanam and Thani Avarthanam.
  • Most Carnatic music recordings archived today are either:
      • manually segmented into pieces, or
      • stored in their entirety as a single recording.

SLIDE 5

Applications of Applause Identification
Existing work on applause identification:
  • Manoj et al. (2011) discuss how applause is detected in continuous speech meetings and how it can be used as a key indicator of highlights in a meeting.
  • Lie Lu et al. (2001) discuss techniques for audio classification that segment the audio signal into speech, music, silence and environmental sounds such as applause and laughter; these segments can be used as an index for audio retrieval.
  • Z. Xiong et al. (2003) discuss how applause is detected to determine the highlights of a sports game.

SLIDE 6

Problem Definition
  • Identify the applauses in a given Carnatic music concert using spectral-domain features.
  • Use the applause locations to segment the concert automatically into individual pieces for archival purposes.
  • Find the duration and strength of each applause using the CUSUM technique.
  • Use the duration and strength to determine the highlights of the concert.

SLIDE 7

Characteristics of Applause and Music

[Figure panels: time-domain waveforms of applause and music segments; y-axis: Amplitude (panels scaled to about ±10000 and ±30000); x-axis: Time in seconds]

Figure: Typical sequence of applause and music segments (time domain). In the time domain, the applause segment is rhythmic but unstructured, whereas the music segment is more structured.

SLIDE 8

Characteristics of Applause and Music

[Figure panels: log magnitude spectra of applause and music frames; y-axis: Log Magnitude (dB), 0 to 80; x-axis: Frequency in Hz, 0 to 8000]

Figure: Typical sequence of applause and music segments (spectral domain). The power spectrum of applause is flat, whereas the spectrum of music is structured.

SLIDE 9

Feature Extraction
  • Selecting a good feature for classification or segmentation is a crucial task.
  • The spectral properties of most audio signals change slowly with respect to time.
  • To discriminate between music and applause, the following features are used:
      • Spectral flux
      • Spectral entropy

SLIDE 10

Spectral flux (1)
Spectral flux (SF), also called spectral variation, characterises the change in the spectrum between two adjacent frames of an audio signal. It measures how quickly the power spectrum changes.

$$SF[n] = \int_{\omega} \left( |X_n(\omega)| - |X_{n+1}(\omega)| \right)^2 \, d\omega \qquad (1)$$

where $X_n(\omega)$ is the magnitude spectrum of the $n$th frame of an audio signal.
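
A minimal sketch of Eq. (1), assuming NumPy, frames already cut from the signal (one frame per row), and the integral over ω approximated by a sum over FFT bins. The function names `magnitude_spectra` and `spectral_flux` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def magnitude_spectra(frames: np.ndarray) -> np.ndarray:
    """|X_n(omega)| for every frame: one row per frame, one column per FFT bin."""
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_flux(mag: np.ndarray) -> np.ndarray:
    """Discrete form of Eq. (1): SF[n] = sum_k (|X_n[k]| - |X_{n+1}[k]|)^2.

    `mag` holds magnitude spectra, one frame per row. The result has one
    value per pair of adjacent frames, i.e. num_frames - 1 values.
    """
    diff = mag[:-1] - mag[1:]
    return np.sum(diff ** 2, axis=1)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)                    # structured, music-like
    noise = np.random.default_rng(0).standard_normal(sr)  # flat, applause-like
    frames = np.stack([tone[:256], tone[256:512], noise[:256], noise[256:512]])
    print(spectral_flux(magnitude_spectra(frames)))
```

Summing over discrete bins instead of integrating only rescales the result by the bin spacing, so it does not change where the flux rises or falls.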

SLIDE 11

Spectral flux (2)

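Different normalisations of spectral flux are:

1. Spectral flux with no normalisation.

2. Power spectral density normalisation, in which $X^{Norm}_n(\omega)$ is defined as:
$$X^{Norm}_n(\omega) = \frac{X_n(\omega)}{\int_{\omega} X_n(\omega) \, d\omega} \qquad (2)$$

3. Peak normalisation, in which $X^{Norm}_n(\omega)$ is defined as:
$$X^{Norm}_n(\omega) = \frac{X_n(\omega)}{\max_{\omega} X_n(\omega)} \qquad (3)$$

In normalisations 2 and 3, $X^{Norm}_n(\omega)$ takes the place of $X_n(\omega)$ in Eq. (1).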
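
The two normalisations can be sketched as follows, assuming NumPy, one magnitude spectrum per row, and sums over FFT bins in place of the integral. The `EPS` guard against silent frames and all function names are assumptions added for illustration.

```python
import numpy as np

EPS = 1e-12  # assumption: avoids division by zero on silent frames


def psd_normalise(mag: np.ndarray) -> np.ndarray:
    """Eq. (2): divide each frame's magnitude spectrum by its sum over frequency."""
    return mag / np.maximum(mag.sum(axis=1, keepdims=True), EPS)


def peak_normalise(mag: np.ndarray) -> np.ndarray:
    """Eq. (3): divide each frame's magnitude spectrum by its maximum over frequency."""
    return mag / np.maximum(mag.max(axis=1, keepdims=True), EPS)


def spectral_flux(mag: np.ndarray) -> np.ndarray:
    """Eq. (1) with the integral replaced by a sum over frequency bins."""
    return np.sum((mag[:-1] - mag[1:]) ** 2, axis=1)


if __name__ == "__main__":
    mag = np.array([[1.0, 3.0], [2.0, 2.0]])
    for name, m in [("none", mag), ("psd", psd_normalise(mag)), ("peak", peak_normalise(mag))]:
        print(name, spectral_flux(m))
```

Both normalisations divide out the overall level of each frame, so a loud and a quiet frame with the same spectral shape give the same normalised spectrum. This is one plausible reason why the three flux variants in the next figure live on such different scales.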

SLIDE 12

Spectral flux (3)

[Figure panels, each plotting spectral flux against time in seconds with the music and applause segments marked: spectral flux of unnormalised spectra (scale ×10^9); spectral flux of power spectral density normalisation (scale ×10^−4); spectral flux of peak-normalised spectra (up to about 0.018)]

Figure: Different Normalisations of Spectral flux

SLIDE 13

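Spectral Entropy (1)
Entropy is a measure of the randomness of a system; spectral entropy (SE) applies it to the spectrum of each frame. The Shannon entropy of a discrete random variable $X$ with probability mass function $p(x)$ is given by

$$H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i) \qquad (4)$$

Normalising the power spectrum of frame $n$ so that it integrates to one lets it be treated as a probability distribution over frequency:

$$PSD_n(\omega) = \frac{|X_n(\omega)|^2}{\int_{\omega} |X_n(\omega)|^2 \, d\omega}$$

$$SE[n] = -\int_{\omega} PSD_n(\omega) \log PSD_n(\omega) \, d\omega \qquad (5)$$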
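
A minimal sketch of Eq. (5), assuming NumPy and one magnitude spectrum per row. Eq. (5) does not state the log base; the sketch assumes base 2 to match Eq. (4). The function name is hypothetical.

```python
import numpy as np


def spectral_entropy(mag: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Discrete form of Eq. (5), one value per frame (row of `mag`).

    PSD_n[k] = |X_n[k]|^2 / sum_k |X_n[k]|^2, then
    SE[n] = -sum_k PSD_n[k] * log2(PSD_n[k]).
    Bins with zero power contribute 0 (the usual convention 0 * log 0 = 0).
    """
    power = mag ** 2
    psd = power / np.maximum(power.sum(axis=1, keepdims=True), eps)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(psd > 0, psd * np.log2(psd), 0.0)
    return -terms.sum(axis=1)
```

A perfectly flat spectrum over 8 bins gives the maximum value log2(8) = 3 bits, while a spectrum with all its power in one bin gives 0. This is why flat applause spectra score higher than structured music spectra.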

SLIDE 14

Spectral Entropy (2)

[Figure: spectral entropy (about 0.5 to 4) against time in seconds, with the applause and music segments marked]

Figure: Spectral entropy of music signal

SLIDE 15

Database Used
  • 19 concerts by male and female singers are used for the experiments.
  • All concerts are vocal, i.e. the lead musician is a singer.
  • Each concert has 15 to 20 applauses, giving a total of 343 applauses.

SLIDE 16

Experimental analysis

  • For the 19 concerts, spectral flux and spectral entropy features are extracted using frames of 0.25 s duration with an overlap of 0.01 s, at a sampling frequency of 44.1 kHz.
  • The extracted features are smoothed by a rectangular moving-average filter of length 15.
  • For all concerts, the applause locations and applause types were marked manually by a musician.
  • Based on this ground truth, DET curves and equal error rates (EER) are calculated for all the features above.
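
The framing and smoothing steps can be sketched as below, assuming NumPy and a mono signal. The slide's "overlap of 0.01 s" is taken literally here (frame shift = 0.25 s − 0.01 s); if 0.01 s was meant as the frame shift, pass `overlap_dur=0.24` instead. The filter length of 15 is interpreted as 15 frames. Function names are hypothetical.

```python
import numpy as np


def frame_signal(x: np.ndarray, sr: int, frame_dur: float = 0.25,
                 overlap_dur: float = 0.01) -> np.ndarray:
    """Cut `x` into frames of `frame_dur` s that overlap by `overlap_dur` s.

    Returns shape (num_frames, frame_len); a trailing partial frame is dropped.
    """
    x = np.asarray(x)
    frame_len = int(round(frame_dur * sr))
    hop = frame_len - int(round(overlap_dur * sr))  # shift between frame starts
    if hop <= 0:
        raise ValueError("overlap must be shorter than the frame")
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    num_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(num_frames)
    return x[starts[:, None] + np.arange(frame_len)[None, :]]


def smooth(feature: np.ndarray, length: int = 15) -> np.ndarray:
    """Rectangular moving average over `length` frames.

    For features at least `length` long, the output has the same length as the input.
    """
    kernel = np.ones(length) / length
    return np.convolve(feature, kernel, mode="same")
```

At 44.1 kHz this gives 11025-sample frames; the smoothed feature track is what the threshold (and later CUSUM) is applied to.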

SLIDE 17

Experimental Analysis
A DET curve is plotted for applause detection over various thresholds. The equal error rates (EER) are given in the table.

[Figure: Applause Detection Performance; x-axis: False Alarm probability (in %), y-axis: Miss probability (in %), both from 1 to 80; curves for Entropy, fluxnonorm and fluxnorm, with EER values marked]

Figure: DET curve for applause detection

| Method | EER |
|---|---|
| Spectral flux (no normalisation) | 44.55% |
| Spectral flux (normalised) | 23.33% |
| Spectral entropy | 17.33% |

Table: EER for applause detection
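
The slides report EERs but not how they were computed. A minimal sketch, assuming frame-level feature scores (higher meaning more applause-like) and frame-level ground-truth labels (1 = applause, 0 = music): sweep the threshold over every observed score and take the point where the miss and false-alarm rates are closest. Function names are hypothetical; for a feature that drops during applause, negate it first.

```python
import numpy as np


def det_points(scores, labels):
    """Miss and false-alarm rates (in %) for every candidate threshold.

    A frame is declared applause when score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.unique(scores)
    miss = np.array([np.mean(scores[labels] < t) for t in thresholds]) * 100
    false_alarm = np.array([np.mean(scores[~labels] >= t) for t in thresholds]) * 100
    return thresholds, miss, false_alarm


def equal_error_rate(scores, labels):
    """Average of miss and false-alarm rate where the two are closest."""
    _, miss, false_alarm = det_points(scores, labels)
    i = int(np.argmin(np.abs(miss - false_alarm)))
    return (miss[i] + false_alarm[i]) / 2
```

Plotting `miss` against `false_alarm` on normal-deviate axes gives a DET curve of the kind shown above.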

SLIDE 18

Introduction to the cumulative sum (CUSUM) method

With spectral flux and spectral entropy, applause locations are identified by applying a threshold. This may not be sufficient to determine the duration and strength of an applause. CUSUM is a non-parametric approach that can be used to identify statistical inhomogeneity in a given signal. CUSUM is estimated as follows. Let X[n] be the value of the feature extracted at time n:

$$Y[n] = X[n] - a$$

$$\mathrm{Cusum}[n] = \begin{cases} \mathrm{Cusum}[n-1] + Y[n], & Y[n] > 0 \\ 0, & \text{otherwise} \end{cases}$$

If Cusum[n] > Θ, this suggests a significant structural shift in the series. The values of a and Θ have to be estimated empirically and may vary across data sets.
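
A minimal sketch of the CUSUM recursion, assuming the "otherwise" branch resets the sum to 0 (that value is not legible in the extracted slide text). The parameters `a` and `theta` correspond to a and Θ and still have to be tuned empirically; function names are hypothetical.

```python
import numpy as np


def cusum(x, a):
    """Y[n] = X[n] - a; Cusum[n] = Cusum[n-1] + Y[n] if Y[n] > 0, else 0 (assumed reset)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    prev = 0.0
    for n, xn in enumerate(x):
        y = xn - a
        prev = prev + y if y > 0 else 0.0  # accumulate excess over a, reset otherwise
        out[n] = prev
    return out


def structural_shift(c, theta):
    """Boolean mask of frames where Cusum[n] > theta."""
    return np.asarray(c) > theta
```

While the feature stays above a (as the smoothed spectral entropy does during applause), the sum keeps growing, so its height reflects both how long and how far above a the feature stays.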

SLIDE 19

Characterising the applauses using CUSUM (1)

[Figure panels, each against time in seconds: (a) spectral flux of unnormalised spectra, (b) spectral flux of peak-normalised spectra, (c) spectral entropy, each shown together with its CUSUM: (a) CUSUM of spectral flux of unnormalised spectra, (b) CUSUM of spectral flux of peak-normalised spectra, (c) CUSUM of spectral entropy]

Figure: Spectral flux and spectral entropy and their CUSUM values. The CUSUM was computed for both spectral flux and spectral entropy. The start and end of each triangle indicate the location and duration of an applause.
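
Reading location, duration and strength off the CUSUM triangles can be sketched as below. Assumptions: a triangle is a maximal run of frames with CUSUM > 0, its duration is the run length times the frame shift, and its strength is taken as the peak height (the slides do not define strength precisely). `min_strength` is a hypothetical cut-off for discarding small non-applause triangles.

```python
import numpy as np


def cusum_triangles(c, frame_shift_s, min_strength=0.0):
    """Turn a CUSUM curve into (start_s, duration_s, strength) triples."""
    c = np.asarray(c, dtype=float)
    # Pad with False so every run of positive CUSUM has a rising and a falling edge.
    active = np.concatenate(([False], c > 0, [False])).astype(int)
    edges = np.diff(active)
    starts = np.flatnonzero(edges == 1)   # first frame of each run
    ends = np.flatnonzero(edges == -1)    # one past the last frame of each run
    triangles = []
    for s, e in zip(starts, ends):
        peak = float(c[s:e].max())
        if peak > min_strength:
            triangles.append((float(s * frame_shift_s), float((e - s) * frame_shift_s), peak))
    return triangles
```

Raising `min_strength` plays a role similar to Θ: it suppresses small triangles that do not correspond to applause.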

SLIDE 20

Characterising the applauses using CUSUM (2)

[Figure panels for the Abhishek-Meyundi concert, each against time in seconds: spectral flux (peak normalisation), spectral entropy, CUSUM of spectral flux (peak normalisation), CUSUM of spectral entropy]

Figure: Detecting the Applause locations based on Threshold and Cusum values

SLIDE 21

CUSUM values for whole concert

[Figure panels: Applause Detection for Abhishek-Meyundi Concert; CUSUM of spectral entropy, and CUSUM triangles for spectral entropy, against time in seconds]

Figure: CUSUM values for the whole concert. The figure shows a sequence of CUSUM triangles (for a carefully chosen value of a) over the entire concert. Around 22 triangles can be seen, of which 20 are applauses.

SLIDE 22

Highlights of a Carnatic music concert using CUSUM values
  • The CUSUM values of spectral flux and spectral entropy determine the duration and strength of an applause.
  • Based on the duration and strength of the applauses, the top 3 highlights were taken for each of the 19 concerts.
  • The table shows the highlights of all 19 concerts obtained using the above features.

| S. No. | Highlight |
|---|---|
| 1 | Thani Avarthanam |
| 2 | Raga Alapana of main song |
| 3 | Within Thani Avarthanam |
| 4 | Thanam |
| 5 | Swaram of main song |
| 6 | Raga Alapana (vocal) |
| 7 | Raga Alapana (violin) |
| 8 | Alapana of RTP |
| 9 | Nereval |
| 10 | Main song |
| 11 | Varnam |
| 12 | Pallavi |

Table: Highlights of concerts using CUSUM
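
Selecting the top 3 highlights from the detected applauses could look like the sketch below. The slides do not say how duration and strength are combined, so the ranking rule here (strongest first, longer duration breaking ties) is a hypothetical choice, as are the `Applause` record and `top_highlights` names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Applause:
    start_s: float     # location of the applause in the concert
    duration_s: float  # length of its CUSUM triangle
    strength: float    # peak height of its CUSUM triangle


def top_highlights(applauses: List[Applause], k: int = 3) -> List[Applause]:
    """Return the k highest-ranked applauses.

    Hypothetical rule: rank by strength, then by duration.
    """
    return sorted(applauses, key=lambda a: (a.strength, a.duration_s), reverse=True)[:k]
```

The segment preceding each selected applause is then the candidate highlight; labels like those in the table require matching these times against the musician's annotations.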

SLIDE 23

Conclusion
  • Most Carnatic music recordings archived today are either manually segmented into pieces or stored in their entirety as a single recording.
  • By locating the applauses in a concert, the concert can be segmented automatically into pieces for archival purposes.
  • The duration and strength of an applause are used to determine the highlights of the concert.

SLIDE 24

References

1. Manoj C, Magesh S, Sankaran M S, and Manikandan M S. "A novel approach for detecting applause in continuous meeting". In IEEE International Conference on Electronics and Computer Technology, pages 182–186, India, April 2011.
2. Lie Lu, Hao Jiang, and HongJiang Zhang. "A robust audio classification and segmentation method". In International ACM Multimedia Conference, pages 203–211, Canada, September 2001.
3. M. J. Carey, E. S. Parris, and H. L. Loyd-Thomas. "A comparison of features for speech and music discrimination". In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pages 149–152, March 1999.
4. J. O. Roman Jarina. "A discriminative feature selection for applause sounds detection". In Proceedings of the 8th International Workshop on Image Analysis for Multimedia Interactive Services, 2007.
5. A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki. "The DET curve in assessment of detection task performance". Pages 1895–1898, 1997.
6. Z. Xiong, R. Radhakrishnan, A. Divakaran, and T. S. Huang. "Audio events detection based highlights extraction from baseball, golf and soccer games in a unified framework". In ICASSP 2003, pages 401–404.

SLIDE 25

References

7. B. E. Brodsky and B. S. Darkhovsky. "Non-parametric Methods in Change-point Problems". Kluwer Academic Publishers, New York, 1993.
8. T. M. Krishna. "Kalpita sangita, Kalpana sangita and Manodharma". Private communication, 2007–2011.
9. H. Liu and M. S. Kim. "Real-time detection of stealthy DDoS attacks using time-series decomposition". In ICC, pages 1–6, Bangalore, India, July 2010.
10. Lawrence R. Rabiner and Ronald W. Schafer. "Theory and Applications of Digital Speech Processing". Pearson International, Upper Saddle River, New Jersey, 2011.
11. H. Wang, D. Zhang, and K. Shin. "SYN-dog: Sniffing SYN flooding sources". In ICDCS, pages 421–428, Bangalore, India, July 2002.
12. Tong Zhang. "Automatic singer identification". In Proceedings of the 2003 International Conference on Multimedia and Expo (ICME '03), volume 1, pages I-33–36, July 2003.
13. Jouni Paulus. "Improving Markov model based music piece structure labelling with acoustic information". In International Society for Music Information Retrieval Conference, pages 303–308, August 2010.

SLIDE 26

THANK YOU
