SLIDE 1

Musical Instrument Classification Using Spiking Neural Networks

Jainesh Doshi, Vishrant Tripathi, Onkar Desai, Shreyas Mangalgi

IIT Bombay

November 6, 2015

Jainesh Doshi, Vishrant Tripathi, Onkar Desai, Shreyas Mangalgi (IIT Bombay), November 6, 2015

SLIDE 2

Overview

1. Introduction
2. Biological Bases of Musical Perception: Timbre; The Human Ear
3. Proposed Model: Modelling the Input; The Neural Network; Weight Training Rule
4. Observations
5. Conclusions

SLIDE 3

Introduction

• The human ear is a small physical device with disproportionately large and interesting properties
• We want to tackle a small problem: musical instrument classification
• Our ears and brain solve this very easily, but conventional approaches require complex signal processing and algorithms
• We propose a simple model for this classification problem and implement it using spiking neural networks

SLIDE 4

Timbre

• The ANSI definition of timbre describes it as the attribute that allows us to distinguish between sounds having the same perceptual duration, loudness, and pitch, such as the same note played on two different musical instruments
• It can be understood naively as the relative amplitudes of the harmonics present in a signal
• We use timbre as the basis for classifying different instruments, much as the ear does

Figure: The timbre of the guitar remains the same even if it is played in a different way

SLIDE 5

The Human Ear

Cochlea

The cochlea, or inner ear, constitutes the hydrodynamic part of the ear

Basilar Membrane

The basilar membrane is a flexible gelatinous membrane that divides the cochlea longitudinally; it carries about 25,000 nerve endings attached to the numerous hair cells arranged on its surface

How does it work?

Patches of hair cells are spatially arranged according to how each frequency propagates along the cochlear fluid, so each patch responds to a different frequency range, essentially converting the signal's information into the frequency domain.

SLIDE 6

Proposed Model

The input model that we have used is inspired by the biological structure of the inner ear, while the neural network takes its inspiration from a similar network developed for composer classification [4].

Figure: The final implementation of the network that has been proposed

SLIDE 7

Modelling the input

• We perform an STFT on the input signal
• Each neuron in the first layer excites for a certain range of frequencies
• For the LIF network, we use rate encoding: the number of spikes is proportional to the amplitude of the harmonic
• For the AEF network, we use temporal encoding: the time between spikes is proportional to the amplitude of the harmonic

Figure: Simple network for classifying triangular and square waves (Proof of Concept)
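
The encoding step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the band width, and the scaling constants are assumptions, and a single-frame FFT stands in for the STFT.

```python
import numpy as np

def harmonic_amplitudes(signal, fs, n_neurons, band_hz=200.0):
    """Peak spectral amplitude in each first-layer neuron's frequency band.
    A single-frame FFT stands in for the STFT described on the slide."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    amps = np.zeros(n_neurons)
    for i in range(n_neurons):
        mask = (freqs >= i * band_hz) & (freqs < (i + 1) * band_hz)
        if mask.any():
            amps[i] = spectrum[mask].max()
    return amps

def rate_encode(amps, max_spikes=20):
    """LIF input: spike count proportional to the band amplitude."""
    peak = amps.max() if amps.max() > 0 else 1.0
    return np.round(max_spikes * amps / peak).astype(int)

def temporal_encode(amps, base_isi=1.0):
    """AEF input: inter-spike interval proportional to the band amplitude."""
    peak = amps.max() if amps.max() > 0 else 1.0
    return base_isi * (1.0 + amps / peak)
```

For a pure 440 Hz tone, only the neuron whose band contains 440 Hz receives a strong amplitude, so it emits the most spikes (rate code) or the longest inter-spike interval (temporal code).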

SLIDE 8

The Neural Network

We use the Leaky-Integrate-and-Fire (LIF) neuron model in our network.

Figure: A basic block diagram to describe the network
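
The LIF dynamics can be sketched with a simple forward-Euler integration. The parameter values below are illustrative assumptions, not the ones used in the network.

```python
def simulate_lif(input_current, dt=1e-4, tau=0.02, r=1e7,
                 v_rest=0.0, v_th=0.01, v_reset=0.0):
    """Forward-Euler integration of tau * dv/dt = -(v - v_rest) + R * I(t);
    the neuron spikes and resets whenever v crosses the threshold."""
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(input_current):
        v += dt / tau * (-(v - v_rest) + r * i_in)
        if v >= v_th:
            spike_times.append(step * dt)
            v = v_reset
    return spike_times
```

A constant suprathreshold current produces regular spiking; with no input the membrane stays at rest, which is the leaky behaviour that distinguishes LIF from a plain integrator.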

SLIDE 9

Weight Training Rule

The rule avoids excessive charge being fed into the network when the amplitudes at excitation frequencies are high.

Figure: This rule helps the network handle very frequent spiking patterns that may be due to aliasing or a louder sound volume
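
The slide does not spell the rule out; one plausible reading is a saturating cap on the charge a first-layer neuron can deliver per frame. The sketch below is a hypothetical illustration of that idea only; the function, its names, and the cap value are all assumptions.

```python
import numpy as np

def bounded_input_charge(spike_counts, weights, q_max=1.0):
    """Hypothetical sketch: cap the per-neuron input charge
    (weight x spike count) at q_max, so very frequent spiking from
    aliasing or a loud recording cannot overdrive downstream neurons."""
    charge = np.asarray(weights, dtype=float) * np.asarray(spike_counts, dtype=float)
    return np.clip(charge, 0.0, q_max)
```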

SLIDE 10

Demonstration

Problem: classifying a square wave versus a triangular wave of the same frequency (same pitch, different timbre).

Figure: Difference in timbre for a square and a triangular wave of the same tone
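
The spectral difference behind this demonstration can be checked numerically. The sketch below (illustrative, not the slide's code) measures the odd-harmonic amplitudes of the two waves and exposes the 1/n versus 1/n² fall-off that gives them different timbres at the same pitch.

```python
import numpy as np

def odd_harmonic_ratios(kind, f0=100.0, fs=8000, n=8000):
    """Amplitudes of the first five odd harmonics, normalised to the
    fundamental. A square wave falls off as 1/n, a triangle as 1/n^2
    (even harmonics vanish for both)."""
    t = np.arange(n) / fs
    phase = 2 * np.pi * f0 * t
    if kind == "square":
        wave = np.sign(np.sin(phase))
    else:  # triangle wave via arcsin(sin(x))
        wave = 2 / np.pi * np.arcsin(np.sin(phase))
    spectrum = np.abs(np.fft.rfft(wave)) / n
    bins = [int(round(k * f0 * n / fs)) for k in (1, 3, 5, 7, 9)]
    amps = spectrum[bins]
    return amps / amps[0]
```

The third harmonic is about 1/3 of the fundamental for the square wave but only about 1/9 for the triangle, which is exactly the amplitude difference the second layer detects.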

SLIDE 11

Observations

Figure: Response of input layer neurons for a triangular wave (harmonic amplitudes fall off as 1/n², the inverse square of the harmonic number)

SLIDE 12

Observations

Figure: Response of second layer neurons for a triangular wave (we see the difference in the amplitudes of the harmonics); the output neuron (last in the figure) does not spike

SLIDE 13

Observations

Figure: Response of input layer neurons for a square wave (harmonic amplitudes fall off as 1/n, inversely with the harmonic number)

SLIDE 14

Observations

Figure: Response of second layer neurons for a square wave (we see the difference in the amplitudes of the harmonics); the output neuron (last in the figure) spikes for the square wave, thus achieving classification

SLIDE 15

Observations

Figure: Output of the neurons of the auditory input layer. The stimuli are as follows: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in the next octave (III); at t=120, again the same note in octave I; and finally, at t=160, the same note in octave I on a wind instrument

SLIDE 16

Observations

Figure: Output of the neurons of the auditory input layer. The stimuli are as follows: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in the next octave (III); at t=120, again the same note in octave I; and finally, at t=160, the same note in octave I on a wind instrument

SLIDE 17

Observations

Figure: Output of the neurons of the auditory input layer. The stimuli are as follows: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in the next octave (III); at t=120, again the same note in octave I; and finally, at t=160, the same note in octave I on a wind instrument

SLIDE 18

Observations: Approach 2

Figure: Output of the AEF neurons of the auditory input layer. The stimuli are as follows: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in the next octave (III); at t=120, again the same note in octave I; at t=160, the same note in octave I on a wind instrument; and at t=197 and t=201, the note in octave I and octave II

SLIDE 19

Observations: Approach 2

Figure: Output of the AEF neurons of the auditory input layer. The stimuli are as follows: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in the next octave (III); at t=120, again the same note in octave I; at t=160, the same note in octave I on a wind instrument; and at t=197 and t=201, the note in octave I and octave II

SLIDE 20

Observations: Approach 2

Figure: Output of the AEF neurons' network for the second layer. The stimuli are as follows: at t=0, a guitar note in octave I; at t=40, the same guitar note in the next octave (II); at t=80, in the next octave (III); at t=120, again the same note in octave I; at t=160, the same note in octave I on a wind instrument; and at t=197 and t=201, the note in octave I and octave II

SLIDE 21

Tritonia Central Pattern Generator

Figure: Tritonia-inspired rhythmic pattern generator
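
A Tritonia-style central pattern generator can be sketched as three LIF neurons in an inhibitory ring (0 inhibits 1, 1 inhibits 2, 2 inhibits 0), each driven by a constant current so that activity rotates around the ring. All parameter values below are illustrative assumptions, not the ones behind the figure.

```python
import numpy as np

def tritonia_cpg(steps=500, dt=1.0, tau=10.0, tau_syn=20.0,
                 drive=1.5, w_inh=2.0, v_th=1.0):
    """Three LIF neurons in an inhibitory ring; mutual inhibition plus a
    constant drive produces rhythmic, alternating activity."""
    v = np.array([0.5, 0.25, 0.0])   # staggered start breaks symmetry
    trace = np.zeros(3)              # decaying synaptic traces
    spikes = [[], [], []]
    for step in range(steps):
        trace *= np.exp(-dt / tau_syn)
        # neuron i is inhibited by neuron (i - 1) % 3
        g_inh = w_inh * trace[[2, 0, 1]]
        v += dt / tau * (drive - v - g_inh)
        fired = v >= v_th
        for i in np.where(fired)[0]:
            spikes[i].append(step * dt)
        trace[fired] += 1.0
        v[fired] = 0.0
    return spikes
```

Because the drive is suprathreshold and the inhibition decays, no neuron can stay silent forever: each one fires, suppresses its target, and is in turn suppressed, yielding the rhythmic pattern shown on the next slide.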

SLIDE 22

Observations

Figure: Output of the Tritonia-inspired three-neuron pattern generator

Figure: Output of the Tritonia-inspired three-neuron pattern generator (low-frequency oscillations)

SLIDE 23

Conclusions

• A simple spiking-neural-network-based instrument classifier is implemented
• Both the note and the relevant musical instrument can be identified
• The approach can be extended to simultaneous identification and extraction of instrument sounds from a music file
• Possible drawback: multiple instruments and notes cannot be detected if played at the same time

SLIDE 24

References

• K. Patil, D. Pressnitzer, S. Shamma, and M. Elhilali, "Music in Our Ears: The Biological Bases of Musical Timbre Perception", PLOS Computational Biology, Vol. 8, Issue 11, Nov. 2012
• R. Brette, "Computing with Neural Synchrony", PLOS Computational Biology, Jun. 2012, DOI: 10.1371/journal.pcbi.1002561
• Engineering Acoustics/The Human Ear and Sound Perception, Wikibooks
• C. Prasad N., K. Saboo, and B. Rajendran, "Composer Classification based on Temporal Coding in Adaptive Spiking Neural Networks", IJCNN, 2015
• P. Donnelly, "Bayesian Approaches To Musical Instrument Classification Using Timbre Segmentation", Ph.D. Thesis, May 2012
• F. Font, G. Roma, and X. Serra, "Freesound technical demo", Proceedings of the 21st ACM International Conference on Multimedia, ACM, 2013 (www.freesound.org)
• Neural Coding, Wikipedia
• X. Zhang and Z. W. Ras, "Analysis of Sound Features for Music Timbre Recognition"
• T. Voegtlin, "Temporal Coding using the Response Properties of Spiking Neurons"
• American National Standards Institute (1973), Psycho-acoustic Terminology, S3.20
SLIDE 25

Thank You

SLIDE 26

Other Possible Approaches
