
Digital Audio Effects DAFx-2008

Analysis-and-manipulation approach to pitch and duration of musical instrument sounds without distorting timbral characteristics

Takehiro Abe † Katsutoshi Itoyama † Kazuyoshi Yoshii ‡ Kazunori Komatani † Tetsuya Ogata † Hiroshi G. Okuno †

† Department of Intelligence Science and Technology, Kyoto University, Japan
‡ National Institute of Advanced Industrial Science and Technology (AIST)

Demonstration: http://winnie.kuis.kyoto-u.ac.jp/~abe/DAFx-08/

Motivation

Conventional music listening: passive and limited listening experience

  • Selecting from a limited playlist
  • Only listening after pressing “play”

Active music listening: active and exploratory listening

  • Selecting according to users’ requirements
  • Changing music to suit users’ feelings

Instrument equalizers have been developed. With them, the user can change the volume or timbre of instruments (all instruments, or drums only):

  • Drumix [Yoshii 07]
  • Itoyama’s EQ. [Itoyama 08]
  • Our equalizer: replacing an arbitrary part with the user’s favorite timbre

Demonstration (Trial equalizer)

[Screenshot of the trial equalizer: content area with genre buttons and part buttons. Demo sounds: MIDI sound, synthesized piano sound, jazz sound (synthesis).]

The equalizer’s sounds are synthesized from real sounds, except for the MIDI sounds.

Requirements for our equalizer

1. Sound separation from polyphonic audio, to extract a musical instrument sound that users want to replace (well studied)
2. Sound manipulation from separated sounds without timbral distortion, to play arbitrary phrases (our research target; the application of separated sounds is not well studied)

Timbral distortion: difference from the sound excited by the real instrument

Objective: synthesizing monotones excited by the same instrument from multiple musical instrument sounds

Our definition of timbral features

ASA’s definition

The quality of a sound that distinguishes it from others of the same pitch and volume [ASA 60]

Our definition

The quality of a sound that consists of three features, excluding pitch and volume (a concrete definition based on [Grey 77]):

1. The relative amplitudes of harmonic peaks
2. The inharmonic component
3. Temporal envelopes

We use the tonal model, which can analyze these features [Itoyama 08].

Manipulation of pitch and duration

Conventional methods are not suited to manipulation without changing the timbral features. Distortions heard in the demo sounds: high frequencies, vibrato feature, attack segment.

  • Timbre has pitch dependency [Marozeau 03]
    – We use a pitch-dependent feature function for this dependency
  • Attack, decay, and vibrato features are similar within the same instrument
    – We preserve the attack and decay segments and the vibrato feature

Demo sounds:

  • Pitch: seed (440 Hz), ref. (880 Hz), our method (880 Hz), phase vocoder (880 Hz)
  • Duration: seed, ref. (length 1), our method (length 4), sinusoidal model (length 4)

Overview of our manipulation method

  • Step 1: Analysis. Separate harmonic and inharmonic structures and extract timbral features
  • Step 2: Manipulation. Manipulate pitch, duration, and energy of the inharmonic structure
  • Step 3: Synthesis. Synthesize harmonic and inharmonic signals and add them

[Figure: harmonic structure and inharmonic structure, each drawn over time, frequency, and amplitude axes.]

Analysis to obtain three features

The tonal model consists of a harmonic model and an inharmonic model.

  • Harmonic model
    – has a temporal structure and a spectral structure
    – is expressed as the Gaussian Mixture Model
    – parameters: pitch, power of harmonics, duration, envelope
  • Inharmonic model
    – represents the spectrogram of the inharmonic component
    – is expressed as the nonparametric model

[Figure: time-frequency-amplitude plots of the two models, annotated with Feature 1, Feature 2, and Feature 3.]

Pitch manipulation

  • Pitch-dependent feature function
    – approximates timbral features over pitches by a polynomial function
      • power of harmonics ($v_n$)
      • the ratio of harmonic energy to inharmonic energy ($w_H / w_I$)
  • Manipulating the spectral envelope
    – by multiplying the pitch trajectory ($\mu(r)$) by a desired ratio
    – obtain timbral features from the pitch-dependent feature function

[Figure: the pitch trajectory $\mu(r)$ becomes $\mu'(r)$, and the power of harmonics $v_n$ becomes $v'_n$ (frequency-amplitude view).]

[Plots against fundamental frequency [Hz] (220, 440, 880): power of the 1st harmonic, power of the 4th harmonic, and the ratio of harmonic energy to inharmonic energy $w_H / w_I$ ($\times 10^6$).]
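The pitch-dependent feature function can be illustrated with a small least-squares polynomial fit. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the training values are made up, the polynomial degree is chosen arbitrarily, and fitting on a log-frequency axis (octaves relative to 440 Hz) is an assumption. The names `fit_polynomial`, `evaluate`, and `pitch_axis` are hypothetical helpers.

```python
import math


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (small problems only)."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][k] = xs[i] ** k.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs  # coeffs[k] multiplies x ** k


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def pitch_axis(f0_hz):
    # Assumption: fit on a log-frequency axis (octaves relative to 440 Hz).
    return math.log2(f0_hz / 440.0)


# Made-up training data: power of one harmonic observed at several pitches.
train_f0 = [220.0, 330.0, 440.0, 660.0, 880.0]
train_power = [0.80, 0.72, 0.65, 0.55, 0.50]

coeffs = fit_polynomial([pitch_axis(f) for f in train_f0], train_power, degree=2)

# Feature value predicted for a pitch that was not in the training data.
predicted = evaluate(coeffs, pitch_axis(554.37))
```

In this reading, one such function would be fitted per feature (each harmonic's power and the energy ratio), and the manipulated feature is read off at the target pitch.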

Duration manipulation

  • Manipulating the temporal envelope ($E(r)$)
    – by expanding or shrinking between onset ($r_{on}$) and offset ($r_{off}$)
    – detection equation: $\frac{dE(r)}{dr} < \varepsilon, \quad E(r) > Th$
  • Preserving the vibrato
    – Pitch trajectory ($\mu(r)$) is analyzed and synthesized by the sinusoidal model

[Figure, temporal envelope $E(r)$ over time: $r_{on}$ and $r_{off}$ are detected; the parts before $r_{on}$ and after $r_{off}$ are preserved, and the part between them is expanded.]

[Figure, pitch trajectory $\mu(r)$ over time: the original pitch trajectory is smoothed and analyzed, and the synthesized pitch trajectory preserves the vibrato.]
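The envelope stretching described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the frame-wise slope is taken as a first difference, its absolute value is compared with the tolerance (an assumption), the stretched segment is linearly resampled (an assumption), and the envelope values and thresholds are made up. The function names are hypothetical.

```python
def detect_sustain(envelope, eps, threshold):
    """Return (r_on, r_off): first and last frame that is both flat and loud."""
    stable = [
        r for r in range(1, len(envelope))
        # Assumption: the slope is a first difference and is compared by magnitude.
        if abs(envelope[r] - envelope[r - 1]) < eps and envelope[r] > threshold
    ]
    if not stable:
        raise ValueError("no sustain segment found")
    return stable[0], stable[-1]


def resample_linear(values, new_length):
    """Linearly interpolate values onto new_length evenly spaced points."""
    if new_length == 1 or len(values) == 1:
        return [values[0]] * new_length
    step = (len(values) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * step
        lo = min(int(pos), len(values) - 2)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out


def stretch_envelope(envelope, r_on, r_off, ratio):
    """Stretch only the frames between r_on and r_off; keep attack and decay."""
    attack = envelope[:r_on]
    sustain = envelope[r_on:r_off + 1]
    decay = envelope[r_off + 1:]
    new_length = max(1, round(len(sustain) * ratio))
    return attack + resample_linear(sustain, new_length) + decay


# Made-up envelope: fast attack, flat sustain, decay.
envelope = [0.0, 0.5, 1.0, 0.9, 0.9, 0.9, 0.9, 0.5, 0.2, 0.0]
r_on, r_off = detect_sustain(envelope, eps=0.05, threshold=0.5)
stretched = stretch_envelope(envelope, r_on, r_off, ratio=4)
```

Because only the middle segment is resampled, the attack and decay frames of `stretched` are identical to those of the input, which is the property the slide emphasizes.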

Synthesis from harmonics and inharmonics

  • Harmonic signal ($s_H(t)$)
    – using the sinusoidal model
  • Inharmonic signal ($s_I(t)$)
    – from the inharmonic model weighted by inharmonic energy ($w'_I$)
  • Output signal ($s(t)$)
    – obtained by adding these two signals

Equations for harmonic signal

Harmonic signal: $s_H(t) = \sum_n A_n(t) \exp[j \phi_n(t)]$
Instantaneous amplitude: $A_n(t) = w_H \, v'_n \, E_n(t)$
Instantaneous phase: $\phi_n(t) = \phi_n(0) + \int_0^t n \, \mu'(\tau) \, d\tau$

Harmonic energy: $w_H$
Power of harmonics: $v'_n$
Temporal envelope: $E_n(t)$
Pitch trajectory: $\mu'(\tau)$

※ A primed ( ' ) parameter is a manipulated parameter.
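The harmonic part of the synthesis can be sketched as a sum of sinusoids whose phase is the running integral of the pitch trajectory. The sketch below is an illustration under several assumptions that are not on the slide: it outputs the real part (cosines) of the complex form, uses one shared envelope for all harmonics instead of a per-harmonic envelope, starts from zero initial phase, includes a factor of 2π because the pitch is given in Hz, and skips harmonics above the Nyquist frequency. The function name and all input values are hypothetical.

```python
import math


def synthesize_harmonic(pitch_hz, envelope, harmonic_power, w_h, sample_rate):
    """Sum of harmonics with amplitude w_h * v_n * E(t) and phase n * integral of pitch."""
    samples = []
    phase = 0.0  # running integral of the fundamental's angular frequency
    for f0, env in zip(pitch_hz, envelope):
        # Assumption: pitch in Hz, so each sample advances the phase by 2*pi*f0/fs.
        phase += 2.0 * math.pi * f0 / sample_rate
        value = 0.0
        for n, v_n in enumerate(harmonic_power, start=1):
            if n * f0 < sample_rate / 2.0:  # skip harmonics at or above Nyquist
                value += w_h * v_n * env * math.cos(n * phase)
        samples.append(value)
    return samples


# Made-up inputs: 0.1 s of a 440 Hz tone with three harmonics and a linear fade-out.
sample_rate = 8000
length = 800
pitch = [440.0] * length
fade = [1.0 - i / length for i in range(length)]
harmonic_signal = synthesize_harmonic(pitch, fade, [1.0, 0.5, 0.25], 1.0, sample_rate)
```

The output signal described on the slide would then be this harmonic signal plus the separately generated inharmonic signal, which is not sketched here.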

Evaluation in pitch manipulation

  • Baseline method
    – Our method without the pitch-dependent feature function (= sophisticated sinusoidal model)
  • Criteria
    – Spectral distance: evaluation of the harmonic component difference
    – Mel-Frequency Cepstrum Coefficient (MFCC) distance:
      • quantitative auditory measurement
      • evaluation of harmonic and inharmonic component differences
  • Conditions
    – 32 instruments from RWC-MDB (forte, normal articulation)
      • 3 individuals for each instrument
    – 10-fold cross validation (10%:90% = [evaluation data]:[learning data])

Distance: $D = \sum_{f,r} \left( C_{real}(f, r) - C_{syn}(f, r) \right)^2 / T$

Here $C$ is the spectrum or MFCC, $C_{real}$ belongs to the real sound, $C_{syn}$ to the synthesized sound, and $T$ is the number of frames.
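The distance criterion amounts to a squared difference summed over frames and feature bins, divided by the number of frames. The following sketch shows that computation on made-up feature matrices; it assumes each sound is already represented as a list of frames, each holding spectrum or MFCC values, and the function name is hypothetical.

```python
def feature_distance(real_frames, syn_frames):
    """Squared difference summed over frames and bins, divided by the frame count."""
    if len(real_frames) != len(syn_frames) or not real_frames:
        raise ValueError("both sounds need the same, non-zero number of frames")
    total = 0.0
    for real, syn in zip(real_frames, syn_frames):
        if len(real) != len(syn):
            raise ValueError("frames need the same number of bins")
        total += sum((a - b) ** 2 for a, b in zip(real, syn))
    return total / len(real_frames)


# Made-up two-frame, two-bin example.
real = [[1.0, 2.0], [3.0, 4.0]]
syn = [[1.0, 0.0], [0.0, 4.0]]
distance = feature_distance(real, syn)  # (0 + 4 + 9 + 0) / 2 = 6.5
```

The same function serves for both criteria; only the features passed in differ (harmonic spectrum for the spectral distance, MFCCs for the MFCC distance).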

Quality in pitch manipulation

  • Spectral distance: 64.70% reduced
  • MFCC distance: 32.31% reduced

There was a good improvement across the musical instruments as a whole.

[Bar charts of spectral distance and MFCC distance, our method vs. baseline method, per musical instrument (PF, EP, HC, VI, MB, OR, AC, HM, HC, UK, AG, MD, EG, EB, VN, VL, VC, CB, HP, TR, TB, TU, SS, AS, TS, BS, OB, FG, CL, PC, FL, RC) and on average (AVE.). The fagot is marked for discussion.]

Discussion on good improvement

  • The result of the fagot
    – Baseline: distances increased with the absolute number of manipulated semitones
    – Ours: distances were stable

The result demonstrated the validity of our method, which considers the pitch dependency of timbre.

[Plot for the fagot, labeled MFCC distance: distance against manipulated semitones (−40 to 40, low pitch to high pitch), vertical axis from similar to dissimilar. Red: baseline method, blue: our method.]

Discussion on poor improvement

  • The result of the mandolin
    – Spectral distance: the relative amplitudes of harmonic peaks of a synthesized sound are similar to those of a real sound
    – MFCC distance: the distribution of the inharmonic component of a synthesized sound differs from that of a real sound

$w_H / w_I$ alone is insufficient to capture the pitch dependency of the inharmonic component.

There was poor improvement for instrument sounds that have a lot of inharmonic component in the attack segment.

[Plots for the mandolin: spectral distance and MFCC distance against manipulated semitones (−40 to 40, low pitch to high pitch), vertical axes from similar to dissimilar. Red: baseline method, blue: our method.]


Conclusion

  • Objective
    – Manipulating pitch and duration of a musical instrument sound using multiple instrument sounds without distorting timbral characteristics
  • Approach
    – We defined and analyzed timbral features.
    – In pitch manipulation, we use the pitch dependency of timbre in the form of a pitch-dependent feature function.
    – In duration manipulation, we preserve the attack, the decay, and the vibrato.
  • Future work
    – Incorporating other dependencies (e.g., volume)
    – Evaluating our method for duration manipulation
    – Applying our method to musical instrument parts separated from the polyphonic audio signals of commercial CD recordings

Pitch manipulation demo. for piano

  • Real sounds: seed (440 Hz), ref. (880 Hz)
  • Synthesized sounds (880 Hz): our method, sinusoidal model, phase vocoder, STRAIGHT

※1 from MARSYAS
※2 does not use the Ref. sound as learning data

Duration manipulation demo. for violin

  • Real sound: seed, ref. (length 1)
  • Synthesized sounds (length 4): our method, sinusoidal model

Pitch manipulation demo. for trumpet

  • Real sounds: seed (440 Hz), ref. (880 Hz)
  • Synthesized sounds (880 Hz): our method, sinusoidal model, phase vocoder, STRAIGHT

※1 from MARSYAS
※2 does not use the Ref. sound as learning data