Sparse Time-Frequency Transforms and Applications, Bruno Torrésani (PowerPoint presentation)

SLIDE 1

Sparse Time-Frequency Transforms and Applications.

Bruno Torrésani, http://www.cmi.univ-mrs.fr/~torresan

LATP, Université de Provence, Marseille

DAFx, Montreal, September 2006

B. Torrésani (LATP Marseille) Sparse Time-Frequency Transforms September 2006 1 / 41

SLIDE 2

1 Introduction
2 Signal waveform representations
   - Bases
   - Frames
   - Multiple frames
   - More realistic time-frequency atoms?
3 Coefficient domain models
   - Hybrid random waveform models
   - Estimation algorithms based on observed coefficients
   - Estimation algorithms based on synthesis coefficients
4 Conclusion
5 References

SLIDE 4

Introduction

Over the last twenty years (and in fact for much longer), harmonic analysis has provided many new techniques for expanding signals into “elementary” waveforms:
- redundant Gabor and wavelet systems (frames);
- wavelet bases;
- MDCT and Wilson bases;
- matching pursuit and related algorithms;
- ...

In most cases, sparsity of the representation was a key issue. In this talk, we review a number of these approaches in view of a few selected applications.

SLIDE 6

Introduction

Introduction: what is sparsity?

A signal representation is sparse when most of the information is concentrated in a small number of coefficients. For example, a sine wave is sparsely represented in the Fourier domain, but not in the time domain. Sparsity is a “vague” concept. Ideally, the volume of data (for example, the number of nonzero coefficients) would be a good sparsity measure. In noisy situations, however, this measure is generally polluted by a large number of small coefficients originating from the noise. Other measures may be used (e.g. entropies), but they often do not yield the same results [Jaillet & BT 2003].
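The difference between a counting measure and an entropy measure can be seen on a toy example. The following is a minimal NumPy sketch; the tolerance, signal length and noise level are arbitrary choices, not taken from the talk. It counts coefficients above a small tolerance (a crude "volume of data" measure) and computes the Shannon entropy of the normalized energy distribution, for a sinusoid in the time domain and in the unitary Fourier domain, with and without a little white noise.

```python
import numpy as np

def count_measure(c, tol=1e-8):
    """'Volume of data': number of coefficients whose magnitude exceeds tol."""
    return int(np.sum(np.abs(c) > tol))

def entropy_measure(c):
    """Shannon entropy of the energy distribution |c|^2 / sum |c|^2 (lower = sparser)."""
    p = np.abs(c) ** 2
    p = p / p.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

N = 256
t = np.arange(N)
x = np.cos(2 * np.pi * 8 * t / N)            # exactly 8 periods in N samples
X = np.fft.fft(x) / np.sqrt(N)               # unitary DFT: energy is preserved

rng = np.random.default_rng(0)
y = x + 0.01 * rng.standard_normal(N)        # slightly noisy version
Y = np.fft.fft(y) / np.sqrt(N)

for name, c in [("time", x), ("Fourier", X), ("noisy Fourier", Y)]:
    print(f"{name:14s} count={count_measure(c):4d}  entropy={entropy_measure(c):.3f}")
```

With noise, the count jumps from 2 to N Fourier coefficients, while the entropy barely moves: the counting measure is the one that is "polluted" by small noise coefficients.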

SLIDE 7

Introduction

Introduction: sparsity, what for?

A sparse time-frequency representation concentrates the relevant information in a small number of coefficients: the probability density of the coefficients is peaked at 0 and heavy-tailed. The most popular applications are:
- signal coding, provided the cost of encoding the representation itself is not too high;
- signal modeling: expanding signals into components that make sense;
- denoising: most often, noise is not sparse;
- source separation (exploiting dimension reduction);
- ...

SLIDE 9

Signal waveform representations

Signal representations

Signal waveform expansion: decompose a signal as a linear combination of “elementary waveforms” $\psi_\lambda$, often generated using simple rules:
$$x(t) = \sum_{\lambda} \alpha_\lambda \psi_\lambda(t),$$
with $\alpha_\lambda$ the coefficients and $\psi_\lambda$ the waveforms. Examples:
- time-frequency atoms (MDCT or Wilson bases, Gabor atoms, ...);
- time-scale atoms (wavelets, multiwavelets, ...);
- chirplets, ...;
- higher-dimensional versions.
See [Mallat 1998], [Carmona et al. 1998] or [Wickerhauser 1994].

SLIDE 10

Signal waveform representations Bases

Signal representations: bases

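The mathematically simplest situation is that of orthonormal bases. The waveform system $W = \{\psi_\lambda, \lambda \in \Lambda\}$ is an orthonormal basis of the signal space H (an inner product space, or Hilbert space) if:
- the atoms are mutually orthogonal and normalized: $\langle \psi_\lambda, \psi_\mu \rangle = \delta_{\lambda\mu}$;
- they form a complete set in H: if a signal $x \in H$ satisfies $\langle x, \psi_\lambda \rangle = 0$ for all $\lambda \in \Lambda$, then $x = 0$.

Then any signal may be written in a unique way as
$$x(t) = \sum_{\lambda\in\Lambda} \alpha_\lambda \psi_\lambda(t), \qquad \text{with } \alpha_\lambda = \langle x, \psi_\lambda \rangle .$$
Thus analysis and synthesis involve the same atoms. In addition, the “coefficient mapping” $x \mapsto \{\alpha_\lambda, \lambda \in \Lambda\}$ preserves energy (Parseval's formula):
$$\sum_{\lambda\in\Lambda} |\alpha_\lambda|^2 = \|x\|^2 .$$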
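In finite dimension, an orthonormal basis can be stored as a square matrix whose columns are the atoms: analysis is multiplication by the transpose, and synthesis is multiplication by the matrix itself. The sketch below assumes the orthonormal DCT-II basis of $\mathbb{R}^N$ as an illustrative stand-in (it is not the MDCT discussed next) and checks orthonormality, exact reconstruction and Parseval's formula.

```python
import numpy as np

def dct_ii_basis(N):
    """Columns are the atoms psi_k of the orthonormal DCT-II basis of R^N."""
    n = np.arange(N)[:, None]          # time index (rows)
    k = np.arange(N)[None, :]          # atom index (columns)
    Psi = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    Psi[:, 0] /= np.sqrt(2.0)
    return Psi

N = 64
Psi = dct_ii_basis(N)
rng = np.random.default_rng(0)
x = rng.standard_normal(N)

alpha = Psi.T @ x          # analysis:  alpha_k = <x, psi_k>
x_rec = Psi @ alpha        # synthesis: x = sum_k alpha_k psi_k (same atoms)

print(np.allclose(Psi.T @ Psi, np.eye(N)))              # True: <psi_k, psi_l> = delta_kl
print(np.allclose(x_rec, x))                            # True: exact reconstruction
print(np.isclose(np.sum(alpha ** 2), np.sum(x ** 2)))   # True: Parseval
```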

SLIDE 11

Signal waveform representations Bases

Signal representations: bases

MDCT basis: smooth windows modulated by a sinusoidal function. In the continuous-time setting, the following (infinite) family of functions forms an orthonormal basis of $L^2(\mathbb{R})$:
$$u_{kn}(t) = \sqrt{\frac{2}{\ell_k}}\; w_k(t) \cos\left(\frac{\pi}{\ell_k}\left(n + \frac{1}{2}\right)(t - a_k)\right), \qquad k \in \mathbb{Z},\ n = 0, 1, 2, \dots$$
On bounded intervals, as well as in finite-dimensional settings, similar bases may be constructed (Malvar, Suter, ...).
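A finite-dimensional MDCT can be sketched in a few lines. This is a minimal, direct (not fast) implementation under stated assumptions: hop size M with 50% overlap, a sine window (symmetric, with $w[n]^2 + w[n+M]^2 = 1$, a discrete instance of the symmetry conditions on the windows), orthonormal scaling $\sqrt{2/M}$, and zero padding of M samples at both ends. It is a standard discrete MDCT, not necessarily the exact variant used in the talk or in any particular package.

```python
import numpy as np

def mdct_kernel(M):
    """Orthonormally scaled MDCT kernel, shape (M, 2M)."""
    n = np.arange(2 * M)[None, :]
    k = np.arange(M)[:, None]
    return np.sqrt(2.0 / M) * np.cos(np.pi / M * (n + 0.5 + M / 2) * (k + 0.5))

def sine_window(M):
    """Length-2M sine window: symmetric, and w[n]**2 + w[n + M]**2 == 1."""
    n = np.arange(2 * M)
    return np.sin(np.pi * (n + 0.5) / (2 * M))

def mdct(x, M):
    """MDCT with hop size M (50% overlap); len(x) must be a multiple of M."""
    C, w = mdct_kernel(M), sine_window(M)
    xp = np.concatenate([np.zeros(M), x, np.zeros(M)])   # zero-pad both ends
    n_frames = len(x) // M + 1
    return np.array([C @ (w * xp[j * M : j * M + 2 * M]) for j in range(n_frames)])

def imdct(X, M, length):
    """Windowed synthesis + overlap-add; time-domain aliasing cancels between frames."""
    C, w = mdct_kernel(M), sine_window(M)
    y = np.zeros(length + 2 * M)
    for j, coeffs in enumerate(X):
        y[j * M : j * M + 2 * M] += w * (C.T @ coeffs)
    return y[M : M + length]                             # drop the padding

rng = np.random.default_rng(0)
M = 64
x = rng.standard_normal(16 * M)
X = mdct(x, M)                                           # shape (17, 64)
print(np.allclose(imdct(X, M, len(x)), x))               # perfect reconstruction
print(np.isclose(np.sum(X ** 2), np.sum(x ** 2)))        # energy preserved
```

Because the lapped system is orthonormal, analysis and synthesis use the same windowed cosines, and energy is preserved, as for any orthonormal basis.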

SLIDE 13

Signal waveform representations Bases

Signal representations: bases

More precisely, the only assumption is that the window functions $w_k$ must satisfy some symmetry conditions at the boundaries. In general, the windows are taken as regular translates of a single one. More freedom may be introduced, as long as the symmetry conditions are fulfilled. For example, some audio coders use systems with both wide and narrow windows. Simple implementations are available in the Stanford WaveLab package: http://www-stat.stanford.edu/~wavelab

SLIDE 14

Signal waveform representations Bases

Signal representations: bases

The MDCT basis is well adapted to audio signals: the expansion of most signals is sparse. Figure: probability density (log scale) of the MDCT coefficients of an organ recording.

Besides signal coding/compression, sparsity also helps in several other applications. Application to denoising: as noise is generally not sparse in the MDCT basis, one can simply threshold the MDCT coefficients of the noisy signal before reconstruction. Audio examples: organ signal; noisy organ signal; denoised organ signal.
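Denoising by thresholding in an orthonormal basis can be sketched as follows. Assumptions, all hypothetical rather than taken from the talk: an orthonormal DCT-II basis stands in for the MDCT basis, the signal is synthetic and exactly sparse, and hard thresholding uses the "universal" threshold $\sigma\sqrt{2\log N}$ (the talk does not specify the threshold rule).

```python
import numpy as np

def dct_ii_basis(N):
    """Columns: orthonormal DCT-II atoms (illustrative stand-in for an MDCT basis)."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    Psi = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    Psi[:, 0] /= np.sqrt(2.0)
    return Psi

def threshold_denoise(y, Psi, sigma):
    """Hard-threshold the coefficients of y in the orthonormal basis Psi."""
    alpha = Psi.T @ y                                  # analysis
    T = sigma * np.sqrt(2.0 * np.log(len(y)))          # "universal" threshold (assumption)
    alpha[np.abs(alpha) < T] = 0.0                     # discard small (mostly noise) coefficients
    return Psi @ alpha                                 # synthesis

rng = np.random.default_rng(1)
N, sigma = 1024, 0.1
Psi = dct_ii_basis(N)
coef = np.zeros(N)
coef[[5, 40, 41, 200]] = [10.0, 6.0, -4.0, 3.0]        # sparse clean signal
x = Psi @ coef
y = x + sigma * rng.standard_normal(N)                 # white noise: not sparse in Psi
x_hat = threshold_denoise(y, Psi, sigma)

err_noisy = np.linalg.norm(y - x)
err_denoised = np.linalg.norm(x_hat - x)
print(f"error before: {err_noisy:.3f}, after: {err_denoised:.3f}")
```

Since the basis is orthonormal, white noise stays white in the coefficient domain, so almost all of its energy is spread over small coefficients that fall below the threshold.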

SLIDE 16

Signal waveform representations Bases

Signal representations: bases

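Application to source separation: consider two mixtures (linear combinations) of two sources (audio examples: mix 1; mix 2). Figure: scatter plots of the samples of mix 1 against mix 2 (left), and of the MDCT coefficients of mix 1 against mix 2 (right). Since the MDCT coefficients are sparse, most points in the right-hand plot lie close to two lines, one per source. Method: identify the two directions, and project. Audio example: reconstructed organ.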
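The "identify the two directions, and project" method can be sketched directly in the coefficient domain. Because the MDCT is linear, the MDCT of a mixture is the same mixture of the sources' MDCTs, so the sketch works on synthetic sparse (Bernoulli-Gaussian) coefficients instead of audio. The histogram-of-angles peak picking and its parameters (`n_bins`, `keep`, `guard`) are hypothetical choices. With as many mixtures as sources, "project" reduces here to inverting the estimated 2x2 mixing matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
# Sparse (Bernoulli-Gaussian) transform coefficients of two sources.
s = rng.standard_normal((2, n)) * (rng.random((2, n)) < 0.05)
theta_true = np.array([0.3, 1.2])                           # mixing directions (radians)
A = np.vstack([np.cos(theta_true), np.sin(theta_true)])     # columns: unit mixing vectors
m = A @ s            # coefficients of mix 1 (row 0) and mix 2 (row 1)

def estimate_directions(m, n_bins=180, keep=0.95, guard=10):
    """Locate the two dominant directions in the (mix 1, mix 2) scatter plot.
    Simplification: peaks wrapping around 0 / pi are not handled."""
    r = np.hypot(m[0], m[1])
    big = r > np.quantile(r, keep)                          # largest 5% of the points
    ang = np.mod(np.arctan2(m[1, big], m[0, big]), np.pi)   # direction, modulo sign
    hist, edges = np.histogram(ang, bins=n_bins, range=(0.0, np.pi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    first = int(np.argmax(hist))
    masked = hist.copy()
    masked[max(0, first - guard) : first + guard + 1] = 0   # hide the first peak
    second = int(np.argmax(masked))
    return np.sort(centers[[first, second]])

theta_hat = estimate_directions(m)
A_hat = np.vstack([np.cos(theta_hat), np.sin(theta_hat)])
s_hat = np.linalg.solve(A_hat, m)                           # "project": invert the mixing
print(theta_hat)
```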

SLIDE 17

Signal waveform representations Bases

Signal representations: bases

Advantages:
- “Optimal” in terms of redundancy.
- There exist bases for which fast algorithms have been developed (MDCT, Wilson, wavelets, ...).

Drawbacks:
- Being an orthonormal basis has a price: rigidity. Not every window function generates a basis, and, mathematically speaking, the windows are not as smooth as one would like.
- Being a basis also imposes constraints on the sampling in time and frequency: there is no “free access” to the time-frequency domain.

Question: can we make the representation sparser by introducing redundancy?

SLIDE 18

Signal waveform representations Frames

Signal representations: frames

As an alternative to orthonormal bases, frames (wavelet or Gabor frames) offer more flexibility. A frame is a (generally overcomplete) system of waveforms $W = \{\psi_\lambda, \lambda \in \Lambda\}$ with respect to which signals may be expanded, with stable synthesis. For frames, energy conservation (Parseval's formula) is generally replaced by an inequality of the form
$$A\|x\|^2 \le \sum_{\lambda\in\Lambda} |\langle x, \psi_\lambda\rangle|^2 \le B\|x\|^2$$
for some constants $0 < A \le B < \infty$ and all signals x.
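In finite dimension, the optimal constants A and B are the smallest and largest eigenvalues of the frame operator $S = \sum_\lambda \psi_\lambda \psi_\lambda^*$. The sketch below (NumPy; the frame examples are illustrative assumptions) computes them for the union of two orthonormal bases, which is a tight frame with A = B = 2, and for a random redundant frame, and checks the frame inequality for one signal.

```python
import numpy as np

def frame_bounds(Psi):
    """Optimal frame bounds (A, B) of the atoms stored as the columns of Psi:
    extreme eigenvalues of the frame operator S = Psi Psi^*."""
    S = Psi @ Psi.conj().T
    eig = np.linalg.eigvalsh(S)          # real eigenvalues, ascending order
    return eig[0], eig[-1]

N = 32
F = np.fft.fft(np.eye(N)) / np.sqrt(N)   # unitary DFT matrix: columns form an ONB of C^N
union = np.hstack([np.eye(N), F])        # Dirac basis + Fourier basis: 2N atoms
rng = np.random.default_rng(0)
random_frame = rng.standard_normal((N, 3 * N))   # 3N random atoms in R^N

A, B = frame_bounds(random_frame)
x = rng.standard_normal(N)
energy = np.sum(np.abs(random_frame.conj().T @ x) ** 2)   # sum_l |<x, psi_l>|^2
print(frame_bounds(union))                                  # tight frame: A = B = 2 (up to rounding)
print(A * x @ x <= energy <= B * x @ x)                     # frame inequality holds
```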

SLIDE 20

Signal waveform representations Frames

Signal representations: frames

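In this case, synthesizing directly from the analysis coefficients no longer gives exact reconstruction as before, but only an approximation:
$$\Big\| x - \frac{2}{A+B}\sum_{\lambda\in\Lambda} \langle x, \psi_\lambda\rangle \psi_\lambda \Big\| \le \frac{B-A}{B+A}\,\|x\| ,$$
which is exact for tight frames (A = B). Good news: there exists a (non-unique) dual waveform system $\{\tilde\psi_\lambda, \lambda \in \Lambda\}$ such that, for every signal x,
$$x = \sum_{\lambda\in\Lambda} \langle x, \psi_\lambda\rangle\, \tilde\psi_\lambda = \sum_{\lambda\in\Lambda} \langle x, \tilde\psi_\lambda\rangle\, \psi_\lambda .$$
Hence analysis and synthesis do not involve the same waveforms.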
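Both statements can be checked numerically on a small random frame. The sketch assumes the canonical dual $\tilde\psi_\lambda = S^{-1}\psi_\lambda$ (only one of the many possible duals of an overcomplete frame). It verifies the two reconstruction formulas and the approximation bound obtained with the frame itself.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 16, 40
Psi = rng.standard_normal((N, K))            # columns: atoms of a (random) frame of R^N
S = Psi @ Psi.T                              # frame operator
eig = np.linalg.eigvalsh(S)
A, B = eig[0], eig[-1]                       # optimal frame bounds
Psi_dual = np.linalg.solve(S, Psi)           # canonical dual atoms S^{-1} psi_l

x = rng.standard_normal(N)
c = Psi.T @ x                                # analysis coefficients <x, psi_l>
x1 = Psi_dual @ c                            # x = sum <x, psi_l> dual_l
x2 = Psi @ (Psi_dual.T @ x)                  # x = sum <x, dual_l> psi_l
x_approx = 2.0 / (A + B) * (Psi @ c)         # synthesis with the frame itself

err = np.linalg.norm(x - x_approx)
bound = (B - A) / (B + A) * np.linalg.norm(x)
print(np.allclose(x1, x), np.allclose(x2, x), err <= bound)
```

The bound follows from the fact that the eigenvalues of $I - \frac{2}{A+B}S$ lie in $[-\frac{B-A}{B+A}, \frac{B-A}{B+A}]$.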

SLIDE 21

Signal waveform representations Frames

Signal representations: frames

An example: as an alternative to MDCT bases, Gabor frames
$$\psi_{mn}(t) = e^{2i\pi n\nu_0 t}\,\psi(t - m b_0)$$
provide a regular sampling of the time-frequency plane, on a regular grid with mesh sizes $b_0$ and $\nu_0$. For $b_0\nu_0$ small enough (and a suitable window ψ), these atoms indeed form a frame of the considered signal space, and the (canonical) dual system is a Gabor frame too: there exists a dual window $\tilde\psi$ such that the dual atoms are of the form $\tilde\psi_{mn}(t) = e^{2i\pi n\nu_0 t}\,\tilde\psi(t - m b_0)$. The smaller $b_0\nu_0$, the more redundant the system, the closer the ratio B/A to 1, and the closer $\tilde\psi$ to a multiple of ψ.
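A discrete, periodic analogue of this construction makes the structure of the canonical dual visible. In this sketch (assumptions: signals of length L = 144 on a cyclic grid, time step `a` playing the role of $b_0$, frequency step 1/M cycles per sample playing the role of $\nu_0$, and a periodic Gaussian window of arbitrary width), the dual window is $S^{-1}g$, and the dual of every atom is checked to be the corresponding Gabor atom of that dual window. This is not the LTFAT API, just plain NumPy.

```python
import numpy as np

def gabor_atoms(g, a, M):
    """Columns: psi_{m,n}[t] = exp(2i*pi*n*t/M) * g[(t - m*a) mod L],
    m = 0..L/a - 1, n = 0..M-1. Assumes a and M both divide L = len(g)."""
    L = len(g)
    t = np.arange(L)
    cols = []
    for m in range(L // a):
        shifted = np.roll(g, m * a)                          # time shift by m*a
        for n in range(M):
            cols.append(np.exp(2j * np.pi * n * t / M) * shifted)   # modulation
    return np.array(cols).T

def canonical_dual(g, a, M):
    Psi = gabor_atoms(g, a, M)
    S = Psi @ Psi.conj().T                                   # frame operator
    eig = np.linalg.eigvalsh(S)                              # frame bounds: eig[0], eig[-1]
    gd = np.linalg.solve(S, g)                               # dual window S^{-1} g
    return Psi, S, eig[0], eig[-1], gd

L = 144
t = np.arange(L)
d = np.minimum(t, L - t)                                     # circular distance to 0
g = np.exp(-np.pi * (d / 12.0) ** 2)                         # periodic Gaussian window
g /= np.linalg.norm(g)

for a, M in [(12, 24), (6, 36)]:                             # redundancy M / a = 2, then 6
    Psi, S, A, B, gd = canonical_dual(g, a, M)
    same_structure = np.allclose(np.linalg.solve(S, Psi), gabor_atoms(gd, a, M))
    cos_angle = abs(np.vdot(g, gd)) / np.linalg.norm(gd)     # 1 means gd is a multiple of g
    print(f"redundancy {M // a}: B/A = {B / A:.4f}, dual is Gabor: {same_structure}, "
          f"cos(g, dual) = {cos_angle:.4f}")
```

The dual keeps the Gabor structure because S commutes with the time-frequency shifts of the lattice, so $S^{-1}$ does too.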

SLIDE 22

Signal waveform representations Frames

Signal representations: frames

Sampling grids in the time-frequency domain have to be adapted to the time/frequency resolution of the atoms. Figure: left, good frequency resolution (wide windows); right, good time resolution (narrow windows). Full circles correspond to “large” coefficients.

SLIDE 23

Signal waveform representations Frames

Signal representations: frames

Figure: a Gaussian Gabor atom (blue), and its duals for low redundancy (left) and high redundancy (right).

To experiment with Gabor atoms, dual atoms, etc., see the Linear Time-Frequency Analysis Toolbox (P. Søndergaard), LTFAT: http://www.univie.ac.at/nuhag-php/ltfat

SLIDE 24

Signal waveform representations Frames

Signal representations: multiple frames

Gabor frames offer more flexibility than MDCT bases in the choice of the window. However, in the standard construction all Gabor atoms have the same size, which is not always convenient for describing all the features of audio signals. The time-frequency resolution of the atoms plays a significant role: wide windows (from 20 to 40 ms) are well adapted to tonals (partials), while shorter ones (or wavelets instead of Gabor atoms) are better suited to transients. Example (audio): xylophone.

SLIDE 26

Signal waveform representations Frames

Signal representations: multiple frames

Question: can we get the best of both frames? Answer: yes, provided we can select the right signal expansion (among infinitely many) and control sparsity.

SLIDE 28

Signal waveform representations Multiple frames

Signal representations: multiple frames

Idea: expand signals with respect to a larger system, involving both wide atoms $W_g = \{g_{mn}, (m,n) \in \Lambda_g\}$ and narrow atoms $W_h = \{h_{mn}, (m,n) \in \Lambda_h\}$:
$$D = W_g \cup W_h .$$
D is still a frame, i.e. stable signal expansions on D exist. For every finite-energy signal x, there exist infinitely many expansions
$$x = x_g + x_h = \sum_{\lambda\in\Lambda_g} \alpha_\lambda g_\lambda + \sum_{\mu\in\Lambda_h} \beta_\mu h_\mu .$$
The coefficients α and β provide information on the time-frequency content of x, but some expansions are more meaningful than others.

SLIDE 29

Signal waveform representations Multiple frames

Signal representations: multiple frames

How to pick the right time-frequency atoms?
- Quilted frames [Dörfler 2002]: tile the time-frequency plane into domains corresponding to different time-frequency resolutions.
- Time-Frequency Jigsaw Puzzle [Jaillet & BT 2006]: let the computer choose the “right” atoms in the time-frequency domain, using a sparsity requirement.
- Matching Pursuit and Orthogonal Matching Pursuit [Mallat & Zhang 1993]: recursive search for the atoms that correlate best with the signal.
- Basis Pursuit and Basis Pursuit Denoising [Chen et al 1998].
In all cases, partial synthesis from atoms with similar properties (i.e. time-frequency resolution) is possible.
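The matching pursuit idea, and the fact that some expansions on a union of systems are more meaningful than others, can be illustrated with plain MP. The sketch assumes a dictionary made of Dirac atoms (a stand-in for "narrow" atoms) and orthonormal DCT-II atoms (a stand-in for "wide" atoms); it is not the TFJP algorithm. MP finds a 4-term expansion, whereas the canonical (minimum-norm) expansion of the same signal is dense.

```python
import numpy as np

def dct_ii_basis(N):
    """Columns: orthonormal DCT-II atoms."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    C[:, 0] /= np.sqrt(2.0)
    return C

def matching_pursuit(x, D, tol=1e-6, max_iter=1000):
    """Plain MP; D has unit-norm columns (atoms). Returns the coefficient vector."""
    coef = np.zeros(D.shape[1])
    r = x.copy()
    for _ in range(max_iter):
        corr = D.T @ r                               # correlations with all atoms
        k = int(np.argmax(np.abs(corr)))             # best-matching atom
        coef[k] += corr[k]
        r -= corr[k] * D[:, k]                       # update the residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(x):
            break
    return coef

N = 256
D = np.hstack([np.eye(N), dct_ii_basis(N)])          # union of two orthonormal bases
true = np.zeros(2 * N)
true[[30, 180]] = [3.0, -2.0]                        # two transients (spikes)
true[[N + 10, N + 37]] = [5.0, 4.0]                  # two tonal components
x = D @ true

coef_mp = matching_pursuit(x, D)
coef_min_norm = D.T @ x / 2.0                        # canonical expansion: D D^T = 2 I here
print(np.count_nonzero(np.abs(coef_mp) > 1e-3), np.count_nonzero(np.abs(coef_min_norm) > 1e-3))
```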

SLIDE 30

Signal waveform representations Multiple frames

Signal representations: multiple frames

Example with the TFJP algorithm [Jaillet & BT 2006]

SLIDE 33

Signal waveform representations More realistic time-frequency atoms ?

More realistic time-frequency atoms?

In the previous approaches, sparsity was the only requirement. Can we do more? In addition, whatever the choice of the waveform system, time-frequency atoms are generally not realistic as “sound atoms”. Several approaches for improvement are possible:
- learn more realistic atoms from sound databases: the “dictionary learning” approach (e.g. [Blumensath & Davies 2004]);
- build “time-frequency molecules” from atoms, as compound objects (see for example [Daudet 2006]);
- model dependencies between atoms in the coefficient domain.
In the rest of the lecture, we focus on this last approach, using a pair of orthonormal bases (following [Daudet & Torrésani 2005]).

SLIDE 35

Coefficient domain models

“Experimental observation”: interesting features in signals are “often” characterized by localized families of large coefficients, forming structured sets:
- tonals: horizontal lines in the short-time Fourier domain;
- transients: vertical lines in the short-time Fourier domain, or vertical trees in the wavelet domain.
To encode such structures separately, explicit models may be introduced in the coefficient domain. Strategy: characterize the behavior of certain indicators (coefficients, or others) within the framework of the model, with a view to estimation from real data.

SLIDE 36

Coefficient domain models Hybrid random waveform models

Hybrid random waveform models

The generic form of such models (in N-dimensional space) is the following [Kowalski & BT 2006]:
$$x = \sum_{\lambda\in\Lambda} \alpha_\lambda g_\lambda + \sum_{\delta\in\Delta} \beta_\delta h_\delta + r ,$$
where $W_g = \{g_\lambda, \lambda = 1, \dots, N\}$ and $W_h = \{h_\delta, \delta = 1, \dots, N\}$ are two orthonormal bases of waveforms.
- The coefficients $\alpha_\lambda$ and $\beta_\delta$ are independent Gaussian random variables, with frequency-dependent variances (λ and δ are time-frequency indices), and r is a small residual signal, modeled as white noise.
- The sets Λ and Δ are sparse random subsets of the index set. The simplest model is the Bernoulli model: independent memberships, with membership probabilities p and $\tilde p$. More complex models (for example Markov models) introduce dependencies between coefficients.
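Drawing a signal from the Bernoulli version of this model takes only a few lines. In this sketch the two orthonormal bases are hypothetical stand-ins (a DCT-II basis for the "tonal" layer and the Dirac basis for the "transient" layer, where the talk uses MDCT bases), and, for simplicity, all α (resp. β) share a single variance instead of frequency-dependent ones.

```python
import numpy as np

def dct_ii_basis(N):
    """Columns: orthonormal DCT-II atoms."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    G = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    G[:, 0] /= np.sqrt(2.0)
    return G

def sample_hybrid_bernoulli(G, H, p, p_tilde, sigma, sigma_tilde, sigma_r, rng):
    """Draw x = sum_{Lambda} alpha g + sum_{Delta} beta h + r with Bernoulli significance maps."""
    N = G.shape[0]
    in_lambda = rng.random(N) < p                 # membership X^Lambda, iid Bernoulli(p)
    in_delta = rng.random(N) < p_tilde            # membership X^Delta, iid Bernoulli(p~)
    alpha = sigma * rng.standard_normal(N) * in_lambda
    beta = sigma_tilde * rng.standard_normal(N) * in_delta
    r = sigma_r * rng.standard_normal(N)          # small white-noise residual
    x = G @ alpha + H @ beta + r
    return x, alpha, beta, r, in_lambda, in_delta

rng = np.random.default_rng(3)
N = 1024
G, H = dct_ii_basis(N), np.eye(N)
x, alpha, beta, r, in_lambda, in_delta = sample_hybrid_bernoulli(
    G, H, p=0.05, p_tilde=0.02, sigma=1.0, sigma_tilde=1.0, sigma_r=0.01, rng=rng)
print(in_lambda.sum(), in_delta.sum())            # sizes of the significance maps
```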

SLIDE 38

Coefficient domain models Hybrid random waveform models

Hybrid random waveform models

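Are such models able to reproduce the “experimental observations”? To check this, study the behavior of the observed coefficients (which differ from the synthesis coefficients $\alpha_n$ and $\beta_n$)
$$a_n = \langle x, g_n\rangle, \qquad b_n = \langle x, h_n\rangle .$$
Introduce the membership variables $X^\Lambda_n = 1$ if $n \in \Lambda$ and 0 otherwise, and similarly $X^\Delta_n$. Then
$$a_n = \langle x, g_n\rangle = \alpha_n X^\Lambda_n + \sum_{m=1}^{N} \beta_m X^\Delta_m \langle h_m, g_n\rangle + \langle r, g_n\rangle ,$$
$$b_n = \langle x, h_n\rangle = \beta_n X^\Delta_n + \sum_{m=1}^{N} \alpha_m X^\Lambda_m \langle g_m, h_n\rangle + \langle r, h_n\rangle .$$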

SLIDE 39

Coefficient domain models Estimation algorithms based on observed coefficients

Estimation: observed coefficients

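In particular, assuming for simplicity that all coefficients α (resp. β) have the same variance $\sigma^2$ (resp. $\tilde\sigma^2$), one has, conditionally on the significance maps,
$$\mathrm{var}\{a_k\} = \sigma^2 X^\Lambda_k + \tilde\sigma^2 \sum_{\delta\in\Delta} |\langle g_k, h_\delta\rangle|^2 + \sigma_r^2 ,$$
where $\sigma_r^2$ denotes the variance of the white-noise residual r. If the significance maps are sparse and the two bases are sufficiently different (all $|\langle g_k, h_\delta\rangle|$ small), the middle term stays small, so the variance is large for $k \in \Lambda$ and small otherwise: one recovers the “experimental” observations. This justifies approximating the signal by keeping its largest coefficients.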
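The variance formula can be checked by Monte Carlo simulation with fixed significance maps. Assumptions for this sketch: real orthonormal bases (DCT-II columns for $g_k$, Dirac vectors for $h_\delta$), common variances σ², σ̃², σ_r², and arbitrary parameter values. Each row of `x` is one independent draw of the model.

```python
import numpy as np

def dct_ii_basis(N):
    """Columns: orthonormal DCT-II atoms."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    G = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    G[:, 0] /= np.sqrt(2.0)
    return G

rng = np.random.default_rng(0)
N, T = 64, 20000                       # dimension, number of Monte Carlo draws
G, H = dct_ii_basis(N), np.eye(N)      # two orthonormal bases (columns = atoms)
sigma, sigma_t, sigma_r = 1.0, 0.5, 0.05
X_lam = rng.random(N) < 0.1            # fixed significance maps (we condition on them)
X_del = rng.random(N) < 0.1

alpha = sigma * rng.standard_normal((T, N)) * X_lam
beta = sigma_t * rng.standard_normal((T, N)) * X_del
r = sigma_r * rng.standard_normal((T, N))
x = alpha @ G.T + beta @ H.T + r       # each row: one draw of x
a = x @ G                              # observed coefficients a_k = <x, g_k>

empirical = a.var(axis=0)
cross = (H.T @ G) ** 2                 # cross[delta, k] = <h_delta, g_k>^2
theory = sigma**2 * X_lam + sigma_t**2 * (X_del.astype(float) @ cross) + sigma_r**2
print(np.max(np.abs(empirical / theory - 1)))   # small relative deviation
```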

SLIDE 40

Coefficient domain models Estimation algorithms based on observed coefficients

Estimation: observed coefficients

Bernoulli model: it may be proved that the observed coefficients follow a mixture distribution:
- a “small variance” Gaussian mixture for coefficients whose time-frequency index does not belong to the significance map;
- a “large variance” Gaussian mixture for coefficients whose time-frequency index does belong to the significance map.
Exploiting such a result numerically yields an algorithm for estimating the significance maps, which achieves significant dimension reduction and allows one to estimate the coefficients.
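The mixture parameters can be estimated with EM. The sketch below makes a simplifying assumption: each of the two components is a single zero-mean Gaussian (rather than a Gaussian mixture), so the model is $p\,\mathcal{N}(0, v_{\rm big}) + (1-p)\,\mathcal{N}(0, v_{\rm small})$. The initialization is a heuristic choice, and the data are synthetic.

```python
import numpy as np

def em_two_gaussians(a, n_iter=200):
    """EM fit of p N(0, v_big) + (1 - p) N(0, v_small) to observed coefficients a.
    Returns (p, v_small, v_big, resp), resp = posterior probability of 'significant'."""
    a2 = np.asarray(a) ** 2
    p, v_small, v_big = 0.1, np.median(a2), 10.0 * np.mean(a2)   # rough initialization
    for _ in range(n_iter):
        # E-step (the common factor 1/sqrt(2 pi) cancels in the ratio)
        lik_big = p * np.exp(-a2 / (2 * v_big)) / np.sqrt(v_big)
        lik_small = (1 - p) * np.exp(-a2 / (2 * v_small)) / np.sqrt(v_small)
        resp = lik_big / (lik_big + lik_small)
        # M-step
        p = resp.mean()
        v_big = np.sum(resp * a2) / np.sum(resp)
        v_small = np.sum((1 - resp) * a2) / np.sum(1 - resp)
    return p, v_small, v_big, resp

rng = np.random.default_rng(4)
n = 50000
significant = rng.random(n) < 0.1
a = np.where(significant, rng.normal(0, 1.0, n), rng.normal(0, 0.1, n))  # variances 1 and 0.01
p, v_small, v_big, resp = em_two_gaussians(a)
print(f"p = {p:.3f}, v_small = {v_small:.4f}, v_big = {v_big:.3f}")
```

Thresholding the posterior `resp` at 1/2 then gives a maximum a posteriori estimate of the significance map.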

SLIDE 42

Coefficient domain models Estimation algorithms based on observed coefficients

Bernoulli-based estimation algorithm: 3 steps

Goal: decompose the signal into two layers (+ residual).
1. Parameter estimation: membership probabilities and synthesis coefficient variances (EM algorithm).
2. Estimation of the significance maps Λ and Δ (maximum likelihood, which reduces to thresholding for the Bernoulli model).
3. Estimation of the layers: orthogonal projection onto the subspace spanned by the selected time-frequency atoms.
The actual algorithm is more complex, and involves several iterations of steps 1 and 2.
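A single pass of the three steps can be sketched as follows. This is a simplified illustration, not the authors' implementation. Its assumptions: DCT-II and Dirac bases stand in for the two MDCT bases; in steps 1 and 2, a two-Gaussian mixture is fitted by EM to the observed coefficients of each basis separately, and indices with posterior probability above 1/2 are kept; steps 1 and 2 are not iterated; step 3 is a least-squares projection onto the selected atoms.

```python
import numpy as np

def dct_ii_basis(N):
    """Columns: orthonormal DCT-II atoms."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    G = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    G[:, 0] /= np.sqrt(2.0)
    return G

def em_two_gaussians(a, n_iter=200):
    """EM for p N(0, v_big) + (1 - p) N(0, v_small); returns posterior P(significant)."""
    a2 = a ** 2
    p, v_small, v_big = 0.1, np.median(a2), 10.0 * np.mean(a2)
    for _ in range(n_iter):
        lb = p * np.exp(-a2 / (2 * v_big)) / np.sqrt(v_big)
        ls = (1 - p) * np.exp(-a2 / (2 * v_small)) / np.sqrt(v_small)
        resp = lb / (lb + ls)
        p = resp.mean()
        v_big = np.sum(resp * a2) / np.sum(resp)
        v_small = np.sum((1 - resp) * a2) / np.sum(1 - resp)
    return resp

def two_layer_decomposition(x, G, H):
    a, b = G.T @ x, H.T @ x                          # observed coefficients
    # Steps 1 + 2: mixture fit in each basis, keep indices with posterior > 1/2
    lam = em_two_gaussians(a) > 0.5
    delta = em_two_gaussians(b) > 0.5
    # Step 3: orthogonal projection onto the span of all selected atoms
    Phi = np.hstack([G[:, lam], H[:, delta]])
    c, *_ = np.linalg.lstsq(Phi, x, rcond=None)
    n_g = int(lam.sum())
    x_tonal = G[:, lam] @ c[:n_g]
    x_transient = H[:, delta] @ c[n_g:]
    return x_tonal, x_transient, lam, delta

rng = np.random.default_rng(5)
N = 1024
G, H = dct_ii_basis(N), np.eye(N)
true_lam, true_delta = rng.random(N) < 0.03, rng.random(N) < 0.03
x = (G @ (rng.standard_normal(N) * true_lam) + H @ (rng.standard_normal(N) * true_delta)
     + 0.01 * rng.standard_normal(N))
x_tonal, x_transient, lam, delta = two_layer_decomposition(x, G, H)
residual = x - x_tonal - x_transient
print(lam.sum(), true_lam.sum(), delta.sum(), true_delta.sum())
```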

SLIDE 43

Coefficient domain models Estimation algorithms based on observed coefficients

Bernoulli-based estimation algorithm: example

Figure: decomposition of a xylophone signal (top) into transient (bottom left) and tonal (bottom right) layers.

SLIDE 44

Coefficient domain models Estimation algorithms based on observed coefficients

Structured model-based estimation algorithm

Structured models implement additional a priori information, such as the fact that significant coefficients tend to form clusters or lines (horizontal or vertical). Several models may be developed, among which:
- Markov models for the significance maps [Molla & Torrésani 2005];
- two-level Bernoulli models (M. Kowalski).
The estimation procedure has to be modified accordingly.

SLIDE 45

Coefficient domain models Estimation algorithms based on observed coefficients

Structured model-based estimation algorithm

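Example: Markov model (see [Molla & Torrésani 2005]). For the tonal layer, the distribution of the indicator random variables $X^\Delta_{kn}$ (k: time index, n: frequency index) is characterized by transition matrices
$$P_n = \begin{pmatrix} \pi_n & 1-\pi_n \\ 1-\pi'_n & \pi'_n \end{pmatrix},$$
with
$$\pi_n = \mathbb{P}_\Delta\{X^\Delta_{k+1,n} = 1 \mid X^\Delta_{k,n} = 1\}, \qquad \pi'_n = \mathbb{P}_\Delta\{X^\Delta_{k+1,n} = 0 \mid X^\Delta_{k,n} = 0\}.$$
In particular, for a stationary chain with $\pi_n + \pi'_n > 1$ (positively correlated states),
$$\mathbb{P}\{X_{k+1,n}=1, X_{k,n}=1\} > \mathbb{P}\{X_{k+1,n}=1\}\,\mathbb{P}\{X_{k,n}=1\},$$
$$\mathbb{P}\{X_{k+1,n}=0, X_{k,n}=0\} > \mathbb{P}\{X_{k+1,n}=0\}\,\mathbb{P}\{X_{k,n}=0\},$$
so that significant coefficients tend to form horizontal lines.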
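For a stationary two-state chain, $\mathbb{P}\{X=1\} = (1-\pi')/(2-\pi-\pi')$ and $\mathbb{P}\{X_{k+1}=1, X_k=1\} = \pi\,\mathbb{P}\{X=1\}$, so the first inequality holds exactly when $\pi > \mathbb{P}\{X=1\}$, i.e. when $\pi + \pi' > 1$. The sketch below (parameter values are arbitrary) evaluates both sides for a persistent and a non-persistent chain, and simulates one frequency bin along time.

```python
import numpy as np

def stationary_p1(pi, pi_p):
    """Stationary P{X = 1} of the chain with P(1 -> 1) = pi and P(0 -> 0) = pi_p."""
    return (1 - pi_p) / (2 - pi - pi_p)

def joint_vs_product(pi, pi_p):
    """Return (P{1,1}, P{1}^2, P{0,0}, P{0}^2) for two consecutive time indices."""
    s1 = stationary_p1(pi, pi_p)
    s0 = 1 - s1
    return s1 * pi, s1 ** 2, s0 * pi_p, s0 ** 2

def simulate_chain(pi, pi_p, K, rng):
    """Sample X_1..X_K along time for one frequency bin, starting from stationarity."""
    X = np.zeros(K, dtype=int)
    X[0] = rng.random() < stationary_p1(pi, pi_p)
    for k in range(1, K):
        stay = pi if X[k - 1] == 1 else pi_p
        X[k] = X[k - 1] if rng.random() < stay else 1 - X[k - 1]
    return X

print(joint_vs_product(0.9, 0.95))   # pi + pi' > 1: joint > product for both states
print(joint_vs_product(0.3, 0.5))    # pi + pi' < 1: the inequalities are reversed
X = simulate_chain(0.9, 0.95, 50000, np.random.default_rng(6))
print(X.mean(), stationary_p1(0.9, 0.95))   # empirical vs stationary P{X = 1}
```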

SLIDE 46

Coefficient domain models Estimation algorithms based on observed coefficients

Structured model-based estimation algorithm

The estimation algorithm keeps a similar structure as before, but becomes more complex: the estimation of the significance maps is no longer local in the coefficient domain.
- Parameter estimation may be performed via EM algorithms; the estimation of membership probabilities has to be replaced with the estimation of Markov transition matrices.
- The significance maps are estimated using the Viterbi algorithm.
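The Viterbi step can be sketched for one frequency bin followed along time. Assumptions: the parameters are taken as known (in practice they would come from the EM step), and each observed coefficient is modeled as a zero-mean Gaussian whose variance depends only on the hidden state. This is a simplification of the observed-coefficient model, not the authors' code. States are ordered (0: not significant, 1: significant), so the matrix below is $P_n$ with rows and columns swapped.

```python
import numpy as np

def viterbi_two_state(a, P, init, var):
    """MAP state sequence of a two-state HMM (0: not significant, 1: significant).
    P[i, j] = P(X_{k+1} = j | X_k = i); init[i] = P(X_1 = i);
    var[i] = variance of the zero-mean Gaussian observation in state i."""
    a = np.asarray(a, dtype=float)
    var = np.asarray(var, dtype=float)
    K = len(a)
    loglik = -0.5 * (np.log(2 * np.pi * var)[None, :] + a[:, None] ** 2 / var[None, :])
    logP = np.log(P)
    score = np.log(init) + loglik[0]             # best log-prob of paths ending in each state
    back = np.zeros((K, 2), dtype=int)
    for k in range(1, K):
        cand = score[:, None] + logP             # cand[i, j]: come from i, move to j
        back[k] = np.argmax(cand, axis=0)
        score = cand[back[k], [0, 1]] + loglik[k]
    states = np.zeros(K, dtype=int)
    states[-1] = int(np.argmax(score))
    for k in range(K - 1, 0, -1):                # backtrack
        states[k - 1] = back[k, states[k]]
    return states

pi, pi_p = 0.95, 0.98                            # P(1 -> 1), P(0 -> 0), as in P_n
P = np.array([[pi_p, 1 - pi_p],                  # rows/cols ordered (0, 1) here
              [1 - pi, pi]])
a = np.array([0.001] * 5 + [2.0] * 5 + [0.001] * 5)   # toy coefficients along time
print(viterbi_two_state(a, P, init=[0.5, 0.5], var=[1e-4, 1.0]))
# -> [0 0 0 0 0 1 1 1 1 1 0 0 0 0 0]
```

Unlike coefficient-wise thresholding, the decision for each coefficient depends on its neighbors through the transition probabilities, which is why the map estimation is no longer local.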

SLIDE 47

Coefficient domain models Estimation algorithms based on observed coefficients

Structured model-based algorithm: example

“Blues Brothers” recording: original, tonal, transient and residual.

SLIDE 49

Coefficient domain models Estimation algorithms based on synthesis coefficients

Estimation: synthesis coefficients

Alternative: within the framework of such random models, work directly on the distribution of the synthesis coefficients. A hierarchical Bayesian model can be constructed [Févotte et al 2006], implementing:
- sparse signal decomposition into a dictionary of two MDCT bases, with Gaussian random coefficients;
- Markov significance maps, implementing “structured” sets of coefficients;
- suitable priors for the model coefficients.
MAP and MMSE estimates are obtained by MCMC algorithms, which are significantly heavier than the previous ones. Example (audio): S. Raman; noisy version; denoised version (MMSE estimate); tonal; transient.

SLIDE 50

Conclusion

Conclusions

- Hybrid expansions generally provide sparser signal representations.
- The introduction of structured significance maps also improves sparsity, generally at the price of an increased computational burden.
- In addition, tonal layers turn out to be more difficult to model accurately (MDCT bases do not offer the same flexibility as harmonic models).
- These techniques yield “transient + tonal + residual” signal decompositions: a sort of elementary (single-sensor) source separation. These may be exploited for various tasks (denoising, source separation, coding, ...) in situations where the residual signal is not too large. Otherwise, the residual will have to be modeled.
- To do next: relax the assumption of independence of the tonal and transient layers.

SLIDE 51

References

Bibliography

T. Blumensath and M.E. Davies, Unsupervised learning of sparse and shift-invariant decompositions of polyphonic music, Proceedings of ICASSP ’04, vol. 5, 2004, pp. V:497-V:500.

R. Carmona, W.L. Hwang and B. Torrésani, Practical Time-Frequency Analysis: Continuous Wavelet and Gabor Transforms, with an Implementation in S, volume 9 of Wavelet Analysis and its Applications, Academic Press, San Diego, 1998.

S.S. Chen, D.L. Donoho and M.A. Saunders, Atomic Decomposition by Basis Pursuit, SIAM Journal on Scientific Computing 20:1 (1998), pp. 33-61.

L. Daudet, Sparse and structured decompositions of signals with the molecular matching pursuit, IEEE Transactions on Audio, Speech, and Language Processing, 2006, to appear.

L. Daudet and B. Torrésani, Sparse adaptive representations for musical signals, Technical report, to appear in Signal processing for music transcription, M. Davy and A. Klapuri, Eds. (2005).

SLIDE 52

References

Bibliography

M. Dörfler, Gabor Analysis for a Class of Signals called Music, PhD dissertation, Mathematics Department, University of Vienna, 2002. http://www.mat.univie.ac.at/~moni/diss.pdf

C. Févotte, L. Daudet, S.J. Godsill and B. Torrésani, Sparse Regression with Structured Priors: Application to Audio Denoising, Proceedings of ICASSP 2006, vol. 3, pp. III-57 - III-60.

C. Févotte, B. Torrésani, L. Daudet and S. Godsill, Denoising of musical audio using sparse linear regression and structured priors, submitted.

F. Jaillet and B. Torrésani, Remarques sur l’adaptativité des représentations temps-fréquence, Proceedings of the GRETSI’03 conference, vol. 1, pp. 145-148.

F. Jaillet and B. Torrésani, Time-Frequency Jigsaw Puzzles, to appear in Int. J. on Wavelets and Multiresolution Information Processing (2006).

SLIDE 53

References

Bibliography

M. Kowalski and B. Torrésani, A Family of Random Waveform Models for Audio Coding, Proceedings of ICASSP 2006, vol. 3, pp. III-472 - III-475.

S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, 41:3397–3415, 1993.

S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1998.

S. Molla and B. Torrésani, Hybrid Audio Scheme using Hidden Markov Models of Waveforms, Applied and Computational Harmonic Analysis 18 (2005), pp. 137-166.

M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, AK Peters, Boston, MA, USA, 1994.
