[PPT] - Perceptually informed organization of textural sounds Research PowerPoint Presentation

SLIDE 1

Austrian Research Institute for Artificial Intelligence (OFAI)

Perceptually informed organization of textural sounds

Thomas Grill

SLIDE 2

Thomas Grill: Perceptually informed organization of textural sounds

Research context

Towards automatic annotation of electroacoustic music (2008–2010)

Audiominer – Mathematical signal analysis and modeling for manipulation of sound objects (2010–2013)

Automatic segmentation, labelling, and characterization of audio streams (2013–2016)

SLIDE 3

Smalley D.: Klien V., Grill T., Flexer A.: On automated annotation of acousmatic music, Journal of New Music Research, 2012

Towards automatic annotation of electroacoustic music (2008–2010)


Annotation of un-notated music
Manual vs. automatic annotation
The system of spectromorphology
Structural overview and detection of representative sound groupings

John Chowning: Turenas

SLIDE 4

Holighaus N., Dörfler M., Velasco G. A. and Grill T.: A framework for invertible, real-time constant-Q transforms, IEEE Transactions on Audio, Speech and Language Processing, 2013.

Audiominer (2010–2013)

Cooperation with NUHAG
Variations on the Gabor transform (aka STFT) – invertible CQ-NSGT

Structured sparsity

SLIDE 5

Automatic segmentation, labelling, and characterization of audio streams (2013–)

Just started
Segmentation and characterization algorithms with a general applicability (perceptually informed)
Improved relation between semantic descriptions and measurable quantities

Interesting data available, partly annotated

SLIDE 6


Outline

Why / what are textural sounds?
How can textural sounds be described?
How can (textural) sounds be organized?
How can textural sounds and collections thereof be visualized?

SLIDE 7

G. Strobl, G. Eckel and D. Rocchesso. “Sound Texture Modeling: A Survey”. Proceedings of the 2006 Sound and Music Computing (SMC) International Conference.

Textural sounds

Activity sounds: chip, sweep, rustle, typing, scroop, rasp, crumple, clap, rub, walking

Machine sounds: buzz, whir, hammer, grumble, drone, traffic
Natural sounds: fire, water (rain, waterfall, ocean), wind
Animal sounds: sea gulls, crickets, humming
Human utterances: babble, chatter


SLIDE 8

Examples of textural sounds

[Figure: sonogram; axes: time [s], frequency [Hz] (100 to 12800 Hz)]

SLIDE 9

Examples of textural sounds

[Figure: sonogram; axes: time [s], frequency [Hz] (100 to 12800 Hz)]

SLIDE 10

Saint-Arnaud, N. (1995). Classification of sound textures. Master’s thesis, MIT Media Lab, Cambridge, MA, USA

A sound texture is like wallpaper: it can have local structure and randomness, but the characteristics of the structure and randomness must remain constant on the large scale.

Definition attempt

2.2 Working Definition of Sound Textures

First Time Constraint: Constant Long-term Characteristics

A definition for a sound texture could be quite wide, but we chose to restrict our working definition for many perceptual and conceptual reasons. First of all, there is no consensus among people as to what a sound texture might be; and more people will accept sounds that fit a more restrictive definition. The first constraint we put on our definition of a sound texture is that it should exhibit similar characteristics over time; that is, a two-second snippet of a texture should not differ significantly from another two-second snippet. To use another metaphor, one could say that any two snippets of a sound texture seem to be cut from the same rug [RIC79]. A sound texture is like wallpaper: it can have local structure and randomness, but the characteristics of the structure and randomness must remain constant on the large scale. This means that the pitch should not change like in a racing car, the rhythm should not increase or decrease, etc. This constraint also means that sounds in which the attack plays a great part (like many timbres) cannot be sound textures. A sound texture is characterized by its sustain.

Figure 2.2.1 shows an interesting way of segregating sound textures from other sounds, by showing how the “potential information content” increases with time. “Information” is taken here in the cognitive sense rather than the information theory sense. Speech or music can provide new information at any time, and their “potential information content” is shown here as a continuously increasing function of time. Textures, on the other hand, have constant long term characteristics, which translates into a flattening of the potential information increase. Noise (in the auditory cognitive sense) has somewhat less information than textures.

FIGURE 2.2.1 Potential Information Content of A Sound Texture vs. Time
[Figure: potential information content vs. time; curves for speech, music, sound texture, noise]

Sounds that carry a lot of meaning are usually perceived as a message. The semantics take the foremost position in the cognition, downplaying the characteristics of the sound proper. We choose to work with sounds which are not primarily perceived as a message.

Chapter 2: Human Perception of Sound Textures, p. 24

SLIDE 11

Landy, L. (2007). Understanding the Art of Sound Organization. The MIT Press, Cambridge, MA, USA.
Truax, B. (2008). Soundscape composition as global music: Electroacoustic music as soundscape. Organised Sound, 13(2):103–109.

Sound-based music and textural sound

Sound-based music:

“art form in which the sound and not the musical note is the basic unit.”
➡Acousmatic music and soundscape composition

Textural sound as "sound material"

SLIDE 12

Low Frequency Orchestra plays Robert Lettner: Das Spiel vom Kommen und Gehen

SLIDE 13

SLIDE 14


Describing sounds

Predominant scheme: Semantic tagging (sound origin, recording context, etc.)
Sonic qualities are equally important/interesting, especially for abstract sounds or use in sound design etc.

Description ⇨ Organization

SLIDE 15

Grill, Flexer and Cunningham. Identification of perceptual qualities in textural sounds using the repertory grid method. Proceedings of the 6th Audio Mostly Conference, 2011

What are the most significant qualities of textural sounds?

➡Repertory grid technique used to elicit qualities (personal constructs) "ex nihilo", for a specific selection of subjects (interviewees) and objects under examination (items)

Interviewees (subjects) are asked to name differences between two randomly chosen sound examples
➡Bipolar qualities spanning range from one sound to the other

Identification of perceptual qualities in textural sounds


SLIDE 16

Example

Straight differentiation:

In which ways do two sounds differ?

Triads:

Group three objects to form two groups, then name differences between groups

SLIDE 17


Repertory Grid for sounds

Elicitation of ~10 bipolar constructs per subject
Subjects rate all 20 sounds (grades 1 to 5) using own personal constructs

Constructs (left pole – right pole):
 1  motion – static
 2  textural – coherent
 3  impulse – continuous
 4  high – low
 5  excentric – contained
 6  evolutionary – repetitive
 7  well-defined – diffused
 8  regular – irregular
 9  narrative – static
10  pitched – non-pitched
11  smooth – porous

Rating scale: 1 … 5

Sound |  1  2  3  4  5  6  7  8  9 10 11
A     |  4  4  4  1  2  4  4  2  4  3  3
B     |  5  3  5  5  5  1  3  1  5  2  1
C     |  4  5  2  2  4  5  5  3  5  5  4
D     |  4  2  5  4  3  4  4  3  4  2  3
E     |  2  4  1  1  2  4  1  5  5  3  5
F     |  1  1  2  2  2  3  2  5  5  4  5
G     |  5  5  5  5  5  2  1  2  5  1  1
H     |  4  3  3  1  2  5  1  1  5  2  4
I     |  4  2  2  2  2  5  2  2  4  1  4
J     |  2  1  5  3  1  2  5  5  3  5  3
K     |  5  2  4  4  4  4  3  1  5  4  2
L     |  1  1  1  3  1  2  1  5  5  5  5
M     |  4  5  5  1  2  2  3  2  5  3  2
N     |  3  1  4  4  1  4  4  5  5  4  2
O     |  4  2  4  3  3  3  5  4  3  5  3
P     |  2  2  3  3  3  4  5  3  5  5  4
Q     |  5  5  5  3  5  5  1  1  5  1  1
R     |  3  3  4  2  3  2  2  3  4  2  3
S     |  2  2  5  2  3  4  4  4  2  3  2
T     |  1  1  4  4  1  4  3  2  3  5  2
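
Once a grid is rated, two constructs can be compared by correlating their ratings across sounds. The following is a minimal sketch, not the author's analysis code: it uses the ratings of sounds A to E from this grid, and it assumes that the fourth and tenth value of each row belong to the constructs high–low and pitched–non-pitched (an assumption drawn from the order of the labels). The names `pearson` and `construct_ratings` are illustrative.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_x * var_y)


# Ratings (grades 1 to 5) of sounds A-E on the eleven constructs of the grid.
grid = {
    "A": [4, 4, 4, 1, 2, 4, 4, 2, 4, 3, 3],
    "B": [5, 3, 5, 5, 5, 1, 3, 1, 5, 2, 1],
    "C": [4, 5, 2, 2, 4, 5, 5, 3, 5, 5, 4],
    "D": [4, 2, 5, 4, 3, 4, 4, 3, 4, 2, 3],
    "E": [2, 4, 1, 1, 2, 4, 1, 5, 5, 3, 5],
}

# Assumed column positions (zero-based) of two constructs.
HIGH_LOW = 3
PITCHED_NON_PITCHED = 9


def construct_ratings(grid, column):
    """Collect the ratings of one construct over all sounds."""
    return [ratings[column] for ratings in grid.values()]


high_low = construct_ratings(grid, HIGH_LOW)
pitched = construct_ratings(grid, PITCHED_NON_PITCHED)
r = pearson(high_low, pitched)
print(f"r(high-low, pitched-non-pitched) = {r:.3f}")
```

With only five sounds the coefficient is illustrative at best; the full grid of 20 sounds would be the natural input.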

SLIDE 18


16 subjects
expert listeners
202 constructs
mostly German

high/low

ordered/chaotic

SLIDE 19


http://grrrr.org/test/classify

SLIDE 20

Inter-rater agreement

Construct | Agreement α (core group)* | Agreement α (all n ≥ 10)
high – low | 0.588 | 0.519
ordered – chaotic | 0.556 | 0.447
natural – artificial | 0.551 | 0.492
smooth – coarse | 0.527 | 0.420
tonal – noisy | 0.523 | 0.435
homogeneous – heterogeneous | 0.519 | 0.416
dense – sparse | 0.492 | 0.342
edgy – flowing | 0.465 | 0.376
static – dynamic | 0.403 | 0.383
near – far | 0.252 | 0.249

*nine subjects who took part in the elicitation process

SLIDE 21

Sounds along axis high–low

⟵ high low ⟶

SLIDE 22

Pearson correlation between constructs

SLIDE 23

Pearson correlation between constructs

SLIDE 24


Visualizing sounds in a collection

Representation of properties of individual sounds

➡Auditory (or semantic) characteristics

Representation of properties of the sound collection

➡Clusters, similarities, principal characteristics

Waveforms and sonograms?


SLIDE 25

Lawrence Marks: On Perceptual Metaphors. Metaphor and Symbolic Activity 11(1), 39–66, 1996

Perceptual metaphors

Strong synesthesia

➡very rare, asymmetric, individual

Weak synesthesia – cross-modal similarity


SLIDE 26

Wolfgang Köhler, Gestalt Psychology, 1929

Cross-modal similarity

SLIDE 27

Grill and Flexer: Visualization of perceptual qualities in textural sounds, Proceedings of the ICMC, 2012.

Based on most relevant of the previously elicited personal constructs

Layout in two dimensions (screen compatibility)
Synesthesia-like mappings from auditory to visual domain

Visualization of perceptual qualities in textural sounds


SLIDE 28

high–low
ordered–chaotic
tonal–noisy
smooth–coarse
homogeneous–heterogeneous

SLIDE 29

Online survey: Visualization of textural sounds

http://grrrr.org/test/texvis

SLIDE 30

Evaluation: Survey B – Dependence on expertise

group | voters / votes | correctness (random: 20%) | mean RMS error (random: 0.243)
non-musicians, ≥ 20 votes | 19 / 876 | 33.9% | 0.178
classical musical training, ≥ 20 votes | 29 / 1570 | 40.0% | 0.163
electronic music practice, ≥ 20 votes | 48 / 2811 | 45.2% | 0.137
electronic music practice, good listening conditions, ≥ 20 votes | 36 / 2019 | 46.4% | 0.133

SLIDE 31

Evaluation: Pearson correlation selection to reference

Survey B: electronic music practitioners, good listening conditions, ≥ 10 votes

[Figure: 5×5 matrix of Pearson correlations between selected and reference qualities (high–low, ordered–chaotic, smooth–coarse, tonal–noisy, homogeneous–heterogeneous); color scale 0.0 to 1.0]

Cell values in extraction order: 0.71; 0.24, 0.33, 0.08, 0.22; 0.27; 0.65, 0.43, 0.36, 0.54; 0.34; 0.43, 0.68, 0.54, 0.27; 0.11; 0.37, 0.55, 0.69, 0.20; 0.25; 0.55, 0.26, 0.18, 0.62

SLIDE 32

Evaluation: Mean RMS error vs. decision time

[Figure: scatter plot of avg RMS error (0.05 to 0.35) vs. time per vote (s) (5 to 45)]

users=94, x-y correlation=-0.13 @ significance(p=0.05)=0.20, mean duration=13.83 (6.65)

SLIDE 33

Evaluation: Mean RMS error vs. perceived difficulty

[Figure: scatter plot of avg RMS error (0.00 to 0.25) vs. perceived difficulty (0.0 to 1.0)]

sounds=100, x-y correlation=0.484 @ significance(p=0.05)=0.197

SLIDE 34

Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012

Perceptually informed high-level descriptors for textural audio

Attempt to model the previously elicited personal constructs, i.e. metaphoric descriptions ➞ audio descriptors for high–low, ordered–chaotic, smooth–coarse, tonal–noisy, homogeneous–heterogeneous
Build on extensive experimental data from online survey covering 100 textural sounds

SLIDE 35

Holighaus N., Dörfler M., Velasco G. A. and Grill T.: A framework for invertible, real-time constant-Q transforms, IEEE Transactions on Audio, Speech and Language Processing, Volume 21, Issue 4, pp. 775-785, 2013.

Perceptually informed high-level descriptors for textural audio

Use a uniform underlying time-frequency representation
Small number of adjustable parameters for each descriptor
Parameters to be tuned, so that the descriptors correlate well with human perception

SLIDE 36

Examples for high–low

[Figure: sonograms of windspiel1.cut.aiff and steelplantL.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800 Hz)]

SLIDE 37

Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012

Descriptor for high–low

Calculate the mean over time
Attenuate spectrally
Warp the loudness range
Warp the frequency axis and calculate the centroid
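
The four steps listed for the high–low descriptor can be sketched roughly as follows. This is an illustration under stated assumptions, not the descriptor from the cited paper: the sonogram is taken as a list of time frames with one magnitude per frequency bin, the spectral attenuation is a hypothetical per-bin weight list, the loudness warp is modeled as a power law with a hypothetical exponent `loudness_exp`, and the frequency warp is assumed to be logarithmic (base 2).

```python
import math


def high_low_descriptor(sonogram, freqs, attenuation, loudness_exp=0.5):
    """Centroid of a time-averaged, attenuated, loudness-warped spectrum
    on a log2 frequency axis. Larger values mean 'higher' sounds."""
    if not sonogram or any(len(frame) != len(freqs) for frame in sonogram):
        raise ValueError("every frame needs one value per frequency bin")
    if len(attenuation) != len(freqs):
        raise ValueError("need one attenuation weight per frequency bin")
    n_frames = len(sonogram)
    # 1. Calculate the mean over time for every frequency bin.
    mean_spec = [sum(frame[k] for frame in sonogram) / n_frames
                 for k in range(len(freqs))]
    # 2. Attenuate spectrally (hypothetical per-bin weights).
    attenuated = [m * a for m, a in zip(mean_spec, attenuation)]
    # 3. Warp the loudness range (assumed power law).
    warped = [v ** loudness_exp for v in attenuated]
    # 4. Warp the frequency axis (assumed log2) and calculate the centroid.
    total = sum(warped)
    if total == 0:
        raise ValueError("silent input has no centroid")
    return sum(w * math.log2(f) for w, f in zip(warped, freqs)) / total


# Bin frequencies taken from the axis ticks of the sonogram figures.
freqs = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
flat = [1.0] * len(freqs)  # no attenuation in this toy example

high_sound = [[0, 0, 0, 0, 0, 0, 1.0, 1.0]] * 4  # energy in the top bins
low_sound = [[1.0, 1.0, 0, 0, 0, 0, 0, 0]] * 4   # energy in the bottom bins

d_high = high_low_descriptor(high_sound, freqs, flat)
d_low = high_low_descriptor(low_sound, freqs, flat)
print(d_high > d_low)
```

In such a sketch the attenuation weights and the loudness exponent would be the adjustable parameters to tune against listener ratings.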

SLIDE 38

Descriptor tuning

c-pulsar1 c-tri-longl f-degraded2l windspiel1 shimmeringdigita who01_46_4

ver-cry1

salz-sparsebit.l a-reiben1l atmo2l pulver-01 salz-frq.l tiere1 bolzbund1l cicada03 tiere10 f-degraded1l beat-high-r-01 aero-64kb-15db egrain01 salz-fullbit.l chor-hi bigglassbreaking who01_65 a-prickel1 atmo3l bolzbund2l tiere18 tiere5 env21 flirr vst1a brizzl tiere3 a-reiben2l sinmodl prickel2l folieluft1l prickel1l kidrock-20-3 atmo1l c-flitch tiere12 env17 ns-brrrrr prickel3l tiere15 applaus1 kugelsortier-exp.l tanz-slow noise-mid-r-01 longrisingmusics who02_19_1 schaumknull1l regenhof env19 ns-divers brizzlowl eff-flirr-r-01 machine16 folieknister1l tiere9 schaumriss1l tiere17 schreiben2 ampel-verkehr2 diesel-laut tiere2 longwindywhoosh who01_10_4 tiere11 surfybrightwindw who01_16_2 feed-ghost-r-01 env5 whirlingwhoosh_bonus 72 rush-30-6 machine13 machine1l b-halll radio2m machine14 env3 machine15 mischmaschine-exp.l jetafterburnerki who01_32_1 industrial 04 env11 kuhstall eff-low-r-01 baumaschine leise+brumm noise-ton schritte guns-96-20-6 howlingbreathenh who02_34_1 airlowl steelplantl industrial 01 flaredpass who02_51_1 a-darkns biglowdrone who01_07 raspyexhale_bonus 86 lowbrl rumblesweepwhoos who01_16_5

high low

Weighted Pearson correlation between user data (black) and descriptor values (red)
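
A weighted Pearson correlation replaces the plain means, variances and covariance with weighted ones. The sketch below shows one common formulation; how the weights were chosen in the study is not stated on the slide, so the weights here (for example, a per-sound vote count) are purely hypothetical, as is the function name `weighted_pearson`.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y where item i counts with weight w[i]."""
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w need the same length >= 2")
    total = sum(w)
    if total <= 0 or any(wi < 0 for wi in w):
        raise ValueError("weights must be non-negative with a positive sum")
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Toy data: user ratings vs. descriptor values for four sounds.
user = [1.0, 2.0, 3.0, 4.0]
descriptor = [2.0, 1.0, 4.0, 3.0]
votes = [1.0, 1.0, 1.0, 1.0]  # hypothetical weights, all equal here

print(weighted_pearson(user, descriptor, votes))
```

With equal weights the result coincides with the ordinary Pearson coefficient; a sound with weight zero has no influence on the result.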

SLIDE 39

Examples for ordered–chaotic

[Figure: sonograms of tiere18.cut.aiff and beat-high-r-01.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800 Hz)]

SLIDE 40

Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012

Descriptor for ordered–chaotic

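Detrend by high-pass filtering using a Gaussian kernel:
$\breve{c}_{t,f} = \hat{c}_{t,f} - M^{1}_{f}(\hat{c}_{t,f}) \star G_{\sigma}(t)$

Slice sonogram along time axis:
$s_{i,\nu,t,f} = \underset{t'}{\ldots}\; w_n\bigl(t' - (im + \nu)\bigr)\, \breve{c}_{t',f}$

Reveal inherent temporal repetitions by comparing with shifted slices:
$\delta_{i,\nu} = M^{\xi}_{t,f}\, |s_{i,\nu,t,f} - s_{i,0,t,f}|$

Subtract per-slice means:
$\bar{\delta}_{i,\nu} = \delta_{i,\nu} - M^{1}_{\nu}(\delta_{i,\nu})$

Compare repetition detection functions over subsequent slices:
$\gamma_i = M^{\eta}_{\nu}\bigl(\bar{\delta}_{i+1,\nu} - \bar{\delta}_{i,\nu}\bigr)$

Factor in the amount of similarity in the repetitions:
$\tilde{\gamma}_i = \gamma_i \cdot M^{\alpha}_{\nu}(\delta_{i,\nu}) \cdot M^{\alpha}_{\nu}(\delta_{i+1,\nu})$

Average and take logarithm:
$D_{\text{ordered–chaotic}} = \log M^{1}_{i}(\tilde{\gamma}_i)$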

SLIDE 41

Examples for smooth–coarse

[Figure: sonograms of airlowL.cut.aiff and bolzbund1L.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800 Hz)]

SLIDE 42

Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012

Descriptor for smooth–coarse

Spectral attenuation:
$\hat{c}_{t,f} = \bar{c}_{t,f}\, \mathrm{att}_{\Phi}(f)$

Integrate absolute differences over frequency, also compressing the magnitudes:
$\delta_t = M^{\xi}_{f}\, |\Delta_t \hat{c}_{t,f}|$

Integrate over the time axis, boosting the temporal contrast:
$D_{\text{smooth–coarse}} = \log M^{\eta}_{t}(\delta_t)$
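
The smooth–coarse steps can be sketched in a few lines. Several points are assumptions rather than facts from the slide: the operator M with an exponent is read as a generalized (power) mean along the indicated axis, the temporal difference is taken between adjacent frames, the attenuation is a hypothetical per-bin weight list, and the default exponents `xi` and `eta` are placeholders, not tuned values.

```python
import math


def power_mean(values, p):
    """Generalized mean with exponent p > 0 of non-negative values."""
    values = list(values)
    if not values or p <= 0:
        raise ValueError("need at least one value and p > 0")
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def smooth_coarse(sonogram, attenuation, xi=1.0, eta=1.0):
    """Log of the averaged frame-to-frame change of an attenuated sonogram.
    Larger values mean 'coarser' sounds in this sketch."""
    if len(sonogram) < 2:
        raise ValueError("need at least two time frames")
    # Spectral attenuation: weight every frequency bin.
    c_hat = [[v * a for v, a in zip(frame, attenuation)] for frame in sonogram]
    # Absolute temporal differences, integrated over frequency (exponent xi).
    deltas = [
        power_mean((abs(b - a) for a, b in zip(prev, cur)), xi)
        for prev, cur in zip(c_hat, c_hat[1:])
    ]
    # Integration over time (exponent eta), then the logarithm.
    mean_delta = power_mean(deltas, eta)
    return math.log(mean_delta) if mean_delta > 0 else float("-inf")


weights = [1.0, 1.0]  # no attenuation in this toy example
smooth = [[1.0, 1.0], [1.1, 1.1], [1.2, 1.2]]  # slowly drifting level
coarse = [[1.0, 1.0], [3.0, 3.0], [1.0, 1.0]]  # strong alternation

d_smooth = smooth_coarse(smooth, weights)
d_coarse = smooth_coarse(coarse, weights)
print(d_coarse > d_smooth)
```

An exponent below 1 for `xi` would compress the magnitudes and one above 1 for `eta` would emphasize strong temporal changes, which matches the wording of the steps.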

SLIDE 43

Examples for tonal–noisy

[Figure: sonograms of flirr.cut.aiff and schaumknull1L.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800 Hz)]

SLIDE 44

Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012

Descriptor for tonal–noisy

Spectral attenuation:
$\hat{c}_{t,f} = \bar{c}_{t,f}\, \mathrm{att}_{\Phi}(f)$

Integrate over time, compress loudness:
$\beta_f = M^{\xi}_{t}(\hat{c}_{t,f})$

Integrate along frequency axis, boost spectral contrast, take logarithm:
$D_{\text{tonal–noisy}} = \log M^{\eta}_{f}(\beta_f)$
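
A small numerical sketch shows why the two exponents matter for tonal–noisy. It assumes that M with an exponent denotes a generalized (power) mean along the indicated axis, uses a hypothetical per-bin attenuation, and picks the exponents `xi = 0.5` and `eta = 2.0` only for illustration; none of these values come from the slide.

```python
import math


def power_mean(values, p):
    """Generalized mean with exponent p > 0 of non-negative values."""
    values = list(values)
    if not values or p <= 0:
        raise ValueError("need at least one value and p > 0")
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def tonal_noisy(sonogram, attenuation, xi=0.5, eta=2.0):
    """Log of a contrast-boosting mean over frequency of the per-bin
    time averages. Larger values mean 'more tonal' in this sketch."""
    if not sonogram:
        raise ValueError("need at least one time frame")
    n_bins = len(attenuation)
    # Spectral attenuation: weight every frequency bin.
    c_hat = [[v * a for v, a in zip(frame, attenuation)] for frame in sonogram]
    # Integrate over time per bin; xi < 1 compresses loudness.
    beta = [power_mean((frame[k] for frame in c_hat), xi) for k in range(n_bins)]
    # Integrate along frequency; eta > 1 boosts spectral contrast.
    mean_beta = power_mean(beta, eta)
    return math.log(mean_beta) if mean_beta > 0 else float("-inf")


weights = [1.0, 1.0, 1.0, 1.0]  # no attenuation in this toy example
# Same summed magnitude per frame, concentrated in one bin vs. spread evenly.
tonal = [[4.0, 0.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0]]
noisy = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

d_tonal = tonal_noisy(tonal, weights)
d_noisy = tonal_noisy(noisy, weights)
print(d_tonal > d_noisy)
```

With `eta = 1` both toy inputs would score the same, since their plain means over frequency are equal; the larger exponent is what rewards a peaked spectrum.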

SLIDE 45

Examples for homogeneous–heterogeneous

[Figure: sonograms of leise+brumm.cut.aiff and tiere3.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800 Hz)]

SLIDE 46

Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012

Descriptor for homogeneous–heterogeneous

Slice sonogram along time axis:
$s_{i,t,f} = \underset{t'}{\ldots}\; w_n(t' - im)\, \hat{c}_{t',f}$

Normalize slices:
$\bar{s}_{i,t,f} = \dfrac{s_{i,t,f}}{M^{\xi}_{t',f'}(s_{i,t',f'})}$

Calculate loudness modulation spectra by taking absolute DFTs over time:
$\hat{s}_{i,\rho,f} = \bigl|\mathcal{F}_{t}(\bar{s}_{i,t,f})\bigr|$

Account for perception of fluctuation strength:
$\tilde{s}_{i,\rho,f} = \hat{s}_{i,\rho,f}\, \mathrm{att}_{\Psi}(\rho)$

Measure the overall similarity of adjacent slices with adjustable sensitivity:
$\delta_i = M^{\eta}_{\rho,f}\, |\tilde{s}_{i+1,\rho,f} - \tilde{s}_{i,\rho,f}|$

Average over all the measures and take the logarithm:
$D_{\text{homogeneous–heterogeneous}} = \log M^{1}_{i}(\delta_i)$

SLIDE 47

Evaluation – Weighted Pearson correlation

[Figure: two 5×5 matrices of weighted Pearson correlations between perceived qualities and computed qualities (high–low, ordered–chaotic, smooth–coarse, tonal–noisy, homogeneous–heterogeneous); color scale 0.0 to 1.0]

Tuned for individual accuracy, cell values in extraction order: 0.11; 0.70, 0.21, 0.24, 0.75; 0.32; 0.53, 0.62, 0.75, 0.41; 0.56; 0.59, 0.75, 0.62, 0.38; 0.09; 0.74, 0.37, 0.35, 0.69, 0.90; 0.42, 0.47, 0.38, 0.33

Tuned for mutual independence, cell values in extraction order: 0.08; 0.66, 0.13, 0.14, 0.75; 0.13; 0.43, 0.52, 0.74, 0.37; 0.57; 0.57, 0.74, 0.59, 0.35; 0.12; 0.74, 0.40, 0.39, 0.65, 0.88; 0.36, 0.37, 0.28, 0.31

SLIDE 48

Application: Musical interface – Continuous map of textural sounds

http://grrrr.org/data/research/texmap