Perceptually informed organization of textural sounds


Austrian Research Institute for Artificial Intelligence (OFAI). Thomas Grill: Perceptually informed organization of textural sounds. OFAI research seminar, 2012-10-23.


slide-1
SLIDE 1

Austrian Research Institute for Artificial Intelligence (OFAI)

Perceptually informed organization of textural sounds

Thomas Grill OFAI research seminar, 2012-10-23

slide-2
SLIDE 2
slide-3
SLIDE 3

ringing cheeping gasping smashing piercing peeping whooping tinkling raucous chattering crooning bellowing sobbing bumping snarling growling pitch crying thumping burping croaking clattering yapping keening splashing yelping rustling volume squealing howling barking sniveling moaning pealing tone rattling grunting clanging coughing quacking whining gagging fizzing wheezing honking hissing bawling trumpeting swishing sneezing rumbling bubbling ripping cooing chirping shouting shuffling tearing popping roaring thunderous scratching snorting crashing crunching cackling tolling clucking silent tapping soothing crowing tranquil melodious cacophonous singing quiet tune loud tinkling noisy rhythmic mumbling twittering din beat blaring cawing racket chattering murmuring whistling clapping booming whispering mewing snapping snoring yelling mooing crackling sighing

slide-4
SLIDE 4

Thomas Grill: Perceptually informed organization of textural sounds

Fundamental questions

  • How can digital sound material be described?
  • How can sounds be organized?
  • How can sounds and collections thereof be visualized?
  • Focus on sampled sound with textural characteristics


slide-5
SLIDE 5

Textural sounds

A sound texture is like wallpaper: it can have local structure and randomness, but the characteristics of the structure and randomness must remain constant on the large scale.

Saint-Arnaud, N. (1995). Classification of sound textures. Master’s thesis, MIT Media Lab, Cambridge, MA, USA.

2.2 Working Definition of Sound Textures

First Time Constraint: Constant Long-term Characteristics

A definition for a sound texture could be quite wide, but we chose to restrict our working definition for many perceptual and conceptual reasons. First of all, there is no consensus among people as to what a sound texture might be; and more people will accept sounds that fit a more restrictive definition. The first constraint we put on our definition of a sound texture is that it should exhibit similar characteristics over time; that is, a two-second snippet of a texture should not differ significantly from another two-second snippet. To use another metaphor, one could say that any two snippets of a sound texture seem to be cut from the same rug [RIC79]. A sound texture is like wallpaper: it can have local structure and randomness, but the characteristics of the structure and randomness must remain constant on the large scale. This means that the pitch should not change like in a racing car, the rhythm should not increase or decrease, etc. This constraint also means that sounds in which the attack plays a great part (like many timbres) cannot be sound textures. A sound texture is characterized by its sustain.

Figure 2.2.1 shows an interesting way of segregating sound textures from other sounds, by showing how the “potential information content” increases with time. “Information” is taken here in the cognitive sense rather than the information theory sense. Speech or music can provide new information at any time, and their “potential information content” is shown here as a continuously increasing function of time. Textures, on the other hand, have constant long-term characteristics, which translates into a flattening of the potential information increase. Noise (in the auditory cognitive sense) has somewhat less information than textures.

FIGURE 2.2.1 Potential Information Content of a Sound Texture vs. Time (axes: potential information content vs. time; curves: speech, music, sound texture, noise)

Sounds that carry a lot of meaning are usually perceived as a message. The semantics take the foremost position in the cognition, downplaying the characteristics of the sound proper. We choose to work with sounds which are not primarily perceived as a message.

(Chapter 2, Human Perception of Sound Textures, p. 24)
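The first time constraint quoted above (any two-second snippet should resemble any other) can be probed numerically. The sketch below is only an illustration and not Saint-Arnaud's procedure: it summarizes each two-second snippet by its RMS level and zero-crossing rate and checks how much these vary across snippets. The two features, the function names, and the `tolerance` of 0.2 are assumptions chosen for brevity.

```python
import math
import random

def snippet_features(samples, sample_rate, snippet_seconds=2.0):
    """Return (rms, zero_crossing_rate) for every full snippet of the signal."""
    size = int(sample_rate * snippet_seconds)
    features = []
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        rms = math.sqrt(sum(x * x for x in chunk) / size)
        crossings = sum((a < 0) != (b < 0) for a, b in zip(chunk, chunk[1:]))
        features.append((rms, crossings / (size - 1)))
    return features

def relative_spread(values):
    """(max - min) / mean of a feature across snippets; 0 means identical."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean if mean else 0.0

def is_texture_like(samples, sample_rate, tolerance=0.2):
    """True if every feature stays within `tolerance` across all snippets."""
    features = snippet_features(samples, sample_rate)
    if len(features) < 2:
        raise ValueError("need at least two full snippets to compare")
    return all(relative_spread(column) <= tolerance for column in zip(*features))

# Two synthetic six-second test signals (assumed data, not from the talk).
SAMPLE_RATE = 8000
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(6 * SAMPLE_RATE)]
# Sine sweep whose pitch rises from 100 Hz to 1000 Hz, like the racing car.
chirp = [math.sin(2 * math.pi * (100.0 * t + 75.0 * t * t))
         for t in (i / SAMPLE_RATE for i in range(6 * SAMPLE_RATE))]

print("noise texture-like:", is_texture_like(noise, SAMPLE_RATE))
print("chirp texture-like:", is_texture_like(chirp, SAMPLE_RATE))
```

The steady noise passes the check, while the sweep fails it because its zero-crossing rate grows from snippet to snippet. Real recordings would need perceptually motivated features and a tolerance set by listening tests.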

slide-6
SLIDE 6

Examples of audio textures

  • Natural sounds: fire, water (rain, waterfall, ocean), wind
  • Animal sounds: sea gulls, crickets, humming
  • Human utterances: babble, chatter
  • Machine sounds: buzz, whir, hammer, grumble, drone, traffic
  • Activity sounds: chip, sweep, rustle, typing, scroop, rasp, crumple, clap, rub, walking

G. Strobl, G. Eckel and D. Rocchesso. “Sound Texture Modeling: A Survey”. Proceedings of the 2006 Sound and Music Computing (SMC) International Conference.

slide-7
SLIDE 7

Examples of audio textures

windspiel1.aif

slide-8
SLIDE 8

Examples of audio textures

tiere7.aif

slide-9
SLIDE 9

Sound-based music and textural sound

  • Sound-based music: “art form in which the sound and not the musical note is the basic unit.”

➡Acousmatic music and soundscape composition
➡Primacy of listening experience

  • Textural sound as "sound material"

Landy, L. (2007). Understanding the Art of Sound Organization. The MIT Press, Cambridge, MA, USA.
Truax, B. (2008). Soundscape composition as global music: Electroacoustic music as soundscape. Organised Sound, 13(2):103–109.

slide-10
SLIDE 10

Low Frequency Orchestra plays Robert Lettner: Das Spiel vom Kommen und Gehen

slide-11
SLIDE 11

Describing sounds

  • Predominant scheme: Semantic tagging (sound origin, recording context, etc.)
  • Sonic qualities are equally important/interesting, especially for abstract sounds or use in acousmatic composition
  • Description ⇨ Organization

slide-12
SLIDE 12

Identification of perceptual qualities in textural sounds

  • What are the most significant qualities of textural sounds?

➡Repertory grid technique used to elicit qualities (personal constructs) "ex nihilo", for a specific selection of subjects (interviewees) and objects under examination (items)

  • Interviewees (subjects) are asked to name differences between two randomly chosen sound examples

➡Bipolar qualities spanning range from one sound to the other

Grill, Flexer and Cunningham. Identification of perceptual qualities in textural sounds using the repertory grid method. Proceedings of the 6th Audio Mostly Conference, 2011.

slide-13
SLIDE 13

Example

  • Straight differentiation: In which ways do two sounds differ?
  • Triads: Group three objects to form two groups, then name differences between groups

slide-14
SLIDE 14

Repertory Grid for sounds

  • Elicitation of ~10 bipolar constructs per subject
  • Subjects rate all 20 sounds (grades 1 to 5) using own personal constructs

Constructs:

   1  motion – static
   2  textural – coherent
   3  impulse – continuous
   4  high – low
   5  excentric – contained
   6  evolutionary – repetitive
   7  well-defined – diffused
   8  regular – irregular
   9  narrative – static
  10  pitched – non-pitched
  11  smooth – porous

Ratings (1 … 5) by sound (rows A to T) and construct (columns 1 to 11):

  Sound   1   2   3   4   5   6   7   8   9  10  11
  A       4   4   4   1   2   4   4   2   4   3   3
  B       5   3   5   5   5   1   3   1   5   2   1
  C       4   5   2   2   4   5   5   3   5   5   4
  D       4   2   5   4   3   4   4   3   4   2   3
  E       2   4   1   1   2   4   1   5   5   3   5
  F       1   1   2   2   2   3   2   5   5   4   5
  G       5   5   5   5   5   2   1   2   5   1   1
  H       4   3   3   1   2   5   1   1   5   2   4
  I       4   2   2   2   2   5   2   2   4   1   4
  J       2   1   5   3   1   2   5   5   3   5   3
  K       5   2   4   4   4   4   3   1   5   4   2
  L       1   1   1   3   1   2   1   5   5   5   5
  M       4   5   5   1   2   2   3   2   5   3   2
  N       3   1   4   4   1   4   4   5   5   4   2
  O       4   2   4   3   3   3   5   4   3   5   3
  P       2   2   3   3   3   4   5   3   5   5   4
  Q       5   5   5   3   5   5   1   1   5   1   1
  R       3   3   4   2   3   2   2   3   4   2   3
  S       2   2   5   2   3   4   4   4   2   3   2
  T       1   1   4   4   1   4   3   2   3   5   2

slide-15
SLIDE 15

  • 16 subjects
  • expert listeners
  • 202 constructs
  • mostly German

high/low

ordered/chaotic

slide-16
SLIDE 16


http://grrrr.org/test/classify

slide-17
SLIDE 17

Inter-rater agreement

  Construct                      Agreement α (core group)*   Agreement α (all n ≥ 10)
  high – low                     0.588                       0.519
  ordered – chaotic              0.556                       0.447
  natural – artificial           0.551                       0.492
  smooth – coarse                0.527                       0.420
  tonal – noisy                  0.523                       0.435
  homogeneous – heterogeneous    0.519                       0.416
  dense – sparse                 0.492                       0.342
  edgy – flowing                 0.465                       0.376
  static – dynamic               0.403                       0.383
  near – far                     0.252                       0.249

*nine subjects who took part in the elicitation process
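The slide does not say which agreement coefficient α denotes. Assuming it is Krippendorff's α with an interval (squared-difference) distance, a common choice for rating scales, a minimal sketch looks as follows. The function name and the toy ratings are invented for illustration; none of the numbers in the table above are reproduced here.

```python
from itertools import permutations

def krippendorff_alpha_interval(units):
    """Krippendorff's alpha for interval data.

    `units` is a list of lists: each inner list holds the ratings that
    different raters gave to one item (missing ratings are simply left out).
    """
    # Only items rated by at least two raters contribute pairable values.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")

    # Observed disagreement: squared differences within each item.
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: squared differences across all pooled ratings.
    expected = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if expected == 0:
        raise ValueError("all ratings are identical; alpha is undefined")
    return 1.0 - observed / expected

# Toy example: three raters place four sounds on a 1..5 high-low scale.
toy_ratings = [[1, 1, 2], [3, 3, 3], [5, 4, 5], [2, 3, 2]]
print(f"alpha = {krippendorff_alpha_interval(toy_ratings):.3f}")
```

Values near 1 indicate that raters place the sounds consistently along a construct, values near 0 indicate agreement no better than chance.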

slide-18
SLIDE 18

Sounds along axis high–low

⟵ high low ⟶

slide-19
SLIDE 19

Pearson correlation between constructs

slide-20
SLIDE 20

Pearson correlation between constructs
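As a small worked illustration of correlating constructs, the sketch below computes plain Pearson correlations between the construct columns of the single-subject repertory grid shown on slide 14. Assumptions: the eleven values listed per sound follow the order of the listed constructs, stray bullet marks in the extracted grid are ignored, and the helper names `pearson` and `correlation_matrix` are invented here. The slides do not say how their correlation plots were computed, and they are likely based on more data than this one grid.

```python
import math

# Ratings from the example grid: one row per sound (A to T),
# eleven constructs per row, grades 1 to 5.
GRID = """
4 4 4 1 2 4 4 2 4 3 3
5 3 5 5 5 1 3 1 5 2 1
4 5 2 2 4 5 5 3 5 5 4
4 2 5 4 3 4 4 3 4 2 3
2 4 1 1 2 4 1 5 5 3 5
1 1 2 2 2 3 2 5 5 4 5
5 5 5 5 5 2 1 2 5 1 1
4 3 3 1 2 5 1 1 5 2 4
4 2 2 2 2 5 2 2 4 1 4
2 1 5 3 1 2 5 5 3 5 3
5 2 4 4 4 4 3 1 5 4 2
1 1 1 3 1 2 1 5 5 5 5
4 5 5 1 2 2 3 2 5 3 2
3 1 4 4 1 4 4 5 5 4 2
4 2 4 3 3 3 5 4 3 5 3
2 2 3 3 3 4 5 3 5 5 4
5 5 5 3 5 5 1 1 5 1 1
3 3 4 2 3 2 2 3 4 2 3
2 2 5 2 3 4 4 4 2 3 2
1 1 4 4 1 4 3 2 3 5 2
"""

ratings = [[int(v) for v in line.split()] for line in GRID.strip().splitlines()]

def pearson(xs, ys):
    """Plain (unweighted) Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(rows):
    """Correlate every construct (column) with every other construct."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]

matrix = correlation_matrix(ratings)
# Construct 4 is high-low, construct 10 is pitched-non-pitched (0-based 3 and 9).
print(f"high-low vs. pitched-non-pitched: {matrix[3][9]:+.2f}")
```

Strongly correlated constructs are candidates for merging, which is one way to get from some 200 personal constructs to a handful of shared axes.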

slide-21
SLIDE 21

Visualizing sounds in a collection

  • Representation of properties of individual sounds

➡Auditory characteristics

  • Representation of properties of the sound collection

➡Clusters, similarities, dominating characteristics

  • Waveforms and sonograms?

slide-22
SLIDE 22

Perceptual metaphors

  • Strong synesthesia

➡very rare, asymmetric, individual

  • Weak synesthesia – cross-modal similarity

Lawrence Marks: On Perceptual Metaphors. Metaphor and Symbolic Activity 11(1), 39–66, 1996

slide-23
SLIDE 23

Cross-modal similarity

Wolfgang Köhler, Gestalt psychology, 1929

slide-24
SLIDE 24

Visualization of perceptual qualities in textural sounds

  • Based on the most relevant of the previously elicited personal constructs
  • Layout in two dimensions (screen compatibility)
  • Synesthesia-like mappings from auditory to visual domain

Grill and Flexer: Visualization of perceptual qualities in textural sounds, Proceedings of the ICMC, 2012.

slide-25
SLIDE 25

high–low, ordered–chaotic, tonal–noisy, smooth–coarse, homogeneous–heterogeneous

slide-26
SLIDE 26

Online survey: Visualization of textural sounds

http://grrrr.org/test/texvis

slide-27
SLIDE 27

Evaluation: Survey B – Dependence on expertise

  Group (≥ 20 votes each)                                voters / votes   correctness (random: 20%)   mean RMS error (random: 0.243)
  non-musicians                                          19 / 876         33.9%                       0.178
  classical musical training                             29 / 1570        40.0%                       0.163
  electronic music practice                              48 / 2811        45.2%                       0.137
  electronic music practice, good listening conditions   36 / 2019        46.4%                       0.133

slide-28
SLIDE 28

Evaluation: Pearson correlation selection to reference

Survey B: electronic music practitioners, good listening conditions, ≥ 10 votes

Axes: selected, reference. Color scale: 0.0 to 1.0.

Constructs: (1) high–low, (2) ordered–chaotic, (3) smooth–coarse, (4) tonal–noisy, (5) homogeneous–heterogeneous

          (1)     (2)     (3)     (4)     (5)
  (1)    0.71   -0.24   -0.33   -0.08   -0.22
  (2)   -0.27    0.65    0.43    0.36    0.54
  (3)   -0.34    0.43    0.68    0.54    0.27
  (4)   -0.11    0.37    0.55    0.69    0.20
  (5)   -0.25    0.55    0.26    0.18    0.62

slide-29
SLIDE 29

Evaluation: Mean RMS error vs. decision time

[Scatter plot: avg RMS error vs. time per vote (s)]

users=94, x-y correlation=-0.13 @ significance(p=0.05)=0.20, mean duration=13.83 (6.65)

slide-30
SLIDE 30

Evaluation: Mean RMS error vs. perceived difficulty

[Scatter plot: avg RMS error vs. perceived difficulty]

sounds=100, x-y correlation=0.484 @ significance(p=0.05)=0.197
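The significance thresholds quoted on the two scatter-plot slides (0.20 for 94 users, 0.197 for 100 sounds) are consistent with the usual critical value of a Pearson correlation at p = 0.05. The sketch below assumes a two-sided t-test, which the slides do not state, and uses a Cornish-Fisher approximation of the Student t quantile so that only the standard library is needed. The function names are invented for illustration.

```python
import math
from statistics import NormalDist

def t_quantile(p, dof):
    """Approximate Student t quantile (Cornish-Fisher expansion).

    Accurate to a few decimals for the large degrees of freedom used here;
    use a statistics library for small samples.
    """
    z = NormalDist().inv_cdf(p)
    return (z
            + (z ** 3 + z) / (4 * dof)
            + (5 * z ** 5 + 16 * z ** 3 + 3 * z) / (96 * dof ** 2))

def critical_r(n, alpha=0.05):
    """Smallest |r| that is significant at level alpha (two-sided) for n points."""
    dof = n - 2
    t = t_quantile(1 - alpha / 2, dof)
    return t / math.sqrt(t * t + dof)

for label, n, r in [("decision time", 94, -0.13), ("perceived difficulty", 100, 0.484)]:
    threshold = critical_r(n)
    verdict = "significant" if abs(r) > threshold else "not significant"
    print(f"{label}: n={n}, r={r:+.3f}, threshold={threshold:.3f} -> {verdict}")
```

Under these assumptions the correlation with decision time (-0.13) stays below the threshold, while the correlation with perceived difficulty (0.484) clearly exceeds it.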

slide-31
SLIDE 31

Perceptually informed high-level descriptors for textural audio

  • Attempt to model the previously elicited personal constructs, i.e. metaphoric descriptions
  • Build on extensive experimental data from online survey covering 100 textural sounds

Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012

slide-32
SLIDE 32

Perceptually informed high-level descriptors for textural audio

  • Use a uniform underlying time-frequency representation
  • Small number of adjustable parameters for each descriptor
  • Parameters to be tuned, so that the descriptors correlate well with human perception

G. A. Velasco, N. Holighaus, M. Dörfler, and T. Grill (2011). Constructing an invertible constant-Q transform with nonstationary Gabor frames. In Proceedings of the 14th International Conference on Digital Audio Effects (DAFx 11), pages 93–99, Paris, France

slide-33
SLIDE 33

Examples for high–low

[Spectrograms: windspiel1.cut.aiff and steelplantL.cut.aiff; frequency [Hz] from 100 to 12800 vs. time [s]]

slide-34
SLIDE 34

Descriptor for high–low

  • Calculate the mean over time
  • Attenuate spectrally
  • Warp the loudness range
  • Warp the frequency axis and calculate the centroid

Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012
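The four steps listed for the high–low descriptor can be read as a small signal-processing pipeline. The sketch below is one possible reading under stated assumptions, not the descriptor from the cited paper: the spectral attenuation is modelled as a tilt in dB per octave, the loudness warp as a power law, and the frequency warp as a logarithmic (octave) axis. The function name and all three parameters are hypothetical.

```python
import math

def high_low_descriptor(spectrogram, freqs, tilt_db_per_octave=-3.0,
                        loudness_exponent=0.5, ref_freq=100.0):
    """Return a spectral centroid in octaves above `ref_freq`.

    spectrogram: list of frames, each a list of non-negative magnitudes,
                 one per frequency band.
    freqs:       center frequency in Hz of each band.
    All parameter defaults are illustrative guesses, not tuned values.
    """
    n_frames = len(spectrogram)
    n_bands = len(freqs)

    # Step 1: mean over time, one value per frequency band.
    mean_spec = [sum(frame[k] for frame in spectrogram) / n_frames
                 for k in range(n_bands)]

    # Warped frequency axis (used in steps 2 and 4): octaves above ref_freq.
    octaves = [math.log2(f / ref_freq) for f in freqs]

    # Step 2: spectral attenuation, here a linear tilt in dB per octave.
    tilted = [m * 10 ** (tilt_db_per_octave * o / 20)
              for m, o in zip(mean_spec, octaves)]

    # Step 3: warp the loudness range with power-law compression.
    warped = [t ** loudness_exponent for t in tilted]

    # Step 4: centroid on the warped frequency axis.
    total = sum(warped)
    if total == 0:
        raise ValueError("spectrogram contains no energy")
    return sum(w * o for w, o in zip(warped, octaves)) / total

# Toy input: eight octave-spaced bands, as on the spectrogram axes.
bands = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
bright = [[0, 0, 0, 0, 0, 0, 1, 0]] * 10   # energy only at 6400 Hz
dark = [[0, 1, 0, 0, 0, 0, 0, 0]] * 10     # energy only at 200 Hz
print(high_low_descriptor(bright, bands) > high_low_descriptor(dark, bands))
```

A larger value means a "higher" sound. The few adjustable parameters (tilt, exponent, reference frequency) are the kind of knobs that would be tuned against listener ratings.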

slide-35
SLIDE 35

Descriptor tuning

c-pulsar1 c-tri-longl f-degraded2l windspiel1 shimmeringdigita who01_46_4

over-cry1

salz-sparsebit.l a-reiben1l atmo2l pulver-01 salz-frq.l tiere1 bolzbund1l cicada03 tiere10 f-degraded1l beat-high-r-01 aero-64kb-15db egrain01 salz-fullbit.l chor-hi bigglassbreaking who01_65 a-prickel1 atmo3l bolzbund2l tiere18 tiere5 env21 flirr vst1a brizzl tiere3 a-reiben2l sinmodl prickel2l folieluft1l prickel1l kidrock-20-3 atmo1l c-flitch tiere12 env17 ns-brrrrr prickel3l tiere15 applaus1 kugelsortier-exp.l tanz-slow noise-mid-r-01 longrisingmusics who02_19_1 schaumknull1l regenhof env19 ns-divers brizzlowl eff-flirr-r-01 machine16 folieknister1l tiere9 schaumriss1l tiere17 schreiben2 ampel-verkehr2 diesel-laut tiere2 longwindywhoosh who01_10_4 tiere11 surfybrightwindw who01_16_2 feed-ghost-r-01 env5 whirlingwhoosh_bonus 72 rush-30-6 machine13 machine1l b-halll radio2m machine14 env3 machine15 mischmaschine-exp.l jetafterburnerki who01_32_1 industrial 04 env11 kuhstall eff-low-r-01 baumaschine leise+brumm noise-ton schritte guns-96-20-6 howlingbreathenh who02_34_1 airlowl steelplantl industrial 01 flaredpass who02_51_1 a-darkns biglowdrone who01_07 raspyexhale_bonus 86 lowbrl rumblesweepwhoos who01_16_5

high low

Weighted Pearson correlation between user data (black) and descriptor values (red)
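A weighted Pearson correlation replaces the plain means and sums of the ordinary coefficient with weighted ones. The slide does not define the weights; the sketch below assumes they express how much each sound's rating can be trusted, for example the number of votes it received. The function name and the toy numbers are invented for illustration.

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative weights w."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted means.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted covariance and variances (the common 1/total factor cancels).
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Toy data: perceived ratings, descriptor values, and vote counts per sound.
perceived = [0.1, 0.4, 0.5, 0.9]
computed = [0.2, 0.3, 0.6, 0.8]
votes = [30, 25, 40, 5]
print(f"weighted r = {weighted_pearson(perceived, computed, votes):.3f}")
```

Descriptor tuning can then be framed as searching the descriptor's parameters for the setting that maximizes this coefficient against the survey data.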

slide-36
SLIDE 36

Examples for ordered–chaotic

[Spectrograms: tiere18.cut.aiff and beat-high-r-01.cut.aiff; frequency [Hz] from 100 to 12800 vs. time [s]]

slide-37
SLIDE 37

Examples for smooth–coarse

[Spectrograms: airlowL.cut.aiff and bolzbund1L.cut.aiff; frequency [Hz] from 100 to 12800 vs. time [s]]

slide-38
SLIDE 38

Examples for tonal–noisy

[Spectrograms: flirr.cut.aiff and schaumknull1L.cut.aiff; frequency [Hz] from 100 to 12800 vs. time [s]]

slide-39
SLIDE 39

Examples for homogeneous–heterogeneous

[Spectrograms: leise+brumm.cut.aiff and tiere3.cut.aiff; frequency [Hz] from 100 to 12800 vs. time [s]]

slide-40
SLIDE 40

Evaluation – Weighted Pearson correlation

Axes: perceived qualities, computed qualities. Color scale: 0.0 to 1.0.

Constructs: (1) high–low, (2) ordered–chaotic, (3) smooth–coarse, (4) tonal–noisy, (5) homogeneous–heterogeneous

Tuned for individual accuracy:

          (1)     (2)     (3)     (4)     (5)
  (1)    0.90   -0.42   -0.47   -0.38   -0.33
  (2)   -0.09    0.74    0.37    0.35    0.69
  (3)   -0.56    0.59    0.75    0.62    0.38
  (4)   -0.32    0.53    0.62    0.75    0.41
  (5)   -0.11    0.70    0.21    0.24    0.75

Tuned for mutual independence:

          (1)     (2)     (3)     (4)     (5)
  (1)    0.88   -0.36   -0.37   -0.28   -0.31
  (2)   -0.12    0.74    0.40    0.39    0.65
  (3)   -0.57    0.57    0.74    0.59    0.35
  (4)   -0.13    0.43    0.52    0.74    0.37
  (5)   -0.08    0.66    0.13    0.14    0.75

slide-41
SLIDE 41

Application: Musical interface – Continuous map of textural sounds

http://grrrr.org/data/research/texmap