Austrian Research Institute for Artificial Intelligence (OFAI)
Thomas Grill: Perceptually informed organization of textural sounds
Research context
Towards automatic annotation of electroacoustic music (2008–2010)
Audiominer
manipulation of sound objects (2010–2013)
audio streams (2013–2016)
Smalley D.:
Klien V., Grill T., Flexer A.: On automated annotation of acousmatic music, Journal of New Music Research, 2012
John Chowning: Turenas
Holighaus N., Dörfler M., Velasco G. A. and Grill T.: A framework for invertible, real-time constant-Q transforms, IEEE Transactions on Audio, Speech and Language Processing, 2013.
(aka STFT) – invertible CQ-NSGT
applicability (perceptually informed)
measurable quantities
visualized?
(SMC) International Conference.
crumple, clap, rub, walking
[Figures: two spectrograms; axes: time [s] (0.0 to 2.5), frequency [Hz] (100 to 12800)]
Saint-Arnaud, N. (1995). Classification of sound textures. Master’s thesis, MIT Media Lab, Cambridge, MA, USA.
A sound texture is like wallpaper: it can have local structure and randomness, but the characteristics of the structure and randomness must remain constant on the large scale.
2.2 Working Definition
First Time Constraint: Constant Long-term Characteristics
A definition for a sound texture could be quite wide, but we chose to restrict our working definition for many perceptual and conceptual [...] sound texture might be; and more people will accept sounds that fit a more restrictive definition. The first constraint we put on our definition is that it should exhibit similar characteristics [...] two-second snippet of a texture should not differ significantly from another two-second snippet. To use another metaphor, one could say that any two snippets of a sound texture seem to be cut from the same rug [RIC79]. A sound texture is like wallpaper: it can have local structure and randomness, but the characteristics of the structure and randomness must remain constant on the large scale. This means that the pitch should not change like in a racing car, the rhythm should not increase or decrease, etc. This constraint also means that sounds in which the attack plays a great part (like many timbres) cannot be sound textures. A sound texture is characterized by its sustain.
Figure 2.2.1 shows an interesting way of segregating sound textures from other sounds, by showing how the “potential information content” increases with time. “Information” is taken here in the cognitive sense rather than the information theory sense. Speech or music can provide new information at any time, and their “potential information content” is shown here as a continuously increasing function [...] term characteristics, which translates into a flattening [...] information [...] cognitive sense) has somewhat less information than textures.
FIGURE 2.2.1 Potential Information Content of a Sound Texture vs. Time (curves: speech, music, sound texture, noise; axes: content vs. time)
Sounds that carry a lot of meaning are usually perceived as a [...] downplaying the characteristics [...] work with sounds which are not primarily perceived as a message.
Chapter 2 Human Perception
24
Landy, L. (2007). Understanding the Art of Sound Organization. The MIT Press, Cambridge, MA, USA.
Truax, B. (2008). Soundscape composition as global music: Electroacoustic music as soundscape. Organised Sound, 13(2):103–109.
“art form in which the sound and not the musical note is the basic unit.” ➡Acousmatic music and soundscape composition
Low Frequency Orchestra plays Robert Lettner: Das Spiel vom Kommen und Gehen
(sound origin, recording context, etc.)
especially for abstract sounds or use in sound design etc.
Grill, Flexer and Cunningham: Identification of perceptual qualities in textural sounds using the repertory grid method. Proceedings of the 6th Audio Mostly Conference, 2011
➡Repertory grid technique used to elicit qualities (personal constructs) "ex nihilo", for a specific selection of subjects (interviewees) and objects under examination (items)
between two randomly chosen sound examples
➡Bipolar qualities spanning the range from one sound to the other
In which ways do two sounds differ?
Group three objects to form two groups, then name differences between groups
using own personal constructs
Repertory grid (ratings 1 … 5)
Constructs: motion–static, textural–coherent, impulse–continuous, high–low, excentric–contained, evolutionary–repetitive, well-defined–diffused, regular–irregular, narrative–static, pitched–non-pitched, smooth–porous
A  4 4 4 1 2 4 4 2 4 3 3
B  5 3 5 5 5 1 3 1 5 2 1
C  4 5 2 2 4
D  4 2 5 4 3 4 4 3 4 2 3
E  2 4 1 1 2 4 1 5 5 3 5
F  1 1 2 2 2
G  5 5 5 5 5 2 1 2 5 1 1
H  4 3 3 1 2 5 1 1 5 2 4
I  4 2 2 2 2 5 2 2 4 1 4
J  2 1 5 3 1
K  5 2 4 4 4 4 3 1 5 4 2
L  1 1 1 3 1
M  4 5 5 1 2 2 3 2 5 3 2
N  3 1 4 4 1 4 4 5 5 4 2
O  4 2 4 3 3
P  2 2 3 3 3 4 5 3 5 5 4
Q  5 5 5 3 5
R  3 3 4 2 3 2 2 3 4 2 3
S  2 2 5 2 3 4 4 4 2 3 2
T  1 1 4 4 1 4 3 2 3 5 2
high/low
http://grrrr.org/test/classify
Construct                      Agreement α (core group)*   Agreement α (all n ≥ 10)
high – low                     0.588                       0.519
[...]                          0.556                       0.447
natural – artificial           0.551                       0.492
smooth – coarse                0.527                       0.420
tonal – noisy                  0.523                       0.435
homogeneous – heterogeneous    0.519                       0.416
dense – sparse                 0.492                       0.342
edgy – flowing                 0.465                       0.376
static – dynamic               0.403                       0.383
near – far                     0.252                       0.249
*nine subjects who took part in the elicitation process
⟵ high low ⟶
➡Auditory (or semantic) characteristics
➡Clusters, similarities, principal characteristics
Lawrence Marks: On Perceptual Metaphors. Metaphor and Symbolic Activity 11(1), 39–66, 1996
➡very rare, asymmetric, individual
Wolfgang Köhler: Gestalt Psychology, 1929
Grill and Flexer: Visualization of perceptual qualities in textural sounds, Proceedings of the ICMC, 2012.
personal constructs
high–low, ordered–chaotic, tonal–noisy, smooth–coarse, homogeneous–heterogeneous
http://grrrr.org/test/texvis
group                                                              voters / votes   correctness (random: 20%)   mean RMS error (random: 0.243)
non-musicians, ≥ 20 votes                                          19 / 876         33.9%                       0.178
classical musical training, ≥ 20 votes                             29 / 1570        40.0%                       0.163
electronic music practice, ≥ 20 votes                              48 / 2811        45.2%                       0.137
electronic music practice, good listening conditions, ≥ 20 votes   36 / 2019        46.4%                       0.133
Survey B: electronic music practitioners, good listening conditions, ≥ 10 votes
[Figure: matrix of selected vs. reference qualities (labels: high–low, smooth–coarse, tonal–noisy, homogeneous–heterogeneous); color scale 0.0 to 1.0]
0.71
0.65 0.43 0.36 0.54
0.43 0.68 0.54 0.27
0.37 0.55 0.69 0.20
0.55 0.26 0.18 0.62
[Figure: scatter plot of avg RMS error (0.05 to 0.35) vs. time per vote [s] (5 to 45)]
users=94, x-y correlation=-0.13 @ significance(p=0.05)=0.20, mean duration=13.83 (6.65)
[Figure: scatter plot of avg RMS error (0.00 to 0.25) vs. perceived difficulty (0.0 to 1.0)]
sounds=100, x-y correlation=0.484 @ significance(p=0.05)=0.197
Thomas Grill: Constructing high-level perceptual audio descriptors for textural sounds. Proceedings of the 9th Sound and Music Computing Conference, 2012
i.e. metaphoric descriptions ➞ audio descriptors for high–low, ordered–chaotic, smooth–coarse, tonal–noisy, homogeneous–heterogeneous
covering 100 textural sounds
Holighaus N., Dörfler M., Velasco G. A. and Grill T.: A framework for invertible, real-time constant-Q transforms, IEEE Transactions on Audio, Speech and Language Processing, 2013.
with human perception
[Figure: spectrograms of windspiel1.cut.aiff and steelplantL.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800)]
and calculate the centroid
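The centroid step can be illustrated with a minimal sketch. It assumes that the centroid is a magnitude-weighted mean taken on a logarithmic frequency axis (matching the octave-spaced spectrogram axes); the function name `log_frequency_centroid` and the toy spectra are hypothetical and not taken from the slides.

```python
import numpy as np

def log_frequency_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted centroid on a log2 frequency axis, returned in Hz.

    Assumption: the centroid is computed over log-frequency, not linear Hz.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    magnitudes = np.asarray(magnitudes, dtype=float)
    total = magnitudes.sum()
    if total <= 0:
        raise ValueError("magnitudes must contain some positive energy")
    log_centroid = np.sum(np.log2(freqs_hz) * magnitudes) / total
    return float(2.0 ** log_centroid)

# Toy data: octave-spaced bins from 100 Hz to 12800 Hz
freqs = 100.0 * 2.0 ** np.arange(8)
low_sound = np.array([8, 4, 2, 1, 0, 0, 0, 0], dtype=float)   # energy in low bins
high_sound = low_sound[::-1]                                   # energy in high bins

print(log_frequency_centroid(freqs, low_sound))
print(log_frequency_centroid(freqs, high_sound))
```

A sound with most of its energy in the upper bins yields a larger centroid, which is the ordering a high–low descriptor would need.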
c-pulsar1 c-tri-longl f-degraded2l windspiel1 shimmeringdigita who01_46_4
salz-sparsebit.l a-reiben1l atmo2l pulver-01 salz-frq.l tiere1 bolzbund1l cicada03 tiere10 f-degraded1l beat-high-r-01 aero-64kb-15db egrain01 salz-fullbit.l chor-hi bigglassbreaking who01_65 a-prickel1 atmo3l bolzbund2l tiere18 tiere5 env21 flirr vst1a brizzl tiere3 a-reiben2l sinmodl prickel2l folieluft1l prickel1l kidrock-20-3 atmo1l c-flitch tiere12 env17 ns-brrrrr prickel3l tiere15 applaus1 kugelsortier-exp.l tanz-slow noise-mid-r-01 longrisingmusics who02_19_1 schaumknull1l regenhof env19 ns-divers brizzlowl eff-flirr-r-01 machine16 folieknister1l tiere9 schaumriss1l tiere17 schreiben2 ampel-verkehr2 diesel-laut tiere2 longwindywhoosh who01_10_4 tiere11 surfybrightwindw who01_16_2 feed-ghost-r-01 env5 whirlingwhoosh_bonus 72 rush-30-6 machine13 machine1l b-halll radio2m machine14 env3 machine15 mischmaschine-exp.l jetafterburnerki who01_32_1 industrial 04 env11 kuhstall eff-low-r-01 baumaschine leise+brumm noise-ton schritte guns-96-20-6 howlingbreathenh who02_34_1 airlowl steelplantl industrial 01 flaredpass who02_51_1 a-darkns biglowdrone who01_07 raspyexhale_bonus 86 lowbrl rumblesweepwhoos who01_16_5
high low
Weighted Pearson correlation between user data (black) and descriptor values (red)
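A weighted Pearson correlation can be sketched as follows. The slide does not say what the weights are; using something like the number of votes per sound is only an assumption, and the function name `weighted_pearson` is hypothetical.

```python
import numpy as np

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative observation weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                       # normalise weights to sum to 1
    mean_x = np.sum(w * x)
    mean_y = np.sum(w * y)
    cov_xy = np.sum(w * (x - mean_x) * (y - mean_y))
    var_x = np.sum(w * (x - mean_x) ** 2)
    var_y = np.sum(w * (y - mean_y) ** 2)
    return float(cov_xy / np.sqrt(var_x * var_y))

# Hypothetical example: user ratings vs. descriptor values for four sounds,
# where the last sound received no votes and therefore gets weight 0.
user_ratings = [1.0, 2.0, 3.0, 10.0]
descriptor_values = [1.0, 2.0, 3.0, -5.0]
votes = [1.0, 1.0, 1.0, 0.0]
print(weighted_pearson(user_ratings, descriptor_values, votes))
```

With equal weights the function reduces to the ordinary Pearson correlation.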
[Figure: spectrograms of tiere18.cut.aiff and beat-high-r-01.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800)]
a Gaussian kernel
comparing with shifted slices
repetitions
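\[ \breve{c}_{t,f} = \hat{c}_{t,f} - [\ldots]_f(\hat{c}_{t,f}) \star G_\sigma(t) \]
\[ s_{i,\nu,t,f} = w_n(t' - (im + \nu))\, \breve{c}_{t',f} \]
\[ \delta_{i,\nu} = M^{\xi}_{t,f}\, |s_{i,\nu,t,f} - s_{i,0,t,f}| \]
\[ \bar{\delta}_{i,\nu} = \delta_{i,\nu} - M^{1}_{\nu}(\delta_{i,\nu}) \]
\[ \gamma_i = M^{\eta}_{\nu}\left(\delta_{i+1,\nu} - \bar{\delta}_{i,\nu}\right) \]
\[ \tilde{\gamma}_i = \gamma_i \cdot M^{\alpha}_{\nu}(\delta_{i,\nu}) \cdot M^{\alpha}_{\nu}(\delta_{i+1,\nu}) \]
\[ D_{\text{ordered–chaotic}} = \log M^{1}_{i}(\tilde{\gamma}_i) \]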
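The slide does not define the operator M with its exponents (ξ, η, α, 1) and index subscripts. One plausible reading, stated here only as an assumption, is a power (generalized) mean over the subscripted indices, with exponent 1 giving the arithmetic mean. The helper `power_mean` below is hypothetical and illustrates that reading for non-negative values.

```python
import numpy as np

def power_mean(values, p, axis=None):
    """Power mean with exponent p over the given axis (non-negative input).

    Assumption: this is what the slides' M^p operator denotes.
    p = 1 is the arithmetic mean; p = 0 is treated as the geometric-mean limit.
    """
    values = np.asarray(values, dtype=float)
    if p == 0:
        return np.exp(np.mean(np.log(values), axis=axis))
    return np.mean(values ** p, axis=axis) ** (1.0 / p)

# Last step of the descriptor under this reading: log of the arithmetic mean
# over slices i of some hypothetical per-slice values.
gamma_tilde = np.array([1.0, 1.0, 1.0])
descriptor = np.log(power_mean(gamma_tilde, 1))
print(descriptor)
```

Larger exponents emphasise large values in the mean, which would be one way to make a descriptor more or less sensitive to isolated peaks.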
[Figure: spectrograms of airlowL.cut.aiff and bolzbund1L.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800)]
the magnitudes
axis, boosting the temporal contrast
f
t ˆ
t (δt)
[Figure: spectrograms of flirr.cut.aiff and schaumknull1L.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800)]
compress loudness
boost spectral contrast, take logarithm
t (ˆ
f (βf)
[Figure: spectrograms of leise+brumm.cut.aiff and tiere3.cut.aiff; axes: time [s], frequency [Hz] (100 to 12800)]
by taking absolute DFTs over time
slices with adjustable sensitivity
take the logarithm
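\[ s_{i,t,f} = w_n(t' - im)\, \hat{c}_{t',f} \]
\[ \bar{s}_{i,t,f} = \frac{s_{i,t,f}}{M^{\xi}_{t',f'}(s_{i,t',f'})} \]
\[ \hat{s}_{i,\rho,f} = \left| \mathrm{DFT}_t(\bar{s}_{i,t,f}) \right| \]
\[ \tilde{s}_{i,\rho,f} = \hat{s}_{i,\rho,f}\, \mathrm{att}_{\Psi}(\rho) \]
\[ \delta_i = M^{\eta}_{\rho,f}\, |\tilde{s}_{i+1,\rho,f} - \tilde{s}_{i,\rho,f}| \]
\[ D_{\text{homogeneous–heterogeneous}} = \log M^{1}_{i}(\delta_i) \]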
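The steps on this slide (windowed slices, normalisation, absolute DFT over time, attenuation, comparison of neighbouring slices, logarithm) can be sketched roughly as below. This is not the author's implementation. Several pieces are assumptions: M is read as a power mean, the window w_n is taken to be a Hann window of length `n` with hop `m`, and the attenuation `(1 + rho) ** (-psi)` is a purely hypothetical stand-in for the undefined att_Ψ(ρ). All names are illustrative.

```python
import numpy as np

def power_mean(values, p, axis=None):
    """Power mean with exponent p (assumed meaning of the slides' M^p)."""
    values = np.asarray(values, dtype=float)
    return np.mean(values ** p, axis=axis) ** (1.0 / p)

def heterogeneity(c_hat, n=32, m=16, xi=1.0, eta=1.0, psi=0.0):
    """Sketch of a homogeneous–heterogeneous value for a (time, frequency) array.

    c_hat must be non-negative (e.g. spectrogram magnitudes).
    """
    c_hat = np.asarray(c_hat, dtype=float)
    window = np.hanning(n)[:, None]                   # w_n, assumed Hann
    slices = []
    for start in range(0, c_hat.shape[0] - n + 1, m):
        s = window * c_hat[start:start + n]           # slice i starts at i*m
        s_bar = s / power_mean(s, xi)                 # normalise the slice
        s_hat = np.abs(np.fft.rfft(s_bar, axis=0))    # absolute DFT over time
        rho = np.arange(s_hat.shape[0])[:, None]      # modulation bin index
        s_tilde = s_hat * (1.0 + rho) ** (-psi)       # hypothetical att_Psi(rho)
        slices.append(s_tilde)
    slices = np.stack(slices)
    # Mean absolute difference between consecutive slices, then log of the mean
    delta = power_mean(np.abs(slices[1:] - slices[:-1]), eta, axis=(1, 2))
    return float(np.log(power_mean(delta, 1.0)))

# Toy data: a nearly stationary texture vs. one whose spectrum changes midway
rng = np.random.default_rng(0)
homogeneous = 1.0 + 0.01 * rng.random((128, 8))
heterogeneous = homogeneous.copy()
heterogeneous[64:, :4] *= 0.1                         # low bins drop in 2nd half

print(heterogeneity(homogeneous), heterogeneity(heterogeneous))
```

Because each slice is normalised by its own mean, a pure change in overall level does not register; only a change in the time-frequency pattern between neighbouring slices raises the value.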
[Figure: matrices of perceived qualities vs. computed qualities (labels: high–low, smooth–coarse, tonal–noisy, homogeneous–heterogeneous); color scale 0.0 to 1.0]
Tuned for individual accuracy:
0.70 0.21 0.24 0.75
0.53 0.62 0.75 0.41
0.59 0.75 0.62 0.38
0.74 0.37 0.35 0.69 0.90
Tuned for mutual independence:
0.66 0.13 0.14 0.75
0.43 0.52 0.74 0.37
0.57 0.74 0.59 0.35
0.74 0.40 0.39 0.65 0.88
http://grrrr.org/data/research/texmap