

SLIDE 1

International Laboratory for Brain, Music and Sound Research

Measuring & Modeling Musical Expression

Douglas Eck University of Montreal Department of Computer Science BRAMS Brain Music and Sound

SLIDE 2

Douglas Eck douglas.eck@umontreal.ca

Overview

  • Task: human-realistic music performance
  • Challenges:
    • expressive timing and dynamics
    • generating musical variations
    • choosing appropriate timbres (instruments)
  • Today: Learning expressive timing and dynamics for the piano
  • Applications: music generation for film and video games
  • Work done in collaboration with Stanislas Lauly


SLIDE 3

An interesting task... context-aware music generation

[Diagram: game states FRANTIC, RESTFUL, SPOOKY, VICTORIOUS arranged along axes “faster →” and “more dangerous →”]

SLIDE 4

Music Composition (from video game composer)

[Diagram: a performance model mediating between the composed music and game states FRANTIC, RESTFUL, SPOOKY, VICTORIOUS, along axes “faster →” and “more dangerous →”]


SLIDE 7


Audio similarity + morphing

  • We can predict words like “sad” and “jazzy” from audio. Resulting wordset useful for music recommendation (Eck et al. NIPS 07)
  • We can also morph between artists based on word vector similarity
  • Similar technique may allow us to generate a “dangerous” sound based on analysis of songs people think sound dangerous.

This work is a part of Sun Labs “Project Aura” recommendation framework.


SLIDE 9
  • Many challenges:
    • expressive timing and dynamics
    • generating musical variations
    • choosing appropriate timbres (instruments)
  • Today: Learning expressive timing and dynamics for the piano

Chopin Etude Opus 10 No 3

SLIDE 10

Example: Chopin Etude Opus 10 No 3

  • Deadpan (no expressive timing or dynamics)
  • Human performance (recorded on Bösendorfer ZEUS)

Differences limited to:

  • timing (onset, length)
  • velocity (seen as red)
  • pedaling (blue shading)
SLIDE 11


What can we measure?

  • Repp (1990) measured note inter-onset intervals (IOIs) in 19 famous recordings of a Beethoven minuet (Sonata Op. 31 No. 3)

[Scanned figure: Grand average timing patterns of performances with repeats plotted separately. (From B. Repp, “Patterns of expressive timing in performances of a Beethoven minuet by nineteen famous pianists”, 1990)]
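To make “measuring IOIs” concrete, here is a small sketch with invented onset times (not Repp’s measurements): IOIs are just differences between successive note onsets, and dividing by the nominal score duration turns them into tempo deviations.

```python
import numpy as np

# Hypothetical onset times (seconds) for three quarter notes at a nominal
# 120 BPM, played with a slight ritardando. Values are invented.
onsets = np.array([0.00, 0.52, 1.07, 1.68])

# IOIs: time elapsed between successive note onsets.
iois = np.diff(onsets)
print(iois.round(2))       # [0.52 0.55 0.61]

# Normalizing by the nominal (score) duration of each interval yields a
# tempo-deviation profile: values > 1.0 mean slower than nominal tempo.
nominal = np.array([0.5, 0.5, 0.5])  # quarter notes at 120 BPM = 0.5 s
deviation = iois / nominal
print(deviation.round(2))  # [1.04 1.1  1.22]
```

Profiles like this, one per performance, are the raw material for the averaging and factor analysis discussed next.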

SLIDE 12


What can we measure?

  • PCA analysis yields 2 major components:
    • phrase-final lengthening
    • phrase-internal variation
  • Simply taking mean IOIs can yield a pleasing performance
  • Reconstructing using principal component(s) can yield a pleasing performance
  • Concluded that timing underlies musical structure

[Scanned figure: Factor 1, Factor 2, and Factor 3 timing profiles (duration in ms vs. bar no.). Adapted from Repp (1990)]
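The factor-analysis style of this figure can be imitated with ordinary PCA on a performances × note-positions matrix of IOIs. Everything below is synthetic stand-in data (a shared phrase-final lengthening shape applied with per-performance depth), not Repp’s measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
n_perf, n_notes = 19, 40

# Shared expressive shape: phrase-final lengthening near the last notes.
bump = np.exp(-0.5 * ((np.arange(n_notes) - (n_notes - 1)) / 3.0) ** 2)
depths = 0.2 + 0.2 * rng.random(n_perf)   # per-performance lengthening depth
X = 1.0 + np.outer(depths, bump) + 0.01 * rng.normal(size=(n_perf, n_notes))

# PCA via SVD on the mean-centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)
print("variance explained by PC1:", round(float(explained[0]), 2))

# Reconstructing from the leading component(s) gives a smoothed timing
# profile per performance, analogous to the pleasing "average" performances.
k = 1
recon = X.mean(axis=0) + (U[:, :k] * S[:k]) @ Vt[:k]
```

With this construction the first component dominates because all performances share one timing strategy; Repp’s real data similarly concentrated variance in a couple of interpretable components.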

SLIDE 13


Timing versus expressive dynamics

  • Repp (1997; experiment 2): generated MIDI from audio for 15 famous performances of Chopin’s Op. 10 No. 3; added 9 graduate student performances
  • Retained only timing (no expressive dynamics)
  • Judges ranked the average timing profile of the expert pianists (EA) highest, followed by E11, S1, S3, S9, S2, and SA.
  • Conclusions:
    • EA, SA sound better than average but “lack individuality” (Repp)
    • Something is lost in discarding non-temporal expressive dynamics.
    • Crucial point: EA and SA sound good

SLIDE 14


KTH Model

  • Johan Sundberg, Anders Friberg, many others
  • Models performance of Western music
  • Rule-based system built using:
    • analysis-by-synthesis: assess impact of individual rules by listening
    • analysis-by-measurement: fit rules to performance data
  • Incorporates wide range of music perception research (e.g. meter perception, pitch perception, motor control constraints)

SLIDE 15

Table 1. An overview of the rule system. (From: A. Friberg, R. Bresin & J. Sundberg (2006). Overview of the KTH rule system for musical performance. Advances in Cognitive Psychology, 2(2-3):145-161.)

Phrasing
  • Phrase arch: create arch-like tempo and sound level changes over phrases
  • Final ritardando: apply a ritardando at the end of the piece
  • High loud: increase sound level in proportion to pitch height
Micro-level timing
  • Duration contrast: shorten relatively short notes and lengthen relatively long notes
  • Faster uphill: increase tempo in rising pitch sequences
Metrical patterns and grooves
  • Double duration: decrease duration ratio for two notes with a nominal value of 2:1
  • Inégales: introduce long-short patterns for equal note values (swing)
Articulation
  • Punctuation: find short melodic fragments and mark them with a final micropause
  • Score legato/staccato: articulate legato/staccato when marked in the score
  • Repetition articulation: add articulation for repeated notes
  • Overall articulation: add articulation for all notes except very short ones
Tonal tension
  • Melodic charge: emphasize the melodic tension of notes relative to the current chord
  • Harmonic charge: emphasize the harmonic tension of chords relative to the key
  • Chromatic charge: emphasize regions of small pitch changes
Intonation
  • High sharp: stretch all intervals in proportion to size
  • Melodic intonation: intonate according to melodic context
  • Harmonic intonation: intonate according to harmonic context
  • Mixed intonation: intonate using a combination of melodic and harmonic intonation
Ensemble timing
  • Melodic sync: synchronize using a new voice containing all relevant onsets
  • Ensemble swing: introduce metrical timing patterns for the instruments in a jazz ensemble
Performance noise
  • Noise control: simulate inaccuracies in motor control

SLIDE 16

Figure 2. The resulting IOI deviations from applying Phrase arch, Duration contrast, Melodic charge, and Punctuation to the Swedish nursery tune “Ekorr’n satt i granen”. All rules were applied with the rule quantity k=1 except the Melodic charge rule, which was applied with k=2. (From: A. Friberg, R. Bresin & J. Sundberg (2006). Overview of the KTH rule system for musical performance. Advances in Cognitive Psychology, 2(2-3):145-161.)
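To illustrate the flavor of a KTH rule in code: a caricature of Duration contrast with a rule-quantity parameter k. The linear scaling used here is an invented simplification; the published rule uses calibrated curves.

```python
import numpy as np

def duration_contrast(durations, k=1.0):
    """Caricature of the KTH 'Duration contrast' rule: notes shorter than
    the local mean are shortened further, longer notes are lengthened.
    The linear push away from the mean is an invented form, not the
    published formulation; k mimics the KTH rule-quantity parameter."""
    durations = np.asarray(durations, dtype=float)
    mean = durations.mean()
    return durations + k * 0.1 * (durations - mean)

# Nominal durations in beats: eighth, eighth, quarter, half.
score = [0.5, 0.5, 1.0, 2.0]
performed = duration_contrast(score, k=1.0)
print(performed)  # short notes got shorter, long notes longer
```

Rules of this kind compose: each contributes a small deviation, and the rule quantities weight how strongly each applies, which is what Figure 2 visualizes.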

SLIDE 17


Widmer et al. performance model

  • Automatic deduction of rules for music performance
  • Rich feature set (29 attributes including local melodic contour, scale degree, duration, etc.)
  • Performance is matched to score (metrical position)
  • PLCG: Partition Learn Cluster Generalize (Widmer, 2003)
  • Discovery of simple partial rule-based models
  • Inspired by ensemble learning
  • PLCG compares favorably to the rule-learning algorithm RIPPER
  • Rules learned by PLCG are similar to some KTH rules

SLIDE 18
Fig. 5. Mozart Sonata K.331, 1st movement, 1st part, as played by pianist and learner. The curve plots the relative tempo at each note: notes above the 1.0 line are shortened relative to the tempo of the piece, notes below 1.0 are lengthened. A perfectly regular performance with no timing deviations would correspond to a straight line at y = 1.0.

RULE TL2: abstract_duration_context = equal-longer & metr_strength 1 ⇒ ritardando
“Given two notes of equal duration followed by a longer note, lengthen the note (i.e., play it more slowly) that precedes the final, longer one, if this note is in a metrically weak position (‘metrical strength’ 1).”

From: G. Widmer (2003). Discovering simple rules in complex data: A meta-learning algorithm and some surprising musical discoveries. Artificial Intelligence 146:129-148.
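Rule TL2 is concrete enough to sketch directly. The stretch factor and the note representation below are invented for illustration; Widmer’s PLCG learns such rules and their conditions from data rather than hand-coding them.

```python
def apply_tl2(notes, stretch=1.1):
    """Sketch of Widmer's rule TL2: when two equal-duration notes are
    followed by a longer note, lengthen the second note of the pair if it
    falls on a metrically weak position (strength 1). The stretch factor
    is an invented illustration.

    notes: list of dicts with 'dur' (nominal duration in beats) and
    'metr_strength' (1 = metrically weak). Returns performed durations."""
    perf = [n["dur"] for n in notes]
    for i in range(len(notes) - 2):
        a, b, c = notes[i], notes[i + 1], notes[i + 2]
        if a["dur"] == b["dur"] and c["dur"] > b["dur"] and b["metr_strength"] == 1:
            perf[i + 1] *= stretch  # slow down before the longer note
    return perf

melody = [
    {"dur": 0.5, "metr_strength": 3},
    {"dur": 0.5, "metr_strength": 1},  # equal-equal-longer, weak: TL2 fires
    {"dur": 1.0, "metr_strength": 3},
    {"dur": 1.0, "metr_strength": 1},
]
print(apply_tl2(melody))
```

The interest of PLCG is precisely that compact, human-readable conditions like this one fall out of the learning procedure.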

SLIDE 19


Another approach...

  • KTH model has many rule-weighting parameters to set by hand
  • Widmer improves on this by using optimization to set rule weights
  • Both models rely heavily on the score for feature extraction
  • Our goals:
    • Rely less on scores in order to work with non-scored music
    • Treat performance as a regression task in order to use standard machine learning techniques

SLIDE 20


Relying less on scores...

  • Score provides crucial info about phrasing and meter
  • But... musical score not always available
  • Jazz, pop, blues use simple scores or none
  • Millions of audio examples available, but audio-to-score is hard
  • Solution: estimate phrasing and meter from audio or MIDI (Eck, 2007)
  • In the current study we use scores but rely on features (mostly) obtainable using estimation.

D. Eck (2007). Beat tracking using an autocorrelation phase matrix. In Proceedings of the 2007 International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1313-1316.
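Eck (2007) computes an autocorrelation *phase matrix*, which recovers both beat period and beat phase. As a much-simplified illustration of why autocorrelation finds meter at all, plain autocorrelation of a synthetic onset train recovers the beat period (this is not the paper’s algorithm):

```python
import numpy as np

fps = 100                      # frames per second
period = 50                    # true beat period: 0.5 s -> 120 BPM
signal = np.zeros(10 * fps)
signal[::period] = 1.0         # synthetic onset impulse train

# Autocorrelation over lags in a plausible tempo range (40-200 BPM).
min_lag = int(fps * 60 / 200)
max_lag = int(fps * 60 / 40)
ac = np.array([np.dot(signal[:-lag], signal[lag:])
               for lag in range(min_lag, max_lag + 1)])
best_lag = min_lag + int(np.argmax(ac))
print("estimated BPM:", 60 * fps / best_lag)  # 120.0
```

Real onset signals are noisy and the autocorrelation peak structure is ambiguous across metrical levels, which is part of what the phase-matrix formulation addresses.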

SLIDE 21


Treat as regression task ...

y = f(x)

x is a note or set of notes (chord) described by:
  • durations (quarter note, half note, ...)
  • amplitudes (piano, forte, ...)
  • accelerations (crescendo, decrescendo, ...)
  • position in measure
  • position in phrase

y is expressive deviation described by:
  • note velocities
  • local time deviations (chord spread, ...)
  • overall tempo deviation

When exact note durations are known (i.e. when a score is available) we use a binary input encoding.
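A minimal sketch of the encoding of one note as an (x, y) pair for y = f(x). The duration categories, feature names, and scalings are assumptions for illustration; the slide only specifies that durations get a binary encoding when a score is available.

```python
import numpy as np

# Illustrative duration categories for the binary (one-hot) encoding.
DUR_CLASSES = ["sixteenth", "eighth", "quarter", "half", "whole"]

def encode_note(duration, amplitude, accel, pos_in_measure, pos_in_phrase):
    """x: one-hot score duration concatenated with context features.
    Feature names and normalization are invented for this sketch."""
    onehot = np.zeros(len(DUR_CLASSES))
    onehot[DUR_CLASSES.index(duration)] = 1.0
    return np.concatenate([onehot,
                           [amplitude, accel, pos_in_measure, pos_in_phrase]])

def encode_target(velocity, local_time_dev, tempo_dev):
    """y: the expressive deviations the model must predict for this note."""
    return np.array([velocity, local_time_dev, tempo_dev])

x = encode_note("quarter", amplitude=0.6, accel=0.0,
                pos_in_measure=0.33, pos_in_phrase=0.1)
y = encode_target(velocity=0.7, local_time_dev=-0.02, tempo_dev=1.05)
print(x.shape, y.shape)  # (9,) (3,)
```

A training set is then just many such pairs extracted from the recorded performances, which is what lets standard regression machinery apply.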

SLIDE 22


Regression algorithms

  • Local variations likely learnable by any good regression method
  • We also want to learn long-timescale structure not encoded locally
  • Baseline: recurrent neural network trained using BackProp Through Time
  • Alternative: Deep Belief Network (Hinton et al.) trained using contrastive divergence
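As a concreteness check on the baseline, here is a minimal Elman-style recurrent network trained with backpropagation through time on a toy regression sequence. All sizes, data, and hyperparameters are invented; this is not the study’s architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 4, 8, 1, 20

Wxh = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden
Whh = rng.normal(0, 0.1, (n_hid, n_hid))  # hidden -> hidden (recurrence)
Why = rng.normal(0, 0.1, (n_out, n_hid))  # hidden -> output

# Toy sequence task: target depends on current and previous inputs.
X = rng.normal(size=(T, n_in))
Y = np.tanh(X[:, :1] + 0.5 * np.roll(X[:, 1:2], 1, axis=0))

def forward(X):
    h = np.zeros(n_hid)
    hs, ys = [h], []
    for x in X:
        h = np.tanh(Wxh @ x + Whh @ h)
        hs.append(h)
        ys.append(Why @ h)
    return hs, np.array(ys)

def bptt_step(X, Y, lr=0.05):
    """One full-sequence gradient step of backprop through time."""
    hs, ys = forward(X)
    dWxh, dWhh, dWhy = (np.zeros_like(W) for W in (Wxh, Whh, Why))
    dh_next = np.zeros(n_hid)
    for t in reversed(range(len(X))):
        dy = ys[t] - Y[t]                # gradient of 0.5 * squared error
        dWhy += np.outer(dy, hs[t + 1])
        dh = Why.T @ dy + dh_next        # from output and from the future
        dz = dh * (1 - hs[t + 1] ** 2)   # back through tanh
        dWxh += np.outer(dz, X[t])
        dWhh += np.outer(dz, hs[t])
        dh_next = Whh.T @ dz
    for W, dW in ((Wxh, dWxh), (Whh, dWhh), (Why, dWhy)):
        W -= lr * dW                     # in-place parameter update
    return float(np.mean((ys - Y) ** 2))

losses = [bptt_step(X, Y) for _ in range(200)]
print(f"MSE {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The recurrent state is what lets the model pick up structure spanning more than one note, which purely local regressors miss.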

SLIDE 23


Experiment: Learn to Perform Schubert Waltzes

  • 10 highly trained pianists (performance PhD, University of Montreal Faculty of Music)
  • 5 similar waltzes by Schubert
  • Recorded multiple performances for each pianist on a Bösendorfer ZEUS reproducing imperial grand piano
  • Stored as MIDI (note times and durations; pedaling)
  • At least 2 performances per piece per pianist; for each performance, the piece was repeated
  • 115 total performances; 38284 notes in all

SLIDE 24

[Figure: Timing deviations for all 20 performances of a single waltz (slower/faster vs. time in measures). Mean-value predictions are shown as red squares.]

SLIDE 25


Training and generation

Training:

  • Train algorithms on 4 pieces using MIDI performances captured from the Bösendorfer ZEUS.
  • Ensure generalization using out-of-sample data

Generation:

  • Predict note velocities, local time deviations, and overall tempo deviation for the 5th piece
  • Generate machine performance as MIDI from predictions
  • Record performance from MIDI on the Bösendorfer ZEUS

Pianist pedaling was ignored. We generated pedaling from the note timing profile. (Future work)
SLIDE 26

[Figure: Mean timing deviations (blue) versus predicted deviations (red), slower/faster vs. time in measures. The model was not trained on this piece.]

SLIDE 27


Discussion

  • Model learned:
    • phrase-final lengthening
    • basic waltz feel (“lilt”)
    • voice leading
  • Model did not learn:
    • more complex melodic phrasing
    • good pedaling
    • the ability to make a “radical” performance (regression to the mean)
  • Baseline algorithm (standard BPTT recurrent network) performed as well as the more complex algorithm (Deep Belief Network)

SLIDE 28


Conclusions

  • Expressive timing and dynamics can be learned using a straightforward machine learning approach
  • Score-related information is relatively easy to obtain from MIDI performance or audio
  • Can form the core “performance module” for online music generation software

SLIDE 29


Bibliography

  • B. Repp (1990). Patterns of expressive timing in performances of a Beethoven minuet by nineteen famous pianists. Journal of the Acoustical Society of America, 88:622-641.
  • B. Repp (1997). The aesthetic quality of a quantitatively average music performance: Two preliminary experiments. Music Perception, 14:419-444.
  • A. Friberg, R. Bresin & J. Sundberg (2006). Overview of the KTH rule system for musical performance. Advances in Cognitive Psychology, 2(2-3):145-161.
  • G. Widmer (2003). Discovering simple rules in complex data: A meta-learning algorithm and some surprising musical discoveries. Artificial Intelligence, 146:129-148.
  • C. Raphael (2004). A Bayesian network for real-time musical accompaniment. Neural Information Processing Systems (NIPS) 14.
  • J.F. Paiement, D. Eck, S. Bengio & D. Barber (2005). A graphical model for chord progressions embedded in a psychoacoustic space. In Proceedings of the 22nd International Conference on Machine Learning (ICML), Bonn, Germany.
  • S. Dixon, W. Goebl & G. Widmer (2002). The performance worm: Real time visualisation of expression based on Langner’s tempo-loudness animation. In Proceedings of the International Computer Music Conference (ICMC).
  • D. Eck (2007). Beat tracking using an autocorrelation phase matrix. In Proceedings of the 2007 International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1313-1316.